Many gaming communities over the years have reached out to thank emulator developers for their efforts. Emulators are an important part of many classic game communities and give players access to features like netplay multiplayer, modding, and savestates, while also opening up the doors to enhancements not possible on console. Sometimes it's simply more convenient to use an emulator that runs on your desktop, tablet, or phone rather than to dig out and hook up the original console every time you want to play one of your favorite games. However, it's important to state that our relationship with gaming communities is mutual, and without the help of players and fans, there's no way we could handle maintaining a library of thousands of games.
In this Progress Report, gaming communities were the direct catalyst for many of the changes. They went on difficult debugging adventures, caught small issues that would be invisible to anyone who wasn't extremely familiar with the game, and even came up with patches to make games friendlier to emulator enhancements. All of these contributions, including those that aren't code, are appreciated and help make Dolphin what it is today.
So, without further delay, let's get started with the August Progress Report! Enjoy.
Inazuma Eleven GO: Strikers 2013 is the final Wii release in the beloved Inazuma Eleven soccer series. Featuring tons of characters, special moves, and a lengthy RPG story mode where you build up your soccer team, level up characters, and defeat rivals, there's a lot to love about the game. So much so, that it's built up a cult following around the world, despite only releasing in Japan. There are fan translations, an active tournament scene featuring cross-country clashes, and even a "World Cup"!
With full support on Wiimmfi, Inazuma Eleven GO: Strikers 2013 is still played online via Nintendo Wi-Fi Connection to this day, and many users choose to use Dolphin. It's one of the few Wii games left where you can actually find matches today, especially if you go to one of the community Discord servers. Everything works great in Dolphin, with one annoying caveat: Dolphin could not sync with Wiis. Upon attempting to block Hissatsu (special moves) or tackle, the game would detect a problem and roll back to a point before the problem happened, usually dropping the ball into a stationary position. This meant scoring with the powerful special moves was impossible and tackling your opponents wasn't an option. The game was unplayable.
One of the World Cup matchups the community had scheduled was France vs. Japan, which presented a problem. The French community primarily uses Dolphin, while a majority of the Japanese players solely play on console. This meant that players would either have to swap to an unfamiliar setup or matches would have to be cancelled due to this bug. The users across the communities decided enough was enough: they were going to get to the bottom of this issue no matter what it took. Inazuma veterans, including players Obluda, AS, GalacticPirate and many others, joined together to try to track down an absolutely nightmarish emulation bug.
What made this issue such a problem was the particularly specific conditions that were required to reproduce it. Normally when examining potential CPU bugs, you'll want to do things like pause emulation, attach debuggers, examine registers, and do other technical things to watch exactly what is happening. And in addition to all of that, developers often rely on Dolphin's interpreter as a sanity check for the JITs in order to determine what kind of bug was being dealt with. In the past, tools like these alongside hardware tests have helped developers find differences in calculations that once caused replay desyncs in games like Mario Kart Wii, F-Zero GX, Super Smash Bros. Brawl, and many others.
Inazuma Eleven GO: Strikers 2013 is a similar problem but with a constraint that makes it very difficult to examine. In order for the bug to manifest, the emulator had to be on Wi-Fi connected to a Wii. This meant that Dolphin had to be running close to full speed and couldn't be paused for long periods of time, or else the connection would be lost. This essentially cut off most ways we would normally use to examine and test a CPU related bug, and made even checking if the issue happened on the interpreter a non-starter. A full instruction bisect through typical means simply wasn't realistic either, as that would also lower performance too far. And while everyone suspected this was a JIT bug, we couldn't be sure as there was no way to verify if switching to interpreter fixed it!
Despite all of these hurdles, the Inazuma community pressed on. Instead of relying on a conventional bisect, they went on the painstaking journey of falling back small groups of instructions to interpreter at a time. Combined with a fast enough processor, they could keep the game full speed while slowly testing each interpreter version of the instructions to see if one of them fixed it. Eventually, they proved successful when going through the JIT Floating Point instructions. By disabling a group of them, they were able to fix the desync while just hovering close enough to full speed to stay connected to the Wii. With this lead, JosJuice chipped in to help guide users into bisecting the remaining instructions and they landed within the floating point Fused Multiply–Add (FMA) instructions. Developers were a bit skeptical of this bisect, as both the x86-64 and AArch64 JITs have been put through the gauntlet. They should have been bit perfect by this point, as confirmed by hardware tests and the many games with replay files.
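The two-stage fallback search described above can be sketched roughly as follows. This is only an illustration of the idea, not the community's actual procedure or any Dolphin tooling: `find_culprit`, `still_desyncs`, and `max_fallbacks` are hypothetical names, the instruction list in the usage note is illustrative, and the sketch assumes a single instruction is responsible.

```python
def find_culprit(instructions, still_desyncs, max_fallbacks):
    """Return the instruction whose interpreter fallback stops the desync.

    instructions  -- ordered list of JIT instruction names to examine
    still_desyncs -- callable taking a frozenset of instructions that are
                     falling back to the interpreter; returns True if the
                     desync still happens (hypothetical test oracle)
    max_fallbacks -- how many instructions can fall back at once while the
                     game still runs close to full speed
    """
    # Stage 1: fall back one small group at a time.
    for start in range(0, len(instructions), max_fallbacks):
        group = instructions[start:start + max_fallbacks]
        if still_desyncs(frozenset(group)):
            continue  # the bug is still in JIT code, try the next group
        # Stage 2: this group hides the bug, so test its members one by one.
        for name in group:
            if not still_desyncs(frozenset([name])):
                return name
    return None  # no single instruction explains the desync
```

Calling it with a list such as `["fadd", "fsub", "fmul", "fdiv", "fmadd", "fmsub", "fnmadd", "fnmsub"]` and a group size of 3 needs at most one online test per group plus one per member of the suspicious group, which is what keeps each individual test fast enough to stay connected.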
Others joined in and one developer even imported the game in order to verify what was going on. With the Inazuma Eleven GO community's assistance, we were able to see what was going on first hand and confirm the instruction bisect. Something was definitely going wrong in Dolphin's FMA calculations. The problem was that there was still no conclusive cause to be found, even though we knew what was broken. After staring at the issue for what felt like weeks, JosJuice finally figured it out: the problem turned out to be from differences in precisely when negation occurred. Let's get technical.
"Negation" is changing the sign from positive to negative or vice versa on demand. This is typically bundled in duplicate versions of FMA instructions for efficiency, i.e. madd (multiply-add) has a negated variant nmadd which should, in theory, have the opposite sign of madd. However, different CPU architectures can apply the negation at different points in the calculation, changing the result. PowerPC's nmsub (negated multiply-subtract) instructions negate at the end of the operation, with the equation -(A * C ± B). This makes sense, so of course nothing else does it this way.
x86-64, in its infinite wisdom, negates the multiply operation result then does the add or subtract. The equation for this is -(A * C) ± B, which is very different than the PowerPC version and is not compatible at all. However, past Dolphin developers discovered a clever workaround for this issue. All we had to do was simply swap ADD and SUB for these instructions. Just by doing the opposites, we could get the results that the game's PowerPC code expected.
AArch64 flips the table. The AArch64 nmadd equation is -(A * C) - B, which is exactly x86-64's nmsub equation! Having an add instruction subtract is a curious decision from ARM, but thanks to this we didn't need to do our SUB swapping trick as AArch64 already did it for us, allowing PowerPC's nmadd to map directly to AArch64's despite very different equations. But if that wasn't weird enough, AArch64's nmsub equation is (A * C) - B, which isn't negated at all. Yet AArch64's msub is negated, for whatever reason, so we used that instead. We've learned not to question it.
These tricks are clever, and have proven to be very accurate. However, as Inazuma Eleven GO: Strikers 2013 proved, they were not perfect. booto discovered that due to differences in negation ordering between the equations, this method breaks when all inputs are zero: PowerPC's nmsub would give -0, while x86's nmadd and AArch64's msub would give +0. Whoops. We don't know exactly what Inazuma is doing to rely on this unusual behavior, but this is the bug that was causing its desyncs when Dolphin and Wiis were mixed on Wi-Fi Connection.
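The signed-zero difference can be reproduced with ordinary floating point arithmetic. The sketch below is not Dolphin's JIT code: `ppc_nmsub` and `x86_nmadd` are illustrative names, and plain Python floats round the product and the sum separately instead of performing a true fused operation. With all-zero inputs nothing is rounded, so the ordering of the negation is the only thing that differs.

```python
import math


def ppc_nmsub(a, c, b):
    """PowerPC ordering: compute a*c - b, then negate the final result."""
    return -(a * c - b)


def x86_nmadd(a, c, b):
    """x86-64 ordering used by the old trick: negate the product, then add b."""
    return -(a * c) + b


def sign_of(x):
    """Return -1.0 or 1.0, distinguishing -0.0 from +0.0."""
    return math.copysign(1.0, x)


# Ordinary inputs agree, which is why the trick held up for so long.
print(ppc_nmsub(2.0, 3.0, 1.0), x86_nmadd(2.0, 3.0, 1.0))

# All-zero inputs disagree on the sign of zero.
print(sign_of(ppc_nmsub(0.0, 0.0, 0.0)), sign_of(x86_nmadd(0.0, 0.0, 0.0)))
```

In the default round-to-nearest mode, 0 - 0 is +0 and negating it yields -0, whereas -0 + 0 is +0, so the two orderings disagree only on the sign bit.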
While Inazuma was handled, further inspection revealed more possible issues. The tricks above gave the same results as console no matter what we threw at them... when they are performed by normal math. On actual hardware these equations are being performed via floating point math, so our good friends, floating point rounding errors, crop up once again. By performing the negation at different points in the equations, the floating point calculations were rounding differently from the PowerPC originals. Usually it was fine, but at the extremes of rounding, such as rounding toward infinity, it would round differently enough to create a different floating point result. We're not aware of any software encountering this issue yet, but it is best to fix discovered flaws like this if we can.
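To see why directed rounding matters, here is a small model using Python's `decimal` module, which stands in for binary floating point only because it lets us pick a tiny precision and a rounding mode. Everything here is an assumption made for illustration (the three-digit precision, the input values, and the function names); it is not how Dolphin or the hardware computes anything.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_HALF_EVEN


def negate_after(ctx, a, c, b):
    """PowerPC ordering: round (a*c - b) once, then flip the sign exactly."""
    return ctx.fma(a, c, b.copy_negate()).copy_negate()


def negate_first(ctx, a, c, b):
    """Old trick: negate the product inside the fused operation, -(a*c) + b."""
    return ctx.fma(a.copy_negate(), c, b)


a, c, b = Decimal("1.11"), Decimal("1.11"), Decimal("0.5")

toward_inf = Context(prec=3, rounding=ROUND_CEILING)
nearest = Context(prec=3, rounding=ROUND_HALF_EVEN)

# The exact value is -0.7321. Rounding toward +infinity before the sign flip
# pushes the magnitude up, while rounding after the flip pushes it down.
print(negate_after(toward_inf, a, c, b), negate_first(toward_inf, a, c, b))

# Round-to-nearest is symmetric around zero, so both orderings agree.
print(negate_after(nearest, a, c, b), negate_first(nearest, a, c, b))
```

The asymmetry only shows up in rounding modes that are not symmetric around zero, which fits the report that the mismatch appears at extremes such as rounding toward infinity.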
To solve for all these quirks, we now just use the standard msub instructions (and nmsub on AArch64), and negate them afterwards ourselves with dedicated negation instructions. This simple change resolves all these edge cases and allows Dolphin to play against real Wiis in Inazuma Eleven GO: Strikers 2013! While there is an extremely small performance hit from using two instructions instead of one, we assure you, you won't notice. Probably. Surely there isn't a game being dumb with basic FMA negation instructions to such a degree as to cause a noticeable performance hit. SURELY. (link to next month's Progress Report here)
Without the amazing Inazuma community doing a nightmarish instruction bisect with the strict condition that they couldn't slow down performance too much, this issue may never have been fixed. If you're looking to try out this game, there's still an active community, tournaments, and plenty of guides for this stylized soccer game. We must specifically thank Inazuma France for reaching out to us.
When testing online play in Inazuma Eleven GO: Strikers 2013, Mario Kart Wii, and other Wi-Fi supported games on backup servers like Wiimmfi, it was noted that the initial connection had some very large stutters. Since online features were being tested anyway due to the Inazuma Eleven Wi-Fi issues, it was the perfect chance to test sepalani's change to handle domain name resolution asynchronously. By performing the operation on a separate thread, the stutters when connecting to a Wi-Fi enabled game are completely eliminated.
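The general technique of moving a blocking lookup off the main loop can be sketched like this. It is a minimal illustration rather than sepalani's actual change (Dolphin itself is written in C++); `resolve_async`, `run_frames_while_resolving`, and the injectable `resolver` parameter are names invented for this example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# A single worker thread is enough to keep lookups off the main loop.
_dns_pool = ThreadPoolExecutor(max_workers=1)


def resolve_async(hostname, resolver=socket.gethostbyname):
    """Start a blocking lookup on the worker thread and return a Future at once."""
    return _dns_pool.submit(resolver, hostname)


def run_frames_while_resolving(future, run_frame):
    """Keep running frames until the lookup finishes, then return its result."""
    frames = 0
    while not future.done():
        run_frame()  # the main loop never waits on the network
        frames += 1
    return future.result(), frames
```

A blocking call made directly on the main thread would stall every frame until the DNS server answered. With the lookup on a worker thread, the main loop only polls `future.done()`.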
5.0-14810, 5.0-14848, and 5.0-15105 - GameINI: "Heavy Iron Studios" Games Quality of Life Changes by The Community and Developers
As a developer of licensed games for sixth generation consoles, it's hard to have a better legacy than Heavy Iron Studios. One of their games, SpongeBob SquarePants: Battle for Bikini Bottom, is a cult classic that was popular enough to see an HD remake on modern consoles, and their other games range from mostly competent to pretty good. Overall, they're fun games that used the IP well and made for enjoyable experiences with well known characters.
As these licensed games have a rather large fanbase, users have wanted to play through them in Dolphin with the many enhancements that emulation provides. Unfortunately, there were some limitations. Dolphin could emulate these games correctly... but the most powerful enhancement it offers just didn't work well. Raising the Internal Resolution brought forth severe graphics issues, essentially locking these games to native resolution. This isn't some kind of bug in Dolphin, either! Most of Heavy Iron Studios' games are built in a way that makes it so they can't actually be played in higher resolutions due to tricks used during rendering, at least not without a little bit of trickery from users and emulator developers.
This issue goes back to the first game that Heavy Iron Studios released on the GameCube, Scooby-Doo! Night of 100 Frights. It uses the same techniques as Battle for Bikini Bottom with one key difference. This game sets up the projection matrix incorrectly; it's set up for a 639 by 479 output instead of the actual 640 by 480 output. Because of this, when the game copies a 256x256 chunk of the framebuffer out of memory, it ends up going through the incorrect projection matrix when being copied back in and ends up slightly malformed to a 256.401 by 256.534 chunk instead. The easiest way to imagine this bug is to compare it to photo editing. Imagine copying out a portion of an image, resizing the rest of the image down very slightly, but then copying back in the chunk you copied out earlier without resizing it, and then scaling the full image back to its original size. This isn't a perfect analogy, but it at least gives you an idea of the types of visual issues this causes. The thing is that the GameCube's native resolution is so low, these slight imperfections don't result in any visible issues. It's only because Dolphin allows for higher internal resolutions that this bug can manifest in a visual manner.
By the time SpongeBob SquarePants: Battle for Bikini Bottom rolled around, Heavy Iron Studios noticed this bug and corrected for it. Not only did they fix the Projection Matrix to be 640x480, they also added a horizontal and vertical 1/512 (half a pixel) offset to both position and texture coordinates. This makes it so things do line up better at higher internal resolutions, except now the texture ends at 513/512 instead of 1.0. This means that the EFB copy wraps around and grabs parts of the very edge of the screen. If we break down how the image is rendering, it's actually pretty easy to see how the higher resolutions break things.
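The malformed chunk size follows directly from the mismatched dimensions. As a quick check of the arithmetic, assuming the error amounts to a plain stretch by the ratio of the actual output size to the size the projection matrix was built for (`reprojected_size` is just an illustrative helper):

```python
def reprojected_size(chunk_px, assumed_px, actual_px):
    """Size of a chunk after being stretched by the actual/assumed ratio."""
    return chunk_px * actual_px / assumed_px


width = reprojected_size(256, 639, 640)   # horizontal: matrix assumes 639, output is 640
height = reprojected_size(256, 479, 480)  # vertical: matrix assumes 479, output is 480

print(round(width, 3), round(height, 3))
```

Both axes stretch by well under half a pixel at native resolution, which is why the error stays invisible until the internal resolution is raised.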
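A rough way to picture the overshoot: the sketch below assumes a 256 texel copy at native resolution (matching the "half a pixel" figure), a repeating wrap mode, and a copy whose dimensions scale linearly with the internal resolution. The names are illustrative, and this only models the coordinate arithmetic, not the actual rendering.

```python
from fractions import Fraction

OFFSET = Fraction(1, 512)   # the per-axis offset the game adds
COPY_SIZE = 256             # assumed texels per side at native resolution


def overshoot_texels(scale):
    """Texels sampled past the wrapped edge for a copy scaled by `scale`."""
    end_coord = 1 + OFFSET          # texture now ends at 513/512
    wrapped = end_coord % 1         # with repeat wrapping this lands at 1/512
    return wrapped * COPY_SIZE * scale


for scale in (1, 2, 4, 8):
    print(scale, overshoot_texels(scale))
```

Under these assumptions the overshoot is half a texel at native resolution and grows to several whole texels at higher internal resolutions, which would fit the artifacts only becoming obvious once the resolution is raised.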
Users who saw this issue nicknamed it the "blue box" issue, as most commonly it'd be duplicating the parts of the skybox. Though Vertex Rounding can fix the position coordinates and thus repair shadow rendering, it also makes the fact it's grabbing from the very edge of the screen much more obvious since the texture coordinates were still offset.
This texture offset made it so there wasn't really anything more Dolphin could do to render things better outside of extremely ugly per-game hacks. Users took things in a different direction then, messing with the game itself rather than trying to adjust the emulator to handle this case. Their tool of choice was the Action Replay code, which allowed them to modify the game directly in memory. Several years ago, users on an issue report for this very issue discussed the idea of using action replay codes to improve the situation. Users even posted some codes that partially rectified the situation, with Disorderly actually posting the exact address that would end up the catalyst to fixing this game.
Unfortunately, these solutions were way ahead of their time. Dolphin lacked the Vertex Rounding hack at that point, and also had rounding errors that made the game render incorrectly even at native resolution. Because they had to dance around multiple issues at once, the codes became extremely complicated and had tons of downsides. This misdirection caused just about everyone to not realize just how close they were even back then.
It wasn't until earlier this year that developers became aware of a solution that could fix this game's rendering issues without breaking graphics when a single line Action Replay code appeared on Dolphin's Battle for Bikini Bottom Wiki Page. The key was simply putting everything together. Fixes to Dolphin's internal rounding over the years, plus the Vertex Rounding Hack, and this Action Replay code finally had the game rendering cleanly at higher resolutions.
However, relying on users to find a code on the Wiki, enter it into Dolphin's INIs or action replay menu, enable cheats, and also make sure the Vertex Rounding hack was enabled without any hints or instructions was rather unreasonable. After trying out the code for himself, JMC47 wanted to make the user experience seamless. However, the question was if there was a way to make this enhanced experience easy to access without compromising the authentic experience for those looking to play the game at native resolution without hacks.
Vertex Rounding is automatically disabled at native resolution, so that could be made a default setting without concern. While we couldn't enable an Action Replay code by default, since it was just writing a value to memory it could very easily be converted into a game patch. The only thing we had to make sure was that the game patch didn't cause issues at native resolution. In order to do that, Pokechu22 was brought in to analyze the patch's effects with Dolphin's FifoAnalyzer. In the end, it was deemed harmless and was converted over from an Action Replay code to a game patch that could automatically be applied at boot.
For the sake of completionism, JMC47 also adjusted the patch to work with the PAL version of the game, so that no matter which version you play, you'll be able to play at higher resolutions in the latest development builds!
That would have been the end of this story, but there was a bit of bitter taste left in everyone's mouth. SpongeBob SquarePants: Battle for Bikini Bottom may have been the most popular game made by Heavy Iron Studios and fixing it would have appeased most players, but the other games that they made were still broken. Are those games less important simply because there are fewer people who want to play them?
There was another bigger factor than that. Because those games were less popular than Battle for Bikini Bottom, there was no magical code that existed for them to fix the offset. Or so we thought. Surprisingly enough, another of the games, The SpongeBob SquarePants Movie also had a code to help with higher resolutions on the Wiki! Figuring it wouldn't be a big deal, JMC47 decided to port the code into a patch as well. However... something was amiss.
The code for The SpongeBob SquarePants Movie didn't remove the offset from the executable, it outright disabled the problematic EFB copies from ever rendering! While this did work in removing the artifacts, it also removed a ton of special effects from the game. This wasn't a workable game patch that could be enabled by default. Yet, these games seemed so similar that JMC47 was convinced that a proper patch was possible. In order to do this, he enlisted the help of Pokechu22 once more to look at how the game was rendering with and without the Action Replay code. Together, they confirmed for certain that this code was a dead-end for actually addressing the core problem. But during this investigation, Pokechu22 mentioned that one reason why Battle for Bikini Bottom likely saw a more detailed code wasn't that it was just more popular, but that it included debug symbols on the disc in the form of an unstripped ELF executable. For those that don't know, debug symbols are extremely useful for reverse engineering, as they break up functions and give them names directly in the code. This makes understanding what you're seeing much easier.
The developers left the debug symbols in their earlier game, but would they really leave it again in the sequel...?
It turns out that every single Heavy Iron Studios game from this era has an unstripped ELF on the disc. This turned what would have been an annoying reverse engineering project into something that even a novice could handle with enough time and effort. With the guiding hand of Pokechu22, JMC47 learned how to wield Ghidra, a software reverse engineering tool, along with plugins developed for it specifically for analyzing GameCube and Wii executables. It was a painfully slow process of figuring out how things worked, but Pokechu22 was able to help provide a lead. The offset this time wasn't the same as in Battle for Bikini Bottom; EFB copies were now half the resolution, but otherwise the rendering process was the same as the previous game. Pokechu22 was able to derive what the value was in memory and JMC47 started patching out every occurrence of the floating point number directly until he found the memory address that fixed the rendering. Once that was done, he made a new Fifolog for Pokechu22 to confirm it was working and ported the code to the game's other regions.
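Finding every occurrence of a floating point constant in a dumped executable or memory image is simple to script. The sketch below is a generic illustration, not the workflow JMC47 used: `find_float_offsets` is a hypothetical helper, and 1/512 is used as the search value only because it is the offset named for Battle for Bikini Bottom (the value for the other games is not given here). The one certain detail is that the GameCube's PowerPC CPU is big-endian, so the constant is packed with `">f"`.

```python
import struct
from typing import List


def find_float_offsets(blob: bytes, value: float) -> List[int]:
    """Return every byte offset where `value` appears as a big-endian float32."""
    pattern = struct.pack(">f", value)
    offsets = []
    pos = blob.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(pattern, pos + 1)  # keep scanning after this hit
    return offsets


# 1/512 is 2**-9, which encodes as 3B 00 00 00.
sample = bytes.fromhex("000000003b000000ffffffff3b000000")
print(find_float_offsets(sample, 1 / 512))
```

Each offset is then a candidate to patch and test, one at a time, until the rendering changes.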
...That's still not the end of our story, however. SpongeBob SquarePants: Battle for Bikini Bottom and The SpongeBob SquarePants Movie may have been the two most popular titles with these issues, but Heavy Iron Studios employed this same technique in most of their games. JMC47 knew the exact steps on how to find the offset EFB copies and fix the issue thanks to the work done for The SpongeBob SquarePants Movie. While none of their other GameCube titles are nearly as popular, there was no reason outside of laziness not to go through and make the patch work for all of the known games with this issue. There were no base codes to work from this time, but at this point he didn't need them.
Over the next few weeks, he collected and went through every known title and revision with the issue and developed patches for each one. Titles like The Incredibles, The Incredibles: Rise of the Underminer, and even the various 2 in 1 SpongeBob/Incredibles releases now have patches to make sure they run properly at higher resolutions.
And now all of these games work at higher resolutions, not through any kind of improvement in emulation, but through modifying the games to render in a more friendly way. While Dolphin usually doesn't ship patches or enhancements on by default, we also realize that a lot of our users enjoy Dolphin's ability to run games at higher resolutions. In this case, we couldn't do much more to emulate the game better, but by changing the game itself, we were able to get things rendering nicely at higher resolution. And remember, having the patches enabled does not compromise the game's rendering at native resolution. Thus, in the latest development builds we've enabled these patches and Vertex Rounding by default.
In the latest development builds, we've also disabled Dual Core by default for these games after JoeyBallentine and the speedrunning community for SpongeBob SquarePants: Battle for Bikini Bottom notified us of instability issues with the game. Because these games are fairly lightweight and don't require many strenuous features, this shouldn't be a problem for most desktop computers or high-end Android devices. But if you're willing to deal with occasional crashes, you can always re-enable Dual Core for this title in the game properties page.
Over the years, we know that there have been many players disappointed with the many problems that Dolphin had with Heavy Iron Studios' games. Now might be the perfect chance to give these classic games another playthrough. We think you'll like what you see.
When they're not being pestered to help reverse engineer other games, Pokechu22 has been known to dive into some of the weirdest graphical glitches afflicting Dolphin. This one caught their eye on the issue tracker as it was an issue with literally no lead. Shadow the Hedgehog, a game that absolutely every single Sonic fan loves, had issues with rendering eyelids, especially during certain cutscenes. Fortunately the tester provided a Fifolog of the bug, so Pokechu22 analyzed the Fifolog and found that the eyelids had a texture coordinate of NaN (Not a Number). As this seemed incredibly wrong, they decided to play back the Fifolog using the Hardware Fifoplayer and found something very interesting.
It seemed as though real hardware was automatically correcting for NaN in some way. Pokechu22 continued poking at the issue on console, eventually determining that the console was interpreting NaN texture coordinates as either "1" or "-1". After a thorough analysis of values to see which was most correct, they added a condition to Dolphin's graphics emulation to convert NaNs to 1 in order to remedy the issue for now.
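The rule itself is tiny. As a sketch of the logic only (Dolphin applies it inside its graphics emulation rather than in Python, and `sanitize_texcoord` is a made-up name):

```python
import math


def sanitize_texcoord(value: float) -> float:
    """Replace a NaN texture coordinate with 1, leaving other values untouched."""
    return 1.0 if math.isnan(value) else value


print(sanitize_texcoord(float("nan")), sanitize_texcoord(0.25))
```

The explicit `isnan` check is needed because NaN compares unequal to everything, including itself, so an ordinary equality test cannot detect it.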
There are still some oddities around how the eyelids render in Shadow the Hedgehog, but this greatly improved the situation for now. Unfortunately, if you're using D3D11 or D3D12 Ubershaders, we can't exactly emulate this behavior. D3D11/12 automatically optimizes out Dolphin's attempts to use isnan to check for NaN values in shaders, no matter how much we try to tell them that we really need to know if this is NaN or not. Because of this, the eyes remain broken on D3D11/12 when using Ubershaders for the time being.
5.0-14829 - PowerPC: Implement Broken Masking Behavior on Uncached Writes by JosJuice with help from eigenform, delroth, phire, marcan, segher, Extrems, and Rylie
It's hard to find a bigger All-Star cast of GameCube/Wii emulator developers and reverse engineers for one change. What brought them together wasn't some massive bug in Dolphin or some problem affecting hundreds upon hundreds of games. What brought them together was a strange hardware bug, and figuring out how to test it and eventually emulate it. This hardware bug has actually been known for some time, but it was ignored as there was no use case for it in any retail game... until now.
It's well known at this point that the N64 Zelda games are incredibly broken. With Arbitrary Code Execution (ACE), players literally write some code into memory and have the game execute it to let them go straight to the credits. Thanks to this, speedruns in the N64 version of Ocarina of Time have fallen to under ten minutes. Unfortunately, so far ACE isn't possible on the Virtual Console versions of Ocarina of Time and Majora's Mask, which has left players searching for alternatives and new ideas.
One exciting development, originally discovered by MrCheeze, is known as LightNode SRM. This is a more powerful method of Stale Reference Manipulation (more commonly known as Use-After-Free or UAF as a software vulnerability) that actually works better on the Virtual Console releases, and it's pretty fast to do. It works in both Ocarina of Time and Majora's Mask and is now used in the fastest route in OoT Any%!
So what exactly is LightNode SRM and why does it work better on the Virtual Console versions of these games? Let's first focus on the glitch itself. The core component to triggering the glitch is to find a way to get Link to carry an "actor" (game entity) that has been unloaded. One of the easiest methods to do this is to change rooms while in the grab delay state while using a classic superslide.
Once Link is carrying nothing, that's when the fun begins. He's actually writing three words and two halfwords into parts of memory that used to belong to the actor but are now being reused for other stuff. Even if you performed this glitch on most actors, you'd still only be constrained to overwriting parts of the actor heap, which contains things like the enemy and item data. These are useful enough to mess with, but a way to escape the actor heap would break things way open. That's where LightNode SRM comes into play!
LightNode SRM gets its name because it relies on the lights in the game. The way the game handles lighting sources that load/unload dynamically is with a doubly-linked list of all such sources currently loaded. One example of this is the torch lights found throughout the game.
The key to this is that every time an actor with a dynamic lighting source unloads, the associated LightNode is removed from the linked list:
node->prev->next = node->next; node->next->prev = node->prev;
The pointer to the light node (node in the code snippet) is stored in the actor instance data, so it can be overwritten by doing SRM and changed to point to a region of memory that speedrunners control. This means that node->next can be anything they want and point to whatever they wish to modify, even if it's outside of the actor heap!
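To make the unlink primitive concrete, here is a toy model of it. Memory is a dictionary of word addresses, and the field offsets (`prev` at +4, `next` at +8) and every address below are assumptions chosen for illustration rather than values taken from the game.

```python
PREV_OFFSET = 4  # assumed offset of node->prev
NEXT_OFFSET = 8  # assumed offset of node->next


def unlink(memory, node):
    """Model of the list removal shown above, acting on a dict of words."""
    prev_ptr = memory[node + PREV_OFFSET]
    next_ptr = memory[node + NEXT_OFFSET]
    memory[prev_ptr + NEXT_OFFSET] = next_ptr  # node->prev->next = node->next
    memory[next_ptr + PREV_OFFSET] = prev_ptr  # node->next->prev = node->prev


def forge_node(memory, node, target, value):
    """Fill a fake node so that unlinking it stores `value` at `target`."""
    memory[node + NEXT_OFFSET] = target - PREV_OFFSET
    memory[node + PREV_OFFSET] = value


memory = {}
forge_node(memory, node=0x1000, target=0x8000, value=0x2000)
unlink(memory, 0x1000)
print(hex(memory[0x8000]))  # the chosen value landed at the chosen address
```

Note the side effect: the other half of the unlink also writes through `value` as if it were a pointer (here to 0x2008), so in this model the written value has to be an address that can itself tolerate a write.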
In essence, this gives the speedrunners the ability to do arbitrary RAM writes in both games! Because node is manipulated into pointing to Link's respawn coordinates, the write address and the written value are ultimately controlled by Link's position. On the Nintendo 64 hardware, what you can do with this is a bit more limited than on the PowerPC consoles simply due to how the CPU behaves in a few key situations. In Ocarina of Time, this doesn't end up mattering as much because the filename is aligned and is close enough to access from the linked list. By getting the LightNode pointer to read the filename, you can set up a payload with the name of your file.
Things are a bit different in Majora's Mask. Thanks to its use of the N64 Expansion RAM, the filename is too far away in memory, so things are a bit more complicated. Here, the GameCube/Wii's ability to do unaligned writes plays an even bigger role. This gives players much greater control over what values they can write into memory, and allows them to realistically control writing floating point values into RAM. This made the GameCube and Virtual Console versions of these games able to do things not possible on N64 hardware. Since the Virtual Console versions are faster and more stable than the GC N64 emulators, speedrunners continued to focus on them.
The first LightNode SRM route came to fruition in Majora's Mask with an Any% route that didn't quite take any World Records but still showed the potential of this new technique. Not too long afterwards, LightNode SRM would make the news as it allowed for the first sub 7 minute run of any release of Ocarina of Time! This route only works on GameCube/Wii, due to another CPU behavior that causes a phenomenon speedrunners call DoubleWord Write (DWW) and QuadWord Write (QWW). We'll get into this a bit later as the reason for this is a bit complicated.
In terms of speedrunning, these techniques open up a ton of new options, especially for non-Any% runs. Arbitrary Code Execution is so powerful that it is banned in many categories in order to keep definitions simple and keep the routes interesting. Once you're able to execute your own code, you can do anything. Note that the ban on ACE includes using LightNode SRM to overwrite code, but because it can be used to overwrite data, which is perfectly legal, it has huge implications in tons of different categories.
As with any major glitch, players sought to push it further through emulation. This time, they wouldn't be using an N64 emulator, as the Nintendo Wii's Virtual Console releases were the best suited for the speedruns. Unfortunately, Dolphin wasn't quite up to the task as shown in the video above. While all of the basic steps of the trick worked, the values that were written into RAM by LightNode weren't quite the same on console and Dolphin. The reason why might be a bit surprising: they were relying on a feature that worked in Dolphin but was broken on console.
That's right, this was a trick that relied on accurate emulation of Uncached Writes.
On the Nintendo GameCube and Wii, there is a hardware quirk that causes unaligned uncached writes to misbehave. This hasn't been a major issue for Dolphin as all retail software has stayed away from actually using such writes. There was very little reason to put forth a ton of effort to emulate what was a dead feature on console. However, remember that CPU feature that was mentioned just above that caused DWW and QWW? That was the notoriously broken unaligned uncached writes! Their broken behavior allowed them to write over larger parts of memory than would be otherwise possible. And this was the key; in the credit warp route shown above, they actually wrote over two adjacent entrance records to account for two different credits cutscenes in Termina Field!
Unfortunately, this is where things fell apart for Dolphin. These uncached writes behaved correctly in Dolphin, meaning the broken duplication behaviors were not emulated and the second entrance record was not written. Speedrunners from Majora's Mask approached Dolphin developers with the question: Can you get this working?
With an issue report filed and speedrunners ready to test things, JosJuice started developing a hardware test to figure out exactly what was happening. Unfortunately, things ground to a halt almost immediately. It seemed as though trying to verify the behavior on console resulted in a sudden crash. No matter what was tried, the uncached writes would freeze the console. More developers were brought in to try to figure out what was going on. The console shouldn't freeze; we knew that because the speedrun already proved that this should at least work. Slowly, more developers caught wind of the problem and soon enough there was an all-star cast of reverse engineers from throughout the years diving into this strange issue. Multiple angles were tested, with some developers trying to reproduce the conditions of the game, and others trying to simplify the test further to see if something was wrong elsewhere in the test.
No matter what happened, the console would stop at the first uncached write.
After a few days of messing with things, eigenform cracked the code. Almost all homebrew is built upon libogc, which is a homebrew library for interfacing with the GameCube and Wii hardware. It makes writing, compiling, and testing homebrew a lot easier by providing APIs for interfacing with the hardware and tons of example projects. eigenform decided to write a test that didn't use libogc. While this was a much more difficult way of going about things, it ended up verifying that the problem was in libogc itself. With tons of developers already on-hand, it only took a few minutes to figure out that the supposed freeze was libogc forgetting to clear an interrupt and thus handling the same interrupt over and over again forever. This bug has now been fixed in the latest versions of libogc as a result of this testing.
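To illustrate why a forgotten acknowledge looks like a freeze, here is a minimal toy model in Python. It is not libogc's code; `InterruptController`, `dispatch`, and both handlers are hypothetical names invented for this sketch, and the `max_entries` cap exists only so the buggy case terminates in the demo.

```python
class InterruptController:
    """Toy interrupt source: the line stays asserted until acknowledged."""

    def __init__(self):
        self.pending = False

    def raise_irq(self):
        self.pending = True

    def acknowledge(self):
        self.pending = False


def dispatch(controller, handler, max_entries=1000):
    """Re-enter the handler for as long as the interrupt is pending.

    Returns how many times the handler ran. max_entries is a safety cap
    for this demo only; real hardware has no such limit.
    """
    entries = 0
    while controller.pending and entries < max_entries:
        handler(controller)
        entries += 1
    return entries


def buggy_handler(controller):
    # Handles the event but never clears the interrupt,
    # so it is immediately taken again.
    pass


def fixed_handler(controller):
    # Clearing the interrupt lets the main program resume.
    controller.acknowledge()
```

With the buggy handler, `dispatch` only stops because of the cap, which is the toy equivalent of the console never returning to the test code. With the fixed handler, it runs exactly once.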
Once the hardware tests were working, JosJuice was able to test and verify exactly how uncached writes worked on console and implemented their broken behaviors into Dolphin. We asked the speedrunning community to make sure everything was working correctly and they came through, showing off Majora's Mask's Virtual Console Any% route working correctly in Dolphin.
That isn't to say that nothing used unaligned uncached writes; in fact, we've known about one particular piece of homebrew that relied on them as a means of security. Closed source versions of the Homebrew Channel have a ton of protections and strange behaviors to ensure it is being run in a responsible manner. One of the features that they relied on was this broken uncached write behavior. We didn't bother messing with it too much as, back in the day, Dolphin was wrong in so many ways that it wasn't even close to getting it to boot. But years of fixes have changed the landscape, and with support for unaligned uncached writes, closed source versions of the Homebrew Channel are finally showing signs of life in Dolphin. In fact, they'll even make it to the menu... albeit with a few quirks.
Some of our users may be wondering how they've already been running the Homebrew Channel in Dolphin. Well, the latest version of the Homebrew Channel (referred to as the Open Homebrew Channel) removed all of the protections as a way to help Dolphin run it more easily. As most bad actors preying on the channel had long since moved on to more modern consoles, the protections didn't do anything beyond causing headaches for Dolphin, which wasn't their intention. Still, it would be a neat accomplishment to actually defeat all of the trials of the older versions of the Homebrew Channel and get them to boot up without triggering any of the protection routines at all.
A lot of Dolphin's work around outputting frames has to do with getting things working with as little latency as possible. Many games make things simple by using the default Video Interface (VI) configuration, which allows Dolphin to cheat past most XFB emulation and reduce latency beyond what's possible on console, but many others use more complex VI configurations. This forces Dolphin to do the right thing and actually emulate the scanout procedure. Even then, Dolphin still cheats a little bit by outputting the XFB copy with the settings it has at the beginning of a field rather than applying them throughout the process of scanning out the XFB copy. This saves ~16ms of latency and was thought to be globally supported... until WWE Crush Hour showed up.
This strange, strange game does something no one has seen in any other retail title: it changes VI settings while a frame is being scanned out to the screen. Dolphin couldn't do anything about it because it had already output the frame with incorrect VI settings by the time the game changed them. After four years of relying on an incorrect bisect, ZephyrSurfer rebisected the issue and realized that the previous bisect to Hybrid XFB was wrong. The new bisect took them to 5.0-146 and the latency optimization.
Upon seeing this, phire notified everyone of this latency hack and explained how to work around the issue. Essentially, Dolphin just needed to add an option to wait until the end of scanout to actually output the frame. This would result in a full frame of extra latency, so they suggested that this be implemented as a separate option just for this game. While they were too busy to do it themselves, the instructions they left were clear enough for Techjar to implement the fix. It's important to note that neither of these implementations is accurate; in order to do things correctly, Dolphin would have to apply VI configurations during scanout, which would be a lot more difficult without much, if any, benefit.
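The difference between the two output strategies can be sketched with a small toy model. This is not Dolphin's code: `VISettings`, `present_field`, and the `wait_for_end_of_scanout` flag are hypothetical names, and the 60 Hz field rate used for the latency figure is an assumption made for illustration.

```python
from dataclasses import dataclass

FIELD_MS = 1000 / 60  # assumes a 60 Hz field rate, close to the ~16ms above


@dataclass(frozen=True)
class VISettings:
    """Hypothetical stand-in for a VI configuration."""
    width: int
    height: int


def present_field(initial, mid_field_writes, wait_for_end_of_scanout):
    """Return (settings used for output, extra latency in ms) for one field.

    mid_field_writes is a list of (scanline, VISettings) pairs written by
    the game while the field is being scanned out.
    """
    if not wait_for_end_of_scanout:
        # Fast path: latch the settings at the start of the field.
        # Anything the game writes later in the field is never seen.
        return initial, 0.0

    # Slow path: wait for scanout to finish and use whatever the game
    # wrote last. Still an approximation: a per-scanline model would
    # apply each write at the line where it happened.
    settings = initial
    for _scanline, new_settings in sorted(mid_field_writes, key=lambda w: w[0]):
        settings = new_settings
    return settings, FIELD_MS
```

In this model, a game that never touches VI mid-field gets identical output either way and only pays the extra latency on the slow path, which is why a per-game option makes sense.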
A new contributor to the project, K0bin, noticed a missing memory barrier in Dolphin's D3D12 backend that could be causing issues with GPU Texture Decoding. While NVIDIA's drivers were unaffected (likely due to ignoring state transitions like they do image layouts in Vulkan), turning on GPU Texture Decoding on AMD cards would cause serious problems and crashes. This is a one-line change that fixes the oversight.
5.0-14821, 5.0-14833, 5.0-14897, and 5.0-15019 - Fix dcbx Invalidation Performance Without Breaking Everything This Time by AdmiralCurtiss and JosJuice
If you've been following the development builds the last month, you might have noticed a trend with the dcbx (data cache invalidate/flush/zero etc.) changes. Our goal has been to make large invalidations fast and correct, but getting both at the same time has proven to be tricky. At the end of the last Progress Report, we thought we had a simple fix to the performance regression regarding various data cache flush/invalidation instructions. Everything seemed to be working fine in both of the broken cases, and developers thought the saga had reached its end. How foolish we were.
After the Progress Report launched, issue reports started rolling in, with the most popular series affected seeming to be the Mario and Sonic at the Olympic Games series. Early analysis revealed that things were going very, very wrong. Dolphin's x86-64 JIT was misbehaving and double translating the address during dcbx and thus jumping to the wrong flag. Masking was also broken in both of Dolphin's JITs, leaving quite a few games broken at the turn of the month.
JosJuice and AdmiralCurtiss teamed up to quickly fix the JITs and get things working again as soon as possible. JosJuice focused on fixing the masking behavior in the AArch64 JIT and AdmiralCurtiss made some bigger changes to the x86-64 JIT to make sure that Dolphin would always pass the dcbx instructions the correct effective address.
With their quick work, everything was running correctly again in less than two weeks. However, these fixes came at a rather ironic cost: they erased the performance gains from the last Progress Report, meaning that Arc Rise Fantasia and the other games that invalidate large areas of memory at once were yet again having performance problems. In the simplest terms, things weren't in a very good place and there was some work to be done.
AdmiralCurtiss dove back in to see exactly what was so slow and found the smoking gun. Super Mario Sunshine's initial boot, one of the worst case scenarios for dcbx, was invoking the page translation code 134 million times. What's even weirder is that most of what the game is invalidating isn't even mapped to physical memory! Doing all of this as individual calls is just grossly inefficient, so something had to be done to batch things to reduce the overhead. So we took a cue from how idleskipping works. Idleskipping is a feature in Dolphin's JIT that looks for loops of code that are designed to just burn off excess CPU time. Instead of emulating these null operations, we can detect them and skip them, while adjusting the downcount and other things to greatly increase CPU performance.
While dcbx isn't a null operation like an idle loop, AdmiralCurtiss made it so that, upon detecting a loop of them, Dolphin statically analyzes the situation to batch them into one bigger dcbx instead of doing hundreds of millions of individual calls. It then adjusts the downcount and other state so that this optimization ends up equivalent to the individual calls in terms of what the emulated environment sees. This trick restores performance to where it was before in these games without breaking anything else.
Both the x86-64 JIT and the AArch64 JIT are outfitted with this new optimization, with the latter being implemented by JosJuice. And now things should be both fast and correct, rather than us constantly bouncing between the two as we have for the past couple of months.
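The batching idea can be sketched with a toy model that counts how often the page translation code runs. This is only an illustration under stated assumptions, not Dolphin's JIT: `ToyCPU` and its methods are hypothetical, translation is an identity mapping, the 4096-byte page size and the per-iteration cycle cost are made up, and translating once per page in the batched path is an assumption about where the savings come from. The 32-byte cache line size matches the GameCube and Wii CPUs.

```python
CACHE_LINE = 32           # data cache line size in bytes
PAGE_SIZE = 4096          # page granularity assumed for this toy model
CYCLES_PER_ITERATION = 4  # made-up cost of one loop iteration


class ToyCPU:
    def __init__(self, downcount):
        self.downcount = downcount
        self.translate_calls = 0
        self.invalidated = set()

    def translate(self, effective_address):
        """Stand-in for page translation; identity mapping, but counted."""
        self.translate_calls += 1
        return effective_address

    def run_loop_naive(self, start, iterations):
        """One translation and one invalidation per loop iteration."""
        for i in range(iterations):
            address = self.translate(start + i * CACHE_LINE)
            self.invalidated.add(address & ~(CACHE_LINE - 1))
            self.downcount -= CYCLES_PER_ITERATION

    def run_loop_batched(self, start, iterations):
        """Treat the whole loop as one range, translating once per page."""
        first = start & ~(CACHE_LINE - 1)
        end = first + iterations * CACHE_LINE
        page = first & ~(PAGE_SIZE - 1)
        while page < end:
            base = self.translate(page)
            low = max(first, page)
            high = min(end, page + PAGE_SIZE)
            for line in range(low, high, CACHE_LINE):
                self.invalidated.add(base + (line - page))
            page += PAGE_SIZE
        # Charge the same cycles the individual iterations would have cost,
        # so the emulated program cannot tell the difference.
        self.downcount -= iterations * CYCLES_PER_ITERATION
```

Running both methods over the same 1024-line range from the same starting downcount leaves the invalidated set and the downcount identical, while the batched path makes far fewer translation calls.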