This month, the story can't be anything else but CPU optimizations and fixes, after Fiora decided that if the code is in the JIT, she will make it faster. Nothing is safe from her. Since the end of July, Dolphin's JIT CPU core has seen a 26% performance boost in the Dolphin Benchmark. That is not a typo.
On the accuracy front, we've got some nifty changes that fix bugs going back to the beginning of time for Dolphin. Some ancient audio bugs bite the dust, some floating-point accuracy improvements are ported into the JIT from the SoftwareFP branch, and we found out that some games are doing things they really shouldn't be doing. If you see a change that affects a game you're playing, remember that all of these changes can be found in the latest development builds!
Audio tends to be a secondary mission for young emulators. Most games are still playable without audio output, whereas almost any game is unplayable without video output. As such, early development tends to get audio barely working and then leave it alone while more important tasks are handled.
Dolphin is not really a young emulator, and as it continues to mature, each and every defect becomes more and more noticeable. While booto isn't known as an audio guy, he took up the fight to fix regressions uncovered by the DTK Audio merge last month, and now he's fighting to fix up other bugs in the Direct Memory Access (DMA) shared between the DSP and CPU.
Thanks to accurate emulation of those features, Pokemon Snap (VC) has audio for the first time in Dolphin's history, and Harvest Moon: Magical Melody's audio finally plays at the right rate, without needing any kind of specialized settings.
The result of booto's work is obvious: an unprecedented number of audio errors, small and big, spanning both HLE and LLE audio have been fixed over the past two months. It wasn't easy: it led to many (short-lived) regressions and required much hardware testing and several merges. In the end, though, the increase in accuracy speaks for itself.
Undefined behaviors are always a scary thing to emulate. In this instance, no one even knew that games could change the reserved bits of the Graphics Quantization Registers (GQRs), let alone use them for some kind of functionality. Even if a game could change them, it could be assumed that professionally developed games wouldn't rely on undefined behaviors in the GameCube and Wii, right?
Not one, not two, but at least a half-dozen games decided that these registers were too tempting to ignore!
In Dirt 2 and Are You Smarter Than a 5th Grader?, this resulted in certain textures appearing extremely bright. In Turok: Evolution, Vexx, XIII and possibly others, it resulted in crashes. We previously thought these games required full MMU emulation, which works around their issues, but now they can run without any kind of special settings, resulting in great performance gains in those games.
Most interesting of all, Cel Damage somehow used these bits in such a way that not emulating them caused severe collision detection issues! These are just the few issues we've been able to find in such a short time; it is very likely that other odd behaviors in games will be affected as well.
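To make the idea of reserved bits concrete, here is a minimal sketch of one way an emulator could model such a register: keep every bit the game wrote, and mask only when decoding the documented fields. This is not Dolphin's actual fix. The `Gqr` type and its method names are hypothetical, and the field positions are an assumption based on the commonly documented Gekko layout.

```cpp
#include <cstdint>

// Hypothetical model of a single quantization register.
// ASSUMPTION: type fields sit in bits 0-2 and 16-18, scale fields in
// bits 8-13 and 24-29; every other bit is treated as "reserved".
struct Gqr {
    uint32_t raw = 0;  // the full value, reserved bits included

    // Writes are stored unmasked, so a game that parks data in the
    // reserved bits reads back exactly what it stored.
    void Write(uint32_t value) { raw = value; }
    uint32_t Read() const { return raw; }

    // Decoding masks out only the defined fields.
    uint32_t StoreType() const { return raw & 0x7; }
    uint32_t StoreScale() const { return (raw >> 8) & 0x3F; }
    uint32_t LoadType() const { return (raw >> 16) & 0x7; }
    uint32_t LoadScale() const { return (raw >> 24) & 0x3F; }
};
```

The design point is simply that masking happens on decode rather than on write, so nothing the game stored is silently discarded.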
This regression managed to sneak through as we began the long overhaul of Dolphin's aged vertex loader. Without this caching, the emulator suffered a 20% performance regression in vertex-heavy games such as Super Mario Galaxy, The Last Story and more. A simple reimplementation of the feature brings performance back to where it was without any issues.
Long, long ago, OpenGL was considered purely an accuracy backend with subpar performance. That all changed with the GLSL rewrite, when degasus modernized the OpenGL backend with tons of new features and performance boosts without sacrificing its higher accuracy. This work is also why we are able to support Android devices with our OpenGL backend.
Fast forward to now and both D3D and OpenGL are faster than ever, and, thanks to rewrites to VideoCommon, they also share a lot of code! Why is this important? Because some of the code that OpenGL copied from D3D was wrong! In this case, both backends now had broken depth matrix shaders! D3D9 was unaffected only because it was such an oddball and most of its code was segmented off from VideoCommon. Unfortunately, soon after the OpenGL rewrite, D3D9 was dropped for accuracy reasons, leaving many popular games with bugs that had no workarounds.
Enter KScorp, a newcomer to the notable changes, who has fixed the problem in both OpenGL and D3D in one fell swoop. Considering that depth is fairly important, this fixes quite a few prominent issues in popular games, such as The Legend of Zelda: Skyward Sword, Luigi's Mansion, and Metroid Prime.
A lot of users may have noticed that when using Memory Cards (not GCI folders) in GameCube games, Dolphin would freeze on close. This may be a relatively minor commit, but a lot of people and games were afflicted by the crash, and it was important to fix it as soon as possible.
When rewriting the PPC_Analyzer, Sonicadvance1 did a great deal of labor-intensive restructuring, requiring a great deal of time and commitment. Once it compiled and everything worked, he clapped his hands together and walked away from a job well done.
Unfortunately, he and everyone else failed to notice that an important piece of the PPC_Analyzer was misplaced and was never getting called at all. Fortuitously, the missing JIT optimization only needed to be moved to the correct place; it was otherwise in working order! This fix will bring a small boost to CPU-limited games, mostly noticeable in Virtual Console titles.
It seems like every month or two we're waving off a longstanding feature. After months of consideration, the DSound audio backend was finally removed in a recent update. The newer XAudio2 audio backend does everything DSound could, but with less latency and better code internally. The only unique benefit DSound provided was native Windows XP support, but that hasn't been a valid point for Dolphin in months. In the end, XAudio2 and OpenAL are just better options and now the time of retirement is finally upon us.
Goodbye DSound, and thanks for the years of service!
Fiora is always looking for creative ways to optimize the JIT. This time, she set her eyes on the PPC_FP. An earlier change there prevented non-floating-point data from being inadvertently corrupted when passed through floating-point registers. Unfortunately, the performance cost was steep in some games, but it had to be done in this case. Fiora cleverly realized that only 0.01% of floats actually suffered from the corruption problem, and the rest could go back to using the fast path without any issues whatsoever. It's a simple solution that makes both performance and accuracy seekers equally happy.
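The fast-path/slow-path split can be illustrated with a minimal sketch. It assumes the corruption comes from the host CPU altering NaN bit patterns when widening a single to a double, which is a common cause but is not spelled out above. The function names are hypothetical, and this is not Dolphin's JIT code.

```cpp
#include <cstdint>
#include <cstring>

// Slow path: widen a 32-bit single to a 64-bit double bit pattern by
// hand, so payload bits (which may really be integer data) survive.
// Only valid when the exponent field is all ones (NaN or infinity).
inline uint64_t WidenNaNOrInfBits(uint32_t s) {
    const uint64_t sign = static_cast<uint64_t>(s >> 31) << 63;
    const uint64_t mantissa = static_cast<uint64_t>(s & 0x007FFFFFu) << 29;
    return sign | (0x7FFull << 52) | mantissa;
}

inline uint64_t SingleBitsToDoubleBits(uint32_t s) {
    const uint32_t exponent = (s >> 23) & 0xFFu;
    if (exponent == 0xFFu) {
        // Rare case: take the careful, bit-exact route.
        return WidenNaNOrInfBits(s);
    }
    // Common case: the hardware conversion is exact for finite values
    // (ASSUMPTION: the host is not flushing denormals to zero).
    float f;
    std::memcpy(&f, &s, sizeof f);
    const double d = static_cast<double>(f);
    uint64_t out;
    std::memcpy(&out, &d, sizeof out);
    return out;
}
```

The check costs one compare and branch per value, and only the rare inputs that could actually be damaged pay for the slower conversion.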
In the June Progress Report, we showed a video demonstrating inaccurate physics emulation in Mario Kart Wii, which results in replay files recorded on the Wii failing to play back correctly. But with magumagu's then-new software floating point implementation, performance could be traded for accurate emulation of floating point instructions to get the replays to sync correctly.
As you'd expect, the software implementation was extremely demanding and would have caused extreme performance regressions in almost every game, even though 99% of them don't even require this level of accuracy. That and the fact that Dolphin had suffered from these problems since the very beginning meant there was no urgency to merge a solution. The work had been done; magumagu figured out how the instructions work and all their odd ways of rounding. It was only a matter of time before someone took up the challenge to implement them in a faster manner.
Enter JIT magician Fiora, who implemented both instructions more accurately in both JIT and Interpreter without the harsh speed penalties of Software FP. The thing that must be remembered is that without magumagu's SoftFP fork, this would not have been possible. Implementing the instructions quickly and accurately in Dolphin is hard enough, but imagine not knowing what "correct" actually means for that instruction!
Games that are confirmed to have perfect physics through replay analysis are Donkey Kong Country Returns, Super Smash Bros. Brawl, Mario Kart: Double Dash!!, Mario Kart Wii and F-Zero GX. We can also confirm that Super Mario Galaxy, Super Mario Galaxy 2 (Collision detection gravity changes), Super Mario Sunshine (Various collision detection) and The Legend of Zelda: The Wind Waker (Bomb Collision detection) have more accurate physics and collision detection, but without console replay support they cannot be confirmed as perfect.
Do note that this is only for the x86-64 JIT. JITIL does not have these changes as it still wouldn't be accurate enough even with them. JITARM32 also won't pass any of these hardware accuracy tests. Considering how weak phones and tablets are compared to PCs and how different the architecture is compared to x86, none of these changes really translate over. And even if they did, our loyal Dolphin mobile users likely would prefer whatever speed they can get over fringe accuracy benefits.
Most floating-point operations on the Wii/GC set result flags (FPRF) based on the result of the operation. As most games don't seem to care about this feature, it was only implemented in the interpreter, and the JIT would fall back to the interpreter for result flags in the few games that required them. But one well-known engine aggressively uses FPRF in a way that breaks if the flags are not emulated: none other than the one behind the Super Monkey Ball series and F-Zero GX.
Falling back to the interpreter in order to support this is incredibly slow.
Adding FPRF to the JIT avoids the interpreter fallback and the speed hit. POVRay, the homebrew commonly used for benchmarking Dolphin, is almost entirely floating point operations, so it shows the benefits very clearly. Of course, no game uses FPRF as much as POVRay, so the speed benefits will not be as extreme in games as they are in POVRay.
There is still a small performance hit, however, and since very few games care about this, FPRF is not going to be enabled by default for now. If you encounter a game with some odd problems that don't occur in the interpreter, try adding "EnableFPRF=true" to the game's INI and see if that fixes the issue. Thanks to this change, you can easily try it for extended periods of time without impacting that game's playability.
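For readers curious what the result flags actually encode, here is a minimal sketch of classifying a double into the five-bit FPRF class code defined by the PowerPC architecture. The function and enumerator names are hypothetical, and it shows only the classification step, not how the JIT emits it.

```cpp
#include <cmath>
#include <cstdint>

// FPRF class codes (the 5-bit result-class field of the FPSCR).
enum Fprf : uint8_t {
    kQuietNaN = 0x11,
    kNegInf = 0x09,
    kNegNormal = 0x08,
    kNegDenormal = 0x18,
    kNegZero = 0x12,
    kPosZero = 0x02,
    kPosDenormal = 0x14,
    kPosNormal = 0x04,
    kPosInf = 0x05,
};

// Classify an arithmetic result the way the flags describe it.
inline uint8_t ClassifyResult(double value) {
    const bool negative = std::signbit(value);
    switch (std::fpclassify(value)) {
    case FP_NAN:
        return kQuietNaN;
    case FP_INFINITE:
        return negative ? kNegInf : kPosInf;
    case FP_ZERO:
        return negative ? kNegZero : kPosZero;
    case FP_SUBNORMAL:
        return negative ? kNegDenormal : kPosDenormal;
    default:  // FP_NORMAL
        return negative ? kNegNormal : kPosNormal;
    }
}
```

Doing this after every floating-point instruction is extra work for each operation, which is consistent with the small performance hit described above.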
Savestates used to take roughly 570ms to make. At over half a second it is definitely noticeable, but not a big deal for most users. comex decided that it wasn't good enough and optimized the PointerWrap function, which is used for writing savestates. The result is that savestates now take a little over one frame (17ms) to save, with absolutely no drawbacks.
Frequent savers and TASers rejoice!
An Absolutely Relentless Unending CPU Assault, 4.0-2442, 4.0-2530, 4.0-2560, 4.0-2564, 4.0-2566, 4.0-2589, 4.0-2618, 4.0-2645, 4.0-2647, 4.0-2649, 4.0-2775, 4.0-2781, 4.0-2783, 4.0-2787 by FioraAeterna
And those listed there are the ones NOT already listed in the notable changes. Fiora is an absolutely relentless coder with one mission in mind: make Dolphin's CPU emulation better. Normally, any kind of CPU accuracy or speed increase would almost automatically make the notable changes list. The problem here comes from the rate at which she works; there is absolutely no way to test, verify, and make the content required to properly demonstrate all of her changes. Even if it were possible, no one wants to read a Progress Report that's 30,000 words long!
Rather than go through all of these commits and try to find games and situations where things are sped up, we decided to roll all these CPU optimizations into one and compare them on Dolphin's CPU Benchmark before and after to see how much of a difference these changes could make.
The Man Behind the Machine
Things have been running relatively smoothly for Dolphin for a while now. The code is improving, new features and fixes are coming in at a steady pace, and regressions are being found and handled as they come to light. There have been some huge changes made by some big names within the project, but one of the biggest names isn't even trying to make big changes. Despite this, he's one of the integral players of the project, and has had the most commits every month that Dolphin has been on GitHub (over 100 this month alone!). We're talking about none other than Lioncash.
Every big open source project needs a Lioncash of its own. He eats, sleeps, and then dreams about maintaining the Dolphin codebase. His numerous contributions span the entire project, making code cleaner and fixing warnings as he goes, usually with no change at all in behavior. When he makes changes and users see no functional difference, that means he did his job perfectly. He is the fifth-most-active coder across the entire 10+ year history of the project despite first committing in early 2013!
Without his merges and changes, everything about coding for Dolphin would be harder: big changes would take longer, and new talent would have a harder time getting acclimated to the codebase, meaning that we wouldn't see as many new developers. He even reviews a lot of the pull requests that other members submit, making sure that the new code being added to Dolphin is up to the heightened standards that are being set.
So, to you, Lioncash, thank you for your diligent behind-the-scenes service. May you someday merge a maintenance update that somehow breaks into the notable changes.
Special Thanks to Last Month's Contributors...
RachelBryk, lioncash, tilka, Sonicadvance1, degasus, Armada651, delroth, sigmabeta, neobrain, LPFaint99, magcius, booto, shuffle2, magumagu, Fiora, Parlane, jimbo1qaz, RolandMunsil, archshift, Rydroid, kamiyo, ChuckRozhon, KScorp, Linktothepast and comex for incrementing Dolphin 4.0-2356 through to 4.0-2824!