Optimizations seem to beget even more optimizations. It was big news when last month we got a nifty 26% boost in CPU performance, but somehow, two dedicated devs managed to top it this month. Not to be upstaged by Fiora, comex has dropped new features and two absolutely gigantic performance commits. By making tricky use of registers and native RET behavior, two of his merges alone result in a massive 16% performance boost to games.
Not to be outdone, Fiora has continued her rash of optimizations as well. If we were to include every single one, this progress report might never end. So instead, she crunched some numbers with all the optimizations over the last two months put together.
- Dolphin POV-Ray Benchmark: 62% faster
- Sonic Colors: 39% faster
- Star Wars Rogue Squadron II: Rogue Leader: 103% faster
- F-Zero GX: 110% faster
- The Last Story: 38% faster
- Xenoblade Chronicles: 40% faster
Let's just admire that list for a moment. The Last Story is considered the most demanding game on Dolphin, requiring massive overclocks on even the strongest of machines. A 38% speedup is the difference between it being playable and choppy for users with powerful computers.
Star Wars Rogue Squadron II: Rogue Leader has a lot of problems, but MMU performance will be the least of them from now on. Fiora's optimization of how the JIT handles MMU games brings us huge speedups to every MMU title!
Of course, speed isn't everything for an emulator: performance is pointless if the emulator does the rest of its job in a lackluster manner. Have no fear, we have new features and some critical bug fixes to go along with Dolphin's newfound speed!
All of the latest features mentioned this month can be found in the latest development builds available here.
Audio emulation is beyond tricky. Even with New-AX-HLE and even DSP LLE, there are still a lot of bugs to be found. This time, skidau caught a bug in audio loop point emulation causing a multitude of audio defects.
At 20 seconds, notice the music slowly become more and more garbled.
skidau's fixed loop-points eliminate this issue in both HLE and LLE audio.
This is actually a longtime issue in the emulator that developers have worked on before. Unfortunately, fixing one game broke another, and so it went on until just a few games were broken and it was deemed "good enough". But through hardware testing and experimentation, skidau tracked down the true source of these problems.
The key behavior that everyone missed is that the size of the audio data actually affects the loop point! Without that vital adjustment, it was impossible to get audio working in all games in Dolphin.
This issue manifested itself in a few different ways:
- FMVs can become desynced and have garbled audio - Mega Man X Collection and Pac-man World 2.
- Instruments can sound detuned - Skies of Arcadia Legends, Tales of Symphonia, and Pokemon Colosseum.
- Audio can completely desynchronize - Taiko no Tatsujin series, Rhythm Heaven Fever series.
Thanks to the dedication of longtime Dolphin developer skidau, we can finally wave goodbye to this annoying audio issue.
Credit for this fix belongs squarely with our users. Logging shouldn't slow things down, so when the problem was first reported, we mostly looked the other way, figuring it was someone's computer acting up. But the users who reported it kept digging, and eventually found that one of our logging options was causing slowdowns in audio streaming titles.
With this information, we had to check it out and indeed confirmed that the FileMonitor logging option was touching something that could cause a serious performance hit in certain games. The most noticeable game affected was 1080 Avalanche, which slowed down to about 40fps regardless of the computer running it.
Based on this, the decision was made to disable logging by default in development builds. Anyone can still activate the logs if they wish by clicking on the view button on Dolphin's menu bar, clicking the Log Configuration button, and activating all logs. Please remember to use and check logs for suspicious problems when reporting issues to our bug tracker.
Fixes issue 7616
OpenAL is notable for Dolphin because it is the only audio backend that currently supports time-stretching audio during slowdown. However, OpenAL has high latency, keeping most users away from it.
By updating OpenAL-Soft, skidau has reduced the latency in the OpenAL backend considerably, putting the backend much closer to the other options in latency. Combined with 4.0-2948, audio playback in this backend should be better than ever. Anyone suffering from choppy audio during slowdown should definitely consider OpenAL.
Everyone skips those pesky intros in Mario Kart Wii. They just take time, you've seen them a million times already, just skip skip skip. Unfortunately, with DSP HLE there was a crash bug during the intro to levels in battle mode. Since everyone skipped the intros it went unnoticed for thousands of builds!
Having done other looping work already this month, skidau realized he could move the loop check back to a post-loop condition, fixing this bug and making the behavior accurate, akin to DSP LLE.
Fixes issue 7627
In x86, you can access data at certain memory offsets (from -128 to 127) from a register with a one byte offset. Larger offsets require a 4-byte offset, which takes more space in the code.
In this commit, comex sacrificed a free x86 register to point right to the middle of the PPCstate structure. This means that a whole bunch of the common parts, like general purpose PPC registers, program counters and condition registers are now in that -128 to 127 range! The smaller codesize ends up greatly outweighing the loss of the one register.
The tweak results in a ~8% performance boost in games due to a reduction in code size.
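To make the size argument concrete, here is a minimal sketch that counts displacement bytes for a made-up state layout. The layout (64 packed four-byte fields), the function names, and the choice to ignore x86's zero-displacement short form are all assumptions for illustration; this is not Dolphin's actual PPCstate structure or emitter.

```python
def displacement_bytes(offset):
    """Bytes needed to encode a [register + offset] displacement on x86.

    Offsets in -128..127 fit in one signed byte; anything else needs four.
    (The special no-displacement form for offset 0 is ignored in this model.)
    """
    return 1 if -128 <= offset <= 127 else 4


def total_displacement_bytes(field_offsets, base):
    """Sum the displacement bytes needed to reach every field from `base`."""
    return sum(displacement_bytes(offset - base) for offset in field_offsets)


# Hypothetical layout: 64 four-byte fields packed at offsets 0, 4, ..., 252.
FIELD_OFFSETS = [4 * i for i in range(64)]

# Register pointing at the start of the structure: only the first 32 fields
# are within one-byte range.
at_start = total_displacement_bytes(FIELD_OFFSETS, base=0)

# Register pointing 128 bytes in: every field is within -128..127.
at_middle = total_displacement_bytes(FIELD_OFFSETS, base=128)
```

In this toy layout, pointing the register at the middle lets every field use the short encoding, which is the effect the commit relies on for the most commonly accessed parts of the state.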
Sometimes cool ideas pop up out of the blue and end up working really, really well. Wiimote Audio is one of those things that work much better in theory than in practice, even on the actual Wii. On Dolphin, though, Wiimote Audio is a mess thanks to the limitations we have with Bluetooth. For years, the idea has been tossed around to use Emulated Wiimotes to send that data to the speakers. This never really went anywhere because Dolphin lacked a dedicated audio mixer... until recently.
The Synchronous DTK Audio merger brought with it a proper audio mixer, and here skidau smartly uses that to mix Emulated Wiimote speaker data over onto system speakers. Without the limitations of Bluetooth, and with proper speakers, some of the sound effects actually sound quite nice!
Due to the nature of Real Wiimotes, it is not possible to redirect Wiimote audio from Real Wiimotes to the system speakers. This change only affects Emulated Wiimotes for the foreseeable future.
Considering there are zero commercial 64-bit Android ARM devices available, this commit may seem a bit ahead of schedule. But so was putting Dolphin on phones in the first place! Sonicadvance1 again shows himself as a forward thinker, beginning his implementation of another JIT that will bring unprecedented mobile speed to AArch64 processors, including the Tegra K1 Denver.
Thanks to the work of all Dolphin Developers to make Dolphin processor independent, adding support for new architectures in the future isn't such a pain. To put it simply, Sonicadvance1 ran into and corrected most of those problems when adding support for Android the first time around.
"MMU - Required ON" is something no one wants to see on the wiki page of their favorite game. Titles that require Dolphin's full Memory Mapping Unit (MMU) option have suffered slowdowns from the moment we began supporting them. Why?
These games make use of features that are very, very hard to emulate: two of them being exception handling and virtual memory. At best, an MMU game will play correctly and not crash at whatever speed your computer can handle. At worst? They won't run at all.
While there's much to be improved in MMU handling besides speed, Fiora decided to investigate whether there were any ways to improve performance for these games. It began in typical Fiora fashion: she added JIT support for a few instructions that previously fell back to the interpreter and made a few instructions more accurate.
While profiling, Fiora came to an epiphany: Much of the slowness that MMU games suffer from doesn't come from the MMU handling itself.
MMU emulation forces the JIT to generate massive amounts of extra code. The quantity of this code was so enormous that it overflowed CPU caches easily, forcing the CPU to spend vast amounts of time sitting around doing nothing, waiting for code to arrive from memory.
Fiora decided to exile much of this code, such as exception handling, to a "FarCode Cache", sitting nice and far away from the code being run so that the CPU didn't have to fetch it unless absolutely necessary. But Fiora learned the hard way that MMU games do not like being touched.
This isn't the worst thing we've done to a superhero. While Spiderman is a flying box, we once turned Batman into a flying crotch.
Every single commit in her branch had to be strenuously tested and checked. These fickle pieces of programming would not let even the tiniest of errors get past them. Many bugs had to be tracked down and the details of the FarCode Cache tuned again and again. The end of the road seemed to slip around another corner every time she thought she had reached it. What were her struggles worth?
Star Wars Rogue Squadron II: Rogue Leader is up to twice as fast as before, and every other MMU game has gotten a boost of at least 30%, and usually more! Even non-MMU titles seem to benefit from the FarCode Cache, often gaining around 6% performance. Popular games like Spider-man 2 are now playable at full speed on powerful computers.
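Why moving rarely-run code away helps can be shown with a toy layout model. The block sizes, the 64-byte cache line, and the function names below are assumptions chosen for illustration; the sketch only counts how many cache lines the frequently-run ("hot") code occupies and does not reflect how Dolphin's JIT actually emits code.

```python
CACHE_LINE = 64  # bytes per instruction-cache line (a common size, assumed here)


def lines_touched(start, length):
    """Return the cache-line indices covered by [start, start + length)."""
    return set(range(start // CACHE_LINE, (start + length - 1) // CACHE_LINE + 1))


def hot_lines(blocks, far_code):
    """Count cache lines occupied by hot code.

    blocks:   list of (hot_bytes, cold_bytes) per JIT block, hot_bytes > 0.
              These sizes are hypothetical.
    far_code: if True, cold bytes (e.g. exception handling) are emitted to a
              separate region, so hot code from consecutive blocks is packed
              back to back.
    """
    touched = set()
    address = 0
    for hot_bytes, cold_bytes in blocks:
        touched |= lines_touched(address, hot_bytes)
        # Inline layout: the cold code sits between this block's hot code
        # and the next block's hot code, pushing them apart.
        address += hot_bytes if far_code else hot_bytes + cold_bytes
    return len(touched)


# Eight hypothetical blocks, each with 32 hot bytes and 96 cold bytes.
BLOCKS = [(32, 96)] * 8

inline_lines = hot_lines(BLOCKS, far_code=False)
far_lines = hot_lines(BLOCKS, far_code=True)
```

With these made-up numbers the hot path spans half as many cache lines once the cold code is moved out, which is the kind of footprint reduction the FarCode Cache is after.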
In this edition of yet another insane optimization idea that worked really well, comex brings to us this little gem. Like fastmem, his patch re-uses a feature of x86 CPUs to help emulate a similar feature from the PowerPC CPU on the GameCube and Wii. Most modern CPUs track calls to functions so that they can guess where to go when the function returns; this feature is called the return prediction stack. Without this, returning from a function can be costly performance-wise -- like it was in Dolphin's emulation.
In comex's patch, when the emulated PowerPC performs a function call, Dolphin pushes the return address onto the x86 stack. Then, when it returns later, Dolphin compares the real return address against the one on the stack, and if they match it returns. Since calls and returns almost always line up perfectly in real programs, the fast path is almost always taken. From the perspective of the x86 CPU, we're just doing normal function calls and returns -- so it uses its return prediction stack to our advantage! Overall, this gives an amazing 8% performance improvement in most games. It may not seem like that much, but optimizations like these quickly add up.
Using the strenuous Fountain of Dreams stage from Super Smash Bros. Melee, we can see how the JIT optimizations affect a mainstay game since Dolphin 4.0.
Extra Credit: Try to figure out when Fiora and comex started optimizing things.
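The call/return matching that comex's patch relies on can be modelled in a few lines. This is only a behavioural sketch: the class and method names are hypothetical, and clearing the stack on a mismatch is an assumption about the slow path rather than a description of what Dolphin's JIT does.

```python
class ReturnStackModel:
    """Toy model of pairing emulated calls with emulated returns."""

    def __init__(self):
        self.stack = []         # predicted return addresses, newest last
        self.fast_returns = 0   # returns that matched the prediction
        self.slow_returns = 0   # returns that had to take the slow path

    def call(self, return_address):
        """Emulated function call: remember where we expect to come back to."""
        self.stack.append(return_address)

    def ret(self, actual_target):
        """Emulated return: take the fast path only if the prediction matches."""
        if self.stack and self.stack[-1] == actual_target:
            self.stack.pop()
            self.fast_returns += 1
            return "fast"
        # Mismatch (or empty stack): assumed here to discard all predictions
        # and fall back to a slower lookup of the target.
        self.stack.clear()
        self.slow_returns += 1
        return "slow"


model = ReturnStackModel()
model.call(0x80003100)              # outer call
model.call(0x80004200)              # nested call
first = model.ret(0x80004200)       # matches the newest prediction
second = model.ret(0x80003100)      # matches the outer prediction
model.call(0x80005300)
third = model.ret(0x80009999)       # program returned somewhere unexpected
```

Well-behaved programs stay on the fast path, which is what lets the host CPU's own return predictor do useful work.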
Real External FrameBuffer (RealXFB) is an accuracy option that is used in situations where we need the absolute best XFB emulation. The vast majority of games don't even use the XFB, and can be emulated perfectly without it, but many niche titles, homebrew software, and Virtual Console games rely on it.
Firstly, RealXFB Scaling isn't support for RealXFB at Higher Internal Resolutions, but rather is for games that scale the XFB in odd ways before rendering it to the screen. A lot of titles that run at lower resolutions use this feature to make the game fill the whole screen. One of the best examples of this can be found in The Legend of Zelda: Collector's Edition when not using progressive scan.
This is a great feature that helps games and homebrew that require RealXFB to be played properly. But, this feature isn't done yet! The incomplete branch was originally made months ago by magumagu, but was only recently cleaned up by Sonicadvance1. What is incomplete? The branch was never updated to support D3D! The devs came to the conclusion that it would be better to have the games working properly in one backend rather than none and merged it in despite the missing D3D support. Considering that the proper implementation is there and just not implemented in D3D, this may be a fairly good task for newcomers looking to help improve the emulator.
This is one of those fun commits where one change affects multiple games in a multitude of ways. It's a stability improvement, fixing random crashes in games like Mario Superstar Baseball (though with random crashes, it's hard to verify when they're truly gone!).
The second benefit is that two games get properly emulated audio for the first time! While they did work before at some point in Dolphin history, this was due to hacks that broke other games. As such, Burnout 2: Point of Impact and the obviously hugely popular Piglet's Big Game were the odd couple out and lost their music to accuracy fixes before 4.0. This makes everyone happy and fixes everything. If you were suffering from random crashes in a game, try a build newer than this one; you may be surprised!
Ever since comex brought to us a prototype Dualcore Netplay branch late last year, there's been a lot of clamoring about porting the feature into Dolphin proper. Netplay, for those uninitiated, is Dolphin's way of turning single console multiplayer experiences into online experiences. Unfortunately, this requires Dolphin to be perfectly deterministic and all players to be in perfect sync. Originally, netplay required Single Core, LLE audio (not even on thread!) or no audio at all, along with a plethora of other settings that could muck things up.
But as Dolphin has improved in other areas, these improvements have benefited netplay as well, and the bizarre requirements have been dropping one by one. And now, thanks to Dualcore Determinism, users can use the Dualcore setting on netplay successfully in most games. This means that popular titles like Super Smash Bros. Melee, Kirby Air Ride, and the Mario Party series can be played online with dualcore!
Dualcore Determinism in Master outperforms both single core and the old DC-Netplay Branch, while being more compatible than the latter.
There is a catch, however. As can be seen on the graph, Deterministic Dualcore is slightly more taxing than standard Dualcore. But it's still much faster than single core and even faster than the old Dualcore Netplay build while being much more compatible than the old system. DC-netplay 652 only worked in roughly 25% of games, while dualcore determinism will work in a large majority of titles.
- EFB2Ram Reads - If you need EFB2Ram for something like shadows, that doesn't seem to cause problems, but if the game depends on EFB2Ram reads for anything, the game is not guaranteed to sync.
- EFB Access to CPU - Same as above, if it's for graphical effects like blurring, it hasn't caused problems, but things where game logic depends on the feature may cause desyncs.
- F-Zero GX, Super Monkey Ball 1/2, and potentially others. The change in how Dualcore works causes them to fail.
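Why game logic that depends on timing-sensitive results can desync is easiest to see in a toy lockstep model. Everything below is hypothetical: the state update, the `timing_noise` callback standing in for a host-dependent value leaking into game state, and the function names are illustrations, not Dolphin's netplay code.

```python
def step(state, pad_inputs):
    """Advance one frame using only the previous state and this frame's inputs."""
    return (state * 1103515245 + sum(pad_inputs) + 12345) % 2**32


def simulate(frames, timing_noise=None):
    """Return the state after every frame.

    timing_noise (hypothetical) maps a frame index to a value that depends on
    host timing; when given, it is mixed into the state like a read-back
    whose result differs between machines.
    """
    states = []
    state = 0
    for index, pad_inputs in enumerate(frames):
        state = step(state, pad_inputs)
        if timing_noise is not None:
            state ^= timing_noise(index)
        states.append(state)
    return states


def first_desync(frames, noise_a=None, noise_b=None):
    """Return the first frame index where two peers disagree, or None."""
    peer_a = simulate(frames, noise_a)
    peer_b = simulate(frames, noise_b)
    for index, (a, b) in enumerate(zip(peer_a, peer_b)):
        if a != b:
            return index
    return None


# Both peers receive identical controller inputs for three frames.
FRAMES = [(1, 0), (0, 1), (1, 1)]

in_sync = first_desync(FRAMES)
# Peer B sees a different timing-dependent value on frame 1 only.
desynced_at = first_desync(FRAMES, noise_b=lambda i: 1 if i == 1 else 0)
```

As long as each frame depends only on shared inputs, the peers agree forever; a single host-dependent value reaching game logic is enough to split them from that frame onward.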
Because dualcore determinism is much like Single Core, some games that suffer from crashes in dualcore will actually work properly without the risk of crashing or bugs in this feature. Games like Metroid Prime 2/3/Tri (black bar glitch) and Pokemon XD (Crashing) are noticeably more stable without needing to use single core mode.
Deterministic Dualcore is now activated by default whenever Netplay is used. In order to use it outside of netplay, you must manually add "GPUDeterminismMode fake-completion" to a game's ini file.
Fixes issue 6704
Calling All Qt Developers!
A few apt users and developers may have already noticed, but Dolphin is beginning the transition to supporting the Qt UI Framework. With this, Dolphin's GUI could be greatly modernized with tons of new features. Plus, thanks to the superior framework, adjustments could be made to the GUI without things exploding.
Unfortunately, the devs have limited experience with Qt. That's why we would like some developers experienced with Qt to join the team and help. If you're interested in joining the team, please join our IRC Channel #dolphin-dev on Freenode.
JITIL's Near Death Experience
For a while this month, it seemed like another feature was coming to its end. JIT optimizations are great for the emulator, but a lot of those changes also affect JITIL in adverse ways. Managing two JITs is harder than one, and almost no one even uses the experimental JITIL recompiler. As such... it's been broken on and off (mostly on) for the past few months as all of the JIT changes rolled in. Frustration boiled over, and the devs were prepared to finally cut JITIL from Dolphin after a quick deliberation.
JITIL has been one of those features that could be awesome, but hasn't quite panned out, at least not yet. It was meant to be a faster, more optimized JIT, but Dolphin didn't get to the point where an intermediate language (the IL) was necessary for optimizations. It was mostly useful because it was so different that it could often uncover flaws in both the JIT and even the interpreter! It stayed around not because it was fast, but because it was handy for working around bugs.
With Fiora and magumagu fixing so many of the JIT and Interpreter problems, there didn't seem to be much more use for JITIL, and it wasn't running anyway... so it seemed that its time was finally up.
Enter Phire. Despite a busy schedule, he decided he couldn't just let JITIL die like this. He worked day and night to get JITIL back into running order, even if it just meant for some games. After his initial merge, he continued to push, fixing Wii titles, Idle Skipping, and even restoring support for Fastmem!
After all of that, JITIL was alive again, and nearly 50% faster. Yet somehow, within a week it was broken again by comex's BLR merge (it broke a lot of things; there are no fewer than three maintenance commits for it!). The reason for this? JITIL hadn't supported fastmem for over a year, so comex didn't notice that he needed to adjust the BLR commit (which adjusts fastmem) to take JITIL into account. A quick fix, and finally, JITIL is back in working order. Until someone inevitably breaks it again.
Last Month's Contributors...
Special thanks to all of the contributors that incremented Dolphin from 4.0-2826 through to 4.0-3469!