In programming, users usually don't see or care much about what's going on inside. All those boring code optimizations may make things easier for the developers and slowly improve the emulator, but hard-to-quantify changes are not exactly exciting. This month was full of those, with several hundred changes yet very little the general user would find interesting. Nevertheless, in the sea of code improvement, there are some real treasures: big performance improvements, some ancient bugs squashed, regression fixes, and some exciting new features to boot.
This one is a little bit awkward. Last month, a fix was merged to address a 30% performance regression on certain older NVIDIA GPUs due to coherent mapping being enabled needlessly. Turning it off caused no performance hit on newer NVIDIA video cards and sped up the older ones considerably. This seemed like a win-win situation, and everyone shook hands over a job well done.
Unfortunately, no one bothered to test AMD GPUs, because there is really no logical reason why coherent mapping would make them go faster, right? It turns out that disabling coherent mapping caused a 30% performance regression on AMD video cards. Not wanting to add GPU-specific code, Sonicadvance1 and degasus took some time to figure out a solution that would make all video cards happy.
In the end, they determined that AMD's Pinned Memory extension for buffer streaming did not suffer the slowdown, and was even faster than Buffer Storage for AMD graphics cards. This simple change results in up to an 8% speed increase on AMD graphics cards compared to before the performance regression! Because NVIDIA cards do not even have the pinned memory extension, there is no plausible way this could make them slower again.
While the DTK Audio Rewrite fixed quite a few issues, it also brought out a rather annoying one. After the merge, users began noticing very subtle crackling that seemed to affect every game, especially in high-frequency sounds. The rewrite allowed multiple mixers to be enabled at once, rather than just one as before.
It turns out a missed static variable caused resampling issues between the DSP mixer and DVD mixer, resulting in the static that was plaguing games. A quick cleanup fixed the issue and restored crisp audio without losing the benefits of the rewrite.
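To illustrate the general pattern (not Dolphin's actual mixer code), here is a minimal sketch of how a leftover static can make two resamplers interfere. The report does not say which variable was involved; the sketch assumes it was a fractional resampling position, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is NOT Dolphin's mixer code.
// Positions use 16.16 fixed point: 0x10000 means one whole input sample.

struct SharedStateMixer
{
    std::uint32_t step;  // input samples consumed per output sample
    explicit SharedStateMixer(std::uint32_t s) : step(s) {}

    // Bug pattern: the fractional position is a function-local static,
    // so every mixer instance reads and writes the same counter.
    std::uint32_t advance()
    {
        static std::uint32_t frac = 0;
        frac += step;
        const std::uint32_t whole = frac >> 16;
        frac &= 0xFFFF;
        return whole;
    }
};

struct PerInstanceMixer
{
    std::uint32_t step;
    std::uint32_t frac = 0;  // each mixer owns its own position
    explicit PerInstanceMixer(std::uint32_t s) : step(s) {}

    std::uint32_t advance()
    {
        frac += step;
        const std::uint32_t whole = frac >> 16;
        frac &= 0xFFFF;
        return whole;
    }
};

// Runs two mixers in alternation and records how many input samples
// mixer `a` consumed on each of its output samples.
template <typename Mixer>
std::vector<std::uint32_t> interleave(Mixer a, Mixer b, int rounds)
{
    std::vector<std::uint32_t> a_steps;
    for (int i = 0; i < rounds; ++i)
    {
        a_steps.push_back(a.advance());
        b.advance();
    }
    return a_steps;
}
```

With per-instance state, the first mixer advances through its input at a steady rhythm no matter what the second one does. With the shared static, the second mixer's steps leak into the first mixer's position, so the rhythm becomes irregular, which is the kind of timing jitter that can be heard as crackling.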
Fixed issue 7463
Memory Cards are often taken for granted in emulators. Saving just works, and the cards rarely fill up. Even if they do, you don't need to go and buy a new one; you can just make a new file. Unfortunately, in Dolphin, there was one known game that crashed when attempting to save. That game was none other than The Legend of Zelda: The Wind Waker! However, it only affected the Japanese version of the title. Despite the issue living in a prominent game, it went unfixed for over four years with no solution in sight.
Enter newcomer TotalNerd, who had been maintaining his own repository for some time. One of the odd features that he had implemented in his repository was a cosmetic change to force memory cards into taking longer when saving. Through a bit of serendipity, it was discovered that the Wind Waker soft-lock no longer happened in his fork!
That would be all the motivation TotalNerd needed to make a more accurate implementation of memory card writing and reading. His cosmetic change turned into a huge leap in GameCube Memory Card write accuracy that fixes saving inaccuracies in several games!
Fun fact: A lot of these write issues in games can be reproduced on console by using a Max Drive (and Pro) memory card. This third-party memory card is so fast that games will sometimes outright hang because they saved faster than the developers expected possible. Will Dolphin someday emulate the ability to use a brokenly fast memory card? Probably. If you're interested in some performance numbers on various memory cards, we compiled a list here. The test has some overhead and is not guaranteed to reflect the actual data written to the memory card, just the transfer rate.
Fixed issue 2284
Some enhancements to the emulator aren't quite as clear as others. For instance, anti-aliasing immediately removes jagged lines and will clearly show up in screenshots. However, the advantages of Exclusive Fullscreen are much more subtle; it yields absolutely no difference when comparing screenshots, but this is very much an enhancement along those lines. It brings several benefits that will make playing games much more enjoyable.
- Decreased input latency: By having much more direct control over output, users will get visual feedback faster than before. In fighting games like Super Smash Bros. Melee and racing games like F-Zero GX, the difference is immediately noticeable.
- Less GPU Overhead: Being able to simply display the frame on the monitor is more efficient, and will result in a small performance increase when limited by GPU power.
- Smoother Visuals: Everyone hates when a game is running full speed but still doesn't look quite smooth. Exclusive fullscreen allows Dolphin to bypass the window manager, meaning that frames are much less likely to be dropped when running full speed. Do note that this places full control of vertical sync (vsync) onto Dolphin, allowing vsync to work better when enabled.
For an example of how Dolphin looks smoother, we can take a look at the seminal classic Mega Man X, as featured in Mega Man X Collection. The SNES game generates a transparent invulnerability effect by turning the sprite off and on every other frame, so unless every frame is rendered perfectly, the effect is lost. You can see the difference in the animation below, which has been slowed down to better display the issue.
These subtle improvements may not seem like much alone, but when combined they produce a huge difference in quality for users. Almost every game can benefit from smoother visuals, from racing games to adventure epics. With full control of the screen, much like the consoles, Dolphin takes another step toward accurately rendering games.
Not everything is about accuracy, though. Emulators tend to have some pretty wild enhancements, and Exclusive Fullscreen brings back a big one to Dolphin...
- 3D Monitor Support: Longtime users may remember that D3D9 featured a limited form of 3D Vision support. It worked on some computers, didn't on others, and completely broke things for the rest. One thing that had to be done for most 3D solutions was to give Dolphin exclusive fullscreen. Armada651's exclusive fullscreen support is the 3D Vision option done without a hack. As such, 3D Monitor support is back and better than ever. And it isn't limited to just 3D Vision anymore; other 3D systems like TriDef are confirmed to be working.
One final thing to note is that Exclusive Fullscreen is a Windows concept and is only natively supported in the D3D backend. Research is also being done to see if other platforms have similar features that Dolphin can benefit from. Even with just Exclusive Fullscreen in D3D, half a dozen issues have cropped up since the merge and been fixed!
The DTK audio rewrite fixed up a lot of problems in how Dolphin handled streaming audio. Unfortunately, the old code was designed to work, and once some of its hacks and assumptions were replaced with more accurate code, music that changed tracks and looped in Tony Hawk's Pro Skater 4, Crazy Taxi and more was completely toast.
For booto this just wasn't good enough, so he delved into the games themselves and studied what they were expecting in order to figure out what Dolphin was doing wrong. After much struggling, not only were the regressions fixed, but games that never had working streaming audio, like Pac-Man Fever, were working like a charm!
Fixed issue 7445
And then, on the very eve of the Progress Report, two huge merges hit, changing everything.
PPC_FP was a bevy of fixes for CPU emulation that improved compatibility with a ton of games on the JIT. Unfortunately, the non-SSE4.1 codepath used by early Core 2 Duos and Phenom II CPUs (along with older processors) was bugged by an oversight. The Invalid bit on the x87 floating point unit is sticky, so once a NaN value goes through the code for CPUs without SSE4.1, all future floats are mutilated. It's not exactly a pretty sight.
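To show what "sticky" means here, the following is a minimal sketch using the standard C++ floating-point environment. It is not Dolphin's codepath and does not touch the x87 unit directly; it only demonstrates the general behavior of IEEE exception flags, which stay raised until something explicitly clears them. The function name is hypothetical.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical demo, not Dolphin code. Returns true when the "invalid
// operation" flag behaves as a sticky flag: raised by a NaN-producing
// operation, still raised after later valid math, gone only after an
// explicit clear.
inline bool sticky_invalid_demo()
{
    std::feclearexcept(FE_ALL_EXCEPT);

    // volatile keeps the compiler from folding the math at compile time.
    volatile double zero = 0.0;
    volatile double nan_result = zero / zero;  // 0/0 raises FE_INVALID
    (void)nan_result;
    const bool raised_by_nan = std::fetestexcept(FE_INVALID) != 0;

    volatile double one = 1.0;
    volatile double two = 2.0;
    volatile double sum = one + two;  // perfectly valid operation
    (void)sum;
    const bool still_raised = std::fetestexcept(FE_INVALID) != 0;

    std::feclearexcept(FE_INVALID);   // only an explicit clear resets it
    const bool cleared = std::fetestexcept(FE_INVALID) == 0;

    return raised_by_nan && still_raised && cleared;
}
```

Any code that reads such a flag to decide whether the latest result needs special handling has to clear it first; otherwise one NaN earlier in the run makes every later check look like a failure.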
4.0-2350 - PowerPC Flag Optimizations by delroth (Interpreter, JIT-x86_64), magumagu (JITIL-x86_64) and Sonicadvance1 (JIT-ARM)
This collaboration between calc84maniac and delroth is the big one. It's a difficult change that requires each CPU core to be addressed on its own. Poor Sonicadvance1 decided to drop the JITIL-ARM core rather than implement it in a second JIT.
On the x86-64 side, delroth handled the JIT and magumagu did JITIL, ensuring that none of the CPU cores would be lost on that front. It should be noted that this optimization would have killed the 32-bit JIT completely if it were still in Dolphin. Even if it were implemented in the 32-bit JIT, tons of instructions would need to fall back to the interpreter, making 32-bit an even more broken experience versus our 64-bit builds.
For the technical side of things, we go to delroth's commit message:
While X86 architectures have a similar concept of flags [to the PowerPC], it is very difficult to access the FLAGS register directly to translate its value to an equivalent PowerPC value. With the current Dolphin implementation, updating a PPC CR register requires CPU branching, which has a few performance issues: it uses space in the BTB, and in the worst case (!GT, !LT, EQ) requires 2 branches not taken.
After some brainstorming on IRC about how this could be improved, calc84maniac figured out a neat trick that makes common CR operations way more efficient to JIT on 64 bit X86 architectures. It relies on emulating each CRn bitfield with a 64 bit register internally, whose value is the result of the operation from which flags are updated, sign extended to 64 bits. Then, checking if a CR bit is set can be done in the following way:
* EQ is set iff LOWER_32_BITS(cr_64b_val) == 0
* GT is set iff (s64)cr_64b_val > 0
* LT is set iff bit 62 of cr_64b_val is set
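The three checks above can be tried out in a small standalone sketch. The helper names are hypothetical, and treating a signed compare as the 64-bit difference of the sign-extended operands is an assumption made for illustration; only the three flag tests themselves come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; NOT Dolphin's JIT code.

// Internal stand-in for one CR field after a record-form integer
// operation: the 32-bit result, sign-extended to 64 bits.
constexpr std::uint64_t cr_from_result(std::int32_t result)
{
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(result));
}

// Assumption for illustration: a signed compare stores the 64-bit
// difference of the sign-extended operands, which cannot overflow.
constexpr std::uint64_t cr_from_signed_compare(std::int32_t a, std::int32_t b)
{
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(a) -
                                      static_cast<std::int64_t>(b));
}

// The three flag tests quoted from the commit message.
constexpr bool cr_eq(std::uint64_t v) { return static_cast<std::uint32_t>(v) == 0; }
constexpr bool cr_gt(std::uint64_t v) { return static_cast<std::int64_t>(v) > 0; }
constexpr bool cr_lt(std::uint64_t v) { return ((v >> 62) & 1) != 0; }

// Compile-time spot checks: exactly one of EQ/GT/LT holds in each case.
static_assert(cr_eq(cr_from_result(0)) && !cr_gt(cr_from_result(0)) &&
              !cr_lt(cr_from_result(0)), "zero result sets only EQ");
static_assert(cr_gt(cr_from_result(7)) && !cr_eq(cr_from_result(7)) &&
              !cr_lt(cr_from_result(7)), "positive result sets only GT");
static_assert(cr_lt(cr_from_result(-7)) && !cr_eq(cr_from_result(-7)) &&
              !cr_gt(cr_from_result(-7)), "negative result sets only LT");
```

The point of the representation is that updating a CR field becomes a plain register move or subtraction with no branches, while each individual flag can still be recovered later with a single cheap test.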
An Odd Trend
We've been seeing a lot of new bug reports and questions on the forums about games not running at the correct speed. As an emulator of a modern console, we're more than used to reports about the emulator not going fast enough, but it's strange to see issues about games running at 120 fps, or audio running faster than the game itself.
The removal of framelimit by audio ended up revealing the answer. Clever users were using it in conjunction with the vbeam speedhack in order to make audio sound better at lower framerates in games that shouldn't be compatible with the hack. Synchronize audio by frame limit wasn't meant for this; it was more a holdover from the days of asynchronous audio, when many games had serious problems with audio that could only be worked around by running the games at odd framerates. When the DTK Rewrite made the last of Dolphin's audio synchronous, the feature was considered nothing more than dead weight.
This was not a recommended use of the vbeam speedhack; locking games into their wrong framerates will cause hangs, crashes and all kinds of other oddities. For instance, the intro of F-Zero GX would often show cars flying off the track in all kinds of strange ways. While some users enjoyed the benefits the trick provided, audio framelimit will not return, and the vbeam speedhack itself is being considered for removal.
There is no ill will toward the users trying to improve their experience with Dolphin on hardware that fails to maintain full speed. Instead, we believe that it shows a potential need in the emulator - ways to improve audio at lower framerates without compromising accuracy. Methods to achieve this are being considered by the devs, and we just ask for patience as they figure it out. In the meantime, perhaps try the OpenAL Audio Backend, which has time-stretching built into it.
Alternative use of Dolphin: Tool Assisted Superplays
While most users tend to use an emulator to just play games, emulators also serve other purposes within the community. One of the more prominent secondary uses is Tool Assisted Superplays and Speedruns. A lot of the people who play on Dolphin have used tools to enhance their play, such as savestates and slowdown to get around difficult parts of a game.
The people at TASVideos take this to the next level in order to create some of the most impressive gameplay videos that go far beyond the realm of human play. Using features like savestates, frame by frame, slow-motion, and input recording, TASers create the ultimate playthroughs of favorite games old and new.
So what happens when one of them gets their hands on a game like Super Smash Bros. Melee? This happens.
Special Thanks to Last Month's Contributors...
zhuowei, RachelBryk, lioncash, tilka, Sonicadvance1, degasus, workhorsy, Armada651, delroth, sigmabeta, neobrain, phire, LPFaint99, magcius, booto, TotalNerd, shuffle2, moshekaplan, RisingFog, Shadoxfix, magumagu, Fiora and Parlane for incrementing Dolphin 4.0-1997 through to 4.0-2354!