The Progress Report has come and with it some major changes and decisions. However, before we get into new things, we need to go over an ongoing change, as we've seen some users struggling. In the last progress report, we updated our project solutions to Microsoft Visual Studio C++ 2019. We thought there would be no issues at the time. After all, Microsoft says that VS2019 runtimes are forward and backward compatible with VS2015 and VS2017. However, it turns out that is not always the case, and we definitely encountered one of the incompatible scenarios. Over the past two months, we've seen many reports of users encountering "VCRUNTIME140_1.dll was not found" errors and not knowing what to do. So just as a reminder, if you encounter MSVC or VCRUNTIME errors, install the latest x64 Microsoft Visual Studio runtimes from Microsoft's website (direct link). Even on updated versions of Windows, you may be missing the latest runtime, as these runtimes are not distributed through Windows Update for whatever reason. We hope this clears up any problems users were having regarding these issues.
With that, we've got a lot to get through from the past two months. From unintentionally stumbling into an Achilles' heel of the Zen CPU architecture and tanking performance to supporting a brand new environment with Windows on ARM support, we're going to run the gamut of big features, decisions, and fixes. So without further ado, let's get to it.
The past couple of years have been a bit of an experiment. With the advances in emulating the GameCube/Wii's Memory Management Unit (MMU) in both efficiency and accuracy, we decided to enable it full-time indefinitely. In testing, the performance impact of having it enabled was inconsequential, and it had the benefit of allowing Dolphin to emulate strange game bugs and crashes more accurately. The biggest benefit, however, was that doing things that would cause the game to read out-of-bounds memory would no longer crash Dolphin, and would instead be handled by the game's crash handler.
Unfortunately, while overall performance wasn't affected, there was a cost to enabling the MMU in certain titles. In games like Metal Gear Solid: The Twin Snakes and True Crime: New York City, because of how they load from ARAM (Auxiliary RAM in the DSP that games can use), having full MMU emulation enabled would greatly bloat generated code, which would then fill Dolphin's JIT cache. Once Dolphin's JIT cache overflowed, Dolphin would clear it, during which the game would stutter. For example, in Metal Gear Solid: The Twin Snakes, every time an enemy spotted you in a new room, it would cause a large stutter. True Crime: New York City crashes in the intro in Dolphin, but if you use a savefile to get past that, it is by far the worst offender for overflowing the cache. Large, unavoidable stutters would plague the game every few seconds of driving! With MMU emulation disabled, the Metal Gear Solid: The Twin Snakes stutters disappear and True Crime: New York City only stutters once in a while from cache clears.
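To make the failure mode concrete, here is a toy Python model of a fixed-size code cache that gets wiped wholesale when it fills up. The class, the sizes, and the block counts are all made up for illustration and are not taken from Dolphin's JIT; the only thing borrowed from the description above is the clear-on-overflow behavior.

```python
class ToyJitCache:
    """Toy fixed-capacity code cache that is cleared entirely when full.

    Hypothetical model for illustration only, not Dolphin's JIT cache.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.blocks = {}   # guest address -> size of generated host code
        self.flushes = 0

    def compile_block(self, guest_addr, code_size):
        """Return True if the block had to be (re)compiled."""
        if guest_addr in self.blocks:
            return False   # already cached, nothing to do
        if self.used_bytes + code_size > self.capacity_bytes:
            # Overflow: throw every compiled block away and start over.
            self.blocks.clear()
            self.used_bytes = 0
            self.flushes += 1
        self.blocks[guest_addr] = code_size
        self.used_bytes += code_size
        return True


def count_flushes(bytes_per_block, blocks_in_working_set=100,
                  passes=10, capacity=4096):
    """Run the same set of blocks repeatedly and count cache clears."""
    cache = ToyJitCache(capacity)
    for _ in range(passes):
        for addr in range(blocks_in_working_set):
            cache.compile_block(addr, bytes_per_block)
    return cache.flushes
```

With these made-up numbers, `count_flushes(32)` never clears the cache because the whole working set fits, while `count_flushes(64)` keeps clearing it because the same blocks no longer fit once each one is twice as large. That is the shape of the problem: the game didn't change, only the amount of code generated per block did.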
This brought us to a rather difficult decision. While MMU emulation is undoubtedly more accurate, and we had no measurable slowdown at a glance, we learned over the years that there were titles negatively affected. The good news is that the impact is limited to titles that take advantage of Dolphin's MMU Speedhack when MMU emulation is disabled. The MMU Speedhack essentially bypasses the need for MMU emulation in cases where we know which regions the game maps ARAM to. Rather than translating everything, we just outright tell the game it is valid memory and it works without any further complications. With full MMU emulation, all of these translations ended up causing a much larger code impact than we anticipated, resulting in these unanticipated flushes and stutters.
In order to fix this issue properly, Dolphin's JIT (Just-In-Time Compiler) would need a serious revamp and modern solutions. However, retrofitting said modern solutions into Dolphin would be a fairly monumental task. The simpler fix is to use the MMU Speedhack, which technically can be enabled along with full MMU emulation by setting up the extra BATs that make up the MMU Speedhack. However, we chose not to do that for two reasons. First, this would be a new configuration that hasn't seen much testing in the past. While we don't believe it would cause any issues, we can't be entirely sure. Second, by disabling MMU emulation by default on the x86-64 JIT, the default settings that matter for savestates are now the same on x86-64 and ARM devices. Unlike Dolphin's x86-64 JIT, Dolphin's AArch64 (ARM 64-bit) JIT lacks the optimizations that make MMU emulation free to enable, meaning on those platforms we did not enable it by default. As a consequence of disabling it on x86-64 platforms, savestates should be compatible without users having to fiddle around with settings.
Weighing those options, we've decided to disable MMU emulation by default while adding it to the GUI so that users wishing for accuracy can easily enable it. Dolphin will continue to enable MMU emulation automatically for titles that absolutely require it via GameINIs regardless. The new MMU emulation setting can be found in the Configuration > Advanced Tab.
5.0-11409 - Platform support for Windows-on-ARM64 and 5.0-11455 - DolphinQt: Support compiling on ARM64 by stenzek
We have a Windows build with our x86-64 JIT, and we have an Android build with our AArch64 (ARM 64-bit) JIT. So you might be thinking, Windows 10 is now available for ARM systems with the cleverly named "Windows 10 on ARM", so to support that all we'd have to do is simply take the Windows build and swap in our AArch64 JIT. Trivial, right? Yes, actually, it was very easy. Sorry to disappoint, but there will be no essay explaining nightmarish exceptions this time! However, we're still supporting a new platform, and there is a story here, so let's talk about it for a bit.
Windows 10 on ARM hasn't really been very interesting up until now: the platform was problematic and ran on hardware that we already knew well. For example, the best Windows on ARM machine until late 2019 was the Lenovo Yoga C630, a Snapdragon 850 powered device. Sure, the device has good hardware, with great build quality and very long battery life, but the Snapdragon 850 is a mobile SoC, so the Yoga C630 comes with mobile drivers. They have improved over the years, but they are still the worst drivers we have to deal with and we still work around mobile driver bugs all the time. Windows doesn't alleviate these problems and actually makes them a little worse: Qualcomm is using Windows Update exclusively to distribute drivers, so you can forget about using specific driver versions. And of course, most typical Windows software is not natively ported to ARM and instead runs through Microsoft's x86-32 (not x86-64, 32-bit only) emulation, destroying the C630's great battery life and what little performance it had. That is the biggest issue with the C630: the Snapdragon 850 is just terribly slow compared to conventional Intel laptop CPUs. Plus, we've seen and used the Snapdragon 850 many times before so we know how it performs, and we've already run that SoC in Linux so "it's not Android" isn't a factor. Frankly, we had no reason to even look at the C630, and it was the best Windows 10 on ARM device. With boring hardware, bad drivers, and a troublesome ecosystem, we ignored the platform altogether.
That changed with the arrival of the 8cx, a laptop-grade ARM SoC from Qualcomm. With CPU performance that compares well with Intel laptop CPUs, and GPU performance that greatly surpasses Intel iGPUs, it got our attention. And the first device to launch with it (though technically a lightly customized variant), the Microsoft Surface Pro X, is a Windows on ARM exclusive device. It is locked down tight; months of reverse engineering have only barely begun to see progress. Eventually the Linux community will crack the Pro X, but in the meantime, this unique new class of ARM SoC was out in the wild, teasing us behind its Windows on ARM armor of shame. Despite its terrible drivers and ecosystem and all its other problems, we couldn't resist; we were just so curious about how Dolphin would perform on the 8cx! And so, with the nudge of a former Dolphin dev who was playing with the Pro X, stenzek decided to look into supporting Windows 10 on ARM.
It turned out to be quite easy. Swapping the AArch64 JIT into our Windows build was trivial. While there were some small fixes required to build for Windows ARM64, for the most part our code base was already compatible. Making our Qt GUI compile for Windows on ARM was a bit more hassle, but once stenzek figured out the correct environment script for Visual Studio and the right Qt configuration flags, it was pretty easy. There are some video driver bugs we haven't worked around yet, but it works, and it wasn't difficult at all. Our motivations might not have exactly been pure, but we now support Windows 10 on ARM!
However, there aren't any Windows on ARM builds present on our downloads page, and we aren't even making builds with our buildbot. There just aren't enough users for us to justify the space needed to store dev build after dev build. If you have a Windows on ARM device and want to try this build, follow our Windows build instructions, but set the platform to ARM64 before you build.
Shortly after Emulated Wii Remote Motion Passthrough was merged, it was expected that Nunchuk Motion Passthrough would follow. After all, it would be natural to allow users to map a second accelerometer so that they could control both the Wii Remote and Nunchuk, right? Well, things got delayed, and that's partially because of how Dolphin's GUI handles controllers. Unlike Motion Passthrough for Wii Remotes, which makes sense to always show, the Nunchuk is just one of many attachments that could be plugged in as an extension. And because Nunchuks are the only extension with motion input, it'd be weird for the Motion Input tab to be available when configuring other extensions. So in order to simplify things for people configuring Motion Passthrough, the Nunchuk Motion Passthrough page only appears when the Nunchuk is the selected extension.
With that sorted, users can now pass their controller's motion sensor data through to their emulated Nunchuk as well as their emulated Wii Remote! Multiple devices are supported for Motion Input, so you can even dual-wield DualShock 4s for a fun Wii Sports boxing experience.
Sometimes you don't notice a problem until it's fixed, and sometimes that's made worse when different hardware configurations mean that we can't see what many of our users see. There has been a longstanding complaint from a vocal portion of the userbase that Dolphin had extremely poor frame pacing with 30 FPS games, while 60 FPS games were fine. This wasn't a huge cause for concern: 30 FPS games are not going to be as smooth as 60 FPS games, and frame pacing can vary greatly by hardware and other factors. However, after some investigation, it was revealed that 30 FPS titles have very bad Presentation Frame Pacing in Dolphin.
We've talked about frame pacing before, though that was specifically about frame times. With V-Sync off and our speed limiter disabled, that analysis was about catching stutters and rendering issues without caring about what happens on the display. Presentation Frame Pacing is all about that, specifically the pacing of frames as they appear on the screen after Dolphin renders them and the video driver ships them and everything else. On a 60 Hz display, the screen only refreshes every 16.66ms, so we need to make sure a frame is ready for the display every 16.66ms. If a frame is too late, even by a millisecond, the driver will send that frame in the next 16.66ms slice, or just not send it at all. 30 FPS games on a 60 Hz panel complicate this a bit. Ideally, we'll be sending a frame every 33.33ms, or every other refresh of the panel, and the driver will fill in the blank. However, if a frame is a little late, e.g. 34ms, then the 33.33ms window will be missed and that frame will be sent on the next 16.66ms slice, for a total of 49.99ms. Then to maintain sync and display all the rendered frames, the next frame will be pushed out early in the next 16.66ms slice. This results in frame pacing with a mixture of 33ms, 49ms, and 16ms frames, and it feels bad, far, far worse than the framerate would suggest. And this is exactly what we observed when we looked into 30 FPS games in Dolphin.
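The arithmetic above can be checked with a few lines of Python. This is only a simplified model under our own assumptions (a frame is shown at the first refresh at or after it becomes ready, and at most one new frame is shown per refresh); the function name is hypothetical and it does not reflect how any particular driver actually queues frames.

```python
from fractions import Fraction
from math import ceil

# 60 Hz panel: one refresh every 1000/60 ms (about 16.67 ms).
REFRESH_MS = Fraction(1000, 60)


def presented_intervals(ready_times_ms):
    """Map frame-ready times (ms) to the gaps between frames on screen.

    Simplified model: each frame appears at the first refresh at or after
    the moment it is ready, and only one new frame fits in each refresh.
    Returns the on-screen intervals in ms, rounded to two decimals.
    """
    slots = []
    last_slot = -1
    for ready in ready_times_ms:
        slot = ceil(Fraction(ready) / REFRESH_MS)  # first refresh it can make
        slot = max(slot, last_slot + 1)            # one new frame per refresh
        slots.append(slot)
        last_slot = slot
    return [round(float((b - a) * REFRESH_MS), 2)
            for a, b in zip(slots, slots[1:])]


# Every frame exactly on time: ready at 0, 33.33, 66.67 and 100 ms.
on_time = [0, Fraction(100, 3), Fraction(200, 3), 100]

# Same sequence, but the second frame is ready at 34 ms instead of 33.33 ms.
one_late = [0, 34, Fraction(200, 3), 100]
```

In this model, `presented_intervals(on_time)` gives three even 33.33 ms gaps, while `presented_intervals(one_late)` gives 50.0, 16.67 and 33.33 ms: the long, short, normal pattern described above (the 49.99ms figure in the text is the same three-refresh gap with 16.66 truncated rather than rounded). A frame that is less than a millisecond late is enough to produce it.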
It was worth investigating, and once stenzek was able to alleviate the issues for these users in a test build, it became a much higher priority to fix. His first attempt at fixing this was to have Dolphin manually change the monitor's refresh rate to the exact framerate the game was running at. This meant 59.94/29.97 Hz for NTSC games and 50/25 Hz for PAL games that didn't support PAL60. For monitors that supported every refresh rate, this improved framerate fidelity significantly, especially in PAL games. However, there were drawbacks: Dolphin was rather unstable when changing the monitor's refresh rate, it was only working in Windows, and many monitors simply didn't support those refresh rates in the first place. While we would someday like to use that solution, stenzek came up with a short-term solution that fixes some of these problems with none of the drawbacks.
As a proof of concept, stenzek used his PlayStation 1 emulator, DuckStation, to test a frame pacing solution that involved padding 30 FPS games with duplicate frames. Because it had a simpler rendering pipeline, he was able to more quickly test and verify that padding improved fidelity. Once it was confirmed to work, he brought the feature over to Dolphin and had it tested by users suffering from poor frame pacing. The results carried over nicely and were verified by Dolphin's own frametime output.
So, to our many users suffering from frame pacing issues in 30 FPS games, there's a new option in the hacks menu called "Skip Presenting Duplicate Frames," which, when enabled, keeps the old behavior. When it is unchecked and V-Sync is enabled, the frame pacing issues in 30 FPS games are significantly improved. Do note that this feature is unnecessary when Dolphin's Immediately Present XFB Copies is enabled, as Dolphin already displays every XFB copy with that setting. Given that Immediately Present XFB Copies is a hack and does not work in many games, this solution is still very important as it gives us a correct way to achieve smoother output across a wider variety of titles.
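As a rough sketch of what the option toggles, the Python below models a present loop that either drops or keeps frames identical to the previous one. The function name and the idea of tagging each field with a frame id are our own simplifications for illustration; this is not Dolphin's actual presentation code.

```python
def frames_to_present(field_frame_ids, skip_duplicates):
    """Return the frame ids that are actually handed to the display.

    field_frame_ids holds one entry per emulated 60 Hz field, naming the
    image the game has on screen during that field (hypothetical model).
    """
    presented = []
    previous = None
    for frame_id in field_frame_ids:
        is_duplicate = frame_id == previous
        if not (skip_duplicates and is_duplicate):
            presented.append(frame_id)
        previous = frame_id
    return presented


# A 30 FPS game only produces a new image every other 60 Hz field.
fields = [0, 0, 1, 1, 2, 2]
```

With `skip_duplicates=True` (the old behavior), only three presents happen for these six fields, so each one has to land on exactly the right refresh. With `skip_duplicates=False`, all six fields are presented, one per refresh, and the display is never left waiting on a frame that might arrive slightly late.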
The one downside to this solution over the original solution that was being worked on is that this does not help PAL region games that run at 25 FPS or 50 FPS running on a 60 Hz monitor. Outside of having a monitor that supports 50 Hz, there isn't much you can do with conventional monitors. However, if you have a quality variable refresh rate display and use exclusive fullscreen, you should be able to run any game Dolphin can handle without worrying about frame pacing issues. In fact, a quality VRR display doesn't actually need any of the stuff implemented here for a smooth experience as they have their own frame-duplication, anti-tearing, and anti-ghosting features.
Improving Dolphin as a Wii emulator has been a very long road with many hiccups and milestones. At this point, a lot of the biggest flaws and hacks that Dolphin once relied on for handling the Wii NAND are finally gone. Replacing them are tested and verified behaviors in this batch of changes from Leoetlino. The biggest benefit of rewriting, reimplementing, and unit-testing all of these functions is that Dolphin should be far less likely to do something that would corrupt your NAND or Virtual SD card while allowing games that rely on more exact emulation to work correctly.
While cleaning up something that works for 99% of Wii software may not seem all that exciting, these changes do directly improve the compatibility of two titles and make another one much easier to maintain. The biggest one is the Wii game Disney's Bolt, which requires the file system module to report the list of files in order of creation. It expects the most recently created save file to be the first item in the list, and that assumption will always hold true on a Wii. However, on Dolphin, this was not only not guaranteed, but outright impossible under normal circumstances! And when the game pulls something that isn't the most recent save file, it will fail to read your save and sometimes even softlock!
While this seems like an innocuous side effect of programming for set hardware, it must be remembered that this game is from our good friends at Avalanche Software, the people that famously made the entire Disney Trio of Destruction. Those games employed anti-emulation techniques specifically designed to prevent them from running in Dolphin, and we only recently were able to get them working. This game came out long before that, and it was released only a few months after Dolphin went open source. We have ample proof that Avalanche Software was already upset about homebrew and emulation thanks to a crude message found hidden in the data of 2007's Meet the Robinsons. However, it's very unlikely that this is anti-emulation behavior.
Our proof of this is that Nintendo themselves relied on this same behavior in older versions of the System Menu. In affected versions, the System Menu would ask IOS (the ES module specifically) for a list of installed titles, including games, system software, and channels. The ES module would then figure out what titles are installed using the same method that Bolt uses: simply getting a full list of folders that are stored on the file system. On a real Wii, the System Menu is one of the first titles to be installed and never the most recent one, so it cannot appear as the first item on the list on console. However, on Dolphin, users are free to install things in whatever order they want. If the System Menu appeared at the wrong position in the list, it could softlock much like Bolt. The reason more users didn't run into this behavior over the years is that Dolphin developers implemented an undocumented hack to work around this issue... and this hack contradicted the behavior that Bolt required!
Leoetlino worked around this issue by storing the correct order of files and various metadata in an FST file. This file keeps track of creation order and other file metadata in order to prevent issues with the metadata missing or being incorrect. This allows Bolt to work, along with the older System Menus, and even unexpectedly fixes the Wii Photo Channel, which Leoetlino admitted was not even a target of these changes.
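The idea of tracking creation order separately from the host file system can be sketched in a few lines of Python. Everything here is hypothetical: the class name, the JSON serialization, and the newest-first listing are just a minimal way to illustrate the behavior described above, and say nothing about the real layout of Dolphin's FST file.

```python
import json


class ToyFst:
    """Toy metadata table that remembers the order entries were created in.

    Illustration only; the real FST format is not described in this article.
    """

    def __init__(self):
        self.entries = []   # oldest first

    def create(self, name):
        if name in self.entries:
            raise FileExistsError(name)
        self.entries.append(name)

    def delete(self, name):
        self.entries.remove(name)

    def list_dir(self):
        """List entries newest first, the order Bolt is described as expecting."""
        return list(reversed(self.entries))

    def dumps(self):
        """Serialize the table so the order survives between sessions."""
        return json.dumps({"entries": self.entries})

    @classmethod
    def loads(cls, text):
        fst = cls()
        fst.entries = list(json.loads(text)["entries"])
        return fst


fst = ToyFst()
for save_name in ["save_a", "save_c", "save_b"]:
    fst.create(save_name)
```

Here `fst.list_dir()` returns `save_b` first because it was created last, regardless of how the names sort. A listing taken straight from the host file system has no such guarantee, which is why the order has to be recorded explicitly and persisted.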
As for games that actually care about more specific metadata, that's limited to our friend Dragon Quest X. These changes should make maintaining a NAND compatible with the game much easier. Back when we were testing this and recording videos of Dolphin connecting to its server before Wii support was cut off, it was very common for doing anything else with the NAND to cause the game to stop working. Every time we wanted to set things up, we started from essentially a fresh NAND to reduce issues, and even then it was sometimes problematic. Thanks to these changes, anyone who wants to play Dragon Quest X without serious issues should be able to without needing too much in the way of extra setup. Just make sure you have a NAND of the proper region and all of the required IOSes installed.
On one final note, Leoetlino also provided extensive unit tests for file system emulation to help prevent regressions if further changes are needed in the future.
There are numerous small optimizations that don't make it into the Progress Report. We are very grateful for them; each one may be small, but they snowball into genuine improvements in how Dolphin runs. This month, optimization pro MerryMage optimized our Double2Single floating point conversions. It's now cleaner, more efficient, and uses newer and faster instructions. It's a very small optimization, so we can't graph it or point to a particular game that runs better because of it, but Dolphin is indeed better for it. Unfortunately, something unforeseen happened that made this little optimization worthy of the Progress Report. After this optimization, issue reports from AMD Zen processor owners started streaming in about how very specific situations in games were much slower.
As part of the optimization, Dolphin now uses the instruction PEXT if CPU support is detected. This fairly new instruction is a bit manipulation instruction that allows Dolphin to more efficiently move bits around during the Double2Single conversion. However, unknown to any of us, it turns out that PEXT is extremely slow on AMD Zen and Zen 2 architectures. People have recorded these instructions taking up to 289 cycles, versus just one cycle on other CPU architectures. The reason these instructions are so slow is that they are not directly implemented on the CPU itself, but were instead implemented in microcode. Microcode acts as a translator, using many CPU cycles to convert non-native instructions into low level μops that the CPU can run. That is effectively emulation of those instructions. No wonder it is so slow!
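For readers who haven't met PEXT before, here is a small software model of it in Python, along with an example of why it suits this kind of conversion. The mask below is our own illustrative choice (sign bit, top exponent bit, low 7 exponent bits, and top 23 mantissa bits of a double) and only produces the right answer for values that are normal numbers in single precision; we are not claiming it is the exact constant or code path Dolphin uses.

```python
import struct


def pext(value, mask):
    """Software model of PEXT: gather the bits of value selected by mask
    and pack them, in order, into the low bits of the result."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1           # clear that bit and move to the next one
    return result


# Illustrative mask (an assumption, see the text above): bits 63, 62 and
# 58..29 of the 64-bit pattern, which is 32 bits in total.
DOUBLE_TO_SINGLE_MASK = 0xC7FFFFFFE0000000


def double_bits(x):
    """Raw 64-bit pattern of a Python float stored as a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def single_bits(x):
    """Raw 32-bit pattern of the same value stored as a single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

For a value like 1.0, `pext(double_bits(1.0), DOUBLE_TO_SINGLE_MASK)` yields `0x3F800000`, which is exactly `single_bits(1.0)`. A single hardware instruction replaces a series of shifts, masks, and ORs, which is why it looked like a clear win. The Python loop, by contrast, visits the mask one bit at a time, which gives a feel for how much work is hiding behind the instruction when it isn't implemented directly in hardware.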
Yet despite terrible performance, Zen still supports PEXT, so Dolphin detected it and used it for Double2Single. Any time a game performed Double2Single floating point conversions, performance would plummet. Ouch.
To rectify this, we now have an explicit exception so that PEXT will not be used on Zen. While we were at it, we also added an exception for the related PDEP instruction that we use in our vertex loader. We have had no reports of slowdown with this, but PDEP is also microcoded on Zen and also performs badly, so it seemed prudent. Please note that other CPU architectures are not affected by this change; to our knowledge all other architectures either support PEXT directly or not at all.
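The shape of such an exception can be sketched as below. This is a hypothetical Python illustration, not Dolphin's detection code: it assumes the CPUID vendor string and family number have already been read, and it relies on the fact that Zen, Zen+, and Zen 2 all report AMD family 17h. How Dolphin itself identifies these CPUs is not covered in this article.

```python
# AMD family 17h covers Zen, Zen+ and Zen 2.
AMD_VENDOR = "AuthenticAMD"
ZEN_FAMILY = 0x17


def has_fast_bmi2(vendor, family, supports_bmi2):
    """Decide whether PEXT/PDEP are worth using on this CPU.

    Hypothetical helper: the inputs are assumed to come from CPUID.
    """
    if not supports_bmi2:
        return False   # the instructions aren't available at all
    is_zen_or_zen2 = vendor == AMD_VENDOR and family == ZEN_FAMILY
    # Supported but microcoded and slow: fall back to the generic path.
    return not is_zen_or_zen2
```

The important design point is that the feature bit alone is no longer trusted. The CPU truthfully reports that it supports the instructions, so the decision has to combine "is it supported" with "is it known to be slow here".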