Even though the Wii's official library is mostly set, both the GameCube and Wii are entering a new golden age as a popular environment for randomizers, full-game mods, incredible cheat codes, and much more. Stalwarts like the Super Smash Bros. Brawl mod Project M have been around for years, but now there are many other communities around various games breathing new life into them. You can find codes to help balance games like Mario Party 5, content mods for Kirby Air Ride that add tons of new rides and hundreds of songs, and trackpacks for Mario Kart Wii that add hundreds of custom tracks to the game. Wiimmfi also provides its own backup Wi-Fi servers for many unmodified games and for its Mario Kart mods!
While most of these mods can be enjoyed on a hacked Wii, many users rely on Dolphin in order to play them. Emulating these mods can be quite the challenge, as they will often do things in ways that game developers would not. Assumptions that Dolphin makes can often be broken, and certain features that mod developers use can be extremely slow or downright unreasonable to emulate. In the case of Wiimmfi's Mario Kart Fun Packs, the mod creators have put in work over the years to improve their experience in Dolphin and even support emulated users playing alongside console users online... so long as you're willing to dump and use your Wii's NAND. Earlier this month, a slight change to Wiimmfi's online networking broke Dolphin support without affecting real Wii consoles. Not wanting to leave their emulated users high and dry, they reported the bug to us.
delroth quickly took up the mantle of investigating the bug with assistance from the Wiimmfi team. Within a few hours, the cooperation paid off as the list of probable causes was narrowed down to one annoying feature: the instruction cache. Dolphin has pretty much no ability to emulate the GC/Wii CPU data cache and likely never will due to the performance implications. Dolphin does have some ability to emulate the instruction cache, though it's best not to test the emulator's limits there. This is normally not a problem with retail games, because it's rather bad form for a game to rely excessively on cache quirks unless the developers were intentionally trying to break an emulator. There are occasionally games that inadvertently rely on cache behavior, but that's something to tackle another day. Dolphin's emulation of the instruction cache is normally good enough, and almost nothing relies on the data cache.
Mods are different; developers are usually working with a black box and don't have the same level of familiarity with the hardware. Unless they specifically tested codes on both Dolphin and Wii, there's a chance they wouldn't even know something was broken. There have been many issues reported around mods where, while Dolphin is at fault, we really don't have any recourse for the users afflicted. If a mod doesn't care about running on Dolphin and uses dcache or perhaps another annoying feature, there isn't much we can do but shrug it off.
In the case of Wiimmfi's server, through cooperation from both sides, we were able to find the cache coherency issue and fix it server-side! Users who already have the latest version of the mod don't have to do anything except try to connect. If you're looking for a more detailed explanation of what was going wrong (as it's rather interesting), you can find delroth's full writeup on the issue tracker.
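To make "instruction cache sensitive" a little more concrete, here is a toy Python model of the general hazard. It is not a reconstruction of the Wiimmfi bug (delroth's writeup covers that), and the `ToyCpu` class and its string "instructions" are invented purely for illustration. The only real-hardware detail assumed is that PowerPC code is expected to invalidate cached instructions explicitly (for example with `icbi`) after patching them in memory.

```python
class ToyCpu:
    """Toy model: instruction fetches go through a cache, data stores do not."""

    def __init__(self, memory):
        self.memory = dict(memory)  # address -> instruction (hypothetical encoding)
        self.icache = {}            # address -> cached copy of that instruction

    def fetch(self, addr):
        # A cache hit returns the cached copy, even if memory changed since.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        # Data writes update memory only; the instruction cache is not told.
        self.memory[addr] = value

    def invalidate(self, addr):
        # Stand-in for an explicit invalidate such as PowerPC's icbi.
        self.icache.pop(addr, None)


cpu = ToyCpu({0x100: "old"})
first = cpu.fetch(0x100)   # caches "old"
cpu.store(0x100, "new")    # patch the code in memory
stale = cpu.fetch(0x100)   # still the cached "old" copy
cpu.invalidate(0x100)
fresh = cpu.fetch(0x100)   # now sees "new"
```

Code that patches instructions and then runs them can therefore behave differently depending on whether, and how faithfully, the cache is modeled, which is the kind of difference that can separate a console from an emulator.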
In order to track down behavior like this in the future, delroth also added game quirk reporting to Dolphin's data collection service, so Dolphin will now automatically let us know what games are instruction cache sensitive in the manner that broke this particular mod. With that, we also have a lot of other exciting changes this month, so now it's time to dive into this month's notable changes!
You may have noticed that the Progress Report was a little late this month. This would be why - merged on the last day of November, MoltenVK support for Dolphin is a huge change that allows the Vulkan backend to run on macOS computers through their Metal API! To put into perspective how big of a deal this is, you have to understand that OpenGL support in macOS is nowhere near as good as in other environments. They're stuck on the eight-year-old OpenGL 4.1 and have no native support for Vulkan. OpenGL 4.1 lacks important features like Buffer Storage, which is incredibly important for performance, and SSBOs, which are used to emulate Bounding Box at a reasonable speed. Without these features, many popular games are near unplayable, including the Paper Mario series.
Those problems and many more make just using Dolphin an uphill battle. You're sacrificing depth accuracy, compatibility with certain games, and up to 35% of your performance the moment you choose macOS for your Dolphin experience. At this point, even Android has more modern OpenGL ES and Vulkan drivers with superior feature sets, despite being far less mature. And with Apple finally officially deprecating OpenGL on macOS in 10.14, the writing was on the wall. If we wanted to continue to support macOS, we needed to find an alternative. The only API that macOS will officially support in the future is Metal, Apple's new proprietary graphics API exclusive to Apple platforms, so we had no choice but to support Metal in some form or lose all real support of macOS.
For the past half a year, stenzek has been toying with supporting Metal - he was even writing a Metal Backend for Dolphin at one point. Various issues with Metal along with the increasing maintenance cost of yet another backend caused him to eventually abandon that project before it was finished. Instead, he sought out another way of supporting macOS that would come at a lower maintenance cost to Dolphin through MoltenVK!
MoltenVK is a translation layer: it allows software that uses Vulkan to run on top of Metal, so our Vulkan backend can run on macOS! While that may sound trivial compared to making a backend, it was still a considerable challenge. Dolphin's shaders can be anything but ordinary, and MoltenVK is still under active development, so we ended up as a sort of torture test and ran into a lot of issues at first. For much of the past few months, MoltenVK support has been mostly done, with only a few trailing issues waiting to be fixed upstream. By the end of November, enough of the issues were fixed that many games were playable, and stenzek finally merged the longstanding pull request.
How does MoltenVK Compare?
Users may be assuming that using Vulkan with all the modern features supported on other operating systems would be a huge win. And they're right: MoltenVK greatly outperforms OpenGL in many of the most taxing situations we could craft for testing. Compared to native Vulkan support, there is some overhead thanks to going through a translation layer. In some cases and on older hardware, even macOS's crippled OpenGL support may outperform Vulkan, but for the most part, Vulkan should be the superior option going forward when it comes to performance. On a modern macOS computer, it can be the difference between a game being utterly unplayable and nearly full speed!
Vulkan on macOS is very new, so there are still some limitations and bugs leading to spectacularly broken titles and missing features. MSAA is explicitly not supported and GPU texture decoding will not work. Enabling either of these options may cause Dolphin to crash. Games that rely on LogicOps currently render incorrectly. We also know that Super Mario Sunshine has some minor rendering issues. MoltenVK also has improved frame-pacing, though the situation on macOS is still not great. More on that later.
Yet another major feature from stenzek, except this time everyone gets to reap the rewards. Deferred EFB Copies is an optional mode to help improve performance of games that require Store EFB Copies to RAM. Normally, when a game issues an EFB copy, Dolphin immediately encodes the EFB to a temporary texture, "idles" the GPU, and copies the encoded texture data from the GPU (may be in RAM or VRAM, depending on the driver) to the emulated console's RAM. The main reason that Store EFB Copies to RAM games are so slow is that this "idle" step takes ages. GPUs don't like working piecemeal; they prefer being fed large batches of work and crunching through them without the CPU impatiently waiting for them to finish.
This isn't that much of a problem when a game only does one or two EFB copies per frame. But we wouldn't be here talking about this if games didn't break the norm. In fact, Super Mario Sunshine can create nearly 70 EFB Copies per frame during certain effects. This is why even the strongest of computers tend to lag during the transition to the main map when hitting the Z button. But Super Mario Sunshine isn't alone among offenders, with other games like Xenoblade Chronicles sometimes hitting 30+ copies in a single frame. Unlike Super Mario Sunshine, Xenoblade Chronicles only uses Store EFB Copies to RAM for a few cursory features, and thus the setting isn't forced on by default for it.
Deferred EFB Copies gives Dolphin the ability to not immediately sync EFB copies with RAM on each copy. Instead, it can look for specific clues happening within the emulated console that tell it when the game needs the CPU and GPU threads synchronized. A full list of sync events can be found within the pull request writeup, but the important thing to remember is that it helps performance by greatly cutting down on the number of stalls per frame.
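The effect on stall counts can be sketched with a small Python model. This is only an illustration of the idea, not Dolphin's implementation: the `"copy"` and `"sync"` event names are hypothetical stand-ins for an EFB copy and for one of the sync events listed in the pull request, and flushing leftover copies at the end of the frame is an assumption of the model.

```python
def count_stalls(events, deferred):
    """Count GPU stalls for one frame's worth of events.

    events: sequence of "copy" (the game issues an EFB copy) or "sync"
            (hypothetical marker: the game now needs the copied data in RAM).
    """
    stalls = 0
    pending = 0  # copies queued but not yet read back
    for event in events:
        if event == "copy":
            if deferred:
                pending += 1   # queue the readback and keep the GPU busy
            else:
                stalls += 1    # idle the GPU and read back right away
        elif event == "sync" and deferred and pending:
            stalls += 1        # one stall flushes every queued copy
            pending = 0
    if deferred and pending:
        stalls += 1            # model assumption: flush leftovers at frame end
    return stalls


frame = ["copy"] * 30 + ["sync"]
immediate = count_stalls(frame, deferred=False)
batched = count_stalls(frame, deferred=True)
```

In this model, a frame with 30 copies followed by a single sync point stalls 30 times in immediate mode and once in deferred mode. A game that needs its data back after every copy would see no benefit, which matches the point that the gain depends on how often sync events occur.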
These numbers paint a very particular picture - Deferred EFB Copies solves a specific bottleneck and can greatly improve performance in the EFB2RAM games that run into it. In general, the performance differences won't be as dramatic, but can still make a rather big difference. For example, The Legend of Zelda: The Wind Waker only uses Store EFB Copies to RAM for the Pictobox. While it may sound silly to force on Store EFB Copies to RAM for such a minor feature, users were losing their savefiles when Dolphin would suddenly crash while taking pictures. One final note - all backends see very noticeable performance boosts in bottlenecked situations, but the Vulkan backend takes advantage of this feature much more efficiently due to our extra control over when the pipeline flushes.
Deferred EFB Copies can be found in the Graphics Hacks section of the GUI right next to Store EFB Copies to Texture Only. It is currently enabled by default, though it won't do anything unless Store EFB Copies to Texture Only is disabled by the user or GameINI.
Lightning strikes thrice for stenzek as he brings us yet another major change after he stumbled upon a dangerous memory leak while working on Deferred EFB Copies. This leak has been around for a very long time, lying hidden while frustrating users with limited RAM and long play times. The plain truth was that Dolphin's texture cache was leaking memory fast; sometimes enough to crash a game! While there were reports of a serious leak from multiple sources, no one could really track it down. stenzek wasn't even aware of a memory leak when he found it; he simply looked at the code and wondered if it was a problem. He fixed it and the test build was verified to stop the memory leak.
Players of various Virtual Console games may be hoping that this fixes a rather nasty memory behavior for those games, but that appears to be a separate issue. After fixing the above-mentioned memory leak, a supposed second leak was brought to stenzek's attention. It afflicts titles like Super Smash Bros. (VC), Mario Golf (VC), and Mario Tennis (VC). It turns out this isn't exactly a leak - Dolphin does eventually invalidate the EFB copies ballooning up in VRAM, it's just not able to do so fast enough. Because EFB copies take up more space at higher resolutions, these games can outright flood the VRAM and crash Dolphin. While we've managed to figure out why this behavior happens, we don't have an effective way to fix it quite yet. For now, stick to 1x Internal Resolution when using these games.
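A rough back-of-the-envelope calculation shows why internal resolution matters so much here. The sketch below assumes a copy covering the full 640x528 EFB and a host texture using 4 bytes per pixel; real copies can be smaller and the host format is an assumption, so treat the numbers as an upper-end illustration only.

```python
EFB_WIDTH, EFB_HEIGHT = 640, 528  # native EFB dimensions on GameCube/Wii
BYTES_PER_PIXEL = 4               # assumption: RGBA8 texture on the host GPU


def efb_copy_bytes(scale):
    """Bytes for one full-size EFB copy at the given internal resolution scale."""
    return (EFB_WIDTH * scale) * (EFB_HEIGHT * scale) * BYTES_PER_PIXEL


for scale in (1, 2, 4):
    mib = efb_copy_bytes(scale) / (1024 * 1024)
    print(f"{scale}x: {mib:.1f} MiB per full-size copy")
```

Since both dimensions scale, the cost grows with the square of the internal resolution: under these assumptions a copy is about 1.3 MiB at 1x and about 20.6 MiB at 4x, so a backlog of copies that is harmless at 1x takes sixteen times the VRAM at 4x.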
Note: It is normal for Dolphin's memory footprint to slowly increase during emulation due to drivers caching shaders and Dolphin's various caches.
Obligatory Netplay Section
Month after month, the fixes and new features being added to netplay just keep piling up. While this is a testament to the hard work of many developers to whip netplay into shape, it also goes to show just how many problems were infesting the neglected feature. This month is particularly special as Emulated Wii Remote netplay may finally be nearing actual usability thanks to a few more major changes added. While Dolphin has technically supported Emulated Wii Remote Netplay for years, it takes very intricate knowledge of how the emulator works, incredible patience, and a little bit of luck to actually consistently set it up. Even developers are routinely left scratching their heads trying to figure out why a Wii Remote netplay session isn't working.
The fixes this month combine to maybe, kinda, hopefully make Emulated Wii Remote sort of usable without being a master Dolphin technician. If you still run into problems... that's really to be expected. Without further delay, here are the major netplay changes that hit this month.
You're probably thinking, "Didn't they just say only important changes would be noted? Is this blog some kind of joke?" While syncing the power button may sound like something completely silly, it's actually incredibly important for Wii netplay. Dolphin has a feature called Safe Shutdown, which more or less executes the Wii's shutdown behavior rather than simply pulling the plug. Safe Shutdown is used in order to protect users' savedata from not being flushed, usually in various VC games that only push the savedata to NAND when a game is closed.
The important thing to note is that, as part of Safe Shutdown, Wii Remotes are disconnected by the emulated console. So, let's say the host of a netplay session is playing Super Mario Galaxy and won some kind of bet to make their friend play as the second pointer for several hours. At the end of this, the host stops netplay and Safe Shutdown begins... disconnecting the Wii Remotes. Because the act of shutting down isn't synced, the Wii Remotes remain connected on the client's side longer than they do on the host's side. With the client perpetually waiting for inputs, Dolphin deadlocks and must be force closed by the users via their operating system's preferred method.
Well, that's annoying, right? But Dolphin rubs salt in the wound - netplay only flushes Wii savefiles to a permanent NAND at the end of the session! Because netplay never technically completed, the savefile is left in limbo. An experienced user who knows of this behavior can properly back it up before starting Dolphin again, but if you don't, the save will be lost forever!
In order to allow Wii Remote Netplay to sometimes stop properly, the power button is now synced on Wii Netplay. If something does go wrong during shutdown or there is a desync, Wii Netplay can still hang, meaning your savefiles could still be at risk. Thankfully, just before the turn of the month, Techjar also fixed another bug with stopping netplay that allows Dolphin to safely end the session from this state. Previously, Dolphin would be deadlocked with no respite, but now, the "stop" signal works, allowing the netplay session to safely end even when things aren't quite perfect.
Note: GameCube savefiles are flushed on demand. Wii savefiles are flushed at the end of the session.
UnclePunch's first major contribution to Dolphin packs a rather big punch, but if you know who he is, it shouldn't come as much of a surprise that it's for netplay. UnclePunch is rather well known in the modding community for a slew of interesting mods and cheatcodes for various games. With many of his codes and mods designed for popular netplay titles such as Super Smash Bros. Melee and Kirby Air Ride, it's only fitting that his first major change makes using those codes on netplay much easier.
Dolphin's netplay works by completely synchronizing two instances of Dolphin at boot and keeping them in lockstep with identical inputs. In a perfect world, Dolphin will be deterministic and the players will stay synchronized throughout the netplay session so long as the same inputs are used. Unfortunately, there are many settings, bugs, and features that can make setting up netplay rather daunting for users unaware of how it works. UnclePunch eliminates one of the biggest sources of issues for unknowing users. By allowing Netplay to sync cheat codes (both Action Replay and Gecko) from the host, users no longer need to worry about anything! When the new Sync Codes option is enabled, Dolphin will automatically handle sending the cheat codes across netplay and making sure the correct ones are enabled! A once complicated task is now as simple as checking a box!
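The lockstep idea, and why mismatched codes break it, can be shown with a tiny deterministic simulation in Python. Everything here is a made-up stand-in: `step` is not Dolphin's emulation loop, and `hypothetical_gameplay_code` simply represents any cheat code that changes game state.

```python
def step(state, inputs, codes):
    """Advance one frame; the result depends only on the arguments."""
    value = state
    for pad in inputs:
        value = (value * 31 + pad) % 1_000_003
    if "hypothetical_gameplay_code" in codes:
        value = (value + 7) % 1_000_003  # a code that alters game state
    return value


def run(frames, codes):
    """Run every frame from the same boot state and record each state."""
    state = 0
    history = []
    for inputs in frames:
        state = step(state, inputs, codes)
        history.append(state)
    return history


frames = [(1, 0), (0, 2), (3, 3)]  # identical inputs sent to every player
host = run(frames, codes={"hypothetical_gameplay_code"})
client_synced = run(frames, codes={"hypothetical_gameplay_code"})
client_unsynced = run(frames, codes=set())
```

With the same inputs and the same codes, `host` and `client_synced` match on every frame. `client_unsynced` diverges on the very first frame even though its inputs are identical, which is the kind of desync that syncing codes from the host is meant to prevent.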
Why isn't this feature just always on? Well, believe it or not, there are cases where you wouldn't want to sync codes on netplay... and some of those codes were actually written by UnclePunch! There are certain codes considered Netplay Safe, meaning that players can have them on with minimal risk of having a true desync. These include codes to disable music in Super Smash Bros. Melee and the spectacular "every player gets their own viewport" Kirby Air Ride code. While these codes may cause desyncs with extended use, sometimes the benefits outweigh the risks.
Super Smash Bros. Brawl hacks, such as Super Smash Bros. Legacy TE, tend to use Gecko OS as a way of loading the game. Unfortunately, Dolphin's Wii Save Sync would only sync the save for the initially loaded title. Considering that Gecko OS doesn't have a savefile, nothing would be synced and you'd be running without a savefile.
There were some hacks that could be used to remedy this situation, but Techjar decided to future-proof Dolphin by adding the option to sync all Wii saves. Because this new feature can sync quite a bit of data (a Wii NAND can be up to 512MB), he also added a progress bar to let users know how far along the transfer is.
Dev Diary - Putting a Mac through its Paces by MayImilae
So as we were buttoning down the Progress Report, a little surprise came our way when MoltenVK was merged. I just so happen to have a Mac as my daily driver, and well, kind of no one else does, so it fell to me to do the performance testing. "This is old hat for me!" I thought! JMC has supplanted me for most performance testing just from his amazing tenacity for it, but I used to be a main performance tester a few years ago, and I did a lot of testing for Ubershaders and such, so I knew just what to do! macOS had other things in mind though.
During my initial performance testing, I noticed that the FPS counter in Dolphin was fluctuating wildly. Like, jumping between 70 and 150fps! It was fluctuating so much that I couldn't get a decent read on the performance, preventing me from getting any proper performance numbers. So I turned to my dusty tester knowledge and dumped frame-time statistics from Dolphin to find out what was going on, and... Oh. I immediately knew that the Progress Report was going to be late.
Bad frame-pacing. REALLY bad frame-pacing! I could write a big long piece on what frame-pacing is, but I'll just give a very quick summary and redirect you to Digital Foundry's excellent article on the subject. Basically, frame-pacing is the consistency with which frames are sent to the screen. If the game isn't consistently meeting the 16.66ms frame-time required for 60fps, the GPU will just resend the old frame. The game may then catch up and send that frame and the next one in the next 16ms slice, but the GPU will only send the latest of them to the screen, so the user is missing out on rendered frames. Bad frame-pacing also just feels really bad, with a jumpy, jittery feeling as the game shows more or fewer frames randomly throughout a second, and it's really unpleasant to play. Also, seriously, just read Digital Foundry's article on the subject, they did an amazing job demonstrating this.
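The summary above can be turned into a small Python simulation. This is a simplified model under stated assumptions, not how macOS or Dolphin actually presents frames: the display refreshes at a fixed 60 Hz, each refresh shows the newest frame finished by then, and the frame times fed in are invented for the example.

```python
import math

REFRESH_MS = 1000 / 60  # one 60 Hz refresh interval, about 16.67 ms


def simulate_display(frame_times_ms):
    """Return (shown, repeats, dropped) for a list of per-frame render times."""
    finish_times = []
    clock = 0.0
    for frame_time in frame_times_ms:
        clock += frame_time
        finish_times.append(clock)

    shown, repeats, dropped = [], 0, 0
    latest = None   # index of the frame currently on screen
    next_frame = 0  # first frame not yet considered
    for refresh in range(1, math.ceil(clock / REFRESH_MS) + 1):
        deadline = refresh * REFRESH_MS
        ready = []
        while next_frame < len(finish_times) and finish_times[next_frame] <= deadline:
            ready.append(next_frame)
            next_frame += 1
        if ready:
            latest = ready[-1]
            dropped += len(ready) - 1  # older finished frames never reach the screen
        else:
            repeats += 1               # nothing new, so the old image is resent
        shown.append(latest)
    return shown, repeats, dropped


even = simulate_display([16, 16, 16, 16])
uneven = simulate_display([25, 7, 25, 7])
```

Both runs average 16 ms per frame, so an FPS counter would report the same speed. The even run shows all four frames in order, while the uneven run repeats the on-screen image at two refreshes and never displays two of its four rendered frames.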
So with the Progress Report definitely going to be late, I set off to sort out this mess and try to get some reliable data. I set up a spreadsheet to keep track of statistics and started to keep track of all of the variables in the test. Here are the statistics of a test like the graph above, just to give you an idea of just how bad the situation was.
- Average Frame-time - 12.74623636 ms
- Median Frame-time - 8.478 ms
- Standard Deviation - 12.45519544 ms
The standard deviation is higher than the median. This data is garbage. The frame-times are so bad that I couldn't use this performance data!
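For anyone wanting to run the same sanity check on their own frame-time dumps, here is a minimal Python sketch. The sample numbers are invented, the use of the sample standard deviation is an assumption (the spreadsheet formula isn't stated), and the "noisy" flag simply mirrors the observation above rather than any formal statistical test.

```python
import statistics


def summarize(frame_times_ms):
    """Summarize frame times and flag data too erratic to trust."""
    mean = statistics.mean(frame_times_ms)
    median = statistics.median(frame_times_ms)
    stdev = statistics.stdev(frame_times_ms)  # assumption: sample standard deviation
    return {
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "noisy": stdev > median,  # rule of thumb taken from the text above
    }


spiky = summarize([8, 8, 9, 8, 40, 8, 9, 8])  # hypothetical capture with one big hitch
steady = summarize([16, 17, 16, 17])          # hypothetical well-paced capture
```

A single long hitch is enough to pull the mean well above the median and push the standard deviation past it, which is the same pattern as in the numbers listed above.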
So I also did comparisons on my Windows desktop, just to be sure it was a macOS problem and not an (obvious) Dolphin problem. Here is that data, and how it compares to macOS.
Over the course of several days and LOTS of tests, I experimented with it and tried all kinds of settings, and I found that, in OpenGL at least, it worsens with higher internal resolutions. At 1x native it's still there, and still bad, but it was good enough to get performance data out of at least. But at 4x native it's the messy rubbish you saw in the graphs. Also, if something else in Dolphin is using the GPU at the same time, like bounding box or EFB2RAM, then 4x native would give more consistent frametimes, to the point where I could get usable data. Weird. But this is why the only 4x native performance results in the MoltenVK section above are those scenarios; everything else I tried at 4x native just gave nonsense.
Now I was certainly hoping that MoltenVK would just not have this problem and we'd just trumpet how amazing Vulkan is to everyone as a point toward using it. Buuut, MoltenVK has it too. It's better though! At 4x native, Vulkan on macOS has frametimes comparable to OpenGL at 1x native. Interestingly, Vulkan doesn't really improve when lowering the internal resolution; it just sits at the best result OpenGL can give, but all the time.
So I'd love to end this with me and Stenzek coming up with some AH-HA! moment where we isolated what's going on, but um, that didn't happen. We still don't really know why frametimes are so bad on macOS. We have some guesses though, like, this is prooobably Dolphin fighting with macOS's compositor. In Dolphin on macOS, exclusive fullscreen is not a thing, and the vsync toggle in the graphics UI doesn't actually do anything! So we don't really have any good ways that we know of to work around the compositor and try that theory out. I'm certainly not a macOS developer, or really a developer of any kind (I'm an artist), and Stenzek is a great graphics dev, but isn't a macOS dev. So we don't really have the resources or expertise to figure this out.
Well, if there happens to be someone who knows about macOS frame-pacing and the compositor reading this, we'd love to hear from you! Anyone can contact us through Twitter and IRC. We could really use some help in sorting out this issue!
For our macOS users out there, stick to MoltenVK (at least once the bugs are fixed) to minimize whatever this is. ...or just boot to Windows or Linux. That's really your best bet!