Summer tends to consistently be one of the busiest times for Dolphin's development. While sometimes the question is "what do we put into the Progress Report?", during the summer months it's usually "how much can we fit into the Progress Report?" This summer's congestion was then compounded by us blog staff having a few things we've been planning coming to fruition. Still, the show must go on, and we're here... albeit a bit delayed.
As such, we've got a huge smattering of changes to go over and many smaller ones that we couldn't quite fit in. macOS users in general will be able to rejoice with the addition of a brand new Metal backend brought to us by veteran developer TellowKrinkle. They also brought their graphics expertise to improve things for everyone, greatly reducing the remaining causes of shader based delays/stuttering when using Ubershaders. If you're looking for an easier way to set up a wide variety of controllers, a new SDL2 controller backend has been added for all OSes, and even brings native motion control support without the use of a DSU server to non-Linux operating systems. We also have a wide variety of emulation fixes, more graphics mods added, and the long awaited SD card "folder" feature!
All of that and it's our job to write about it. We've got our work cut out for us.
5.0-16965 - macOS: Add Metal Backend by TellowKrinkle and 5.0-17206 - MoltenVK: Update to v1.1.11 by OatmealDome
Early on in Metal's life, Stenzek experimented with adding a Metal Graphics Backend to Dolphin. Unfortunately, due to Dolphin's quirks and unfamiliarity with the Metal API, the effort stalled out. However, in the midst of that struggle an alternative arrived - MoltenVK. As a translation layer between Vulkan and Metal, MoltenVK allowed us to support Metal through our Vulkan graphics backend, with little effort on our end. This proved to be a very successful compromise, and it may very well have saved our macOS support.
But, in theory, a native Metal backend would be faster than MoltenVK. Enter TellowKrinkle, the architect of PCSX2's macOS port and Metal Graphics Backend. Having conquered Metal on PCSX2, they turned their attention to Dolphin and decided to try their hand at a native Metal backend. The results were immediately apparent.
Our macOS users were extremely excited by these improvements! In GPU heavy titles, such as the GPU focused Skyloft and Elyia tests, the difference was enormous! However, something felt off. Our native Metal backend was outperforming MoltenVK by too wide a margin. Based on our prior experience and general industry knowledge, MoltenVK is an excellent translation layer - a native Metal implementation should only be a little faster. But the new Metal backend was functioning correctly and we weren't going to argue with improved performance, so we cautiously continued forward. When we ran our new Metal graphics backend and MoltenVK through the tests above for this very Progress Report, it finally hit us. The MoltenVK results we were getting, as recorded in the graph above, were far lower than the previous times we've shown these tests. It varied substantially depending on the test (hence the difficulty isolating this), but in some scenarios MoltenVK was performing up to 45% worse than before! Something had gone wrong.
Now with some solid reproducible cases, we went to work bisecting to isolate this nasty performance regression. We traced it to 5.0-15474 - a MoltenVK version update. This was a regression in MoltenVK itself.
Normally the story would end here. We are a tiny volunteer team, we cannot be expected to debug another project! However TellowKrinkle went above and beyond and, using their Metal knowledge, worked with our testers to isolate the regression in MoltenVK. The end of their bisecting was KhronosGroup/MoltenVK@4371ef4, a change that contained a number of optimizations for Vulkan Descriptor Pools.
In Vulkan, descriptor sets are used to tell the graphics driver what resources (textures, buffers, etc) to make available for shaders to use. However, a program cannot create descriptor sets. Instead, the program needs to create a descriptor pool with a specified maximum number of sets, and then the pool itself allocates the descriptor sets.
Most programs implement this with a "growth strategy" where they ask for a small pool of a few thousand sets, then enlarge the pool as required. Dolphin however takes the simple route, and just asks for a pool of 100,000 descriptor sets. Though inefficient, by asking for a pool far larger than anything it can ever possibly fill, Dolphin never has to worry about managing the pool.
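The difference between the two strategies can be sketched with a toy model. This is not Vulkan or Dolphin code: `DescriptorPool`, `GrowingAllocator`, `FixedAllocator`, the 2,048-set starting size, and the doubling rule are all hypothetical names and numbers chosen for illustration, and "enlarging" is modeled here as adding another pool.

```python
class DescriptorPool:
    """Toy stand-in for a descriptor pool with a fixed maximum number of sets."""

    def __init__(self, max_sets):
        self.max_sets = max_sets
        self.allocated = 0

    def allocate(self):
        # Returns the index of the new set, or None when the pool is full.
        if self.allocated == self.max_sets:
            return None
        index = self.allocated
        self.allocated += 1
        return index


class GrowingAllocator:
    """Growth strategy: start small and add a bigger pool whenever one fills up."""

    def __init__(self, initial_sets=2048):  # hypothetical starting size
        self.pools = [DescriptorPool(initial_sets)]

    def allocate(self):
        index = self.pools[-1].allocate()
        if index is None:
            # Current pool is exhausted: create a new one twice as large.
            self.pools.append(DescriptorPool(self.pools[-1].max_sets * 2))
            index = self.pools[-1].allocate()
        return len(self.pools) - 1, index

    def reserved(self):
        return sum(pool.max_sets for pool in self.pools)


class FixedAllocator:
    """The simple route described above: one pool far larger than needed."""

    def __init__(self, max_sets=100_000):
        self.pool = DescriptorPool(max_sets)

    def allocate(self):
        index = self.pool.allocate()
        if index is None:
            raise RuntimeError("fixed pool exhausted")
        return 0, index

    def reserved(self):
        return self.pool.max_sets
```

In this model, 5,000 allocations leave the growing allocator with two pools and 6,144 reserved sets, while the fixed allocator reserves all 100,000 up front but never has to run any growth logic.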
Once Dolphin allocates the tiny fraction of the pool's descriptor sets that it needs, it uses vkResetDescriptorPool to give the used sets back to the pool for the next draw. Importantly, Vulkan resets only the used sets, leaving the rest of the 100,000 sets untouched - Dolphin's approach only makes sense because of this detail.
And that is exactly where MoltenVK's regression occurred. In the optimizations for Vulkan Descriptor Pools, MoltenVK accidentally made vkResetDescriptorPool reset ALL descriptor sets in a pool, regardless of what was actually used. Most programs wouldn't have any problem with this as they keep their pools as small as possible, but Dolphin's 100k pool smashed into this issue. vkResetDescriptorPool was now taking up a significant amount of CPU time, causing many types of GPU heavy titles to become CPU limited within MoltenVK!
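To get a feel for why the pool size matters so much here, consider a rough cost model. It is not MoltenVK's implementation: `ResettablePool`, `total_slots_touched`, and the figure of 50 sets used between resets are assumptions made up for this sketch. It only counts how many pool slots each reset style has to visit.

```python
class ResettablePool:
    """Toy pool that counts how many slots a reset has to visit."""

    def __init__(self, max_sets):
        self.in_use = [False] * max_sets
        self.used = []  # indices handed out since the last reset

    def allocate(self):
        index = len(self.used)
        self.in_use[index] = True
        self.used.append(index)
        return index

    def reset_used(self):
        # Visit only the sets that were actually allocated.
        touched = 0
        for index in self.used:
            self.in_use[index] = False
            touched += 1
        self.used.clear()
        return touched

    def reset_all(self):
        # Visit every slot in the pool, whether it was used or not.
        touched = 0
        for index in range(len(self.in_use)):
            self.in_use[index] = False
            touched += 1
        self.used.clear()
        return touched


def total_slots_touched(max_sets, sets_per_reset, resets, reset_all):
    """Allocate a few sets, reset, repeat; return the total slots visited."""
    pool = ResettablePool(max_sets)
    touched = 0
    for _ in range(resets):
        for _ in range(sets_per_reset):
            pool.allocate()
        touched += pool.reset_all() if reset_all else pool.reset_used()
    return touched
```

With a 100,000-set pool, 50 sets used per reset, and 10 resets, the "used only" version visits 500 slots while the "reset everything" version visits 1,000,000. With a 2,000-set pool the same workload would visit 20,000 slots in the worst case, which is why programs with small pools would barely notice.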
Once we had all of this figured out, TellowKrinkle forwarded our results to MoltenVK. There, the author of the original optimization that caused this regression, billhollings, swooped in. They improved the implementation so it only reset descriptor sets that were actually used, and made a PR within a couple days of our report. Once that was merged into MoltenVK, we quickly tested it to confirm it fixed our issue and OatmealDome made a pull request to update Dolphin to the latest version of MoltenVK. The regression was resolved!
We'll save the full graph for later and focus exclusively on the regression fix for the moment. How much faster is latest master (as of this writing) compared to the version we showed earlier?
While we are focusing on MoltenVK right now, Metal also got a little faster on Apple silicon thanks to an unrelated AArch64 JIT optimization that was merged between our sample versions. Our data is a little messy because developers just keep making Dolphin better, how dare they.
In MoltenVK, the performance regression fix (plus that JITARM optimization) led to noticeable performance gains across the board. But for the GPU heavy titles that were strongly affected by the regression, such as Metroid Prime 3, the performance improvement is enormous! It took Prime 3 from completely unplayable to playable with occasional slowdown. All of the performance that MoltenVK had in older builds was back!
This regression fix also improves MoltenVK's performance in Bounding Box, especially on Intel graphics. However this section is already ginormous so rather than posting yet another graph, just click here if you want to see the chart.
Now with that nasty regression sorted, we can finally return to Metal. How does our shiny new Metal backend compare to MoltenVK now that it's a fair fight?
The theory holds true - our own native Metal graphics backend is faster than translating our Vulkan backend to Metal via MoltenVK. Most notable here are Skyward Sword and Rogue Leader, where Metal has a considerable performance benefit of +23% and +30% respectively. Metal even takes Hoth to a reliably full speed experience for the first time ever on an Apple laptop! That being said, now that it is no longer held back by a performance regression, MoltenVK puts up a hell of a fight.
After reading all of this you might be thinking, "now that someone has made a native Metal backend, you're going to remove MoltenVK right?" Absolutely not, MoltenVK is here to stay. The MoltenVK performance regression was a huge reminder for us on why we have many graphics backends in the first place.
During the regression, many users came to us and complained that it was slower than before. Some even correctly bisected the regression in Dolphin! However, due to the complexity of the issue, no one was able to clearly give us a reproducible case. Without anything to compare it against, we simply looked at MoltenVK's performance in whatever we wanted to try, found it fine, and moved on. However, once we had the Metal backend and compared it to MoltenVK, we immediately noticed the performance disparity was beyond our expectations and began searching for the problem. The Metal backend is why we noticed the performance regression in MoltenVK! Relying on only one graphics backend for macOS (OpenGL doesn't count anymore) let this regression slip under our noses.
Having other graphics backends to compare against is essential for the maintenance of our graphics backends, and this is especially important on macOS where we have so few. The new Metal graphics backend is our own code, so any issues or regressions with it are on us to solve. We will be relying on MoltenVK to be the benchmark that we compare our native Metal backend against going forward. And it should perform that role excellently, as it is bringing our well tested Vulkan backend to macOS through a very well supported translation layer from a team that has earned our trust.
So while our new native Metal backend is faster, MoltenVK is here to stay. Together, they will help us deliver the most consistently reliable and performant experience that we can give to our macOS users.
Note: Due to changed (much improved) methodology, the Fountain of Dreams and New Pork tests are not directly comparable to prior results. All other tests maintain the same methodology.
5.0-16930 - Reduce Pipeline Compilation Stutter through Forcing FBFetch in Ubershaders by TellowKrinkle
Ubershaders was a revolutionary moment in GameCube/Wii emulation. It presented a hope, a solution against the problem that TEV changes were instant while generating/compiling new shaders on modern GPUs was not. Under certain circumstances, Ubershaders can completely remove all Shader Compilation Stuttering... but for some users, their limitations meant that they couldn't get a smooth experience.
On top of being demanding, newer graphics APIs have thrown a curveball at Ubershaders. Namely, they have pipelines to optimize rendering, and these pipelines will analyze what is being rendered and only use what is necessary to simplify shaders and increase performance. Because different GPUs and drivers have hardware and software support for different things, the pipeline compilation stuttering can vary depending on the driver. D3D11, through a mixture of not having pipelines + good driver support, was the only backend that completely had Shader Stuttering removed on certain graphics cards... until now.
TellowKrinkle showed up with the Metal backend and a deep dive into how Dolphin was generating shaders. During the development of the Metal Backend, they identified some of the weaknesses in what Dolphin was doing and started coming up with solutions.
One such problem is that Dolphin actually had too many configurations to precompile every pipeline configuration. But, there was a way to mend that. By sacrificing performance and simplifying the Ubershaders to only use a single method for blending, we could actually precompile every pipeline configuration with some other changes down the road. To achieve this, FrameBuffer Fetch is now used for almost all blend unit configurations in the Ubershaders for GPUs that support this feature. Considering that Apple GPUs and some mobile GPUs support FBFetch, many of our users should be able to see a major difference.
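The reasoning above is essentially a counting argument, which a few lines of arithmetic can illustrate. Every number and dimension name below is a placeholder invented for this sketch, not Dolphin's real pipeline state, and the model simplifies "almost all blend unit configurations" down to a single fixed blend state.

```python
from math import prod


def pipeline_count(dimensions):
    """Number of distinct pipelines if every dimension varies independently."""
    return prod(dimensions.values())


# Hypothetical sizes for each piece of state baked into a pipeline.
before = {
    "ubershader_variants": 8,
    "vertex_layouts": 16,
    "depth_states": 8,
    "blend_states": 64,
}

# With blending done inside the shader via framebuffer fetch, the pipeline's
# blend state no longer varies in this simplified model.
after = dict(before, blend_states=1)

print(pipeline_count(before), "->", pipeline_count(after))
```

Whatever the real sizes are, removing one dimension from the product divides the number of pipelines to precompile by the size of that dimension, which is the trade being made in exchange for some performance.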
This reduces the extra shader compilation stuttering that was happening on Apple and some mobile chipsets and further sets us up to reduce stuttering across all GPUs. TellowKrinkle has more changes on the way that should completely eliminate* Shader Compilation Stuttering on many GPUs that currently have issues in Vulkan and D3D12 when using Exclusive or Hybrid Ubershaders. Stay tuned.
5.0-16861 - ControllerInterface - Add Support for Native Motion/Rumble with SDL2 and Re-Enable SDL on Windows Builds by shuffle2
Back in July 2015, Dolphin waved goodbye to its SDL input backend. Back then, there wasn't much point for Dolphin to even be using it. It didn't really serve our needs on Windows or macOS, and on Linux it was essentially just an evdev wrapper that we could (and eventually did) implement ourselves.
Unfortunately for Dolphin, the vastness of SDL caused a number of problems over the years, and it was slowly removed bit by bit. For example, in 4.0-1628 the SDL backend was removed from Dolphin on Windows due to numerous crash reports related to webcams. SDL was reading the webcam as a controller with thousands of buttons, crashing the emulator.
However, over the last seven years, a lot of resources have been pumped into SDL and it's greatly improved from when we waved goodbye. In 2018 it added a feature that was very interesting for us - Motion Inputs. Dolphin didn't yet have the emulated MotionPlus infrastructure, but we've kept our eye on SDL knowing that it might provide an easy way to get Gyro and Accelerometer while also being a stable cross-platform API that we could rely on much like Cubeb has become for audio.
In 2022, we've finally taken the plunge and re-added support for SDL, relying on the new and improved versions of the controller API.
On Linux and macOS, the situation is similar. This is especially important for macOS, where Motion Controls are a lot more annoying to set up. In fact, Wii Remotes can't easily be connected to any macOS version from Monterey onward due to an unknown bug with Apple's Bluetooth stack that seems to be rather low priority to fix. Some users have found success using external Bluetooth adapters or using a DolphinBar.
This is one of the longest awaited features that has been requested many times over the years. You can now set an SD Card Sync Folder that allows you to place files in that folder and let Dolphin automatically create an appropriately sized SD Card. This setup was inspired by how melonDS handles SD cards and was originally submitted by stblr in March.
Previously, handling SD cards in Dolphin was rather annoying, especially on Windows and Android. On Linux/macOS, it was a bit easier. There, you could mount the sd.raw file that Dolphin uses to simulate the SD card and edit files on it just like you would a real SD card. On Windows, this could only be accomplished by installing a third party program and is generally a bit buggy when trying to unmount virtual drives. On Android, you couldn't edit the sd.raw at all without access to another Operating System!
The new system isn't perfect, but it does simplify things quite a bit. Dolphin still uses the sd.raw file during emulation, but now you can choose a folder that Dolphin can use to build the sd.raw file. No more mounting, no more third party programs, just simply configure your sd card folder how you want it and let Dolphin do the work. Dolphin can also export the current sd.raw into this folder, if you've already been using other methods.
As a potentially added bonus, there is an option to automatically sync this folder at emulation start and end. This will keep your folder up to date if you're using it a lot, but we don't recommend using it on larger SD cards, as this process can take a long time on slower disks and cause a large delay in starting Wii emulation.
Due to the SD card code being rewritten, a few of the defaults have changed as well. Logically, it made no sense that the old SD card default path was inside of the Wii NAND and this gave us a good excuse to finally fix it. While this update won't override existing paths, all new users will have SD card paths moved into the "Load" folder along with other things like texture packs. In new builds, the default sd.raw name has also been changed to WiiSD.raw.
For those of you on Android, there is some good news and bad news. The good news is that this feature has been ported over by JosJuice in 5.0-16970. It works the same as in desktop builds with one exception, which is the bad news. Due to Scoped Storage, we cannot let you customize the SD folder location and accessing Dolphin's app data locations may be annoying. On most Android devices, the easiest way to do this would be by connecting it to a desktop and manually accessing the displayed directories for your device.
As a reminder on what this is, the GameCube and Wii use a unified memory model, where the CPU and GPU can edit each other's memory at any point with no performance cost. However, Dolphin runs on split memory model systems where the CPU and GPU cannot access each other's memory easily. To emulate the unified memory model, Dolphin duplicates the GC/Wii Main Memory on both the host's system memory and the host's GPU VRAM, and we sync any changes across both memory pools. This syncing is very costly, and it is why games that require Store EFB Copies to Texture and RAM are so demanding. Deferred EFB Copies is an optimization to this process. Rather than syncing every single time the game changes something in memory, Deferred EFB Copies delays EFB Copies to the host system memory so it can batch many EFB Copies to RAM together for a sizable performance boost.
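The batching idea can be shown with a small model. This is only a sketch of the general technique, not Dolphin's implementation: `EfbCopyQueue`, `run_copies`, and the addresses used are hypothetical, and a "sync" here is just a counter standing in for an expensive GPU to CPU round trip.

```python
class EfbCopyQueue:
    """Toy model: copies reach emulated RAM either one by one or in a batch."""

    def __init__(self, deferred):
        self.deferred = deferred
        self.pending = []    # copies recorded but not yet written to RAM
        self.sync_count = 0  # how many expensive GPU -> CPU syncs happened
        self.ram = {}        # stand-in for emulated main memory

    def copy_to_ram(self, address, data):
        self.pending.append((address, data))
        if not self.deferred:
            self.flush()  # immediate mode pays for a sync on every copy

    def flush(self):
        if not self.pending:
            return
        self.sync_count += 1  # one sync covers the whole batch
        for address, data in self.pending:
            self.ram[address] = data
        self.pending.clear()


def run_copies(deferred, copies=8):
    """Issue a burst of copies, then flush once before the CPU reads RAM."""
    queue = EfbCopyQueue(deferred)
    for i in range(copies):
        queue.copy_to_ram(0x1000 + i * 0x20, f"copy {i}")
    queue.flush()
    return queue
```

In this model a burst of eight copies costs eight syncs in immediate mode and one sync in deferred mode, while the final contents of emulated RAM are identical.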
Unfortunately, an oversight in the implementation limited the performance boost this option could give.
When using Deferred EFB Copies, there was a system set up to not flush command buffers if multiple EFB copies were issued in succession, but it only ever checked whether it needed to flush when an EFB copy was issued. TellowKrinkle noticed that if a game is using a lot of EFB Copies in quick succession, Dolphin doesn't check to see if they need to be flushed during this burst of EFB copies. Not flushing for each EFB copy and then flushing at the end would normally be faster, but Dolphin was waiting far too long after the burst of EFB copies to actually start checking if things needed to be flushed, causing a sizeable drop in performance when the flush and GPU <-> CPU synchronization finally happened.
This is a rather particular scenario that shouldn't be that common, but one of the games that does hit this issue is rather demanding, making this a big optimization: Metroid Prime 3.
In a finale to the crazy EA Sports Active pink tinting issues, Pokechu22 has made the necessary changes to fix the remaining imperfections with color handling. While it's not as dramatic as the original issue, some of the colors of various objects were still wrong in Dolphin due to differences in rounding and some weird overflow behaviors that were left unemulated.
With this change plus Manual Texture Sampling, EA Sports Active now renders perfectly in Dolphin. Quite a difference from just a year ago.
5.0-16838 - Add HLE Broadband Device by schthack and many additional fixes by sepalani
The GameCube Broadband Adapter (BBA) is another of those "What Ifs" of potential that hit the GameCube. Much like GBA connectivity there was a lot of potential for great things with the GameCube BBA, but very few games truly took advantage of it. In fact, there are only six total games that can use the Broadband Adapter for gameplay.
- Phantasy Star Online I and II (and I and II+)
- Phantasy Star Online III: Card Revolution
- Mario Kart: Double Dash!!
- 1080° Snowboarding Avalanche
- Kirby Air Ride
- Homeland (JP Exclusive)
Not all of these were created equal, however. Only the two Phantasy Star Online games and Homeland supported true online play. All the others only supported local play via LAN (local area network). Much like GBA connectivity, the feature saw limited use simply because of the expenses and setup required. For an eight kart game of Mario Kart: Double Dash!! with everyone getting their own T.V. you'd need:
- Eight GameCubes
- Eight T.V.s
- Eight Controllers (16 if you want two players per kart!)
- Eight Copies of Mario Kart: Double Dash!!
- Eight BBAs
- Networking cables and ethernet switches capable of handling this mess
Unlike Mario Kart DS, where people could just pull their own DS out of their pocket and be in the game in minutes, setting up for GameCube LAN games was a huge endeavor. For games like Kirby Air Ride it was hard to justify the cost and setup when the split-screen option afforded identical features with the only downside being less screen real-estate and your opponents being able to see your screen.
Emulation didn't really make things easier or better. If you were willing to go through the daunting setup of creating a TAP adapter on your computer, it was possible to link emulators together. But due to Windows doing Windows things, this was mostly limited to running all of the instances on the same computer. This mostly defeats the point of BBA entirely! On Linux it was a lot easier, but it was still a lot more trouble than it was worth to get working.
This is where schthack comes in. For those that have been in the GameCube community for a long time, that name may seem familiar because of schtserv, schthack's Phantasy Star Online I and II replacement server that's been keeping the game's online service alive for many, many years.
While Dolphin technically could connect to schtserv, it was a less than ideal setup due to how difficult it was to set up on Windows and how fragile it was. While the XLink Kai BBA option avoided the need for a TAP adapter entirely, it required its own service through XLink Kai and is primarily used for tunneling LAN games through the internet, like Project Warp Pipe once did during the early days of the GameCube.
Dolphin's BBA emulation was mostly low level (LLE) and functioned very similarly to how it worked on console. However, because of how exactly it behaved, it did not work on Windows without a TAP adapter, and didn't work on Android at all. On Unix based operating systems, you'd still have to go through setup with OpenVPN and virtual interfaces, making it quite a process regardless of your choice. In order to make things easier, Dolphin would have to translate what the BBA was doing to something that would immediately work on the host device's network. What we needed was a high level (HLE) solution, and schthack wanted to make it happen.
Getting it working was one thing, but actually cleaning up Dolphin's ancient BBA code was another. In fact, Dolphin's LLE BBA implementation originated from documentation from Whinecube, a rival GameCube emulator that ceased development back in 2006. The documentation (and implementation based off of it) were rock solid enough that it's remained almost untouched since it was implemented. Unfortunately for schthack, that also meant that some of the code needed to be modernized for modern Dolphin. Thankfully, sepalani, another network engineer, was available to help harden the networking code. The initial results were astounding.
That isn't to say things were perfect - getting all of the Dolphin instances to detect each other can be a bit annoying, especially on Windows. Cause Windows things. Thankfully, sepalani has continued to improve the HLE Broadband Adapter with a myriad of fixes. While there are occasionally still detection issues, especially when dealing with real hardware, usually any combination of Dolphin, GameCubes, and Wiis (running BBA emulation through Nintendont) can be connected together with enough dedication.
As a note, there are some limitations with one of the games that has nothing to do with the networking code. 1080° Avalanche is not compatible when playing against physical hardware. There is some kind of miscalculation in the physics that causes Dolphin instances to desync, ruining the session. While this bug has not been fixed as of this beta, there was originally a second bug involving AArch64 devices, causing them to desync with x86-64 devices! JosJuice quickly quelled that bug, meaning that theoretically even Android users could join in on the BBA fun... if they had BBA...
5.0-16950 and 5.0-16967 - Android: Add XLinkKai Broadband Adapter and HLE Broadband Adapter Options to GUI by codedwrench and JosJuice
If you didn't catch it, Dolphin on Android now has access to GameCube Broadband Adapters! Thanks to work by codedwrench to add the XLink Kai BBA option and port everything over to the Android GUI, users can now easily access the settings. JosJuice took that framework and extended it to support the HLE Broadband device.
This means that GameCube LAN play is now in the palm of your hands. A group of friends (with exceedingly powerful phones/tablets) can now just make sure they're on the same network, select a BBA option, and instantly be able to play any of the LAN supported games against each other.
Sure, it's limited to just Mario Kart: Double Dash!!, 1080° Avalanche, and Kirby Air Ride, but that's fun in and of itself. If you can't gather everyone together but live nearby, setting up XLink Kai might be able to get you playing anyway. Note that these games have a fairly low latency tolerance, so we do recommend being on the same network if possible.
If you're looking to get your fix on for Phantasy Star Online I and II, the Android HLE BBA works the same way as the desktop HLE BBA.
If you use a lot of homebrew in Dolphin, you're probably someone who commonly swaps over to LLE audio. This is because a lot of Homebrew use special homebrew DSP microcodes called libasnd and libaesnd. Dolphin's DSP-HLE had no way to handle these microcodes!
Feeling a bit of deja vu? Same here. This is the same change as last Report except this time it's tackling libaesnd instead of libasnd along with a few more edge cases that cropped up over the years.
With this, most major homebrew should be supported in DSP-HLE. There may be exceptions, and some homebrew may opt to create their own microcodes. In those edge-cases, DSP-LLE will always be there to cover the holes.
For preservationists and those users who have bought Wii titles from the Wii U eShop, this is a rather important update. JosJuice has added direct support for the NFS file format that Wii games are distributed in on the Wii U eShop. Given that the Wii U eShop is going to end the ability to purchase new titles, the days in which you can still purchase these digital copies are numbered. For a full list of games available via the Wii U eShop, please check out this link.
While this list is fairly limited, given the price spikes of physical copies, this may be the cheapest way to get some of your favorite Wii games until the shop goes down.
As for the NFS format? There really isn't much interesting about it. It's a lossy format that removes garbage data. Adding support for it was not very difficult, and was mainly done for users wanting to test/use Wii U eShop dumps more easily.