As we hit the holiday season, our Progress Report might be considered a bit late. A two-month report became a three-month report as we realized just how much work we had to catch up on. While the usual summer burst of activity didn't come, it seems instead everyone poured their time in throughout the autumn months! There are so many features, performance improvements, quality of life updates, and more that had to be considered.
We're going to have to skip out on some of the smaller updates this time around because there are so many big hitters. For instance, if you hate shader stuttering, Dolphin's Ubershaders have gotten a new tool that helps smooth out issues on Vulkan, D3D12, and Metal thanks to Dynamic Vertex Loaders that help reduce/remove pipeline compiles during gameplay.
If you're on a weaker device that stays away from Ubershaders... maybe after these optimizations you might finally be able to make the leap. Raw performance in Dolphin is up across the board thanks to many optimizations to the GPU emulation thread (which is emulated on CPU). Because this optimization affects the very core of Dolphin, pretty much every game should be faster, with a few select games seeing improvements of roughly 50%!
If you're looking to play with friends, we have some good news on that front as well. Dolphin's "experimental" Wii Remote Netplay support has finally received some much needed attention that may help it break free of that experimental moniker in the coming months.
And, for our Android users, a lot of the performance improvements also affect tablets and phones, but we have a special treat only for you. The Android GUI has seen a huge overhaul that should make it easier to use and easier on the eyes. And for those having problems with particular games using features Dolphin can't reasonably emulate, we have a few presents from an old friend to patch them up.
We could go on and on, but you know what time it is. Please enjoy these Notable Changes!
And after all of the changes and features we mentioned in the intro, we neglected to mention our headline change for this report. It's almost fitting - WiiConnect24 came and went without leaving much of a mark in the gaming world. It was touted as a new way to be connected to the internet 24/7, with access to up to date information from the comfort of your couch. You could see the weather, the latest news, receive messages (with pictures!) and more through the WiiConnect24 service. Even if your console wasn't on, the disc tray would glow blue if you have a new message, a Wii Update, or something else awaiting you for the next time you booted things up.
By 2013, the unending march of technology had already consumed much of the usefulness of the service, and everyone expected it to slowly fade away. But to our surprise, on 13 April 2013 Nintendo announced the closure of WiiConnect24 in late June, only about eleven weeks away. At the time Nintendo Wii Wi-Fi Connectivity Support was in its infancy. Our mastery of NAND and IOS features was simply not good enough to support WiiConnect24, and many of its channels didn't even run in Dolphin at the time. When WiiConnect24 disappeared, we were years away from being able to support it. As much as it pained us, we were unable to assist in the efforts to preserve WiiConnect24.
However, there was another option for testing and preserving WiiConnect24 - the actual Wii. From the early days after WiiConnect24's closure, preservationists and reverse engineers have been chipping away at reviving many of the channels and features seemingly lost. For international/default channels, RiiConnect24 offers good support across a lot of channels. For players looking for some Japanese exclusive channels, WiiLink has been slowly building support for these obscure channels that offer up some neat experiences that never saw the light outside of Japan.
Some efforts have been made by the RiiConnect24 developers to bring their service to Dolphin, but these have been a bit hacky. They had workarounds to get certain channels working within Dolphin, though these required a lot of setup in order to prepare the VFFs (Virtual FAT Filesystems) that the channels used to store WiiConnect24 data. The result was not user friendly: it was difficult to set up, had limited functionality, and could easily break at any time.
Sketch, WiiLink project lead and RiiConnect24 contributor, realized that direct support in Dolphin would be a boon to both projects. It would open up their replacement services to more users and also make it easier to debug issues. But of course, it was easier imagined than executed. Sketch identified three major challenges that blocked WiiConnect24 emulation: support for VFF files, implementing some of the missing IOS functions, and figuring out the many fields of nwc24dl.bin and what they do.
Figuring out nwc24dl.bin was a reverse engineering job. We mentioned that WiiConnect24 titles could download updates while the Wii was in standby mode. How often the Wii checked for updates, among other important things, is stored in nwc24dl.bin. Emulating the standby feature is thankfully unnecessary, as WiiConnect24 channels also check for updates while the actual Wii is running and when you load them. Because Dolphin doesn't support the Wii's background scheduler, the only check it could possibly support is on channel boot.
Once the many fields of nwc24dl.bin were reverse engineered and understood so it could be HLE'd, the next thing that needed to be handled was actually downloading the data needed for these channels. To do this, an IOS function called DownloadNowEX needed to be supported. This is a download handler that can download anywhere from 1 to 32 files. If it is downloading more than one file in a single task, these files are called "subtasks" and there is a flag set in memory saying this download is a subtask and how many files it needs to download. Many of the channels take advantage of the subtask functionality, and the News Channel will download 24 subtask files when it updates!
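To make the task/subtask relationship concrete, here is a minimal Python sketch. It only models what is described above (1 to 32 files per task, and more than one file meaning subtasks). The class name, field names, and the numbered-suffix naming scheme are all hypothetical and do not reflect the real nwc24dl.bin layout or Dolphin's code.

```python
from dataclasses import dataclass
from typing import List

MAX_FILES_PER_TASK = 32  # upper bound quoted in the report


@dataclass
class DownloadTask:
    """Hypothetical stand-in for one download entry (not the real layout)."""
    base_name: str
    file_count: int = 1

    def __post_init__(self) -> None:
        if not 1 <= self.file_count <= MAX_FILES_PER_TASK:
            raise ValueError("a task downloads between 1 and 32 files")

    @property
    def uses_subtasks(self) -> bool:
        # More than one file in a single task means the files are subtasks.
        return self.file_count > 1

    def file_names(self) -> List[str]:
        # Assumed naming scheme, purely for illustration.
        if not self.uses_subtasks:
            return [self.base_name]
        return [f"{self.base_name}.{i:02d}" for i in range(self.file_count)]
```

Under these assumptions, a News Channel style update would be `DownloadTask("news", 24)`, which reports `uses_subtasks` as true and yields 24 file names.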
These files that are downloaded now need to be packed into VFF files for use by the channels. Thankfully, Dolphin already has to handle the FAT filesystem thanks to SD cards and whatnot, so Sketch could get a jump start on support by copying code from there. But this wasn't an easy task, and actually supporting the VFF files required a little bit of finessing. The headers on the files were mostly undocumented and needed some digging into. However, once that was taken care of, Sketch streamlined the solution with AdmiralCurtiss's help.
Is the situation perfect? Absolutely not. Like we said, Dolphin doesn't support background updates, so on their very first boot most of these channels will error out. After that they should work normally as long as the NAND stays intact. The other thing to note is that Dolphin currently doesn't have a way to redirect WiiConnect24 requests. That means users will need to patch the channels to seek out the third party servers.
If you're into weird era specific features, connecting to these services is absolutely worth the time. Just to give you a taste of what you can find, here is a clip from Nintendo Week 7 Dec 2009, captured from Dolphin using RiiConnect24.
If you want to see more great videos like this, patch your Nintendo Channel today and check out the Nintendo Channel on RiiConnect24 with either your Wii or the latest builds of Dolphin!
Note: RiiConnect24 and WiiLink are third party services much like Wiimmfi. You should make sure that you trust any third party service before connecting to it via any emulator. However, both RiiConnect24 and WiiLink are open source, and were tested by developers during the development/testing of WiiConnect24 changes.
As we said in the intro, Wii Remote Netplay has been a frustrating, yet tantalizing, feature since its inception. The Wii has a ton of multiplayer games that either didn't support Wi-Fi, or had substandard Wi-Fi support leading to couch multiplayer being the superior choice. Dolphin has netplay that lets you take couch multiplayer online. Everyone wins, right? There's one problem: Wii Remotes.
Using Wii Remote Netplay was a test of frustration for all but the most hardened of users. A ton of pitfalls awaited anyone who dared try to use it.
- Wii Remote Netplay did not support automatic port assignment, which was completely the opposite of GameCube Controller Netplay. This meant each player had to configure the actual Wii Remote slot's profile to match what slot they were assigned to on netplay. To see how automatic port assignment works in Netplay now, please check out the netplay guide.
- Wii Remote Netplay would crash on close due to a previously undetermined bug with Safe Shutdown on netplay. Some Wii games flush saves during "Safe Shutdown" so we couldn't just outright disable this feature on netplay.
- Wii Netplay has to export saves at the end of a Netplay Session if a user wants to permanently save them. But, because Wii Remotes would usually cause Wii Netplay to crash during safe shutdown, this meant that saves would get lost in limbo. While it was technically possible to restore them, it required expert knowledge and manual reconstruction of the NAND.
- If a Wii Remote disconnected during a netplay session, for any reason, Dolphin would crash. This is despite there being code designed to handle this exact situation. Unfortunately, this meant that games like Donkey Kong Country Returns and Dokapon Kingdom couldn't be played on netplay, as these games will disconnect various Wii Remotes during the menus.
- The Real Wii Remote setting will never work on Netplay. This is less of a problem nowadays thanks to the innovation of connecting physical Wii Remotes as emulated controllers and the "Wii Remote" controller profile. Some problems solve themselves.
These limitations and a confusing UI for setting up Wii Remotes on netplay meant that only extremely well researched users could actually use it. In order to fix all of these problems, AdmiralCurtiss decided to rethink how Dolphin would communicate Wii Remotes over netplay.
If it wasn't obvious from the list of problems, Dolphin's netplay was not designed for the complexity of the Wii Remote. Still, it was implemented because we figured someone would want to use it, even if it was a pain in the ass to set up and use. Unfortunately, it was hard to recommend to anyone but the most desperate of players because of all the hoops you had to jump through to get there.
The initial implementation from RachelBryk was an experimental solution to enable functionality. It was meant more for experimentation and testing than general users, but once they saw it could work, users obviously wanted to use it. It was hard to provide help for the feature because it was so temperamental that most developers couldn't tell what was wrong. It got to the point where Wii Remote Netplay was one of the few features outright removed for the Dolphin 5.0 release in order to prevent users from thinking it was a fully working stable feature.
Dolphin's Netplay has evolved over the years. The advent of the desync checker revolutionized netplay, and allowed users to immediately know if something was wrong on boot. Other features, like the "Blank NAND" further stabilized Wii Netplay in general. Wii Netplay was on the rise, but one problem remained constant: The Wii Remote.
The problem with fixing this is that the Wii Remotes are so complicated to handle. They communicate over Bluetooth at a high polling rate of 200 Hz, have their own logic for attachments, come equipped with accelerometers and (optionally) a gyroscope, carry a speaker which can play 8-bit PCM audio, and even come with an infrared camera with a resolution of 128x96. While Dolphin doesn't enable it on netplay, Wii Remotes even have a small bit of EEPROM for storing Miis on them! Wii Remotes are weird. But worst of all, the Wii Remote's reporting mode can change at any time, and the sessions have to be perfectly synced in order to prevent a deadlock from the wrong inputs being received on one or more Wii instances.
Sometimes in order to make things work, you need to start fresh. In order to start the healing, AdmiralCurtiss did the most logical thing and immediately threw out most of the existing Wii Remote handling code used for Netplay. At this point, we knew the problems with Wii Remote Netplay and could now try again to write a solution that avoided these issues.
The big one is that when a Wii Remote talks to the Wii, it does this with input reports - small chunks of data sent over the Bluetooth connection that contain a partial Wii Remote state. These reports are what was sent over the network during Wii Remote Netplay. The problem is that the game can configure at any time what kind of report it wants the Wii Remote to send. This means if there's any kind of problem on netplay on any of the clients and the reports don't match what the game is expecting, Dolphin would lock up. In the best case, netplay would close but usually the entire emulator would crash.
Because of how sensitive this is, AdmiralCurtiss decided to take a different route. Instead of sending just what the game is requesting, all players will send all of the Wii Remote state across netplay and let each Dolphin instance translate those into an input report. This makes Wii Remote netplay more resilient to crashes over minor configuration problems and also opens up a ton of new features. Since each Dolphin instance now builds the input report itself from the full Wii Remote state, players on netplay can do things like change attachments during netplay, and Dolphin can handle events where the game might forcibly disconnect a Wii Remote.
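As a rough illustration of the idea, the Python sketch below sends one full state object and lets the receiving side build whichever report the local game currently wants. The reporting mode IDs 0x30, 0x31, and 0x32 are the commonly documented Wii Remote modes, but the byte layout here is deliberately simplified (real reports pack extra accelerometer bits into the button bytes). The class and function names are made up for this example and are not Dolphin's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WiimoteState:
    """Full controller state every player sends (illustrative fields only)."""
    buttons: int               # 16-bit button mask
    accel: tuple               # (x, y, z), one byte each
    extension: bytes = b""     # attachment data, empty when nothing is plugged in


def build_report(state: WiimoteState, mode: int) -> bytes:
    """Translate the shared state into the report the local game asked for."""
    core = state.buttons.to_bytes(2, "big")
    if mode == 0x30:           # core buttons only
        return bytes([mode]) + core
    if mode == 0x31:           # core buttons + accelerometer
        return bytes([mode]) + core + bytes(state.accel)
    if mode == 0x32:           # core buttons + 8 extension bytes
        ext = state.extension[:8].ljust(8, b"\x00")
        return bytes([mode]) + core + ext
    raise ValueError(f"unhandled reporting mode {mode:#x}")
```

Because the network payload no longer depends on the reporting mode, a client whose game switches modes mid-session can still build a valid report from the same state instead of receiving bytes in the wrong shape.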
AdmiralCurtiss also standardized Wii Remotes to use the same conventions as GameCube netplay. This means Dolphin's automatic port assignment now works the same across all controller types. Best of all, Dolphin no longer crashes when you attempt to end a Wii Remote Netplay session.
That's not to say everything is perfect. AdmiralCurtiss mostly focused on getting things working with this rewrite, but there are still some missing features. "Golf Mode" and "Host Input Authority" should be able to work under the new implementation, but weren't done alongside the first wave of changes. As well, SD cards are not synced on netplay, so it's usually best to unplug them or make sure they are manually synced between players if SD cards are absolutely necessary.
With these changes, we're hoping that Wii Remote Netplay will finally be a feature that our general users can set up and enjoy.
We're going to be talking a lot about Vulkan coming up as there were some renovations done throughout the backend. A consequence of these renovations is that Dolphin now uses Vulkan Memory Allocator (used by many open source projects to efficiently allocate memory, including other emulators) to handle memory management with Vulkan.
This is a definite improvement, but it didn't exactly fix anything. The problem was that other changes to the Vulkan backend left us with some instability problems and we thought that Vulkan Memory Allocator would fix the issue and most likely improve performance. It didn't fix the issue and didn't have an immediate impact on performance, but it still optimizes how we're handling memory in Vulkan. VMA is a definite improvement and we're happy to join the ranks of projects using the library.
5.0-17499 - Vulkan: Raise number of command buffers and 5.0-17620 - Vulkan: Allocate descriptor pools as needed by K0bin
Now we're going to get into why VMA was thought to be needed. Namely, Dolphin's usage of Vulkan was pretty primitive in some ways. The emulator was only using two command buffers and a single descriptor pool of 100,000 descriptor sets every frame. These choices weren't exactly efficient, but they weren't exactly causing problems either. Except on Android.
Our eternal battle with GPU drivers on Android has an ebb and flow. While we love to blame the drivers for their problems, there's always the other side of things where Dolphin does something stupid. This time, we were the ones being grossly inefficient.
K0bin found out that Adreno GPUs were forced to stall when running Dolphin because they were waiting on an available command buffer, particularly in games that need readbacks (EFB/XFB2RAM titles). By increasing the number of command buffers to eight, K0bin reduced GPU stalling and made it so the mobile GPUs could work more efficiently. This doesn't mean your maximum performance at 1x Internal Resolution will go up, but this might mean you'll be able to play at a higher internal resolution before you become GPU limited!
This is a major optimization and we wanted to show some fancy graphs showing how much better things were running. Unfortunately, Android said no. Even though we reduced GPU usage by up to 30%, due to performance governors on the phone clocking things down, we were unable to see a difference in framerate on our devices. While sometimes the CPU would clock up, the numbers varied so greatly even within the same build, that we no longer feel confident in giving performance numbers on Android.
The good news? This brings Vulkan's performance roughly in line with OpenGL on Android under Adreno devices. While we can't guarantee the drivers play nice in every game because performance testing has been a nightmare, we were able to confirm this fixes NES VC games being abnormally slow on Vulkan + Adreno.
Unfortunately, this optimization had some unintended effects that reverberated through the Vulkan backend, especially for our macOS users through MoltenVK. Players were reporting new and exciting crashes throughout tons of games, forcing K0bin into figuring out what was going wrong.
Thankfully, the issue became apparent relatively fast. The new code wasn't wrong but other parts of the backend relied on how things worked before. Previously, if Dolphin ran out of descriptors, it would submit a command buffer expecting that to get it a fresh descriptor pool. With the new changes, submitting a command buffer would no longer get a fresh pool, and Dolphin would find itself still out of descriptors, at which point it would skip the draw.
In order to prevent the crashes, graphical issues, and further reduce memory overhead, K0bin changed Dolphin to now allocate descriptor pools as needed. This technically could positively affect performance, but we weren't able to measure any difference in practice.
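The difference between "one fixed pool" and "pools as needed" can be sketched in a few lines of Python. This is only a model of the allocation strategy described above, not Vulkan API code, and the class names and pool sizes are invented for the example.

```python
class DescriptorPool:
    """Stand-in for a descriptor pool with a fixed number of sets."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0

    def try_allocate(self) -> bool:
        if self.used == self.capacity:
            return False          # pool exhausted
        self.used += 1
        return True


class DescriptorAllocator:
    """Grows by adding pools instead of skipping the draw when one runs dry."""

    def __init__(self, sets_per_pool: int) -> None:
        self.sets_per_pool = sets_per_pool
        self.pools = [DescriptorPool(sets_per_pool)]

    def allocate(self) -> int:
        # Returns the index of the pool that served the request.
        if not self.pools[-1].try_allocate():
            self.pools.append(DescriptorPool(self.sets_per_pool))
            self.pools[-1].try_allocate()
        return len(self.pools) - 1

    def reset(self) -> None:
        # Called once the GPU has finished with the sets (assumed lifecycle).
        for pool in self.pools:
            pool.used = 0
```

With small pools, memory is only committed when a frame actually needs more descriptor sets, which is the overhead reduction mentioned above; an exhausted pool leads to a new pool rather than a skipped draw.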
New hardware always brings new opportunities. But that new hardware doesn't need to be something that pushes the boundaries of performance. In fact, one of the most exciting new devices this year is made up of decent, but unremarkable hardware in an interesting form factor. Of course, we're talking about the Steam Deck. This is an exciting target for Dolphin because it's got a respectable CPU and real GPU drivers paired with a mobile gaming form factor. In theory, it should be great for Dolphin.
However, the early days of testing have been a somewhat rocky experience. Using Dolphin's desktop GUI on the Steam Deck isn't fun and takes a lot longer than you would like. Setting up controls is frustrating and actually getting Dolphin to compile on the Steam Deck is a nightmare unless you're fairly well versed in "Linux-fu". Thankfully, there is an easily accessible flatpak version (not maintained by Dolphin Emulator Staff) that keeps up with our beta builds, but it has a few problems not present when you compile yourself.
Thankfully, the situation improves a lot once you get past the setup stage. Light-to-midweight games run very well, and if you need a bit more juice you can always lock the GPU at a higher clockrate in exchange for lower battery life. But the hardware was just missing full speed on some mainstream titles. One game in particular was Super Mario Galaxy, which hovered around 80 to 90% speed in several strenuous areas.
A slight optimization wouldn't be enough to make one of the Wii's premiere 3D platformers fullspeed on the Steam Deck, but it was close enough to at least give it a shot. Having observed the performance problems first hand, K0bin downclocked their desktop and decided to profile things there. While a 20% jump in performance was a lot to ask for, they figured they could look for some low-hanging fruit and at least bring it closer.
Things took a twist when they discovered that the function GetVertexSize was using a lot of CPU time on the GPU emulation thread. This function should only need to come up once per vertex format and isn't anything that you'd expect to see in a flame graph. Surely there was a reason, right? Well, they started optimizing it and immediately got positive results. In fact, just optimizing this alone was able to push performance up on a downclocked R9 5900 roughly 30%! That's a huge jump, and one that had to be seen to be believed.
Testing it on the Steam Deck didn't provide as huge of a jump everywhere, but it was just enough to push it to nearly full speed in most demanding areas. But then the question became how we missed such an obvious problem.
git blame revealed that this is a performance regression from just under 2 years ago during the fifoplayer quality of life updates. We didn't notice the performance regression because other optimizations were masking it, but an erroneous duplicate GetVertexSize was added which was causing it to be called for every single primitive. Suddenly something that should only take a little CPU time was taking a lot.
Finding this issue inspired K0bin to continue to optimize that part of the code. They discovered that we could also use the cached VertexSize instead of having to use GetVertexSize again in some cases. This gave another jolt to performance.
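The shape of the regression and the fix can be shown with a toy Python model: computing the vertex size for every primitive versus computing it once per vertex format and reusing the cached value. The names below only echo the terms used above; the "format" is just a tuple of attribute byte sizes invented for this sketch, and none of this is Dolphin's actual vertex loader code.

```python
class VertexLoaderModel:
    """Toy model that counts how often the expensive size walk runs."""

    def __init__(self) -> None:
        self.size_computations = 0
        self._cached_size = {}

    def get_vertex_size(self, fmt: tuple) -> int:
        # Expensive path: walks every attribute in the format.
        self.size_computations += 1
        return sum(fmt)

    def cached_vertex_size(self, fmt: tuple) -> int:
        # Cheap path: compute once per format, then reuse.
        if fmt not in self._cached_size:
            self._cached_size[fmt] = self.get_vertex_size(fmt)
        return self._cached_size[fmt]


def bytes_for_primitives(loader, primitives, use_cache: bool) -> int:
    total = 0
    for fmt, vertex_count in primitives:
        if use_cache:
            size = loader.cached_vertex_size(fmt)
        else:
            size = loader.get_vertex_size(fmt)
        total += size * vertex_count
    return total
```

Feeding the same list of a thousand primitives through both paths gives identical byte totals, but the uncached path runs the size walk a thousand times while the cached path runs it once per distinct format.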
The fix plus the optimization put together more than did the trick. We tested the Steam Deck after painfully compiling a build with the optimizations and played through a good bit of Super Mario Galaxy (and a little of Super Mario Galaxy 2) at full speed and 2x Internal Resolution without any noticeable slowdown. In fact, you can even throw on Hybrid Ubershaders and get a pretty smooth experience on par with a mid-range desktop!
So, at this point you're probably wondering if this optimization carries over into other games. After all, pretty much every game is going to be using primitives, right?
The answer is yes, but the numbers aren't usually this ridiculous. On high-end gaming PCs using Dolphin's default settings, the Super Mario Galaxy games saw an improvement of around 30% when running at 4x Internal Resolution. High polygon games that lack other bottlenecks will improve similarly, but games that have different performance profiles may not see as big of a benefit. If a game is really difficult to run with tons of different bottlenecks, it will see very little gain. Rogue Squadron 3 gains less than 1.5% performance in most areas.
Most normal games will see an increase in performance ranging from around 5% to 10%, with some anomalies like Super Mario Galaxy seeing larger gains.
The other major bottleneck in the Super Mario Galaxy games comes from their use of EFB Peeks. In essence, an EFB Peek is when the GameCube/Wii CPU reads pixels of the Embedded FrameBuffer (EFB). By peeking into it the game can learn information about one or more pixels, such as their depth, color, etc.
Super Mario Galaxy 1 and 2 use EFB Peeks for two things, and only one of them is necessary for gameplay. The first thing is a depth check for pixels on screen. This is how the game is able to tell where your cursor is in relation to 3D space. It's used to tell where to shoot starbits and can be used to see if you're trying to grab objects like Pull Stars.
The other use doesn't affect gameplay but can hurt performance just as much. This check is only active while a "sun" is on screen. Even when the sun is within the field of view of the camera, that doesn't mean it's actually visible. There might be a wall, planet, ground, or something else occluding (blocking) the view. That's where EFB Peeks come in - the game uses EFB peeks whenever a sun is in front of the camera to make sure it is actually visible! This is why many users complain of slowdown on weak devices only when a sun is on screen!
This technique is an easy and effectively free way to sample the screen for occlusion on the GameCube and Wii. Tons of games use this technique (or a variation on it) for rendering lens flares only when the source light is visible. For example, The Legend of Zelda: The Wind Waker does the same check but with inverted conditions - it isn't checking if the sun is visible but if it is blocked. However, not every developer got the memo on this simple lesson. See Rune Factory: Tides of Destiny.
While this technique is more-or-less free on the GameCube, it is an absolute pain to emulate. We've covered something similar in the previous Progress Report so this explanation will be a little familiar.
The GameCube and Wii use a unified memory model, where the CPU and GPU can read or edit each other's memory at any point with no performance cost. However, Dolphin runs on split memory model systems where the CPU and GPU cannot access each other's memory easily. To emulate the unified memory model, Dolphin duplicates the GC/Wii Main Memory on both the host's system memory and the host's GPU VRAM, and we sync any changes across both memory pools. Any time a sync is issued, we have to stop all work until the sync is complete, so the host hardware will just sit and wait while the data is moving through the system. At the speed computers operate, moving data from system RAM to GPU VRAM takes ages, so all that waiting causes a sizable performance hit.
Specifically with EFB Peeks, we have to stall the GPU and synchronize the memory pools so the CPU can read the rendered frame.
Fortunately, desktop GPU drivers have many optimizations to minimize the impact of stalls, and modern desktops are so powerful they can usually just eat the performance hit while keeping emulation far above full speed. The only time that EFB Peeks become a problem for Dolphin is in very extreme cases where they are combined with other shenanigans (EFB Pokes, Store EFB Copies to Texture and RAM, etc.) or weaker hardware with iffy drivers and stingy governors.
Whether you're on Adreno, Mali, or PowerVR, a game using EFB Peeks, Pokes, or one that requires the use of Store EFB/XFB Copies to RAM usually spells doom for performance. In the mobile landscape, everything is optimized to save power. But mobile SoCs do not understand Dolphin's weird workload. If a game triggers a memory pool sync, we have to stall the GPU while data is moving around. Mobile power governors see us stalling and think we are done with work, and immediately drop the clocks back to idle. But a few microseconds later we're back to needing maximum power. Unfortunately, the SoC will take its time and reluctantly raise clocks back up. Super Mario Galaxy and Super Mario Galaxy 2 will be doing many of these checks every frame. Unless your phone is absurdly powerful or your performance governor is extra forgiving, it's very hard to maintain full speed with all of this going on. Many of our mobile players have been working around this by only enabling EFB Peeks in stages that absolutely need them.
There is no way to make stalls cheap, and power governors are mostly out of our control, so the challenge becomes bundling and reducing the number of required readbacks as much as possible. This optimization by K0bin helps us get a little more mileage out of our last GPU sync/stall. Instead of just invalidating the EFB Peek cache on tokens and DrawDone, we can queue an update to all previously used EFB tiles. If the game only does work on the CPU between the DrawDone or Token and the actual EFB read, the read is able to hit the prepopulated cache and no longer needs to stall the GPU for another readback. This reduces the number of readbacks in many situations.
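A small Python model helps show why refreshing the previously used tiles in one go saves stalls. Everything here is an assumption made for illustration: the 64-pixel tile size, the class and method names, and the simplification that the game touches the same tiles each frame and does only CPU work between DrawDone and its peeks.

```python
TILE = 64  # hypothetical tile edge in pixels


class EFBPeekCache:
    def __init__(self, defer_invalidation: bool) -> None:
        self.defer = defer_invalidation
        self.tiles = {}      # tile coordinate -> cached pixel data
        self.readbacks = 0   # number of GPU stalls paid for

    def _read_back(self, tile_keys) -> None:
        # One stall fetches every requested tile together.
        self.readbacks += 1
        for key in tile_keys:
            self.tiles[key] = "pixels"

    def peek(self, x: int, y: int):
        key = (x // TILE, y // TILE)
        if key not in self.tiles:
            self._read_back([key])   # cache miss: stall for this tile
        return self.tiles[key]

    def on_draw_done(self) -> None:
        used = list(self.tiles)
        self.tiles.clear()           # cached pixels are now stale
        if self.defer and used:
            self._read_back(used)    # refresh all used tiles in one readback


def simulate(defer: bool, frames: int = 10) -> int:
    cache = EFBPeekCache(defer)
    for _ in range(frames):
        cache.on_draw_done()
        for x in (0, 100, 300):      # three peeks landing in three tiles
            cache.peek(x, 0)
    return cache.readbacks
```

In this model, ten frames cost 30 readbacks with plain invalidation and 12 with the deferred refresh (three misses on the first frame, then one batched readback per frame).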
With this change, EFB Peeks are significantly less costly. In our testing, a Snapdragon 865 was able to run many stages of Super Mario Galaxy at 60 FPS without enabling "Skip EFB Access from CPU", including ones with Pull Stars that would need the performance the most.
Is this a catch all to improve performance across all games? Unfortunately, not really. EFB Peeks are pretty much the best case scenario of readbacks, and this change further cements this. Other games like Super Mario Sunshine use Peeks, Pokes, PerfQueries, Store EFB Copies to RAM, and more. Even though this game only runs at 30 FPS (unless you choose to use a 60 FPS mod), the fact it uses so many difficult to emulate features turns it into a performance nightmare on Android.
This optimization has been merged into the setting Defer EFB Cache Invalidation, which is enabled automatically for the Super Mario Galaxy games. Because these cached results could break things in other games, we haven't enabled this optimization everywhere. You can test this change out in other games by enabling it under Graphics Settings > Advanced > Defer EFB Cache Invalidation.
D3D12 is usually considered Dolphin's most performant backend for integrated and low-end graphics cards on Windows. Unfortunately it's also gotten another reputation among people with stronger computers: the most unstable backend. In fact, there have been dozens of reports of crashing in games specifically when using D3D12, forcing users to switch to other backends.
Things went from bad to worse when unrelated optimizations in 5.0-17542 caused more games to start crashing in D3D12. So K0bin was enlisted to figure out why this optimization that improved all backends would only cause problems in D3D12.
After fixing up D3D12's validation layers, which hadn't been touched in a while, K0bin quickly found out there was a slight issue with BindFramebuffer in the D3D12 backend. It seemed very possible for it to rely on a pipeline configuration that hadn't been set. While we had seen a lot of reports of crashes in D3D12, we were never able to figure out an exact cause because of how particular it was to trigger it. You needed specific games, running under specific settings, using a specific GPU and sometimes even a specific driver version.
Ironically, by accidentally making the problem occur more often, K0bin made it much easier to track down and fix. All that was needed was a small change to prevent Dolphin from relying on unset pipeline configurations. This fixed both the new crashes on D3D12, and the previously unsolved crashes that had been plaguing it for years.
We're happy to report that D3D12's stability should finally be on par with the other backends now.
The last couple of changes have been about achieving maximum performance, but now we go into a different kind of performance. As mentioned in the last Progress Report, tellowkrinkle has been working toward smoothing things out with Ubershaders. Due to various issues across various drivers, Ubershaders haven't been able to completely remove stuttering, especially in the modern Graphics APIs.
The original Ubershaders were created back when D3D12 and Vulkan were the "next-gen" video backends and most of our users were on D3D11 and OpenGL. We were aware that the new APIs used "pipelines" but it wasn't 100% clear the effect they would have on Dolphin yet. For Ubershaders, it's been a problem, to say the least. We talked about these a bit in the last Progress Report where tellowkrinkle already had one change dedicated toward reducing pipeline counts.
For a quick rundown, the graphics pipeline is a fundamental building block for rendering on newer graphics APIs. It contains all of the data for a draw, taking vertices and textures all the way to the pixels in the render targets. Since the pipeline contains all the settings for the draw, the display driver can optimize it before feeding it to the GPU. Some drivers then optimize those pipelines by combining the hardware vertex loader settings into the actual shader, which essentially makes our Ubershaders not so Uber anymore. So when the vertex loader on the GameCube changes settings, a whole new shader would need to be compiled with new vertex loader settings, usually resulting in a lengthy stutter.
We did run into this issue with Vulkan when Ubershaders were first implemented, but we were hopeful that API extensions or driver updates would eventually lessen or erase the issue. That hasn't happened.
tellowkrinkle's solution isn't to make pipeline generation faster, but to simplify our pipelines so there are fewer branches and thus fewer pipelines in total. tellowkrinkle was able to reduce our pipeline count so much that now nearly every pipeline is able to be precompiled at boot! And while these changes aren't able to remove the possibility of shader stuttering on every backend on every graphics card, they bring us much closer to that reality.
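To make the idea concrete, here is a toy model of a pipeline cache. It is only a sketch: the field names (`blend`, `vertex_format`) and the draw list are invented for illustration and are not Dolphin's real pipeline state. It shows why taking one piece of state out of the pipeline key shrinks the number of pipelines that ever need to be compiled.

```python
def count_compiles(draws, key_fields):
    """Count how many pipeline compiles a list of draws would trigger.

    A pipeline is compiled the first time a new combination of the
    state named in key_fields is seen (hypothetical toy cache).
    """
    cache = set()
    compiles = 0
    for draw in draws:
        key = tuple(draw[field] for field in key_fields)
        if key not in cache:
            cache.add(key)
            compiles += 1
    return compiles


# Hypothetical workload: 2 blend modes x 3 vertex formats.
draws = [
    {"blend": blend, "vertex_format": fmt}
    for blend in ("opaque", "alpha")
    for fmt in ("pos", "pos_color", "pos_color_uv")
]

# Vertex format baked into the pipeline: one compile per combination.
baked = count_compiles(draws, ["blend", "vertex_format"])
# Vertex format handled dynamically in the shader: it drops out of the key.
dynamic = count_compiles(draws, ["blend"])
```

In this made-up workload the first count grows with every new vertex format a game uses, while the second depends only on the remaining state, which is what makes precompiling at boot practical.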
The Vertex Loader Ubershader, along with some changes leading up to it, makes a huge difference in shader compilation stuttering on all hardware.
- Metal: macOS users can rejoice, as there are now no known drivers that will suffer from shader generation issues when using Ubershaders.
- D3D12: On NVIDIA and AMD graphics cards, using D3D12 should now provide the same stutter free experience as D3D11 has when using Ubershaders. Intel's drivers were not tested, but are likely greatly improved or completely stutter free.
- Vulkan: This is the most complicated case. On macOS via MoltenVK, shader compilation stuttering is completely eliminated except on Apple Silicon, where there may still be rare pipeline compiles. On Windows and Linux, the situation is improved but NVIDIA still tends to have a few noticeable compiles here and there. Wider support of the VK_EXT_rasterization_order_attachment_access extension would help eliminate pipeline compiles across pretty much every driver. That includes mobile drivers too, such as Adreno and Mali!
- D3D11: D3D11 does not have pipelines and doesn't benefit from this change. It also didn't suffer from the problems, so there's that.
- OpenGL: While it does have proto-pipelines with display lists and command lists, this change only targeted the newer APIs. OpenGL worked fairly well on most drivers anyway, so there wasn't too much to gain.
Odds are, you probably won't see shader stuttering anymore even if you're on one of the configurations listed as not completely fixed. And if you do see shader stuttering, it'll be greatly reduced compared to before.
This is a big one for macOS users, but it will have implications for everyone else sooner or later. The GameCube/Wii GPU has features called "Pointsize" and "Linewidth" which allow a developer to draw a point or line and give it an arbitrary size without having to use actual vertices or polygons. OpenGL developers may find that this feature sounds familiar, but that is to be expected as the GameCube and Wii's pointsize and linewidth were based on OpenGL's! However, the GameCube added enough special sauce that they cannot be accurately emulated with OpenGL's native pointsize and linewidth. Since newer APIs have nothing like this and neither does D3D11, none of the APIs our backends use can handle this feature natively.
Many games use these features, and without emulation of pointsize and linewidth, any lines will be 1 pixel wide and any points don't appear at all!
So, considering our host GPUs can't natively render this, we had to find a way to emulate it. Our solution has been the use of Geometry Shaders. Geometry Shaders are an optional shader stage that can take a primitive in as an input and can output zero or more primitives. For Dolphin's purposes, we could use the "Point" or "Line" request as the input and then, based on the other properties the game specifies for that point/line, draw triangles that will create an identical shape.
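As a rough sketch of the expansion idea, the snippet below turns a 2D line or point into two triangles on the CPU. It is not Dolphin's shader code: the function names are hypothetical, and it ignores the console-specific properties (the "special sauce" mentioned above) that the real implementation has to honor.

```python
import math


def expand_line(p0, p1, width):
    """Expand a 2D line segment into two triangles forming a quad."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0.0:
        return []  # degenerate line: output zero primitives
    # Unit normal to the line, scaled to half the requested width.
    nx = -dy / length * width / 2
    ny = dx / length * width / 2
    a = (x0 + nx, y0 + ny)
    b = (x0 - nx, y0 - ny)
    c = (x1 + nx, y1 + ny)
    d = (x1 - nx, y1 - ny)
    return [(a, b, c), (c, b, d)]


def expand_point(p, size):
    """Expand a 2D point into two triangles forming a square."""
    x, y = p
    h = size / 2
    a, b = (x - h, y - h), (x + h, y - h)
    c, d = (x - h, y + h), (x + h, y + h)
    return [(a, b, c), (c, b, d)]


line_tris = expand_line((0, 0), (4, 0), 2)
point_tris = expand_point((1, 1), 2)
```

One primitive goes in and zero or more triangles come out, which is the same shape of work a Geometry Shader does for each point or line the game requests.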
When Dolphin adopted Geometry Shaders, they were a new and promising feature in graphics tech. Unfortunately, they were quickly superseded by newer features because Geometry Shaders are slow, so games avoided them. This put Geometry Shaders in a precarious position that finally reached a breaking point with the Metal Graphics API by Apple - Metal does not support geometry shaders whatsoever. Geometry Shaders still work on macOS under OpenGL, but they are not available in Metal. For us, this meant we needed to find a new solution if we were going to emulate Pointsize and Linewidth on modern macOS machines.
Originally, skylersaleh (who did Dolphin's port to macOS M1 and now works on skyemu) started an experiment with using tried and true Vertex Shaders. While this could make wide-lines and thick-points, the original experiment failed because it wasn't able to get them to render correctly. pokechu22 picked it up but quickly found that this wasn't a simple fix.
Dolphin as it was couldn't give the Vertex Shaders all of the information it needed to determine the way to orient line caps. A way to work around this was to have Dynamic Vertex Loaders, which would be able to determine the information we need directly in the Vertex Shader. Funnily enough, while 5.0-17403 was designed to make Ubershaders more effective, those same Dynamic Vertex Loaders could be used to finally finish Pointsize/Linewidth with a Vertex Shader!
Support for Geometry Shaders, especially with Metal outright abandoning them, might be on the way out. So having this option is good now, but might be outright necessary later. Currently, we still use Geometry Shaders when they're available, but on GPUs without Geometry Shader support, Dolphin will now use the Vertex Shader implementation instead. Users can also try it out - it can be enabled in Graphics Settings -> Advanced with "Prefer VS for Point/Line Expansion".
Because the Vertex Shader implementation is slightly faster, we will probably move over to it as the default implementation. This actually comes into play throughout the Metroid Prime Series, where the linewidth effects can affect overall performance. If your machine has a driver with incredibly slow Geometry Shader support, this may help things! However, on most modern GPUs, this is far down on the list of bottlenecks.
Note: Our D3D11 Backend does not have Vertex Shader Linewidth/Pointsize for now. It's not impossible to accomplish, but due to the inline constant system from D3D12 not existing, it would take a different implementation.
For people debugging, modding, TASing, or just in general messing around with a game's behavior, this is a major enhancement that will make your life easier. Dolphin has long had support for breakpoints, but the problem was that if it was a frequently used instruction, you'd have no way of telling it only to notify you if the condition you were looking for actually happened. Instead, debuggers would often rely on third party tools in order to debug more difficult issues.
This change adds support for basic logical conditions, but TryTwo has already started work on callstack based conditions and support for conditional Memory Breakpoints as well.
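To illustrate why a condition helps, here is a minimal sketch of a conditional breakpoint. Dolphin's actual condition syntax is not described here, so a plain Python callable stands in for it, and the `Breakpoint` class, the address, and the register names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Registers = Dict[str, int]


@dataclass
class Breakpoint:
    """Hypothetical breakpoint with an optional condition on register state."""
    address: int
    condition: Optional[Callable[[Registers], bool]] = None

    def should_break(self, pc: int, regs: Registers) -> bool:
        if pc != self.address:
            return False
        # An unconditional breakpoint fires every time the address is hit.
        return self.condition is None or self.condition(regs)


# Only stop at this frequently executed instruction when r3 is zero
# and r4 is above 10 (made-up values for illustration).
bp = Breakpoint(0x80001234, lambda r: r["r3"] == 0 and r["r4"] > 10)
```

With the condition in place, the thousands of uninteresting hits on a hot instruction are skipped and only the case being hunted interrupts emulation.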
When Dolphin added support for the Integrated GBA there was some excitement about TASing titles that require GBAs. After all, Integrated GBAs worked with Dolphin's movie code as shown by Netplay Support!
Technically, Integrated GBAs could be controlled using TAS input when things were merged, but there was no built-in support. Instead, Bonta hooked up the existing GameCube Controller TAS input to the GameCube TAS Input window. However, as with any major, complex change, this detail was actually overlooked. When the TAS Input Handling was rewritten, this feature was accidentally removed and suddenly there was no way to control Integrated GBAs through TAS Input.
As the guilty party that accidentally removed support, JosJuice quickly jumped on the issue to rectify things as soon as they realized what had happened. But rather than just returning to the old normal, JosJuice greatly improved the situation by making a special TAS Input window for Integrated GBAs with all of the controls now clearly labeled.
Dolphin's FPS counter has always just been there. We don't even know for sure how long we've had it - we found it in r488 (1.0-488) from 9 Sept 2008, but it could be even older! It has changed a lot in that time of course. Originally it was an overlay built directly into each graphics plugin, before being moved to the Rasterfont system split between videocommon and the video backend, and finally moving to the shared Dear ImGui form we have today. However, despite being moved and rebuilt several times, how our FPS display calculates frames per second has never changed. Until today.
Progress Report newcomer Sam-Belliveau was distracted by how Dolphin's FPS display constantly flickered above and below 60FPS. It annoyed him so much that he decided he was going to do something about it... and fell down a rabbit hole. First of all, he added VPS (v-blanks per second) and emulation speed measures to the performance overlay, so that users in fullscreen and on Android can finally get this essential information.
And he added color to the performance overlay, providing a very glanceable indicator of Dolphin's performance! The colors are cyan (full speed), green, yellow, orange, then finally red.
But most importantly, he changed how Dolphin's FPS display is averaged. Before we explain how he did it, here's a before and after of the FPS display with a title running at a perfect 60fps (59.94 technically).
Previously, the FPS display used an extremely simple formula:
number of frames in the sample / (microseconds of the sample / 1,000,000)
That's it. This is an incredibly basic way to calculate frames per second, and it has worked for Dolphin for a very long time. It does the job. However, the downside of this method is that it is "sample and hold". Between samples Dolphin just shows the previous sample, completely unchanged, which feels unresponsive and could make stutters and other problems either invisible or overly emphasized. To minimize this Dolphin could use a very short sample cycle, but that has the consequence that each sample has little data to work with. If your sample size is only a few frames long, then even a spike lasting a single frame could significantly alter the results of the average, making the FPS display highly unstable. A longer sample time would resolve that, but then we're back to having it feel unresponsive and hiding stutters and other problems. So Dolphin developers of the past chose a sample duration of 0.25 seconds - this leaned a bit toward frenetic and unstable but kept the FPS display reasonably responsive. With a super simple formula like this, compromises are unavoidable.
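The old behavior can be sketched roughly as follows. This is an illustration of the "sample and hold" idea rather than Dolphin's original code; the function name and the list-based interface are assumptions, while the 0.25 second sample duration comes from the description above.

```python
def sample_and_hold_fps(frame_times_us, sample_us=250_000):
    """Return the FPS value shown after each frame.

    frame_times_us holds the duration of each frame in microseconds.
    The displayed value only changes when a sample period completes.
    """
    shown = 0.0   # value currently on screen
    frames = 0    # frames counted in the current sample
    elapsed = 0   # microseconds accumulated in the current sample
    history = []
    for dt in frame_times_us:
        frames += 1
        elapsed += dt
        if elapsed >= sample_us:
            shown = frames / (elapsed / 1_000_000)
            frames = 0
            elapsed = 0
        history.append(shown)  # held unchanged between samples
    return history


# Thirty frames at roughly 60 FPS (16,667 microseconds each).
history = sample_and_hold_fps([16_667] * 30)
```

In this run the display stays at its initial value for the first fourteen frames, jumps to about 60 on the fifteenth, and then holds that number until the next sample completes.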
The solution to this is to use an exponential moving average on top of a standard moving average - we'll call this a "Euler Average" for short. Basically, it averages the framerate over the last second (by default) using a standard moving average, then applies an exponential moving average on top to reduce jitter. The result of this math is that Dolphin can sample every frame so it is always up to date, it can smooth out sudden spikes and drops for a stable, legible indicator, and it can update the FPS indicator every frame so it is visually pleasing as well!
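A minimal sketch of that two-stage average is shown below. It is not Dolphin's implementation: the class name, the `alpha` smoothing factor, and the exact windowing rule are assumptions chosen for illustration. Only the one second default window and the "moving average, then exponential moving average" structure come from the description above.

```python
from collections import deque


class SmoothedFps:
    """Moving average over a time window, smoothed by an exponential average."""

    def __init__(self, window_s=1.0, alpha=0.1):
        self.window_s = window_s  # length of the moving-average window
        self.alpha = alpha        # hypothetical smoothing factor, 0 < alpha <= 1
        self.frames = deque()     # durations (seconds) of frames in the window
        self.total = 0.0          # sum of the durations in the window
        self.value = None         # currently displayed FPS

    def push(self, dt):
        """Record one frame of duration dt (seconds, > 0) and return the new FPS."""
        self.frames.append(dt)
        self.total += dt
        # Drop old frames while the remaining ones still cover the window.
        while len(self.frames) > 1 and self.total - self.frames[0] >= self.window_s:
            self.total -= self.frames.popleft()
        windowed = len(self.frames) / self.total  # stage 1: moving average
        if self.value is None:
            self.value = windowed
        else:
            # Stage 2: exponential moving average to damp the remaining jitter.
            self.value += self.alpha * (windowed - self.value)
        return self.value


# Steady 60 FPS input.
steady = SmoothedFps()
for _ in range(120):
    steady_value = steady.push(1 / 60)

# Frame times alternating between 1/30 s and 1/90 s (45 FPS on average).
noisy = SmoothedFps()
for i in range(300):
    noisy_value = noisy.push(1 / 30 if i % 2 == 0 else 1 / 90)
```

Because `push` runs once per frame, the value can be redrawn every frame, yet the alternating 30/90 FPS input settles near its true average instead of bouncing between the two extremes.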
To show how this works in action, here is a graph showing the two methods against each other. The data we're feeding into it is from a Dolphin test with an unlocked framerate, with frame to frame swings as high as 100FPS. It's a worst case scenario for an FPS display. How do the old and new FPS formulas handle this awful data?
The purpose of an FPS display is to give a glanceable reference of a game's performance. But the FPS display with the original formula can't handle data that is this variable. It just goes all over the place; even if you stared at it you would have no idea how fast the game is performing! But the Euler average is able to display a proper average of the framerate even from this garbage data. Of course, no one should encounter frametime variance on this level in actual gameplay unless something has gone very very wrong, but even in this worst case scenario our frame rate display is now able to do its job thanks to the new formula!
A consequence of using the Euler Average is that there is now a ramp up. If a game goes from 0FPS to 60FPS in one frame, it will take the whole sample window (1 second by default) to resolve to 60. Averaging the current data with prior data prevents sudden spikes and drops from making the FPS display devolve into noise, so it has to catch up to extreme change. To account for this, this change also adds a control to allow the user to change the sample window duration. A longer window will give a smoother and more consistent indicator, and a shorter window will be more frenetic but will be closer to real time and show smaller drops.
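The ramp up and the effect of the window length can be seen in a simplified model. The sketch below only counts frames inside the window and leaves out the exponential smoothing stage, so it is an approximation of the behavior described above, and the function name and timestamps are made up for the example.

```python
def windowed_fps(frame_timestamps, now, window_s):
    """Frames presented in the last window_s seconds, divided by the window length."""
    recent = [t for t in frame_timestamps if now - window_s < t <= now]
    return len(recent) / window_s


# The game was stalled (0 FPS) until t = 0, then runs at a steady 60 FPS.
timestamps = [i / 60 for i in range(1, 121)]

half_second_in = windowed_fps(timestamps, now=0.5, window_s=1.0)   # still ramping
full_second_in = windowed_fps(timestamps, now=1.0, window_s=1.0)   # caught up
short_window = windowed_fps(timestamps, now=0.5, window_s=0.5)     # catches up sooner
```

Half a second after the stall, the one second window has only seen half a window's worth of frames and reports half the true rate, while the half second window has already caught up. The trade-off is that the shorter window averages over fewer frames and is therefore jumpier.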
This and other new controls necessitated a move for our performance display options. You'll now find them in Graphics > Advanced under "Performance Statistics". This is not necessarily their permanent home but it's a safe place for Sam-Belliveau to continue working on this feature.
With this, our performance statistics display is better than ever! But Sam-Belliveau is not done. Expect this feature to get even better in the coming months!
More Patches for dcache Reliant Games
Dolphin doesn't currently target emulating the CPU Data Cache (dcache) and once it does, odds are that enabling it will make emulation too slow. For now, we're relying on patching these games that accidentally rely on the GameCube/Wii's CPU dcache behaviors.
The past three months, we've seen two new patches from smurf3tte, who has done the hard work of reverse engineering these games, figuring out why they need dcache emulation, and patching their code to no longer need it! So if you're playing Dead to Rights (audio issues) or Ten Pin Alley 2 (crashing before gameplay) in the latest development builds, you have smurf3tte to thank!
Note that the patch for Ten Pin Alley 2 is only for the NTSC version of the game.
The Android GUI Refresh
There are far too many changes that have gone into this over the past quarter of the year to list them all. t895 has been extremely busy going through the Android GUI and modernizing it, from redesigning buttons, to making things fit your phone's theme/shape/size, and even adding some optimizations that allow the menus to run more smoothly with fewer hitches.
Everything is at least a little nicer than before!
One of the biggest keys is that Dolphin now supports Google's "Material You" design system. By default, Dolphin uses its own color scheming under "Material 3", but if you want it to match your system-wide theming better, you can swap over to "Material You" and several other color based options in the theme menu.
In addition to the theming changes, t895 has greatly optimized the cover loading/caching mechanism to prevent the lockups on first load when swapping between GC/Wii/WiiWare tabs. This should make app usage much smoother overall. There are tons of minor fixes, especially for those with oddly shaped phones that should prevent things from going off screen or awkwardly resizing.
A bunch of the menus have also been redesigned to make things fit better and look nicer.
Some changes are subtle.
Others are radically better.
But basically every window has been given some needed attention.
While none of this directly affects emulation, it does make setting up the emulator easier and allows for greater customization. Those of you that want your phone and its apps to be personalized for you should have a lot of fun with these changes. For everyone else, these changes will make setting up and adjusting things from within the emulator much smoother.