Let's kick off the new year with a bang! January will finally let Dolphin answer the question that gets asked every progress report: "Does Rogue Squadron work yet?"
Thanks to a ton of work from the staff, plenty of testing from forum users, hardware tests, and contributions from newcomers and veterans alike, Star Wars Rogue Squadron II: Rogue Leader and Star Wars Rogue Squadron III: Rebel Strike are both playable and completable in Dolphin at long last.
Considering how many big changes were merged and how much work was done, that may not even be the biggest news of the month. So hold tight, and please enjoy this month's Notable Changes!
A few months ago, Fiora as much as doubled the performance of MMU games through improvements like the "far code" cache, paired loads/stores in MMU mode, and a few other tweaks. Despite all of her optimizations, and those of other developers, MMU mode remained a very demanding feature.
Developers attacked the problem again over the past two months with the goal of reducing MMU overhead as much as possible. While there are quite a few MMU games, the main target was getting Rogue Squadron 2 and 3 near full speed on current hardware before they were even playable.
The many changes revolved around two basic ideas: eliminate as much of the MMU's impact as possible from game code that doesn't actually use its features, and shorten the address translation path, from memory loads and stores to their associated page table lookups, as much as possible. skidau started by fixing block linking to work in MMU mode, and magumagu then built on this by extending fastmem to support MMU too.
Fiora then painstakingly assembled roughly two dozen MMU-related patches, covering optimized paired loads and stores, exception checking, and TLB lookups, while also fixing a number of bugs that could cause random crashes in MMU titles.
Fiora's "Faster MMU 2" improved performance by roughly 80% in the Rogue Squadron titles, with significant gains in other MMU titles as well. Combined with the improvements from magumagu and skidau, some MMU titles are nearly twice as fast as they were just a month ago!
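To make "shortening the address translation path" a little more concrete, here is a minimal sketch of the general idea behind a TLB: remember recent virtual-to-physical page translations so that most loads and stores skip the slow page table walk. It assumes 4 KiB pages (the standard 32-bit PowerPC page size); `SoftTLB` and its FIFO eviction policy are hypothetical illustrations, not Dolphin's code, which does this work in JIT-emitted x86-64.

```python
from typing import Callable, Dict, Optional

PAGE_SHIFT = 12                   # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class SoftTLB:
    """Hypothetical translation cache in front of a slow page table walk."""

    def __init__(self, walk: Callable[[int], Optional[int]], capacity: int = 64):
        self.walk = walk          # virtual page number -> physical page number (or None)
        self.capacity = capacity
        self.entries: Dict[int, int] = {}
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_SHIFT
        ppn = self.entries.get(vpn)
        if ppn is not None:
            self.hits += 1        # fast path: no page table walk
        else:
            self.misses += 1
            ppn = self.walk(vpn)  # slow path: full page table lookup
            if ppn is None:
                # A real emulator would raise a guest exception here instead.
                raise LookupError(f"no mapping for {vaddr:#010x}")
            if len(self.entries) >= self.capacity:
                # Evict the oldest entry (dicts keep insertion order).
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpn] = ppn
        return (ppn << PAGE_SHIFT) | (vaddr & PAGE_MASK)


# Usage: a toy page table mapping two virtual pages.
page_table = {0x80000: 0x00123, 0x80001: 0x00456}
tlb = SoftTLB(page_table.get)
print(hex(tlb.translate(0x80000010)))  # 0x123010 (miss, then cached)
print(hex(tlb.translate(0x80000ABC)))  # 0x123abc (hit)
```

The point of the cache is that repeated accesses to the same page pay for the page table walk only once.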
The MMU performance improvements have also inspired magumagu to start work on a variety of large-scale, much-needed changes to unify and correct Dolphin's memory handling -- which we hope may lead to the other two titles in the Disney Trio of MMU Destruction* to actually work.
This is one of those nifty enhancements that has been talked about for a few months now. By overriding the clockrate of the GameCube/Wii CPU, users can affect games in quite a few ways.
Original intended use
- Variable Framerate Games - Some games support variable framerates, such as The Sims series, Gauntlet Dark Legacy, Spyro the Dragon: Enter The Dragonfly, Crash Bandicoot: Wrath of Cortex, The Last Story and many others. Depending on the CPU load, they will swap between 20, 30, and even 60 fps. By giving the GameCube/Wii more processor horsepower, Dolphin can now allow these games to run at their maximum possible framerate at all times.
Most 30 FPS games are not variable framerate titles and won't run at a higher framerate without some kind of game-specific patch or hack, even if the CPU is overclocked to 400%. Anyone willing to try making such patches should go for it, as the difference we've seen already is immense!
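Why does a faster emulated CPU only help variable framerate games? A toy model makes it clearer. It assumes, hypothetically, that a game finishes a frame's CPU work and then waits for the next 60 Hz vertical blank, so every frame lasts a whole number of refresh intervals. The 486 MHz figure is the GameCube's CPU clock; the per-frame cycle count is made up for illustration.

```python
import math

VSYNC_HZ = 60
GAMECUBE_CPU_HZ = 486_000_000


def vsync_locked_fps(cycles_per_frame: int, overclock: float = 1.0,
                     base_clock_hz: int = GAMECUBE_CPU_HZ) -> float:
    """Frame rate of a hypothetical game that waits for vsync after each frame."""
    seconds_of_work = cycles_per_frame / (base_clock_hz * overclock)
    # A frame that misses a vblank has to wait for the next one.
    refresh_intervals = max(1, math.ceil(seconds_of_work * VSYNC_HZ))
    return VSYNC_HZ / refresh_intervals


workload = 12_000_000  # hypothetical CPU cycles needed per frame
for factor in (0.5, 1.0, 2.0):
    print(f"{factor}x clock -> {vsync_locked_fps(workload, factor):.0f} fps")
```

With this workload the model settles on 20, 30 and 60 fps at 0.5x, 1x and 2x clock. A game hard-locked to 30 fps would simply idle through the spare cycles, which is why overclocking alone can't speed those titles up.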
Other discovered uses
- Cycle Accuracy Issues - Dolphin is not a cycle-accurate emulator, so sometimes its emulation of the GC/Wii CPU isn't accurate enough for a game. Several games have videos that depend very carefully on IPC (Instructions Per Clock), such as The Legend of Zelda: Ocarina of Time Master Quest's promotional videos. By overclocking or underclocking the emulated processor, users can avoid hangs caused by Dolphin's CPU timing inaccuracies. In the future, the goal is to make Dolphin's CPU emulation accurate enough that no game needs a hack to run correctly. In fact, there are already plans in place for this, so anyone interested in the actual implementation could likely jump right in and help. Underclocking can even help with odd glitches like the sun flickering in The Legend of Zelda: The Wind Waker when Dual Core is enabled.
- Speedhack - In some games, this feature can be used as a speedhack. Lowering the clock rate of the emulated CPU reduces the load the emulated Wii places on the user's machine, making the game much easier to emulate without slowdown. While the game itself may run worse (choppier in some games, slower in others), this can be preferable to the audio stuttering and unpredictable performance that result when the host CPU is too slow to emulate the game.
This is a relatively self-contained fix, but it's definitely worth noting, as it allows several games to boot properly. Several games based on Nickelodeon properties render their cartoon cutscenes with EFB pokes (CPU writes directly into the embedded framebuffer), which inevitably failed because Dolphin didn't emulate them at all. With the fix, these cutscenes render properly, but because EFB pokes are currently slow to emulate, people playing these games may want to disable them by enabling "Skip EFB Access from CPU" in order to get into the game faster.
With a feature that lets Dolphin write more than one pixel per poke, it should be possible to make these videos work fullspeed; it's just a matter of implementing it.
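The report doesn't describe how such a feature would work. One plausible approach, sketched here under that assumption, is to gather the CPU's single-pixel pokes and merge runs of horizontally adjacent writes into spans, so each span could be sent to the host GPU as one upload (for example a single `glTexSubImage2D` call) instead of one upload per pixel. `coalesce_pokes` and the RGBA8 color format are hypothetical.

```python
from typing import List, Tuple

Poke = Tuple[int, int, int]        # (x, y, RGBA8 color) written by the emulated CPU
Span = Tuple[int, int, List[int]]  # (start x, y, colors of consecutive pixels)


def coalesce_pokes(pokes: List[Poke]) -> List[Span]:
    """Merge pokes that extend the previous write along the same row.

    Only the most recent span is ever extended, so replaying the spans in
    order writes exactly the same pixels, in the same order, as the pokes.
    """
    spans: List[Span] = []
    for x, y, color in pokes:
        if spans:
            start_x, row, colors = spans[-1]
            if row == y and start_x + len(colors) == x:
                colors.append(color)
                continue
        spans.append((x, y, [color]))
    return spans


# A cutscene-style write pattern: two rows drawn left to right.
pokes = [(x, y, 0xFF0000FF) for y in range(2) for x in range(320)]
spans = coalesce_pokes(pokes)
print(len(pokes), "pokes ->", len(spans), "uploads")
```

Here 640 single-pixel pokes collapse into 2 row uploads; scattered writes gain nothing, but they also lose nothing.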
A year ago, degasus merged an amazing optimization called texture pooling. Texture pooling is a cache for unused texture objects. Allocating and freeing these resources is expensive, especially in OpenGL, which is why refraction effects in games like Metroid Prime would bring OpenGL to its knees while only mildly bothering D3D.
In the old system, if a texture cache entry didn't match, Dolphin would free it and create a new one. With texture pooling, Dolphin instead pushes the old texture object into a pool, and before creating a new one, it first checks the pool for an existing texture object with matching properties that can be reused. Texture pooling resulted in absolutely massive speedups in games that hit this bottleneck.
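Here is a minimal sketch of the pooling idea described above, assuming textures can be reused whenever their dimensions and mip level count match. `TexturePool`, `TextureConfig` and `HostTexture` are hypothetical stand-ins; a real pool would key on more properties (such as the host format) and hold actual OpenGL or D3D objects.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class TextureConfig:
    width: int
    height: int
    levels: int  # number of mipmap levels


class HostTexture:
    """Stand-in for a host GPU texture object that is costly to create."""

    def __init__(self, config: TextureConfig):
        self.config = config


class TexturePool:
    def __init__(self) -> None:
        self._free: DefaultDict[TextureConfig, List[HostTexture]] = defaultdict(list)
        self.allocations = 0

    def acquire(self, config: TextureConfig) -> HostTexture:
        bucket = self._free[config]
        if bucket:
            return bucket.pop()   # reuse a pooled object: no allocation
        self.allocations += 1     # pool miss: pay for a new object
        return HostTexture(config)

    def release(self, texture: HostTexture) -> None:
        # Instead of freeing the object, keep it around for the next request.
        self._free[texture.config].append(texture)


# Usage: an effect that needs a fresh 256x256 texture every frame.
pool = TexturePool()
cfg = TextureConfig(256, 256, 1)
for _frame in range(100):
    tex = pool.acquire(cfg)
    pool.release(tex)
print(pool.allocations)  # 1
```

A hundred frames of "create and destroy" turn into a single allocation, which is the effect that made refraction-heavy scenes so much cheaper.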
Unfortunately, without the necessary cleanups in place, it caused crashes and other issues and had to be reverted despite its huge potential as a performance enhancement. Now, after several texture cache cleanups and a much more carefully coded patch, texture pooling is back and better than ever. In some games the speedup is absolutely massive: in Harvest Moon: A Wonderful Life it can amount to 1000% during nighttime sequences!
Most games will see more moderate speedups. The Metroid Prime series sees great improvements in many of its special effects, and almost every game seems to benefit from the increased efficiency to the tune of 5-15%.
Dolphin's Vertex Loader was one of the obvious bottlenecks and seemed like low-hanging fruit. Fiora showed significant performance gains a few months ago through basic optimizations of the existing Vertex Loader, but it was still a very primitive sort of JIT, and it was well known that it needed a proper rewrite for ideal performance.
Once degasus had done the necessary cleanup and preparation, he passed the task of rewriting the x86-64 Vertex Loader on to Tilka. After a few weeks of struggling and a few odd regressions, Dolphin's brand new Vertex Loader JIT was merged.
The Vertex Loader converts the vertex data the game sends to the emulated GPU into a format the host GPU can use; the new JIT does this more efficiently, meaning less CPU overhead. How big the benefit is depends mostly on whether, and how much, a game is bottlenecked on the vertex loader, but some Vertex Loader-limited areas of Rogue Squadron II, like the ship bay with its 10,000+ polygon ships, run up to 50% faster!
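As a rough illustration of what that conversion involves: GameCube/Wii vertex data is big-endian, and positions are often stored as fixed-point integers with a configurable number of fractional bits. The sketch below builds a loader specialized for one vertex format, doing all format-dependent work once up front, which is the basic idea a vertex loader JIT takes much further by emitting machine code per format. It handles positions only; the function name and format table are hypothetical, and the rule that floats ignore the fractional-bits setting is an assumption of this sketch.

```python
import struct
from typing import Callable, List, Tuple

# Big-endian struct codes for the component types a position may use.
COMPONENT_CODES = {"u8": "B", "s8": "b", "u16": "H", "s16": "h", "f32": "f"}

Position = Tuple[float, ...]


def compile_position_loader(comp_type: str, count: int,
                            frac: int) -> Callable[[bytes], List[Position]]:
    """Return a loader specialized for one vertex position format."""
    # Format-dependent work happens here, once per format, not once per vertex.
    layout = struct.Struct(">" + COMPONENT_CODES[comp_type] * count)
    # Fixed-point values are scaled by 2**-frac; floats are used as-is (assumed).
    scale = 1.0 if comp_type == "f32" else 1.0 / (1 << frac)

    def load(buffer: bytes) -> List[Position]:
        # iter_unpack walks the buffer one vertex (layout.size bytes) at a time.
        return [tuple(c * scale for c in comps) for comps in layout.iter_unpack(buffer)]

    return load


# Usage: two XYZ positions stored as big-endian s16 with 8 fractional bits.
load_s16_xyz = compile_position_loader("s16", 3, frac=8)
data = struct.pack(">6h", 256, -512, 128, 0, 64, -256)
print(load_s16_xyz(data))
```

The two vertices decode to (1.0, -2.0, 0.5) and (0.0, 0.25, -1.0).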
The one catch is that the Vertex Loader JIT relies on SSSE3, so only SSSE3-supporting CPUs (Core 2 and newer for Intel, Bulldozer and newer for AMD) will benefit from the speedup.
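The report doesn't say which SSSE3 instructions the new loader depends on. One likely candidate for this kind of work is PSHUFB, which rearranges the 16 bytes of a register according to a mask in a single instruction; among other things, that is enough to byte-swap a batch of big-endian values at once. The Python model below mimics the instruction's behavior for illustration only; treat its connection to Dolphin's loader as an assumption.

```python
def pshufb(src: bytes, mask: bytes) -> bytes:
    """Software model of SSSE3 PSHUFB on a single 16-byte register.

    For each output byte: if the mask byte's top bit is set, the result is 0;
    otherwise it is the source byte selected by the mask's low 4 bits.
    """
    if len(src) != 16 or len(mask) != 16:
        raise ValueError("PSHUFB operates on 16-byte registers")
    return bytes(0 if m & 0x80 else src[m & 0x0F] for m in mask)


# Mask that swaps each pair of bytes: turns eight big-endian 16-bit values
# into little-endian ones (the byte order x86 expects).
SWAP16 = bytes(i ^ 1 for i in range(16))

big_endian = (0x1234).to_bytes(2, "big") * 8
little = pshufb(big_endian, SWAP16)
print(int.from_bytes(little[:2], "little") == 0x1234)  # True
```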
This one is pretty self-explanatory: users can configure their keyboards to type in GameCube games and homebrew that support the GameCube Keyboard Controller, such as Phantasy Star Online. Another awesome, obscure GameCube peripheral emulated in Dolphin! Unfortunately, GameCube controller adapters, including those used with Native GameCube Controller Support, will not let you plug in and use a GameCube Keyboard, since the adapters do not pass serial input through directly. The one possible exception is the Raphnet adapter, since it could convert keyboard input into ordinary key presses, but it is currently unknown whether it will work properly. If anyone finds out, please let us know in the comment thread.
Sometimes digging through the code turns up funny little problems. In this case, mimimi realized that the texture cache for paletted textures was completely broken. This meant that when emulating EFB copies to texture, the results would be a garbled mess. When sending EFB copies to emulated RAM instead, the texture cache had to be disabled, or else the textures would not detect that they needed to update.
What caused this oversight? It actually stemmed from an ancient speedhack that made paletted textures a lot faster in Dolphin. However, users who have already updated may have noticed that mimimi's quick fix doesn't cause any performance regressions. It turns out that the Texture Pooling merge mentioned earlier prevents the slowdown this fix would otherwise have caused!
As if that weren't good enough, now that paletted textures are handled correctly, EFB copies to RAM no longer need the safe texture cache for paletted textures! This isn't a complete solution though: all it does is reinterpret the EFB copy as a paletted texture. Unfortunately, this missing functionality breaks games like Dragon Ball Z: Budokai Tenkaichi 3 when using EFB copies to texture. Anyone experienced in graphics programming could likely write a GPU decoder for this and fix not only that, but also get Twilight Princess, Rogue Squadron, and the rest of the affected games working perfectly with paletted textures without needing the expensive EFB copies to RAM option.
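The report doesn't go into the exact bug, but a common way for a paletted texture cache to miss updates is a lookup key that ignores the palette: the index data stays the same while the palette (TLUT) changes, so the cache keeps handing back stale colors. The sketch below shows the difference, assuming a linear 8-bit-index layout (real GameCube textures are tiled) and SHA-256 as the hash purely for simplicity; all names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def decode_paletted(indices: bytes, palette: List[int]) -> List[int]:
    """Expand 8-bit palette indices into 32-bit colors."""
    return [palette[i] for i in indices]


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class PalettedTextureCache:
    def __init__(self, key_includes_palette: bool = True):
        self.key_includes_palette = key_includes_palette
        self.entries: Dict[Tuple[int, str, str], List[int]] = {}
        self.decodes = 0

    def lookup(self, address: int, indices: bytes, palette: List[int]) -> List[int]:
        palette_bytes = b"".join(c.to_bytes(4, "big") for c in palette)
        palette_key = digest(palette_bytes) if self.key_includes_palette else ""
        key = (address, digest(indices), palette_key)
        if key not in self.entries:
            self.decodes += 1
            self.entries[key] = decode_paletted(indices, palette)
        return self.entries[key]


indices = bytes([0, 1, 1, 0])
red_green = [0xFF0000FF, 0x00FF00FF]
blue_white = [0x0000FFFF, 0xFFFFFFFF]

for includes_palette in (False, True):
    cache = PalettedTextureCache(includes_palette)
    cache.lookup(0x8000_0000, indices, red_green)
    second = cache.lookup(0x8000_0000, indices, blue_white)
    print(includes_palette, second == decode_paletted(indices, blue_white))
```

Without the palette in the key, the second lookup returns the stale red/green texture; with it, the texture is re-decoded with the new colors.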
Dolphin's lighting code is not one of its bright spots. Previous attempts to sort out what it was doing and compare it to how the console works left the coder so dismayed that they didn't want to touch it again. NanoByte011, being relatively new to the project, didn't realize this and ended up solving a lot of Dolphin's weird lighting problems while reshuffling its light attenuation code.
It's really hard to say how much this actually fixes. While a few big known issues were fixed, most of Dolphin's lighting problems were relatively minor and hard to notice. There could be hundreds of games that now look more like they do on real hardware.
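For context on what "light attenuation" covers: the GX lighting model attenuates a spot light by a quadratic in the cosine of the angle between the spot axis and the direction to the vertex (coefficients a0, a1, a2), divided by a quadratic in the distance to the vertex (coefficients k0, k1, k2). The sketch below evaluates that formula for one vertex. It is not Dolphin's code or NanoByte011's change, and its sign convention (the spot direction points away from the light) is our own.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def spot_attenuation(light_pos: Vec3, spot_dir: Vec3, vertex_pos: Vec3,
                     cos_coeffs: Vec3, dist_coeffs: Vec3) -> float:
    """max(0, a0 + a1*c + a2*c^2) / (k0 + k1*d + k2*d^2).

    c is the cosine between the (unit) spot direction and the light-to-vertex
    direction; d is the light-to-vertex distance (must be non-zero).
    """
    to_vertex = (vertex_pos[0] - light_pos[0],
                 vertex_pos[1] - light_pos[1],
                 vertex_pos[2] - light_pos[2])
    d = math.sqrt(dot(to_vertex, to_vertex))
    c = dot(to_vertex, spot_dir) / d
    a0, a1, a2 = cos_coeffs
    k0, k1, k2 = dist_coeffs
    return max(0.0, a0 + a1 * c + a2 * c * c) / (k0 + k1 * d + k2 * d * d)


# Usage: a spot at the origin pointing down +z, cosine falloff, inverse-square distance.
origin, down_z = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(spot_attenuation(origin, down_z, (0.0, 0.0, 2.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))
```

A vertex straight ahead at distance 2 gets 1/4 of the light, while anything behind the spot gets none, because the angle term is clamped at zero.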
One feature that often goes overlooked is Dolphin's ability to dump textures from games, so users can modify them and load them back into the game through the "load custom textures" option. Users can create all kinds of textures this way, but by far the most popular use is high definition texture packs. By placing these packs in the load directory and enabling the option, Dolphin can greatly enhance the visual fidelity of the game in question.
The most expansive HD texture pack to date is for Xenoblade Chronicles. While the pack is a massive work of art, the people behind it were running into problems. Namely, Dolphin's way of handling paletted textures was insane: sometimes there would be thousands of duplicates of the same texture, and for an HD texture to be guaranteed to work, they'd have to replace every single one. degasus does away with that design issue and adds a bunch of new enhancements to make HD texture packs easier to make and use. Hopefully, with these changes, Dolphin will see many more custom texture packs in the future.
Do note that this breaks compatibility with older texture packs. Dolphin will currently convert old-format custom textures into the new format as they are loaded if an INI setting is enabled, but that functionality WILL be removed eventually. All users actively working on or managing custom texture packs are advised to convert them to the new format. Details are available on the forums.
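The report doesn't spell out the new naming scheme, so what follows is only a hypothetical illustration of how duplicates can pile up. Suppose a dumped texture's file name includes a hash of the whole 256-entry palette: two textures with identical index data and identical visible colors, but different garbage in palette entries they never reference, would get different names, and a pack would need a copy for each. Hashing only the palette entries the texture can actually reach collapses those duplicates. The `tex_` name format and `dump_name` function are made up for this sketch.

```python
import hashlib
from typing import List


def short_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()[:16]


def dump_name(width: int, height: int, indices: bytes, palette: List[int],
              used_entries_only: bool) -> str:
    """Hypothetical custom-texture name: size + index-data hash + palette hash."""
    if used_entries_only:
        # Keep only the range of palette entries the index data can reach.
        palette = palette[min(indices):max(indices) + 1]
    palette_bytes = b"".join(c.to_bytes(4, "big") for c in palette)
    return f"tex_{width}x{height}_{short_hash(indices)}_{short_hash(palette_bytes)}"


indices = bytes([0, 1, 2, 3] * 4)                 # a 4x4 texture using entries 0 to 3
base = [0xFF0000FF, 0x00FF00FF, 0x0000FFFF, 0xFFFFFFFF]
palette_a = base + [0x00000000] * 252             # same visible colors ...
palette_b = base + [0x12345678] * 252             # ... different unused entries

for used_only in (False, True):
    same = (dump_name(4, 4, indices, palette_a, used_only)
            == dump_name(4, 4, indices, palette_b, used_only))
    print(used_only, same)
```

Hashing the full palette yields two names for what looks like one texture; hashing only the used entries yields one.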
zfreeze is a notable feature of the GameCube/Wii GPU with no real equivalent on modern PC GPUs. It can "freeze" the depth values of the pixels in a polygon to an arbitrary reference plane. It was intended to combat z-fighting, but ended up being used in a variety of ways by different games. While this sounds like something that should be fairly easy to emulate, it definitely isn't. A limited understanding of the feature, on top of the limits of what Dolphin can do with OpenGL and D3D, made it a nightmare even to work out how to approach it.
Tackling zfreeze has become personal for many developers. For years, it has taunted Dolphin as a seemingly impossible-to-emulate feature that breaks some very popular titles. Not even the software renderer had a working implementation! Many attempts were made to properly emulate it, hack it, or work around it so it would be less of a roadblock, but nothing succeeded.
The first partially successful attempt came from neobrain in 2012. His zfreeze branch actually got Rogue Leader's skybox to work in certain situations, but attempting to fix any of the other titles immediately broke Rogue Squadron II. He put the project on hold indefinitely in order to write hardware tests, but never got around to it and eventually lost interest in emulating the feature.
While his branch may have been left in the past, the desire to play Rogue Squadron 2 (and 3, once it started booting in Dolphin) never left Dolphin's userbase. One of the most frequently asked questions after every progress report was "Is Rogue Squadron playable yet?" Eventually, phire came up with a hack that at least made the Rogue Squadron games work correctly with zfreeze. It sacrificed compatibility with all other zfreeze titles and was never considered for an actual build, but it nonetheless planted the seeds of curiosity.
phire then set out to make a set of hacks that would handle every situation in which various titles used zfreeze. Along the way, he ran into several different uses for zfreeze.
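To make "freezing depth to a reference plane" concrete, the sketch below fits a depth plane z(x, y) = z0 + dzdx * (x - x0) + dzdy * (y - y0) through the three screen-space vertices of a reference triangle, then uses that plane instead of a decal's own interpolated depth. The assumption that the plane comes from a single earlier triangle, and all names here, are illustrative; this part of the report doesn't describe how Dolphin's eventual implementation works.

```python
from typing import NamedTuple, Tuple

Vertex = Tuple[float, float, float]  # screen-space (x, y, depth)


class DepthPlane(NamedTuple):
    x0: float
    y0: float
    z0: float
    dzdx: float
    dzdy: float

    def depth_at(self, x: float, y: float) -> float:
        return self.z0 + self.dzdx * (x - self.x0) + self.dzdy * (y - self.y0)


def plane_from_triangle(v0: Vertex, v1: Vertex, v2: Vertex) -> DepthPlane:
    """Solve for the depth slopes of the plane through three vertices (Cramer's rule)."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    dx1, dy1, dz1 = x1 - x0, y1 - y0, z1 - z0
    dx2, dy2, dz2 = x2 - x0, y2 - y0, z2 - z0
    det = dx1 * dy2 - dx2 * dy1
    if det == 0:
        raise ValueError("reference triangle is degenerate in screen space")
    dzdx = (dz1 * dy2 - dz2 * dy1) / det
    dzdy = (dx1 * dz2 - dx2 * dz1) / det
    return DepthPlane(x0, y0, z0, dzdx, dzdy)


# Reference triangle (e.g. a wall), then a decal pixel drawn over it with zfreeze.
plane = plane_from_triangle((0, 0, 0.5), (4, 0, 1.5), (0, 4, 0.0))
decal_pixel_depth = plane.depth_at(2, 2)  # the decal's own depth is ignored
print(plane.dzdx, plane.dzdy, decal_pixel_depth)
```

Because a decal pixel takes its depth from the same plane as the surface underneath, the two can compare equal instead of fighting over tiny rounding differences.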
Combating z-fighting on Decals
Without zfreeze, z-fighting rules the day.
Proper zfreeze allows all the decals to sit flat without z-fighting.