TL;DR: I created a patch called EmperorLauncher, which modifies Emperor: Battle for Dune to run well on modern systems, with:
High resolution support
Working online multiplayer with direct IP connection
Coop Campaign mode
You can download the patch here, and the source code is available on GitHub.
Me and a friend playing Coop Campaign in 4K
The rest of this blog post is a fairly technical explanation of how and why I made this thing.
What is Emperor: Battle for Dune?
Emperor: Battle for Dune is a 2001 realtime strategy game made by Westwood Studios, arguably the inventors of the RTS genre. It is a sequel to Dune 2000, itself a remake of sorts of Dune II, considered by many to be the original RTS. The Dune RTSes hold a special place in my heart, with Dune 2000 being the first PC game I ever bought, and my introduction to the Dune universe. Emperor followed up on Dune 2000, bringing 3D graphics, vastly improved UX, and an absolutely bangin' soundtrack. All that said, it isn't that well known these days. I suspect this is because it was outshone by Westwood's other RTS series, the juggernaut Command & Conquer.
Google trends, 2004-2010
What's wrong with it?
A lot. The years have not been kind to Emperor:
The game can't run at higher resolutions afforded by modern screens
Game simulation speed is uncapped in multiplayer, rendering it unplayably fast
Westwood Online (WOL) doesn't work anymore, so you can't play multiplayer except through LAN
You can't play the campaign in coop mode at all, because that was an online-only feature not supported over LAN
The installer included on the disk is broken
And finally, many visual effects are broken by the high framerates of modern PCs
Main menu in seizure mode
I love coop games, and I always wished more RTSes in particular had a coop mode. When I found out all these years later that Emperor had a coop mode that I'd never known about, and it was no longer playable - I knew in that moment, deep in my heart, that this was a cosmic injustice that must be put to rights. I also just had a hankerin' to do some reverse engineering, so...
How do we fix this mess?
I started off with a modest initial goal. The "main" exe of the game is a misdirection: Emperor.exe is a thin wrapper that runs the real game executable, Game.exe. But if we run Game.exe directly, nothing happens. So my initial goal was to make a replacement for Emperor.exe, so I could control the launch of Game.exe. Later on, once I have this control, I can use it to inject a DLL containing my patches. More details on that later.
At this point I hopped into IDA to see what else Emperor.exe was doing other than calling CreateProcess.
What is IDA?
IDA is the industry standard reverse engineering tool. It is an incredibly powerful tool that operates on a database of knowledge about your executable. At its core it is a disassembler (IDA - Interactive DisAssembler), which turns machine code into slightly more readable textual assembly language.
Part of a disassembled function from Emperor.exe
It is also able to go one step further than that, and decompile the assembly into mostly-compilable C code. It will need some help from you, though. At the beginning, it will not know the types of anything. All the datastructures and typedefs are completely gone, everything is just an integer or int*, functions have been inlined, or optimised out entirely, and everything has names like sub_402E80 or a2.
The same code, decompiled into C
As you browse through the code, you annotate things that you know. For example, in the function above I was able to look at the pointer that was being passed in, and see which offsets were being used. Since CreateProcessA is a known, documented Windows API function I was able to infer what was contained in some of those offsets. Using that knowledge, I was able to create a custom structure definition (ProcessRunData) with some fields filled in. Now I have a type for the parameter, so I can search for callers and annotate the type of the variable being passed to runCommand. And oh look, I'm using some of the fields in ProcessRunData with some other functions and variables, so now I know their type too. In this fashion, you could slowly flood-fill information through the whole binary until you had a full understanding of the whole program if you wanted to. That would of course be incredibly laborious for a game this size.
This is probably a good time to mention that I am not a skilled reverse engineer. I'm definitely not a professional, and this is my first real foray into reverse engineering a binary like this. Cosmic justice aside, the main reason for this project was learning about reverse engineering for fun. So all that said, I spent the next couple of evenings reverse engineering a curious set of string manipulation functions that turned out to just be std::ostringstream.
Behold, function names
Eventually I dug myself out of that rabbit-hole, and found out what Emperor.exe was doing that was so special. Before running Game.exe, it creates a mutex, and an anonymous file mapping handle - essentially just a chunk of allocated memory with a handle associated to it. It doesn't do much with the mutex, as far as I can tell just using it to make sure there is only one instance of the game running, but it does something odd with the file mapping. It launches Game.exe with bInheritHandles set to 1. This means the child process "inherits" our handles - it is able to use the same numeric value of the open file mapping handle, as though it had opened it itself. Emperor.exe then loads some data from the file Emperor.dat in the install directory, and does a bunch of manipulation to that data, presumably some sort of decryption. It then maps the mapping handle, and copies the decrypted data into the mapping.
So now we have some decrypted data in a mapping. The mapping handle is valid in both processes, so the child process can use the handle to retrieve the data that the launcher stored. But the child process doesn't know what the handle's value is. So the parent needs to send it somehow. Windows has an IPC message passing system where threads have Message Queues. This is used as the basis of window functionality - windows (HWNDs, actual window objects) use this queue to send and receive the messages that make them work.
The parent process tells the child the handle value by sending a message to the main thread of the child, using 0xBEEF as the custom message ID, because why not. This works because CreateProcessA will actually tell the parent process the thread ID of the main thread of the newly created child process. I didn't really want to reverse engineer the decryption code, so I created a dumping tool that I could sub in in place of Game.exe, which reads the data sent to it and dumps it to disk.
Not that kind of dumping tool
Turns out the data was "UIDATA,3DDATA,MAPS", three strings that get passed to a bunch of asset loading code, making sure the game cannot work without them. From that point it was simple enough to write some code to perform the sequence myself, and I was able to start Game.exe successfully.
Patch injection
Now we get to the real meat and potatoes of this project - injecting our custom patches into Game.exe. We can force a process to load and run our code by using the ol' CreateRemoteThread & LoadLibrary trick.
Excuse me, what trick is that?
CreateRemoteThread is a Windows API function that takes a "ThreadProc" function pointer, argument pointer, and process handle as arguments. The process handle is normally a handle representing the calling process, but not this time. When CreateRemoteThread is called, it creates a thread in the target process, which runs the passed function with the passed argument. LoadLibrary is a Windows API function that takes a string path, and loads the DLL located at that path into the process. DLLs on Windows can have a DllMain function, which will be called when the DLL is loaded.
A function suitable to be passed to CreateRemoteThread must take a single pointer argument. LoadLibrary is a function that takes a single pointer argument. Do you see where this horror show is going?
So the only remaining problem is that we need to get the path to our DLL into the memory space of our target process. Not a problem, we can use our friends VirtualAllocEx (which takes a process handle parameter), and WriteProcessMemory to allocate a buffer in the target process' memory, and copy our DLL path into it. We then grab the address of LoadLibrary and pass it, along with our newly created DLL path buffer, to CreateRemoteThread. A new thread gets created in the target process, and loads our DLL. The DllMain function present in that DLL runs, and boom! We have code execution. From here we are free to do whatever nefarious things we like.
Wait a minute, how do we know the address of LoadLibrary in the target process? Windows does have ASLR, so DLLs are loaded at random addresses for security reasons. You would think this would make it impossible for us to know where LoadLibrary is located in another process. Well, bizarrely, we can just use the address of LoadLibrary in the current process. Windows does randomise DLL base addresses, but it will randomise on first load, then try to use the same address in every process after that. Documentation on this is hard to find (if you do find something concrete, please let me know), and Raymond Chen does only say the kernel will "try" to use the same address, but in practice there seems to be a special rule for some basic DLLs, like Kernel32.dll (which contains LoadLibrary), so they are always loaded at the same address on a given boot, to enable procedures like this one. Even if it isn't documented, this is a well known technique, which is used in all sorts of things out in the wild, including, I'm sure, some Microsoft products - so by now it's not something they could afford to break.
So, if we start the process in a suspended state, then inject a dll, we get code execution in the target process before main is run. But what we really need is to modify existing functions in the Game.exe binary. How do we do that? In short, we use the detours library[1].
What is detours?
Detours is the real black magic of this world. It is the arcane invocation that dispenses with arbitrary social constructs, and cuts straight to the fundamental truth of the machine. Types are an illusion. Data structures don't exist. There are only functions. And detours patches functions.
Let's say we want to wrap a function from the standard library with some logging. In this example we'll use sendto. We want to replace all calls to sendto with our modified function, which does some logging, and then calls the real sendto function. Something like this:
We can grab a pointer to the original sendto function pretty easily:
Then comes the real magic. We can just edit the bytes of the original function, replacing the first instruction with the opcode for an unconditional jump to our patched function. It's a little bit more involved than just memcpy(&sendto, &jumpCode, sizeof(jumpCode)); though. Memory pages containing functions are not writable for security reasons, so we need to set it writable with VirtualProtect, edit the function, then set it back. We then need to call FlushInstructionCache, because otherwise we could end up with different instructions in cache vs memory which could cause a multitude of problems. And of course, we need to be sure that while we're doing all this, there isn't some other thread executing the code we're modifying. In our case we're executing our hooks before main runs, so we know there are no other threads to worry about.
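To make the "unconditional jump" concrete, here is a minimal, portable sketch of how the five patch bytes for a 32-bit x86 near jump can be computed. It works on a plain byte buffer rather than live code, so there is no VirtualProtect or FlushInstructionCache involved, and the names encodeJmpRel32 and patchEntry are made up for this illustration. It is not how Detours is implemented internally, just the arithmetic behind the idea.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode a 32-bit x86 near jump (opcode E9) placed at `from` that lands on `to`.
// The displacement is relative to the address of the *next* instruction,
// i.e. from + 5, and is stored little-endian.
std::array<std::uint8_t, 5> encodeJmpRel32(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t rel = to - (from + 5u);  // unsigned wrap-around handles backward jumps
    return {0xE9,
            static_cast<std::uint8_t>(rel & 0xFF),
            static_cast<std::uint8_t>((rel >> 8) & 0xFF),
            static_cast<std::uint8_t>((rel >> 16) & 0xFF),
            static_cast<std::uint8_t>((rel >> 24) & 0xFF)};
}

// Overwrite the first five bytes of a function image held in `code`.
// `base` is the (hypothetical) address at which code[0] would be loaded.
void patchEntry(std::vector<std::uint8_t>& code, std::uint32_t base, std::uint32_t hook) {
    assert(code.size() >= 5 && "function must be at least as long as the jump");
    const auto jmp = encodeJmpRel32(base, hook);
    for (std::size_t i = 0; i < jmp.size(); ++i) {
        code[i] = jmp[i];
    }
}
```

If you feed patchEntry a typical prologue such as 55 8B EC 83 EC 10 (push ebp; mov ebp, esp; sub esp, 0x10), the five-byte jump ends in the middle of the third instruction and leaves its last byte orphaned. That is exactly the kind of damage a hooking library has to account for.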
Ok, so now we've redirected the original function to our replacement, but there's one thing missing. If we want to wrap the original function, not just replace it, then we need a pointer we can call to run the original, unpatched code. But we've just stomped all over it by replacing random instructions with an unconditional jump. This is where the real genius of detours comes in. The library will take note of the instructions it ruined, copy them to a newly allocated chunk of memory somewhere, and set up a little wrapper function. That function contains the copied instructions, followed by a jump instruction that jumps back into the original code, just after the last broken instruction. As you can imagine, there is a lot of subtlety here, especially when dealing with an instruction set like x86, where not all opcodes are the same length. The end result is you can do something like this:
sendto_orig now points at the magic generated chunk, so it behaves exactly like the original function, while the original sendto address is now set up to jump straight into our wrapper, leaving us free to call the original function from inside our wrapper.
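The wrap-and-forward pattern can be modelled without touching machine code at all. The sketch below is a portable stand-in, not Detours: instead of rewriting the target's first instructions, it swaps a function pointer that all callers go through, which gives the same "wrapper calls original" shape. The simplified SendFn signature and every name here (realSend, g_send, g_sendOrig, loggingSend, installHook) are hypothetical and chosen only for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified signature standing in for sendto.
using SendFn = int (*)(const char* data, int len);

// Stand-in for the real function: pretends every byte was sent.
int realSend(const char* /*data*/, int len) { return len; }

// All callers go through this pointer, so redirecting it redirects them.
SendFn g_send = &realSend;

// Filled in when the hook is installed; plays the role of sendto_orig.
SendFn g_sendOrig = nullptr;

// Collected log lines, so the effect of the wrapper can be inspected.
std::vector<std::string> g_log;

// Wrapper: record the call, then forward to the original.
int loggingSend(const char* data, int len) {
    assert(g_sendOrig != nullptr && "hook must be installed first");
    g_log.push_back("send " + std::to_string(len) + " bytes");
    return g_sendOrig(data, len);
}

// Point `slot` at the replacement and hand back what it used to point at.
SendFn installHook(SendFn& slot, SendFn replacement) {
    SendFn original = slot;
    slot = replacement;
    return original;
}
```

Usage would be `g_sendOrig = installHook(g_send, &loggingSend);`, after which every call through g_send is logged and still reaches realSend. The big difference with a real inline hook is that this only catches callers that use the pointer, whereas patching the function body catches everyone.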
Now I can take the addresses of functions that I found and figured out in IDA, and wrap / replace them with detours. During my miscellaneous poking around in IDA, I noticed several calls that looked like some sort of debug logging function. When I looked at the function being called though, it was empty, just a return instruction. I hooked up a detour which forwarded the log message on to vprintf, but no luck, the process just crashes - a segfault inside vprintf.
The problem with this function is that it's not one function, it's many functions. My guess is that in the original source code, there were a bunch of debugging functions that were behind #ifdef DEBUG, and in release builds they were replaced with empty functions. This, combined with things like empty virtual functions to satisfy inheritance, means the final binaries end up with a lot of empty functions. They all compile to nothing but a single ret (return) instruction, and then the linker says "oh look, I have duplicated functions" and merges them into one location. Unfortunately, one of these empty functions is the debug logger.
My first attempt at solving this was to use a heuristic to detect if the parameters I've got look like a log string. The heuristic is pretty simple: if we interpret the first argument as a pointer to char, then we iterate a few bytes and check if those bytes are ASCII printable characters. If so - great, we have a log call, we can forward our parameters to vprintf. If not, then do nothing. There's also the obvious problem here that we can get all sorts of random values passed in here as a first parameter, and they're not all valid pointers at all, let alone pointers to null terminated strings. So, there's a pretty high chance we're going to segfault. But that's ok. Windows handles segfaults by throwing an SEH exception. We can catch and ignore the exception, then presume the call is not a logging call. Surprisingly, this actually worked pretty well.
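A rough sketch of that kind of heuristic, under some loud assumptions: the function name, the probe length of 8 and the minimum of 4 printable characters are all invented for illustration, and this portable version takes an explicit buffer size instead of relying on SEH to survive bad pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Guess whether `data` is the start of a human-readable log string by
// inspecting at most `probe` bytes. `size` is how many bytes may safely be read.
bool looksLikeLogString(const char* data, std::size_t size, std::size_t probe = 8) {
    assert(probe > 0);
    if (data == nullptr || size == 0) return false;

    const std::size_t limit = std::min(size, probe);
    std::size_t printableSeen = 0;
    for (std::size_t i = 0; i < limit; ++i) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        if (c == '\0') break;  // end of string reached early
        const bool printable =
            (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
        if (!printable) return false;  // binary data, not text
        ++printableSeen;
    }
    // Very short strings are too ambiguous to trust.
    return printableSeen >= 4;
}
```

Raising the minimum length trades false positives for false negatives, which is the same dial I describe turning later on.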
I wonder how long it's been since someone last read these log lines
That was cool, but not really enough. There were still false positives and false negatives, and false positives could still cause the game to crash. If I wanted to fix this for real, then I'd need to get in there and annotate every. single. call. I could give myself a bit of a head start though. I tightened up my heuristic to give no false positives (at the expense of some false negatives), and logged call sites by using the AddressOfReturnAddress intrinsic inside my detour.
What I really wanted was to separate the logging calls into a different function from the other empty function calls. Actually, two functions, because there were two variants of the logging function. IDA has a patching mode that allows you to make binary changes to the executable under analysis, without applying them to the real file on disk. It then uses those patches during disassembly and decompilation, so you get a virtualised view of how your patches will affect the binary. You can also dump the patches to a file, or apply them to the original binary.
The original empty function had a bunch of zeroed out padding bytes immediately following it, so I could patch in a few empty functions in that padding space by inserting ret instructions. Then I patch the actual logging calls to call one of the new empty functions, and later on I can detour that function to reenable logging. With an incomplete list of addresses from my heuristic dumper, I could use IDA's built-in Python scripting to patch known good call sites.
IDAPython script to patch call sites with an offset
My heuristic worked pretty well for most of the call sites, but it was not all of them by a long shot. I wrote another python script to find unpatched call sites and apply some more heuristics, like looking for patterns of pushing a string constant before the call instruction. These got me another bunch of calls, but in the end I still had several hundred call sites that I just had to manually annotate. It took a few excruciatingly boring hours, but I got it done. I then exported the patch data, reformatted it as a C++ array, and applied the patch at runtime.
The reason I spent so long making sure the debug log was working so well, was that I had an intuition that it would end up being enormously helpful during the rest of the project. Actually, that's a lie, it was just a fun technical challenge and made me feel like a digital archaeologist. But hey - I had you fooled with that excuse right? Jokes aside, it really did help to have the original logs. For example, when I was working on getting multiplayer over WOL (Westwood Online) working, the client was hanging some time after receiving the game start command. By looking at the debug log, I could see a failed assert message: "MyId == INVALID_ID". I then searched IDA's strings view, jumped into the function where the assert was failing, and realised that it was the function which handles receipt of an SC_MESSAGE_YOUR_DETAILS message. A few lines above I saw that we had logged successfully receiving an SC_MESSAGE_YOUR_DETAILS message already. This tipped me off to look at my wireshark dumps, and I noticed that I was incorrectly sending a GAMEOPT command containing an SC_MESSAGE_YOUR_DETAILS message to all connected players instead of just the target. Without the debug log prompting me, I have no idea how long I would have taken to notice.
Patching graphics
Contemporary graphic design from the D3D7 era
Emperor was built with the then-current graphics API Direct3D 7. Broadly speaking, DirectX 8/9 is around when graphics APIs started to vaguely approximate the kind of feature set we expect today, moving away from a fixed-function pipeline and towards a modern shader-based world. DirectX 9 in particular was still occasionally used up until quite recently, with some still-popular games still using it (for example CS:GO, which was only retired in 2023). I won't get into the weeds on graphics APIs here, but suffice it to say that Direct3D 7 support in modern Windows is... patchy at best.
High resolution windows
One of the major issues is the resolution limit. From what I can tell, Direct3D7 on modern systems is implemented as some sort of wrapper layer that redirects to a more modern version of DirectX, like 9 or 11. Somewhere in that wrapper layer, the maximum texture size is being limited to 2048. Luckily, I was able to pull in some code from UCyborg's LegacyD3DResolutionHack patch, which solved that problem for me[2].
There was also another problem. The game cannot handle running at anything but a 4:3 ratio. Rendering works, but the UI is completely broken, like it's zoomed in way too far. Also the in-game mouse rendering has a broken offset, which varies depending on your distance from the centre of the screen[3].
Mouse offset. The system cursor is where clicks really register.
So, we need some 4:3 letterboxing. This was actually surprisingly difficult. I tried just telling the game to render fullscreen at 2880x2160 (a 4:3 slice out of 4K), but it didn't like using arbitrary resolutions in fullscreen mode like that. The game does have a windowed mode though, accessible by passing -w in the command line args, and arbitrary resolutions are not a problem when windowed. But then we get an ugly floating window with a border, that doesn't cover up the taskbar. I tried a bunch of bad ideas, but what worked in the end was a patch to remove the border style, and reparent the game window on top of a fullscreen black window. I also added mouse capture, so edge scrolling isn't broken if you have multiple monitors, or want to play in windowed mode.
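The arithmetic for picking the letterboxed area is simple enough to sketch. This is an illustrative helper, not code from the patch, and the Rect type and fitFourByThree name are invented here: it finds the largest 4:3 rectangle that fits on the screen and centres it.

```cpp
#include <cassert>

struct Rect {
    int x, y, width, height;
};

// Largest 4:3 rectangle that fits inside the screen, centred on it.
Rect fitFourByThree(int screenW, int screenH) {
    assert(screenW > 0 && screenH > 0);
    int w = screenW;
    int h = screenW * 3 / 4;
    if (h > screenH) {  // screen is wider than 4:3, so height is the limit
        h = screenH;
        w = screenH * 4 / 3;
    }
    return {(screenW - w) / 2, (screenH - h) / 2, w, h};
}
```

For a 3840x2160 display this gives a 2880x2160 game area with a 480 pixel black bar on each side.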
To limit the framerate to 60 FPS, we need to detect the current framerate, calculate the amount we need to delay, and then sleep the thread for that amount of time. To do that, we need to patch a function that is called once per frame, preferably at the very end of the frame. IDirect3DDevice7::EndScene fits the bill perfectly. But this was a little tricky, because IDirect3DDevice7::EndScene is not exported directly - it's a member function of a "lightweight COM" object created through a chain of COM method calls.
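The delay calculation itself is tiny. Here is a minimal sketch using std::chrono, assuming a simple "sleep for whatever is left of the frame budget" strategy. The names frameDelay and FrameLimiter are hypothetical, and a real limiter might want something more precise than sleep_for.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;
using Micros = std::chrono::microseconds;

// How long to wait so that a frame which took `elapsed` fills its time budget.
Micros frameDelay(Micros elapsed, int targetFps) {
    assert(targetFps > 0);
    const Micros budget(1000000 / targetFps);
    return elapsed >= budget ? Micros(0) : budget - elapsed;
}

// Call endFrame() once at the very end of every frame,
// for example from a hooked EndScene.
class FrameLimiter {
public:
    explicit FrameLimiter(int targetFps) : fps_(targetFps), last_(Clock::now()) {}

    void endFrame() {
        const auto elapsed = std::chrono::duration_cast<Micros>(Clock::now() - last_);
        std::this_thread::sleep_for(frameDelay(elapsed, fps_));
        last_ = Clock::now();  // the next frame's budget starts after the sleep
    }

private:
    int fps_;
    Clock::time_point last_;
};
```

Keeping the arithmetic in a free function makes it easy to check on its own, separately from the part that actually sleeps.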
What is lightweight COM?
COM is many things, but among other things it is a method of representing objects that allows us to have ABI stability. Lightweight COM (also known as nano COM) is a subset of COM concepts that focuses on that representation and ignores the rest.
What is ABI stability?
To understand ABI stability, let's take a step back and discuss API stability. API stability means that you provide a source-code level API that does not change, so consumers of that API can upgrade to new versions without having to change their code. ABI stability is taking this concept one step further for native code, and providing guarantees that consumers can upgrade your library without recompiling.
For example, if I add a new field to a struct that is part of my public interface, the API has not changed. Any code written against the old version can be successfully recompiled against the new version, and all is well. But an old binary, compiled against the old version will not work anymore. The compiler used information about the size of the struct, and the offsets of its members during compilation, and that information is now wrong. This binary interface is known as the ABI (Application Binary Interface). ABI stability is the practice of crafting updates to native code libraries without causing this kind of breakage.
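A small sketch of that breakage, with made-up struct names. UnitV1 and UnitV2 stand for two versions of the same public struct, and readSpeedLikeOldBinary mimics what code compiled against the old layout effectively does: read a member at the offset it had at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Version 1 of a hypothetical public struct.
struct UnitV1 {
    int health;
    int speed;
};

// Version 2 adds a field in the middle. Source code that says `unit.speed`
// still compiles, but the compiled offset of `speed` has moved.
struct UnitV2 {
    int health;
    int armour;
    int speed;
};

// What an old binary effectively does: read `speed` at the V1 offset,
// regardless of which version of the struct it is actually handed.
int readSpeedLikeOldBinary(const void* unit) {
    assert(unit != nullptr);
    const char* bytes = static_cast<const char*>(unit);
    int value = 0;
    std::memcpy(&value, bytes + offsetof(UnitV1, speed), sizeof(value));
    return value;
}
```

Handing a UnitV2 to the old reader returns the armour value instead of the speed, with no compiler error or crash to warn you.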
Microsoft doesn't want to force everyone to recompile their DirectX apps every time they add a field to a class, so the COM object representation takes care of that. In essence, a COM object is a struct that contains, as its first member, a pointer to a table of functions (known as a vtable). Something like this:
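A minimal sketch of that layout, written C-style with explicit function pointers. The Device and DeviceVtbl names and the two methods are simplified stand-ins: a real IDirect3DDevice7 vtable starts with the IUnknown methods (QueryInterface, AddRef, Release) and has many more slots.

```cpp
#include <cassert>

struct Device;  // the "object"

// Table of function pointers. Every method takes the object as its first argument.
struct DeviceVtbl {
    int (*BeginScene)(Device* self);
    int (*EndScene)(Device* self);
};

// The object itself: the vtable pointer must be the first member.
// Anything after it is private to the implementation and free to change.
struct Device {
    const DeviceVtbl* vtbl;
    int framesRendered;
};

int Device_BeginScene(Device* /*self*/) { return 0; }
int Device_EndScene(Device* self) { return ++self->framesRendered; }

// One shared table for all Device objects.
const DeviceVtbl g_deviceVtbl = {&Device_BeginScene, &Device_EndScene};

// A consumer only needs the object pointer and the slot order to make a call.
int callEndScene(Device* dev) {
    assert(dev != nullptr && dev->vtbl != nullptr);
    return dev->vtbl->EndScene(dev);
}
```

Reading `dev->vtbl->EndScene` without calling it is also how you get hold of a method's address once you have an object in hand.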
When you're actually implementing a COM object in C++, Microsoft's MSVC compiler gives you a bit of help, but under the hood this is what you get. You could also implement a COM object in plain C, and in that case you would need to handle your layout manually. Under COM, you never access the members of a class directly, the consumer of your class only ever sees your vtable and a pointer to your class. This way, consumers never depend on the exact size or layout of the underlying struct, so you're free to change it.
COM allows a single object to provide multiple interfaces, think of this like normal class inheritance, but with some extra possibilities. Each interface has its own base pointer, with its own vtable. You switch interface not by a simple cast, but by using a special QueryInterface function. There is also some reference counting going on, but that's not important to this discussion.
To get a pointer to the IDirect3DDevice7::EndScene function, the application has to:
Call DirectDrawCreateEx to get an IDirectDraw7*, then query it for the IDirect3D7 interface to get an IDirect3D7*
Call IDirect3D7::CreateDevice to get an IDirect3DDevice7*
Grab the function pointer from the vtable on the IDirect3DDevice7
So I'm confronted with a dilemma. I want to get the function pointer, but in order to do so I need to call a bunch of Direct3D functions that would create complicated objects, potentially interfering with the application. But maybe...
... maybe we could let the game create the objects
So in the end, what we do is the following:
Patch DirectDrawCreateEx on app start
When DirectDrawCreateEx is called for the first time, before returning we use the IDirect3D7 interface of the object it just created to grab a pointer to IDirect3D7::CreateDevice, and patch that
When IDirect3D7::CreateDevice is called for the first time, before returning we use the IDirect3DDevice7 we just created to grab a pointer to IDirect3DDevice7::EndScene and patch it to insert a frame rate limit.
Patching networking
It's got a 28.8 BPS modem
The next job is to get multiplayer working. What we want is a very basic version - no fancy lobbies, hosted servers or clan systems. Just forward ports and then copy/paste your friends IP to connect. Classic style, and most importantly requires no infrastructure so it won't break if I lose interest in maintaining it.
There are two roads we can go down here: patching LAN mode, or WOL (Westwood Online - the game's defunct online service). LAN mode is actually working, however it is not ideal for modern players. It depends on sending UDP broadcast packets to announce servers, a commonly used trick in old LAN games. The idea is, when you host a game, you broadcast a special packet to all hosts on your LAN subnet, essentially announcing "I am running a game server". That lets the game show a list of lobbies in the LAN menu, so players can connect without having to open a cmd window and call out their IP address to each other.
The downside is that, for obvious reasons, broadcast does not work online. And as the game has no way to specify an ip to connect to manually, there is no way to play with a friend online using this mode. I did initially work on allowing online play through LAN mode by patching the LAN chat to allow you to specify an IP to connect to, but once I realised there was a WOL-exclusive coop campaign mode I abandoned that approach.
There are two components to getting WOL working. We need to stand up a fake WOL master server so the game can figure out where to connect and what kind of game to start, and then we need to do some proxying of the game packets to make it work over a direct IP connection. Surprisingly enough, there is a master server running at xwis.net. From what I can tell, it's fan run, and at time of writing they have access to the original DNS entries used by the game (servserv.westwood.com). Emperor is not really working out of the box on xwis though, so despite them mentioning Emperor on their homepage, I think it's really only used for other Westwood games in the Command & Conquer series. Still, it's enough to make and join a lobby so we'll start with the packet proxying and come back to the master server issue later.
The packets must flow
You're probably wondering "why the hell is he talking about proxying packets, I thought he wanted a direct connection". Well, I do. But there's a problem. Emperor uses a peer-to-peer networking model. Pairs (every pair? not sure) of players open direct connections to each other. Critically, they open two connections for each pair: one from A->B and one from B->A. This means every client in the game needs to have open ports so they can accept connections. Worse still, the game randomly chooses which ports to listen on. That's really not gonna fly, unfortunately.
Originally, the WOL master server was accompanied by a "mangler" server. "Mangler" here seems to refer to mangling packets for NAT punching, a technique for allowing connections between two clients, both of whom are behind a NAT so direct connections are not normally available. This is unreliable at best, but more to the point, it needs an accessible server to coordinate the connection and the original is long gone. The game connection was just hanging at this stage, waiting forever for the mangler server to respond, so I patched out the mangler call entirely.
Now we come to the real problem: the server will assign port ranges to each of the clients, send the ips and port ranges of each client to all the other clients, and they will attempt to open connections to one another. The meta communication about client list and port ranges takes place over IRC (yes, actual IRC, more on that later) hosted on the master server.
I solved this by completely co-opting all of winsock (winsock is the windows implementation of network sockets). I intercept all the required functions, and tunnel all connections through a single client->server network connection. So when a client wants to send a message to the server, or even to another client, it is intercepted, wrapped in a header and pushed through the single network connection. There is a thread running on the server dedicated to receiving messages, and dispatching them to the appropriate destination. There are some subtleties around determining the appropriate destination based on port range, faking the correct sender ip, etc, but in the end it works quite well. The game still believes it's operating in a peer to peer fashion, but we get the benefit of being able to direct connect with only the server host needing to worry about their network config.
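To give a feel for the "wrapped in a header" part, here is a minimal framing sketch. The header layout (destination IPv4 address, destination port, payload length, all big-endian) is an assumption made up for this example and is not the wire format the launcher actually uses. The same goes for the names TunnelHeader, wrapPacket and unwrapPacket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical header: where the packet was originally headed, and its size.
struct TunnelHeader {
    std::uint32_t destIp;    // IPv4 address the game tried to reach
    std::uint16_t destPort;  // port the game tried to reach
    std::uint16_t length;    // payload bytes that follow the header
};

constexpr std::size_t kHeaderSize = 8;

// Prefix `payload` with a big-endian header so it can share one connection.
std::vector<std::uint8_t> wrapPacket(std::uint32_t ip, std::uint16_t port,
                                     const std::vector<std::uint8_t>& payload) {
    assert(payload.size() <= 0xFFFF);
    const auto len = static_cast<std::uint16_t>(payload.size());
    std::vector<std::uint8_t> out = {
        static_cast<std::uint8_t>(ip >> 24),  static_cast<std::uint8_t>(ip >> 16),
        static_cast<std::uint8_t>(ip >> 8),   static_cast<std::uint8_t>(ip),
        static_cast<std::uint8_t>(port >> 8), static_cast<std::uint8_t>(port),
        static_cast<std::uint8_t>(len >> 8),  static_cast<std::uint8_t>(len)};
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Parse one framed packet from the front of `stream`. Returns false if the
// stream does not yet hold a complete frame; otherwise fills the outputs and
// erases the consumed bytes from `stream`.
bool unwrapPacket(std::vector<std::uint8_t>& stream, TunnelHeader& header,
                  std::vector<std::uint8_t>& payload) {
    if (stream.size() < kHeaderSize) return false;
    TunnelHeader h;
    h.destIp = (std::uint32_t(stream[0]) << 24) | (std::uint32_t(stream[1]) << 16) |
               (std::uint32_t(stream[2]) << 8) | std::uint32_t(stream[3]);
    h.destPort = static_cast<std::uint16_t>((stream[4] << 8) | stream[5]);
    h.length = static_cast<std::uint16_t>((stream[6] << 8) | stream[7]);

    const std::size_t total = kHeaderSize + h.length;
    if (stream.size() < total) return false;

    const auto bodyBegin = stream.begin() + static_cast<std::ptrdiff_t>(kHeaderSize);
    const auto frameEnd = stream.begin() + static_cast<std::ptrdiff_t>(total);
    header = h;
    payload.assign(bodyBegin, frameEnd);
    stream.erase(stream.begin(), frameEnd);
    return true;
}
```

The length field is what lets several game packets travel back to back over one stream and still be split apart on the other side, where the dispatcher can route each one by its destination.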
With all of that set up, I was able to start and join probably the first Coop Campaign game in over a decade.
Atreides tutorial mission, immediately followed by a victory dance around my apartment
The part where I write an IRC server for some reason
As mentioned before, the WOL master server is really just a glorified IRC server. There is some customisation though, so luckily the xwis server that's still running served as a great example to work from. There is even an open source WOL server implementation for reference as well.
But wait a second, why write a server at all if there's one available online already? Well, because it won't be forever. My goal with this project is partly about just being able to play coop with my friend, yes, but it's also about cultural preservation. I want to craft a collection of bytes that ensures anyone who wants to play this game can do so, forever. If I rely on some server being up, I can't do that. As for why I didn't just use the pvpgn open source code - that project has very different goals to me. I want to provide the bare minimum needed to get a multiplayer game running, not run a competitive play community.
WOL is a weird mix of standard IRC stuff and custom bits. Game lobbies are special channels, they use standard IRC topics for game lobby info, but then use PAGE instead of PRIVMSG for sending messages in lobby chat, and synchronise game settings with GAMEOPT messages whose content isn't even ASCII. In the end, implementing a basic WOL server wasn't too complicated. It took some trial and error, and it definitely isn't robust if you wander off the happy path, but it works.
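Whatever the custom commands are, they still arrive in the classic IRC line format, so the first thing a server needs is a line parser. Below is a minimal sketch of one for the standard "[:prefix] COMMAND params :trailing" shape. It covers only the generic IRC syntax and makes no claims about WOL-specific payloads. The IrcMessage and parseIrcLine names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct IrcMessage {
    std::string prefix;               // optional sender, without the leading ':'
    std::string command;              // e.g. PRIVMSG or PAGE
    std::vector<std::string> params;  // middle params, then the trailing one
};

// Parse a single line (without the CR LF terminator) in the classic format:
//   [:prefix] COMMAND param1 param2 :trailing text with spaces
IrcMessage parseIrcLine(const std::string& line) {
    assert(line.find('\n') == std::string::npos && "expects exactly one line");
    IrcMessage msg;
    std::size_t pos = 0;

    // Skip spaces, then return the next space-delimited token.
    auto nextToken = [&]() {
        while (pos < line.size() && line[pos] == ' ') ++pos;
        const std::size_t start = pos;
        while (pos < line.size() && line[pos] != ' ') ++pos;
        return line.substr(start, pos - start);
    };

    if (!line.empty() && line[0] == ':') {
        msg.prefix = nextToken().substr(1);  // drop the leading ':'
    }
    msg.command = nextToken();

    while (true) {
        while (pos < line.size() && line[pos] == ' ') ++pos;
        if (pos >= line.size()) break;
        if (line[pos] == ':') {  // trailing parameter: the rest of the line
            msg.params.push_back(line.substr(pos + 1));
            break;
        }
        msg.params.push_back(nextToken());
    }
    return msg;
}
```

From there a server is mostly a big switch on `command`, with the happy path filled in first.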
Packaging
Ready for UPS
Replacing the installer
As I mentioned at the beginning of this post, the original installer for Emperor is broken. Westwood did release a workaround installer that you can use by copying the contents of the install CD to your hard drive and overwriting the setup exe, but honestly that's a mess. I'd prefer to provide a nice simple tool that can handle it for the user, and also deal with patching the game up to v1.09, the last official patch.
The basic install is pretty straightforward, just copying some files off the CD, and extracting some others from a .cab file on the CD. Cab files are just a basic archive format, like zip, and there is an interface built into Windows for extracting them.
The hard part was patching to v1.09. I was hoping I could just grab a copy of the patch file, EM109EN.EXE, right click -> extract with 7zip and be presented with some shiny new up to date binaries, but alas no. The next thing I tried was checking for windows resources. Windows resources are a standard method of embedding binary data into an executable file on windows, supported by the compiler and build system. Here, I actually got a hit. I didn't have any filenames, so I couldn't tell their types, but there were a few, mostly tiny files on the order of a few hundred bytes. One of the files was a few hundred kilobytes though. I opened it up in a hex editor and saw something I didn't expect.
This program cannot be run in DOS mode
That's a windows executable file header![4] So we have an executable file embedded in the patch exe. I tried to start analysing the new binary, but none of my tools worked. "file", the unix magic number file identification tool, couldn't tell what it was either. Clearly something wasn't right about this file. If I'd paid more attention and done some research on the windows PE format, I would have seen that the "MZ" bytes that should be at the very beginning of a PE file were offset by four bytes, and that was all that was wrong. But for now I decided to go back to the original patch binary and see if I could figure out what it was doing with the executable.
Loading & running the DLL
I went searching for uses of LoadResource, a standard windows API that a program can use to get a pointer to a resource it has embedded, and I found this code. It is extracting the binary resource we found to a temporary file, then loading the temp file as a .DLL and running a function from inside it. We can also see here why we weren't able to load the executable before - the first four bytes of the resource are storing the size of the file, then the rest is the file itself. I'm not quite sure why they did this, as they could have just used the SizeofResource function instead.
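The unwrapping step is simple enough to sketch in a few lines of Python. This is my own illustration rather than the patcher's code, and it assumes the size prefix is a little-endian unsigned 32-bit integer, which is the usual layout on x86 windows but isn't something I verified beyond this one file.

```python
import struct


def unwrap_resource(blob: bytes) -> bytes:
    """Strip the 4-byte length prefix and return the embedded executable."""
    if len(blob) < 4:
        raise ValueError("resource too short for a length prefix")
    # Assumption: little-endian unsigned 32-bit size.
    (size,) = struct.unpack_from("<I", blob, 0)
    payload = blob[4:4 + size]
    if len(payload) != size:
        raise ValueError("length prefix is larger than the resource")
    # A real executable image starts with the "MZ" signature.
    if payload[:2] != b"MZ":
        raise ValueError("payload does not start with the MZ signature")
    return payload
```

Writing the returned bytes out to a file gives something that tools like "file" should recognise, since the "MZ" signature is back at offset zero.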
I also noticed when running the patch program in a debugger, that all the actual file changes in the install directory were happening in the function pulled out of the embedded DLL. I had a look at the functions exported by the DLL to check out their names - the function that EM109EN.EXE calls is named "RTPatch32@12". RTPatch sounded like it might be some kind of patching tool, so I went searching and sure enough, it is.
I then went searching for any third party tools for working with RTPatch, and found the myRTP tool by Luigi Auriemma. Checking his source code, he straight up just loads and runs the DLL, with some arguments to tell it where to find the game. I tried the same, but noticed that it was just ignoring the directory I was passing in and fetching the Emperor install directory straight from the registry. I was able to fake the registry keys that it expected, pass in the same command string that EM109EN.EXE did, and that actually worked. This is kind of funny, as the embedded DLL is 168 KiB, while EM109EN.EXE is 5.5 MiB, meaning the code that actually does the patching is only about 3% of the file size, and presumably most of the rest is patch data. With that much data in the patch anyway, it makes me wonder if it was worth all the effort of using a binary diffing tool rather than just shipping the changed files whole.
Westwood Online Shared Internet Components
Loading the shared components was trickier than it seemed
I knew we would get back to COM eventually. When you use the original installer, there are two components - Emperor, and something called the "Westwood Online Shared Internet Components". Without this optional component installed, WOL doesn't work. Of course, I want to package this into my installer so the user doesn't need to worry about getting it set up.
The problem is that this component seems to live up to its name - it really is shared. I suppose the same installation of the shared internet components could be used by multiple Westwood games with WOL support. It's not installed in the Emperor directory, but in its own folder somewhere, and at first I didn't know how Emperor was finding it. I thought it might be through something saved in the registry? Well, it turns out that I was correct, but not in the way I expected.
The main thing installed is WOLAPI.DLL, and it is a COM class library. Emperor loads code from the library by using CoCreateInstance to instantiate the various COM objects contained in WOLAPI.DLL, which were registered at install time.
Ok for real this time, WTF is COM?
If you didn't read the previous section "What is lightweight COM?", I suggest you go back and read it now.
Again, I'm not going to give you a full explanation of all the subtleties of COM, mostly because I'm certain I'm not qualified to do so. But I can explain the process of registering a COM class library DLL.
Let's say we have a chunk of functionality implemented as COM objects, and we want to make that code available as a library to be used by other processes. How do they find our DLL? How do they identify the objects they want to create? COM registration is a method for solving this problem.
In COM, in addition to its human readable name, each class is identified by a unique identifier known as a GUID (or more specifically, a class ID or CLSID). There is a standard COM function, CoCreateInstance, which is used to create instances of classes. You identify the class you want to create by passing in the CLSID of the class, and it gives you back a pointer to a new instance of that class. But where does it store the list of available classes?
In the windows registry, under the key HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID. So COM registration is essentially taking a DLL full of COM classes, and saving the CLSIDs of those classes, along with a path to the DLL implementing them into the windows registry at a system wide level.
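Here is a toy Python model of that lookup, just to make the shape of it concrete. The registry is faked with a plain dict, and the CLSID and DLL path in the usage note are placeholders I made up, not the real WOLAPI values. The one real detail is the key layout: for an in-process server, the DLL path is stored as the default value of an InprocServer32 subkey under the class's CLSID.

```python
import uuid
from typing import Dict

CLSID_ROOT = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Classes\\CLSID"


def inproc_server_key(clsid: uuid.UUID) -> str:
    """Registry key whose default value is the path of the implementing DLL."""
    return CLSID_ROOT + "\\{" + str(clsid).upper() + "}\\InprocServer32"


def register_class(registry: Dict[str, str], clsid: uuid.UUID, dll_path: str) -> None:
    """Model of registration: remember which DLL implements this class."""
    registry[inproc_server_key(clsid)] = dll_path


def find_server(registry: Dict[str, str], clsid: uuid.UUID) -> str:
    """Model of the lookup done on class creation: CLSID in, DLL path out."""
    try:
        return registry[inproc_server_key(clsid)]
    except KeyError:
        raise LookupError("class not registered: " + str(clsid)) from None
```

Registering a placeholder class with `register_class(reg, uuid.UUID("12345678-1234-1234-1234-123456789abc"), r"C:\Example\EXAMPLE.DLL")` and then calling `find_server` with the same CLSID returns the DLL path, while an unregistered CLSID raises `LookupError`, which is roughly the situation Emperor is in when the shared components aren't installed.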
I've glossed over a lot of complexity here - for example COM has "out of process" servers, where the class is provided by an implementation that lives in a separate process, with automatic marshalling across process boundaries. But the part that matters here is just about telling CoCreateInstance where to find the WOL interfaces.
Ok, so can we just copy WOLAPI.DLL into the install folder and do normal COM registration? Well, we could but it's not ideal. COM registration writes to the HKEY_LOCAL_MACHINE key, which requires admin access. Also, if at all possible I would like to keep things scoped to just affect our game process. Registering COM objects system wide seems unnecessarily messy.
As far as I can tell, it's not possible to register a class library for just one process[5], but it is possible to register a class for just one user by redirecting the registry, and then using the OaEnablePerUserTLibRegistration function, so that's what I did.
Launcher UI
The last thing left to do was to make a basic launcher UI where players could type an ip to connect to, and tweak some basic settings.
UI design is my passion
In keeping with the very microsofty and quite retro theme of all the tech presented so far, I decided to use plain old win32. It was my first time making a UI with raw win32 controls, and I gotta say - it's kinda rough. I can see why people generally don't use it anymore. But for something so simple and static like this, it was fine.
Conclusion
So with that, my goals were pretty much achieved. This blog post is getting long so I won't go into details about the last few bits of polish. If you made it this far, thanks for listening to my ramblings, and if you try playing it, I hope you enjoy the game.
•••
1: Detours actually takes care of the CreateRemoteThread & LoadLibrary trick for us also.
2: It's actually a bit of a crazy hack, and probably not very resilient to change. What they're doing is searching for the bytes of a comparison to 2048, like this:
Example from DIRECT3DDEVICEI::SetRenderTarget
They just search the whole binary for the pattern B8 00 08 00 00 39, and replace the 00 08 00 00 part (little endian / 2048 in decimal) with FF FF FF FF which makes the check an unconditional pass. B8 at the start is the mov opcode, and the 39 at the end is the start of the cmp opcode that follows. I had a scan through d3dim700.dll on my system, and it seemed like there weren't any false positives. Still though, this is not a super nice solution, and could be broken if Microsoft does ever decide to ship an update to Direct3D 7 for some reason. I would actually like to use a complete reimplementation of the D3D7 API on top of a more modern D3D, like dxwrapper, but I tried a few, and none of them really worked well with Emperor.
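The search and replace described above can be sketched in Python like this. It is only an illustration of the technique on a copy of some bytes, not the code of the tool being described, and the sample data in the usage note is invented rather than taken from d3dim700.dll.

```python
from typing import Tuple

# mov eax, 2048 followed by the first byte of a cmp instruction
PATTERN = bytes.fromhex("B8 00 08 00 00 39")
# same instruction bytes, with the immediate replaced by FF FF FF FF
PATCHED = bytes.fromhex("B8 FF FF FF FF 39")


def patch_limit_checks(image: bytes) -> Tuple[bytes, int]:
    """Replace every occurrence of PATTERN; return (patched image, hit count)."""
    out = bytearray(image)
    hits = 0
    pos = out.find(PATTERN)
    while pos != -1:
        # Same length in and out, so no other offsets in the image move.
        out[pos:pos + len(PATTERN)] = PATCHED
        hits += 1
        pos = out.find(PATTERN, pos + len(PATTERN))
    return bytes(out), hits
```

Running it over something like `b"\x90" + PATTERN + b"\xc8"` reports one hit and leaves the surrounding bytes alone. Returning the hit count makes it easy to bail out if the number of matches isn't what you expected, which is about the only safety net a blind pattern patch like this has.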
3: Actually, it seems like the offset is always there, and always a little broken, even at 4:3. It's just small enough that it doesn't really matter.
4: Every windows executable actually starts with a DOS executable. Normally, this is just a small program that prints the message "This program cannot be run in DOS mode". This was a backwards compatibility thing in the early days of windows. If a user tried to run a windows EXE from DOS, they would get a nice error explaining to them that they needed to use windows to run this file. Source here.
5: I did try using the DllGetClassObject function directly from the DLL, but it didn't work.