Welcome to h3dec
28 Dec 2018
Inspired by the fascinating work done on projects such as StarCraft for Pandora, pokeruby and Devilution, I set out today to look into the executable of Heroes of Might and Magic III. The end goal is to derive a C codebase which could be compiled to obtain an executable functionally equivalent to the original. This would facilitate, among others:
- understanding of the game architecture
- native porting to other platforms
- source-level modding (as opposed to in-memory hacks like HoMM 3 HD)
As for the tools, I expect to use radare heavily, along with the RetDec decompiler. Of course, a staple of the RE community is IDA Pro and its C decompiler. I intend to try it out as well, but due to licensing reasons, the final workflow should be preferrably based entirely on free software. A well-defined and data driven workflow would allow a choice from multiple tools for each step.
Because everybody loves pictures, let’s look at a picture of how I imagine this all could work:
The distinction between manually curated inputs and intermediate products is important; the goal is to work smart rather than hard, which means minimizing the amount of work that needs to be repeated for every new executable to be processed. The hint databases should only contain information that cannot be derived from the executable by any other means (static analysis, heuristics, seeded random guess…). This means that the approach taken by pokeruby (disassemble all, then hand-decompile functions one by one) is unsuitable for us, because it scales poorly.
One questions that remains to be answered is whether it is worth it to aim for a bit-perfect reproduction of the original executable. This would imply the need to use the original tools – initial analysis suggests that MSVC (6.0) Visual Studio 6.0 and Microsoft Linker (6.0) were used as the compiler and linker, respectively.
For now, the next step will be to just parse and disassemble the entire .text
section in a way that allows us to reassemble a bit-exact binary. There doesn’t seem to be any readily available tool to do this.
- Objdump output could probably be regexed, but that’s a dirty, crappy solution.
- notaz’s StarCraft port uses a custom IDA plugin to dump assembly and a custom decompiler that emits compilable low-level C representation of the assembly code
- Radare2 hangs when asked to do
pD
on the entire.text
; will probably need to use r2pipe and step by function or even single instructions - radeco is trash; r2dec is irrelevant for our purposes
- Snowman produces something. Parts of it don’t look too bad, but parts of it are pretty hopeless, e.g.
void fun_61e5e9(signed char cl, int32_t a2) { int32_t ebp3; *reinterpret_cast<signed char*>(ebp3 - 0x90) = 1; return; }
WinAPI calls are also reconstructed poorly, etc. This is not the way forward.
- RetDec is too memory hungry to run over the entire file at once, but when applied partially, provides very pleasing results. If the database of variable & function signatures can be saved and loaded independently on what part of the binary is currently being processed, this will be the way forward for decompilation.
- What about objconv?
- What did pokeruby do? Can’t find anything ARM-related in their tools repo. GBA-IDA-Pseudo-Terminal might be relevant.
Cool/useful stuff: