
Decompiling Xbox games using PDB debug info

26 January, 2026

How decompilation works

In the world of matching decompilation, projects use a tool which consumes an input binary and lifts it into objects for comparison and linkage.

Doing this lets them employ a divide-and-conquer strategy: the game is reverse engineered object by object until it is buildable from fully original code and matches the original at the byte or instruction level. In the dark ages, every project started with a disassembly of its target binary. The boundaries between different object files ("splits") were determined manually through heuristics and processes of elimination.

Fast forward some years and the decompilation landscape has developed considerably. Many different open-source toolkits called splitters exist which let you establish a decompilation workflow for all sorts of targets. Code is processed by control flow generation algorithms that lift relocations. You never deal with disassemblies because the tools write object files for you.

[Image: decomp toolkit]

x86 object splitting

Currently most decomps target 6th and 7th generation PowerPC console games using decomp-toolkit.

But what if we want to decompile an original Xbox game? Specifically, I'm working on the PAL debug build of Halo 1, since it has debug symbols in the form of a PDB file.

The vast majority of x86 reverse engineering projects load a DLL which hooks functions in the base executable. Nobody has managed instruction-level matching with this approach, which is something I wanted. And only the debug Xbox kernels even support loading DLLs (in the form of "debugger extensions").

Instead I started looking at splitting tools for x86 and found two kinds of solution: scripts that postprocess disassemblies, and plugins for IDA Pro and Ghidra that work by exporting the tools' symbol databases.

Section contributions

I couldn't find a tool that actually read the PDBs for splitting. The best the existing tools could do was treat the PDB like a map file, feeding its symbols to another reverse engineering tool for symbolisation.

Using the PDB this way meant that I'd miss out on a lot of information. The VC++ linker logs the individual sections of input COFF objects being linked into the output executable to the PDB in the form of section contributions.

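These structures contain the section index, offset and size of each contribution, the characteristics (flags) of the input section, and the index of the module (object file) it came from.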

This is very useful in the context of splitting objects because it's a record of how all object files were laid out verbatim!
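For a concrete picture, here is a minimal sketch that decodes such records with Python's struct module. The field layout is the SectionContribEntry record as described in LLVM's PDB documentation, which is an assumption here: older PDB versions, such as the one discussed in this post, may lay the records out differently. The names SectionContrib and parse_contribs are made up for illustration.

```python
import struct
from typing import NamedTuple

# Assumed layout: SectionContribEntry as described in LLVM's PDB docs.
# Older PDB versions may use a different (shorter) record.
SC_FORMAT = "<H2xiiIH2xII"
SC_SIZE = struct.calcsize(SC_FORMAT)  # 28 bytes per record


class SectionContrib(NamedTuple):
    section: int          # index of the output section
    offset: int           # offset of the contribution within that section
    size: int             # size of the contribution in bytes
    characteristics: int  # COFF section flags of the input section
    module: int           # index of the contributing module (object file)
    data_crc: int         # checksum of the contribution's data
    reloc_crc: int        # checksum of the contribution's relocations


def parse_contribs(blob: bytes) -> list[SectionContrib]:
    """Decode a packed array of section contribution records."""
    return [SectionContrib(*fields) for fields in struct.iter_unpack(SC_FORMAT, blob)]
```

Each decoded record describes one piece of one input object, which is exactly the granularity a splitter wants.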

And they still exist even in stripped PDBs. In my case, I have a stripped PDB, so I don't have type info or private symbols for the target. But we can still automatically enumerate every logical piece of data and code. Compared to other tools we are no longer guessing the sizes and locations of data symbols based on auto analysis.

The PDB splitter

So I wrote my own splitter that uses the section contributions data to create objects.
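The core idea can be sketched in a few lines: bucket the contributions by the module that produced them, and each bucket becomes one object file. This is an illustration only, not the actual splitter; the tuple shape (section, offset, size, module) and the name group_by_module are assumptions.

```python
from collections import defaultdict


def group_by_module(contribs):
    """Map module index -> its contributions, sorted by (section, offset).

    `contribs` is an iterable of (section, offset, size, module) tuples,
    a simplified stand-in for real section contribution records.
    """
    objects = defaultdict(list)
    for section, offset, size, module in contribs:
        objects[module].append((section, offset, size))
    # Sorting gives each object its pieces in their original layout order.
    return {module: sorted(parts) for module, parts in objects.items()}
```

A real tool would then copy the bytes of each piece out of the executable and emit them as sections of a COFF object.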

My specific target uses Visual C++ 7 beta 2 with the old VC++ 2.00 debug info format, which nothing I could find implemented support for, so to read the file I had to modify the Rust pdb crate. The official Microsoft crate can read these now, but where's the fun in that?

Some information we don't have: names for private symbols, and COMDAT data (the rules for handling duplicate definitions of symbols). In my case I just ignored COMDAT except where it was plainly obvious that it was used, such as for string and floating point constant deduplication. For nameless section contributions, having their flags was useful when creating temporary names, because they let you prefix symbols based on their contents, e.g. code_0041234.
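A naming scheme like this could look roughly as follows. The flag values are the standard PE/COFF section characteristics; the prefixes other than code, the zero-padded address format and the function name temp_name are assumptions for illustration.

```python
# Standard PE/COFF section characteristic flags.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080


def temp_name(characteristics: int, address: int) -> str:
    """Build a placeholder symbol name from section flags and an address."""
    if characteristics & IMAGE_SCN_CNT_CODE:
        prefix = "code"
    elif characteristics & IMAGE_SCN_CNT_UNINITIALIZED_DATA:
        prefix = "bss"
    elif characteristics & IMAGE_SCN_CNT_INITIALIZED_DATA:
        prefix = "data"
    else:
        prefix = "unk"
    return f"{prefix}_{address:08x}"
```

Knowing up front whether a nameless blob is code, data or zero-initialised storage makes the generated objects much easier to navigate.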

Control flow generation

Every decompilation needs to identify all pointers referenced in its binary. We have a complete list of all absolute relocations in the binary thanks to the .reloc section, but nothing about the relative relocations used by jmp or call.
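Reading the absolute relocations is straightforward. The sketch below assumes the standard PE base relocation format (blocks of a page RVA, a block size, then 16-bit entries holding a 4-bit type and a 12-bit offset) and that the section's raw bytes have already been extracted; highlow_relocs is a made-up name.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, ignored
IMAGE_REL_BASED_HIGHLOW = 3   # 32-bit absolute address fixup


def highlow_relocs(reloc: bytes) -> list[int]:
    """Return the RVAs of all 32-bit absolute fixups in a .reloc section."""
    rvas = []
    pos = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break  # malformed block or zero-filled tail
        end = min(pos + block_size, len(reloc))
        for entry_pos in range(pos + 8, end - 1, 2):
            (entry,) = struct.unpack_from("<H", reloc, entry_pos)
            kind, offset = entry >> 12, entry & 0x0FFF
            if kind == IMAGE_REL_BASED_HIGHLOW:
                rvas.append(page_rva + offset)
        pos += block_size
    return rvas
```

Every RVA returned marks four bytes in the image that hold a pointer, which is the information needed to turn them back into relocations against symbols.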

To lift these relative relocations, you can either make a script to export from a tool like IDA or you can find them yourself using control flow generation. I opted for the latter because I wanted to keep the tool self contained.
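The arithmetic at the heart of this is small: a relative branch encodes its target as a displacement from the end of the instruction. The sketch below covers only the most common 32-bit encodings and assumes the offset is already known to be an instruction start with no prefixes. It is not a full control flow pass, and branch_target is a hypothetical helper.

```python
import struct


def branch_target(code: bytes, offset: int, base: int = 0):
    """Return the target address of a relative branch at `offset`, else None.

    `base` is the address of code[0]. Other relative forms such as
    loop/jecxz are deliberately left out of this sketch.
    """
    op = code[offset]
    if op in (0xE8, 0xE9):  # call rel32 / jmp rel32
        (rel,) = struct.unpack_from("<i", code, offset + 1)
        return base + offset + 5 + rel
    if op == 0xEB or 0x70 <= op <= 0x7F:  # jmp rel8 / jcc rel8
        (rel,) = struct.unpack_from("<b", code, offset + 1)
        return base + offset + 2 + rel
    if op == 0x0F and 0x80 <= code[offset + 1] <= 0x8F:  # jcc rel32
        (rel,) = struct.unpack_from("<i", code, offset + 2)
        return base + offset + 6 + rel
    return None
```

A control flow pass applies this to every instruction it decodes, queues each new target for decoding, and records the rel32 branches that cross object boundaries as relocations.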

SafeSEH

Structured Exception Handling (SEH) is a Microsoft-specific C language extension that lets you write exception handlers inside functions using __try, __except and __finally statements.

When entering a __try statement, the compiler generates a structure containing pointers to the __except or __finally handlers, which are blocks in the function. There's never a direct jump to these handlers, and without special treatment this is opaque to control flow generation. Hence it failed to find any except/finally blocks. I ended up fixing about 130 handlers manually.
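One form the "special treatment" could take is reading the handler pointers out of the table and feeding them to control flow generation as extra entry points. The sketch below assumes the commonly documented 32-bit scope table layout of three 4-byte fields per entry (enclosing level, filter function, handler function), where a null filter marks a __finally block. Whether the target's tables match this exactly, and how the entry count is discovered, are assumptions; scope_table_handlers is a made-up name.

```python
import struct

ENTRY_SIZE = 12  # assumed: int32 enclosing level, two 32-bit pointers


def scope_table_handlers(image: bytes, table_offset: int, count: int) -> list[tuple[str, int]]:
    """Collect (kind, address) pairs for the code blocks a scope table points at."""
    targets = []
    for i in range(count):
        _enclosing, filter_func, handler_func = struct.unpack_from(
            "<iII", image, table_offset + i * ENTRY_SIZE
        )
        if filter_func:
            # __except: both the filter expression and the handler are code.
            targets.append(("filter", filter_func))
            targets.append(("except", handler_func))
        else:
            # __finally: only the handler block exists.
            targets.append(("finally", handler_func))
    return targets
```

Each returned address would be queued as a block start inside the owning function, so the handlers are no longer invisible to the analysis.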

First boot

The first time the game linked, booting it immediately kicked me back to the dashboard.

After investigating this I found that the Xbox runtime library was failing to assign the console's active cache partition to a drive letter. szCacheDrive was being formatted to "\Device\Harddisk0\Partitin%d\", so something must have broken inside of string formatting.

[Image: xapi function]

Negative relocations

This turned out to be a failure in the symbolisation of relocation targets, caused by a compiler optimisation. I traced the error to _output, the internal libcmt function for applying format strings:

  34:   80 fb 20                cmp    bl,0x20
  37:   7c 12                   jl     4b <__output+0x4b>
  39:   80 fb 78                cmp    bl,0x78
  3c:   7f 0d                   jg     4b <__output+0x4b>
  3e:   0f be c3                movsx  eax,bl
  41:   0f be 80 e0 ff ff ff    movsx  eax,BYTE PTR [eax-0x20]
                        44: dir32       ___lookuptable

___lookuptable is only valid for characters from 0x20 (space, the first printable ASCII character) up to 0x78. A loop checks if the current character is in bounds, then indexes ___lookuptable by that character minus 0x20.

Visual C++ optimises away the subtraction by applying a negative offset to the relocation. In this case the relocation points to 0x20 bytes before ___lookuptable.

RELOCATIONS #18
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000044  DIR32                      FFFFFFE0         9  ___lookuptable
 0000004F  DIR32                      00000000         9  ___lookuptable
 00000067  DIR32                      00000000        76  $L9491
 000001EE  DIR32                      00000000        6E  __pctype
 ...
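The FFFFFFE0 in the "Applied To" column is the addend stored in the instruction bytes (e0 ff ff ff in the disassembly above), read as a signed 32-bit value. A small worked example of the arithmetic, using a made-up address for ___lookuptable:

```python
def signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's complement."""
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def reloc_target(symbol_address: int, addend_field: int) -> int:
    """Address a DIR32 relocation resolves to: symbol plus stored addend."""
    return (symbol_address + signed32(addend_field)) & 0xFFFF_FFFF


LOOKUPTABLE = 0x00500020  # hypothetical address, for illustration only

# The stored addend 0xFFFFFFE0 is -0x20, so the fixed-up pointer lands
# 0x20 bytes *before* the symbol it actually refers to.
assert signed32(0xFFFFFFE0) == -0x20
assert reloc_target(LOOKUPTABLE, 0xFFFFFFE0) == 0x00500000
```

At run time the character value (at least 0x20) is added back on, so the access itself is always in bounds even though the pointer in the binary is not.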

This means that if you match relocation targets to symbols based on the last symbol before the target address, you are liable to generate bad relocations and probably crash the game.

movsx   eax, byte ptr ds:stru_6B5C08.HandlerFunc[eax] ; bad
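The failure mode is easy to reproduce with a toy symbol table. The sketch below implements the naive "last symbol at or before the target" lookup; the addresses and the neighbouring symbol name are invented for illustration.

```python
import bisect


def naive_symbol(symbols, target):
    """Resolve `target` to (name, offset) using the last symbol at or before it.

    `symbols` is a list of (address, name) pairs sorted by address.
    """
    addresses = [address for address, _ in symbols]
    index = bisect.bisect_right(addresses, target) - 1
    if index < 0:
        raise ValueError("target precedes every symbol")
    address, name = symbols[index]
    return name, target - address


# Hypothetical layout: some unrelated structure sits right before the table.
symbols = [(0x00500000, "some_scope_table"), (0x00500020, "___lookuptable")]

# The pointer in the binary is ___lookuptable - 0x20, but the naive lookup
# binds it to the neighbour instead.
wrong = naive_symbol(symbols, 0x00500020 - 0x20)
```

Here wrong names the neighbouring symbol with offset 0, where the correct answer is ___lookuptable with an addend of -0x20. Once the objects are relinked in a different layout, the two symbols move independently and the reference breaks.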

Maybe some kind of heuristic could help here. But what I found good enough was to manually fix it, and after patching a few of these in libcmt and d3d8 the game made it through init.

Main menu

[Image: ida debug]

Unfortunately there are still some bad pointers. The game can't load into another map from the menu successfully, and if you idle long enough, the content streaming thread in the game crashes inside of a kernel export.

I haven't found the root cause for the crash. The idea is that in the future we'll get to all of these just by finding them naturally through the decompilation process.

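There is also the possibility of a matching link: if every object ends up at its original address, a relocation bound to the wrong symbol still resolves to the right address, and the bad relocations simply disappear. Hopefully there isn't undefined behaviour involved with determining the linker order.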

[Image: pregame menu]

Conclusion

You can visit the repo for Halo on GitHub and the progress tracker at decomp.dev.

There is a (pretty bad) C object writer tool that the repository is currently working off of available here. In the future, I'm going to rewrite this to include the CFG step. I swear.