Building a .NET Disassembler (Part 1)

Introduction

There was a time, long ago, when I was in college and thought it would be awesome to crack code. This largely means digging into the bytes of an application, seeing how it works, and possibly modifying the application code to suit your needs. Back in the days when just about everything was C/C++ or at least native code, this meant using a disassembler (like W32Dasm, for example) and strong assembly language skills.

Actually, before I go much farther with this:
<legal_disclaimer>
Disassembling or decompiling applications, reverse engineering, modifying code, etc. all have possible legal ramifications. These articles are purely for educational purposes. I in no way condone or encourage anyone to use their newfound skills to circumvent security measures, gain illegal access to licensed software, or whatever else you might do. Be responsible for your own actions.
</legal_disclaimer>
Alright then, onward…

In the age of .NET assemblies, your application .exes are no longer native machine language; they contain Common Intermediate Language instead (aka CIL, MSIL, or IL). So a .NET assembly is binary, but it can be disassembled back to IL code. If you have seen any IL code, it actually looks a lot like assembly.

The goal of this blog series and the related code is to build up a .NET assembly disassembler: it should take in a binary assembly file (.exe or .dll) and output the IL code in human-readable format.

If you didn’t already know this, Visual Studio actually ships with one of these already. Yep, that’s right, right out of the box Visual Studio will let you disassemble your built assemblies back to the IL code. So why build a new one? Well, so we can learn how it works, of course!

Reading the PE Header

A .NET assembly is actually a “normal” Windows PE (Portable Executable) binary file, which begins with a small DOS header. This is why Windows can run a .NET assembly .exe just like any other .exe. The layout of the file is documented all over the place on the web, so I’m not going to go too in depth with it.

Reading the PE file header is pretty straightforward. The layout can be found online, or in a C language header (.h) file somewhere. However, I do want to share one observation. The Microsoft IL Disassembler that ships with Visual Studio actually dumps the whole header layout and its bytes for us!

Go find or make a .NET exe file (just compiling an empty project out of Visual Studio is fine). Now let’s use MS’s tool to disassemble it. To do that, go into your Start menu and open a Visual Studio Command Prompt. By default, it’s in:

Start > All Programs > Microsoft Visual Studio 2010 > Visual Studio Tools > Visual Studio Command Prompt

Then run the command:

ildasm.exe /ALL c:\path\to\your.exe /OUT=your.il

If we now look at the “your.il” file (or whatever you named it), it is the disassembled IL code for the assembly. Because we specified the /ALL flag, we also get a bunch more info, including the DOS, COFF, and PE header layouts, which will look like this:

// ----- DOS Header:
// Magic:                      0x5a4d
// Bytes on last page:         0x0090
// Pages in file:              0x0003
// Relocations:                0x0000
// Size of header (paragraphs):0x0004
// Min extra paragraphs:       0x0000
// Max extra paragraphs:       0xffff
// Initial (relative) SS:      0x0000
// Initial SP:                 0x00b8
// Checksum:                   0x0000
// Initial IP:                 0x0000
// Initial (relative) CS:      0x0000
// File addr. of reloc table:  0x0040
// Overlay number:             0x0000
// OEM identifier:             0x0000
// OEM info:                   0x0000
// File addr. of COFF header:  0x0080
// ----- COFF/PE Headers:
// Signature:                  0x00004550
// ----- COFF Header:
// Machine:                    0x014c
// Number of sections:         0x0003
// Time-date stamp:            0x5009ddf3
// Ptr to symbol table:        0x00000000
// Number of symbols:          0x00000000
// Size of optional header:    0x00e0
// Characteristics:            0x0102
// ----- PE Optional Header (32 bit):
// Magic:                          0x010b
// Major linker version:           0x08
// Minor linker version:           0x00
// Size of code:                   0x00000a00
// Size of init.data:              0x00000800
// Size of uninit.data:            0x00000000
// Addr. of entry point:           0x0000287e
// Base of code:                   0x00002000
// Base of data:                   0x00004000
// Image base:                     0x00400000
// Section alignment:              0x00002000
// File alignment:                 0x00000200
// Major OS version:               0x0004
// Minor OS version:               0x0000
// Major image version:            0x0000
// Minor image version:            0x0000
// Major subsystem version:        0x0004
// Minor subsystem version:        0x0000
// Size of image:                  0x00008000
// Size of headers:                0x00000200
// Checksum:                       0x00000000
// Subsystem:                      0x0003
// DLL characteristics:            0x8540
// Size of stack reserve:          0x00100000
// Size of stack commit:           0x00001000
// Size of heap reserve:           0x00100000
// Size of heap commit:            0x00001000
// Loader flags:                   0x00000000
// Directories:                    0x00000010
// 0x00000000 [0x00000000] address [size] of Export Directory:
// 0x00002828 [0x00000053] address [size] of Import Directory:
// 0x00004000 [0x00000530] address [size] of Resource Directory:
// 0x00000000 [0x00000000] address [size] of Exception Directory:
// 0x00000000 [0x00000000] address [size] of Security Directory:
// 0x00006000 [0x0000000c] address [size] of Base Relocation Table:
// 0x000027bc [0x0000001c] address [size] of Debug Directory:
// 0x00000000 [0x00000000] address [size] of Architecture Specific:
// 0x00000000 [0x00000000] address [size] of Global Pointer:
// 0x00000000 [0x00000000] address [size] of TLS Directory:
// 0x00000000 [0x00000000] address [size] of Load Config Directory:
// 0x00000000 [0x00000000] address [size] of Bound Import Directory:
// 0x00002000 [0x00000008] address [size] of Import Address Table:
// 0x00000000 [0x00000000] address [size] of Delay Load IAT:
// 0x00002008 [0x00000048] address [size] of CLR Header:

This section is actually almost a byte-by-byte description of the beginning of your assembly! You can open the assembly in a hex editor and check it out. The one caveat is that the values listed in this .il file look byte-reversed compared to the raw bytes of the assembly, because multi-byte values in the file are stored in little-endian order (least significant byte first). So the very first thing listed there is:

// Magic:                      0x5a4d

However, if you look at the hex of your assembly, the first 2 bytes will be 4D 5A (the ASCII characters “MZ”), so the order is reversed. Just keep that in mind.
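To make the byte order concrete, here is a tiny sketch of what a little-endian read does with those two bytes. The class and method names are mine, purely for illustration; they are not part of the disassembler.

```csharp
public static class LittleEndianSketch
{
    // Combines two bytes, given in file order (low byte first),
    // into the 16-bit value they represent in little-endian format.
    public static ushort ToUInt16(byte first, byte second)
    {
        return (ushort)(first | (second << 8));
    }
}
```

Calling LittleEndianSketch.ToUInt16(0x4D, 0x5A) gives 0x5A4D, which is the “Magic” value ILDasm printed.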

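The actual layout of the DOS header has some “reserved” sections that usually aren’t used and aren’t listed by ILDasm’s output, so you have to skip some bytes in a few spots. Also note that the last field, e_lfanew, is 4 bytes wide, even though the ILDasm output above prints it as a 16-bit value: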

From WINNT.H:

typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
    USHORT e_magic;         // Magic number
    USHORT e_cblp;          // Bytes on last page of file
    USHORT e_cp;            // Pages in file
    USHORT e_crlc;          // Relocations
    USHORT e_cparhdr;       // Size of header in paragraphs
    USHORT e_minalloc;      // Minimum extra paragraphs needed
    USHORT e_maxalloc;      // Maximum extra paragraphs needed
    USHORT e_ss;            // Initial (relative) SS value
    USHORT e_sp;            // Initial SP value
    USHORT e_csum;          // Checksum
    USHORT e_ip;            // Initial IP value
    USHORT e_cs;            // Initial (relative) CS value
    USHORT e_lfarlc;        // File address of relocation table
    USHORT e_ovno;          // Overlay number
    USHORT e_res[4];        // Reserved words
    USHORT e_oemid;         // OEM identifier (for e_oeminfo)
    USHORT e_oeminfo;       // OEM information; e_oemid specific
    USHORT e_res2[10];      // Reserved words
    LONG   e_lfanew;        // File address of new exe header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
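If you want to double check where a field lives in the file, you can add up the sizes of everything declared before it. The sketch below does that arithmetic for the struct above; the class name and the table are my own illustration, with USHORT counted as 2 bytes and LONG as 4.

```csharp
using System.Collections.Generic;

public static class DosHeaderLayoutSketch
{
    // Field name and size in bytes, in the order the struct declares them.
    private static readonly KeyValuePair<string, int>[] Fields =
    {
        new KeyValuePair<string, int>("e_magic", 2),
        new KeyValuePair<string, int>("e_cblp", 2),
        new KeyValuePair<string, int>("e_cp", 2),
        new KeyValuePair<string, int>("e_crlc", 2),
        new KeyValuePair<string, int>("e_cparhdr", 2),
        new KeyValuePair<string, int>("e_minalloc", 2),
        new KeyValuePair<string, int>("e_maxalloc", 2),
        new KeyValuePair<string, int>("e_ss", 2),
        new KeyValuePair<string, int>("e_sp", 2),
        new KeyValuePair<string, int>("e_csum", 2),
        new KeyValuePair<string, int>("e_ip", 2),
        new KeyValuePair<string, int>("e_cs", 2),
        new KeyValuePair<string, int>("e_lfarlc", 2),
        new KeyValuePair<string, int>("e_ovno", 2),
        new KeyValuePair<string, int>("e_res", 4 * 2),    // USHORT[4]
        new KeyValuePair<string, int>("e_oemid", 2),
        new KeyValuePair<string, int>("e_oeminfo", 2),
        new KeyValuePair<string, int>("e_res2", 10 * 2),  // USHORT[10]
        new KeyValuePair<string, int>("e_lfanew", 4)      // LONG
    };

    // Returns the byte offset of the named field, or -1 if it is not in the table.
    public static int OffsetOf(string fieldName)
    {
        int offset = 0;
        foreach (var field in Fields)
        {
            if (field.Key == fieldName) return offset;
            offset += field.Value;
        }
        return -1;
    }

    // Returns the total size of the DOS header in bytes.
    public static int TotalSize()
    {
        int total = 0;
        foreach (var field in Fields) total += field.Value;
        return total;
    }
}
```

With this table, OffsetOf("e_lfanew") comes out to 60 (0x3C) and TotalSize() to 64 (0x40), which lines up with the 0x0040 that ILDasm reports for “File addr. of reloc table” in the dump above.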

So now, we finally get to step 1 of writing our .NET disassembler. Let’s read the DOS header. We actually get lucky here, because System.IO.BinaryReader always reads multi-byte integers as little-endian, which is the byte order the file uses, so it handles the endianness for us.

    public class DOSHeader
    {
        public ushort Magic { get; set; }
        public ushort BytesOnLastPage { get; set; }
        public ushort PagesInFile { get; set; }
        public ushort Relocations { get; set; }
        public ushort SizeOfHeader { get; set; }
        public ushort MinExtraParagraphs { get; set; }
        public ushort MaxExtraParagraphs { get; set; }
        public ushort InitialSS { get; set; }
        public ushort InitialSP { get; set; }
        public ushort Checksum { get; set; }
        public ushort InitialIP { get; set; }
        public ushort InitialCS { get; set; }
        public ushort RelocTableAddress { get; set; }
        public ushort OverlayNumber { get; set; }
        // Unknown01-04 are the 4 reserved words (e_res)
        public ushort Unknown01 { get; set; }
        public ushort Unknown02 { get; set; }
        public ushort Unknown03 { get; set; }
        public ushort Unknown04 { get; set; }
        public ushort OEMIdentifier { get; set; }
        public ushort OEMInfo { get; set; }
        // Unknown05-14 are the 10 reserved words (e_res2)
        public ushort Unknown05 { get; set; }
        public ushort Unknown06 { get; set; }
        public ushort Unknown07 { get; set; }
        public ushort Unknown08 { get; set; }
        public ushort Unknown09 { get; set; }
        public ushort Unknown10 { get; set; }
        public ushort Unknown11 { get; set; }
        public ushort Unknown12 { get; set; }
        public ushort Unknown13 { get; set; }
        public ushort Unknown14 { get; set; }
        // e_lfanew is a 4-byte LONG in WINNT.H, so this needs a 32-bit property
        public uint COFFHeaderAddress { get; set; }
    }

And to read it all in, it’s as simple (but long) as:

        public DOSHeader ReadDOSHeader(BinaryReader reader)
        {
            reader.BaseStream.Seek(0, SeekOrigin.Begin);
            return new DOSHeader
                       {
                           Magic = reader.ReadUInt16(),
                           BytesOnLastPage = reader.ReadUInt16(),
                           PagesInFile = reader.ReadUInt16(),
                           Relocations = reader.ReadUInt16(),
                           SizeOfHeader = reader.ReadUInt16(),
                           MinExtraParagraphs = reader.ReadUInt16(),
                           MaxExtraParagraphs = reader.ReadUInt16(),
                           InitialSS = reader.ReadUInt16(),
                           InitialSP = reader.ReadUInt16(),
                           // ... etc... you can do the rest yourself 🙂
                       };
        }
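If all you need right now is the location of the next header, you don’t have to read every field. Here is a minimal sketch of a shortcut, under my own assumptions: DosHeaderProbe and TryReadPeOffset are hypothetical names, and the 0x3C position comes from adding up the 30 two-byte words that precede e_lfanew in the WINNT.H struct.

```csharp
using System.IO;

public static class DosHeaderProbe
{
    public const ushort DosMagic = 0x5A4D;          // "MZ" read as a little-endian ushort
    public const int PeOffsetFieldPosition = 0x3C;  // e_lfanew: 60 bytes of fields come before it
    public const int DosHeaderSize = 64;            // 60 bytes + the 4-byte e_lfanew

    // Returns true and sets peOffset when the stream starts with a DOS header.
    // The stream must be seekable. The reader is deliberately not disposed,
    // because disposing a BinaryReader also closes the underlying stream.
    public static bool TryReadPeOffset(Stream stream, out uint peOffset)
    {
        peOffset = 0;
        if (stream.Length < DosHeaderSize) return false;

        var reader = new BinaryReader(stream);
        stream.Seek(0, SeekOrigin.Begin);
        if (reader.ReadUInt16() != DosMagic) return false;

        // Jump over everything else, including the reserved words.
        stream.Seek(PeOffsetFieldPosition, SeekOrigin.Begin);
        peOffset = reader.ReadUInt32();
        return true;
    }
}
```

You would call it with a stream from File.OpenRead on your assembly. For the file dumped earlier, peOffset should come back as 0x80. The trade-off is that you lose the other DOS fields, which is fine for a disassembler since nothing later depends on them.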

Reading the PE and COFF Header

Note: The PE and COFF headers are actually 2 separate things, but I am calling them both “PEHeader” in this article, since they are back-to-back in the file, so I just read them both at once.

Note that the last property of our DOSHeader is “COFFHeaderAddress”. This gives us the offset into the file where the PE header is located. The PE Header is defined as:

    public class PEHeader
    {
        public uint Signature { get; set; }
        public ushort Machine { get; set; }
        public ushort NumberOfSections { get; set; }
        public uint DateTimeStamp { get; set; }
        public uint PtrToSymbolTable { get; set; }
        public uint NumberOfSymbols { get; set; }
        public ushort SizeOfOptionalHeaders { get; set; }
        public ushort Characteristics { get; set; }
        public ushort OptionalMagic { get; set; }
        public byte MajorLinkerVersion { get; set; }
        public byte MinorLinkerVersion { get; set; }
        public uint SizeOfCode { get; set; }
        public uint SizeOfInitData { get; set; }
        public uint SizeOfUninitData { get; set; }
        public uint AddressOfEntryPoint { get; set; }
        public uint BaseOfCode { get; set; }
        public uint BaseOfData { get; set; }
        public uint ImageBase { get; set; }
        public uint SectionAlignment { get; set; }
        public uint FileAlignment { get; set; }
        public ushort MajorOSVersion { get; set; }
        public ushort MinorOSVersion { get; set; }
        public ushort MajorImageVersion { get; set; }
        public ushort MinorImageVersion { get; set; }
        public ushort MajorSubsystemVersion { get; set; }
        public ushort MinorSubsystemVersion { get; set; }
        public uint Reserved1 { get; set; }
        public uint SizeOfImage { get; set; }
        public uint SizeOfHeaders { get; set; }
        public uint PEChecksum { get; set; }
        public ushort Subsystem { get; set; }
        public ushort DLLCharacteristics { get; set; }
        public uint SizeOfStackReserve { get; set; }
        public uint SizeOfStackCommit { get; set; }
        public uint SizeOfHeapReserve { get; set; }
        public uint SizeOfHeapCommit { get; set; }
        public uint LoaderFlags { get; set; }
        public uint DirectoryLength { get; set; }
        public IList<DataDirectory> Directories { get; set; }
        public IList<Section> Sections { get; set; }
    }

I’ll talk about those last 2 properties, the virtual data directories and the sections, in the next part. For now, let’s load the PE header by first seeking our stream to DOSHeader.COFFHeaderAddress:

        public PEHeader ReadPEHeader(uint headerAddress)
        {
            // the passed in "headerAddress" is DOSHeader.COFFHeaderAddress
            // "_assemblyReader" is a BinaryReader I had defined at the class level.
            // Note: this field layout is the 32-bit (PE32, magic 0x010b) optional header.

            _assemblyReader.BaseStream.Seek(headerAddress, SeekOrigin.Begin);
            var header = new PEHeader
                             {
                                 Signature = _assemblyReader.ReadUInt32(),
                                 Machine = _assemblyReader.ReadUInt16(),
                                 NumberOfSections = _assemblyReader.ReadUInt16(),
                                 DateTimeStamp = _assemblyReader.ReadUInt32(),
                                 PtrToSymbolTable = _assemblyReader.ReadUInt32(),
                                 NumberOfSymbols = _assemblyReader.ReadUInt32(),
                                 SizeOfOptionalHeaders = _assemblyReader.ReadUInt16(),
                                 Characteristics = _assemblyReader.ReadUInt16(),
                                 OptionalMagic = _assemblyReader.ReadUInt16(),
                                 MajorLinkerVersion = _assemblyReader.ReadByte(),
                                 MinorLinkerVersion = _assemblyReader.ReadByte(),
                                 SizeOfCode = _assemblyReader.ReadUInt32(),
                                 SizeOfInitData = _assemblyReader.ReadUInt32(),
                                 SizeOfUninitData = _assemblyReader.ReadUInt32(),
                                 AddressOfEntryPoint = _assemblyReader.ReadUInt32(),
                                 BaseOfCode = _assemblyReader.ReadUInt32(),
                                 BaseOfData = _assemblyReader.ReadUInt32(),
                                 ImageBase = _assemblyReader.ReadUInt32(),
                                 SectionAlignment = _assemblyReader.ReadUInt32(),
                                 FileAlignment = _assemblyReader.ReadUInt32(),
                                 MajorOSVersion = _assemblyReader.ReadUInt16(),
                                 MinorOSVersion = _assemblyReader.ReadUInt16(),
                                 MajorImageVersion = _assemblyReader.ReadUInt16(),
                                 MinorImageVersion = _assemblyReader.ReadUInt16(),
                                 MajorSubsystemVersion = _assemblyReader.ReadUInt16(),
                                 MinorSubsystemVersion = _assemblyReader.ReadUInt16(),
                                 Reserved1 = _assemblyReader.ReadUInt32(),
                                 SizeOfImage = _assemblyReader.ReadUInt32(),
                                 SizeOfHeaders = _assemblyReader.ReadUInt32(),
                                 PEChecksum = _assemblyReader.ReadUInt32(),
                                 Subsystem = _assemblyReader.ReadUInt16(),
                                 DLLCharacteristics = _assemblyReader.ReadUInt16(),
                                 SizeOfStackReserve = _assemblyReader.ReadUInt32(),
                                 SizeOfStackCommit = _assemblyReader.ReadUInt32(),
                                 SizeOfHeapReserve = _assemblyReader.ReadUInt32(),
                                 SizeOfHeapCommit = _assemblyReader.ReadUInt32(),
                                 LoaderFlags = _assemblyReader.ReadUInt32(),
                                 DirectoryLength = _assemblyReader.ReadUInt32()
                             };
            return header;
        }
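Once the header is loaded, it is worth sanity checking a few values before trusting the rest. The sketch below is my own addition, not part of the original reader: PeHeaderChecks, Validate, and ToUtc are hypothetical names, and the constants are the standard PE signature and optional header magic numbers. The size check relies on the PE32 optional header having 96 bytes of fixed fields followed by 8 bytes (address plus size) per data directory.

```csharp
using System;
using System.Collections.Generic;

public static class PeHeaderChecks
{
    public const uint PeSignature = 0x00004550;  // "PE\0\0" read as a little-endian uint
    public const ushort Pe32Magic = 0x010B;      // 32-bit optional header
    public const ushort Pe32PlusMagic = 0x020B;  // 64-bit optional header (different layout)

    // Returns a list of problems; an empty list means the values look consistent.
    public static IList<string> Validate(uint signature, ushort optionalMagic,
                                         ushort sizeOfOptionalHeaders, uint directoryLength)
    {
        var problems = new List<string>();

        if (signature != PeSignature)
            problems.Add("Missing PE signature; this is probably not a PE file.");

        if (optionalMagic == Pe32PlusMagic)
            problems.Add("PE32+ image; the 32-bit field layout does not apply.");
        else if (optionalMagic != Pe32Magic)
            problems.Add("Unknown optional header magic.");
        else if (sizeOfOptionalHeaders != 96L + 8L * directoryLength)
            problems.Add("Optional header size does not match the directory count.");

        return problems;
    }

    // The COFF time-date stamp is a count of seconds since 1970-01-01 00:00:00 UTC.
    public static DateTime ToUtc(uint timeDateStamp)
    {
        return new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(timeDateStamp);
    }
}
```

Plugging in the values from the ILDasm dump earlier (signature 0x00004550, magic 0x010b, optional header size 0x00e0, 0x10 directories) gives an empty problem list, since 96 + 16 * 8 = 224 = 0xE0. The time-date stamp 0x5009ddf3 from the same dump decodes to 20 July 2012 (UTC).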

Next Time…

That is all I’m going to cover for this first part of the article.

For additional reading on the layout of all these initial headers and what some of the fields mean, there is an excellent writeup here: http://www.csn.ul.ie/~caolan/publink/winresdump/winresdump/doc/pefile.html

Part 2 is here: Building a .NET Disassembler (Part 2) – Reading Virtual Directories and Sections

CodingWithSpike is Jeff Valore. A professional software engineer, focused on JavaScript, Web Development, C# and the Microsoft stack. Jeff is currently a Software Engineer at Virtual Hold Technologies.

