Building a .NET Disassembler (Part 3) – Parsing the .text Section

In the previous part, we figured out how to read each of the “sections” out of the assembly. One of these sections is named “.text”. This section contains a whole bunch of stuff, including the executable IL code. Its general layout is:

[Image: .text section layout]

So, the image above is a recreation from Serge Lidin’s excellent book Inside Microsoft .NET IL Assembler, and he lists the first thing in the .text section as the “Import Address Table”. However, looking at an actual file and a couple of other resources, this seems to always be some kind of 8-byte “loader stub”. So, for the moment, I am just making this a ‘byte[8]’ property in the Text Section. I’m sure as I get further into this project, I will figure out what it is all about.

Reading the CLR Header

Yep, time to read yet another header. This one isn’t specific to the PE file specification. This is the header for the CLR, and is actually stored in the “.text” section. Well, ignore that statement. It is actually referenced by the 15th Virtual Data Directory named “CLRHeader”. Since each Virtual Data Directory is in a Section, that one should usually resolve into the “.text” section. I’m not sure if the CLR / loader actually enforces that. It would be an interesting experiment to try moving the CLR header to a different section and see if the runtime barfs 🙂

The CLRHeader is defined like so:

    public class CLRHeader
    {
        public uint HeaderSize { get; set; }
        public ushort MajorRuntimeVersion { get; set; }
        public ushort MinorRuntimeVersion { get; set; }
        public uint MetaDataDirectoryAddress { get; set; }
        public uint MetaDataDirectorySize { get; set; }
        public uint Flags { get; set; }
        public uint EntryPointToken { get; set; }
        public uint ResourcesDirectoryAddress { get; set; }
        public uint ResourcesDirectorySize { get; set; }
        public uint StrongNameSignatureAddress { get; set; }
        public uint StrongNameSignatureSize { get; set; }
        public uint CodeManagerTableAddress { get; set; }
        public uint CodeManagerTableSize { get; set; }
        public uint VTableFixupsAddress { get; set; }
        public uint VTableFixupsSize { get; set; }
        public uint ExportAddressTableJumpsAddress { get; set; }
        public uint ExportAddressTableJumpsSize { get; set; }
        public uint ManagedNativeHeaderAddress { get; set; }
        public uint ManagedNativeHeaderSize { get; set; }
    }

So, let’s fetch the data for that virtual data directory, and read the CLR Header:

        public CLRHeader ReadCLRHeader(BinaryReader assemblyReader, PEHeader peHeader)
        {
            var clrDirectoryHeader = peHeader.Directories[(int) DataDirectoryName.CLRHeader];
            var clrDirectoryData = ReadVirtualDirectory(assemblyReader, clrDirectoryHeader, peHeader.Sections);
            using (var reader = new BinaryReader(new MemoryStream(clrDirectoryData)))
            {
                return new CLRHeader
                           {
                               HeaderSize = reader.ReadUInt32(),
                               MajorRuntimeVersion = reader.ReadUInt16(),
                               MinorRuntimeVersion = reader.ReadUInt16(),
                               MetaDataDirectoryAddress = reader.ReadUInt32(),
                               MetaDataDirectorySize = reader.ReadUInt32(),
                               Flags = reader.ReadUInt32(),
                               EntryPointToken = reader.ReadUInt32(),
                               ResourcesDirectoryAddress = reader.ReadUInt32(),
                               ResourcesDirectorySize = reader.ReadUInt32(),
                               StrongNameSignatureAddress = reader.ReadUInt32(),
                               StrongNameSignatureSize = reader.ReadUInt32(),
                               CodeManagerTableAddress = reader.ReadUInt32(),
                               CodeManagerTableSize = reader.ReadUInt32(),
                               VTableFixupsAddress = reader.ReadUInt32(),
                               VTableFixupsSize = reader.ReadUInt32(),
                               ExportAddressTableJumpsAddress = reader.ReadUInt32(),
                               ExportAddressTableJumpsSize = reader.ReadUInt32(),
                               ManagedNativeHeaderAddress = reader.ReadUInt32(),
                               ManagedNativeHeaderSize = reader.ReadUInt32()
                           };
            }
        }
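
Two of the fields we just read are worth decoding right away: Flags and EntryPointToken. Here is a rough sketch of how that could look. The bit values are the COMIMAGE_FLAGS constants from CorHdr.h in the Windows SDK; the names ClrHeaderFlags and ClrHeaderInfo are made up for this example and are not part of the code above.

```csharp
using System;

// Bit values follow the COMIMAGE_FLAGS_* constants in CorHdr.h.
// The enum and class names are illustrative assumptions, not the original project's.
[Flags]
public enum ClrHeaderFlags : uint
{
    None = 0x00000000,
    ILOnly = 0x00000001,
    Required32Bit = 0x00000002,
    ILLibrary = 0x00000004,
    StrongNameSigned = 0x00000008,
    NativeEntryPoint = 0x00000010,
    TrackDebugData = 0x00010000
}

public static class ClrHeaderInfo
{
    // A metadata token keeps its table index in the top byte...
    public static byte TokenTable(uint token)
    {
        return (byte)(token >> 24);
    }

    // ...and the 1-based row number in the low three bytes.
    public static uint TokenRow(uint token)
    {
        return token & 0x00FFFFFF;
    }

    public static string Describe(uint flags, uint entryPoint)
    {
        var decoded = (ClrHeaderFlags)flags;

        // When the native entry point flag is set, the field holds an RVA
        // rather than a metadata token.
        if ((decoded & ClrHeaderFlags.NativeEntryPoint) != 0)
            return string.Format("Flags: {0}; native entry point at RVA 0x{1:X8}",
                decoded, entryPoint);

        return string.Format("Flags: {0}; entry point: table 0x{1:X2}, row {2}",
            decoded, TokenTable(entryPoint), TokenRow(entryPoint));
    }
}
```

For a typical EXE you would expect the table byte to be 0x06 (the MethodDef table), pointing at the Main method. A DLL usually has no entry point, so the token is just 0.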

Reading the IL Code

Hah, yeah right… we aren’t really ready to do this yet. We will come back to it 🙂

Reading the Strong Name Signature Hash

Back in the CLR Header, note the two properties StrongNameSignatureAddress and StrongNameSignatureSize. The address property is a Relative Virtual Address (RVA), so you need to resolve it to a section (though it should be in the .text section). Oh, and if the assembly is unsigned, then this address will be 0x00000000.

So, you can read this signature hash by doing something like this:

private static byte[] ReadStrongNameHash(BinaryReader reader, uint rva, uint size, IEnumerable<SectionHeader> sections)
{
    if(rva == 0)
        return new byte[0];
    var fileOffset = AddressingUtils.RelativeVirtualAddressToFileOffset(rva, sections);
    reader.BaseStream.Seek((long)fileOffset, SeekOrigin.Begin);
    return reader.ReadBytes((int)size);
}

The call to “AddressingUtils” is:

public static class AddressingUtils
{
    public static ulong RelativeVirtualAddressToFileOffset(ulong rva, IEnumerable<SectionHeader> sections)
    {
        // Find the section whose address range contains the RVA. The upper bound
        // is exclusive: an RVA equal to VirtualAddress + SizeOfRawData lies just
        // past the end of this section's data, not inside it.
        // (First() needs "using System.Linq;" and throws if no section matches.)
        var section = sections.First(s => s.VirtualAddress <= rva
            && rva < (ulong)s.VirtualAddress + s.SizeOfRawData);

        // Calculate the offset into the file.
        var fileOffset = section.PointerToRawData + (rva - section.VirtualAddress);
        return fileOffset;
    }
}
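
To make the arithmetic concrete, here is a small standalone sketch of the same conversion run against a made-up section table. SampleSection and RvaExample are hypothetical stand-ins, and the addresses are just plausible-looking numbers, not values read from a real file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the section header type from Part 2; it only
// carries the three fields the conversion needs.
public class SampleSection
{
    public string Name;
    public uint VirtualAddress;
    public uint SizeOfRawData;
    public uint PointerToRawData;
}

public static class RvaExample
{
    // Made-up layout for illustration only.
    public static List<SampleSection> SampleSections()
    {
        return new List<SampleSection>
        {
            new SampleSection { Name = ".text",  VirtualAddress = 0x2000, SizeOfRawData = 0x1000, PointerToRawData = 0x0200 },
            new SampleSection { Name = ".rsrc",  VirtualAddress = 0x4000, SizeOfRawData = 0x0600, PointerToRawData = 0x1200 },
            new SampleSection { Name = ".reloc", VirtualAddress = 0x6000, SizeOfRawData = 0x0200, PointerToRawData = 0x1800 }
        };
    }

    public static ulong ToFileOffset(ulong rva, IEnumerable<SampleSection> sections)
    {
        // Lower bound inclusive, upper bound exclusive.
        var section = sections.First(s => s.VirtualAddress <= rva
            && rva < (ulong)s.VirtualAddress + s.SizeOfRawData);

        // Distance into the section, added to where the section starts on disk.
        return section.PointerToRawData + (rva - section.VirtualAddress);
    }
}
```

With this table, RVA 0x2008 falls in .text, 8 bytes past its start, so it resolves to file offset 0x200 + 0x8 = 0x208. RVA 0x4010 falls in .rsrc and resolves to 0x1200 + 0x10 = 0x1210. RVA 0x3000 sits exactly at the end of .text's raw data and matches no section, so First() throws.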

More CLR Header Stuff

At this point you may have noticed that the CLR Header is mostly a bunch of these “Address and Size” pairs, just like the Strong Name Hash. You can read each one of them the same way that we just read the strong name hash. In fact maybe “ReadStrongNameHash()” isn’t a good method name here. Something more “general” would probably be better…
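
One way that more general method might look is sketched below. To keep the example standalone, the RVA lookup is passed in as a delegate instead of a section list. That is purely a choice for this sketch, and the names DirectoryReader and ReadRvaBlock are invented here.

```csharp
using System;
using System.IO;

public static class DirectoryReader
{
    // Reads the block of bytes described by any "Address and Size" pair.
    // rvaToFileOffset is whatever RVA resolver you already have; taking it as
    // a delegate is an assumption made to keep this sketch self-contained.
    public static byte[] ReadRvaBlock(BinaryReader reader, uint rva, uint size,
        Func<uint, long> rvaToFileOffset)
    {
        // An address (or size) of zero means the directory is not present.
        if (rva == 0 || size == 0)
            return new byte[0];

        reader.BaseStream.Seek(rvaToFileOffset(rva), SeekOrigin.Begin);
        var bytes = reader.ReadBytes((int)size);

        // ReadBytes quietly returns fewer bytes at end of stream, so check.
        if (bytes.Length != size)
            throw new EndOfStreamException("Directory runs past the end of the file.");

        return bytes;
    }
}
```

The strong name hash, the metadata, and the resources can then all go through the same call, each with its own Address and Size properties from the CLR Header.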

Reading the Metadata Header

Yep, another header. This one contains the layout info for the Metadata. This should be a common word for anyone that uses .NET. The metadata contains the signatures of all our types, methods, properties, etc.

The Address and Size for the Metadata Header is back in the CLR Header’s “MetaDataDirectoryAddress” and “MetaDataDirectorySize” properties. Use the same code from reading the Strong Name Signature above to fetch all the bytes for the Metadata Header.

The format of the Metadata Header is:

public class MetadataHeader
{
    public uint Signature { get; set; } // 4 bytes, always 0x424A5342 [42 53 4A 42], ASCII "BSJB"
    public ushort MajorVersion { get; set; } // 2 bytes, always 0x0001 [01 00]
    public ushort MinorVersion { get; set; } // 2 bytes, always 0x0001 [01 00]
    public uint Reserved1 { get; set; } // 4 bytes, always 0x00000000 [00 00 00 00]
    public uint VersionStringLength { get; set; } // 4 bytes
    // Null terminated in file. VersionStringLength includes the null(s) in the
    // length, and is always rounded up to a multiple of 4.
    public string VersionString { get; set; }
    public ushort Flags { get; set; } // always 0x0000 [00 00]
    public ushort NumberOfStreams { get; set; }
}
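
Reading it is the same drill as the CLR Header, except for the variable-length version string in the middle. Here is a minimal sketch, assuming the reader is already positioned at the first byte of the metadata. MetadataRootSketch and MetadataRootReader are hypothetical names used so the example stands alone.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical standalone copy of the header fields, for this sketch only.
public class MetadataRootSketch
{
    public uint Signature;
    public ushort MajorVersion;
    public ushort MinorVersion;
    public uint Reserved;
    public uint VersionStringLength;
    public string VersionString;
    public ushort Flags;
    public ushort NumberOfStreams;
}

public static class MetadataRootReader
{
    public const uint ExpectedSignature = 0x424A5342; // "BSJB" on disk

    public static MetadataRootSketch Read(BinaryReader reader)
    {
        var root = new MetadataRootSketch();

        root.Signature = reader.ReadUInt32();
        if (root.Signature != ExpectedSignature)
            throw new InvalidDataException("Bad metadata signature.");

        root.MajorVersion = reader.ReadUInt16();
        root.MinorVersion = reader.ReadUInt16();
        root.Reserved = reader.ReadUInt32();
        root.VersionStringLength = reader.ReadUInt32();

        // The length already counts the terminating null and the padding, so
        // read exactly that many bytes and cut the text at the first null.
        var raw = reader.ReadBytes((int)root.VersionStringLength);
        var textLength = Array.IndexOf(raw, (byte)0);
        if (textLength < 0)
            textLength = raw.Length;
        root.VersionString = Encoding.UTF8.GetString(raw, 0, textLength);

        root.Flags = reader.ReadUInt16();
        root.NumberOfStreams = reader.ReadUInt16();
        return root;
    }
}
```

Because the whole padded string is consumed in one ReadBytes call, the reader ends up sitting exactly on the Flags field, and after NumberOfStreams it is positioned at the first stream header.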

Immediately following the MetadataHeader is a series of Stream Headers. A “stream” is to the metadata what a “section” is to the assembly. The NumberOfStreams property indicates how many StreamHeaders to read. The structure of each StreamHeader is:

public class StreamHeader
{
    public uint Offset { get; set; } // relative to start of MetadataHeader (Same as CLRHeader.MetaDataDirectoryAddress, resolved to file offset, then add this stream Offset.)
    public uint Size { get; set; }
    public string Name { get; set; } // null terminated in file, length always rounded up to divisible by 4
}

So, there are two things to note here for each stream. First, its “Offset” is relative to the first byte of the Metadata Header, so it is helpful to save off the file address of the Metadata Header; then you can just add the stream’s Offset to that number. Second, the “Name” property is null-terminated, BUT it is also padded with nulls (0x00) up to the next 4-byte address (address % 4 == 0). This makes it a bit trickier to read than a normal null-terminated string. The code I wrote to read these looks something like this (I did it as a BinaryReader extension method):

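// Needs "using System.Linq;" for TakeWhile, and, being an extension method,
// has to live inside a static class.
public static string ReadNullTermFourByteAlignedString(this BinaryReader reader)
{
    var buffer = new List<char>();
    char nextChar;
    do
    {
        nextChar = reader.ReadChar();
        buffer.Add(nextChar);
        // Keep going until we have read a null AND landed on a 4-byte boundary.
        // This relies on the name starting at a 4-byte-aligned stream position.
    } while (nextChar != '\0' || reader.BaseStream.Position % 4 != 0);
    // Everything from the first null onward is terminator or padding, so drop it.
    return new string(buffer.TakeWhile(c => c != '\0').ToArray());
}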
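
Putting the pieces together, reading all of the stream headers could look something like the sketch below. It assumes the reader sits just past the Metadata Header, on a 4-byte-aligned position. StreamHeaderSketch and StreamHeaderReader are hypothetical names, and the name reader here works on raw bytes instead of chars, which is a variation on the method above and not the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical standalone copy of the stream header fields.
public class StreamHeaderSketch
{
    public uint Offset;
    public uint Size;
    public string Name;
}

public static class StreamHeaderReader
{
    public static List<StreamHeaderSketch> ReadAll(BinaryReader reader, int numberOfStreams)
    {
        var headers = new List<StreamHeaderSketch>();
        for (var i = 0; i < numberOfStreams; i++)
        {
            var header = new StreamHeaderSketch();
            header.Offset = reader.ReadUInt32();
            header.Size = reader.ReadUInt32();
            header.Name = ReadPaddedName(reader);
            headers.Add(header);
        }
        return headers;
    }

    // Byte-based variant of the padded-name read: collect bytes up to the
    // first null, then keep consuming padding until a 4-byte boundary.
    private static string ReadPaddedName(BinaryReader reader)
    {
        var name = new StringBuilder();
        var seenNull = false;
        do
        {
            var b = reader.ReadByte();
            if (b == 0)
                seenNull = true;
            else if (!seenNull)
                name.Append((char)b);
        } while (!seenNull || reader.BaseStream.Position % 4 != 0);
        return name.ToString();
    }
}
```

To find a stream’s bytes in the file afterwards, add its Offset to the file offset of the Metadata Header, as described above.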

Next Time…

So we are leaving off here having read all the Stream Headers from the Metadata. Next time, we will look at parsing the actual metadata (at least, I think so…)

Stay tuned!

Comments on “Building a .NET Disassembler (Part 3) – Parsing the .text Section”

  1. mo3ez says:

    Excellent article. Is there any way to “skip” these headers and get the assembly as a byte array? My goal is to create an assembly comparer that ignores the DOS and PE header sections.

CodingWithSpike is Jeff Valore. A professional software engineer, focused on JavaScript, Web Development, C# and the Microsoft stack. Jeff is currently a Software Engineer at Virtual Hold Technologies.

