Building a .NET Disassembler (Part 5) – Reading the #Strings Stream

This one should be a quickie compared to the previous parts of this series.

Today we are just covering how to read the strings out of the #Strings stream.

To get the byte[] that makes up this stream, see the previous part on reading the metadata stream.

Once you have the bytes for this stream, each of the strings is simply null terminated (and UTF-8 encoded). So all you do is read null-terminated strings until you reach the end of the stream.

One thing to note, however, is that when any of the metadata tables references a string in this stream, it does so with an “offset”. This offset is the byte offset relative to the start of the #Strings stream. So you would set your reader to that position and read a null-terminated string.
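To make the "seek to the offset and read to the next null" idea concrete, here is a minimal sketch that works directly on the stream's byte[]. The names StringsHeapSketch and ReadStringAt are made up for this illustration and are not part of the code in this series. It assumes the strings are UTF-8, and it tolerates a missing terminator at the very end of the data.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the series' code.
public static class StringsHeapSketch
{
    // Decodes the UTF-8 string that starts at 'offset' and ends just before
    // the next 0 byte (or at the end of the data if no terminator is found).
    public static string ReadStringAt(byte[] heap, uint offset)
    {
        if (offset >= heap.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        int start = (int)offset;

        // Array.IndexOf returns -1 when there is no 0 byte after 'start'.
        int end = Array.IndexOf(heap, (byte)0, start);
        if (end < 0)
            end = heap.Length;

        return Encoding.UTF8.GetString(heap, start, end - start);
    }
}
```

For example, if the stream's bytes are "\0<Module>\0Program\0", then ReadStringAt(heap, 1) gives "<Module>" and ReadStringAt(heap, 10) gives "Program". The key is the byte position, not the position in a list of strings.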

In my code, I decided to load all the strings into memory at once. To use the offsets from the metadata tables, though, I needed to retain each string's offset (remember, it is a relative byte offset, not a simple index into the array of all strings). So I read all the strings into a Dictionary<uint, string> whose key is the offset where the string starts, which is the same offset the metadata tables use.

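I showed code for reading a null-terminated string in an earlier part, but here it is again as an extension method on BinaryReader. This version reads raw bytes and decodes them as UTF-8 in one go, because ReadChar() throws when it hits a surrogate character:

        // Requires: using System.Collections.Generic; using System.IO; using System.Text;
        public static string ReadNullTermString(this BinaryReader reader)
        {
            // Collect the raw bytes up to (but not including) the 0 terminator.
            var buffer = new List<byte>();
            byte current;
            while ((current = reader.ReadByte()) != 0)
                buffer.Add(current);
            // Strings in the #Strings stream are UTF-8 encoded.
            return Encoding.UTF8.GetString(buffer.ToArray());
        }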

So now we can read all the strings like this:

    public class StringsStream
    {
        private readonly Dictionary<uint, string> _strings;

        public StringsStream(Dictionary<uint, string> strings)
        {
            _strings = strings;
        }

        public string GetByOffset(uint offset)
        {
            return _strings[offset];
        }

        public IEnumerable<string> GetAll()
        {
            return _strings.Values;
        }
    }

    public class StringsStreamReader
    {
        private readonly BinaryReader _reader;
        private readonly int _dataSize;

        public StringsStreamReader(byte[] data)
        {
            _dataSize = data.Length;
            _reader = new BinaryReader(new MemoryStream(data));
        }

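        public StringsStream Read()
        {
            var strings = new Dictionary<uint, string>();
            // Key each string by the byte offset where it starts, since that is
            // the value the metadata tables use to refer to it.
            while (_reader.BaseStream.Position < _dataSize)
            {
                strings.Add((uint)_reader.BaseStream.Position, _reader.ReadNullTermString());
            }
            return new StringsStream(strings);
        }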
    }

So from there you can just use an instance of StringsStream to get a string by its offset, or get the list of all of them if you are dumping them to a UI.
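One caveat with a dictionary keyed only by string starts: as far as I know, some compilers save space by letting a table point into the middle of a longer string (so "gram" could be stored as the tail of "Program"), and a plain dictionary lookup would throw on such an offset. Below is a rough sketch of a lookup that falls back to decoding from the raw bytes in that case. StringsHeapIndexSketch and its members are hypothetical names for this example only, and it assumes UTF-8 data.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical class for illustration only; not part of the series' code.
public sealed class StringsHeapIndexSketch
{
    private readonly byte[] _heap;
    private readonly Dictionary<uint, string> _byOffset = new Dictionary<uint, string>();

    public StringsHeapIndexSketch(byte[] heap)
    {
        _heap = heap;

        // Index every string by the byte offset where it starts.
        int start = 0;
        while (start < heap.Length)
        {
            _byOffset[(uint)start] = Decode(start);
            start = FindEnd(start) + 1; // skip past the terminator
        }
    }

    // Number of strings found by the sequential scan.
    public int Count
    {
        get { return _byOffset.Count; }
    }

    public string GetByOffset(uint offset)
    {
        string cached;
        if (_byOffset.TryGetValue(offset, out cached))
            return cached;

        // Not a known string start: decode straight from the raw bytes.
        if (offset >= _heap.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        return Decode((int)offset);
    }

    // Position of the next 0 byte at or after 'start', or the end of the data.
    private int FindEnd(int start)
    {
        int end = Array.IndexOf(_heap, (byte)0, start);
        return end < 0 ? _heap.Length : end;
    }

    private string Decode(int start)
    {
        return Encoding.UTF8.GetString(_heap, start, FindEnd(start) - start);
    }
}
```

The fallback result is deliberately not cached, so a "dump all strings" view would still list only the strings found by the sequential scan.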

Now we can combine this with the “Module” metadata table from Part 4 of this series. Each row of that table had a “Name” property, which is an offset into the #Strings stream that we just read. Putting the two together, we can print the name of every module to the console:

    foreach (var moduleTableRow in textSection.MetadataStream.ModuleTable)
    {
        var moduleName = textSection.StringsStream.GetByOffset(moduleTableRow.Name);
        Console.WriteLine("Module Name: " + moduleName);
    }

Next Time…

I’m actually not sure what we will cover next time. Probably reading the remaining streams (#US, #GUID, #Blob). See you next time!

5 comments on “Building a .NET Disassembler (Part 5) – Reading the #Strings Stream”
  1. Frank Rosenholz says:

    Ahhhhh… I want to read the #Blob stream because it contains the data that VB.Net uses for the My.Application.Info.ProductName value and I want to change it as a post build step… where is part 6?!! XD

  2. Frank Rosenholz says:

    I followed along in a hex editor. The data that I am interested in is everything that is defined by the AssemblyInfo file in the following fields:

    They are located at the very end of the #Blob stream using UTF-8 encoding but I do not know how to parse this data let alone change it 😦

  3. Frank Rosenholz says:

    It removed the Assembly attributes because of the < > characters.
    I meant: AssemblyTitle, AssemblyDescription, AssemblyCompany, AssemblyProduct, AssemblyCopyright, AssemblyTrademark, ComVisible and Guid.

    • rally25rs says:

      Hi Frank. Thanks for the interest in the blog posts. It has actually been almost 2 years since I wrote the first parts of that series. Right around that time, I ended up changing jobs and getting really focused on client-side web development and JavaScript, and I actually never finished my decompiler project. I actually learned a lot of information from the book “Inside Microsoft .NET IL Assembler” which you can find very cheap. Since it has been so long now since I worked on that disassembler, I really don’t remember much of it or where I left off, so unfortunately I probably won’t be of much help.

      • If anyone is interested I’ve long since written something that’s as far along as this is. Structured a typing model that is focused on making a compiler in the future. Got as far as parsing the data stream for method bodies. Things get a bit dicey when you try to understand the weirder data that the compilers add in. — Let me know if you’re interested in knowing more.

CodingWithSpike is Jeff Valore. A professional software engineer, focused on JavaScript, Web Development, C# and the Microsoft stack. Jeff is currently a Software Engineer at Virtual Hold Technologies.

