Friday, December 3, 2010

Windows PE Header

Hi folks, over the past few days a couple of students asked me about the Windows PE header. I had assumed PE was a fairly well-known structure, but it seems to be rather obscure to most of the people I work with.

So I am going to summarize very briefly what PE is, with the help of some useful pictures collected from around the web.

Every executable file follows an object file format that the OS loader uses to map and run the program. Windows Portable Executable (PE) is one such format; it is derived from the Common Object File Format (COFF). Its counterpart on Linux is the Executable and Linkable Format (ELF), which is a separate format and not a COFF variant.


Microsoft migrated to the PE format with the introduction of the Windows NT 3.1 operating system. All later versions of Windows, including Windows 95/98/ME, support the file structure. The format has retained limited legacy support to bridge the gap between DOS-based and NT systems. For example, PE files still begin with an MS-DOS executable program, which is by default a stub that displays the simple message "This program cannot be run in DOS mode" (or similar). PE also continues to serve the changing Windows platform. Some extensions include the .NET PE format, a 64-bit version called PE32+ (sometimes PE+), and a specification for Windows CE.


Nowadays the Windows PE header has the following structure.



MZ are the first 2 bytes you will see in any PE file opened in a hex editor. The DOS header occupies the first 64 bytes of the file, i.e. the first 4 rows of 16 bytes seen in the hex editor in the picture below. The last DWORD before the DOS stub begins (the e_lfanew field at offset 3Ch) holds the file offset where the PE header begins; in this example it contains 00h 01h 00h 00h, which read as a little-endian value is 100h.
The DOS stub is the small program that runs if the executable is launched from a DOS environment (for example a DOS shell). For backward compatibility it usually just prints a message such as "This program must be run under Win32" and exits.
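As a minimal sketch of the lookup described above, the following Python snippet checks the MZ signature and reads the DWORD at offset 3Ch. The function name read_pe_offset and the synthetic 64-byte header are assumptions made for illustration; a real file would be read from disk instead.

```python
import struct


def read_pe_offset(data: bytes) -> int:
    """Return the file offset of the PE header stored in the DOS header."""
    if len(data) < 64:
        raise ValueError("too short to contain a 64-byte DOS header")
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The last DWORD of the DOS header sits at offset 3Ch and is little-endian.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return pe_offset


# Synthetic DOS header (not a real file): "MZ", 58 zero bytes,
# then the four bytes 00h 01h 00h 00h mentioned in the text.
dos_header = b"MZ" + bytes(58) + bytes([0x00, 0x01, 0x00, 0x00])
print(hex(read_pe_offset(dos_header)))  # 0x100
```

With a real executable you would pass the first 64 bytes of the file instead of the synthetic header.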

The PE header begins with its signature 50h, 45h, 00h, 00h (the letters "PE" followed by two terminating zeroes).
If the Signature field holds NE rather than PE, you're working with a 16-bit Windows New Executable file. Likewise, LE in the signature field would indicate a Windows 3.x virtual device driver (VxD), and LX would be the mark of a file for OS/2 2.0.
FileHeader is the next 20 bytes of the PE file and contains info about the physical layout and properties of the file, e.g. the number of sections.
OptionalHeader comes next. Despite its name, it is always present in executable images; it occupies 224 bytes in a 32-bit PE file (240 bytes in PE32+) and contains info about the logical layout inside the PE file, e.g. AddressOfEntryPoint. Its size is given by the SizeOfOptionalHeader member of FileHeader. The structures of these members are also defined in windows.inc.
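To make the layout of the signature and the 20-byte FileHeader concrete, here is a small Python sketch. The field names follow the IMAGE_FILE_HEADER structure; the helper name parse_file_header and the sample values are assumptions chosen for the example, not data from a real file.

```python
import struct
from typing import NamedTuple


class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


# Two WORDs, three DWORDs, two WORDs: 20 bytes, little-endian.
FILE_HEADER_FORMAT = "<HHIIIHH"


def parse_file_header(data: bytes, pe_offset: int) -> FileHeader:
    """Check the PE signature at pe_offset and decode the FileHeader after it."""
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    fields = struct.unpack_from(FILE_HEADER_FORMAT, data, pe_offset + 4)
    return FileHeader(*fields)


# Hypothetical header: i386 machine type, 3 sections, 224-byte optional header.
sample = b"PE\0\0" + struct.pack(FILE_HEADER_FORMAT, 0x014C, 3, 0, 0, 0, 224, 0x0102)
header = parse_file_header(sample, 0)
print(header.number_of_sections, header.size_of_optional_header)  # 3 224
```

In a real file, pe_offset would be the value read from the end of the DOS header, and the OptionalHeader would start right after these 24 bytes.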
The PE header is defined as follows:



Not all of these sections have to be present, but you need to update NumberOfSections whenever you add or delete sections from a PE file. A convenient way to inspect the sections is a tool such as PE Explorer or PEiD. The following image shows PEiD in use.



AddressOfEntryPoint is the relative virtual address (RVA) of the first instruction that will be executed when the PE loader is ready to run the PE file. If you want to divert the flow of execution right from the start, you need to change the value in this field to a new RVA, and the instruction at the new RVA will be executed first. Executable packers usually redirect this value to their decompression stub, after which execution jumps back to the original entry point (OEP) of the app. Of further note is the StarForce protection, in which the CODE section is not present in the file on disk but is written into virtual memory on execution.

ImageBase is the preferred load address for the PE file. For example, if the value in this field is 400000h, the PE loader will try to load the file into the virtual address space starting at 400000h. The word "preferred" means that the PE loader may not load the file at that address if some other module already occupies that address range. For 32-bit executables built with default linker settings it is usually 400000h.
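Since an RVA is simply an offset from the address where the image is loaded, the relationship between ImageBase and AddressOfEntryPoint can be sketched in a few lines of Python. The function name rva_to_va and all numeric values below are hypothetical and only illustrate the arithmetic.

```python
def rva_to_va(rva: int, image_base: int) -> int:
    """Convert an RVA to a virtual address for a module loaded at image_base."""
    return image_base + rva


preferred_base = 0x400000   # ImageBase from the optional header (example value)
entry_point_rva = 0x1000    # AddressOfEntryPoint (hypothetical value)

# Loaded at its preferred address, the first instruction is at 401000h.
print(hex(rva_to_va(entry_point_rva, preferred_base)))  # 0x401000

# If the loader had to pick another base, only the base changes.
relocated_base = 0x10000000  # hypothetical alternative load address
print(hex(rva_to_va(entry_point_rva, relocated_base)))  # 0x10001000
```

This is why fields inside the PE file store RVAs: they stay valid wherever the loader ends up placing the image.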

SectionAlignment is the granularity of the alignment of the sections in memory. For example, if the value in this field is 4096 (1000h), each section must start at a multiple of 4096 bytes. If the first section is at 401000h and its size is 10 bytes, the next section must be at 402000h, even though the address space between 401000h and 402000h will be mostly unused.

FileAlignment is the granularity of the alignment of the sections in the file. For example, if the value in this field is 512 (200h), each section must start at a multiple of 512 bytes. If the first section is at file offset 200h and its size is 10 bytes, the next section must be located at file offset 400h: the space between file offsets 20Ah (522) and 400h (1024) is unused/undefined.
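Both alignment rules boil down to rounding an address or offset up to the next multiple of the alignment value. A minimal Python sketch, assuming the alignment is a power of two as in the 1000h and 200h examples above (the helper name align_up is my own):

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# SectionAlignment example: a 10-byte section at 401000h, alignment 1000h.
print(hex(align_up(0x401000 + 10, 0x1000)))  # 0x402000

# FileAlignment example: a 10-byte section at file offset 200h, alignment 200h.
print(hex(align_up(0x200 + 10, 0x200)))  # 0x400
```

The two results match the "next section" positions worked out by hand in the paragraphs above.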

SizeOfImage is the overall size of the PE image in memory. It's the sum of all headers and sections aligned to SectionAlignment.
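As a rough illustration of that sum, the sketch below rounds the headers and each section up to SectionAlignment and adds them together. It assumes the sections are laid out back to back in memory; the function names and the sizes used are hypothetical.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def size_of_image(size_of_headers, section_virtual_sizes, section_alignment):
    """Sum the headers and every section, each rounded up to section_alignment."""
    total = align_up(size_of_headers, section_alignment)
    for virtual_size in section_virtual_sizes:
        total += align_up(virtual_size, section_alignment)
    return total


# Hypothetical image: 400h bytes of headers and three sections.
print(hex(size_of_image(0x400, [0x1234, 0x10, 0x2000], 0x1000)))  # 0x6000
```

Here the headers round up to 1000h and the sections to 2000h, 1000h and 2000h, which gives 6000h in total.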

SizeOfHeaders is the combined size of all headers plus the section table, rounded up to a multiple of FileAlignment. In a simple file with no gaps or appended data, this value equals the file size minus the combined size of all sections in the file. You can usually also use this value as the file offset of the first section in the PE file.

DataDirectory is the final 128 bytes of OptionalHeader, which in turn is the final member of the PE header IMAGE_NT_HEADERS. DataDirectory is an array of 16 IMAGE_DATA_DIRECTORY structures, 8 bytes apiece, each relating to an important data structure in the PE file. Each array element refers to a predefined item, such as the import table. The structure has 2 members which contain the location and size of the data structure in question: VirtualAddress is the relative virtual address (RVA) of the data structure, and Size contains its size in bytes.
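The 16 x 8 byte layout can be decoded with a short Python sketch like the one below. The entry names are shorthand labels for the standard directory order, and parse_data_directory plus the sample import-table values are assumptions made for the example.

```python
import struct

# Shorthand labels for the 16 predefined entries, in array order.
DIRECTORY_NAMES = [
    "export", "import", "resource", "exception",
    "security", "base_reloc", "debug", "architecture",
    "global_ptr", "tls", "load_config", "bound_import",
    "iat", "delay_import", "com_descriptor", "reserved",
]


def parse_data_directory(blob: bytes) -> dict:
    """Decode 128 bytes into {name: (VirtualAddress, Size)}."""
    if len(blob) != 128:
        raise ValueError("expected 16 entries of 8 bytes each")
    # Each entry is two little-endian DWORDs: VirtualAddress, then Size.
    entries = struct.iter_unpack("<II", blob)
    return dict(zip(DIRECTORY_NAMES, entries))


# Synthetic directory: only the import table (index 1) is filled in.
blob = bytearray(128)
struct.pack_into("<II", blob, 1 * 8, 0x2000, 0x50)
directory = parse_data_directory(bytes(blob))
print(directory["import"])  # (8192, 80)
```

An entry whose VirtualAddress and Size are both zero, like every other entry in this synthetic example, means the corresponding structure is not present.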

Summing up the whole PE header structure in a nutshell:




Alright, this was a short description of the much more complex Windows PE header. I believe this is what everybody (of course I am not talking about grandma, but security-skilled folks) should know about Windows PE. When you actually need to attack or reverse engineer a PE file, this information obviously isn't enough, so I suggest looking into the most authoritative guides: this, this and this.

3 comments:

Anonymous said...

Your diagrams are my work. Information should be shared freely, but it would have been nice to get an acknowledgement or reference.

Marco Ramilli said...

Hmm... I am not sure what you are referring to, but I should have cited your work in the last links. If not, please send me your document and I'll immediately update the post with your references.
Thank you very much.

Anonymous said...

Can you help elaborate on what the first 16 bytes of a PE file would be?

I'm taking a class on Coursera regarding Malicious Software, and the instructor is talking about the Torpeg botnet and how they trace samples using the first 16 bytes of the file.

What do the first 16 bytes include?