Thursday, December 9, 2010

Executable and Linking Format

Hi Folks,
after the previous post on Windows COFF, namely Microsoft PE, this post comes naturally. Today I am going to write up some sketches of mine on the Executable and Linking Format (ELF). ELF was originally developed by UNIX System Laboratories (USL) as part of the Application Binary Interface (ABI). It was selected by the Tool Interface Standards (TIS) Committee as a portable object file format for 32-bit Intel Architecture environments.

ELF is structured as follows:

An ELF header resides at the beginning and holds a "road map" describing the file’s organization. Sections hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, and so on. The TIS specification describes the special sections in its Part 1 and covers segments and the program execution view of the file in its Part 2. A program header table, if present, tells the system how to create a process image. Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one. A section header table contains information describing the file’s sections. Every section has an entry in the table; each entry gives information such as the section name, the section size, etc. Files used during linking must have a section header table; other object files may or may not have one.

Keeping the header description quick and dirty: some object file control structures can grow, because the ELF header records their actual sizes. If the object file format changes, a program may encounter control structures that are larger or smaller than expected. The ELF header is structured as follows:

e_ident: The initial bytes mark the file as an object file and provide machine-independent data with which to decode and interpret the file’s contents. The specification describes them in full under "ELF Identification".

e_type: This member identifies the object file type.

e_machine: This member’s value specifies the required architecture for an individual file.

e_version: This member identifies the object file version. The value 1 signifies the original file format; extensions will create new versions with higher numbers.

e_entry: This member gives the virtual address to which the system first transfers control, thus starting the process. If the file has no associated entry point, this member holds zero.

e_phoff: This member holds the program header table’s file offset in bytes. If the file has no program header table, this member holds zero.

e_shoff: This member holds the section header table’s file offset in bytes. If the file has no section header table, this member holds zero.

e_flags: This member holds processor-specific flags associated with the file. Flag names take the form EF_machine_flag. See "Machine Information" in the specification for flag definitions.

e_ehsize: This member holds the ELF header’s size in bytes.

e_phentsize: This member holds the size in bytes of one entry in the file’s program header table; all entries are the same size.

e_phnum: This member holds the number of entries in the program header table. Thus the product of e_phentsize and e_phnum gives the table’s size in bytes. If a file has no program header table, e_phnum holds the value zero.

e_shentsize: This member holds a section header’s size in bytes. A section header is one entry in the section header table; all entries are the same size.

e_shnum: This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table’s size in bytes. If a file has no section header table, e_shnum holds the value zero.
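The field list above maps directly onto a fixed 52-byte record, so it can be decoded with a few lines of code. What follows is only a minimal sketch in Python, assuming a 32-bit little-endian file; the names ELF32_HEADER, Elf32Header, parse_elf32_header and read_elf32_header are made up for this post. The last member, e_shstrndx, is not described in the list above; it is the "Section header string table index" that readelf prints.

```python
import struct
from collections import namedtuple

# ELF32 header layout: 16 e_ident bytes, then e_type and e_machine
# (2 bytes each), then e_version, e_entry, e_phoff, e_shoff, e_flags
# (4 bytes each), then six 2-byte members ending with e_shstrndx.
# "<" assumes little-endian data; a big-endian file would need ">".
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")

Elf32Header = namedtuple(
    "Elf32Header",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def parse_elf32_header(data):
    """Decode the first 52 bytes of `data` as an ELF32 header."""
    if len(data) < ELF32_HEADER.size:
        raise ValueError("too short for an ELF32 header")
    header = Elf32Header._make(ELF32_HEADER.unpack_from(data))
    if header.e_ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic bytes")
    return header


def read_elf32_header(path):
    """Read and decode the header of the file at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        return parse_elf32_header(f.read(ELF32_HEADER.size))
```

Calling read_elf32_header("test") on a 32-bit x86 executable should give back the same numbers that readelf -h reports, for example header.e_entry and header.e_shnum.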

Alright, this was kind of cool, but how can we play with it? First of all, how can we tell that a file is an ELF? And how can we extract data and information from an ELF file? Is there something like PEdit?
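One hedged answer to the first question: an ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F', followed by a class byte and a data-encoding byte inside e_ident. The sketch below checks exactly that and nothing more; describe_ident and is_elf are names invented here, not part of any tool.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "ELF32", 2: "ELF64"}                # e_ident[4]
ELF_DATA = {1: "little endian", 2: "big endian"}      # e_ident[5]


def describe_ident(e_ident):
    """Return (class, byte order) for a 16-byte e_ident, or None if not ELF."""
    if len(e_ident) < 16 or e_ident[:4] != ELF_MAGIC:
        return None
    return (
        ELF_CLASSES.get(e_ident[4], "unknown"),
        ELF_DATA.get(e_ident[5], "unknown"),
    )


def is_elf(path):
    """True if the file at `path` begins with a plausible e_ident block."""
    with open(path, "rb") as f:
        return describe_ident(f.read(16)) is not None
```

This only looks at the identification bytes; a file can pass this check and still be truncated or corrupted further in.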

Let's start with the following example, "test.c" ($ gcc -o test test.c):

$ readelf -h test
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x80482c0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          2060 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         28
  Section header string table index: 25

What does this header tell us?

1) This executable was built for the 32-bit Intel x86 architecture (the "Machine" and "Class" fields).

2) When executed, the program starts running at virtual address 0x80482c0 (see "Entry point address"). The "0x" prefix means the number is hexadecimal. This address does not point to our main() procedure but to a procedure named _start. Don't remember writing such a thing? Of course you don't. _start is added at link time (it comes from the C runtime startup code), and its purpose is to initialize your program before main() runs.

3) This program has a total of 28 sections and 7 segments.
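The header also lets us work out where the two tables sit in the file, using the "entry size times entry count" rule from the field descriptions. This is just the arithmetic written out in Python, with the values copied from the readelf -h output above; the variable names for the results are mine.

```python
# Values copied from the readelf -h output above.
e_phoff, e_phentsize, e_phnum = 52, 32, 7
e_shoff, e_shentsize, e_shnum = 2060, 40, 28

# Table size = size of one entry * number of entries.
ph_table_size = e_phentsize * e_phnum
sh_table_size = e_shentsize * e_shnum

# First byte past each table.
ph_table_end = e_phoff + ph_table_size
sh_table_end = e_shoff + sh_table_size

print(f"program header table: {ph_table_size} bytes, "
      f"file offsets {e_phoff:#x}..{ph_table_end - 1:#x}")
print(f"section header table: {sh_table_size} bytes, "
      f"file offsets {e_shoff:#x}..{sh_table_end - 1:#x}")
```

The program header table is 224 bytes and starts right after the 52-byte ELF header. The section header table is 1120 bytes and starts at 2060, which is the 0x80c that readelf -S reports; it ends at 3180 (0xc6c), which is exactly the file offset readelf -S gives for .symtab.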

What is a section? A section is an area of the object file that contains information useful for linking: the program's code, the program's data (variables, arrays, strings), relocation information and so on. Each area groups related information with a distinct meaning: a code section holds only code, a data section holds only initialized or uninitialized data, etc. The Section Header Table (SHT) tells us exactly which sections the ELF object has, and the "Number of section headers" field above already tells us that "test" contains 28 of them.

$ readelf -S test

There are 28 section headers, starting at offset 0x80c:

Section Headers:
  [Nr] Name      Type      Addr     Off    Size   ES Flg Lk Inf Al
  ........
  [ 4] .dynsym   DYNSYM    08048174 000174 000060 10   A  5   1  4
  ........
  [11] .plt      PROGBITS  08048290 000290 000030 04  AX  0   0  4
  [12] .text     PROGBITS  080482c0 0002c0 0001d0 00  AX  0   0  4
  ........
  [20] .got      PROGBITS  080495d8 0005d8 000004 04  WA  0   0  4
  [21] .got.plt  PROGBITS  080495dc 0005dc 000014 04  WA  0   0  4
  ........
  [22] .data     PROGBITS  080495f0 0005f0 000010 00  WA  0   0  4
  [23] .bss      NOBITS    08049600 000600 000008 00  WA  0   0  4
  ........
  [26] .symtab   SYMTAB    00000000 000c6c 000480 10     27  2c  4
  ........

.text is the section where the compiler puts executable code. As a consequence, it is marked as executable ("X" in the Flg field). In this section you will find the machine code of our main() procedure:
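The Flg letters are just a rendering of the sh_flags bit field stored in each 40-byte section header entry. Here is a rough sketch of how one entry could be decoded, again assuming a 32-bit little-endian file; ELF32_SHDR, parse_section_header and flag_letters are illustrative names, and only the three basic flag bits (write, alloc, execute) are handled.

```python
import struct

# Elf32_Shdr: ten 4-byte members, 40 bytes in total, which matches
# "Size of section headers: 40 (bytes)" in the ELF header.
ELF32_SHDR = struct.Struct("<10I")
SHDR_FIELDS = (
    "sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
    "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize",
)

# Basic flag bits and the letters readelf prints for them.
SECTION_FLAGS = ((0x1, "W"), (0x2, "A"), (0x4, "X"))


def parse_section_header(entry):
    """Decode one 40-byte section header table entry into a dict."""
    return dict(zip(SHDR_FIELDS, ELF32_SHDR.unpack_from(entry)))


def flag_letters(sh_flags):
    """Render sh_flags the way the Flg column does, e.g. 0x6 as 'AX'."""
    return "".join(letter for bit, letter in SECTION_FLAGS if sh_flags & bit)
```

For the .text row, sh_flags is 0x6, which renders as "AX": the section is loaded into memory (A) and executable (X). Note also that .text starts at 0x080482c0, the same value as the entry point address in the ELF header, so _start is the first code in .text.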

$ objdump -d -j .text test
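-d tells objdump to disassemble the machine code and -j tells objdump to focus on one specific section (in this case .text). Other code sections from the listing above, such as .plt, work the same way; there is no .stack section in an ELF file, and for .data a hex dump with objdump -s -j .data is more useful than a disassembly.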

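Mac OS X users can do the same kind of inspection with otool. Note that otool reads Mach-O files, the native Mac OS X executable format, rather than ELF. The screenshots showed the following views, listed here with the otool option that normally produces each one:

1) The header (otool -h).

2) The shared libraries (otool -L).

3) The text section, i.e. the code payload (otool -t).

4) The data section, i.e. initialized global and static variables (otool -d).

This "quick and dirty" post showed the basic ELF structure, with a particular focus on the ELF header, the first element the OS loader looks at. Some tools to play with have been presented. Keep in mind that these tools are very useful for a first look at a suspicious file, such as malware, a virus or a keylogger.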

To know more about ELF structure read here, here and here


Enrico said...

A really useful article.
Thanks a lot.

Robyn said...

Thanks for your article, really useful information.