ELF FILE – CHAPTER 2: RELOCATION AND DYNAMIC LINKING – All things in moderation

In the previous chapter, I covered the ELF file types, the components of the ELF format, and their structures. Today, I will show you the relocation process and the dynamic linking process. Make sure you have read the previous chapter carefully, because it is closely related to the content of this chapter.
There are three main types of ELF file. The relocatable file holds code and data suitable for linking with other object files to create an executable or a shared object file. An executable file holds a program suitable for execution; the file specifies how exec(BA_OS) creates a program’s process image. A shared object file holds code and data suitable for linking in two contexts. First, the link editor may process it with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image.

1. Relocation:

Relocation is the process of connecting symbolic references with symbolic definitions. Relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process’s program image. Relocation entries are this data, and each entry is an Elf32_Rel or Elf32_Rela structure.

The r_offset gives the location at which to apply the relocation action. A relocation action describes how to patch the code or data contained at r_offset.
The r_info gives both the symbol table index with respect to which the relocation must be made and the type of relocation to apply.
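The r_addend, which is present only in Elf32_Rela, specifies a constant addend used to compute the value stored in the relocatable field. With Elf32_Rel, the addend is implicit: it is stored in the location that is going to be patched.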
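To make the three fields concrete, here is a minimal sketch in Python that unpacks a relocation entry, assuming a little-endian 32-bit object file. The field layout and the split of r_info (upper 24 bits for the symbol index, low 8 bits for the type) follow the ELF specification; the helper names and the sample entry are made up for illustration.

```python
import struct

# Layouts from the ELF specification (little-endian, as on i386):
#   Elf32_Rel  = { r_offset: Elf32_Addr, r_info: Elf32_Word }
#   Elf32_Rela = { r_offset, r_info, r_addend: Elf32_Sword }
ELF32_REL = struct.Struct("<II")
ELF32_RELA = struct.Struct("<IIi")


def elf32_r_sym(r_info):
    """Symbol table index: the upper 24 bits of r_info."""
    return r_info >> 8


def elf32_r_type(r_info):
    """Relocation type: the low 8 bits of r_info."""
    return r_info & 0xFF


def parse_rel(raw):
    """Decode one Elf32_Rel entry (hypothetical helper, not a library API)."""
    r_offset, r_info = ELF32_REL.unpack(raw)
    return {
        "r_offset": r_offset,
        "sym": elf32_r_sym(r_info),
        "type": elf32_r_type(r_info),
    }


# Made-up entry: patch offset 7, symbol index 10, type 2 (R_386_PC32).
sample = ELF32_REL.pack(7, (10 << 8) | 2)
entry = parse_rel(sample)
print(entry)
```

In a real object file these entries would come from a section such as .rel.text, and the symbol index would be looked up in the associated symbol table.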

I will use an example from the book Learning Linux Binary Analysis to explain the relocation process. The obj1.o file contains code that calls a function named foo(), but foo() is not defined in that source file; it is located in obj2.o. So, upon compiling, a relocation entry is created that is needed to satisfy the symbolic reference later:

As we can see, the operand of the call to foo() contains the value 0xfffffffc (-4 as a signed 32-bit number), which is the implicit addend. The number 7 is the offset of the relocation target to be patched. So when obj1.o is linked with obj2.o to make an executable, the linker processes a relocation entry that points at offset 7. The linker then patches the 4 bytes at offset 7 so that they contain the real offset to the foo() function, after foo() has been positioned somewhere within the executable.

The r_offset field holds the value 7 and the relocation type is R_386_PC32. Relocation entries describe how to alter instruction and data fields like this one. The link editor merges one or more relocatable files to form the output. It first decides how to combine and locate the input files, then updates the symbol values, and finally performs the relocation. Relocations applied to executable or shared object files are similar and accomplish the same result. The calculations for the relocation types use the notation below.

  • A : This means the addend used to compute the value of the relocatable field.
  • B : This means the base address at which a shared object has been loaded into memory during execution. Generally, a shared object file is built with a 0 base virtual address, but the execution address will be different.
  • G : This means the offset into the global offset table at which the address of the relocation entry’s symbol will reside during execution.
  • GOT : This means the address of the global offset table.
  • L : This means the place (section offset or address) of the procedure linkage table entry for a symbol. A procedure linkage table entry redirects a function call to the proper destination. The link editor builds the initial procedure linkage table, and the dynamic linker modifies the entries during execution.
  • P : This means the place (section offset or address) of the storage unit being relocated (computed using r_offset).
  • S : This means the value of the symbol whose index resides in the relocation entry.
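The notation above can be turned directly into code. The sketch below covers only a handful of i386 relocation types whose calculations I am sure of from the ELF specification; the table is deliberately incomplete, and the relocate() helper and the sample addresses are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # results are stored in a 32-bit field

# Calculations for a few i386 relocation types, written with the
# symbols defined above (S, A, P, B). This is not the full list.
R_386_CALC = {
    "R_386_32":       lambda S, A, P, B: S + A,
    "R_386_PC32":     lambda S, A, P, B: S + A - P,
    "R_386_GLOB_DAT": lambda S, A, P, B: S,
    "R_386_JMP_SLOT": lambda S, A, P, B: S,
    "R_386_RELATIVE": lambda S, A, P, B: B + A,
}


def relocate(rel_type, S=0, A=0, P=0, B=0):
    """Return the 32-bit value to store in the relocated field."""
    return R_386_CALC[rel_type](S, A, P, B) & MASK32


# Made-up values: a shared object loaded at base 0xB7700000 with an
# addend of 0x1234, and an absolute reference to a symbol plus 4.
print(hex(relocate("R_386_RELATIVE", A=0x1234, B=0xB7700000)))
print(hex(relocate("R_386_32", S=0x0804A010, A=4)))
```

Masking with 0xFFFFFFFF mimics the wrap-around of 32-bit arithmetic, which matters as soon as the addend is a negative number stored as an unsigned value.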

Coming back to the example, let’s look at the final output of our executable after compiling and linking obj1.o and obj2.o on a 32-bit system:

The call instruction at 0x80480DE has been patched with the 32-bit offset value 5, which points to the foo() function. The value 5 is the result of the R_386_PC32 relocation action:
S + A – P: 0x80480E8 + 0xFFFFFFFC – 0x80480DF = 5
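The arithmetic can be checked with a few lines of Python. This sketch uses only the three addresses from the example and assumes a little-endian encoding of the operand; it also shows why the addend is -4: a near call is relative to the address of the next instruction, which is 4 bytes past the operand at P.

```python
import struct

S = 0x80480E8   # value of the foo symbol in the final executable
A = 0xFFFFFFFC  # implicit addend found in the call operand (-4)
P = 0x80480DF   # address of the 4-byte operand being patched

# 32-bit wrap-around arithmetic, as the linker performs it.
value = (S + A - P) & 0xFFFFFFFF

# The bytes the linker writes at P (little-endian on i386).
patched = struct.pack("<I", value)

# The CPU adds the signed displacement to the address of the next
# instruction (P + 4), so the call lands exactly on S.
disp = struct.unpack("<i", patched)[0]
target = (P + 4 + disp) & 0xFFFFFFFF

print(hex(value), patched.hex(), hex(target))
```

The computed target equals S, which confirms that the patched call reaches foo().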

2. ELF Symbols:

A symbol is a symbolic reference to a global variable or function. For instance, the printf() function has a symbol entry that points to it in the dynamic symbol table, .dynsym. There are two symbol tables: .dynsym and .symtab. The .symtab table contains all of the symbols, whereas .dynsym contains just the dynamic/global symbols, such as those that come from an external source. The .dynsym section is allocated and loaded into memory at runtime, whereas .symtab is not loaded into memory because it is not needed at runtime. While the .dynsym symbol table is necessary for the execution of dynamically linked executables, the .symtab symbol table exists only for debugging and linking purposes and is often stripped to save space. A symbol table entry has the following format.

Symbol entries are contained within the .symtab and .dynsym sections, which is why the sh_entsize (section header entry size) of those sections is equal to sizeof(ElfN_Sym).
The st_name holds an index into the object file’s symbol string table (the .dynstr section for .dynsym, the .strtab section for .symtab), which holds the character representations of the symbol names. If the value is zero, the symbol table entry has no name.
The st_value gives the value of the associated symbol. Depending on the context, this may be an absolute value, an address, etc.
The st_size gives the size associated with the symbol. This member holds 0 if the symbol has no size or an unknown size.
The st_info specifies the symbol’s type and binding attributes. A list of the values and meanings appears below.

In the original specification, the st_other holds 0 and has no defined meaning; later revisions of the ABI use it for symbol visibility.
The st_shndx holds the relevant section header table index.
In order to view the symbol tables, we can use the command readelf -s.
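As an illustration of the fields described above, the following sketch decodes a single Elf32_Sym entry, assuming a little-endian 32-bit file. The field order and the split of st_info (binding in the upper 4 bits, type in the lower 4 bits) come from the ELF specification; the string table, the sample entry, and the helper names are invented for the example.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx
ELF32_SYM = struct.Struct("<IIIBBH")

# Common binding and type values; the tables are not exhaustive.
STB = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
STT = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def c_string(strtab, index):
    """Read the NUL-terminated name that starts at index in strtab."""
    end = strtab.index(b"\x00", index)
    return strtab[index:end].decode("ascii")


def parse_sym(raw, strtab):
    """Decode one symbol entry (hypothetical helper, not a library API)."""
    st_name, st_value, st_size, st_info, st_other, st_shndx = ELF32_SYM.unpack(raw)
    return {
        "name": c_string(strtab, st_name),
        "value": st_value,
        "size": st_size,
        "bind": STB.get(st_info >> 4, "?"),
        "type": STT.get(st_info & 0xF, "?"),
        "shndx": st_shndx,
    }


# Made-up string table and entry: a global function named foo.
strtab = b"\x00foo\x00puts\x00"
raw = ELF32_SYM.pack(1, 0x80480E8, 11, (1 << 4) | 2, 0, 1)
sym = parse_sym(raw, strtab)
print(sym)
```

An st_name of 0 points at the leading NUL byte of the string table, so c_string() returns an empty name, which matches the rule that such an entry has no name.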

3. ELF Dynamic linking:

In static linking, if a program uses external library functions, the library code it needs is copied directly into the executable. ELF also supports dynamic linking, which is a much more efficient way of handling shared libraries.
When a user runs an ELF file, the kernel begins by loading the ELF image into user space virtual memory. One aspect of segment loading differs between executable files and shared objects. Executable file segments typically contain absolute code. To let the process execute correctly, the segments must reside at the virtual addresses used to build the executable file. Thus the system uses the p_vaddr values unchanged as the virtual address. On the other hand, shared object segments typically contain position-independent code. This lets a segment’s virtual address change from one process to another, without invalidating execution behavior. Though the system chooses virtual addresses for individual processes, it maintains the segments’ relative positions. Because position-independent code uses relative addressing between segments, the difference between virtual addresses in memory must match the difference between virtual addresses in the file.
The kernel looks for the PT_INTERP program header, whose contents are the .interp section. It names the dynamic linker to be used (/lib/ld-linux.so.2), as shown below.
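To show roughly how the interpreter path can be located, here is a sketch that walks the program headers of a little-endian ELF32 image and returns the contents of the PT_INTERP segment. The header layouts and the value PT_INTERP = 3 come from the ELF specification; find_interp() is a hypothetical helper, and the tiny image built at the end is synthetic, not a valid program.

```python
import struct

EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # Elf32_Ehdr, 52 bytes
PHDR = struct.Struct("<8I")                # Elf32_Phdr, 32 bytes
PT_INTERP = 3


def find_interp(image):
    """Return the interpreter path of an ELF32 image, or None."""
    fields = EHDR.unpack_from(image, 0)
    e_ident, e_phoff = fields[0], fields[5]
    e_phentsize, e_phnum = fields[9], fields[10]
    # e_ident[4] is EI_CLASS; 1 means ELFCLASS32.
    if e_ident[:4] != b"\x7fELF" or e_ident[4] != 1:
        raise ValueError("not an ELF32 image")
    for i in range(e_phnum):
        ph = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = ph[0], ph[1], ph[4]
        if p_type == PT_INTERP:
            raw = image[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\x00").decode("ascii")
    return None


# Synthetic image: ELF header, one PT_INTERP program header, the path.
path = b"/lib/ld-linux.so.2\x00"
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
ehdr = EHDR.pack(ident, 2, 3, 1, 0, EHDR.size, 0, 0,
                 EHDR.size, PHDR.size, 1, 0, 0, 0)
phdr = PHDR.pack(PT_INTERP, EHDR.size + PHDR.size, 0, 0,
                 len(path), len(path), 4, 1)
image = ehdr + phdr + path
print(find_interp(image))
```

For a real 32-bit executable, the same function could be given the bytes read from the file; a statically linked program has no PT_INTERP header, so it would return None.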

The ld-linux.so.2 file is itself an ELF shared library, but it is statically linked and has no shared library dependencies. When dynamic linking is needed, the kernel bootstraps the dynamic linker (the ELF interpreter), which initializes itself and then loads the specified shared objects (unless they are already loaded). It then performs the necessary relocations, including those for the shared objects that the target shared object uses. The LD_LIBRARY_PATH environment variable lists additional directories in which to look for the available shared objects. When done, control is transferred back to the original program to begin its execution.
Relocation is handled through an indirection mechanism called the Global Offset Table (GOT) and the Procedure Linkage Table (PLT). These tables provide the addresses of external functions and data, which ld-linux.so.2 loads during the relocation process. This means that the code that requires the indirection (that is, uses the tables) needs no changes: only the tables require adjustment. Relocation can occur immediately upon load or whenever a given function is needed.
When a program calls a shared library function such as strcpy() or printf(), which are not resolved until runtime, there must exist a mechanism to dynamically link the shared libraries and resolve the addresses to the shared functions. When a dynamically linked program is compiled, it handles shared library function calls in a specific way, far differently from a simple call instruction to a local function. Let’s take a look at a call to the puts() function in a 32-bit compiled ELF executable.

The address 0x8049040 corresponds to the PLT entry for puts().

As we can see, there is an indirect jump to the address stored at 0x804C00C. This address is a GOT entry that holds the address of the actual puts() function in the shared library.

However, when the default behavior, lazy linking, is in use, the address of a function has not yet been resolved by the dynamic linker the first time the function is called. Lazy linking means that the dynamic linker does not resolve every function at program load time. Instead, it resolves functions as they are called, which is made possible by the .plt and .got.plt sections. This behavior can be changed to what is called strict linking with the LD_BIND_NOW environment variable, so that all dynamic linking happens right at program load time.
Let’s take a look at the relocation entry for puts():

The relocation offset is the address 0x804C00C, the same GOT entry that the puts() PLT entry jumps through. Assuming that puts() is being called for the first time, the dynamic linker has to resolve the address of puts() and place its value into the GOT entry for puts().
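The lazy binding flow described above can be modeled with a toy simulation. This is only a conceptual sketch and not how ld-linux.so.2 is implemented: the LazyBinder class, its dictionaries, and the fake puts() are all invented for illustration. The got dictionary stands in for .got.plt, plt_call() stands in for the indirect jump in a PLT entry, and the stub stands in for the path into the dynamic linker's resolver.

```python
class LazyBinder:
    """Toy model of lazy binding through a PLT and a GOT."""

    def __init__(self, library):
        self.library = library   # stands in for the shared library's symbols
        self.resolutions = []    # records every run of the resolver
        # Every GOT slot initially leads back to the resolver.
        self.got = {name: self._resolver_stub(name) for name in library}

    def _resolver_stub(self, name):
        def stub(*args):
            # First call: resolve the symbol and patch the GOT slot ...
            self.resolutions.append(name)
            self.got[name] = self.library[name]
            # ... then complete the call that triggered the resolution.
            return self.got[name](*args)
        return stub

    def plt_call(self, name, *args):
        # A PLT entry is an indirect jump through its GOT slot.
        return self.got[name](*args)


# Fake puts(): returns the number of characters instead of printing.
binder = LazyBinder({"puts": lambda s: len(s)})
first = binder.plt_call("puts", "hello")
second = binder.plt_call("puts", "again!")
print(first, second, binder.resolutions)
```

The resolver runs only on the first call; after the slot has been patched, later calls go straight to the function. Strict linking would correspond to patching every slot in the constructor.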

Conclusion:

In this chapter, I have covered quite a bit about relocation and dynamic linking in ELF files. In the next section, I will present some obfuscator tools for ELF files. Have fun with ELF files!
