November 21, 2008 (Lecture 20)

Book About Linking

Levine, John R., Linkers and Loaders, Morgan Kaufmann, 1999.
This is the only book I know of that describes the details of dynamic linkers and loaders. The pre-publication notes from this text, available via the author's Web site, were a godsend that explained what I needed to know to debug some nasty problems. Since that time, plenty of experience with "objdump", "nm", and "dbx"/"gdb" has shown the section describing the UNIX environment to be right on the money.

Why Study Linkers

Understanding how a complex program is linked can also help in debugging. If you understand how the linker has mapped variables into memory, you can often tell what an address refers to just by looking at it. Is it a variable? A constant? A function we wrote? A function in a shared library?

From Source Code to a Process

The compiler takes individual source code files (.c files) and generates an object file from each (.o file). The object files contain the "raw code" (machine code) that the compiler generated from the high-level code (like C). The linker then takes these object files, smashes them together, fixes them up, and builds one executable. The loader then converts this executable into a memory image.

Static vs. Dynamic Linking

There are two basic types of linking: static linking and dynamic linking:

Static linking generates an exact image at link time; almost no work is left for load time.
Dynamic linking resolves many things at link time, but leaves behind some stubs. These stubs are resolved by the loader (the runtime, or dynamic, linker), which loads shared objects when the program runs.

A Closer Look at Object Files

What goes into an object file? Many things including the following:

Labels
Code (stored in a section)
Static data (stored in a section)
Debugging symbols (stored in the symbol table)
Line number table -- maps machine instructions back to C source line numbers; used, for example, by debuggers
Relocation table - if code moves, what must change?

What is the format of an object file? There are several different types:

COFF (Common Object File Format) - this format was originally used by UNIX systems, but it has since been abandoned there. Microsoft revitalized it for Windows NT; its reincarnation is known as MS-COFF.
ELF (Executable and Linkable Format) - This is used by most UNIX systems. It is a format for both linking and loading.
OMF (Object Module Format) - This format was used by MS-DOS and OS/2. Its main purpose was to allow multi-megabyte programs to be linked within the 640K "barrier". It did this by breaking all data into 4K chunks.

How can we learn more about an object file, get into its guts, and see these things?

Each OS has its own collection of utilities for cracking the files.
GNU binutils - The kind folks at the FSF have provided a tool called objdump that can crack the files. It relies on a library called BFD (the Binary File Descriptor library) that knows about many object file formats.

Linking process

The linking process involves three basic steps:

Symbol resolution - Is everything there? For example, if the code calls printf, is the code for printf available?
Section creation - smashing together obj files
Relocation - reorganizing the smashed pieces so they fit in memory together.

Symbols and the Symbol Table

Before we talk about symbol resolution, we need to make sure we understand what a symbol is. Things like variable names and function names are symbols. There are different types of symbols:

Definitions (D) - This is a symbol which is defined in the current file. Only one (strong) definition of each symbol can exist in a link.
External symbols (E) - External symbols are symbols that are referenced, but not defined, in the current file. Their definitions must be found elsewhere: in another object file or in a library. printf() is one example -- we use it, but it isn't something that we write or provide.
Common symbols (C) - A common symbol is a "weak" definition. If no "strong" definition of the symbol is found, the linker uses the weak one. Otherwise, it uses the strong definition.
Sometimes conflicting definitions of symbols exist. For example, suppose we declare "int i;" as a global in two different files. Is this an error? No. Each definition is said to be a weak definition. The linker can only detect conflicts among weak definitions if the sizes of the types are different -- the linker doesn't actually know anything about the types themselves.
"int i = 10;" is said to be a strong definition. The linker considers weak definitions to be tentative and overrides them with a strong definition, if one is available. More than one strong definition is an error condition.
The following is a simplified example of a symbol table:

Symbol  Type  Meaning
printf  E     reference to printf() in a shared library
main()  D     defined function main()
i       D     definition of global "int i", as in "int i = 10;"
i       C     weak definition of global "int i", as in "int i;"

If we consider the thread library, we see one application of a weak definition. In single-threaded applications, malloc() does not require synchronization, but in multithreaded applications time-consuming synchronization is required. To optimize for both performance and correctness, the standard library implementation of malloc calls a function "malloc_lock()" that is weakly defined to do nothing. The threads library contains a strong definition of this function (malloc_lock) in which it performs the expensive synchronization. When linking against the thread library, the strong definition wins. Without the thread library, the cheap do-nothing function stub is included.
Different compilers provide different mechanisms for defining weak functions. With gcc, one form is "__attribute__((weak)) int myfunc(void) {...mycode...}"; in a function definition, the attribute goes before the declarator.
It is important to remember that although the C linker doesn't care about types, C++ does -- this is because of overloaded functions, where several functions share one name. C++ compilers employ a cheap hack to pass the type information to the linker: they attach it to the name of the symbol, generating long and often frightening symbol names. This is known as name mangling. A simplified example might be "main_void"; the actual encoding is more complex. With gcc, for example, "int foo(int)" becomes "_Z3fooi". This can lead to baffling messages from the linker, because it often doesn't decode the names for you.

Note: "nm" can show you the symbol table information within an object file, executable, or archive. "nm -C x.o" will demangle the C++ symbol names in the supplied object file.

Stage 1: Symbol Resolution

Every object file has a symbol table. Given the collection of symbol tables, the linker produces one universal symbol table in which every reference is resolved. If the link succeeds, an executable is generated. If not, an often obscure message is generated describing the unresolved symbols.
Certain dangling references refer to library code. With static linking, these must be resolved at link time by copying the needed code from the libraries into the executable.
Archive libraries (type .a), such as libc.a, are a collection of object files together with a big symbol table (an index of which member defines which symbol). The linker searches the archive's symbol table for symbols that are still undefined and then links in only the member object files that define them.
This can lead to the problem of circular dependencies. It is possible that one library references symbols defined in a second library that, in turn, references symbols in the first. If this happens, some linkers, such as those on the Linux boxes, won't go back to an earlier library to resolve references that first appear in a later one. If this is the case, it is a pain -- you need to list the libraries multiple times so that the linker finds them at the right time.
With dynamic linking and shared libraries, some external references are resolved by adding stubs and information that tell the runtime linker where to find the libraries at runtime. One such example is libc.so. Dynamic libraries are nice because library bug fixes and revisions don't require recompiling or relinking applications. They also reduce the size of executables.

Stage 2: Section creation

What is a section?
Sections are the divisions of an executable that contain the stuff: comments (real hackers include comments?), plus debug information, code, &c.
Flags can describe permissions used by the OS for protection (r/w/x, etc.). For example, the section that contains constants gets different protection than the section that contains variables.
Several common sections might include the following:

.text (code)
.data (variables)
.rodata (read-only data, e.g., string constants such as the "hi" in printf("hi");)
.stab ("stab strings", a.k.a. debug info)
.comment (anything, e.g., compiler information, FSF advertisement, etc.)

Stage 3: Section Smashing

How does the linker smash these sections together?
It lays out each section, then, knowing the sizes, smashes them together. Since each section had memory laid out only relative to itself, it is necessary to give them a universal memory mapping. This is done with what are called relocation entries, known as fixups in the OMF format.
A relocation entry points to a place in the object code that holds an address that is not final yet. When the linker resolves the entries, it patches in the new values.
Let's say that there is a .globals section (many systems have such a section for globals). The linker looks through the relocation table for entries that refer to a symbol, e.g., i, and then adjusts each referenced location to i's final address.
An efficient relocation table design points into the symbol table. This allows the symbol table to keep all of the symbol information and the relocation table to keep all of the relocation information. Information describing the type of address (32-bit? 64-bit?) might be part of the relocation information.
When a static link is done, the relocation table may be empty -- the information is no longer needed. It may also be preserved for debugging and profiling.
The dynamic linker, however, cannot resolve all of the entries. Some of the entries, for example references into shared libraries, are left as stubs for the loader to complete at runtime.
This stage of linking, the final stage, outputs two things:

exec file
link map

The map tells you where the linker put everything. This is useful, for example, when reading dumps -- it lets you know how to interpret the values in each memory location.

More About Dynamic Linking

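The dynamic linking process binds dynamically linked functions on an as-needed basis during a program's execution. (On typical ELF systems, the shared libraries themselves are mapped when the program starts; with lazy binding, what is deferred is the lookup of each function's address.) In place of an actual call to the function, calls to dynamically linked functions are actually calls into a special stub that calls the function and performs the runtime linking, as necessary. The following is an example of such a call: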
8048552:  e8 9d fe ff ff      call   malloc@@SYSVABI_1.3        [0x80483f4]
In the example above, the address 0x80483f4 is the address of the called function's entry in the Procedure Linkage Table (PLT). There is one entry in the PLT for each dynamically linked function. These entries contain the code to call the dynamically linked function if its address has already been resolved, or the runtime linker otherwise. Below is an example of an entry of the PLT:
80483f4:  ff 25 00 96 04 08   jmp    *0x8049600
80483fa:  68 20 00 00 00      pushl  $0x20
80483ff:  e9 a0 ff ff ff      jmp    .-0x5b     [0x80483a4]
The first line of this code is an indirect jump through the function's entry in what is known as the Global Offset Table (GOT). If the function has already been resolved, the GOT entry will contain the in-memory address of the dynamically linked function. As a result, the first jump will, in effect, call the dynamically linked function. In the example above, the GOT entry for the function, after it has been linked, is shown below:
0x8049600 <_GLOBAL_OFFSET_TABLE_+16>:   0x40068ae
But before the runtime linker actually runs, the address of the dynamically linked function is not yet known. Instead, each entry of the GOT is initialized to store the address of the second instruction of the corresponding entry of the PLT. Recall that the first instruction was the indirect jump via the GOT entry. Now, realize that in the initial case this is a jump to the next instruction in the PLT -- this basically makes the jump through the GOT an expensive "no op".
This turns out to be a fortunate thing, because the rest of the code in the PLT entry will invoke the dynamic linker to resolve the function. In this way we see that once a dynamically linked function is linked, we invoke it indirectly via the GOT. But before it is linked, we invoke the linker indirectly via the GOT. Here's what the GOT entry looks like before the dynamic linker has run:
0x8049600 <_GLOBAL_OFFSET_TABLE_+16>:   0x80483fa
Recall that the PLT entry looks like this:
80483f4:  ff 25 00 96 04 08   jmp    *0x8049600
80483fa:  68 20 00 00 00      pushl  $0x20
80483ff:  e9 a0 ff ff ff      jmp    .-0x5b     [0x80483a4]
So, if the function hasn't been resolved yet, the next instruction to execute is the push. This push supplies a parameter to the dynamic linker that identifies this function's entry in the relocation table. The relocation table entry, in turn, provides a pointer to the symbol table entry and the GOT entry for the function. This tells the dynamic linker which function to look up and which GOT entry to update.
The final instruction in the PLT entry is a jump to the 0th entry in the PLT (at 0x80483a4). That entry pushes one more argument, identifying the calling module, and jumps into the dynamic linker, which looks up the function's address, updates the GOT entry, and invokes the function.
In summary, the PLT uses the GOT entry to invoke the linker on the first call to a dynamically linked function. After resolving the function's address, the runtime linker changes the GOT entry so subsequent calls via the PLT will invoke the function itself, not the linker.
