Linking and Loading

Programmers write their programs in source code files. Most programming also involves two other important types of files:

Executable files
An executable file contains all program data and instructions and associated information required for producing a memory image to execute a program.
Object files
Object files are motivated by the desire to support separate compilation: the capability of breaking up complex programs into smaller pieces that can be compiled or assembled independently.

For a program that is produced by assembling or compiling several source code files, each assembly or compilation of a source code file produces an object file, which contains the portions of the executable file produced from that source code file. In addition, an object file contains the information needed to connect the object files together.

The formats for these file types and the memory image are defined by an operating system. An operating system also provides support software called loaders and linkers for handling these file types. In modern operating systems, part of this software runs dynamically, that is, while the program is executing.

There are some executable files, not considered here, whose formats are not defined by the operating system. These files are handled by interpreters for languages such as Java, Perl, and Ruby.

The format of object and executable files depends on the operating system. Compilers and assemblers have to be adapted for different operating systems in order to generate output that conforms to the appropriate format.

On Microsoft Windows® platforms, object file names have a .obj suffix. Compiled and assembled executable file names have a .exe suffix.

On Unix-based platforms, object file names have a .o suffix. There is no conventional file name suffix for compiled and assembled executable files. A Unix operating system can, however, recognize these file types by special 4-byte codes, often called magic numbers, at the start of the file.
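As an illustration of recognizing a file type from its leading bytes, here is a minimal sketch. The function names identify_format and identify_file are hypothetical, and the table lists only two well-known codes (ELF and 64-bit Mach-O); a real operating system recognizes more formats than this.

```python
# Leading 4-byte codes of two well-known executable formats.
# This table is illustrative, not exhaustive.
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF (Linux and many other Unix systems)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",
}


def identify_format(header: bytes) -> str:
    """Return a format name based on the first bytes of a file."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as f:
        return identify_format(f.read(4))
```

Calling identify_file on a compiled program on a Linux system would be expected to report ELF, while a text file would normally be reported as unknown.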

Some other needs have to be considered in the design of executable file formats:

For security, it is desirable to keep data and instructions separate:
an executable file uses separate data and text sections
Programmers want symbolic debugging capability:
an executable file includes sections for the symbol table and associating executed code with source code

The ability to break up the source code for a program into smaller code units, known as separate compilation, has two important advantages:

It reduces compilation time for incremental changes.
It simplifies source code navigation for maintenance changes.

But separate compilation introduces two problems:

Relocation
When a compiler allocates memory locations for a source code file, it starts with addresses just above the addresses set aside for the operating system. Only one of the source code files can actually use these addresses. The others have to be relocated, and their address references have to be changed accordingly.
External references
To be useful, separately compiled files must have references to each other. For example, one file will call subprograms in another file. The addresses of these external references are not known when the files are compiled separately.

Supporting separate compilation requires operating system software to combine the code from multiple compilation steps. This software is called a link editor or, more simply, a linker.

It produces an executable file from several object files.
It relocates separately compiled code segments.
It resolves external references.

The object files are the result of compiling single source code files.

They contain data and text sections like executable files.
The start address is omitted from all object files except the one containing the main program.
They contain symbol tables for resolution of external references.
They contain relocation tables for code addresses that need to be relocated.
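The linker's work can be sketched with a toy model. Everything here is an assumption made for illustration: the ObjectFile layout, the link function, the 4-byte word size, and the base address 0x400000 are hypothetical and far simpler than any real object file format. Each object file is compiled as if it started at address 0, and the linker then relocates local addresses and resolves external references.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    name: str
    code: list  # (opcode, operand) pairs, one per word
    symbols: dict = field(default_factory=dict)      # defined label -> word index
    relocations: list = field(default_factory=list)  # word indexes holding local byte addresses
    externals: dict = field(default_factory=dict)    # word index -> undefined symbol name


def link(objects, base=0x400000, word_size=4):
    """Combine object files into one memory image (toy model)."""
    # Pass 1: assign a base address to each file, build the global symbol table.
    bases, global_symbols = {}, {}
    address = base
    for obj in objects:
        bases[obj.name] = address
        for label, index in obj.symbols.items():
            if label in global_symbols:
                raise ValueError(f"duplicate symbol: {label}")
            global_symbols[label] = address + index * word_size
        address += len(obj.code) * word_size

    # Pass 2: copy the code, patching every address that needs it.
    image = []
    for obj in objects:
        for index, (op, operand) in enumerate(obj.code):
            if index in obj.relocations:
                operand = bases[obj.name] + operand  # relocation
            elif index in obj.externals:
                symbol = obj.externals[index]
                if symbol not in global_symbols:
                    raise ValueError(f"undefined symbol: {symbol}")
                operand = global_symbols[symbol]     # external reference
            image.append((op, operand))
    return image, global_symbols


main_o = ObjectFile("main.o", code=[("jal", 0), ("j", 8), ("nop", None)],
                    symbols={"main": 0}, relocations=[1], externals={0: "square"})
lib_o = ObjectFile("lib.o", code=[("mul", None), ("jr", None)],
                   symbols={"square": 0})
image, table = link([main_o, lib_o])
```

In this sketch, main.o occupies the first three words, so lib.o is placed 12 bytes above the base. The call to square in main.o is patched to that address, and the local jump in main.o is patched from offset 8 to the base address plus 8.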

As is often the case in computer science, the static/dynamic distinction has the following meaning:

Static describes something done before program execution. For example, assemblers statically allocate memory for variables declared with .word, .float, .double, and .asciiz directives.
Dynamic describes something done during program execution. For example, memory allocated by an sbrk syscall is allocated dynamically.

In keeping with this common terminology, the linking and loading described earlier is called static linking and loading. Dynamic linking and loading refers to linking and loading done during program execution. Modern operating systems typically use dynamic linking and loading for programming language library functions.

Dynamic linking and loading has three important benefits:

Software always uses the latest versions of shared libraries.
Executable files are smaller. They do not include the shared libraries.
The total memory footprint for multiple processes is reduced. With virtual memory, different programs using the same library function need only a single copy in physical memory. If designed carefully, the shared library subprograms can have different logical addresses in different programs.

A jump table implementation of dynamic linking and loading is lazy: it defers loading and linking of each subprogram until it is needed. However, the loading and linking is done only once per subprogram. After it is loaded and linked, a subprogram can be called again as many times as needed with negligible overhead.

A jump table contains an entry for each dynamically loaded subprogram.
Each entry contains executable code.
Each call to a dynamically linked subprogram is coded as a jump and link whose target address is the appropriate entry in the jump table.
Before its subprogram is loaded, an entry just contains a system call and its setup instructions for loading the required subprogram.
After a dynamically loaded subprogram is loaded, its jump table entry is replaced by a jump to the entry address for the subprogram.
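The jump table behavior described above can be modeled in a few lines. This is only a simulation under stated assumptions: the JumpTable class is hypothetical, a Python dictionary stands in for the shared library file, and a plain function lookup stands in for the system call that loads the subprogram.

```python
class JumpTable:
    """Toy model of a lazily bound jump table."""

    def __init__(self, library):
        self.library = library  # name -> function; stands in for a library file
        self.load_count = 0     # how many times a subprogram was loaded
        # Initially every entry is a stub that loads its subprogram.
        self.entries = {name: self._make_stub(name) for name in library}

    def _make_stub(self, name):
        def stub(*args):
            # Stands in for the system call that loads and links the subprogram.
            target = self.library[name]
            self.load_count += 1
            # Replace this entry with a direct jump to the loaded subprogram.
            self.entries[name] = target
            return target(*args)
        return stub

    def call(self, name, *args):
        # Every call goes through the table entry, like a jump and link.
        return self.entries[name](*args)


table = JumpTable({"square": lambda x: x * x})
first = table.call("square", 3)   # the stub runs, loads, then calls
second = table.call("square", 4)  # the entry now jumps straight to the subprogram
```

After both calls, load_count is 1: the first call pays the loading cost, and later calls only pay for one extra jump through the table.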

Multiple dynamically linked subprograms are typically gathered into library files, called dynamic link libraries on Windows and shared objects or shared libraries on Unix-based operating systems. These files typically use a .dll suffix on Windows and a .so suffix on Unix-based operating systems.