Programming involves creating files called source code files. Most programming also involves two other important types of files: object files and executable files.
An executable file contains all program data and instructions and associated information required for producing a memory image to execute a program.
Object files are motivated by the desire to support separate compilation: the capability of breaking up complex programs into smaller pieces that can be compiled or assembled independently.
For a program that is produced by assembling or compiling several source code files, each assembly or compilation of a source code file produces an object code file, which contains the portions of the executable file produced from that source code file. In addition, an object file contains information needed to connect the object files together.
The formats for these file types and the memory image are defined by an operating system. An operating system also provides support software called loaders and linkers for handling these file types. In modern operating systems, part of this software runs dynamically; that is, while the program itself is executing.
There are some executable files, not considered here, whose formats are not defined by the operating system. These files are handled by interpreters for languages such as Java, Perl, and Ruby.
The format of object and executable files depends on the operating system. Compilers and assemblers have to be adapted for different operating systems in order to generate output that conforms to the appropriate format.
On Microsoft Windows® platforms, object file names have a .obj suffix. Compiled and assembled executable file names have a .exe suffix.
On Unix-based platforms, object file names have a .o suffix. There is no conventional file name suffix for compiled and assembled executable files. A Unix operating system can, however, recognize these file types by special 4-byte codes (often called magic numbers) at the start of the file.
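As an illustration of recognizing a file type by its leading bytes, here is a minimal sketch that checks for the ELF format used by Linux and many other Unix-like systems, whose files begin with the byte 0x7F followed by the letters "ELF". Other systems use other codes, so this is one example rather than a general test. The function names are hypothetical and chosen only for this sketch.

```python
ELF_MAGIC = b"\x7fELF"  # the 4-byte code at the start of every ELF file


def looks_like_elf(header: bytes) -> bool:
    """Return True if the given leading bytes start with the ELF code."""
    return header[:4] == ELF_MAGIC


def file_looks_like_elf(path: str) -> bool:
    """Read the first four bytes of a file and test them for the ELF code."""
    with open(path, "rb") as f:
        return looks_like_elf(f.read(4))
```

A shell script, which starts with the characters `#!`, fails this test, which is one way a system can tell a compiled executable from a file meant for an interpreter.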
A loader takes an executable file and copies its sections into memory. Then it produces a process control block to control program execution. Finally, it starts executing the code, usually by jumping to the program's entry point (its main address).
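The three loader steps can be sketched with a toy model in which memory is a byte array and "starting execution" is reduced to marking the process as running. Every class and field name here (`Section`, `Executable`, `ProcessControlBlock`, `load`) is an assumption made for the sketch; real executable formats and process control blocks hold far more information.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str        # for example "text" or "data"
    address: int     # where the section belongs in memory
    contents: bytes  # the bytes to copy there


@dataclass
class Executable:
    sections: list   # list of Section objects
    entry: int       # address at which execution should begin


@dataclass
class ProcessControlBlock:
    pid: int
    program_counter: int
    state: str = "ready"


def load(exe: Executable, memory: bytearray, pid: int) -> ProcessControlBlock:
    # Step 1: copy each section of the executable into memory.
    for section in exe.sections:
        end = section.address + len(section.contents)
        if end > len(memory):
            raise MemoryError(f"section {section.name} does not fit in memory")
        memory[section.address:end] = section.contents

    # Step 2: create a process control block for the new program.
    pcb = ProcessControlBlock(pid=pid, program_counter=exe.entry)

    # Step 3: a real loader would now jump to the entry address;
    # this toy model only records that the process has started.
    pcb.state = "running"
    return pcb
```

The model makes one point visible: the executable file itself must say where each section goes and where execution begins, because the loader has no other source for that information.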
A loader must be able to
Relevant information must be included in the executable file format.
Some other needs have to be considered in the design of executable file formats:
an executable file uses separate data and text sections
an executable file includes sections for the symbol table and associating executed code with source code
The ability to break up the source code for a program into smaller code units, called separate compilation, has two important advantages:
But separate compilation introduces two problems:
When a compiler allocates memory locations for a source code file it starts with addresses just above the addresses set aside for the operating system. Only one of the source code files can use these addresses. The others have to be relocated and their address references have to be changed accordingly.
To be useful, separately compiled files must have references to each other. For example, one file will call subprograms in another file. The addresses of these external references are not known when the files are compiled separately.
Supporting separate compilation requires operating system software to combine the code from multiple compilation steps. This software is called a link editor or, more simply, a linker.
The object files are the result of compiling single source code files.
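The two problems, relocation and external references, can be made concrete with a toy linker. In this sketch an object file is a list of integer "words", with extra tables saying which words hold file-relative addresses and which refer to symbols defined in other files. The `ObjectFile` layout and the `link` function are hypothetical simplifications, not the format of any real operating system.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    name: str
    words: list                                        # code and data, one integer per word
    defined: dict = field(default_factory=dict)        # symbol -> offset within this file
    local_refs: list = field(default_factory=list)     # offsets of words holding file-relative addresses
    external_refs: dict = field(default_factory=dict)  # offset -> name of a symbol defined elsewhere


def link(objects, base=0):
    # Pass 1: give each file a starting address and build a global symbol table.
    starts = []
    symbols = {}
    address = base
    for obj in objects:
        starts.append(address)
        for symbol, offset in obj.defined.items():
            if symbol in symbols:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbols[symbol] = address + offset
        address += len(obj.words)

    # Pass 2: copy each file's words and patch the addresses they contain.
    image = []
    for obj, start in zip(objects, starts):
        words = list(obj.words)
        for offset in obj.local_refs:
            words[offset] += start                # relocation
        for offset, symbol in obj.external_refs.items():
            if symbol not in symbols:
                raise ValueError(f"undefined symbol: {symbol}")
            words[offset] = symbols[symbol]       # external reference resolution
        image.extend(words)
    return image, symbols
```

For example, linking a `main.o` with words `[10, 0, 20, 2]` (word 1 refers to `helper`, word 3 is a file-relative address) and a `util.o` with words `[30, 1]` that defines `helper` at offset 0, using a base address of 100, yields the image `[10, 104, 20, 102, 30, 105]`.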
As is often the case in computer science, the static/dynamic distinction has the following meaning:
In keeping with this common terminology, the linking and loading described earlier is called static linking and loading. Dynamic linking and loading refers to linking and loading done during program execution. Modern operating systems typically use dynamic linking and loading for programming language library functions.
Dynamic linking and loading has three important benefits:
A jump table implementation of dynamic linking and loading is lazy: it defers loading and linking of each subprogram until it is first needed. However, the loading and linking is done only once per subprogram. After it is loaded and linked, a subprogram can be called again as many times as needed with negligible overhead.
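The lazy behavior can be sketched in a few lines. In this toy model, assumed for illustration only, each table entry starts out as a stub. The first call through the stub "loads" the real subprogram, overwrites the table entry with it, and then forwards the call. Later calls go straight to the loaded subprogram. The `JumpTable` class and its `loads` record are hypothetical names.

```python
class JumpTable:
    def __init__(self, library):
        # library maps each subprogram name to a zero-argument function
        # that "loads and links" it, returning the callable subprogram.
        self._library = library
        self._table = {name: self._make_stub(name) for name in library}
        self.loads = []  # names of subprograms loaded so far, in order

    def _make_stub(self, name):
        def stub(*args, **kwargs):
            # Reached only on the first call: load, patch the table, forward.
            target = self._library[name]()
            self.loads.append(name)
            self._table[name] = target
            return target(*args, **kwargs)
        return stub

    def call(self, name, *args, **kwargs):
        # Every call is an indirect jump through the table.
        return self._table[name](*args, **kwargs)
```

With a library of two subprograms, calling one of them twice loads it once and never loads the other, which is the saving that lazy linking is meant to provide.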
Multiple dynamically linked subprograms are typically gathered into library files called dynamic link libraries. These files typically use a .dll suffix in the Windows operating system and a .so suffix on Unix-based operating systems, where they are usually called shared objects or shared libraries.