Chapter 7: Runtime Environments

Review note for Compiler Theory

Previous chapters focus on the properties of the source language. We study how a compiler generates the target language in this chapter.

Runtime environment: the structure of the target computer’s registers and memory. There are three kinds of runtime environments:

  • Fully static environment, like FORTRAN77
  • Stack-based environment, like C, Pascal and Ada
  • Fully dynamic environment, like LISP

A compiler must maintain the environment indirectly by generating code to perform the necessary maintenance operations during program execution.

7.1 Memory Organization During Program Execution

In most compiled languages, the code area and the data area in the RAM are conceptually separate. The code area can be visualized as follows:

image-20220505180354688

However, the data allocation is mostly unfixed (or dynamic) at compile time, except for global and static data.

  • Small compile-time constants (like 0 or 1) are inserted directly into the code and are not allocated any data space.
  • Large constants, especially string literals, are usually allocated memory in the global or static area.

The runtime storage of the whole memory can be:

image-20220505212322245

A procedure activation record must contain the following sections:

image-20220505212452938

Activation records may be allocated in the static area (FORTRAN77), the stack area (C), or the heap area (LISP). Activation records kept in the stack area are called stack frames.

Calling sequence: the sequence of operations that must occur when a procedure or a function is called, e.g., allocating memory for the activation record, computing and storing arguments, and storing and setting necessary registers to effect the call.

Return sequence: the similar set of operations that occur when a procedure or a function returns.

Important aspects of the design of the calling sequence:

  • How to divide the calling sequence between the caller and the callee
  • To what extent to rely on processor support for calls rather than generating explicit code for each step of the calling sequence

7.2 Fully Static Runtime Environments

The simplest kind of runtime environment in which all data are static. Such an environment can be used to implement a language that:

  • Does not have pointers or dynamic allocation (because all variables can be accessed via fixed addresses)
  • Does not support recursive invocation of functions or procedures (because each procedure has only a single activation record allocated statically prior to execution)

7.3 Stack-Based Runtime Environments

In a language that supports recursion, activation records must be allocated in a stack-based fashion. New activation records are pushed upon invocation of procedure, and are popped when the call exits.

7.3.1 Stack-Based Environments Without Local Procedures

In a language where all procedures are global, a stack-based environment requires:

  • A pointer to the current activation record (called the frame pointer, usually kept in the register)
  • A pointer in the current activation record to the caller’s activation record (called the control link or dynamic link), to allow the recovery of the previous activation record when the call exits
  • Sometimes, the stack pointer which points to the top of the allocated call stack

Access to names: parameters and local variables cannot be accessed at a fixed location. Instead, they are found by offset from the current frame pointer. The offset is often statically computed by the compiler. Nonlocal and static names are accessed in other ways (directly, or by offset from some other base pointer).

The calling sequence

  1. Compute the arguments and store them in the correct position in the new activation record (usually by pushing them in order onto the stack)
  2. Push the current fp as the control link in the new activation record
  3. Change the fp so it now points to the top of the stack (by copying sp to fp)
  4. Store the return address in the new activation record
  5. Perform a jump to the code of the called procedure

The exiting sequence

  1. Copy the fp to the sp (this effectively pops any local variables, the return address, etc.)
  2. Load the control link into thefp
  3. Perform a jump to the return address
  4. Change the sp to pop the arguments

Dealing with variable-length data:

  • If the number of arguments varies from call to call, we may:
    • Push the arguments in reverse order onto the stack, and specify the number of remaining parameters in the first parameter. This ensures that the first parameter is located at a fixed offset from the fp.
    • Use a processor mechanism such as the argument pointer.
  • If a variable-length array is supported, we use a level of indirection – store a fix-sized pointer to the array, and perform the allocation in a way that can be managed by the sp. This requires the compiler to calculate the size of the local variables.

Nested declarations: variables declared in a nested code block (in C) are allocated on the stack as the block is entered and deallocated on exit.

7.3.2 Stack-Based Environments with Local Procedures

The runtime environment described above does not support nonlocal, nonglobal variables.

We use an extra piece of bookkeeping information called the access link (static link) to locate nonlocal, nonglobal variables. The access link points to the activation record that represents the defining environment of the procedure. For example:

function p() {
  let n: integer;
  
  function q() {
    console.log("Do something");
  }
  
  function r(n: number) {
    q();
  }
  
  n = 1;
  r(2);
}

p();

The access links of both q and r will point to the activation record of p. Since p is a global procedure, it does not need an access link.

Access chaining: repeatedly fetch the access link to search for a variable. To make access chaining work, the compiler must determine the nesting levels before accessing the name locally. For example:

function p() { // nesting level: 0
  let x: number; // nesting level: 1
  
  function q() { // nesting level: 1
    function r() { // nesting level: 2
      let y: number; // nesting level: 3
      x = 2;
      if (/* Some condition */) {
        p();
      }
    }
    
    r();
  }
  
  q();
}

p();

If r() wants to access x, two hops along the access links are required. Variables at nesting level 0 can be accessed by the direct methods previously discussed.

The calling sequence: during the call, the access link is pushed onto the runtime stack just before the fp, and after an exit, the sp must be adjusted by an extra amount to remove the access link as well as the arguments. The access link is obtained during runtime by using the (compile-time) nesting level information attached to the declaration of the procedure being called.

7.3.3 Stack-Based Environments with Procedure Parameters

If a procedure can be passed as a parameter, it is impossible for a compiler to generate code to compute the access link at the time of the call. The access link for the procedure to be passed must be precomputed and passed along with a pointer to the code for the procedure. This means that a procedure parameter must contain two pointers, an instruction pointer and an environment pointer. This pair of pointers is called a closure.

For example:

function p(a: Function) {
  a();
}

function q() {
  let x: number;
  
  function r() {
    console.log(x);
  }
  
  x = 2;
  p(r);
}

q();

A compiler may avoid the distinction between ordinary procedures and procedure parameters and keep all procedures as closures in the environment. That is, the <ip, ep> representation is required for all procedures.

7.4 Dynamic Memory

7.4.1 Fully Dynamic Runtime Environments

Problems of stack-based environments: dangling references (to variables and local procedures).

A fully dynamic environment can deallocate activation records only when all references to them have disappeared. This requires activation records to be dynamically freed at an arbitrary time during execution. It involves the tracking of references during execution and the ability to find and deallocate inaccessible areas of memory at arbitrary times during execution (known as garbage collection).

The additional complexity of a dynamic environment can be encapsulated in a memory manager.

7.4.2 Dynamic Memory in OO Languages

OO languages can have varied requirements for the runtime environment. For example, Smalltalk requires a fully dynamic environment, while C++ maintains a stack-based environment. In both languages, an object is a cross between a record structure and an activation record.

Ways of implementing an object:

  • Copy all the currently inherited features (data and methods) into the record structure; extremely wasteful of space.
  • Use an inheritance graph: keep a description of the class structures in memory, with inheritance maintained by superclass pointers and method pointers kept as fields in the class structure. Each object then keeps, along with its instance variables, a pointer to its defining class, through which all methods (local & inherited) are found. Con: the methods do not have predictable offsets and must be maintained by a lookup table. (why?)
  • Compute the list of code pointers for methods, and store this in (static) memory as a virtual function table. Each object now holds a pointer to the virtual function table instead of the class structure. This has the advantage that each method has a predictable offset, so lookups can be saved.

7.4.3 Heap Management

Heap management is important, even in a stack-based environment. A heap provides two operations, allocate and free. allocate takes a size parameter and returns a pointer to a block of memory of the correct size. free takes a pointer to an allocated block of memory and marks it as being free again.

Heaps can be implemented by a circular linked list. There are two main issues:

  • If the user passes an invalid pointer (that is not allocated by allocate) to free, the heap can get corrupted.
  • Without coalescing the free list, the heap can get fragmented and allocation of a large block will fail.

7.4.4 Automatic Management of the Heap

Garbage collection methods:

  • Mark and sweep: no memory is freed until a call to malloc fails, at which a garbage collector looks for all storage that can be referenced and frees all unreferenced storage. The process consists of two passes: mark used memory space by iterating all pointers, and sweep the unused memory space. Defragmentation required.
  • Stop and copy: similar to above, but move all reached blocks to one end of the heap, so that only one pass is required, and defragmentation is saved.
  • Generational garbage collection: adds a permanent storage area to the reclamation scheme. Allocated objects that survive long enough are copied into the permanent storage and never deallocated. Only a small amount of objects need to be considered.

7.5 Parameter Passing Mechanisms

Parameters correspond to locations in the activation record, which are bound to arguments (parameter values) by the caller. There are several parameter passing mechanisms.

Parameters can be evaluated from left to right, from right to left, or in arbitrary order.

7.5.1 Pass by Value

The arguments are expressions that are evaluated at the time of the call, and their values become the values of the parameters during the execution of the procedure.

In some languages (like Ada), parameters behave as constants and cannot be assigned to or used as local variables. In some other languages (like C and Pascal), parameters can be viewed as initialized local variables, but changes do not affect the caller. To achieve nonlocal changes, C offers a mechanism to explicitly pass the address of a variable.

Pass by value requires no special effort on the part of the compiler, because it is straightforward.

7.5.2 Pass by Reference

The arguments must be variables with allocated locations. This scheme passes the location of the variable so that the parameter becomes an alias for the argument, and changes to the parameter occur to the argument as well.

Pass by reference does not require a copy to be made. This can be significant when the value to be copied is a large structure.

Pass by reference requires that the compiler compute the address of the argument, which is then stored in the local activation record.

7.5.3 Pass by Value-Result

The value of the argument is copied and used in the procedure, and then the final value of the parameter is copied back out to the location of the argument when the procedure exits. This achieves the same effect as passing by reference.

Pass by value-result is distinguishable from pass bye reference in the presence of aliasing:

void p(int x, int y) {
  ++x;
  ++y;
}

int main() {
  int a = 1;
  p(a, a);
  return 0;
}

a is 3 if pass by reference is used, while a is 2 if pass by value-result is used.

Passing by value-result requires some modifications to the structure of the runtime stack and the calling sequence:

  • The activation record cannot be released by the callee, because the local values to be copied out must be available to the caller.
  • The caller must either push the addresses of the arguments as temporaries onto the stack, or it must recompute these addresses on return from the called procedure.

7.5.4 Pass by Name

Also called delayed evaluation, because, in this scheme, the argument is not evaluated until its actual use (as a parameter) in the called program.

int i;
int a[10];

void p(int x) {
  ++i;
  ++x;
}

int main() {
  i = 1;
  a[1] = 1;
  a[2] = 2;
  p(a[i]);
  return 0;
}

The result of the call to p is that a[2] is set to 3 and a[1] is left unchanged.

The text of an argument at the point of call is viewed as a function in its own right, which is evaluated every time the corresponding parameter name is reached in the code of the called procedure. The argument will always be evaluated in the environment of the caller, while the procedure will be executed in its defining environment.

留下评论

注意 评论系统在中国大陆加载不稳定。

回到顶部 ↑