Sub-pages:

Algorithm

This is the Algorithm that we have as far as the Milestone 3 Report. This section also relies on the topic of Data Dependencies, covered here.

Once the instructions are in memory, the chip should search in ascending order to find an instruction with latency, i.e., a load or a store.  Once this is found, an independent instruction should be placed after the load/store if possible, eliminating the latency.  To do this, we perform these steps:

1.     Set up an array of dependency bits (one per instruction) initialized to 0.

2.     Go through the subsequent instructions, determining whether each is dependent on the latent instruction. If so, the dependency bit is set to 1. Once a bit is set to 1, it cannot be set to 0.

3.     After going through these instructions, the chip then checks if the instruction immediately following the latent instruction is dependent.  If not, then the latency has been eliminated and the job is finished.

4.     If it is dependent, it needs to check the lines following the instruction after the latent instruction and mark whether these are dependent on this instruction.

5.     Then, the chip again checks whether the instruction immediately after this new checkpoint is dependent.  If not, it is moved to the latency position and the job is done.

6.     Otherwise, the cycle continues until a viable solution is found or the chip reaches the last of the input instructions.

The core algorithm may be summarily described with the following pseudo-code:


x=0;     // start with initial instruction
while (x < Last instruction) {

    if instruction[x] != Latent instruction {
         x++;
         continue;     // resume searching for latent instructions
    }

    clear Dependency[] table;      // reset all dependency bits
    Found = 0;     // found an instruction to fill the latency?
    y=x;     // start y at the latent instruction

    while (!Found and y < Last Instruction)
    {

      Jump_destination = 0;    // a jump destination has been found?
      Jump_source = 0;     // a jump source has been found?
      trace = y+1;
      while (trace <= Last Instruction) {

        if instruction[trace] dependent on y
             Dependency[trace] |= 1;
        if instruction[trace] = jump destination
             Jump_destination = 1;
        if instruction[trace] at edge of jump boundary
             Jump_source = 1

        // don’t reorder instructions after a jump src/dest.
        if Jump_destination = 1  or  jump_source = 1
             Dependency[trace] |= 1;

        trace++;

      }
      If Dependency[y+1]==0
      Found=1;
      Else
      y++;

    }

    // reorder the instructions
    if instruction[y+1] is jump source {
         calculate new offset
         if offset overflows
             continue;    // resume searching for latent instructions
    }
    temp_instruction = mem[y+1];
    for (  ; y > x; y--) {
         mem[y+1] = mem[y];
    }
    mem[y+1] = temp_instruction;

    x++;     // keep looking for latent instructions

}
 
An explanation is still required. First, the variables follow as such:

  • x – is the head address that points to latent instructions.
  • y – is the address used to locate movable instructions which have no dependencies that require them to stay put
  • trace – is the address that is used to scan the instructions and mark a dependency bit indicating that they have some sort of dependency that prevents them from being moved.
  • Dependency[] – is a table that contains one bit per instruction indicating whether or not it can be moved
  • instruction[] – is the table containing the original buffered instructions.
  • Found – indicates when a movable instruction is found
  • Jump_destination – indicates when a jump destination has been found during a particular trace.
  • Jump_source - indicates when a jump source has been found during a particular trace.

The algorithm tracks through the instructions trying to find any that have a latency hole that needs to be filled.  When it finds one, it searches down for an instruction that can fill this hole. Since instructions can exist in such a fashion that reordering them will cause the code to behave differently, we have to mark each instruction on whether or not there is another instruction preventing it from being moved into the latency hole. This is the dependency sweep done with the address “trace,” which uses the instruction at address “y” for reference in determining a dependency.  See the section on dependency logic for a further explanation.  After this dependency sweep, the first instruction that isn’t marked with a dependency bit can be moved into the latency hole

A simple process of copying addresses down and then filling the hole made will complete the reorder for the latent instruction.  Then we continue searching. Once finished, we are ready to output the reordered instructions by going through the reorder table.  See the section on the output controller for more details.
 

We have detailed the algorithm as both a state machine and as a flowchart, which you can find here.  There you will also find details on the FSM designed for the algorithm.