Control Flow in Assembly

So far, all the assembly programs we’ve written have been straight-line code, in the sense that they always run one instruction after the other. That’s like writing C without any control flow: no if, for, while, etc. This lecture is about the instructions that exist in RISC-V to implement control-flow constructs.

Branch If Equal

For most instructions, when the processor finishes running the instruction, it proceeds to the next one (incrementing the program counter by 4 on RISC-V, because every instruction in the base instruction set is 4 bytes). A branch instruction is one that can choose whether to do that or to execute some other instruction of your choosing instead. One example is the beq instruction, which means branch if equal:

beq rs1, rs2, label

The first two operands are registers, and beq checks whether the values are equal. The third operand is a label, which we’ll look closer at in a moment, but it refers to some other instruction. Then:

  • If the two registers hold equal values, then go to the instruction at label.
  • If they’re not equal, then just go to the next instruction (add 4 to the PC) as usual.
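To make the two cases concrete, here is a minimal C sketch of how the next program counter could be computed for a beq. The function next_pc_beq and its parameters are hypothetical names of ours, not part of RISC-V; target stands for the address of the labeled instruction.

```c
#include <stdint.h>

/* Hypothetical helper: returns the PC of the instruction that runs
   after a beq located at address pc. */
uint32_t next_pc_beq(uint32_t pc, uint32_t rs1_val, uint32_t rs2_val,
                     uint32_t target) {
  if (rs1_val == rs2_val) {
    return target;  /* registers equal: go to the instruction at label */
  }
  return pc + 4;    /* not equal: fall through to the next instruction */
}
```

For instance, with a beq at address 0x100 and a label at 0x200, equal register values give 0x200 and unequal values give 0x104.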

Labels appear in your assembly code like this:

my_great_label:

That is, just pick a name and put a : after it. This labels a specific instruction so that a branch can refer to it.

Here’s an example:

  beq x1, x2, some_label
  addi x3, x3, 42
some_label:
  addi x3, x3, 27

This program checks whether x1 == x2. If so, then it immediately executes the last instruction, skipping the second instruction. Otherwise, it runs all 3 instructions in this listing in order (it adds 42 and then adds 27 to x3).

In other words, you can imagine this assembly code implementing an if statement in C:

if (x1 != x2) {
  x3 += 42;
}
x3 += 27;
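Notice that the C condition (x1 != x2) is the opposite of the one beq tests, because the branch skips the body rather than entering it. One way to see this is to write the same logic in C with a goto, so that each line matches one assembly instruction. This is only a sketch; the function name branch_example is ours, and the registers are modeled as plain int variables.

```c
/* Sketch: the if statement rewritten so the control flow mirrors beq. */
int branch_example(int x1, int x2, int x3) {
  if (x1 == x2) goto some_label;  /* beq x1, x2, some_label */
  x3 += 42;                       /* addi x3, x3, 42 (skipped if branch taken) */
some_label:
  x3 += 27;                       /* addi x3, x3, 27 (always runs) */
  return x3;
}
```

Starting from x3 = 0, this returns 27 when x1 and x2 are equal and 69 when they differ.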

Other Branches and Jumps

You should read the RISC-V spec to see an exhaustive list of branch instructions it supports. Here are a few, beyond beq:

  • bne rs1, rs2, label: Branch if the registers are not equal.
  • blt rs1, rs2, label: Branch if rs1 is less than rs2, treated as signed (two’s complement) integers.
  • bge rs1, rs2, label: Like that, but with “greater than or equal to.”
  • bltu and bgeu are similar but do unsigned integer comparisons.
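The signed/unsigned distinction matters because the same 32 bits can compare differently under the two interpretations. Here is a small C sketch using blt and bltu as the example pair. The helpers blt_taken and bltu_taken are hypothetical names; each returns 1 if the corresponding branch would be taken. The cast to int32_t assumes the usual two’s complement conversion, which mainstream compilers provide.

```c
#include <stdint.h>

/* Hypothetical helpers: 1 if the branch would be taken, 0 otherwise. */
int blt_taken(uint32_t rs1, uint32_t rs2) {
  return (int32_t)rs1 < (int32_t)rs2;  /* compare as signed integers */
}

int bltu_taken(uint32_t rs1, uint32_t rs2) {
  return rs1 < rs2;                    /* compare as unsigned integers */
}
```

With rs1 holding 0xFFFFFFFF and rs2 holding 1, blt is taken (the bits mean -1, and -1 < 1), while bltu is not (the bits mean 4294967295).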

You will also encounter unconditional jumps, written j label. Unlike branches, j doesn’t check a condition; it always immediately transfers control to the label.
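A common use for j is skipping over code that should not run, such as the else part of an if/else once the then part has finished. The following C sketch uses goto to stand in for both the branch and the jump. The function name, label names, and the particular statements in each arm are made up for illustration.

```c
/* Sketch of: if (x1 == x2) x3 += 1; else x3 += 2;
   written with gotos to mirror a branch plus an unconditional jump. */
int if_else_example(int x1, int x2, int x3) {
  if (x1 != x2) goto else_part;  /* bne x1, x2, else_part */
  x3 += 1;                       /* then part */
  goto end;                      /* j end: always skip the else part */
else_part:
  x3 += 2;                       /* else part */
end:
  return x3;
}
```

Without the unconditional jump, execution would fall through from the then part into the else part and run both.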

Implementing Loops

We have already seen how branches in assembly can implement the if control-flow construct. They are also all you need to implement loops, like the for and while constructs in C. We’ll see a worked example in this section.

Consider this loop that sums the values in an array:

int sum = 0;
for (int i = 0; i < 20; i++) {
  sum += A[i];
}

And imagine that A is declared as an array of ints:

int A[20];

Assume that the base address of A is in x8. Here’s a complete implementation of this loop in RISC-V assembly:

  add x9, x8, x0         # x9 = &A[0]
  add x10, x0, x0        # sum = 0
  add x11, x0, x0        # i = 0
  addi x13, x0, 20       # x13 = 20
Loop:
  bge x11, x13, Done     # if i >= 20, exit the loop
  lw x12, 0(x9)          # x12 = A[i]
  add x10, x10, x12      # sum += x12
  addi x9, x9, 4         # x9 = &A[i+1]
  addi x11, x11, 1       # i++
  j Loop                 # start the next iteration
Done:

The important instructions for implementing the loop are the bge (branch if greater than or equal to) and j (unconditional jump) instructions. The former tests the negation of the loop condition: it exits to Done once i >= 20, so the body runs only while i < 20. The latter jumps back to Loop to start the next iteration.

We have included comments to indicate how we implemented the various changes to variables. Here are some observations about this implementation:

  • We have chosen to put sum in register x10 and i in x11.
  • The x13 register just holds the number 20. We need it in a register because bge compares two registers; it cannot compare a register against a constant.
  • The x9 register is a little funky. It starts out storing the A base address, but then the pointer moves by 4 bytes (the size of one int) on every loop iteration (with addi). The idea is that it always stores the address &A[i], i.e., a pointer to the \(i\)th element of the A array on the \(i\)th iteration. So to load the value A[i], we just need to load from this address with lw.
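To connect the assembly back to C, here is a sketch of the same loop written with gotos and a moving pointer, so that each statement lines up with one or two instructions above. The function name sum_array and the variable names p and limit are ours. The p++ step corresponds to adding 4 only under the assumption that int is 4 bytes.

```c
/* Sketch: the array-sum loop restructured to mirror the assembly. */
int sum_array(const int *A) {
  const int *p = A;            /* x9  = &A[0] */
  int sum = 0;                 /* x10 = 0 */
  int i = 0;                   /* x11 = 0 */
  int limit = 20;              /* x13 = 20 */
Loop:
  if (i >= limit) goto Done;   /* bge x11, x13, Done */
  sum += *p;                   /* lw x12, 0(x9); add x10, x10, x12 */
  p++;                         /* addi x9, x9, 4 (assuming 4-byte int) */
  i++;                         /* addi x11, x11, 1 */
  goto Loop;                   /* j Loop */
Done:
  return sum;
}
```

Calling sum_array on a 20-element array holding 1 through 20 returns 210, the same result as the original for loop.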