Calling Functions in Assembly
Pseudo-Instructions
While assembly languages mostly have a 1-1 correspondence to some processor’s machine code, sometimes it’s helpful for the assembly language to have a few convenient features that just make it easier for humans to read and write. The primary such feature in RISC-V assembly is its pseudo-instructions. A pseudo-instruction is an assembly-language instruction that does not actually correspond to any distinct machine-code instruction (with its own opcode and such).
Here are some common pseudo-instructions:
- mv rd, rs1: Copy the value of register rs1 into register rd.
- li rd, imm: Put the immediate value imm into register rd.
- nop: A no-op: do nothing at all.
All three of these pseudo-instructions are equivalent to special cases of the addi instruction:
- mv rd, rs1 does the same thing as addi rd, rs1, 0
- li rd, imm is addi rd, x0, imm (when imm fits in addi’s 12-bit signed immediate field)
- nop is addi x0, x0, 0
Try to convince yourself that these addi instructions do in fact work to implement these pseudo-instructions’ semantics.
The RISC-V assembler translates pseudo-instructions into their equivalent real instructions for you. So you can write li x11, 42 and that will translate to exactly the same machine-code bits as addi x11, x0, 42.
Why doesn’t RISC-V implement these pseudo-instructions as real, distinct instructions? By keeping the number of instructions small, it simplifies the hardware—especially the decode stage—making it smaller, faster, and more efficient.
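To make the "same machine-code bits" claim concrete, here is a minimal C sketch that encodes addi by hand using the standard I-type layout (immediate, rs1, funct3, rd, opcode) and defines the three pseudo-instructions as thin wrappers around it. The function names (encode_addi, encode_mv, encode_li, encode_nop) are made up for this illustration, and the sketch only covers immediates that fit in 12 signed bits; a real assembler expands larger li values into more than one instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an I-type ADDI: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011.
   Hypothetical helper for illustration; register arguments are x-numbers. */
uint32_t encode_addi(unsigned rd, unsigned rs1, int32_t imm) {
    assert(rd < 32 && rs1 < 32);
    assert(imm >= -2048 && imm <= 2047); /* must fit in 12 signed bits */
    uint32_t imm12 = (uint32_t)imm & 0xFFFu;
    return (imm12 << 20) | ((uint32_t)rs1 << 15) | ((uint32_t)rd << 7) | 0x13u;
}

/* Each pseudo-instruction is just a particular call to encode_addi. */
uint32_t encode_mv(unsigned rd, unsigned rs1) { return encode_addi(rd, rs1, 0); }
uint32_t encode_li(unsigned rd, int32_t imm)  { return encode_addi(rd, 0, imm); }
uint32_t encode_nop(void)                     { return encode_addi(0, 0, 0); }
```

With these definitions, encode_li(11, 42) and encode_addi(11, 0, 42) are the same call, so they necessarily yield the same 32-bit word (0x02A00593), and encode_nop() yields 0x00000013.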
Functions in Assembly
With branching control flow, we can accomplish a lot in RISC-V assembly.
We can “fake” if statements, for loops, and so on.
But one thing we can’t do yet is call functions.
That’s what this lecture is about.
Here’s an example C program we can work with:
#include <stdio.h>

int addfn(int a, int b) {
    return a + b;
}

int main() {
    int sum1, sum2;
    sum1 = addfn(1, 2);
    sum2 = addfn(3, 4);
    printf("sum1=%d and sum2=%d\n", sum1, sum2);
}
You already know how to implement the body of the addfn function in RISC-V.
But nothing we’ve done so far will let us call that code multiple times with different arguments, as main does in this example.
Calling a function is a multi-step process, and it requires collaboration between both the caller code and the callee code (the function being called). At a high level, every function call needs to follow these steps:
- The caller puts arguments in a place where the callee function can access them.
- The caller transfers control to the callee (i.e., it jumps to the first instruction in the function).
- The function creates a stack frame to hold its own local variables.
- The function actually does stuff: i.e., the function body.
- The function puts the return value in a place where the caller can access it. It also restores any registers it used to the state the caller expects. And finally, it releases the stack frame that holds its local variables.
- The callee returns control to the caller (i.e., jumps to the next instruction in the caller right after the function call).
The caller and callee need to agree on all the details for how this multi-step process works. For example, they must agree on which registers hold the arguments and which registers hold the return value. A standardized protocol for how to implement all these details is called a calling convention. The RISC-V ISA itself defines a particular calling convention, which we will learn about in this lecture. C compilers that generate RISC-V code also use the same calling convention to implement function definitions and function calls—and because it’s standardized, even functions compiled by different C compilers can call each other.
The RISC-V Calling Convention
We’ll break down the components next, but here are the most important parts of the RISC-V calling convention:
- Arguments go in registers a0 through a7 (a.k.a. x10 through x17). (In fact, that is why these registers have an alternative name starting with an “a”! It’s for argument.)
- Return values also go in registers a0 and a1. (Yes, this means that functions overwrite their arguments with their return values before they return.)
- Register ra (a.k.a. x1) holds the return address: the address of the next instruction to run after the function call finishes.
- Registers s1 through s11 (a.k.a. x9, and x18 through x27) are callee-saved registers. This means that callers can safely expect that, after they make a call and the call returns, the registers will be carefully restored to the value they had before the call.
- Registers t0 through t6 (a.k.a. x5 to x7, and x28 through x31) are temporary registers. This means that callee functions can use these registers without saving them. If the caller needs the contents of these temporary registers after the callee returns, then the caller has to save them before making a function call to the callee. As a result, these temporary registers are called caller-saved registers.
Control Flow for Call and Return
Let’s start with the basic mechanism for transferring control:
jumping from the caller to the callee and then back.
The interesting thing is that the branch instructions we’ve seen so far, such as beq, won’t suffice.
The problem is that functions, by their very nature, can be called from multiple locations.
Like in our example above:
sum1 = addfn(1, 2);
sum2 = addfn(3, 4);
Imagine that we implemented both of these calls with a plain unconditional jump, j.
Then the calls might look like this:
li a0, 1
li a1, 2
j addfn
mv <register containing sum1>, a0
li a0, 3
li a1, 4
j addfn
mv <register containing sum2>, a0
The li instructions take care of setting up the argument registers, and each mv consumes the return-value register.
We imagine here that addfn is an assembly-language label that points to the start of the addfn function’s instructions.
There’s a problem.
In the implementation of the addfn function, how do we know where to jump back to?
After each call is done, we need to transfer control to the next instruction after the jump.
Even if we inserted labels on those instructions, if there is only a single block of instructions to implement addfn, those instructions would need to contain j <label> to return.
But somehow it would need to pick a different label for each call, which is impossible!
The solution is to designate a register to hold the return address for the call.
Instead of just using j to call a function, we’ll do two things:
- Record the next instruction’s address as the return address, in register ra.
- Jump to the first instruction of the called function.
Then, to return, the function just needs to jump to the instruction address in register ra.
Regardless of who called the function, doing this will suffice to transfer control to the point right after the call.
RISC-V has instructions to support these strategies: both the call and the return.
For the call, you use the jal instruction (the mnemonic stands for jump and link):
jal rd, label
The jal instruction does the two things we need for a call:
- Put the address of the next instruction after the jal into register rd.
- Unconditionally jump to label.
So our function calls will generally look like jal ra, <function label>.
Then, to return from a function, we’ll use the jr instruction (the mnemonic means jump register):
jr rs1
The jr instruction unconditionally jumps to the address stored in the register rs1.
So function returns generally look like jr ra.
In fact, this pattern is so common that RISC-V has pseudo-instructions for function calls and returns:
- jal label: short for jal ra, label
- call label: like the above, but with an extra auipc instruction so it supports larger PC offsets
- ret: short for jr ra
(Going one level deeper, it turns out that jr rs1 is itself a pseudo-instruction that is short for jalr x0, 0(rs1). But that’s not really important for learning about function calls.)
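The call-and-return mechanism described in this section can be sketched as a toy interpreter in C. This is a deliberately simplified model, not real RISC-V: instruction "addresses" are array indices (so the return address is pc + 1 rather than pc + 4), and the opcode set, the struct insn layout, and the names run, call_twice, and run_call_twice are all invented for this illustration. The program it runs is the pair of addfn calls from the example, with sum1 and sum2 arbitrarily kept in s1 and s2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: instruction "addresses" are array indices. */
enum opcode { LI, MV, ADD, JAL, JR, HALT };

struct insn {
    enum opcode op;
    int rd, rs1, rs2;  /* register numbers; unused fields are 0 */
    int64_t imm;       /* LI: value to load; JAL: index of the target */
};

/* x-register numbers for the ABI names used below. */
enum { RA = 1, S1 = 9, A0 = 10, A1 = 11, S2 = 18 };

/* The two addfn calls from the example; addfn itself starts at index 9. */
const struct insn call_twice[] = {
    /*  0 */ {LI,   A0, 0,  0,  1},
    /*  1 */ {LI,   A1, 0,  0,  2},
    /*  2 */ {JAL,  RA, 0,  0,  9},  /* ra = 3, jump to addfn */
    /*  3 */ {MV,   S1, A0, 0,  0},  /* sum1 = a0 */
    /*  4 */ {LI,   A0, 0,  0,  3},
    /*  5 */ {LI,   A1, 0,  0,  4},
    /*  6 */ {JAL,  RA, 0,  0,  9},  /* ra = 7, jump to addfn */
    /*  7 */ {MV,   S2, A0, 0,  0},  /* sum2 = a0 */
    /*  8 */ {HALT, 0,  0,  0,  0},
    /*  9 */ {ADD,  A0, A0, A1, 0},  /* addfn: a0 = a0 + a1 */
    /* 10 */ {JR,   0,  RA, 0,  0},  /* return to whoever called */
};
enum { CALL_TWICE_LEN = sizeof call_twice / sizeof call_twice[0] };

/* Execute prog until HALT, updating regs in place. */
void run(const struct insn *prog, int64_t len, int64_t regs[32]) {
    int64_t pc = 0;
    for (;;) {
        assert(pc >= 0 && pc < len);
        const struct insn *i = &prog[pc];
        int64_t next = pc + 1;
        switch (i->op) {
        case LI:   regs[i->rd] = i->imm; break;
        case MV:   regs[i->rd] = regs[i->rs1]; break;
        case ADD:  regs[i->rd] = regs[i->rs1] + regs[i->rs2]; break;
        case JAL:  regs[i->rd] = pc + 1; next = i->imm; break; /* link, then jump */
        case JR:   next = regs[i->rs1]; break;                 /* jump to register */
        case HALT: return;
        }
        regs[0] = 0; /* x0 always reads as zero */
        pc = next;
    }
}

/* Convenience wrapper: run call_twice from a zeroed register file
   and return the final value of one register. */
int64_t run_call_twice(int reg) {
    int64_t regs[32] = {0};
    run(call_twice, CALL_TWICE_LEN, regs);
    return regs[reg];
}
```

The single JR at index 10 returns to index 3 after the first call and to index 7 after the second, because each JAL left a different value in ra. That is exactly what a fixed j <label> could not do.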
Managing the Stack
Beyond just jumping around, functions also have another important responsibility: they need to keep track of their local variables. As you already know, local variables go in stack frames on the call stack. You also know that the stack is a region in memory that grows downward (from higher memory addresses to lower ones) when we call functions, and it shrinks when function calls return. This section is about the bookkeeping that functions must do to create and use their stack frames.
The central idea is that we must use a register to keep track of the address of our current stack frame.
According to the RISC-V calling convention, register sp (a.k.a. x2) contains the address of the top (the smallest address since the stack grows down) of the current stack frame. Further, the RISC-V calling convention has a frame pointer register, fp, that contains the address of the bottom of the stack frame (fp holds a higher address than sp since the stack grows down).
Code interacts with sp and fp in three main ways:
- At the beginning of the function, it will “push a stack frame onto the call stack” by moving sp downward to make space for its own stack frame. Remember, this stack frame will contain the function’s local variables.
- During the execution of the function, it will use (positive) offsets on sp to locate each of its local variables. So you’ll see stuff like ld a7, 16(sp) and sd a6, 40(sp) to load and store local variables using offsets from sp. Equivalently, negative offsets can be used with fp to access any local variable within a stack frame. The advantage of using fp rather than sp is that the offsets to values on the stack are constant relative to fp, whereas the offsets may change relative to sp. Note that according to the RISC-V calling convention, fp is optional, but in the cs3410 2025sp it is required.
- At the end of the function, before it returns, it will “pop the stack frame off the call stack” by moving sp back up to wherever it used to be, “destroying” its stack frame. No memory literally gets destroyed, of course, but adjusting sp back to its pre-call value indicates that we’re done using all our local variables, and it lets the caller locate its own stack frame.
This means that functions usually look like this:
func_label:
addi sp, sp, -16
sd ra, 8(sp)
sd fp, 0(sp)
addi fp, sp, 8
...
ld fp, 0(sp)
ld ra, 8(sp)
addi sp, sp, 16
ret
or, equivalently:
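func_label:
addi sp, sp, -16
sd fp, 0(sp)      # save the caller's fp before overwriting it
addi fp, sp, 8
sd ra, 0(fp)
...
ld ra, 0(fp)      # reload ra while fp still points into this frame
ld fp, -8(fp)     # restore the caller's fp last
addi sp, sp, 16
ret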
The addi at the top and bottom of the function “creates” and “destroys” (a.k.a. “pushes” and “pops”) the stack frame.
The function’s code must know how big its stack frame needs to be:
in this case, it’s 16 bytes, so we move the stack pointer down by 16 bytes at the beginning and back up by the same 16 bytes at the end.
The stack frame needs to be big enough to hold everything the function keeps there: its local variables plus, as here, space for the saved return address, ra, and frame pointer, fp.
C compilers compute this stack-frame size for you by adding up the size of all the local variables you declare.
Further, when the stack frame is “created” (“pushed”), the return address, ra, and frame pointer, fp, are stored on the stack; then ra and fp are restored before the stack frame is “destroyed” (“popped”).
- Why is ra stored on the stack? Storing ra on the stack allows functions to call other functions (including themselves, recursively). For instance, assume we did not store ra on the stack, and main calls addfn and addfn calls printf. What would happen to ra? When main executes jal addfn (or call addfn), ra will contain the return address in main. Then, when addfn calls printf, jal printf (or call printf) will overwrite ra. Next, when printf returns to addfn and addfn wants to return to main, the contents of ra will have been “clobbered” and there will be no way for addfn to return to main. Fortunately, however, by storing ra on the stack, addfn can restore ra from the stack, which will contain the address back to main.
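The prologue/epilogue pattern and the ra-clobbering scenario above can be modeled in a few lines of C. This is only a sketch under simplifying assumptions: the struct cpu type, the 256-byte toy stack, the sd/ld helpers, and the addresses 0x1000 (a return address in main) and 0x2000 (a return address in addfn) are all invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 256 };

/* Hypothetical model of the three registers involved plus a small stack. */
struct cpu {
    int64_t ra, sp, fp;
    int64_t stack[STACK_BYTES / 8]; /* one element per 8-byte doubleword */
};

/* Toy versions of sd/ld that address the stack by byte offset. */
static void sd(struct cpu *c, int64_t value, int64_t addr) {
    assert(addr >= 0 && addr < STACK_BYTES && addr % 8 == 0);
    c->stack[addr / 8] = value;
}

static int64_t ld(const struct cpu *c, int64_t addr) {
    assert(addr >= 0 && addr < STACK_BYTES && addr % 8 == 0);
    return c->stack[addr / 8];
}

/* addi sp, sp, -16 ; sd ra, 8(sp) ; sd fp, 0(sp) ; addi fp, sp, 8 */
void prologue(struct cpu *c) {
    c->sp -= 16;
    sd(c, c->ra, c->sp + 8);
    sd(c, c->fp, c->sp + 0);
    c->fp = c->sp + 8;
}

/* ld fp, 0(sp) ; ld ra, 8(sp) ; addi sp, sp, 16 */
void epilogue(struct cpu *c) {
    c->fp = ld(c, c->sp + 0);
    c->ra = ld(c, c->sp + 8);
    c->sp += 16;
}

/* Models addfn calling printf. Returns the address addfn would
   return to, with or without saving ra in a stack frame. */
int64_t return_address_after_nested_call(int save_frame) {
    struct cpu c = { .ra = 0x1000, .sp = STACK_BYTES, .fp = STACK_BYTES - 8 };
    if (save_frame) prologue(&c);
    c.ra = 0x2000;                 /* jal printf overwrites ra */
    if (save_frame) {
        epilogue(&c);
        assert(c.sp == STACK_BYTES); /* frame fully popped */
    }
    return c.ra;
}
```

In this model, return_address_after_nested_call(1) yields 0x1000 (addfn returns to main), while return_address_after_nested_call(0) yields 0x2000, meaning addfn would "return" into itself.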
Passing Arguments
RISC-V provides a consistent way of passing arguments and receiving the result of a subroutine invocation.
In particular, registers a0 to a7 are used for arguments and a0 and a1 are used for return values. Note that a0 and a1 are both argument and return-value registers; as a result, the contents of argument registers in general are “clobbered” and not preserved.
If a function has more than eight arguments, then the arguments are “spilled” to the stack. The calling convention allocates space for all arguments on the child stack frame, placing the first eight args in registers a0 to a7 and “spilling” any remaining args to the child stack frame. This means that space is allocated on the stack for the first eight args, even though that space is not initially used since the arg registers are used instead. Allocating space on the stack for all args is particularly useful for functions with variable-length inputs such as printf("Scores: %d %d %d\n", 1, 2, 3); because it lets the callee treat the arguments as an array in memory.
Let’s see an example for passing ten arguments:
#include <stdio.h>

int addfn(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j) {
    return a + b + c + d + e + f + g + h + i + j;
}

int main() {
    int sum = addfn(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
    printf("%d\n", sum);
}
Assembly for main calling addfn:
main:
li a0, 0
li a1, 1
...
li a7, 7
li t0, 8
sd t0, -16(sp)
li t0, 9
sd t0, -8(sp)
jal addfn
The stack with respect to the caller will look like:
-8(sp): 9
-16(sp): 8
-24(sp): space for a7
-32(sp): space for a6
-40(sp): space for a5
-48(sp): space for a4
-56(sp): space for a3
-64(sp): space for a2
-72(sp): space for a1
-80(sp): space for a0
In particular, the caller passes the first eight args in registers a0-a7, “spills” the ninth and tenth args to the stack, and makes room for all ten args on the stack. Further, note that args are passed on the callee (child) stack frame.
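The layout above follows a simple rule, which the following C sketch spells out. It models only the layout as described in these notes (one 8-byte slot per argument, just below the caller's sp, with the last argument nearest to sp); the function names arg_slot_offset and arg_in_register are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_ARG_REGS = 8, SLOT_BYTES = 8 };

/* Offset, relative to the caller's sp, of the stack slot reserved for
   argument k (0-based) out of n: the last argument sits at -8(sp),
   the one before it at -16(sp), and so on down to argument 0. */
int64_t arg_slot_offset(int n, int k) {
    assert(n > 0 && k >= 0 && k < n);
    return -(int64_t)SLOT_BYTES * (n - k);
}

/* Returns 1 if argument k travels in register a<k>, or 0 if the caller
   must actually store it to its stack slot (a "spill"). */
int arg_in_register(int k) {
    assert(k >= 0);
    return k < NUM_ARG_REGS;
}
```

For the ten-argument call, arg_slot_offset(10, 9) is -8 and arg_slot_offset(10, 8) is -16, matching the two sd instructions in main, while arg_slot_offset(10, 0) is -80, the reserved but unused slot for a0.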
Leaf Functions
Note that if a function does not call another function, then it is a leaf function. The addfn functions above are both leaf functions. It is possible for leaf functions not to push or pop a stack frame, that is, not to adjust sp or save ra, fp, or any args on the stack. A leaf function can use temporary caller-save (t) registers since they do not need to be saved before using them. But a leaf function that does not have a stack frame cannot use callee-save (s) registers, since callee-save registers must be saved on the stack before they are used.
Calling Convention Example
Let’s go through a couple of calling convention examples. First, assume that we have the code below:
int test(int a, int b) {
int tmp = (a&b)+(a|b);
int s = sum(tmp,1,2,3,4,5,6,7,8);
int u = sum(s,tmp,b,a,b,a);
return u + a + b;
}
Next, let’s pretend that we are the RISC-V C compiler and write the assembly for the above test function.
To proceed, we will complete the following steps:
- Write the assembly for the body of the function
- Determine the stack frame size
- Complete the prologue/epilogue that performs the stack frame push/pop
Calling Convention Body Example
In this first step, we will write the Body for test
# Prologue:
# stack frame size = sizeof(registers) bytes x (2x args + 2x (ra/fp) + 0x #callee-save registers [+ 1x of temporary caller-save registers stored on the stack])
# = 8 bytes x 5 = 40 bytes
#
# stack frame layout
# 32(sp): a1 (b)
# 24(sp): a0 (a)
# 16(sp): ra
# 8(sp): fp
# 0(sp): t0
# Body
# store args a and b
SD a0, 24(sp) # a
SD a1, 32(sp) # b
# int tmp = (a&b)+(a|b);
AND t0, a0, a1
OR t1, a0, a1
ADD t0, t0, t1
# store tmp
SD t0, 0(sp)
# int s = sum(tmp,1,2,3,4,5,6,7,8);
MV a0, t0
LI a1, 1
LI a2, 2
...
LI a7, 7
LI t1, 8
SD t1, -8(sp) # spill ninth arg to the child stack frame
JAL sum
# restore tmp, a, b
LD t0, 0(sp) # tmp
LD t1, 24(sp) # a
LD t2, 32(sp) # b
# int u = sum(s,tmp,b,a,b,a);
MV a0, a0 # s
MV a1, t0 # tmp
MV a2, t2 # b
MV a3, t1 # a
MV a4, t2 # b
MV a5, t1 # a
JAL sum
# restore a and b
LD t1, 24(sp) # a
LD t2, 32(sp) # b
# add u (a0), a (t1), b (t2)
ADD a0, a0, t1 # u + a
ADD a0, a0, t2 # u + a + b
# a0 = u + a + b
# Epilogue
Several notes for the above assembly of test:
- a and b were stored in the space allocated for them on the stack.
- a and b had to be restored several times because a0 and a1 are temporary caller-save registers. I.e., after the first and second calls to sum, a and b had to be restored.
- tmp, stored in t0, needed to be saved in the test stack frame since t0 is a temporary caller-save register and t0 (tmp) is needed after the first call to sum returns.
- The ninth argument (value 8) had to be spilled to the child stack frame. Instructions LI t1, 8 and SD t1, -8(sp) store the value 8 on the child stack frame.
Calling Convention Prologue/Epilogue Example
Next, let’s take a look at how to create and destroy (push and pop) the stack frame for test in the prologue and epilogue, respectively.
# stack frame layout
# 32(sp): b (a1)
# 24(sp): a (a0)
# 16(sp): ra
# 8(sp): fp
# 0(sp): t0
test:
# Prologue
ADDI sp, sp, -40 # allocate stack frame
SD ra, 16(sp) # save ra
SD fp, 8(sp) # save old fp
ADDI fp, sp, 32 # set new frame pointer
# Body
...
#Epilogue
LD fp, 8(sp) # restore fp
LD ra, 16(sp) # restore ra
ADDI sp, sp, 40 # dealloc frame
ret # JR ra
The test stack frame size is 40 bytes, which is space to store the two args, a and b, the ra and fp registers, and the tmp variable. Further, in the prologue and epilogue, only ra and fp are stored. The arguments for test, a and b, and tmp (t0) are stored on the stack in the # Body.
Another consideration is the total number of stores and loads for this implementation of test. Specifically, there are two stores and two loads in the prologue/epilogue and four stores (including the spill of the ninth argument) and five loads in the body, for a total of six stores (SD) and seven loads (LD).
Calling Convention Example 2
Now let’s look at a different implementation for test. It is the same C code for test, but a different assembly implementation. In this assembly, we will use callee-save registers (s) to save on access to memory, and, hopefully, reduce the number of stores/loads (SD/LD). The stack size may increase because we need to save the callee-save registers before we use them, but there may be fewer overall stores/loads.
# Prologue
# stack frame size = sizeof(registers) x (2x args + 2x (ra/fp) + 3x callee-save registers [+ 0x temporary caller-save registers stored on the stack])
# = 8 bytes x 7 = 56 bytes
#
# stack frame layout
# 48(sp): b
# 40(sp): a
# 32(sp): ra
# 24(sp): fp
# 16(sp): s3
# 8(sp): s2
# 0(sp): s1
# Body
# store args in callee-save registers s1 and s2
MV s1, a0 # a
MV s2, a1 # b
# int tmp = (a&b)+(a|b);
AND s3, a0, a1
OR t1, a0, a1
ADD s3, s3, t1 # store tmp in a callee-save register s3
# int s = sum(tmp,1,2,3,4,5,6,7,8);
MV a0, s3
LI a1, 1
LI a2, 2
...
LI a7, 7
LI t1, 8
SD t1, -8(sp) # spill ninth arg to the child stack frame
JAL sum
# int u = sum(s,tmp,b,a,b,a);
MV a0, a0 # s
MV a1, s3 # tmp
MV a2, s2 # b
MV a3, s1 # a
MV a4, s2 # b
MV a5, s1 # a
JAL sum
# add u (a0), a (s1), b (s2)
ADD a0, a0, s1 # u + a
ADD a0, a0, s2 # u + a + b
# a0 = u + a + b
# Epilogue
In this assembly, there is space allocated for args a and b; however, we use callee-save registers s1 and s2 for a and b instead. As a result, the body of test has one store (SD) and zero loads (LD). Note that test still needs to spill the ninth argument on the stack before calling sum.
Calling Convention Prologue/Epilogue Example 2
Now, let’s take a look at the prologue and epilogue to push and pop the test stack frame for this second implementation.
# stack frame layout
# 48(sp): b
# 40(sp): a
# 32(sp): ra
# 24(sp): fp
# 16(sp): s3
# 8(sp): s2
# 0(sp): s1
test:
# Prologue
ADDI sp, sp, -56 # allocate stack frame
SD ra, 32(sp) # save ra
SD fp, 24(sp) # save old fp
SD s3, 16(sp) # store callee-save reg s3
SD s2, 8(sp) # store callee-save reg s2
SD s1, 0(sp) # store callee-save reg s1
ADDI fp, sp, 48 # set new frame pointer
# Body
...
#Epilogue
LD s1, 0(sp) # restore s1
LD s2, 8(sp) # restore s2
LD s3, 16(sp) # restore s3
LD fp, 24(sp) # restore fp
LD ra, 32(sp) # restore ra
ADDI sp, sp, 56 # dealloc frame
ret # JR ra
In this assembly, the test stack frame size is 56 bytes, which is space to store the two args, a and b, the ra and fp registers, and three callee-save (s) registers. We save s1-s3 so that we can use them for a, b, and tmp.
In terms of the total number of stores and loads, there are five stores and five loads in the prologue/epilogue and one store and zero loads in the body for a total of six stores (SD) and five loads (LD), reducing the total number of loads by two compared to the prior assembly.
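The frame-size formula used in the comments of both examples can be written as a small C helper. This is just a sketch of the arithmetic as it appears in these notes; the function name frame_size and its parameter names are invented for the illustration.

```c
#include <assert.h>

enum { REG_BYTES = 8 }; /* each saved value occupies one 8-byte register slot */

/* Stack frame size in bytes, following the formula in the comments above:
   one slot per incoming argument, two for ra and fp, one per callee-save
   register the function uses, and one per temporary it keeps on the stack. */
int frame_size(int num_args, int num_callee_saved, int num_spilled_temps) {
    assert(num_args >= 0 && num_callee_saved >= 0 && num_spilled_temps >= 0);
    int slots = num_args + 2 /* ra and fp */ + num_callee_saved + num_spilled_temps;
    return REG_BYTES * slots;
}
```

For the first implementation of test (two args, no callee-save registers, tmp kept on the stack), frame_size(2, 0, 1) gives 40; for the second (two args, s1-s3, no temporaries on the stack), frame_size(2, 3, 0) gives 56.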
Summary and Cheat Sheet for the RISC-V Calling Convention
- first eight args passed in registers a0, a1, … , a7
- Space for args passed in child’s stack frame
- return value (if any) in a0, a1
- stack frame at sp
  - contains ra (clobbered on JAL to sub-functions)
  - contains fp
  - contains local vars (possibly clobbered by sub-functions)
  - contains space for incoming args
- Saved registers (callee save regs) are preserved
- Temporary registers (caller save regs) are not
- Global data accessed via gp
RISC-V Registers
- Return address: x1 (ra)
- Stack pointer: x2 (sp)
- Frame pointer: x8 (fp/s0)
- First eight arguments: x10-x17 (a0-a7)
- Return result: x10-x11 (a0-a1)
- Callee-save free regs: x9, x18-x27 (s1-s11)
- Caller-save free regs: x5-x7, x28-x31 (t0-t6)
- Global pointer: x3 (gp)
- Thread pointer: x4 (tp)
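As a compact way to check the register table above, here is a small C lookup that maps an ABI register name to its x-register number. The function name abi_reg_number is hypothetical, and the sketch covers only the integer registers listed in these notes (plus zero and s0); it returns -1 for anything it does not recognize.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Map an ABI register name (e.g. "a0", "s11", "ra") to its x-number,
   or return -1 if the name is not recognized. */
int abi_reg_number(const char *name) {
    assert(name != NULL);
    if (strcmp(name, "zero") == 0) return 0;
    if (strcmp(name, "ra") == 0) return 1;
    if (strcmp(name, "sp") == 0) return 2;
    if (strcmp(name, "gp") == 0) return 3;
    if (strcmp(name, "tp") == 0) return 4;
    if (strcmp(name, "fp") == 0) return 8;   /* same register as s0 */

    /* Numbered families: one letter followed by a decimal index. */
    if (name[0] == '\0' || !isdigit((unsigned char)name[1])) return -1;
    char *end;
    long n = strtol(name + 1, &end, 10);
    if (*end != '\0') return -1;

    switch (name[0]) {
    case 'a':                                /* a0-a7  -> x10-x17 */
        return n <= 7 ? 10 + (int)n : -1;
    case 't':                                /* t0-t2  -> x5-x7   */
        if (n <= 2) return 5 + (int)n;       /* t3-t6  -> x28-x31 */
        return n <= 6 ? 28 + (int)(n - 3) : -1;
    case 's':                                /* s0-s1  -> x8-x9   */
        if (n <= 1) return 8 + (int)n;       /* s2-s11 -> x18-x27 */
        return n <= 11 ? 18 + (int)(n - 2) : -1;
    default:
        return -1;
    }
}
```

For example, abi_reg_number("a0") is 10, abi_reg_number("s1") is 9, and abi_reg_number("t3") is 28, in agreement with the table.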