System Calls, Signals, & Interrupts

On the previous episode, we began our journey to understand how the OS and hardware work together to work on multiple tasks concurrently. Recall that a process is a currently running instance of a program. Today, we will discuss how processes communicate with the OS.

System Calls

On their own, the only things that processes can do are run computational instructions and access memory. They do not have a direct way to manage other processes, print text to the screen, read input from the keyboard, or access files on the file system. These are privileged operations that can only happen in kernel space. This privilege restriction is important because it puts the kernel in charge of deciding when these actions should be allowed. For example, the OS can enforce access control on files so an untrusted user can’t read every other user’s passwords.

Processes can ask the OS to perform privileged actions on their behalf using system calls. We’ll cover the ISA-level mechanisms for how system calls work soon. For now, however, you can think of a system call as a special C function that calls into kernel space instead of user space. (Calling a “normal” function always invokes code within the process, i.e., either code you wrote yourself or code you imported from a library.)

Each OS defines a set of system calls that it offers to user space. This set of system calls constitutes the abstraction layer between the kernel and user code. (For this reason, OSes typically try to keep this set reasonably small: a simpler OS abstraction is more feasible to implement and to keep secure.)

In this class, we’re using a standardized OS abstraction called POSIX. Many operating systems, including Linux and macOS, implement the POSIX set of system calls. (We’ll colloquially refer to it as “Unix,” but POSIX is the actual name of the standard.)

For a list of all the things your POSIX OS can do for you, see the contents of the unistd.h header. That’s a collection of C functions that wrap the actual underlying system calls.

For example, consider the write function. write is a low-level primitive for writing strings to files. You have probably never called write directly, but you have used printf and fputc, both of which eventually must use the write system call to produce their final output.

Process Management

There are system calls that let processes create and manage other processes. These the big ones we’ll cover here:

exit terminates the current process.
fork clones the current process. So after you fork, there are two nearly identical processes (e.g., with nearly identical heaps and stacks) running that can then diverge and start doing two different things.
exec replaces the current process with a new executable. So after you exec a new program, you “morph” into an instance of that program. exec does not create or destroy processes—the kernel’s list of PCBs does not grow or shrink. Instead, the current process transforms in place to run a different program.
waitpid just waits until some other process terminates.

`fork`

The trickiest in the bunch is probably fork. When a process calls fork(), it creates a new child process that looks almost identical to the current one: it has the same register values, the same program counter (i.e., the same currently-executing line of code), and the same memory contents (heap and stack). A reasonable question you might ask is: do the two processes (parent and child) therefore inevitably continue doing exactly the same thing as each other? What good is fork() if it can only create redundant copies of processes?

Fortunately, fork() provides a way for the new processes to detect which universe they are living in: i.e., to check whether they are the parent or the child. Check out the manual page for fork. The return value is a pid_t, i.e., a process ID (an integer). According to the manual:

On success, the PID of the child process is returned in the parent, and 0 is returned in the child.

This is why I kept saying the two copies are almost identical—the difference is here. The child gets 0 returned from the fork() call, and the parent gets a nonzero pid instead.

This means that all reasonable uses of fork() look essentially like this:


#include <stdio.h>
#include <unistd.h>

int main() {
    pid_t pid = fork();
    if (pid == 0) {  // Child.
        printf("Hello from the child process!\n");
    } else if (pid > 0) {  // Parent.
        printf("Hello from the parent process!\n");
    } else {
        perror("fork");
    }
    return 0;
}

In other words, after your program calls fork(), it should immediately check which universe it is living in: are we now in the child process or the parent process? Otherwise, the processes have the same variable values, memory contents, and everything else—so they’ll behave exactly the same way, aside from this check.

Another way of putting this strange property of fork() is this: most functions return once. fork returns twice!

`exec`

The exec function call “morphs” the current process, which is currently executing program A, so that it instead starts executing program B. You can think of it swapping out the contents of memory to contain the instructions and data from executable file B and then jumping to the first instruction in B’s main.

There are many variations on the exec function; check out the manual page to see them all. Let’s look at a fairly simple one, execl. Here’s the function signature, copied from the manual:


int execl(const char *path, const char *arg, ...);

You need to provide the executable you want to run (a path on the filesystem) and a list of command-line arguments (which will be passed as argv in the target program’s main).

Let’s run a program! Try something like this:


#include <stdio.h>
#include <unistd.h>

int main() {
    if (execl("/bin/ls", "ls", "-l", NULL) == -1) {
        perror("error in exec call");
    }
    return 0;
}

That transforms the current process into an execution of ls -l. There’s one tricky thing in the argument list: by convention, the first argument is always the name of the executable. (This is also true when you look at argv[0] in your own main function.) So the first argument to the execl call here is the path to the ls executable file, and the second argument to execl is the first argument to pass to the executable, which is the name ls. We also terminate the variadic argument list with NULL.

`fork` + `exec` = spawn a new command

The fork and exec functions seem kind of weird by themselves. Who wants an identical copy of a process, or to completely erase and overwrite the current execution with a new program?

In practice, fork and exec are almost always used together. If you pair them up, you can do something much more useful: spawn a new child process that runs a new command. You first fork the parent process, and then you exec in the child (and only the child) to transform that process to execute a new program.

The recipe looks like this:

fork()
Check if you’re the child. If so, exec the new program.
Otherwise, you’re the parent. Wait for the child to exit (see below).

Here that is in code:


#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();
    if (pid == 0) { // Child.
        if (execl("/bin/ls", "ls", "-l", NULL) == -1) {
            perror("error in exec call");
        }
    } else if (pid > 0) { // Parent.
        printf("Hello from the parent!");
        waitpid(pid, NULL, 0);
    } else {
        perror("error in fork call");
    }
    return 0;
}

This code spawns a new execution of ls -l in a child process. This is a useful pattern for programs that want to delegate some work to some other command. (Don’t worry about the waitpid call; we’ll cover that next.)

`waitpid`

Finally, when you write code that creates new processes, you will also want to wait for them to finish. The waitpid function does this. You supply it with a pid of the process you want to wait for (and, optionally, an out-parameter for some status information about it and some options), and the call blocks until the process somehow finishes.

It’s usually important to waitpid all the child processes you fork. Try deleting the waitpid call from the example above, and then compile and run it. What happens? Can you explain what went wrong when you didn’t wait for the child process to finish?

Signals

Whereas system calls provide a way for processes to communicate with the kernel, signals are the mechanism for the kernel to communicate with processes.

The basic idea is that there are a small list of signal values, each with its own meaning: a thing that the kernel (or another process) wants to tell your process. Each process can register a function to run when it receives a given signal. Then, when the kernel sends a signal to that process, the process interrupts the normal flow of execution and runs the registered function. Some signals also instruct the kernel to take specific actions, such as terminating the program.

There are also system calls that let processes send signals to other processes. (In reality, that means that process A asks the kernel to send the signal to process B.) This way, signals act as an inter-process communication/coordination mechanism.

Here are the functions you need to send signals:

kill(pid, sig): Send sig to process pid.
raise(sig): Send sig to myself.

To receive signals, you set up a signal handler function with the signal function. The arguments are the signal you want to handle and a function pointer to the code that will handle the signal.

Here’s an example of a program that handles the SIGINT signal:


#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>

void handle_signal(int sig) {
    printf("Caught signal %d\n", sig);
    exit(1);
}

int main() {
    signal(SIGINT, handle_signal); // Set up the signal handler for SIGINT.
    while (1) {
        printf("Running. Press Ctrl+C to stop.\n");
        sleep(1);
    }
    return 0;
}

The important bit is this line:


signal(SIGINT, handle_signal);

This line asks the kernel to register a function we’ve written so that it will run in response to the SIGINT signal.

Interrupts

We just discussed signals: the mechanism that the kernel uses to communicate with user-space processes. Recall that, when your process receives a signal, it interrupts the normal flow of execution and runs the signal-handler function that you previously registered. How does this actually work? How does the kernel interfere with the execution of a process in between instructions, take control, and forcibly move the program counter to some other code?

Signals use a more general (and extremely important) mechanism called interrupts. As the name implies, they are the mechanism that the kernel uses to interrupt the execution of a running process, which is otherwise minding its own business and running one instruction after another, and make it do something else.

Here’s a conceptual way to think about how interrupts work. You can think of a CPU as executing a loop: fetch an instruction, execute that instruction, and then go back to the top of the loop. To deal with interrupts, CPU add an extra step to this conceptual loop: fetch an instruction, execute that instruction, check to see if there are any interrupts to handle, and then go back to the top of the loop. That is, you can imagine that there is some place where the CPU can look to see if there is an interrupt to deal with, and it checks for this indicator between the execution of adjacent instructions. When there is an interrupt to handle, the CPU transfers control to some code that can handle the interrupt.

What Are Interrupts For?

The OS and hardware uses interrupts to deal with exception conditions (what happens if your program runs out of memory? or executes an illegal instruction that the CPU cannot interpret?) and to support kernel-mediated services like I/O. Here are a few reasons why interrupts are helpful:

They are more efficient than busy-waiting, i.e., just looping until something happens. If you’re waiting for a packet to arrive from the network, for example, you can execute other work until the packet arrives—at which point the OS can interrupt you to deliver the packet.
They make it possible to handle events in the real world immediately. When the mouse moves, for example, the OS and hardware can interrupt the currently executing process to make sure the cursor appears to move on screen (instead of waiting patiently for the currently-running program to be done, which would make for a terribly janky mouse cursor).
Interrupts are critical for multitasking, i.e., running multiple processes at once. Interrupts are what OS kernels use to perform periodic context switches between concurrent processes to fairly share CPU time between them.

As a result, systems use interrupts for a very wide variety of reasons, some of which are “exceptional” (e.g., when a program tries to execute an illegal instruction or references an unmapped virtual memory address) and others that are totally normal (e.g., to handle I/O or when it’s time to do a context switch).

Requesting Interrupts with System Calls

We also previously discussed system calls: the mechanism that user-space code uses to invoke kernel-space functionality. The underlying mechanism for system calls also uses interrupts. The ISA typically provides a special instruction that processes can use to request an interrupt. When the hardware executes this instruction, it immediately transitions to kernel mode to handle the system call.

To decide which system call to make and to pass arguments to it, OSes define a syscall-specific calling convention. This is different from the ordinary calling convention that governs the calling of ordinary functions. If you’re curious, Linux’s manual page for the syscall C function lists its calling conventions for every architecture that Linux supports.

In RISC-V, the special instruction for making system calls is named ecall. It has no operands. The Linux syscall convention for RISC-V says:

a7 contains the system call number. This decides which kernel functionality we want to invoke. For example, the syscall number for write is 64, and the number for execve is 221.
Arguments to the system call go in a0 through a5.
The return value goes in a0, just like in the “ordinary function” calling convention.

You can see a full list of available system calls on the syscalls(2) manual page. Then, to find the corresponding syscall number, the authoritative source is the unistd.h header file in the Linux source code: search for #define __NR_<call> <number>. You can also try this big, searchable syscall table that covers all the architectures Linux supports (use the “riscv64” column). The corresponding manual page tells you the arguments for the syscall, expressed as a C function signature.

An Example

Let’s handcraft a system call in RISC-V assembly using ecall.

We will use the Linux write system call to output characters to the console. If we look in unistd.h, it tells us that the syscall number for write is 64. The manual page says that this system call takes 3 arguments:


ssize_t write(int fd, const void buf[.count], size_t count);

There is the file descriptor, a pointer to the characters to output, and the number of characters. The file descriptor 0 is the standard output stream, i.e., it’s how we print to the console. Let’s write a function that always outputs to file descriptor 0 and always prints exactly 1 character. Here are the assembly instructions we need:


  addi a7, x0, 64  # syscall number: write
  addi a0, x0, 0   # first argument: fd
  mv   a1, t0      # second argument: buf
  addi a2, x0, 1   # third argument: count
  ecall

We set the syscall number register, a7, to 64. Then we provide the three arguments: file descriptor 0, a pointer (here I’m assuming it comes from t0), and length 1. Finally, we use ecall to actually invoke the syscall.

Here’s a complete assembly file that wraps these instruction in a function for printing one-character strings:


.global printone
printone:
  mv t0, a0        # save the function argument: a character pointer

  # Make a system call: write(0, t0, 1)
  addi a7, x0, 64  # syscall number: write
  addi a0, x0, 0   # first argument: fd
  mv   a1, t0      # second argument: buf
  addi a2, x0, 1   # third argument: count
  ecall

  ret

You can use this assembly from C code by writing a function declaration for it, like this:


int printone(char* c);

int main() {
    printone("h");
    printone("i");
    printone("\n");
    return 0;
}

You can compile and run the whole program by combining the C file and the assembly file:


$ rv gcc -o printone printone.c printone.s

This program prints something to the console without ever importing any headers or using the C standard library at all. Pretty cool!

CS 3410