A11: `shall`

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Overview

With this assignment, we hope to demystify how the essential, OS-adjacent parts of a computer system work. Your task in this assignment is to implement a minimal but functional command-line shell called shall (“shall we execute some commands?”). The shall syntax resembles that of the original Bourne shell (sh) and its successor, bash. Further, it supports a realistically large subset of the capabilities provided by these real-world tools, including I/O redirection, foreground/background process execution, and a limited form of interrupt handling.

Provided Files

The source files included with the release of this assignment contain most of the code for running shall commands. In particular, all the code associating with reading and parsing shell commands is complete. Your job is to write the code for spawning processes and redirecting I/O. In addition, you should improve the code for reading files.

myspin.c, myint.c, mysplit.c — starter code for Task 0; all of your lab work will be done in these files.
shall.h — header file used by all other source files in this distribution.
shall.c — main driver for shall; you won’t modify this file
parser.c, token.c — code for lexical analysis and parsing shall commands; again, you won’t modify anything in these files
exec.c — code for running the built-in commands, spawning new processes, I/O redirection, etc.; almost all of your work will be done in this file
reader.c — code for implementing a very basic, 1-character-at-a-time input reader; you will make an improved version of this

Run rv make to create the executable ‘shall’, and then run rv qemu shall to run it.

Getting Started

To get started, obtain the release code by cloning your assignment repository from GitHub:


$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_shall.git

Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.

Background

Real-World Shells

In this section, we’ll use examples from a real-world shell, bash, which is perhaps the most common shell program in the Unix/Linux world. The syntax of the shall you implement is close to, but not quite the same as, the examples given in this section.

It’s easy to forget that a shell, despite its centrality as the user interface to a computer, is “just another normal program.” It is not part of the kernel, and it does not have any kind of special privileges. It processes commands that you type and then uses standard OS facilities to launch other programs on your behalf.

For our purposes, a shell is an interactive command-line interpreter. It prints a prompt, waits for the user to enter a command line on its standard input stream, and then carries out some action that the command describes.

A command is a string, consisting of whitespace-separated words, like ls -l somedir. The first word (ls in our example) is the command: either a special built-in command name or the name of an executable file to launch. The remaining words (-l and somedir in the example) are arguments to pass to the command. The command receives this list of arguments as strings and can do anything it likes with them. (That’s what the argc and argv arguments to C’s main function receive.)

Built-in commands are implemented and run by the shell itself. Some shell built-ins in “real” Unix shells you may have used before include set, source, exit, and alias.

Most of the time, however, commands refer to actual executable files (i.e., compiled programs) that exist in the filesystem. For example, ls, though seemingly an essential piece of functionality, is the name of an executable file in most shells, not a built-in command. Your shell has a set of directories it looks in to find executables.

Built-in Command Or Separate Program?

Most shells have a which command that shows you the location of a program. You can also use it to distinguish between shell commands that are built in vs. actual executables.


$ which exec
$ which source
$ which alias
$ which python
/home/jhl287/anaconda3/bin/python
$ which ls
/usr/bin/ls
$ which which
/usr/bin/which

You can also type the full path to any executable to use it. On my machine, for example, /usr/bin/ls -l _<somedir>_ is equivalent to ls -l _<somedir>_.

A shell’s main purpose is to launch and manage processes that execute these shell commands. In general, a single command might entail launching multiple processes—together, we’ll call this group of processes a job. For example, you can type ls | head -n 2 to combine the ls and head executables (if you want to see only the first 2 files in a directory); the job for this command consists of an ls process and a head process with the standard output of the first connected to the standard input of the second.

Foreground / Background Jobs

Usually commands run in the foreground: the shell waits for them to complete before showing you another prompt. Unix shells also support launching long-running commands in the background. This way, you can continue to use the shell while the background command is executing.

To run a command in the background, put a & at the end. For example, try typing this in your computer’s “real” shell:


$ sleep 5

The sleep command runs for 5 seconds in the foreground, during which you can’t type any new commands. Now try this version:


$ sleep 5 &

Your shell will print out some information about the background job it launched, and then it will immediately print another prompt and let you type more commands.

Job Control

Because you can have any number of background jobs running at once, real-world shells provide job control features to manage them. In Linux systems, the most useful of these are the built-in commands jobs, fg, and bg. You can also interrupt the current foreground job by typing CTRL-C; similarly, CTRL-Z will pause the current foreground job and send it to the background. It is an interesting and highly rewarding exercise to implement these features, but doing so is beyond the scope of this assignment.

Signals, Interrupts

Recall that signals are the mechanism that the kernel uses to communicate with processes. In C, you can use the following two functions from signal.h to handle and send signals:

signal(signum,sig_handler) — register the function sig_handler to respond to the signal signum
kill(pid,sig) — send sig to process pid

Task 0 (Lab): Implementing Test Programs with System Calls

View the lab slides

here.

In lab, you will write three C programs that will help get you accustomed to writing programs which make system calls.

myspin.c: Sleeps for n seconds in 1-second chunks.
myint.c: Sleeps for n seconds and sends SIGINT to itself.
mysplit.c: Fork a child process that sleeps for n seconds in 1-second chunks.

These programs will also help you test your shell implementation. The problem with “normal” programs, like /bin/ls, is that they usually consist of a single process that finishes immediately—so they aren’t very useful for testing the way background jobs behave.

These test programs sleep repeatedly for 1-second intervals. That means making n calls to the C sleep() function in a loop. The reason is that this strategy will make the programs more responsive to signals—they can handle signals between adjacent sleep() calls.

Step 1: `myspin`

First, you’ll write a program named myspin which sleeps for n seconds in 1-second chunks. Starter code has been provided for you in myspin.c.

You will need to use the sleep() function to tell the program to go to sleep. Since your program should only sleep in one-second chunks, you should only ever call sleep(1).

myspin should print out spinning ... before sleeping for n 1-second chunks. After myspin is done sleeping, it should print out spun. on the same line, and then a newline. You will not be penalized for extra spaces or newlines. The program should exit with a return status of 0.

You may also want to browse the C standard library headers for other useful functions.

You can compile myspin by running rv make myspin. Running rv qemu myspin <N> should sleep for N seconds.


$ rv qemu myspin 3
spinning ... spun.

Once you’re done, check-in with a TA before moving on.

Step 2: `myint`

Next, you’ll implement myint, which sleeps for n seconds and then sends SIGINT to itself (which will cause the program to exit). This program should again only sleep in one-second intervals. Starter code has been provided for you in myint.c.

Use the kill() function from signal.h to send the signal. (Contrary to the name, the kill() function can send any signal—not just SIGKILL.) The first argument to the kill() function is the process ID (pid) where the signal should be sent; the getpid() function function can help with this. The second argument is the signal to send; use the SIGINT macro for this value.

Test your program by compiling it with rv make myint and then run rv qemu myint 2 or similar. If you want to check that your program is actually getting interrupted, try this:


$ rv sh -c 'qemu myint 2 ; echo $?'
130

The special $? shell variable stores the exit code of the command that exited last. An exit code of 130 means that the process received SIGINT.

Step 3: `mysplit`

Finally, you’ll write mysplit which spawns a subprocess that sleeps for N seconds (again in 1-second chunks) and then waits for that subprocess to finish.

For example,


$ rv qemu mysplit 3
parent creating child and waiting
child is sleeping
child woke up
parent is done

Use the fork() function from unistd.h to launch a subprocess. Your program will need to behave differently in the parent process and in the child process. In the child process, use your same old sleep loop to wait for N seconds. In the parent process, use the waitpid() function to block until the child process finishes.

The waitpid() function takes three arguments: the process ID to wait for, an “out-parameter” stat_loc for the status of the subprocess, and an options parameter for extra flags. You don’t need either of the latter things, so you can pass a null pointer and 0 for them—so waitpid(your_child_process_id, NULL, 0) will suffice; indeed you can even get away with the simpler command wait(NULL), which is equivalent in this case (though in general systems programming, using waitpid is a better idea).

mysplit should print the following four lines in order:

parent creating child and waiting should be printed right before calling fork() in the parent process.
child is sleeping should be printed by the child process right before it goes to sleep.
child woke up should be printed by the child process immediately after waking up from its N 1-second naps.
parent is done should be printed by the parent process after the child process terminates.

You can compile mysplit by running the command rv make mysplit. It behaves similarly to myspin; it should sleep for N seconds when you pass N as the command-line argument.

Here’s one way that you can check that your mysplit implementation is actually launching a subprocess:


$ rv sh -c 'qemu mysplit 5 & (sleep 1 ; ps ax; sleep 5)'
parent creating child and waiting
child is sleeping
    PID TTY      STAT   TIME COMMAND
      1 pts/0    Ss     0:00 /sbin/docker-init -- sh -c qemu mysplit 5 & (sleep 1 ; ps ax; sleep 5)
      7 pts/0    S+     0:00 sh -c qemu mysplit 5 & (sleep 1 ; ps ax; sleep 5)
      8 pts/0    Sl+    0:00 qemu mysplit 5
      9 pts/0    S+     0:00 sh -c qemu mysplit 5 & (sleep 1 ; ps ax; sleep 5)
     12 pts/0    Sl+    0:00 qemu mysplit 5
     14 pts/0    R+     0:00 ps ax
child woke up
parent is done

The above invocation launches mysplit 5 in the background, sleeps for a second, and then uses the ps command to list the processes running on the machine. If mysplit is working correctly, you should see two different qemu mysplit 5 processes running with different pids. If you try this with myspin instead, there should only be one corresponding process.

Using `shall`

This section describes how you can interact with the shall shell. You’ll need to refer back to this section for Task 1. Shall we begin?

Built-in Commands

As in bash and other “real-world” shells, shall has a short list of built-in commands: cd, source, exec, and exit. exec and exit have already been implemented for you. You’ll implement cd and source in Task 1.


cd dir
    change the working directory to directory 'dir'

source script
    read commands from file script

exec cat exec.c
    same as 'cat exec.c', but without forking, so the 'shall' is
    replaced with 'cat exec.c' and doesn't return

exec > exec.out
    this doesn't run anything, but redirects standard output to the
    file exec.out. Further commands that are executed now have their
    standard output redirected to file exec.out.

exit 3
    exit 'shall' with status 3. If no status is specified, 'shall'
    exits with status 0.

Other Commands

shall also has access to every executable file that the parent shell process from which you launched shall does. For example,


$ rv qemu shall
-> ls -l
total 212
-rw-rw-r-- 1 1001 1001   342 Apr  7 14:48 Makefile
-rw-rw-r-- 1 1001 1001  8159 Apr  7 17:37 exec.c
-rw-r--r-- 1 root root 20592 Apr  7 16:10 exec.o
-rwxr-xr-x 1 root root  9040 Apr  7 16:10 myint
-rw-rw-r-- 1 1001 1001   706 Apr  2 17:31 myint.c
-rwxr-xr-x 1 root root  9032 Apr  7 16:10 myspin
-rw-rw-r-- 1 1001 1001   546 Apr  2 17:26 myspin.c
<output omitted>
-> cat exec.c
<output omitted: contents of exec.c>
-> which ls
/usr/bin/ls
-> qemu myspin 3
spinning ... spun.
-> apt
apt 2.4.13 (amd64)
Usage: apt [options] command

apt is a commandline package manager and provides commands for
<output ommitted>

Real-World Shells

Anything that you can run from the shell invoked by the rv alias can be similarly invoked at the shall prompt. However, programs like emacs, vim,etc., which make extensive use of TTY escape characters, will probably misbehave.

Shortcuts aren’t supported

Shells like bash, sh, Windows Powershell, and others usually provide a set of features to simplify the typing of commands. The ones you’re most familiar with are probably tab completion and wildcard patterns (for example, “ls *.c” to display all and only the C source files in a folder). Most of these “extras” aren’t supported by the shall parser. Another one you won’t be able to try is the use of parentheses to group commands together (as in the “qemu mysplit 5 & (sleep 1 ; ps ax)”) example, above.

Background commands

shall also supports running commands as a background process. To do so, terminate your command with an &:


-> qemu myspin 5 &
Process 11 running in background
-> spinning ...
-> spun.

You can even invoke multiple background commands by using the ; separator. For example,


-> qemu myspin 5 & ; qemu myspin 3 &
Process 106 running in background
Process 108 running in background
-> spinning ... spinning ... spun.
spun.

I/O Redirection

By default, a shell command receives its input from a keyboard, and it displays conventional and diagnostic output to a terminal window. All three of these are actually file handles that the process reads from (stdin) or writes to (stdout and stderr). It is possible to change a program’s input source and/or either of its output streams, using a variety of built-in redirection operators. You will implement a fairly complete set of the standard redirection operators, though in some cases, we have changed the syntax to simplify the parsing tasks:

cat exec.c > exec.out — execute cat exec.c but create the file exec.out, and write all output to it, instead of stdout. If exec.out already exists, this replaces the original contents with the output of the cat exec.c command.
cat exec.c >> exec.out — execute cat exec.c, but append output to file exec.out, if it already exists. If exec.out does not yet exist, it should be created.
cat nosuchfile.c {2}> exec.err — execute cat nosuchfile.c but redirect error output only (file descriptor 2) to file exec.err (for example, in case nosuchfile.c does not exist). Note that the format is a little different from the standard shell, where the command would have been cat nosuchfile.c 2> exec.err.
cat < exec.c — execute cat without arguments, but take standard input from file exec.c.
cat exec.c > exec.err {2}>{1} — write both error messages and standard output to file exec.err. In the standard shell, the command is cat exec.c > exec.err 2>&1.

Task 1

Complete the implementation of shall by implementing the following functions in exec.c:

void interrupts_disable() — Disable responses to the SIGINT signal. This one has been completed for you.
void interrupts_enable() — Enable handling of the SIGINT signal. For the simple version of signal handling here, it suffices to use the SIG_DFL handler.
void interrupts_catch() — Install a custom signal handler for SIGINT. You can use the implementation of sighandler that is part of the assignment distribution.
void redir_fd(fd1,fd2) — This implements redirection of fd1 output to the input of fd2 (the > operator).
void redir_file(name, fd,flags) — Used for redirection of input and/or output when fd is itself a file descriptor.
void spawn(command, background) — Spawn and run command, in either the background or foreground.
void cd(command) — The built-in cd command; change the current working directory to command->argv[1], or to the directory in environment variable $HOME if command->argv[1] is null.
void source(command) — The built-in source command; read and execute commands from the specified files in the list of arguments of the command.

Your work in each procedure is delimited by the // BEGIN and // END pairs. After this task, you should have a working shall that can execute the examples above. You can build shall by running rv make. Then, you can run shall by executing rv qemu shall.

Important Details

shall is usually in one of two modes: it is either waiting for input, or it is waiting for processes to finish executing. It cannot wait for both at the same time.
At any point in time, there is at most one foreground process. If there is a foreground process, shall waits for the foreground process to finish executing before going back to waiting for input. Only input can cause shall to start processes.
shall itself should catch interrupts and print a message when it happens, such as “got signal 2” (2 is the signal number of interrupts).
shall should disable interrupts for commands that run in the background (using &). For command that run in the foreground, interrupts should generally cause the programs to finish executing.

Hints

When running a command in the background, have shall print its process identifier. For example
```
process 36877 running in background
```
If a command finishes normally with exit status 0, shall simply prints a new “prompt” (-> in the case of shall). However, if a program returns a non-zero status or a program exits abnormally (say, due to an interrupt), have shall print a message. Examples include:
```
process 36689 exited with signal 2
process 36889 exited with status 1
```
It may also be useful for shall to print information about background processes that terminate, even if they exit normally with exit status 0:
```
process 36889 exited with status 0
```
For the spawn() function, you are expected to use the fork() and wait() system calls as well as macros WIFSIGNALED() and WEXITSTATUS(). You should invoke interrupts_disable() for processes that run in the background.
Use wait() rather than waitpid(), as you will want to wait for any process terminating, including background processes. There’s no need to catch SIGCHLD. Note that termination of background processes will only be discovered and reported while waiting for a foreground process to terminate.
To execute a command, first invoke redir(command) (which redirects I/O), then execute(command) (which invokes system call execv() to run the command).
Most of the redir() function is already written, but you need to fill in the code in functions redir_file() and redir_fd(). All of the execute() function is already written. If redirection fails, use _exit(1) to exit the forked process with a failure status of 1. In case of redir_file(), don’t forget to close the original file descriptor returned by open().
For cd(), you are expected to use the C library function getenv() to obtain the $HOME environment variable in case no directory is specified in the cd command.
For source(), use open() and close(). Once you have a file descriptor fd, you can use the following code to invoke the interpreter:
```
reader_t reader = reader_create(fd);
interpret(reader, 0);
reader_free(reader);
```
Don’t forget to implement the remaining interrupt functions.

Task 2

The reader struct defined in reader.c uses the read() system call to read one character at a time. This is highly inefficient because each time read() is called, a system call is made, which then causes a context switch to occur.

Your job in this task is to modify the reader struct and associated functions reader_create, reader_next, and reader_free to read up to 512 characters from the file descriptor fd at a time.

However, reader must function the same way as before, meaning, the reader’s interface must stay the same. For example, reader_next() should still return a single character.

Submission

You need to submit five files to Gradescope:

myspin.c, myint.c, and mysplit.c (Task 0)
exec.c, which will contain your solution to Task 1
reader.c, with your improved reader (Task 2)

Rubric

Task 1: 61 points
Task 2: 15 points

Attribution

This assignment was adapted from a project by Prof. Robbert van Renesse.