A11: shall
Instructions:
Remember, all assignments in CS 3410 are individual.
You must submit work that is 100% your own.
Remember to ask for help from the CS 3410 staff in office hours or on Ed!
If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt
file along with your submission.
The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.
Overview
With this assignment, we hope to demystify how the essential, OS-adjacent parts of a computer system work. Your task in this assignment is to implement a minimal but functional command-line shell
called shall
(“shall we execute some commands?”). The shall
syntax resembles that of the original Bourne shell (sh
) and its successor, bash
. Further, it supports a realistically large subset of the capabilities provided by these real-world tools, including I/O redirection, foreground/background process execution, and a limited form of interrupt handling.
Provided Files
The source files included with the release of this assignment contain most of the code for running shall
commands. In particular, all the code associating with reading and parsing shell commands is complete. Your job is to write the code for spawning processes and redirecting I/O. In addition, you should improve the
code for reading files.
myspin.c
,myint.c
,mysplit.c
— starter code for Task 0; all of your lab work will be done in these files.shall.h
— header file used by all other source files in this distribution.shall.c
— main driver forshall
; you won’t modify this fileparser.c
,token.c
— code for lexical analysis and parsingshall
commands; again, you won’t modify anything in these filesexec.c
— code for running the built-in commands, spawning new processes, I/O redirection, etc.; almost all of your work will be done in this filereader.c
— code for implementing a very basic, 1-character-at-a-time input reader; you will make an improved version of this
Run rv make
to create the executable ‘shall’, and then run rv qemu shall
to run it.
Getting Started
To get started, obtain the release code by cloning your assignment repository from GitHub:
$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_shall.git
Replace <NETID>
with your NetID. All the letters in your NetID should be in lowercase.
Background
In this section, we’ll use examples from a real-world shell, bash
, which is perhaps the
most common shell program in the Unix/Linux world. The syntax of the shall
you implement
is close to, but not quite the same as, the examples given in this section.
It’s easy to forget that a shell, despite its centrality as the user interface to a computer, is “just another normal program.” It is not part of the kernel, and it does not have any kind of special privileges. It processes commands that you type and then uses standard OS facilities to launch other programs on your behalf.
For our purposes, a shell is an interactive command-line interpreter. It prints a prompt, waits for the user to enter a command line on its standard input stream, and then carries out some action that the command describes.
A command is a string, consisting of whitespace-separated words, like ls -l somedir
.
The first word (ls
in our example) is the command:
either a special built-in command name or the name of an executable file to launch.
The remaining words (-l
and somedir
in the example) are arguments to pass to the
command. The command receives this list of arguments as strings and can do anything it likes with them.
(That’s what the argc
and argv
arguments to C’s main
function receive.)
Built-in commands are implemented and run by the shell itself.
Some shell built-ins in “real” Unix shells you may have used before include set
, source
, exit
, and alias
.
Most of the time, however, commands refer to actual executable files (i.e., compiled programs) that exist in the filesystem. For example, ls
, though seemingly an essential piece of functionality, is the name of an executable file in most shells, not a built-in command. Your shell has a set of directories it looks in to find executables.
Most shells have a which
command that shows you the location of a program. You can also use it to distinguish between shell commands that are built in vs. actual executables.
$ which exec
$ which source
$ which alias
$ which python
/home/jhl287/anaconda3/bin/python
$ which ls
/usr/bin/ls
$ which which
/usr/bin/which
You can also type the full path to any executable to use it.
On my machine, for example, /usr/bin/ls -l _<somedir>_
is equivalent to ls -l _<somedir>_
.
A shell’s main purpose is to launch and manage processes that execute these shell commands.
In general, a single command might entail launching multiple processes—together, we’ll call this group of processes a job.
For example, you can type ls | head -n 2
to combine the ls
and head
executables (if you want to see only the first 2 files in a directory);
the job for this command consists of an ls
process and a head
process with the standard output of the first connected to the standard input of the second.
Foreground / Background Jobs
Usually commands run in the foreground: the shell waits for them to complete before showing you another prompt. Unix shells also support launching long-running commands in the background. This way, you can continue to use the shell while the background command is executing.
To run a command in the background, put a &
at the end.
For example, try typing this in your computer’s “real” shell:
$ sleep 5
The sleep
command runs for 5 seconds in the foreground, during which you can’t type any new commands.
Now try this version:
$ sleep 5 &
Your shell will print out some information about the background job it launched, and then it will immediately print another prompt and let you type more commands.
Because you can have any number of background jobs running at once, real-world shells provide job control features to manage them. In Linux systems, the most useful of these are the built-in commands jobs
, fg
, and bg
. You can also interrupt the current foreground job by typing CTRL-C
; similarly, CTRL-Z
will pause the current foreground job and send it to the background. It is an interesting and highly rewarding exercise to implement these features, but doing so is beyond the scope of this assignment.
Signals, Interrupts
Recall that signals are the mechanism that the kernel
uses to communicate with processes. In C, you can use the following two
functions from signal.h
to handle and send signals:
signal(signum,sig_handler)
— register the functionsig_handler
to respond to the signalsignum
kill(pid,sig)
— sendsig
to processpid
Task 0 (Lab): Implementing Test Programs with System Calls
View the lab slides
here.
In lab, you will write three C programs that will help get you accustomed to writing programs which make system calls.
myspin.c
: Sleeps forn
seconds in 1-second chunks.myint.c
: Sleeps forn
seconds and sends SIGINT to itself.mysplit.c
: Fork a child process that sleeps forn
seconds in 1-second chunks.
These programs will also help you test your shell implementation.
The problem with “normal” programs, like /bin/ls
, is that they usually consist of a single process that finishes immediately—so they aren’t very useful for testing the way background jobs behave.
These test programs sleep repeatedly for 1-second intervals.
That means making n
calls to the C sleep()
function in a loop.
The reason is that this strategy will make the programs more responsive to signals—they can handle signals between adjacent sleep()
calls.
Step 1: myspin
First, you’ll write a program named myspin
which sleeps for n
seconds in
1-second chunks. Starter code has been provided for you in myspin.c
.
You will need to use the sleep()
function to tell the program to go
to sleep. Since your program should only sleep in one-second chunks, you should
only ever call sleep(1)
.
myspin
should print out spinning ...
before sleeping for n
1-second chunks.
After myspin
is done sleeping, it should print out spun.
on the same line, and then a newline.
You will not be penalized for extra spaces or newlines.
The program should exit with a return status of 0.
You may also want to browse the C standard library headers for other useful functions.
You can compile myspin
by running rv make myspin
. Running rv qemu myspin <N>
should sleep for N
seconds.
$ rv qemu myspin 3
spinning ... spun.
Once you’re done, check-in with a TA before moving on.
Step 2: myint
Next, you’ll implement myint
, which sleeps for n
seconds and then sends
SIGINT to itself (which will cause the program to exit). This program should again only sleep in one-second intervals.
Starter code has been provided for you in myint.c
.
Use the kill()
function from signal.h
to send the signal.
(Contrary to the name, the kill()
function can send any signal—not just SIGKILL.)
The first argument to the kill()
function is the process ID (pid) where the signal should be sent; the getpid()
function function can help with this.
The second argument is the signal to send; use the SIGINT
macro for this value.
Test your program by compiling it with rv make myint
and then run rv qemu myint 2
or similar.
If you want to check that your program is actually getting interrupted, try this:
$ rv sh -c 'qemu myint 2 ; echo $?'
130
The special $?
shell variable stores the exit code of the command
that exited last.
An exit code of 130 means that the process received SIGINT
.
Step 3: mysplit
Finally, you’ll write mysplit
which spawns a subprocess that sleeps for N
seconds (again in 1-second chunks) and then waits for that subprocess to
finish.
For example,
$ rv qemu mysplit 3
parent creating child and waiting
child is sleeping
child woke up
parent is done
Use the fork()
function from unistd.h
to launch a subprocess.
Your program will need to behave differently in the parent process and in the child process.
In the child process, use your same old sleep
loop to wait for N
seconds.
In the parent process, use the waitpid()
function to block until the child process finishes.
The waitpid()
function takes three arguments: the process ID to wait for,
an “out-parameter” stat_loc
for the status of the subprocess,
and an options
parameter for extra flags.
You don’t need either of the latter things, so you can pass a null pointer and 0 for them—so
waitpid(your_child_process_id, NULL, 0)
will suffice; indeed you can even get away with the simpler
command wait(NULL)
, which is equivalent in this case (though in general systems programming, using
waitpid
is a better idea).
mysplit
should print the following four lines in order:
parent creating child and waiting
should be printed right before callingfork()
in the parent process.child is sleeping
should be printed by the child process right before it goes to sleep.child woke up
should be printed by the child process immediately after waking up from itsN
1-second naps.parent is done
should be printed by the parent process after the child process terminates.
You can compile mysplit
by running the command rv make mysplit
.
It behaves similarly to myspin
; it should sleep for N
seconds when you pass N
as the command-line argument.
Here’s one way that you can check that your mysplit
implementation is actually launching a subprocess:
$ rv sh -c 'qemu mysplit 5 & (sleep 1 ; ps ax; sleep 5)'
parent creating child and waiting
child is sleeping
PID TTY STAT TIME COMMAND
1 pts/0 Ss 0:00 /sbin/docker-init -- sh -c qemu mysplit 5 & (sleep 1 ; ps ax; sleep 5)
7 pts/0 S+ 0:00 sh -c qemu mysplit 5 & (sleep 1 ; ps ax; sleep 5)
8 pts/0 Sl+ 0:00 qemu mysplit 5
9 pts/0 S+ 0:00 sh -c qemu mysplit 5 & (sleep 1 ; ps ax; sleep 5)
12 pts/0 Sl+ 0:00 qemu mysplit 5
14 pts/0 R+ 0:00 ps ax
child woke up
parent is done
The above invocation launches mysplit 5
in the background, sleeps for a second, and then uses the ps
command to list the processes running on the machine.
If mysplit
is working correctly, you should see two different qemu mysplit 5
processes running with different pids.
If you try this with myspin
instead, there should only be one corresponding process.
Using shall
This section describes how you can interact with the shall
shell. You’ll need
to refer back to this section for Task 1. Shall we begin?
Built-in Commands
As in bash
and other “real-world” shells, shall
has a short list of built-in
commands: cd
, source
, exec
, and exit
. exec
and exit
have already
been implemented for you. You’ll implement cd
and source
in Task 1.
cd dir
change the working directory to directory 'dir'
source script
read commands from file script
exec cat exec.c
same as 'cat exec.c', but without forking, so the 'shall' is
replaced with 'cat exec.c' and doesn't return
exec > exec.out
this doesn't run anything, but redirects standard output to the
file exec.out. Further commands that are executed now have their
standard output redirected to file exec.out.
exit 3
exit 'shall' with status 3. If no status is specified, 'shall'
exits with status 0.
Other Commands
shall
also has access to every executable file that the parent shell process
from which you launched shall
does. For example,
$ rv qemu shall
-> ls -l
total 212
-rw-rw-r-- 1 1001 1001 342 Apr 7 14:48 Makefile
-rw-rw-r-- 1 1001 1001 8159 Apr 7 17:37 exec.c
-rw-r--r-- 1 root root 20592 Apr 7 16:10 exec.o
-rwxr-xr-x 1 root root 9040 Apr 7 16:10 myint
-rw-rw-r-- 1 1001 1001 706 Apr 2 17:31 myint.c
-rwxr-xr-x 1 root root 9032 Apr 7 16:10 myspin
-rw-rw-r-- 1 1001 1001 546 Apr 2 17:26 myspin.c
<output omitted>
-> cat exec.c
<output omitted: contents of exec.c>
-> which ls
/usr/bin/ls
-> qemu myspin 3
spinning ... spun.
-> apt
apt 2.4.13 (amd64)
Usage: apt [options] command
apt is a commandline package manager and provides commands for
<output ommitted>
Anything that you can run from the shell invoked by the rv
alias can be similarly invoked at the shall
prompt. However, programs like emacs, vim,etc., which make extensive use of TTY escape characters, will probably misbehave.
Shells like bash
, sh
, Windows Powershell, and others usually provide a set of features to simplify the typing of commands. The ones you’re most familiar with are probably tab completion and wildcard patterns (for example, “ls *.c
” to display all and only the C source files in a folder). Most of these “extras” aren’t supported by the shall
parser. Another one you won’t be able to try is the use of parentheses to group commands together (as in the “qemu mysplit 5 & (sleep 1 ; ps ax)
”) example, above.
Background commands
shall
also supports running commands as a background process. To do so,
terminate your command with an &
:
-> qemu myspin 5 &
Process 11 running in background
-> spinning ...
-> spun.
You can even invoke multiple background commands by using the ;
separator. For example,
-> qemu myspin 5 & ; qemu myspin 3 &
Process 106 running in background
Process 108 running in background
-> spinning ... spinning ... spun.
spun.
I/O Redirection
By default, a shell command receives its input from a keyboard, and it displays conventional and diagnostic output to a terminal window. All three of these are actually file handles that the process reads from (stdin
) or writes to (stdout
and stderr
). It is possible to change a program’s input source and/or either of its output streams, using a variety of built-in redirection operators. You will implement a fairly complete set of the standard redirection operators, though in some cases, we have changed the syntax to simplify the parsing tasks:
-
cat exec.c > exec.out
— executecat exec.c
but create the fileexec.out
, and write all output to it, instead ofstdout
. Ifexec.out
already exists, this replaces the original contents with the output of thecat exec.c
command. -
cat exec.c >> exec.out
— executecat exec.c
, but append output to fileexec.out
, if it already exists. Ifexec.out
does not yet exist, it should be created. -
cat nosuchfile.c {2}> exec.err
— executecat nosuchfile.c
but redirect error output only (file descriptor 2) to fileexec.err
(for example, in casenosuchfile.c
does not exist). Note that the format is a little different from the standard shell, where the command would have beencat nosuchfile.c 2> exec.err
. -
cat < exec.c
— executecat
without arguments, but take standard input from fileexec.c
. -
cat exec.c > exec.err {2}>{1}
— write both error messages and standard output to fileexec.err
. In the standard shell, the command iscat exec.c > exec.err 2>&1
.
Task 1
Complete the implementation of shall
by implementing the following functions
in exec.c
:
void interrupts_disable()
— Disable responses to theSIGINT
signal. This one has been completed for you.void interrupts_enable()
— Enable handling of theSIGINT
signal. For the simple version of signal handling here, it suffices to use theSIG_DFL
handler.void interrupts_catch()
— Install a custom signal handler forSIGINT
. You can use the implementation ofsighandler
that is part of the assignment distribution.void redir_fd(fd1,fd2)
— This implements redirection offd1
output to the input offd2
(the>
operator).void redir_file(name, fd,flags)
— Used for redirection of input and/or output whenfd
is itself a file descriptor.void spawn(command, background)
— Spawn and runcommand
, in either the background or foreground.void cd(command)
— The built-incd
command; change the current working directory tocommand->argv[1]
, or to the directory in environment variable$HOME
ifcommand->argv[1]
isnull
.void source(command)
— The built-insource
command; read and execute commands from the specified files in the list of arguments of thecommand
.
Your work in each procedure is delimited by the // BEGIN
and // END
pairs. After this task,
you should have a working shall
that can execute the examples above. You can
build shall
by running rv make
. Then, you can run shall
by executing rv qemu shall
.
Important Details
-
shall
is usually in one of two modes: it is either waiting for input, or it is waiting for processes to finish executing. It cannot wait for both at the same time. -
At any point in time, there is at most one foreground process. If there is a foreground process,
shall
waits for the foreground process to finish executing before going back to waiting for input. Only input can causeshall
to start processes. -
shall
itself should catch interrupts and print a message when it happens, such as “got signal 2” (2 is the signal number of interrupts). -
shall
should disable interrupts for commands that run in the background (using &). For command that run in the foreground, interrupts should generally cause the programs to finish executing.
Hints
-
When running a command in the background, have
shall
print its process identifier. For exampleprocess 36877 running in background
-
If a command finishes normally with exit status 0,
shall
simply prints a new “prompt” (->
in the case ofshall
). However, if a program returns a non-zero status or a program exits abnormally (say, due to an interrupt), haveshall
print a message. Examples include:process 36689 exited with signal 2 process 36889 exited with status 1
-
It may also be useful for
shall
to print information about background processes that terminate, even if they exit normally with exit status 0:process 36889 exited with status 0
-
For the
spawn()
function, you are expected to use thefork()
andwait()
system calls as well as macrosWIFSIGNALED()
andWEXITSTATUS()
. You should invokeinterrupts_disable()
for processes that run in the background. -
Use
wait()
rather thanwaitpid()
, as you will want to wait for any process terminating, including background processes. There’s no need to catchSIGCHLD
. Note that termination of background processes will only be discovered and reported while waiting for a foreground process to terminate. -
To execute a command, first invoke
redir(command)
(which redirects I/O), thenexecute(command)
(which invokes system callexecv()
to run the command). -
Most of the
redir()
function is already written, but you need to fill in the code in functionsredir_file()
andredir_fd()
. All of theexecute()
function is already written. If redirection fails, use_exit(1)
to exit the forked process with a failure status of 1. In case ofredir_file()
, don’t forget to close the original file descriptor returned byopen()
. -
For
cd()
, you are expected to use the C library functiongetenv()
to obtain the$HOME
environment variable in case no directory is specified in thecd
command. -
For
source()
, useopen()
andclose()
. Once you have a file descriptorfd
, you can use the following code to invoke the interpreter:reader_t reader = reader_create(fd); interpret(reader, 0); reader_free(reader);
-
Don’t forget to implement the remaining interrupt functions.
Task 2
The reader
struct defined in reader.c
uses the read()
system call to read one
character at a time. This is highly inefficient because each time read()
is
called, a system call is made, which then causes a context switch to occur.
Your job in this task is to modify the reader
struct and associated functions
reader_create
, reader_next
, and reader_free
to read up to 512 characters
from the file descriptor fd
at a time.
However, reader
must function the same way as before, meaning, the reader
’s interface must stay the same.
For example, reader_next()
should still return a single character.
Submission
You need to submit five files to Gradescope:
myspin.c
,myint.c
, andmysplit.c
(Task 0)exec.c
, which will contain your solution to Task 1reader.c
, with your improved reader (Task 2)
Rubric
Attribution
This assignment was adapted from a project by Prof. Robbert van Renesse.