Lecture 4: more process state, threads, show and tell, scheduling

More process state (in PCBs)
Threads
Show and tell: linux processes, PCBs, system calls
- ps, /proc filesystem, strace, fork, exec
Scheduling
- fcfs, rr, preemption, quanta
Board image
Annotated console log
Spring slides

PCB contents

Previously, we talked about the PCB holding the registers (including CPU flags and base/limit registers). The OS may want to associate other state with each process as well:

process identifier
initial command line
permissions
scheduling information (more soon)
process state: one of new, ready, running, waiting, or zombie
usage information (e.g. how long the process has run for)
state of resources that have been allocated to the process (file descriptors, network sockets, etc.)
signal handlers and other configuration
parent process and child processes, or process groups
anything else the OS designer desires
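As a rough illustration of the list above, a PCB can be pictured as a plain record with one field per item. This is only a sketch: the class and field names (PCB, ProcState, cpu_time_ms, and so on) are hypothetical and do not match any real kernel's layout, and "permissions" is reduced to a single user ID.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ProcState(Enum):
    """The process states named in the list above."""
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    ZOMBIE = "zombie"


@dataclass
class PCB:
    """Hypothetical process control block; field names are illustrative only."""
    pid: int                                   # process identifier
    command_line: List[str]                    # initial command line
    uid: int = 0                               # simplified stand-in for permissions
    priority: int = 0                          # scheduling information
    state: ProcState = ProcState.NEW           # process state
    cpu_time_ms: int = 0                       # usage information
    registers: Dict[str, int] = field(default_factory=dict)
    open_files: Dict[int, str] = field(default_factory=dict)       # fd -> description
    signal_handlers: Dict[int, str] = field(default_factory=dict)  # signal -> handler name
    parent_pid: Optional[int] = None
    children: List[int] = field(default_factory=list)


# Usage: a parent process and one child that has become ready to run.
init = PCB(pid=1, command_line=["/sbin/init"])
shell = PCB(pid=42, command_line=["bash"], parent_pid=init.pid)
init.children.append(shell.pid)
shell.state = ProcState.READY
print(shell.pid, shell.state.value, shell.parent_pid)
```

Each instance gets its own dictionaries and lists (via default_factory), mirroring the fact that per-process state is not shared between PCBs.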

Doing useful things while you wait

We introduced interrupts because the processor may want to do other useful work while waiting for I/O.

Processes may also want to do other useful work while waiting for I/O (instead of yielding to a different process). There are several features that operating systems provide to enable this.

Signals are like interrupts, except that instead of the hardware notifying the processor (and causing a jump), the operating system notifies the program (by changing its instruction pointer, effectively causing a jump). Just as the OS can program the interrupt vector, processes can update a table that tells the OS which signal handler to invoke in each situation.
Nonblocking I/O system calls ask the system to initiate I/O, but instead of blocking until the I/O completes, they return immediately. The application can then do other tasks; in the future it can either poll to see if the I/O is complete, or decide to block until some data arrives.
Threads are processes that share an address space. Each runs independently (it has its own local variables and instruction pointer), but threads can communicate by reading and writing global state. A process that wants to wait for multiple inputs can have a separate thread waiting for each one, and other threads that continue processing while they wait.
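The nonblocking I/O pattern described above can be sketched with a pipe. This is a minimal example, assuming a Unix-like system and Python 3.5 or later; the helper name try_read is hypothetical. The first read returns immediately because no data is available, and select is then used to block until data arrives.

```python
import os
import select

# A pipe gives us a file descriptor whose data arrives "later".
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)  # reads on read_fd now return immediately


def try_read(fd, size=1024):
    """Return available bytes, or None if the read would have blocked."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


# Nothing has been written yet, so this returns None instead of waiting.
first = try_read(read_fd)

# ... the application could do other useful work here ...

os.write(write_fd, b"hello")

# Now choose to block (for at most one second) until the descriptor is readable.
ready, _, _ = select.select([read_fd], [], [], 1.0)
second = try_read(read_fd) if ready else None

os.close(read_fd)
os.close(write_fd)
print(first, second)
```

Polling corresponds to calling try_read repeatedly; blocking until data arrives corresponds to the select call.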

Show and tell

We used some Linux tools to explore processes and system calls. You are encouraged to play with these on your own, and to look into the Linux system calls (or the system calls of your favorite operating system).

ps -A: list all processes on the system. -A instructs ps to output every process, instead of only those associated with the current terminal
the /proc filesystem is a way to examine data from the PCBs of the currently running processes. Note that there are no on-disk files corresponding to the /proc filesystem; rather the kernel responds to read requests to files in this directory by returning data read from the PCBs
the strace command runs a program as a subprocess and prints out all of the system calls that the subprocess executes. We straced the "true" program (a trivial program that does nothing and exits successfully) and "bash" (a shell). To strace bash interactively we ran "strace bash 2> log.txt" in one terminal window, and "tail -f log.txt" in another. The first command redirects the output from strace (which is written to standard error) into the file log.txt; the second displays log.txt as it changes.
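The /proc files can also be read from a program like any other file. The sketch below assumes a Linux system where /proc/self/status exists and uses "Key: value" lines; the function names parse_status and read_own_status are hypothetical, and the second returns None on systems without /proc.

```python
import os


def parse_status(text):
    """Parse 'Key:<tab>value' lines, the layout of /proc/<pid>/status, into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line that has no colon
            fields[key.strip()] = value.strip()
    return fields


def read_own_status():
    """Return the status fields of the current process, or None without /proc."""
    path = "/proc/self/status"
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_status(f.read())


status = read_own_status()
if status is not None:
    # Name, Pid and State are fields the kernel reports for each process.
    print(status.get("Name"), status.get("Pid"), status.get("State"))
```

No file on disk is read here: the kernel generates the text when the file is opened and read, as described above.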

We noted some interesting facts about the Linux system call interface:

to start a new program in a new Linux process, two system calls are typically required:
- fork: creates a new process by duplicating the calling process. The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent, except for the following points:
- The child has its own unique process ID
- The child's parent process ID is the same as the parent's process ID.
- fork returns 0 in the child's process and the PID of the child in the parent's process
- exec: replaces the state of the current process with a new program image loaded from a file. No new process is created; the process ID stays the same
we noted the exit system call, which terminates the currently running process, and the kill system call, which sends a signal to another running process. Processes can register signal handlers (which are similar to interrupt handlers in the kernel); signals cause the process to jump to the signal handler.
(important for the homework) Unix exit codes are integers that the parent process can use to determine how a process exited. By convention, 0 indicates success, and a non-zero value indicates an error. In the shell, you can see the exit code of the most recent process by using the variable "$?". For example:

true; echo $?
0
false; echo $?
1
Linux uses read and write system calls to special files to perform many tasks, including communicating with devices and performing input and output on the terminal
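The fork, exec, and exit-code points above can be tried from Python, whose os module wraps the same system calls. This is a sketch, assuming a Unix-like system with the "true" and "false" programs on the PATH; the helper name run is hypothetical, and exiting with 127 when exec fails follows a common shell convention rather than anything the kernel requires.

```python
import os


def run(argv):
    """Run argv in a child process and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Replace this process with the new program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (for example, program not found); exit without
            # running any of the parent's cleanup code.
            os._exit(127)

    # Parent: fork returned the child's PID. Wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # The child was terminated by a signal; report it as a negative number.
    return -os.WTERMSIG(status)


print(run(["true"]))
print(run(["false"]))
```

The exec call only returns if it fails, which is why the child's error path sits directly after it.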

Scheduling

Thus far when discussing time-sharing between processes, we've simply said that when it is time to switch processes, the operating system selects a new process and then runs it. The details of "when it is time" and which process to select can have major impacts on system behavior.

We would like a scheduler that satisfies the following criteria:

simplicity: the scheduler should be easy to reason about, run quickly, and use few resources
fairness: no process can be starved forever
responsiveness: user-facing processes should respond quickly to input
low waiting time: the total waiting time for all processes should be short
flexible priority: perhaps it is useful for users to be able to indicate that some processes are "more important" than others
predictability: bounded variation in any of the above metrics

We discussed the following algorithms:

First-come, first-served (FCFS): whenever a process becomes ready, it is placed at the tail of a queue. Whenever a process relinquishes the CPU, a new process is taken from the head of the queue and scheduled.
- Pros: simple, fair (no process starves).
- Cons: I/O-bound and CPU-bound processes are treated the same, so waiting time, responsiveness, priority and predictability can be poor.
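To see why waiting time under FCFS depends heavily on arrival order, here is a small simulation. It is a sketch under simplifying assumptions: all jobs arrive at time 0 in list order, each runs once until done, and never blocks on I/O. The job names and burst lengths are made up for illustration.

```python
from collections import deque


def fcfs(bursts):
    """Simulate FCFS. bursts is a list of (name, cpu_time) in arrival order.

    Returns a dict mapping each name to the time it spent waiting in the queue.
    """
    queue = deque(bursts)
    clock = 0
    waits = {}
    while queue:
        name, burst = queue.popleft()  # take the job at the head of the queue
        waits[name] = clock            # it has waited for everything before it
        clock += burst                 # it runs until it relinquishes the CPU
    return waits


long_first = fcfs([("A", 24), ("B", 3), ("C", 3)])
short_first = fcfs([("B", 3), ("C", 3), ("A", 24)])

print(long_first, sum(long_first.values()) / 3)    # average wait 17.0
print(short_first, sum(short_first.values()) / 3)  # average wait 3.0
```

The same three jobs give an average wait of 17 time units when the long job happens to be first and 3 when it is last, which illustrates the poor predictability noted above.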

Round-robin (RR): FCFS with preemption. Before starting a process, the OS sets a timer to a fixed quantum. When the timer expires, the currently running process is placed at the tail of the queue and a new process is selected.
- Pros: simple, fair, more responsive and predictable than FCFS
- Cons: can have bad waiting time if there is a mix of long and short processes, not responsive if there are many processes
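The round-robin policy can be simulated in a few lines. This sketch assumes all jobs arrive at time 0, never block on I/O, and that context switches cost nothing; the job names, burst lengths, and the quantum of 4 are made up for illustration.

```python
from collections import deque


def round_robin(bursts, quantum):
    """Simulate RR. bursts is a list of (name, cpu_time) in arrival order.

    Returns a dict mapping each name to the time at which it finished.
    """
    queue = deque(bursts)
    clock = 0
    finish = {}
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)  # run until done or until the timer fires
        clock += ran
        if remaining > ran:
            # Quantum expired: preempt and move the job to the tail of the queue.
            queue.append((name, remaining - ran))
        else:
            finish[name] = clock
    return finish


jobs = [("A", 24), ("B", 3), ("C", 3)]
finish = round_robin(jobs, quantum=4)
# Waiting time = completion time minus the CPU time the job actually needed.
waits = {name: finish[name] - burst for name, burst in jobs}
print(finish)
print(waits)
```

With these numbers the short jobs B and C finish at times 7 and 10 even though the long job A is first in the queue, at the cost of A finishing at time 30 after being preempted repeatedly.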