Virtual Memory

We have previously said that part of the operating system’s job is to give each process the illusion that it is running alone on the hardware. This concept is called virtualization: the OS runs on the physical hardware and provides an abstraction of virtual hardware for each process to run on. The OS virtualizes a single CPU by scheduling multiple concurrent processes to interleave their execution and orchestrating context switches between them.

This lecture is about how to virtualize the memory: i.e., how the OS creates the illusion, for every process, that the process has exclusive access to its own memory. The goal of a virtual memory system is that every process should have its own memory address space. In other words, we want the address 0xCAFED00D in process A to refer to different data from 0xCAFED00D in process B. (Maybe you can think about how bad life would be without virtual memory. Every process would need to carefully avoid using any addresses in use by any other process. And any process could freely access the memory of any other process. Shockingly, this is how many popular OSes worked until as late as the ’90s, and it was as terrible as it sounds.)

Virtual vs. Physical Memory Addresses

Here’s the overall strategy. We will make a distinction between the virtual address space for each process and the physical address space for the actual machine:

  • Each process will operate in its own address space, meaning that it thinks in terms of its own \(2^{64}\) memory locations. We will call these addresses virtual addresses.
  • The actual main memory has some number of bytes available—probably much fewer than \(2^{64}\). We will call the addresses of these “real” storage locations physical addresses.

The OS and hardware will collaborate to construct a mapping between virtual addresses and physical addresses. That is, for every process, we will create a table that describes, for every virtual address, the physical address where that data can be found. The hardware has a special structure, called the memory management unit (MMU), that can translate from virtual to physical addresses. Whenever a process tries to load or store an address V (e.g., it uses an lw or sb instruction with memory address V), the hardware will automatically perform a virtual-to-physical memory address translation to find the corresponding physical address P. It will then load or store the “real” memory location P.

This scheme means that programs never see physical addresses. They only know about virtual addresses, and all their instructions load and store those addresses. The hardware transparently translates all of these loads and stores into physical addresses to find the actual data. This way, processes can remain blissfully unaware of where their data is actually stored in the hardware and just think in terms of their own, private address space.

The data structure that describes the virtual-to-physical address translation is called the page table (for reasons we will see in a moment). The OS is responsible for setting up the page table and putting it into (physical) memory so the hardware knows where it is. When user-space code is running, the hardware then uses the page table to perform address translation. This is how the OS and hardware collaborate to implement virtual memory.

Pages and Page Tables

Let’s take a closer look at how page tables and address translation work.

An extremely inefficient way to set up a page table would be to explicitly record, for every virtual address in use, the corresponding physical address. This would mean that every single byte in a process’s virtual address space has its own, special mapping onto a specific byte in physical memory. This strawperson scheme is too fine grained: for one thing, it would require 8 bytes of address-mapping metadata for every byte of data!

Instead, VM systems divide all of memory up into chunks called pages. To give you a rough idea of the granularity, an extremely popular page size is 4 kB (4,096 bytes). You can imagine all of a process’s virtual address space, and all of the physical address space, divided up into these equally-sized chunks. Page tables work by mapping entire virtual pages (4 kB ranges of virtual addresses) onto physical pages (4 kB ranges of physical addresses).

As with cache blocks, this mapping works by dividing up the memory address. 4,096 is \(2^{12}\), so we will divide all memory addresses into the most-significant 52 bits and the least-significant 12 bits. The least-significant 12 bits are the offset within the page. The remaining (most-significant) 52 bits are the page number.

Some terminology: we will use virtual page number (VPN) and physical page number (PPN) when we’re talking about those non-offset bits in the address, depending on whether we’re referring to virtual or physical memory.

The page table then maps VPNs to PPNs. To translate a virtual address to a physical address, do these steps: split it into the page number (VPN) and the offset, translate the page number (from VPN to PPN), and then add the offset back on. Now you have a physical address.

The Memory Management Unit

The memory management unit (MMU) is the hardware structure that is responsible for translating virtual addresses to physical addresses. It uses a page table to perform this translation. But each process has its own page table—so how does the MMU know where to find the right page table at any given time?

The OS stores each process’s page table in main memory. (The kernel has the special privilege of using physical addresses directly, so it does not need to worry about address translation for its own accesses!) Then, when it performs a context switch, the OS needs to tell the hardware which page table is currently active for the process it is about to switch to. There is a special register that stores the (physical) address of the currently-active page table. The OS sets this register during each context switch to point to the relevant page table. Then, the MMU uses this register whenever it needs to perform address translations.

In RISC-V, this register is called satp. You can read more about it in the privileged instruction manual.

Fancier Page Tables

You now know the basic mechanism for virtual memory: how the OS creates the illusion that every process is running in isolation. The rest of this lecture is about various extensions that build on the basic VM mechanism to do other cool stuff that systems need to do.

To support all of these extensions, VM systems enrich the page table with more metadata. Remember that the main thing that a page table needs to do is to map VPNs to PPNs: i.e., a basic version is nothing more than an array of PPNs indexed by VPN. In real systems, the page table also includes other stuff, like this:

  • A valid bit, indicating whether the virtual page is mapped at all. (Kind of like the valid bit in a cache.) It is an error to access an address within an unmapped page.
  • Protection bits. The OS can decide whether each page can be read, written, and/or executed. Think of this as 3 extra bits, named R, W, and X. It is an error, for example, to try to store to an address within a page whose W bit is 0. The X bit is especially important for security: the OS can prevent processes from executing instructions within writable memory (sometimes called the W^X restriction) to make it harder to exploit bugs that would otherwise trick the program into running malicious instructions.

You may also be worried that this sounds like a lot of data. If there are really \(2^{52}\) virtual pages, do we really need \(2^{52}\) entries in our page table? In practice, systems will compress this data structure using a multi-level page table, which lets the system omit chunks of entries for large ranges of invalid addresses. The details of these compressed data structures are out of scope for CS 3410.
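To see why a flat table would be infeasible, here is a quick back-of-the-envelope calculation, assuming 8-byte entries: \(2^{52}\) entries \(\times\) \(2^{3}\) bytes per entry \(= 2^{55}\) bytes, which is roughly 32 petabytes of page table per process. Multi-level tables avoid this by only materializing entries for the (comparatively tiny) mapped regions of the address space.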

Swap & Page Faults

There is one cool thing that virtual memory enables that goes beyond isolating processes. VM can also let you transparently “overflow” your memory. If you run a bunch of programs that, all together, use more memory than you actually have available in your machine, the OS can transparently move some of their data to the disk. This mechanism is called swap, i.e., it works by swapping chunks of processes’ memory out to disk. (This mechanism is also called paging, because it involves moving pages around. Pages that are in memory are paged in or swapped in and pages that are relegated to the disk are paged out or swapped out.)

Processes do not need to be aware that their data has been swapped out. They can continue to pretend that they have unlimited access to all their memory. The OS takes care of moving data between main memory and the disk. Remember, accesses to the disk are much, much slower than main memory—so the OS tries to intelligently place frequently-accessed data in memory. The goal ends up very much like CPU caches: it exploits temporal and spatial locality to maximize the number of accesses that go to main memory, not to disk.

The strategy for implementing swapping is to mark paged-out memory as invalid in the page table. Remember that, when the CPU tries to access any virtual address, it must first consult the page table to perform protection checks (and to do the address translation). When the program accesses an invalid virtual page, a page fault occurs. The CPU uses an interrupt to transfer control to the kernel to handle the page fault.

There are many reasons that a page fault could occur. It could be that the address is simply unallocated: the process never malloc’d that address. If you have ever gotten a segmentation fault error when running your C program (who hasn’t?), that’s what it means. Or the page could be valid data that has been swapped out to disk. The OS looks at its internal data structures to decide which case it is in: i.e., to check whether the invalid virtual page is actually stored somewhere on disk. If so, it pages in that data and then lets the process continue.

To page in new data, the OS reads the page from disk and places it into physical memory. This can mean evicting a different virtual page of data; the OS needs a replacement policy, just like in an associative cache, to decide which page to evict.

Because disks are so much slower than memory, swapping a page in takes a long time—think tens of milliseconds, which at a gigahertz clock rate is tens of millions of clock cycles. So frequent swapping can seriously harm a program’s performance. And it’s enough time that the OS scheduler will likely try to find other work to do while the disk request is outstanding.

At a high level, swap lets disk join the memory hierarchy at a level below main memory. DRAM is sort of a cache for the hard disk; the CPU cache acts the same way for the DRAM; registers are kinda like a cache for the cache. It’s caches all the way down.

Sharing

Here is another cool thing that virtual memory enables. The page table translates virtual addresses to physical addresses; there is nothing intrinsic that requires this mapping to be injective. That is, if virtual address A in process X maps to physical address P, then it’s totally possible for virtual address B in process Y to map to exactly the same physical address P!

This observation implies a scheme where different processes can share the same data, without actually duplicating the data in main memory. Say that N different processes happen to need the same B bytes of data. With virtual memory, we can do this by spending only B total bytes of physical memory! Without VM, each process would need its own copy, for a total of N×B bytes.

There are a few situations where this kind of sharing is extremely useful in practice:

  • Libraries. Multiple processes often need the same library code; they can share a read-only memory region to save space that would otherwise be duplicated for the library’s code.
  • Inter-process communication. Process A can communicate with process B by writing into a memory region that the two processes share.