Lecture 20: ARP, DHCP, IP

Addressing within a network

Ethernet packets are addressed to a particular device on the network. Devices are identified by Media Access Control (MAC) addresses (also called hardware addresses). A MAC address is built into every network device. MAC addresses are typically written as a sequence of six two-character hex numbers separated by colons; for example, the MAC address of the ethernet card in my laptop is f0:1f:af:2a:7d:be.
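As a small illustration of the written format, here is a sketch that converts between the colon-separated text form and the six raw bytes. The function names parse_mac and format_mac are made up for this example.

```python
def parse_mac(text: str) -> bytes:
    """Convert 'f0:1f:af:2a:7d:be' into its six raw bytes."""
    parts = text.split(":")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a MAC address: {text!r}")
    # Each part is a two-character hex number, i.e. one byte.
    return bytes(int(p, 16) for p in parts)


def format_mac(raw: bytes) -> str:
    """Convert six raw bytes back into the colon-separated text form."""
    if len(raw) != 6:
        raise ValueError("a MAC address is exactly six bytes long")
    return ":".join(f"{b:02x}" for b in raw)


laptop = parse_mac("f0:1f:af:2a:7d:be")
```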

Often it is useful to have logical addresses that are bound not to a specific device, but to a specific function. In addition to its fixed hardware address, a host can also have an IP address assigned to it (more on the structure and function of IP addresses in the next lecture). IP addresses are typically written as four decimal numbers between 0 and 255, separated by periods (for example, my laptop's current IP address is 10.148.6.10).

To send a packet to a device having a specific IP address, a host must first find the MAC address of that device. To find this information, it uses the Address Resolution Protocol (ARP). It broadcasts an ARP request asking for the MAC address of the host with the desired IP address. The host having that IP address will respond with a packet containing its MAC address.
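The exchange can be sketched as a toy simulation. This is not real ARP: there are no packets, the "network" is just a list of Python objects, and the class and function names (Host, Lan, resolve) are made up for this example. The per-host cache is also an addition not described above, although real hosts do keep one so that they need not broadcast for every packet.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}  # IP address -> MAC address learned so far

    def handle_arp_request(self, wanted_ip):
        # Only the host that owns the requested IP address answers.
        return self.mac if wanted_ip == self.ip else None


class Lan:
    def __init__(self):
        self.hosts = []

    def attach(self, host):
        self.hosts.append(host)

    def broadcast_arp_request(self, sender, wanted_ip):
        # A broadcast reaches every other host on the local network.
        for host in self.hosts:
            if host is sender:
                continue
            reply = host.handle_arp_request(wanted_ip)
            if reply is not None:
                return reply
        return None  # nobody on this network has that IP address


def resolve(lan, sender, wanted_ip):
    """Return the MAC address for wanted_ip, broadcasting only on a cache miss."""
    if wanted_ip in sender.arp_cache:
        return sender.arp_cache[wanted_ip]
    mac = lan.broadcast_arp_request(sender, wanted_ip)
    if mac is not None:
        sender.arp_cache[wanted_ip] = mac
    return mac


lan = Lan()
laptop = Host("10.148.6.10", "f0:1f:af:2a:7d:be")
printer = Host("10.148.6.20", "02:00:00:00:00:01")  # hypothetical second device
lan.attach(laptop)
lan.attach(printer)
found = resolve(lan, printer, "10.148.6.10")
```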

IP addresses can be assigned in a variety of ways. One is to configure each machine with its own IP address. This has the advantage of being completely decentralized; no central service needs to manage IP addresses. However, often you want a central service to manage IP addresses (to avoid collisions, for example, or to spare a network administrator from manually configuring each machine).

The Dynamic Host Configuration Protocol (DHCP) solves this problem: a specific host is designated as the DHCP server. When a new machine connects to the network, it broadcasts a DHCP request containing its MAC address. The DHCP server responds to this request by sending back an IP address for the host (and other configuration information).
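The server's bookkeeping might look roughly like the sketch below. It is heavily simplified: the real protocol involves several message types and leases that expire, none of which is modeled here. The class name DhcpServer and the address pool are assumptions made for this example.

```python
import ipaddress


class DhcpServer:
    def __init__(self, first_ip, count):
        start = ipaddress.IPv4Address(first_ip)
        # The pool of addresses this server is allowed to hand out.
        self.free = [str(start + i) for i in range(count)]
        self.leases = {}  # MAC address -> assigned IP address

    def handle_request(self, mac):
        """Return the IP address for this MAC, or None if the pool is empty."""
        if mac in self.leases:
            # A host that asks again gets the same address back.
            return self.leases[mac]
        if not self.free:
            return None
        ip = self.free.pop(0)
        self.leases[mac] = ip
        return ip


server = DhcpServer("10.148.6.10", count=2)
first = server.handle_request("f0:1f:af:2a:7d:be")
second = server.handle_request("02:00:00:00:00:01")  # hypothetical second host
```

Because the server is the only party handing out addresses, two hosts can never be given the same one, which is the collision problem that manual configuration leaves open.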

Routing

The network layer (also called the internet or IP layer) is responsible for delivering packets between hosts on different local area networks.

Analogy: the mail room in the department is like a local area network, while the postal service routes packets (letters) from the CS department mail room to mailboxes all over the world. The postal service is analogous to the network layer.

The Internet Protocol (IP) is the network-layer protocol that runs the internet. There are two versions of IP in use: version 4 and version 6. We will describe version 4.

Each host on the internet has a 32-bit IP address (typically written as four decimal numbers separated by periods; IPv6 uses 128-bit addresses). An IP packet contains a destination IP address; the goal of the IP layer is to deliver packets to their destinations by routing them through many networks. A router is a machine that reads packets from one network interface and forwards them on another.

One way to accomplish this is to use source routing: before sending a packet, the sender examines the network and selects a path, encoding the path into the packet. Source routing is impractical for a number of reasons.

Instead, IP uses path routing. With path routing, the packet contains only the destination address; each router decides which "next hop" to forward the packet to in order to move it closer to its destination. Each router makes this decision locally, based only on the destination address and its local configuration.

Routing tables

One way to store this configuration is using routing tables. A routing table contains several entries, each containing a destination network and a next hop. The destination network is specified by an address / netmask pair. An IP address x is in the network y/m if x & m = y (here & is bitwise and). For example, the address 192.168.3.4 is in the network 192.0.0.0/255.0.0.0, and is also in the network 192.168.0.0/255.255.0.0, but is not in the network 192.0.0.0/255.255.255.0.
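The membership test can be checked directly. The sketch below turns each dotted address into a 32-bit integer and applies the x & m = y rule; the helper names ip_to_int and in_network are made up for this example.

```python
def ip_to_int(text: str) -> int:
    """Pack a dotted address such as '192.168.3.4' into one 32-bit integer."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {text!r}")
    value = 0
    for p in parts:
        value = (value << 8) | p  # shift in one byte at a time
    return value


def in_network(address: str, network: str, netmask: str) -> bool:
    """True if address & netmask == network (bitwise and)."""
    return (ip_to_int(address) & ip_to_int(netmask)) == ip_to_int(network)


checks = [
    in_network("192.168.3.4", "192.0.0.0", "255.0.0.0"),      # True
    in_network("192.168.3.4", "192.168.0.0", "255.255.0.0"),  # True
    in_network("192.168.3.4", "192.0.0.0", "255.255.255.0"),  # False
]
```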

To determine the next hop for a given packet, the router compares the packet's destination address to each of the entries in the routing table, in order (by anding the address with the entry's netmask and comparing the result to the entry's network address). It forwards the packet to the next hop of the first entry that matches, so the order of the entries matters.

For example, suppose a router is connected to four networks, n1, n2, n3, and n4, and that it has the following routing table:

dest. addr   netmask           next-hop
1.2.3.0      255.255.255.0     n1
1.2.0.0      255.255.0.0       n2
1.3.0.0      255.255.0.0       n3
1.4.6.2      255.255.255.255   n4
0.0.0.0      0.0.0.0           n1

While routing a packet destined for 1.2.3.4, it will compare it to the first row, and find that it matches (because 1.2.3.4 & 255.255.255.0 = 1.2.3.0), so the packet will be routed to n1. If the packet is destined for 1.2.5.6, the first row will not match, but the second will, so it will be forwarded to n2.

Similarly, a packet destined for 1.4.6.5 will be routed to n1, while a packet destined for 1.4.6.2 will be routed to n4.
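The lookup in this example can be sketched as a first-match scan over the table. The names ROUTES and next_hop are made up for this example, and the standard library's ipaddress module is used only to turn dotted addresses into integers.

```python
import ipaddress

# (destination network, netmask, next hop), in the same order as the table.
ROUTES = [
    ("1.2.3.0", "255.255.255.0", "n1"),
    ("1.2.0.0", "255.255.0.0", "n2"),
    ("1.3.0.0", "255.255.0.0", "n3"),
    ("1.4.6.2", "255.255.255.255", "n4"),
    ("0.0.0.0", "0.0.0.0", "n1"),
]


def to_int(address):
    return int(ipaddress.IPv4Address(address))


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the first entry whose network contains destination."""
    dest = to_int(destination)
    for network, netmask, hop in routes:
        if (dest & to_int(netmask)) == to_int(network):
            return hop
    return None  # no entry matched; cannot happen while the 0.0.0.0 row is present


hops = {d: next_hop(d) for d in ["1.2.3.4", "1.2.5.6", "1.4.6.5", "1.4.6.2"]}
```

The last row, with an all-zero netmask, matches every address, so it acts as a default route. Because the scan stops at the first match, moving that row to the top would send every packet to n1.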

Routing tables are a very primitive method for configuring networks. They work well for a small network, but they are error-prone and cannot route packets across multiple paths (for example, to split a stream of traffic across two different paths). Modern routers have much more sophisticated methods for deciding how to route traffic.

Where do routing tables come from?

Good routing tables require that packets are forwarded "closer" to their destinations. Routers can discover this information by communicating with their neighbors.

One such algorithm proceeds as follows. Each router maintains a local table containing the distance and next hop to each destination network. Periodically, each router r shares its entire table with its neighbors n. Each neighbor n compares this table to its own, to see if there is a shorter path to each destination d that passes through r. If so, n updates its entry, recording that the next hop to get to d is r (and that the distance from n to d is one plus the distance from r to d).

By iterating this process, each router will converge on a routing table that will give the shortest path to each destination network. This algorithm is referred to as a distance vector protocol, because each router maintains a vector of distances to endpoints.
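Here is a minimal sketch of the iteration, under some simplifying assumptions that are not part of the description above: every link has distance 1, the destinations are the routers themselves rather than networks, and the periodic exchanges are modeled as a loop that runs until no table changes. The function name and the four-router topology are made up for this example.

```python
INF = float("inf")


def distance_vector(links):
    """links maps each router to a list of its neighbors.

    Returns tables[router][destination] = (distance, next hop).
    """
    # Initially each router only knows how to reach itself.
    tables = {r: {r: (0, None)} for r in links}
    changed = True
    while changed:
        changed = False
        for r in links:
            for n in links[r]:
                # r shares its entire table with neighbor n.
                for dest, (dist, _hop) in list(tables[r].items()):
                    candidate = dist + 1  # one more hop to go through r
                    best_known = tables[n].get(dest, (INF, None))[0]
                    if candidate < best_known:
                        tables[n][dest] = (candidate, r)
                        changed = True
    return tables


# Hypothetical topology:  c - a - b - d
links = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
tables = distance_vector(links)
```

In this topology, router c ends up with the entry (3, "a") for destination d: three hops away, with a as the next hop.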

If the network topology changes while the routes are being calculated, it is possible to create a routing loop, where each router in the loop thinks the next is the right place to forward a packet to; packets stuck in such a loop will never be delivered. This can be avoided in various ways; one approach is to store a path vector instead of a distance vector. Each router maintains the path to each other endpoint; when updating their path vectors, routers can detect loops and avoid entering them into their routing tables.
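The loop check can be sketched by changing what each router stores: instead of a distance, it keeps the whole path. As before, this is a simplification: destinations are routers, the shortest path is preferred, and the exchanges run until nothing changes. The function name and topology are made up for this example.

```python
def path_vector(links):
    """links maps each router to a list of its neighbors.

    Returns paths[router][destination] = list of routers along the path.
    """
    paths = {r: {r: [r]} for r in links}
    changed = True
    while changed:
        changed = False
        for r in links:
            for n in links[r]:
                # r advertises each of its paths to neighbor n.
                for dest, path in list(paths[r].items()):
                    if n in path:
                        # n already appears on this path, so adopting it
                        # would create a loop; n ignores the advertisement.
                        continue
                    candidate = [n] + path
                    current = paths[n].get(dest)
                    if current is None or len(candidate) < len(current):
                        paths[n][dest] = candidate
                        changed = True
    return paths


# Hypothetical topology: a, b, c form a triangle, and d hangs off c.
links = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
paths = path_vector(links)
```

The triangle is where a loop could form in a distance vector scheme; here every stored path names each router at most once, because any advertisement that already contains the receiving router is rejected.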

The Border Gateway Protocol (BGP) is an application-layer path vector protocol that is used to configure routing information on the internet. BGP does not operate at the level of individual routers; instead the nodes in the BGP graph represent entire internet service providers, which are referred to as autonomous systems (AS). Each ISP may use a different protocol for determining routes within their own network (some use "internal BGP", but others use more sophisticated schemes); BGP is used to establish paths that span multiple ISPs.

Fragmentation

Network layer protocols (like IP) also have to deal with the fact that different networks have different transmission properties. In particular, different physical layers can have different maximum transmission units (MTUs): the maximum size of a single packet. This may mean that the IP layer needs to split large packets into smaller packets so that they can be sent along the next hop. Splitting packets is referred to as fragmentation.

The IP header of each fragment contains an identifier that indicates which original packet it is a fragment of, as well as the fragment's offset within that packet. When the end host receives a fragment, it waits until it has received all of the fragments of that packet, then reassembles them and delivers the reassembled packet to the next layer.

If it does not receive all fragments of a packet, the packet is discarded.
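Fragmentation and reassembly can be sketched as follows. This is a simplification rather than the real IPv4 format: offsets are counted in plain bytes, the MTU is treated as the maximum payload size with headers ignored, and all names are made up for this example. The more_fragments flag is not mentioned above; it is included because the receiver needs some way to tell which fragment is the last one, and the IPv4 header carries a flag for this purpose.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    packet_id: int        # which original packet this fragment belongs to
    offset: int           # where this payload starts within the original
    more_fragments: bool  # False only on the final fragment
    payload: bytes


def fragment(packet_id, payload, mtu):
    """Split payload into fragments carrying at most mtu bytes each."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    fragments = []
    for offset in range(0, len(payload), mtu):
        chunk = payload[offset:offset + mtu]
        is_last = offset + len(chunk) >= len(payload)
        fragments.append(Fragment(packet_id, offset, not is_last, chunk))
    return fragments


def reassemble(fragments):
    """Return the original payload, or None if any fragment is missing."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    if not ordered or ordered[-1].more_fragments:
        return None  # the final fragment has not arrived
    data = b""
    for f in ordered:
        if f.offset != len(data):
            return None  # a gap: some middle fragment is missing
        data += f.payload
    return data


original = bytes(range(10))
pieces = fragment(7, original, mtu=4)        # three fragments: 4 + 4 + 2 bytes
rebuilt = reassemble(list(reversed(pieces)))  # arrival order does not matter
```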