Your task in this project is to extend your networking package with reliable delivery.
The previous assignment developed an unreliable networking layer, with ports as endpoints. Data sent from one host to another could be lost in the network. The applications would not know if their data made it across, or whether it got lost. Needless to say, this is not a good programming model for most applications. In practice, best-effort delivery is often an insufficient guarantee for application writers; only a few applications, such as streaming video, are able to adapt to missed packets at the receiver side, without requiring retransmissions. Adding the notion of connection also helps user level programming since the communication between the applications can be viewed as a stream.
While it would be possible to require every application designer to write a reliable network protocol on top of our UDP-like protocol, there are powerful reasons to provide a standard reliable protocol. First, it is unquestionably more convenient: writing an efficient protocol is nontrivial. Second, there is an important incentive in standardizing the one reliable transport protocol, since these protocols must incorporate mechanisms for flow control and congestion avoidance. The more protocols which are in use, the harder it is to predict how they will contribute to the overall behavior of the network. One of the reasons why TCP has been so successful is that its behavior under adverse conditions is well understood.
For this project, you need to implement minisockets, a connection-based reliable communication channel abstraction. A correct implementation is sufficient - you don't need to worry too much about performance. For instance, it's fine to build a system that has only one packet in transit at any given time on any given socket (that is, a window size of one). Obviously, this makes the roundtrip time of the network the dominating factor in performance, but it simplifies the implementation greatly.
You should remove the previous project's restriction on the maximum acceptable message size. With minisockets, it should be possible to send a message of any size, instead of one limited by the network's maximum datagram size.
For this project you have to deal with connection based communication. What this means is that you need the notion of connection, i.e. binding between ports on two machines. Before actual communication starts a connection has to be made (both parties have to agree that packets send from particular ports belong to the connection), communication is made on the connection (the user program has to specify on what connection the communication is made, i.e. the minisocket_t, not the source and destination ports as in the previous assignment) and the connection is closed.
To open and close a connection and to acknowledge sent packets you need special packets, that we will call control packets, that usually contain no user level data. Also every packet has to contain some extra information (like sequence number). This extra information that cannot be accommodated in the miniports header should be stored in an extra header, that is used only in packets belonging to connection based communication. Since the unreliable, connectionless communication provided by miniport has to coexist with the reliable, connection based communication provided by minisockets, you have to add a "type" field to the miniports headers that specifies the type of communication. Your implementation should use the information in this field when a packet is received to determine if it should be handled by your previous code (miniports) or by the new code (minisockets).
To ensure reliable delivery, you will have to cope with packet losses and duplicates. This means that the receiver has to acknowledge each packet which arrives, and maintain information about the packets it has already seen, to detect duplicates. The sender has to set schedule a timeout when it sends a packet, so that it can detect when an acknowledgement doesn't arrive "fast enough", and resend it (since the ack may have been lost, this is an additional source of duplicates at the receiver).
Sending a message should be a blocking operation, completing when the message is acknowledged, or when the minimsg layer "gives up" and returns a partial result or an error. Obviously, introducing reliable sends adds the problem that a sender may block indefinitely, retransmitting a message to a receiver which does not exist or is not responding. To avoid indefinite blocking, you should use the following scheme for timeouts and retransmissions:
For reliable delivery in the face of lost packets, you will have to retransmit data packets if you don't get an acknowledgement from the remote side within a given timeout period. Note that if the ACK is lost, the receiver may get multiple copies of the data packet. Consequently, the receiver will have to keep track of which date packets it has seen, and suppress duplicates. Also note that the sender might get duplicate ACKs, especially if some acks get delayed in the network.
Removing the maximum message size on send operations means that you
now have to fragment large messages into packets, each of which is no
bigger than the MAX_NETWORK_PKT_SIZE
constant. This
introduces a problem: how do you reassemble messages at the receiver's
side? There are two possible approaches: (i) a "message-oriented"
scheme, or (ii) a "stream-oriented" scheme. In the former, you
would preserve message boundaries at the receiver, so that it is
possible to reconstruct the lengths of the messages sent by the sender.
With the latter, you would ignore message boundaries at the
receiver. For this assignment, you will implement a stream
abstraction. A stream abstraction simplifies your implementation in
many ways, and parallels TCP's behavior. For instance, if you chop a
large send into multiple small packets, you don't need to wait for
all of the pieces to arrive on the receiving side before you wake up
the receiving thread.
You should follow the following guidelines in implementing the stream-oriented protocol:
It's crucial that systems code be correct and robust. You must test
your code with reasonable and unreasonable test cases, and ensure that
it behaves correctly. At a minimum, you should test sending large
numbers of messages and check if they're all received. Does your
systemwork correctly with really large send operations ?
See if the
receiver gets all the data the sender sends if it performs
minimsg_receive()
operations with a buffer which is too
small. The test cases for project 3 are not sufficient to test your
work: part of the challenge of this project is to develop your own tests,
making sure that you've written code which implements the
specification correctly.
You should also test your code by randomly dropping packets and seeing if your networking stack continues to deliver messages. If you randomly drop packets with 10% probability (i.e. pick a random number between 0 and 9, drop if you got 1), you would not expect to see any delivery failures unless you send around 10^8 packets (why?). So any delivery failures when you have low packet loss indicate a problem.
A common error is to wait for a timeout before waking up a sending thread, even though an ack may have arrived. Make sure you build your network layer so that the sending thread is awakened immediately if the ack arrives, and that it is awakened eventually by the timeout mechanism if the ack does not arrive.
Another common error is to mismanage the timeout periods. Make sure that the initial timeout period is 100 ms. for the first try for any given packet, and that it doubles with every subsequent retry.
The project is due at midnight on April 8th.
You should follow the generic submission guidelines to submit this project. Make sure you use the correct project number (4).
Add a message window of fixed size (WS). Measure the bandwidth your code achieves with WS set to 1 and WS set to 4.
Dynamically adjust the window size. Increase it by one if you are getting all the packets acked, and divide it by half when a packet is not acked.