Progress

We have implemented much of the functionality needed to transmit RTP audio packets over a network. The current implementation can deliver 20 ms sound samples over the network via UDP, encapsulated in the RTP header format. As an optimization, silence is detected and not actually transmitted over the wire; instead, the remote end deduces it and inserts it artificially. The code can also read and write WAV files containing 8-bit mu-law samples. Work remains to be done on error correction and control.

CommunicationsFacade

Design Goal

Our component is divided into several subsystems; however, to make our code easy to use, we have created a wrapper object designed to make using our component trivial. All control over the data flow is accomplished by interacting with the CommunicationsFacade interface.

Usage

There are three methods that act upon CommunicationsFacade: initialize, activate, and stop. Calling initialize on the object lets one set where data flows to and from, while activate() and stop() determine when it flows. With the aid of the Pipe hierarchy, CommunicationsFacade can use files, audio, and the network in any combination of source and destination, allowing data to flow in whatever direction is necessary. For example, CommunicationsFacade could capture audio from the microphone and send it over the network, while on a remote machine it receives data from the network and writes it to a file.
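
As a sketch, wiring the microphone to the network might look like the following; the constructor arguments and exact method signatures are assumptions on our part:

    // Hypothetical wiring: capture audio locally and send it to a remote host.
    Pipe source = new AudioPipe();                            // the microphone
    Pipe destination = new NetworkPipe("remote.host", 5004);  // assumed arguments

    CommunicationsFacade facade = new CommunicationsFacade();
    facade.initialize(source, destination);  // set where data flows to and from
    facade.activate();                       // start moving data
    // ... the conversation takes place ...
    facade.stop();                           // halt the transfer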

Implementation

In order to transfer data, CommunicationsFacade first wraps the local and remote Pipes in PacketFactory objects, creating two factories: one for data coming from the local source (be it a file, the soundcard, or the network) and the other for data headed to the destination (again a file, the soundcard, or the network). It then creates a DataMediator from the two PacketFactory objects. The DataMediator is told to mediate data, and a new thread controlling this transfer is created. The stop() function simply terminates the actions of the DataMediator.
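
In pseudo-Java, that flow might look like this; every name beyond those mentioned in the text is assumed:

    // Sketch of the facade internals described above; field and hook names are assumed.
    public class CommunicationsFacade {
        private PacketFactory localFactory, remoteFactory;
        private DataMediator mediator;
        private Thread transferThread;

        public void initialize(Pipe local, Pipe remote) {
            localFactory  = new PacketFactory(local);   // source side
            remoteFactory = new PacketFactory(remote);  // destination side
            mediator = new DataMediator(localFactory, remoteFactory);
        }

        public void activate() {
            transferThread = new Thread(mediator);      // assumes DataMediator is Runnable
            transferThread.start();                     // begin mediating data
        }

        public void stop() {
            mediator.halt();                            // assumed name for the stop hook
        }
    }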

Pipes (Network/File/Audio)

Design Goal

The source of data should be completely transparent to every subsystem in our code: reading from a file should be no different from reading from the sound card. To reach this design goal we developed a single interface, named Pipe, that defines how data is read and written.

Usage

This abstract base class provides some of the basic functionality needed to create a pipe that can read from the network, a file, or the soundcard. However, the concrete classes NetworkPipe, FilePipe, and AudioPipe are the ones useful to the programmer. Each has its own unique constructor, but all implement the read, write, and isAvailable methods exactly as Pipe declares them. So whenever a method requests a Pipe, we simply pass it an instance of one of the three available Pipes.
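
For example (the constructor arguments here are our own illustration):

    // Any concrete pipe can stand in wherever a Pipe is expected.
    Pipe file    = new FilePipe("in.wav", "out.wav");
    Pipe audio   = new AudioPipe();
    Pipe network = new NetworkPipe("remote.host", 5004);

    byte[] slice = new byte[160];  // one 20 ms slice at 8 kHz (160 one-byte mu-law samples)
    if (audio.isAvailable()) {
        audio.read(slice);         // identical calls no matter which pipe we hold
        network.write(slice);
    }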

Implementation

Pipes are really just wrappers around streams: each derived class is responsible for creating an input and an output stream for its data source and passing them to the constructor of Pipe. Each class also adds some intelligence of its own to make this happen. For example, AudioPipe must interface with the local sound hardware to read and write audio samples, NetworkPipe must bind to sockets and transmit data over them, and FilePipe must deal with errors that may occur when transferring data to and from a file.
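
To illustrate the pattern, a stripped-down FilePipe might be little more than stream construction handed up to the Pipe base class; the two-path constructor shown here is an assumption:

    import java.io.*;

    // A derived pipe's main job: build the streams and pass them to Pipe.
    public class FilePipe extends Pipe {
        public FilePipe(String inPath, String outPath) throws IOException {
            super(new FileInputStream(inPath), new FileOutputStream(outPath));
        }
    }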


PacketFactory

Design Goal

Because Pipes deal in raw byte streams, we wanted an easier way to create RTPPackets, and PacketFactory does just that. Given a Pipe, a PacketFactory can read from it and use the data to manufacture a new RTPPacket, or transform an RTPPacket into a byte stream for transfer through the Pipe.

Usage

Simply call produce() to get a new RTPPacket from the Pipe and consume() to send an RTPPacket through the Pipe.
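
For instance, forwarding one slice from the soundcard to the network might look like this (treating a null return as a suppressed silent slice is our assumption):

    PacketFactory source = new PacketFactory(new AudioPipe());
    PacketFactory sink   = new PacketFactory(new NetworkPipe("remote.host", 5004));

    RTPPacket packet = source.produce();  // read raw audio, wrap it in an RTP header
    if (packet != null) {                 // a silent slice may yield no packet
        sink.consume(packet);             // serialize and push through the pipe
    }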

Implementation

The PacketFactory determines whether the given pipe contains raw data or RTPPacket data and then passes the data to the appropriate RTPPacket constructor, creating a new packet object. It also performs some analysis on the packet data in an effort to remove silence and thereby minimize network traffic.
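
Roughly, produce() behaves as sketched below; the RTPPacket constructor signatures and field names are illustrative, not our final code:

    // Rough shape of produce() as described above.
    public RTPPacket produce() throws IOException {
        byte[] data = new byte[160];       // 20 ms of 8 kHz, 8-bit mu-law audio
        pipe.read(data);
        if (rawSource) {                   // pipe carries raw samples (file/soundcard)
            if (isSilence(data))
                return null;               // silence is dropped, never transmitted
            return new RTPPacket(data, sequence++, timestamp);  // raw-data constructor
        }
        return new RTPPacket(data);        // data already framed as an RTP packet
    }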


DataMediator and Related Subsystems

The DataMediator is responsible for moving data appropriately between the subsystems described above and the additional analysis subsystems described below.
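
At its core the mediation loop is conceptually simple; a minimal sketch, under assumed names:

    import java.io.IOException;

    // Move packets from the source factory to the destination factory until stopped.
    public class DataMediator implements Runnable {
        private final PacketFactory source, destination;
        private volatile boolean active = true;

        public DataMediator(PacketFactory source, PacketFactory destination) {
            this.source = source;
            this.destination = destination;
        }

        public void run() {
            try {
                while (active) {
                    RTPPacket packet = source.produce();  // null for a silent slice
                    if (packet != null)
                        destination.consume(packet);
                }
            } catch (IOException e) {
                active = false;                           // a broken pipe ends the transfer
            }
        }

        public void halt() { active = false; }
    }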

The PacketAnalyzer handles resequencing, timing, silence reconstruction, loss tracking, and multicast, and it calculates the playout delay. When the first packet is received, the PacketAnalyzer creates a PlayoutMediator and runs it as a separate thread. The PlayoutMediator handles two operations: first, it reads from a queue of received packets and feeds them to the local Pipe every 20 ms, ensuring a steady stream of sound; second, it inserts silence for the playout delay before talkspurt packets, as determined by the PacketAnalyzer. We do not yet calculate a dynamic playout delay.
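
The heart of the PlayoutMediator is a timed loop; a minimal sketch, assuming a circular buffer that hands back null for silent slots and a SILENCE constant of our own invention:

    // Feed the local pipe one slice every 20 ms, whether or not audio arrived.
    public void run() {
        try {
            while (playing) {
                RTPPacket packet = buffer.nextSlot();  // null slot means silence
                byte[] slice = (packet == null) ? SILENCE : packet.getPayload();
                localPipe.write(slice);                // one slice per tick
                Thread.sleep(20);                      // a real loop would correct for drift
            }
        } catch (IOException | InterruptedException e) {
            playing = false;                           // stop on error or interruption
        }
    }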

Resequencing, timing, and silence reconstruction are all handled by a circular buffer, and multicast information by a hashtable of hosts seen. Not all hosts begin at precisely the same time, so when a packet arrives from a new host, the time currently being played is entered into the hashtable as that host's start time. Whenever a later packet comes in from that host, its timestamp is normalized against that start time and then checked against the current time to determine whether it is too old. If the packet is still valid, it is inserted into the circular buffer according to its timestamp; this both resequences the packets and reconstructs silence, since silence is represented by a null packet. At this point, we are unsure how to add audio signals together in order to support true multiparty conferencing.
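
The normalization step might look like this; the Hashtable follows the description above, while the method, field, and accessor names are ours:

    import java.util.Hashtable;

    // host address -> offset between that host's timestamps and our clock
    private Hashtable<String, Long> offsets = new Hashtable<String, Long>();
    private long currentPlayTime;   // timestamp of the slice now playing
    private CircularBuffer buffer;  // assumed name for the buffer class

    void receive(RTPPacket packet, String host) {
        Long offset = offsets.get(host);
        if (offset == null) {                              // first packet from this host:
            offset = currentPlayTime - packet.getTimestamp();
            offsets.put(host, offset);                     // record its start time
        }
        long normalized = packet.getTimestamp() + offset;  // onto our local timeline
        if (normalized >= currentPlayTime) {               // discard packets too old to play
            buffer.insert(packet, normalized);             // resequences by timestamp;
        }                                                  // empty slots remain null (silence)
    }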

Schedule

We are not on schedule and, for this project, never will be. Our initial schedule was overly optimistic, given the extent to which we college students procrastinate. However, we believe we can have basic functionality completed on time.

Problems

In the past couple of weeks we have been in close contact with the other teams that will require our code, allowing us at last to finalize our interfaces. In the beginning it was sometimes difficult to get a good response from the other teams, though the same was probably true of us: we did not have good answers for them early in the semester either. The root problem was that no one started early enough.

Currently we are still having problems reading from and writing to the sound card at the same time. Linux does not allow full-duplex access to /dev/audio, and we are unsure how to overcome this limitation. One method would be to open /dev/audio for reading, read a packet, close it, then open it for writing, and so on. Unfortunately, this method produces audible clicks every time /dev/audio is opened or closed.
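
The cycle looks roughly like this in Java; it moves data, but each close/open is audible:

    import java.io.*;

    // Alternate between reading and writing /dev/audio, since Linux will not
    // let us hold it open for both at once. Each open/close produces a click.
    void halfDuplexCycle(byte[] captured, byte[] received) throws IOException {
        FileInputStream mic = new FileInputStream("/dev/audio");
        mic.read(captured);                  // grab one 20 ms slice
        mic.close();                         // release the device...

        FileOutputStream speaker = new FileOutputStream("/dev/audio");
        speaker.write(received);             // ...so we can play the incoming slice
        speaker.close();
    }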

All of the problems relating to Linux audio are somewhat moot at this point, however: only recently did we discover that we are expected to run on a Wintel platform rather than Linux. While we have code to read and write audio on a PC (using JNI), we may need to perform some kind of encoding on the samples before transmitting them over the wire.

What we learned

Since the last report we have learned a great deal about audio. Most importantly, we now understand audio encoding schemes, including mu-law, and why they work. This understanding allowed us to write a simple silence detection function: we take the average of all the samples over a 20 ms slice and compare it to a threshold. After finding a good threshold by trial and error, we were able to cut the transmitted size of a file in half, and the transmitted file was indistinguishable from the original once silence was reconstructed. In addition, we now have a good understanding of WAV files and how they are built, which was necessary for reading from and writing to files.
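
A sketch of the check, assuming each mu-law byte is decoded to its linear value before averaging (the decision to decode first, and the tunable threshold, are our own choices; the expansion itself is the standard G.711 formula):

    // Decode one 8-bit mu-law sample to a linear value (standard G.711 expansion).
    static int ulawToLinear(byte ulawByte) {
        int u = ~ulawByte & 0xFF;          // mu-law bytes are stored inverted
        int t = ((u & 0x0F) << 3) + 0x84;  // mantissa plus bias
        t <<= (u & 0x70) >> 4;             // apply the exponent
        return ((u & 0x80) != 0) ? (0x84 - t) : (t - 0x84);
    }

    // Average the magnitude of a 20 ms slice and compare it to a tuned threshold.
    static boolean isSilence(byte[] slice, int threshold) {
        long sum = 0;
        for (int i = 0; i < slice.length; i++)
            sum += Math.abs(ulawToLinear(slice[i]));
        return (sum / slice.length) < threshold;
    }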

What we would do differently

As in our second report, the first thing we would do differently is start earlier. Had we begun the project right away, we would have had plenty of time to finish everything. We would also have given everyone else in our group an edge in their own work: since our code, along with Signaling, forms the lowest layer of communications, our interfaces should have been the first to be completed.

Sources

http://www.cs.columbia.edu/~hgs/rtp/
http://www.inria.fr/rodeo/fphone/
http://www-nrg.ee.lbl.gov/vat/
http://www.javasoft.com
RFC 1889
Other team reports