Sunday, September 27, 1998

Cornell University, Monday, May 10, 1999

Data Transfer in Internet Telephony

Naveen K Sastry (nks1@cornell.edu)

Pratap Vikram Singh (pvs5@cornell.edu)

Table of contents

Introduction

Capture of audio data

Playback of audio data

Converting the audio data into packets to be transferred over the network

Sending and receiving audio data packets on the Internet using RTP

Provide network statistics to the user of the data transfer layer

Forward error correction, buffering and adaptive playout

Defining interfaces for the data transfer layer

Audio transfer statistics

Introduction

Internet telephony is revolutionizing telecommunications through the convergence of voice, video, fax, and data. This new technology promises to reduce long-distance costs drastically and to open new opportunities for resellers, developers, service providers, and end users alike. The telephone network provides a reliable data stream once a connection is established. The Internet does not: it offers only a best-effort service, with no quality-of-service guarantees.

The ITX data transfer layer tries to address some of the issues involved in reliable data transfer over the Internet. Some of the issues are:

Audio data capture from various audio sources.

Playback of audio data to various destinations.

Converting the audio data into packets to be transferred over the network.

Sending and receiving audio data packets on the Internet using RTP.

Providing network statistics to the user of the data transfer layer.

Forward error correction, buffering and adaptive playout.

Defining appropriate interfaces to the data transfer layer so that applications can easily use it for transferring data over the Internet.

Making a generic data layer abstraction so that any future devices and services can be plugged in easily.

Figure 1: Components of the data transfer module.

Capture of audio data

Audio data can be captured by any kind of Source. A simple source can be an audio device such as the microphone, a file, or a network source. The simple sources supported by the data transfer layer are:

MicrophoneSource – Gets data from the audio port.

FileSource – Reads stored data from a file.

NetworkSource – Gets audio data packets from the network.

The Source is an interface that takes a channel reference and keeps writing audio data to it, so all sources are push sources. Any device that implements the Source interface can act as a valid source for the data transfer layer. Many kinds of sources can therefore be built, such as a Mixer source that takes a number of simple sources and mixes the audio streams that each provides. This would be helpful in a multi-party conference.
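The push model described above can be sketched in Java as follows. This is only an illustration: the method names `start` and `write`, the `ArraySource` class, and the mixing rule (summing signed 8-bit samples with clipping) are assumptions made for the example, not the actual ITX signatures.

```java
import java.util.List;

// Hypothetical channel: receives audio data pushed by a source.
interface Channel {
    void write(byte[] audio);
}

// Hypothetical Source interface: it is handed a channel and pushes data into it.
interface Source {
    void start(Channel channel);
}

// Illustrative source that pushes prepared chunks, standing in for a file or microphone.
class ArraySource implements Source {
    private final List<byte[]> chunks;

    ArraySource(List<byte[]> chunks) {
        this.chunks = chunks;
    }

    @Override
    public void start(Channel channel) {
        for (byte[] chunk : chunks) {
            channel.write(chunk); // the source drives the transfer (push model)
        }
    }
}

// Illustrative mixing step: adds two chunks of signed 8-bit samples, clipping on overflow.
class Mixer {
    static byte[] mix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            int sum = a[i] + b[i];
            out[i] = (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, sum));
        }
        return out;
    }
}
```

A Mixer source in this style would call `Mixer.mix` on the chunks produced by its simple sources and push the result to its own channel.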

Other sources that are useful for the ITX system are the GatewaySource, which provides data from the telephone line, and the HandheldDeviceSource, which records data from a hand-held device and passes it to the audio channel. The actual implementations for the audio devices are done in the Jaudio layer. This layer wraps the underlying audio device driver calls in Java and makes it easier to write concrete Java classes.

Playback of audio data

Audio data can be played back by any kind of Destination. A simple destination can be an audio device such as the speaker, a file, or a network destination. The simple destinations supported by the data transfer layer are:

SpeakerDestination – Plays data to the audio port.

FileDestination – Writes audio data to a file.

NetworkDestination – Writes audio data packets to the network.

The Destination is an interface that has a write() method. The Channel calls this method when data needs to be written to a destination, and each destination implements it in its own way. Because the Channel pushes data to them, all destinations are push destinations. Any device that implements the Destination interface can act as a valid destination for the data transfer layer.

Other destinations that are useful for the ITX system are the GatewayDestination, which writes data to the telephone line, and the HandheldDeviceDestination, which writes data to a hand-held device. The actual implementations for the audio devices are done in the Jaudio layer. This layer wraps the underlying audio device driver calls in Java and makes it easier to write concrete Java classes.

Support for multiple destinations:

Because a Channel is used to transfer data from a source to a destination, it was very easy to implement the "single source and multiple destination" model. This lets the user of the data transfer layer create a channel that has one source and many destinations. This is useful for building multi-party conference applications and for emulating multicast on networks that do not support it. Destinations can be added and removed on the fly while a Channel is active.
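A minimal sketch of the single-source, multiple-destination model is given below. The class name `FanOutChannel` and its methods are hypothetical. The use of `CopyOnWriteArrayList` is one possible way to allow destinations to change while data is flowing, not necessarily what the ITX Channel does.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// The write() method is described in the text; the exact signature is assumed.
interface Destination {
    void write(byte[] audio);
}

// Hypothetical channel that forwards each chunk from one source to many destinations.
class FanOutChannel {
    // A copy-on-write list can be modified safely while another thread iterates over it.
    private final List<Destination> destinations = new CopyOnWriteArrayList<>();

    void addDestination(Destination d) {
        destinations.add(d);
    }

    void removeDestination(Destination d) {
        destinations.remove(d);
    }

    // Called by the source; every current destination receives its own copy of the data.
    void write(byte[] audio) {
        for (Destination d : destinations) {
            d.write(audio.clone());
        }
    }
}
```

Each destination gets a copy of the chunk so that one destination cannot alter the data seen by another.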

Converting the audio data into packets to be transferred over the network

Audio data is captured and played as a stream, that is, as a continuous flow of data from the source or to the destination. When the same data needs to be transferred over the network, it must be broken down into smaller pieces (packetization) so that the network can transmit it easily. The data transfer layer provides this functionality whenever a NetworkDestination is used, and it is completely transparent to the user of the data layer. The opposite operation (depacketization) is performed when data is received by the NetworkSource.
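Packetization and depacketization can be illustrated with the following sketch. The class name `Packetizer` and the fixed payload size are assumptions for the example. The real layer would also attach a header to each packet.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits a byte stream into payloads and joins them again.
class Packetizer {
    // Splits the stream into payloads of at most payloadSize bytes; the last may be shorter.
    static List<byte[]> packetize(byte[] stream, int payloadSize) {
        if (payloadSize <= 0) {
            throw new IllegalArgumentException("payloadSize must be positive");
        }
        List<byte[]> packets = new ArrayList<>();
        for (int offset = 0; offset < stream.length; offset += payloadSize) {
            int length = Math.min(payloadSize, stream.length - offset);
            byte[] payload = new byte[length];
            System.arraycopy(stream, offset, payload, 0, length);
            packets.add(payload);
        }
        return packets;
    }

    // Concatenates payloads, in the order given, back into one contiguous stream.
    static byte[] depacketize(List<byte[]> packets) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] payload : packets) {
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }
}
```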

Figure 2: A wave stream broken up into smaller packets.

Sending and receiving audio data packets on the Internet using RTP

When discrete packets are sent over the network, the network does not differentiate between them, so packets can be lost or reordered during transmission. To overcome this, the data transfer layer uses a protocol that sequences the packets and reveals whether they are being delayed. This protocol is the Real-time Transport Protocol (RTP). Each packet sent from a Source carries a unique source identifier, the time at which the packet was generated, and the sequence number of the packet. The timestamp lets the data transfer layer determine the delay that the packet encountered during transmission, and the sequence number helps it detect reordering and packet loss.
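As an illustration, the sketch below encodes and decodes the 12-byte fixed RTP header (version 2, no padding, extension, or contributing sources), which carries the sequence number, timestamp, and source identifier mentioned above. The class name `RtpHeader` is hypothetical, and the actual ITX implementation may organize this differently.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the fields of the fixed RTP header.
class RtpHeader {
    static final int SIZE = 12;

    final int payloadType;     // 7 bits
    final int sequenceNumber;  // 16 bits, unsigned
    final long timestamp;      // 32 bits, unsigned
    final long ssrc;           // 32 bits, unsigned source identifier

    RtpHeader(int payloadType, int sequenceNumber, long timestamp, long ssrc) {
        this.payloadType = payloadType & 0x7F;
        this.sequenceNumber = sequenceNumber & 0xFFFF;
        this.timestamp = timestamp & 0xFFFFFFFFL;
        this.ssrc = ssrc & 0xFFFFFFFFL;
    }

    // Writes the header in network byte order (ByteBuffer is big-endian by default).
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.put((byte) 0x80);          // version 2, no padding, no extension, CSRC count 0
        buf.put((byte) payloadType);   // marker bit left at 0
        buf.putShort((short) sequenceNumber);
        buf.putInt((int) timestamp);
        buf.putInt((int) ssrc);
        return buf.array();
    }

    // Reads the fixed header from the start of a packet.
    static RtpHeader decode(byte[] packet) {
        if (packet.length < SIZE) {
            throw new IllegalArgumentException("packet shorter than RTP header");
        }
        ByteBuffer buf = ByteBuffer.wrap(packet);
        int first = buf.get() & 0xFF;
        if ((first >> 6) != 2) {
            throw new IllegalArgumentException("not RTP version 2");
        }
        int payloadType = buf.get() & 0x7F;
        int sequenceNumber = buf.getShort() & 0xFFFF;
        long timestamp = buf.getInt() & 0xFFFFFFFFL;
        long ssrc = buf.getInt() & 0xFFFFFFFFL;
        return new RtpHeader(payloadType, sequenceNumber, timestamp, ssrc);
    }
}
```

The masks in `decode` convert Java's signed integers back to the unsigned values carried on the wire.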

Provide network statistics to the user of the data transfer layer

It is very useful for the user of the data transfer layer to see how the data transfer is performing. This means that the data transfer layer has to keep track of various statistics of the audio data transfer. Some of the useful statistics would be the speed at which data is being transferred, the average size of the data packets, the number of lost packets and the mean delay between packets. All this can be used by the application to tune itself and also display the data visually to the end user.
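The statistics listed above could be accumulated along the following lines. The class `TransferStats` and its method names are hypothetical. The sketch assumes that sequence numbers do not wrap around, that duplicates do not occur, and that arrival times are given in milliseconds.

```java
// Hypothetical accumulator for per-stream transfer statistics.
class TransferStats {
    private long packets;
    private long bytes;
    private int firstSeq;
    private int highestSeq;
    private long firstArrivalMs;
    private long lastArrivalMs;

    // Records one received packet.
    void onPacket(int seq, int sizeBytes, long arrivalMs) {
        if (packets == 0) {
            firstSeq = seq;
            highestSeq = seq;
            firstArrivalMs = arrivalMs;
        }
        highestSeq = Math.max(highestSeq, seq);
        lastArrivalMs = arrivalMs;
        packets++;
        bytes += sizeBytes;
    }

    // Packets expected from the sequence range, minus packets actually received.
    long lostPackets() {
        return packets == 0 ? 0 : (highestSeq - firstSeq + 1) - packets;
    }

    double averagePacketSize() {
        return packets == 0 ? 0.0 : (double) bytes / packets;
    }

    // Mean time between consecutive arrivals.
    double meanInterArrivalMs() {
        return packets < 2 ? 0.0 : (double) (lastArrivalMs - firstArrivalMs) / (packets - 1);
    }

    // Transfer speed over the observed interval.
    double bytesPerSecond() {
        long elapsedMs = lastArrivalMs - firstArrivalMs;
        return elapsedMs <= 0 ? 0.0 : bytes * 1000.0 / elapsedMs;
    }
}
```

Counting losses from the sequence range rather than from individual gaps means that a packet that arrives late and out of order is not counted as lost.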

Forward error correction, buffering and adaptive playout

Because data transferred over the network can be lost, delayed, or reordered, the data transfer layer needs to correct for this where possible. Audio data has some characteristics that must be kept in mind. When packets are lost, the data transfer layer can extrapolate the data and recreate the stream. When packets are delayed, the layer can buffer the data and hold it for a short time before releasing it to the application.

The data layer implements forward error correction. Previous audio data samples are packetized together with the current audio data sample and sent over the network. Thus, if some packets are lost in network transmission, their samples can be reconstructed from the packets that follow.
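The idea can be sketched as follows, assuming for simplicity that each packet carries exactly one previous frame alongside the current one. The classes `FecPacket`, `FecSender`, and `FecReceiver` are hypothetical. With this amount of redundancy a single lost packet can be recovered, but two consecutive losses cannot.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical packet carrying the current frame plus a redundant copy of the previous one.
class FecPacket {
    final int seq;
    final byte[] current;
    final byte[] previous; // null for the very first packet

    FecPacket(int seq, byte[] current, byte[] previous) {
        this.seq = seq;
        this.current = current;
        this.previous = previous;
    }
}

// Sender side: remembers the last frame and attaches it to the next packet.
class FecSender {
    private byte[] last;
    private int nextSeq;

    FecPacket send(byte[] frame) {
        FecPacket packet = new FecPacket(nextSeq++, frame, last);
        last = frame;
        return packet;
    }
}

// Receiver side: fills in a missing frame from the redundant copy in the following packet.
class FecReceiver {
    private final Map<Integer, byte[]> frames = new HashMap<>();

    void receive(FecPacket packet) {
        frames.put(packet.seq, packet.current);
        if (packet.previous != null) {
            frames.putIfAbsent(packet.seq - 1, packet.previous);
        }
    }

    // Returns the frame with the given sequence number, or null if it could not be recovered.
    byte[] frame(int seq) {
        return frames.get(seq);
    }
}
```

The cost of this scheme is extra bandwidth, since every frame is transmitted twice.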

The network also introduces variable latency that depends on its state. As a result, the continuous stream generated by a source is disrupted and arrives with jitter at the final destination. The simplest way to solve this is to have a buffer at the destination that smooths out the data packet stream. A fixed-size buffer is not a good idea, because network delays may be very small at times and buffering adds delay to playout. The size of the buffer should instead be a function of the delay characteristics of the network. This is known as adaptive buffering.
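One common way to make the buffer size follow the network is to keep exponentially weighted averages of the packet delay and of its variation, and to set the playout delay to the average plus a multiple of the variation. The sketch below illustrates that technique. It is not taken from the ITX implementation, and the class name, the smoothing factor `alpha`, and the multiplier 4 are assumptions.

```java
// Hypothetical estimator of the playout delay from observed per-packet network delays.
class AdaptivePlayoutEstimator {
    private final double alpha;   // weight given to history, between 0 and 1
    private double meanDelayMs;
    private double variationMs;
    private boolean initialized;

    AdaptivePlayoutEstimator(double alpha) {
        this.alpha = alpha;
    }

    // Updates the running estimates with the delay measured for one packet.
    void onPacket(double delayMs) {
        if (!initialized) {
            meanDelayMs = delayMs;
            variationMs = 0.0;
            initialized = true;
            return;
        }
        meanDelayMs = alpha * meanDelayMs + (1 - alpha) * delayMs;
        variationMs = alpha * variationMs + (1 - alpha) * Math.abs(delayMs - meanDelayMs);
    }

    // Suggested buffering delay: small when the network is steady, larger when delay varies.
    double playoutDelayMs() {
        return meanDelayMs + 4 * variationMs;
    }
}
```

When delays are constant the variation term stays at zero and the buffer adds no extra delay. A burst of late packets raises both terms, so the buffer grows.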

The data transfer layer provides a BufferQueue class that performs the buffering at the NetworkSource. This class does the following:

Removes the RTP header from the incoming data packets.

Strips out the extra audio samples in the packet and discards the ones that are not needed.

Inserts each audio sample in a buffer according to its sequence number.

Inserts silence packets for the samples that have not yet arrived.

Plays each audio sample at the time at which it is due to be played.
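The ordering and silence-insertion steps above can be sketched as follows. This is a simplified stand-in for BufferQueue, not its actual code: the class `JitterBuffer` and its methods are hypothetical, RTP header removal and playout timing are left out, and silence is represented by zero bytes, which is only correct for signed linear PCM.

```java
import java.util.TreeMap;

// Hypothetical buffer that orders frames by sequence number and fills gaps with silence.
class JitterBuffer {
    private final TreeMap<Integer, byte[]> frames = new TreeMap<>();
    private final int frameSize;
    private int nextSeq;

    JitterBuffer(int firstSeq, int frameSize) {
        this.nextSeq = firstSeq;
        this.frameSize = frameSize;
    }

    // Stores a frame under its sequence number; frames that arrive too late are dropped.
    void insert(int seq, byte[] frame) {
        if (seq >= nextSeq) {
            frames.putIfAbsent(seq, frame);
        }
    }

    // Returns the next frame in sequence, or a frame of silence if it has not arrived.
    byte[] next() {
        byte[] frame = frames.remove(nextSeq);
        nextSeq++;
        return frame != null ? frame : new byte[frameSize]; // new arrays are zero-filled
    }
}
```

A playout thread would call `next()` once per frame interval, after waiting for the current buffering delay.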

Defining interfaces for the data transfer layer

The data transfer layer was meant to be used as a library by applications, so the key idea was to provide convenient interfaces to them. We defined interfaces for the source and the destination that other applications can also use to integrate with the data transfer layer. The interfaces for using the data transfer layer are kept simple and as few as possible so that applications have no difficulty using them. Utility methods are provided to create default AudioConnection objects that send data from the microphone to the network, and that receive data from the network and play it out on the speaker. If an application needs the data transfer statistics, an interface is provided for that as well.

Audio transfer statistics

Transfer statistics are collected for every device. Each device defines the statistics appropriate to it; for example, network devices report the size of the packets, the number of packets received, the number of lost packets, and so on.

Future work

The initial implementation of the data transfer layer has most of the basic features that were needed by applications but more functionality can be added easily. The data transfer layer can be modified in a lot of ways. Some of the issues that remain are:

Support for multicast.

Pure JavaSound solution.

Good device property negotiation.

Performance improvements.

Silence detection.

Adaptive FEC and audio sampling rate.

Reduction in the network bandwidth.