CS 519 Course Project

Final Report

Group 2: Keshav's Kids

Salman Qureshi, Shen-Ban Meng

Task: The Sports Server (Team 10)

 

 

 

1. What did you set out to do?

Use a figure and accompanying text to describe what you set out to do. You can reuse part of your first report if you wish. You should have sufficient detail so that someone who is not familiar with the project would still be able to understand the scope of your work. For instance, imagine giving this to a friend who is not in the class--write something that this person would be able to understand without having to ask you for help.

Our initial goal was to implement an audio sports server. The server was to have the following capabilities:

  1. Phone-to-PC and PC-to-PC communication
  2. Streaming of audio data (both recorded and live) over its clients' links
  3. Real-time mixing of various data sources
  4. Dynamic 'chat-room' capability

In order to broadcast audio data, we intended to use RealPlayer and the RealServer set of tools. The administrator of our sports server would be required to encode any streams or files into Real format before making them available online.

We intended to use telephone menuing as the equivalent of web links: users could choose which stream they wanted to hear, after being played a message listing the available choices.

Users could also choose a chat room to join. Chat rooms would have pre-recorded or live audio playing in the background, as set by the administrator.

Real-time mixing would allow input sources at various input rates to be multiplexed and sent to the clients that requested them. One client could receive background commentary mixed with a particular chat channel, while another could play back a pre-recorded audio file narrating statistics. Each client's speaker characteristics would need to be determined; a simpler alternative would be to enforce fixed characteristics for the client's speaker, so that all clients could be treated identically.

The system would be completely dynamic, allowing users to join and leave any service they wished. The server could at any time refuse admission to certain 'banned' clients, as configured by the administrator.

 

 

2. What did you actually accomplish?

Write down exactly what you managed to get done by the time of the demo. Did you achieve all the goals you set out to do? Did you do something that you didn't expect to be doing when you started?

Data transfer gave us the following functionality:

1) The ability to read from different types of sources

2) The ability to send data to more than one destination.

Signaling allowed us to talk to computers as well as phones.

Together, these allow us to send audio data to more than one client (sketched below). However, they gave us no way to feed more than one source to a client; in other words, we had to implement multiplexing/mixing of multiple sources on our own.
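
To make the fan-out concrete, here is a minimal sketch in the style of the code sample in section 6. FileSource and DestinationNetwork are placeholder names for the DataTransfer layer's file source and network destination (the actual class names may differ), and clientAddress is an assumed variable holding a client's address.

Source src = new FileSource("wwweb083.wav");              // a pre-recorded commentary file
Destination d1 = new DestinationSpeaker(waveform, this);  // play on the local speaker
Destination d2 = new DestinationNetwork(clientAddress);   // send to a remote client

connection.setSource(src);        // one source ...
connection.addDestination(d1);    // ... copied to every destination
connection.addDestination(d2);
connection.Activate();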

Mixing was much more complicated than we thought. We have, at present, implemented another type of source (in addition to file, network and microphone), called SourceMixer. This source accepts a Vector of sources, which are currently multiplexed and copied to the destinations.

We tried about a dozen different approaches, and even our best one produces fairly poor audio quality. However, the pitch of the audio sources is unchanged, and any sampling rate of at least 8000 samples per second is supported.

The following figures illustrate our approaches.

Multiplexing sources and sending them to a set of destinations is analogous to a 'static stream'. In other words, the administrator can set the files to be multiplexed, and they will be sent to all the clients currently connected.

Milestones

Mixing is implemented. There is a certain amount of noise introduced by the network, and this noise is magnified by the fact that the streams are compressed by the multiplexer. We managed to get two streams playing somewhat coherently on the local speaker by the time of the demo. Adding more streams, or lossier streams, would only give worse sound quality.

Shortfalls

Mixing is fairly crude. While our current approach gives the best results of those we tried, a number of other approaches deserve consideration (we ran out of time to explore them).

There is a problem in the signaling interface: the server is not alerted to client connections. We do not expect the Signaling team to look into this at this late stage, however. We attempted to implement a TCP/IP-based null interface in the remaining time but could not finish it.

The phone-to-PC interface was not up at the time of writing, and so was not done in time for us to incorporate it.

The machine on which we developed our code has a half-duplex sound card, so we have not yet had a chance to mix in microphone input. Finally, our GUI is rushed and not as robust or refined as we would like; it is basically in the debug phase, not the release phase.

3. Problems:

It is possible that you did not actually accomplish what you set out to do :-( If so, why? What problems did you run into? Include both technical problems, and problems in coordinating your work with others. Be as specific as you can about technical problems. This is the only way we have to determine whether you really tried to solve the problem, instead of just giving up with token effort.

Technical Problems:-

  1. Mixing was a much bigger task than we had anticipated. Its quality is also highly dependent on the nature of the files being tested: files containing only the human voice tend to come out better than files with music. The human voice tends to 'echo' while music tends to 'ring' after passing through the mixer. There is also a large amount of static whose source we could not find.
  2. Interfacing with signaling was a problem. Currently, a request signal is supposed to call a certain function that resides on the server; the request may come from a PC or a phone. However, the connection is not being established between our two test machines. The establishment of sockets and opening of connections is handled by signaling, which perhaps has yet to interface with some other dependencies. In any case, we do not have network capability yet.
  3. File formats. Accounting for different destination (speaker) characteristics makes the mixing algorithm more complex, and hence introduces more delay and lower quality. For this reason, we now always multiplex to a fixed 8000-samples-per-second output rate.
  4. Pitch problems. Introducing averages of the lost packets into the packets that are actually sent raises the pitch of the output slightly. At present we reserve one byte in eight to represent the average of the discarded packets; a sketch of this scheme follows this list.

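The following is an illustrative, self-contained sketch of the kind of mixing described in items 1 and 4 above: two 8-bit sample buffers are averaged together, and when the result is decimated to the fixed output rate, one output byte in eight is reserved for the average of the samples that were dropped. The method names, the treatment of the discarded 'packets' as individual 8-bit samples, and the decimation factor are our own assumptions for illustration; this is not the exact code in SourceMixer.

public class MixSketch {

    // Average two equal-length buffers of unsigned 8-bit samples.
    static byte[] mix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            out[i] = (byte) (((a[i] & 0xff) + (b[i] & 0xff)) / 2);
        }
        return out;
    }

    // Decimate by 'factor', but reserve one output byte in eight for the
    // average of the samples discarded since the last reserved byte.
    static byte[] normalize(byte[] in, int factor) {
        byte[] out = new byte[in.length / factor];
        int dropSum = 0, dropCount = 0;
        for (int i = 0; i < out.length; i++) {
            for (int j = 1; j < factor; j++) {             // samples we skip over
                dropSum += in[i * factor + j] & 0xff;
                dropCount++;
            }
            if ((i + 1) % 8 == 0 && dropCount > 0) {
                out[i] = (byte) (dropSum / dropCount);      // reserved "average" byte
                dropSum = 0;
                dropCount = 0;
            } else {
                out[i] = in[i * factor];                    // pass-through sample
            }
        }
        return out;
    }
}
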
Coordination:-

  1. We initially wanted to use the Real tools for streaming audio. However, DataTransfer ran its code using its own streaming implementation, so after some consideration we discarded that option.
  2. The later code releases from DataTransfer and Signaling had little impact on our implementation, however.
  3. We have not used telephone menuing as the signaling interface because it was not clearly defined until just before we presented.
  4. One group member was not familiar with Java, so it was difficult to split the work up evenly among us in this respect. This contributed to our not meeting all our goals by the time of the demo.

4. What you learnt:

What did you learn by working on the project? Include details on technical things you learnt, as well as what you learnt about dealing with large projects, collaborating with other busy people, and dealing with other coursework.

  1. We learnt a great deal about audio compression and about the impact on audio of manipulating or dropping bytes. Since we had to decrease quality to a certain extent, we learnt a great deal in trying to minimize that loss. We learnt that there is an optimal 'granularity of representation' in a multiplexed stream; we tried to find this size by trial and error, and we also tried some averaging techniques.
  2. The best approach, however, would be to copy the entire uncompressed arrays into a temporary buffer, interleave them, drop the packets required to normalize the data, and run a 'smoothing' algorithm over the entire resulting stream before sending it on (a sketch of the smoothing step appears at the end of this section). This approach has two primary concerns:

    a) Sometimes some noise is preferable to a slightly altered voice characteristic. However, the noise in our current implementation is too great anyway.

    b) The smoothing process takes time, and this might be enough to introduce a 'jittering' effect.

  3. We had several false starts in this project (RealPlayer), as well as setup procedures and installations (SDK 3.1 required administrator privileges) that held up our work for quite some time. In the end only two machines in the Meng lab had the required SDK installed, and there was some contention for these machines throughout the implementation of the project. Next time we would like to make our project more modular from the start, so that we can keep working if one module takes too much time.
  4. Null interfaces. We implemented null interfaces for our data transfer, which we then integrated with the DataTransfer layer's implementation.
  5. However, our null interfaces with signaling needed to be re-written when signaling provided a new format. Also, we had only just begun the signaling null interfaces when the actual interfaces were ready.

    At present we cannot talk over the network, because the version of signaling we use does not work between our two test machines, even though they can talk over TCP/IP, and we no longer have time to implement the TCP/IP interface ourselves. Next time we would like to start earlier and finish the null interfaces even when the actual interfaces are ready, because the actual interfaces may not work or may still be under modification.

  6. Dealing with other coursework. This project took up a tremendous amount of time. We had a very ambitious initial proposal. Also, the exact requirements were not as clear for the applications teams as they were for the other teams. This contributed to our sometimes 'losing our way' during design and implementation.

We have implemented a subset of our initial proposal. However, we have managed to provide an interface for a new kind of source, and we feel that that was very significant.
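
As an illustration of the smoothing step proposed in item 2 above, the following self-contained sketch runs a simple three-point moving average over a mixed buffer of unsigned 8-bit samples before it is handed to the destinations. The window size and the choice of a plain moving average are our own assumptions; we did not implement this.

// Smooth a buffer of unsigned 8-bit samples with a 3-point moving average.
static byte[] smooth(byte[] mixed) {
    byte[] out = new byte[mixed.length];
    for (int i = 0; i < mixed.length; i++) {
        int prev = mixed[Math.max(i - 1, 0)] & 0xff;                 // previous sample (clamped at the start)
        int curr = mixed[i] & 0xff;
        int next = mixed[Math.min(i + 1, mixed.length - 1)] & 0xff;  // next sample (clamped at the end)
        out[i] = (byte) ((prev + curr + next) / 3);
    }
    return out;
}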

5. What would you do differently next time:

Everyone makes mistakes. But good learners don't repeat mistakes, they make all new ones! This project gave you a lot of freedom to make mistakes. What will you look out for the next time you undertake a large collaborative project?

  1. We have learnt a very important rule with regard to large projects: start early! This leaves room for the inevitable false starts and shortages of facilities that occur later on. We spent quite some time on the RealNetworks SDK (mentioned in our earlier reports) before discarding it.
  2. We have also learnt the importance of thorough research before actual design. This is a little more difficult when you are waiting for interfaces from other code and the performance of that code is uncertain; however, one's own approach must be clear. We needed the most efficient implementation possible, and we could have achieved that if we had researched how 'smoothing' can be done efficiently. Instead, we relied more on trial and error and some intuition.
  3. We learnt the importance of well-documented code. We have tried to document our own code reasonably well. The interfaces we use also needed to be very well documented; otherwise we would have wasted time tracking down the authors of the code for explanations. Often a test example or two is tremendously helpful.
  4. Regular interaction with the teams you will be sharing interfaces with is very important. It is also helpful to know what they have in mind to implement in the future.
  5. Sometimes a feature we had been anticipating from the team providing interfaces (touch-tone menuing) would be implemented, only for a more basic feature (basic networking) to stop working. We must be prepared for this with working null interfaces; a sketch of such an interface follows this list.
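
To illustrate what such a null interface might look like, here is a self-contained sketch of a TCP/IP stand-in for the signaling layer: it simply accepts a connection and reports the client's first request line, which would have let us keep testing over the network while the real signaling interface was in flux. The class name, port number, and request format are assumptions; this is not the signaling team's interface.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

public class NullSignaling {
    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(9000);    // assumed test port
        while (true) {
            Socket client = listener.accept();             // stands in for a "connect" signal
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            String request = in.readLine();                // e.g. the name of a requested stream
            System.out.println("Client " + client.getInetAddress()
                    + " requested: " + request);
            client.close();
        }
    }
}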

6. Interfaces

We are providing a SourceMixer class as an implementation of the abstract class Source defined by DataTransfer.

At present, the SourceMixer class takes in a Vector of Strings, each representing a source file.

Hence:

/**********************************
Code sample: here we add two sources,
which are multiplexed onto the local speaker.
***********************************/

Vector sources = new Vector();
sources.addElement("satch3.wav");      // a sample instrumental
sources.addElement("wwweb083.wav");    // recorded radio commentary

SourceMixer src = new SourceMixer(sources);
Destination d1 = new DestinationSpeaker(waveform, this);

connection.setSource(src);
connection.addDestination(d1);
connection.Activate();

/**********************************
End of code sample
***********************************/

Because we needed to read a fixed number of bytes each time, we read bytes from the stream ourselves, overriding DataTransfer's own implementation (a sketch of this fixed-size read follows the list below). However, we use DataTransfer's read for reading from the microphone because:

  1. The data input is asynchronous
  2. Making sure the microphone is released every time, and available the next time, is better handled in the DataTransfer layer's own implementation.
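
The following is a small sketch of the fixed-size read mentioned above: it keeps reading until the chunk is full (or the source's stream ends), rather than accepting whatever a single read() call happens to return. The 1024-byte chunk size and the helper's name are only examples, not the values used in our code.

import java.io.IOException;
import java.io.InputStream;

class ChunkReader {
    static final int CHUNK_SIZE = 1024;    // example chunk size

    // Fill 'chunk' completely unless the stream ends first; return bytes read.
    static int readChunk(InputStream in, byte[] chunk) throws IOException {
        int filled = 0;
        while (filled < chunk.length) {
            int n = in.read(chunk, filled, chunk.length - filled);
            if (n < 0) break;              // this source has run out of data
            filled += n;
        }
        return filled;
    }
}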

We have not handled network sources because the network connection between PCs is not yet working for us. The microphone does work; however, we have been unable to test mixing with it because the sound cards here are half-duplex.

We provide no other interfaces. The grouping of clients and the connection between the client and server modes are application-specific implementation details.

References:-

www.real.com - The RealNetworks web site has resources on real-time audio manipulation.

www.developer.com/reference/library/ - Plenty of Java source code; helpful for understanding multithreading and how it fits into our application.

www.cs.cornell.edu/cs519 - Report format and other guidelines and information.

www.javasoft.com/jdc - The Java Developer's Connection has several resources that we used, especially to test the current capabilities of Java's Sound and AudioClip APIs. We spent time on this approach but did not use it in the final implementation.