FINAL REPORT
December 18, 1998
Group: Phonetics (Group #4)
Team Members: Joy Alamgir (ID # 379220)
Pemith Randima Fernando (ID # 379814)
Task: Team 7 (Voice-Email Gateway)
[1] - What We Set Out To Do
We set out to write a voice-mail system that would match and extend the specifications delineated in the course web page project description. Our goal was to come up with an application and framework that would satisfy these specifications, yet be simple and streamlined so that ordinary users would be able to implement it easily. In addition, we tried to keep our goals realistic so that we would be able to achieve them. More specifically, we aimed to implement the following features:
- a touch-tone operated on-line voice menu
- user authentication
- allow users to access their e-mail through the telephone network
- allow users to access voice messages left for them
- allow users to record personalized voice greetings
- ability to record telephone messages for users to be sent as e-mail attachments
A flowchart that depicts our system can be found in Appendix A. Below is a step-by-step explanation of that flowchart:
1. Wait for a call from signaling:
we run in a continuous loop while waiting for signaling to tell us that there is an incoming call.
2. Play introduction:
play an introductory message that gives the user two choices: (a) leave a message for another user, or (b) enter an access code to reach the main voice menu.
- If (a) is chosen, the user is prompted to record a message (this process can be repeated multiple times for different users), and then we jump to Step 5.
- If (b) is chosen, the given access code is verified. If the code is correct, we proceed to Step 3; otherwise we jump to Step 5.
3. Play main voice menu:
play out the various choices for the user, and then get a selection. The choices in the voice menu are:
- Check e-mail
- Check voice messages
- Leave a voice message
- Record a personalized greeting
- Quit
4. Get selection and take appropriate action:
- If Check e-mail is chosen, we play out new e-mails to the user.
- If Check voice messages is chosen, we play out new messages to the user.
- If Leave a voice message is chosen, we allow the user to record a voice message for another party (this is just like in Step 2a).
- If Record a greeting is chosen, we allow the user to record a personalized greeting, which is essentially the user’s "answering machine message" (the one people hear when they try to leave a message for him/her).
- If Quit is chosen, we go to Step 5.
5. Quit:
save the database, close the connection, notify underlying layers, and jump back to Step 1.
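The flow described above can be read as a small state machine. The following is only a minimal sketch of that idea, not our actual server code: the class name, the Caller interface, the key assignments ('1' for leaving a message, '1' to '4' for the menu entries) and the access code 1234 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the call flow; all names here are illustrative. */
class CallFlowSketch {
    enum Step { WAIT_FOR_CALL, INTRODUCTION, MAIN_MENU, QUIT }

    /** Minimal stand-in for the telephone side: supplies touch-tone input. */
    interface Caller {
        char nextKey();          // next touch-tone key pressed
        int enterAccessCode();   // access code typed by the caller
    }

    static final int VALID_CODE = 1234;   // assumed example access code

    /** Handles one call (after Step 1 has delivered it) and returns a trace of actions. */
    static List<String> handleCall(Caller caller) {
        List<String> trace = new ArrayList<>();
        Step step = Step.INTRODUCTION;
        while (step != Step.QUIT) {
            switch (step) {
                case INTRODUCTION:
                    trace.add("play introduction");
                    if (caller.nextKey() == '1') {            // choice (a): leave a message
                        trace.add("record message");
                        step = Step.QUIT;
                    } else {                                   // choice (b): verify access code
                        step = (caller.enterAccessCode() == VALID_CODE)
                                ? Step.MAIN_MENU : Step.QUIT;
                    }
                    break;
                case MAIN_MENU:
                    trace.add("play main menu");
                    switch (caller.nextKey()) {
                        case '1': trace.add("read e-mail"); break;
                        case '2': trace.add("play voice messages"); break;
                        case '3': trace.add("record message"); break;
                        case '4': trace.add("record greeting"); break;
                        default:  step = Step.QUIT;            // Quit, or any unknown key
                    }
                    break;
                default:
                    step = Step.QUIT;
            }
        }
        trace.add("save database, close connection");          // the Quit step
        return trace;
    }
}
```

After the Quit step, the caller of handleCall would loop back to waiting for the next call from signaling.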
[2] – What We Accomplished
Through the course of the semester, we accomplished a great deal. We worked on a variety of different areas:
- Refining interfaces with other teams.
Throughout the semester, we coordinated closely with all the other teams to develop and refine interfaces. The nature of our application was such that we had to coordinate with Datapath, Signaling, Gateway, Directory Services, Billing, and Management to make things work.
- Implementing working interfaces with Datapath, Signaling, Gateway, Billing, and Directory Services.
In particular, we worked closely with the Datapath, Signaling, Gateway, Billing and Directory Services teams. At this time, all the interfaces have been successfully integrated and tested. (More information about interfaces is described in the Interfaces section and slides.)
- Added simulation for other teams.
Because we had to work closely with so many other teams, we added a number of Boolean DEBUG variables (called DEBUG_DATAPATH, DEBUG_GATEWAY, etc…). These variables told the main code whether or not to simulate the other teams. (For example, when simulating, we would just print out "Playing WAV file" instead of actually calling the Datapath function to do it.) In this way, we were able to test our interaction with particular teams even if others were not present.
- Being the central testing application.
The DEBUG variables also allowed us to change dependencies very quickly, without having to comment/uncomment large sections of code. Because of our simulating capability (coding carefully to take the DEBUG variables into account) and because we had carefully defined interfaces to all the other teams, all teams in our group were able to test their functionality through us without having to coordinate with everyone. This made it a lot easier for everyone and allowed us to work even if not everyone was present.
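As a rough illustration of the DEBUG-variable idea, a wrapper around a Datapath call might look like the sketch below. The class and method names are hypothetical, and the real Datapath call is deliberately left out; only the flag-checking pattern is the point.

```java
/** Sketch of the simulation switch; class and method names are hypothetical. */
class DatapathStub {
    /** true = simulate the Datapath team instead of calling their code. */
    static boolean DEBUG_DATAPATH = true;

    /** Plays a WAV file, or only reports it when Datapath is being simulated. */
    static String playWav(String fileName) {
        if (DEBUG_DATAPATH) {
            String message = "Playing WAV file " + fileName;
            System.out.println(message);
            return message;
        }
        // The real call into Datapath's library would go here; it is not part of this sketch.
        throw new UnsupportedOperationException("Datapath is not linked into this sketch");
    }
}
```

Flipping the single flag changes the dependency, so no code has to be commented out or back in.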
- Making the main voice menu system.
We implemented the actual menus and functionality of the voice menu system. Right now, it asks for an initial access code, reads the user’s mail to him or her (body text is extracted from e-mails so that annoying header and attachment information is not read), allows the user to record messages for other users, and lets the user record a personalized greeting that other users hear when they leave messages.
To give feedback to the user, and because of the large number of possible scenarios, we had to sit and record many audio files (altogether we recorded 28 different audio files, including the digits 0 – 9). We took care to record them carefully with a microphone, and we even got outside people to contribute because we thought it would be pleasant to hear voices other than our own in our menu. :-)
Although the audio file recording might sound like a simple task, there are a lot of subtleties involved in recording high-quality audio. Even though the microphones were really excellent, we had to be very careful when recording words that started with "p" because of the air that was sent out. It recorded very poorly. To fix this problem, the speaker made sure to pronounce "p" without letting much air out… "Please press pound" is a hard thing to say…
- Implementing a database to store voice-mail related information.
To keep track of all the information for voice-mail users, we created a special database Java class. This class has full functionality with respect to keeping track of e-mails, messages, access codes, phone numbers, and user names for all the users of our system. In addition, users can easily be added to or removed from the database on the fly – there is no need to recompile code. (Please see the Mistakes We Made section for some things we learned by implementing this database.) The database implements the following functions (which have been copied directly from the code – the function names should be pretty self-explanatory):
Overall Database-Related
public void ClearDatabase()
public void PrintDatabase()
public int LoadDatabase(String path)
public int SaveDatabase(String path)
public void FillVMUser(int index, VMailUser vmu)
public int getMaxEmails()
public int getMaxMessages()
public int getMaxRecords()
User Info Lookups
public int VerifyCode(int code)
public int VerifyNumber(int number)
public String getUsernameForCode(int code)
public String getUsernameForNumber(int number)
public int getIndexForUsername(String username)
E-mail Related
public void ClearEmails(String username)
public int EmailExists(String username, String path)
public String EmailPath(String username, int email_index)
public void AddEmail(String username, String path)
public void DeleteEmail(String username, String path)
public boolean isParsed(int number, String fname)
public void setParsed(int number, String fname, boolean val)
public boolean EmailIsRead(int number, String fname)
public void EmailSetRead(int number, String fname, boolean val)
public boolean EmailIsProcessed(int number, String fname)
public void EmailSetProcessed(int number, String fname, boolean val)
Message Related
public String AddMessage(int number)
public String AddMessage(String username)
public String IsMessage(int number, int msg_num)
public int DeleteMessage(int number, int msg_num)
public boolean MessageIsRead(int number, int index)
public void MessageSetRead(int number, int index, boolean val)
public void addGreeting(int number, String fname)
public void deleteGreeting(int number, String fname)
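To give a feel for how a few of these lookups fit together, here is a minimal in-memory sketch. It is not the actual database class: the internal Record type and the return conventions (index or -1 for VerifyCode, null for an unknown user) are assumptions made for illustration, and persistence is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative in-memory stand-in; not the project's actual database class. */
class VMailDatabaseSketch {
    private static final class Record {
        final int code;
        final int number;
        final String username;

        Record(int code, int number, String username) {
            this.code = code;
            this.number = number;
            this.username = username;
        }
    }

    private final List<Record> records = new ArrayList<>();

    /** Adds a user at run time, so no recompilation is needed. */
    public void AddUserToDatabase(int code, int number, String username) {
        records.add(new Record(code, number, username));
    }

    /** Assumed convention: index of the matching record, or -1 if the code is unknown. */
    public int VerifyCode(int code) {
        for (int i = 0; i < records.size(); i++) {
            if (records.get(i).code == code) {
                return i;
            }
        }
        return -1;
    }

    /** Assumed convention: null when no user has this access code. */
    public String getUsernameForCode(int code) {
        int index = VerifyCode(code);
        return index < 0 ? null : records.get(index).username;
    }

    /** Assumed convention: null when no user has this phone number. */
    public String getUsernameForNumber(int number) {
        for (Record r : records) {
            if (r.number == number) {
                return r.username;
            }
        }
        return null;
    }
}
```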
- Established a semaphore system to ensure mutual exclusion for the database.
To make sure that the database was accessed with mutual exclusion, we implemented a simple system of file semaphores that allowed instances of the voice mail server to know if the database file was in use (being saved/loaded/updated). Because of this, we were prepared to try running multiple instances of the voice mail system at a time for multiple users. However, we didn’t have time to do this because there were more important things to concentrate on.
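One simple way to build such a file semaphore in Java is to treat the existence of a lock file as the "in use" signal. The sketch below shows that general technique under assumed names; it is not our actual code. Note that the Java documentation recommends FileLock over createNewFile for robust locking, so this should be read as an illustration only.

```java
import java.io.File;
import java.io.IOException;

/** Sketch of a lock-file semaphore; the class name and lock-file path are hypothetical. */
class FileSemaphoreSketch {
    private final File lockFile;

    FileSemaphoreSketch(String lockFilePath) {
        this.lockFile = new File(lockFilePath);
    }

    /**
     * Tries to take the semaphore. File.createNewFile creates the file atomically
     * and returns false if it already exists, i.e. if another instance holds the lock.
     */
    boolean tryAcquire() {
        try {
            return lockFile.createNewFile();
        } catch (IOException e) {
            return false;   // treat I/O trouble as "could not acquire"
        }
    }

    /** Releases the semaphore by removing the lock file. */
    void release() {
        lockFile.delete();
    }
}
```

A server instance would call tryAcquire before loading or saving the database, retry after a short pause if it returns false, and call release when done.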
- Worked on test application to check the interaction between Datapath and Signaling.
Although this wasn’t part of our job, since we were the first application group to come up with working interfaces with signaling, we along with Management and Datapath team members developed a simple client-receiver application to make telephone calls. This application was based on our application design and interface. This application was used to test communication between Datapath, the Application and Signaling (which essentially was a successful test of our interface and design).
- Worked on extracting text from e-mails.
We wrote routines to look at e-mails and extract their body text. These routines work even for e-mails with MIME and UUencoded attachments. In addition, irrelevant header information is stripped off so that it is not read out, which would be time-consuming and annoying.
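The basic idea can be sketched as follows, assuming the usual conventions that the header block ends at the first blank line and that a uuencoded attachment sits between a "begin <mode> <name>" line and an "end" line. This is a simplified illustration with a hypothetical class name; MIME multipart handling is omitted here.

```java
/** Simplified sketch of body extraction; MIME multipart parsing is not handled. */
class BodyExtractorSketch {
    static String extractBody(String rawMessage) {
        String[] lines = rawMessage.split("\r?\n", -1);
        StringBuilder body = new StringBuilder();
        boolean inHeaders = true;
        boolean inUuencode = false;
        for (String line : lines) {
            if (inHeaders) {
                if (line.isEmpty()) {
                    inHeaders = false;          // first blank line ends the headers
                }
                continue;
            }
            if (inUuencode) {
                if (line.equals("end")) {
                    inUuencode = false;         // attachment finished
                }
                continue;
            }
            if (line.matches("begin [0-7]{3,4} \\S.*")) {
                inUuencode = true;              // start of a uuencoded attachment
                continue;
            }
            body.append(line).append('\n');
        }
        return body.toString().trim();
    }
}
```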
- Worked on text-to-speech synthesis.
We went through the various available utilities, software development kits, and executable programs to find out which would be most convenient for us to use. Our final choice was the Microsoft Text-to-Speech Development Kit (MSAPI). (Please see the Mistakes We Made section for a discussion of possible alternatives.) We needed to develop a program to transform text into sound files (to be played out to voice-mail clients as requested). The big problem here was to make the program change the e-mail text into a WAV file silently… none of the code given with MSAPI seemed to be able to do this. However, we found a sample program called destfile that allowed the user to enter some text and save it as a WAV file. We hacked this code up mercilessly to convert this horribly verbose program into a silent wonder-worker called speak (pronounced "speeek").
The only problem was that, for some strange reason, the program would only write the WAV file if a message box of some sort popped up. Mysteriously, if there was no message box, a file of size zero bytes was always created. No one we asked could understand why that was… So, we were forced to have one message box open. (Of course, no one would really know about this in the first place, since any windows produced by the program would be invisible to the user on the phone.) But, since we really wanted to keep the program quiet (to prevent wastage of system resources and [ahem] system instability), we made it quiet by running another program, called kill. By the careful use of semaphore files and network communication, the programs were able to communicate with each other, and kill would terminate spiik only when the desired WAV file had been output. This way we were able to get the WAV file output we desired while keeping the conversion program as silent as we possibly could. Altogether, one heck of a hack – wickedly warped but worked wonderfully! :-)
(Please see the Things We Learned section for more about our thoughts on Windows programming.) [Added later: it turned out we were able to remove the kill program using thread termination (a highly unrecommended approach… but it worked!).]
- Learned how to deal with format conversions.
Our Datapath and Gateway teams decided to use the CCITT μ-law format (8-bit, 8 kHz) to play and record WAV files. Unfortunately for us, our speak program’s output was 16-bit and 22 kHz! Because of this, we needed to find a converter program that was quiet and could be invoked using a system call. We ended up finding what seems to be the most popular format conversion program on the Web – SoX. SoX stands for Sound Exchange, and it was recommended to us by a number of different people (famous sound-conversion-type people from around the world, including Australia, New Zealand, Spain, and the United States). SoX was able to do the desired conversion for us, but we had to modify Datapath’s format to use a raw format instead of one with a header. In the course of the project, we ended up e-mailing Chris Bagwell, the current author of SoX, to ask him if he could help us with some of the conversion problems we were having (this happened because our group decided to encode in CCITT μ-law instead of PCM as we had originally planned). We have to say that Chris was amazing in his response time – he replied to our e-mail within minutes, and the next day he had responded to our follow-ups.
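For readers curious about what the 16-bit to 8-bit part of this conversion involves, the sketch below shows the widely used G.711 μ-law companding of a single 16-bit PCM sample. We relied on SoX for the real conversion, so this is only an illustration under a hypothetical class name; it does not cover the resampling from 22 kHz down to 8 kHz.

```java
/** Sketch of standard G.711 mu-law encoding for one 16-bit linear PCM sample. */
class MuLawSketch {
    private static final int BIAS = 0x84;    // 132, added before finding the segment
    private static final int CLIP = 32635;   // largest magnitude that survives the bias

    static byte linearToMuLaw(short pcm) {
        int sample = pcm;
        int sign = (sample >> 8) & 0x80;      // 0x80 for negative samples, else 0
        if (sign != 0) {
            sample = -sample;                 // work with the magnitude
        }
        if (sample > CLIP) {
            sample = CLIP;
        }
        sample += BIAS;

        // The segment (exponent) is the position of the highest set bit above bit 7.
        int exponent = 7;
        for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1) {
            exponent--;
        }
        int mantissa = (sample >> (exponent + 3)) & 0x0F;

        // mu-law bytes are stored with all bits inverted.
        return (byte) ~(sign | (exponent << 4) | mantissa);
    }
}
```

With this encoding, digital silence (a sample of 0) maps to the byte 0xFF.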
- Developed method to store and send voice messages.
To store and send recorded voice messages, we came up with an elaborate scheme… First, the message was recorded from the phone into a WAV file using functions from Datapath. This WAV file was then stored so that it could be played back when the recipient checked his or her voice mail. In addition, the WAV file was sent as an e-mail attachment to the user so that he or she would be able to hear any voice messages while sitting at his or her computer. To make the actual attachment, we wrote a small C program that runs constantly on babbage and uses "uuencode" to attach the WAV file. This program communicates with our program through sockets (the valuable tool we learned from Puzzle 1).
- Established a mail server.
We found a shareware mail server program that we use to provide mail services for our voice-mail users. Voicemail users forward their e-mail to the special voicemail e-mail account we provide them. We keep the mail server running on CS519PNT. Initially we didn’t realize that this was part of our project, and we were not quite clear on how we would handle the e-mail aspect of the project. But, after talking with Christian and thinking carefully about what was required, we realized that we needed to set up a mail server. We searched the web to find a free mail server that allowed us to create and maintain accounts, as well as to store e-mails conveniently.
- Used unix mail programs.
To make life and implementation easier for us, we wrote a small C++ program that ran on a port on babbage. This program is sent a user name and a file name when a message is received by our voicemail application. Upon receiving the username and filename, the C++ program opens an FTP connection, gets the file, uuencodes it using the unix uuencode utility, and then mails it out to the user using mailx.
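On the Java side, notifying such a helper program amounts to opening a socket and sending the two values. The sketch below illustrates this; the one-line "username filename" wire format, the class name and the method names are assumptions for illustration, since the actual message format is not reproduced in this report.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

/** Sketch of the client side; the wire format shown here is an assumption. */
class MailerClientSketch {
    /** Builds the (assumed) request: user name and file name on one line. */
    static String requestLine(String username, String fileName) {
        return username + " " + fileName + "\n";
    }

    /** Connects to the helper program and sends it the request line. */
    static void notifyMailer(String host, int port, String username, String fileName)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), "US-ASCII")) {
            out.write(requestLine(username, fileName));
            out.flush();
        }
    }
}
```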
[3] – Problems We Encountered
Essentially, the problems we encountered were the ones that we addressed in the What We Accomplished section (i.e., finding a mail server, getting interfaces fully refined, making a database, etc.). The following are the biggest problems, for which we would have liked to have found more elegant solutions:
- The text-to-speech conversion:
as described in the previous section, we had to come up with some pretty convoluted hacks to make our converter program quieter.
- Multiple sessions at a time:
we initially planned to have multiple sessions at a time, as an extra feature of the voice-mail system. Much of the necessary underlying structure is already in place, such as the use of semaphores and support for threading and multiple instances. One of the main problems was that we would have had to refine some of the interfaces with other teams further. In the meantime, however, they were working on principal functionalities with other teams. Because there were more important issues to deal with, we decided not to fully implement this idea in favor of improving other areas.
- Windows NT vs. Windows NT Terminal Server:
Throughout the project, we had been using Windows NT Terminal Server Edition so that we could work remotely on CS519PNT. On the night of last Tuesday (December 15, 1998), our management found that Visual J++ 6.0 was no longer working properly on CS519PNT. They tried to re-install the development environment repeatedly but had no success. Because of this, they had to re-install Windows NT 4.0. For some reason, some of our Java code then no longer worked because threads were no longer functioning properly. In addition, we had been relying on the "kill" system call to end our spiik program (as described in the What We Accomplished section) – but Windows NT 4.0 does not have such a command-line system call. Thus we had to find a new workaround… This is still in progress at the time of writing.
[4] - What We Learned
- Planning Interfaces.
We learned that taking an active role in interface development was very important, especially since we were an application group. Because of this we were able to tailor our application to the capabilities offered to us by the other groups.
Believe it or not, we learned a great deal about Windows programming from the speak program and the conversion that we did to make it work. Essentially, we had to remove all the interface components of the program. This involved going through the code and understanding where messages were passed and received, and what events were associated with them. In addition, we learned how to make dialog boxes and to get information from radio buttons and text boxes! All this from one program… :-)
One of the most important things we learned, as lame as it sounds, is that in a large group, it is important to take an active role in everything, not only in planning interfaces. Because we did so, we always knew what was going on, what was being planned, and what we could expect from the other groups. In particular, we worked very closely with the Datapath, Gateway, Signaling, and Management teams throughout the development process. This helped us to understand what was going on when things didn’t work, for example, because we actually had some knowledge of what was going on underneath the other teams’ interfaces. In addition, it was a lot of fun getting to know and working together with the other members of our team. :-)
We divided up the work into two main parts: one person (Joy) worked mainly with interfaces and team-to-team coordination, and the other person (Randy) worked with the database and application development. At the same time, we made sure to understand what the other person was doing, so that each of us could effectively communicate with other group members on our own. In many cases, the division was not rigid because we spent a lot of time working together on code, interfaces, and coordinating with other teams. The division was only there to help us to get work done in a more organized fashion.
Throughout the course of the project, we made a ton of backups. At the end of each day, we made a folder in our backup location and gave it a long description with the date and the latest modifications. We also kept these backups in different places to make sure that we absolutely minimized any chance of losing our valuable data.
By doing this project, we learned a great deal about Java. We were very impressed with Java’s flexibility and ease of use. In particular, Visual J++ 6.0 is really quite amazing… with all its Microsoft IntelliSense™ technology. For example, we were able to find out parameters for functions without even opening up other windows (because J++ automatically provides "hints"). We also learned about Java’s excellent and easy-to-use networking facilities, which were very helpful.
[5] - Mistakes We Made
- Implementing the database ourselves.
If we had been more knowledgeable or had taken a course in databases, we would have used a third-party database such as Microsoft Access instead of writing our own. It turned out that the class structure we picked was not quite optimal – because of this we ended up having to write more code than we needed to, and the database structure was a little more complex and confusing internally than it needed to be. This made adding new data structures and functions a little annoying. However, this did not pose any problem to the outside because, at the top level, the database was fully functional and easy to use.
- Choosing MSAPI for text-to-speech.
Although this was not really a "mistake," we believe we were perhaps a little hasty with our choice of MSAPI. It turned out to be quite difficult to use and to understand what was going on in the code. This was for three main reasons: the code was poorly commented, the header files were poorly commented, and the documentation was not very clear. Perhaps the main reason for all this was our inadequate knowledge of Windows programming. We had a difficult time following all the structures and declarations that were used to make MSAPI work, and so we ended up having to "hack" more than we wanted to. If we had really understood it, we should have been able to avoid all the hacks we had to go through. In retrospect, some of the other choices, such as Eloquence, might have been better. Of course, we did learn a great deal about Windows programming and hacking because of MSAPI…
[6] - Interfaces
From Gateway:
Our interface with Gateway was primarily directed towards tone detection. Eventually, however, to facilitate the record and playback options, we also used the getDigits function to signal to Gateway when to send us packets and when to receive packets from us, in addition to its usual tone-detection purpose.
//Constructor
public gatewayRequest()
//Tone Detection
//We also use this function to let gateway know we want to play out a file (i.e. request 8 digits),
//inform it of a hangup (i.e. request 9 digits), or inform it of our intention to record to a file
//(i.e. request 0 digits)
public static String getDigits (int linenumber, int numberdigits)
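Because the digit count doubles as a command, giving the special values names makes calls easier to read. The sketch below is only a suggestion: the constant names and the helper method are hypothetical, while the values 0, 8 and 9 are the ones described in the comments above.

```java
/** Hypothetical named constants for the special digit counts passed to getDigits. */
class GatewayRequestCodes {
    static final int RECORD_TO_FILE = 0;   // request 0 digits: we intend to record
    static final int PLAY_FILE = 8;        // request 8 digits: we want to play out a file
    static final int HANGUP = 9;           // request 9 digits: inform gateway of a hangup

    /** Describes what a given numberdigits argument asks the gateway to do. */
    static String meaningOf(int numberDigits) {
        switch (numberDigits) {
            case RECORD_TO_FILE: return "record to a file";
            case PLAY_FILE:      return "play out a file";
            case HANGUP:         return "hang up";
            default:             return "detect " + numberDigits + " touch-tone digits";
        }
    }
}
```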
To Signaling:
We were the first application to have a working interface with signaling. The signaling interfaces for the other applications were based on our design. To that end, we had to spend a significant amount of time coordinating with the signaling team to make the interface simple, robust and easy to build upon.
The Signaling interface basically provides call initiation functions, call hangup functions and a status function.
//Constructor
public VMail_Accept( ){
}
//if there's a call for voicemail...call this function
//it will return 1 if voicemail accepts the call
//0 otherwise
public int Accept_call( );
// starts a call
//the flag should be 1 if it's a telephone to pc call
//otherwise it should be 0
//if flag is 1 then
//port should be the port number or line number to connect
//to gateway with, and IP_addr should be Gateway's IP_addr
//if flag is 0 then port should be the port of the remote
//computer trying to connect to us....ip_addr should be the
//IP addr of the remote computer
public int start_call(int port, int flag, String IP_addr);
// lets us know when a call has ended
public int Vmail_Hangup(int line);
// lets us know when a call has ended
public int Vmail_Hangup(String IP, int port);
// this function returns 1 if a connection is alive
//0 if the connection is not alive
//The IP_addr is the IP_addr of the computer we are connected to
//If we are connected to a telephone, then the IP_Addr is gateway’s IP
//and the line number is the line number we are connected to
public int isAlive(String IP, int line)
From Signaling:
Again, we were the first app to coordinate with signaling on what functions they should give us. Since we never dialed a user, we never needed a dial function from them. We coordinated with them on the specifics of a Voicemail Hangup function, which we use to tell signaling our intention to hang up.
//We call this function to let signaling know we want to terminate a connection
public boolean vMailHangup(String destIP, String port, String lineNo)
From Datapath:
We used Datapath’s DLL to play out a wave file, save to a wave file, and play out raw audio data. The following are the functions they provided to us.
//We call this function to let datapath know who we want to connect to
public static native boolean AddSenderTo(byte[] destAddress, int iPortNum);
//This is the wave file we want to play out. This can also be a raw audio file but the extension must be .raw
public static native boolean SetLocalInputFile(byte[] fileName);
//This is the file we save to when we record a message
public static native boolean SetLocalOutputFile(byte[] fileName);
//After calling the above functions and initializing as necessary we call
//this function to establish a connection and initiate packet transfer
public static boolean CreateConnection();
From Management:
Management basically needed functions to get our current users, get their status, add users to our database, and delete users from our database. Due to time constraints we were not able to provide the delete users function, but all the other functionality was provided. They provided us with a thread that periodically queried us about our status.
//Initialize Management Thread with an instance of us
public PMServerVoiceMail(Vmail_Server whomadeus);
//Run the management Thread
public void run ()
To Management:
//Management calls this function from their thread. We pass an instance
//of us to them and they can access our public functions
public void getVMailStat(VMailStat v);
//Add a user to our database
public void AddUserToDatabase(int code, int number, String username) ;
From Directory Services
We call Directory Services to register our location with the directory so that signaling knows which calls are Voicemail calls and which are not.
If we crash, we need to deregister ourselves and then register again.
// adds new user to directory
// Email: user's email
// Extension: user's extension -- used to uniquely identify the user with a number for phone-to-PC calls
// Password: password, may be empty
// Week: 7-bit array. Each bit represents one day of the week (0 - Monday, ...). 1's define one
// group of days and 0's another. Example: 0000011 -- Monday- Friday are group 0; Saturday, Sunday are group 1
// Group0Time?: 3 numbers between 1 and 24 used to split each day in group 0 into 4 time periods.
// Ex: Group0Time0 is 9, Group0Time1 is 17, Group0Time2 is 20 means that period 0 is midnight-9am,
// period 1 is 9am-5pm, period 2 is 5pm-8pm, period 3 is 8pm-midnight.
// If the user does not want to specify all 4 time periods, just set unwanted Group0Time? arguments to 24
// Ex: Group0Time? are all 24 means that the whole day is just one period
// Group1Time?: same thing as Group0Time?, but for days in group 1
// returns true if succeeds, false if fails
boolean AddUser(String Email, int Extension, String Password, BitSet Week, byte Group0Time0, byte Group0Time1, byte Group0Time2, byte Group1Time0, byte Group1Time1, byte Group1Time2);
boolean RemoveUser(String Email, String Password);
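The Week and Group?Time? arguments are easier to follow with a small example. The sketch below shows how the arguments could be built and how an hour maps onto a period. The class and helper names are hypothetical, and it assumes the four periods are numbered 0 through 3 and that hours run from 0 to 23.

```java
import java.util.BitSet;

/** Sketch of the schedule encoding described above; helper names are hypothetical. */
class SchedulePeriodSketch {
    /** Builds a Week value where Monday-Friday are group 0 and the weekend is group 1. */
    static BitSet weekendAsGroupOne() {
        BitSet week = new BitSet(7);   // bit 0 is Monday, bit 6 is Sunday
        week.set(5);                   // Saturday belongs to group 1
        week.set(6);                   // Sunday belongs to group 1
        return week;
    }

    /** Returns the period (0-3) that the given hour (0-23) falls into. */
    static int periodForHour(int hour, byte time0, byte time1, byte time2) {
        if (hour < time0) return 0;
        if (hour < time1) return 1;
        if (hour < time2) return 2;
        return 3;
    }
}
```

With boundaries 9, 17 and 20, 10am falls into period 1; with all three boundaries set to 24, every hour falls into period 0, so the whole day is a single period.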
Billing:
We do not directly communicate with Billing. Signaling takes care of that on our behalf.
[6.5] – System-Sock :-)
This is a C++ program that ran perpetually on cs519pnt. Its basic task was to listen on a socket for a signal from our main VMail_Server application. Upon receiving a signal it called our MSAPI program that converted text-to-speech and then converted the wave files thus created into raw audio data (using Sox).
[7] – Advice for the Course Staff
- Make the puzzles easier/have fewer of them.
It would have been nice to have been able to start earlier than we did on the final project. Even a month was really not enough time to get it going as well as we wanted, combined with all our work from other classes. We tried very hard to get an early start on the final project, and, as is usually the case, we ended up in the "almost finished" state for a very long time. The pass/fail nature of the puzzles is quite demanding, and we know that anyone who got a zero after spending many, many hours on the puzzles probably felt quite miserable. At the same time, we must admit that we learned a great deal from the puzzles… our knowledge of socket programming helped us greatly in doing the final project.
- Try to design the puzzles so that babbage will not die.
Perhaps this recommendation complements the previous one… No matter what anyone says, it was impossible for more than a few people to run the full routing simulation at a time on babbage. Even though it was possible to try small configurations, it would truly be foolish to hand in a pass/fail assignment without testing the actual configuration, which involved 12 routers. Even though babbage is nicely loaded with memory, four processors are not enough to support all the 519 students, or even 10 of them if they were all running their full routing protocol at once.
- Good materials and staff help.
We felt that the course staff were helpful in getting us to know what was going on. The system of having three progress reports was a good idea because it helped us to understand what was going on and to see what we needed to do to get rolling properly.
[8] - References
http://java.sun.com:81/products/java-media/speech/forDevelopers/jsapi-guide/Synthesis.html#7621
http://java.sun.com:81/products/jtapi/implementations.html
http://www.real.com/devzone/library/creating/rmsdk/index.html
http://www.cs.cornell.edu/cs519/
Schulzrinne and Rosenberg, Internet Telephony: Architecture and Protocols an IETF Perspective
E-mail with Chris Bagwell (the author of Sox) and a number of other format-conversion people
APPENDIX A
Flowchart of VoiceMail System
Every stage after STAGE 1 listens for hang-up signal from signaling.