Group number and name: Group 5, MIA

Team members: Kevin Wood and David Patariu

Task number and name: Team 7, Voice-email Gateway

  1. What did you set out to do? (1-2 pages) Use a figure and accompanying text to describe what you set out to do. You can reuse part of your first report if you wish. You should have sufficient detail so that someone who is not familiar with the project would still be able understand the scope of your work. For instance, imagine giving this to a friend who is not in the class--write something that this person would be able to understand without having to ask you for help.

 

The voice-email gateway is an application that lets a user call in to retrieve email messages, and also to send email messages to other people. The application accepts input in the form of tones produced by a dialing pad and gives spoken feedback, for example prompting the user to log in or asking which email message should be read aloud. If the user chooses, they can check their email and listen to the content of each message. Security is maintained through passwords (pass numbers) that the user punches into the dial pad at the start of the call. These IDs and passwords are stored in a directory of users, along with information about where to send mail for each user and which server they prefer to download mail from.

To do this, a text-to-speech engine was found. This is a software engine that takes as its "fuel" words or strings and produces audio output to a destination, which could be a file, a telephone, or a speaker. The gateway also encodes voice messages into files that are sent as email attachments: a user can call the application, choose to leave a voice-email message, speak into the phone to record it, and have the recording sent to a particular person as an attachment.

Another aspect of the application is detecting input from a dial pad on a phone, or from a visual representation of a dial pad on a computer. This is difficult because there are only twelve keys to use, so the application must present clear choices to the user, for example: "Press one to send an email message, press two to check your email…". Once the user selects a "path", the application must respond, or the user might think that it is "stuck". The application therefore has a state-machine-like quality: choosing different "paths" moves it into different states, each of which presents different choices to the user.
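The state-machine quality described above can be sketched in a few lines of Java. The states, prompts, and digit assignments below are illustrative only, not the actual implementation:

```java
// Hypothetical sketch of a DTMF-driven menu state machine for the gateway.
public class MenuStateMachine {
    enum State { MAIN_MENU, SEND_MESSAGE, CHECK_MAIL }

    private State state = State.MAIN_MENU;

    // Handles one collected digit and returns the prompt the gateway would speak next.
    public String handleDigit(char digit) {
        switch (state) {
            case MAIN_MENU:
                if (digit == '1') { state = State.SEND_MESSAGE; return "Record your message after the tone."; }
                if (digit == '2') { state = State.CHECK_MAIL;   return "You have new messages. Press a digit to select one."; }
                return "Press one to send an email message, press two to check your email.";
            case SEND_MESSAGE:
                state = State.MAIN_MENU; // any digit ends recording in this sketch
                return "Message sent. Returning to the main menu.";
            case CHECK_MAIL:
                return "Reading message " + digit + ".";
        }
        return "";
    }

    public static void main(String[] args) {
        MenuStateMachine m = new MenuStateMachine();
        System.out.println(m.handleDigit('2')); // choose the "check email" path
        System.out.println(m.handleDigit('3')); // select message 3
    }
}
```

Each digit both produces feedback and may move the machine into a new state, which is exactly why the application never appears "stuck" to the caller.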

The application would then need to be able to send email and check an email account, listing messages in the account and allowing a user to choose a particular message to listen to.

 

2. What did you actually accomplish? (2-3 pages) Write down exactly what you managed to get done by the time of the demo. Did you achieve all the goals you set out to do? Did you do something that you didn't expect to be doing when you started?

We have a working null implementation that emulates DTMF events through a GUI dialing pad. The dialer collects button events much like the DTMF collector we implemented. We feel this framework would be adequate if the connections worked. DTMF events are so similar to GUI events that only one line of code per event would need to change to convert a button event into a DTMF event:

Button:

    public boolean action(Event evt, Object o) {
        if (evt.target == zero) {
            digitCollection = new String(digitCollection + "0");
        }
        ...
    }

-- change to --

    public void handleConnectionEvent(ConnectionEvent evt, Connection conn) {
        if (evt instanceof DTMFEvent) {
            char digit = ((DTMFEvent) evt).getDTMF();
            System.out.println("detected DTMF: " + digit);
            if (digit == '0') {
                dialerTextField.setText(dialerTextField.getText() + "0");
                digitCollection = new String(digitCollection + "0");
            }
        }
    }

In implementing the dialer GUI, we created two versions, one in Swing and one in standard AWT. The Swing version was dropped due to compile problems under JDK 1.2; see the problems section for further details.

The dialer is used to collect digits for the state machine.

Email functionality was included: we can send and receive email, connect to the signal server, and use MIME attachments. Text-to-speech using Microsoft's SDK also works, as does saving WAV files and producing WAV files both from the text-to-speech engine and from a microphone.

 

 

3. Problems: (1-2 pages) It is possible that you did not actually accomplish what you set out to do:-( If so, why? What problems did you run into? Include both technical problems, and problems in coordinating your work with others. Be as specific as you can about technical problems. This is the only way we have to determine whether you really tried to solve the problem, instead of just giving up with token effort.

 

We did not accomplish getting voice-email to work with the connection objects provided. A testing application called MIAphone was provided. Unfortunately, it used the Jaudio API, which had only been implemented over the preceding weekend. The testing application was "buggy" in that it needed to run on machines with full-duplex capability, which restricted its use to 5 machines in the entire CSUGLAB. It was additionally difficult to find a good build of MIAphone, due to constant updates and fixes. Two instances of MIAphone could not run on the same machine because of resource "grabbing", in which one app would starve. Thus, testing DTMF over our MIA's framework was not possible. Earlier in the semester we had proposed an alternative form of testing that would have used a modem to emulate a null connection object, but we were strongly discouraged from doing so. In retrospect, knowing that the logic of our tone detection is very sound, we regret not ignoring the advice given and proceeding with our original testing paradigm.

Actual recording of messages was not integrated into the final application. We are able to record via a microphone connected to a desktop machine, but since we could not test through the gateway or via MIAphone, it was left out of the final project. One must wonder how some of the test applications were developed. In the CSUGLAB, only one machine was available per group; if the testing applications had been developed in the CSUGLAB where the groups had access, their functionality would have reflected this. Instead, they were most likely developed in the Systems Lab, where multiple computers and resources were available.

Interaction with Directory services was not implemented. Authentication mechanisms were in place in the hope that directory would provide password verification, but this was not to be. The morning of the demo we were verbally notified by management that the only service directory could provide was lookup of user ID, which was not what we had asked for, nor was it even remotely useful. If they had provided lookup of user ID, password, and POP3 server, we would have attempted to integrate our applications. Any integration based on the functionality they did provide would have been an empty exercise, adding no utility or security to our application.

Interaction with the real audio server was extensively researched but not realized. This was an unfortunate casualty of the integration problems we encountered, as well as of the scarcity of examples using the technology. As a group we feel it is ironic that we overcame the technical hurdles of converting C and C++ into a usable form for our Java applications, dealing with WAV files, MIME attachments, and other problems, only to lack the critical integration support from management. I empathize with having a lot of course obligations, but integration was a large part of the responsibility they assumed.

The largest problem of the project was that we were using Java, while most of what we needed to use was in C, and there was little adequate documentation for the C code we did need.

For mail, the APIs turned up only after we had a working implementation using SMTP, POP3, and MIME. We switched to the cleaner API, which, unfortunately, we were then told was a trivial part of our application to get working.

 

In implementing the dialer GUI, we created two versions, one in Swing and one in standard AWT. The Swing version was dropped due to compile problems under JDK 1.2. The errors were as follows:

 

Exception occurred during event dispatching:

java.lang.UnsatisfiedLinkError: Speak

at GUI.action(GUI.java:172)

at java.awt.Component.handleEvent(Compiled Code)

at java.awt.Window.postEvent(Compiled Code)

at java.awt.Component.postEvent(Compiled Code)

at java.awt.Component.postEvent(Compiled Code)

at java.awt.Component.dispatchEventImpl(Compiled Code)

at java.awt.Component.dispatchEvent(Compiled Code)

at java.awt.EventQueue.dispatchEvent(Compiled Code)

at java.awt.EventDispatchThread.run(EventDispatchThread.java:68)

 

 

Exception occurred during event dispatching:

java.lang.UnsatisfiedLinkError: Speak

at GUI$13.actionPerformed(GUI.java:258)

at javax.swing.AbstractButton.fireActionPerformed(Compiled Code)

at javax.swing.AbstractButton$ForwardActionEvents.actionPerformed(AbstractButton.java:1101)

at javax.swing.DefaultButtonModel.fireActionPerformed(Compiled Code)

at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:250)

at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:217)

at java.awt.Component.processMouseEvent(Component.java:3126)

at java.awt.Component.processEvent(Compiled Code)

at java.awt.Container.processEvent(Compiled Code)

at java.awt.Component.dispatchEventImpl(Compiled Code)

at java.awt.Container.dispatchEventImpl(Compiled Code)

at java.awt.Component.dispatchEvent(Compiled Code)

at java.awt.LightweightDispatcher.retargetMouseEvent(Compiled Code)

at java.awt.LightweightDispatcher.processMouseEvent(Compiled Code)

at java.awt.LightweightDispatcher.dispatchEvent(Compiled Code)

at java.awt.Container.dispatchEventImpl(Compiled Code)

at java.awt.Window.dispatchEventImpl(Compiled Code)

at java.awt.Component.dispatchEvent(Compiled Code)

at java.awt.EventQueue.dispatchEvent(Compiled Code)

at java.awt.EventDispatchThread.run(EventDispatchThread.java:68)

 

After reading several newsgroup articles, we concluded that a possible cause of the above error was an old version of the SDK or JDK. This was not the case, since we made a point of keeping the software used in this project current. Restoring the AWT version of the dialer pad in DevStudio "fixed" the problem. This may be a classic case of the type of complaint waged between Sun and Microsoft regarding Java: different features work depending on which compiler is used, when in theory there is little reason the code should not work under both implementations.

 

 

4. What you learnt: (1-2 pages) What did you learn by working on the project? Include details on technical things you learnt, as well as what you learnt about dealing with large projects, collaborating with other busy people, and dealing with other coursework.

 

Software APIs

-J/Direct

J/Direct enables one to use the native capabilities of the operating system a process is running on. The major feature that differentiates J/Direct from COM or RNI is that it removes a layer of abstraction by allowing "direct" calls into DLLs. The absence of a translation layer between the call and the library gives J/Direct performance benefits relative to the other options available. The classic example of J/Direct in action is showing a message box. Problems arise when one tries to map data types between Java and the native C or C++ implementation. Java does not support pointers to functions (callbacks), which is also a problem in porting native code, and structures and pointers are problematic as well. These difficulties led us not to use J/Direct in our implementation.

-JNI

The Java Native Interface API was explored as a possible means of using Microsoft's Speech SDK from our Java framework. One-hundred-percent-pure Java was not possible for this type of project, mainly because the available speech engines do not provide a Java interface. Although an implementation of the Java Speech API specification by IBM, using the ViaVoice product, exists, it was not available to students in the course. JNI would seem to provide the solution, integrating native C and C++ with Java. Issues arose with correctly representing types in C++, as well as with determining how much of the speech SDK should be integrated. Although we did not use this technology, we explored it in depth to assess feasibility. Several excellent resources contain thorough descriptions of how to use JNI, including Essential JNI by Rob Gordon, Tricks of the Java Programming Gurus by Glenn Vanderburg, and the native 1.1 tutorial "Overview of JNI" at http://www.java.sun.com/docs/books/tutorial/native1.1/concepts/index.html. "Overview of JNI" was the most accurate on how to prepare and build a Java program with native methods using Sun's JDK.
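The Java side of a JNI binding is small; the difficulty lies in the C side. The sketch below shows only the Java half, with a hypothetical library name "speech" and a hypothetical speak() method. Without the compiled native library, loading fails with the same UnsatisfiedLinkError shown in the problems section:

```java
// Hypothetical Java side of a JNI binding to a native speech engine.
public class NativeSpeaker {
    public native void speak(String text); // implemented in C against the speech SDK

    static boolean loaded;
    static {
        try {
            System.loadLibrary("speech");   // looks for speech.dll / libspeech.so
            loaded = true;
        } catch (UnsatisfiedLinkError e) {
            loaded = false;                 // no native library on this machine
        }
    }

    public static void main(String[] args) {
        // The matching C function would be generated from a javah header as:
        //   JNIEXPORT void JNICALL Java_NativeSpeaker_speak(JNIEnv *, jobject, jstring);
        System.out.println("native library loaded: " + loaded);
    }
}
```

The build sequence under the JDK of the time was: compile with javac, generate the header with javah, then compile the C implementation against jni.h into a shared library.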

-COM

Integration of Java and COM objects in Microsoft's Java development environment was used in completing this project. COM, with its language independence, and Visual J++, with its built-in integration tools, made this a straightforward technology to use. The Java Type Library Wizard in Visual Studio makes adding COM objects to a project simple. To understand this technology, several small C++ programs were written and converted into COM objects. The procedure involves creating a DLL "project", adding methods and/or parameters, and, after compiling, using the JavaTLB utility to create the Java-callable wrappers.

-ATL Components

In searching for ways to convert C++ classes into usable objects in Java, we also studied ATL components. ATL, as stated by "Dr. GUI", was originally designed for implementing small COM objects. ATL boasts several features that made it an interesting solution to our integration problems: it lets one use all the features of C++, has an abstract interface, and is automated via Microsoft's Visual Studio. Another feature of ATL is speed, since it does not rely on runtime libraries. Functionality was more critical in this project than speed, so this did not factor into our decision to explore ATL, but it is a quality worth noting. Creating an ATL component in Visual C++ involves selecting the ATL COM AppWizard when creating a project, selecting "DLL" under server type, and adding COM components, methods, and properties.

ActiveX/DirectX

-DirectSound

The package com.ms.directX of the Microsoft SDK for Java proved useful for testing and playing WAV files. The method getFormat could be used to parse out the byte data in a loaded WAV file. The class DirectSoundBuffer contained most of the useful methods for playing and parsing WAV data. Interestingly, the DirectSound classes did not contain methods to record WAV files from a source, a glaring omission. WAV files can be loaded from buffers or files, though, via methods in the DirectSoundResource class.
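For reference, the header layout that this kind of WAV parsing relies on can be sketched in plain Java; the 8 kHz, 16-bit mono parameters below match the phone-quality files used in this project, and the canonical 44-byte layout is assumed:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of the canonical RIFF/WAVE header and of pulling the sample rate out of it.
public class WavHeader {
    // Builds a 44-byte PCM WAV header for an empty 8 kHz, 16-bit, mono stream.
    static byte[] sampleHeader() {
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes()).putInt(36).put("WAVE".getBytes());
        b.put("fmt ".getBytes()).putInt(16); // fmt chunk is 16 bytes for PCM
        b.putShort((short) 1);               // audio format: PCM
        b.putShort((short) 1);               // channels: mono
        b.putInt(8000);                      // sample rate: phone quality
        b.putInt(8000 * 2);                  // byte rate = rate * channels * 16/8
        b.putShort((short) 2);               // block align
        b.putShort((short) 16);              // bits per sample
        b.put("data".getBytes()).putInt(0);  // empty data chunk
        return b.array();
    }

    // The sample rate sits at byte offset 24 in the canonical header, little-endian.
    static int sampleRate(byte[] wav) {
        return ByteBuffer.wrap(wav, 24, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        System.out.println("sample rate: " + sampleRate(sampleHeader()) + " Hz");
    }
}
```

Parsing fields out at fixed little-endian offsets like this is essentially what a getFormat-style call does.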

Swing

Swing, touted as the next-generation GUI toolkit, was studied for use in our null implementation, which uses a dial pad and an email interface. Swing evolved as a solution to the limited functionality of the core AWT libraries and sports many advantages over them. Notable features include a pluggable look and feel, lightweight components that do not depend on native peers to render themselves, and additional components such as slider and password classes. An additional bonus is that the base types follow a similar naming convention: a Swing panel is a JPanel, where an AWT panel is a Panel. Sticking a J in front of common component names makes the adjustment between the AWT and Swing simple. Swing was not used in the final null implementation because of problems compiling and linking our speech classes under JDK 1.2; see the problems section for further details.
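The naming parallel can be shown without ever creating a window (class lookups only, so it also runs headless):

```java
// Demonstrates the AWT-to-Swing naming convention: Swing prefixes the familiar
// AWT component names with "J" (Panel -> JPanel, Button -> JButton, and so on).
public class NamingParallel {
    public static void main(String[] args) throws Exception {
        Class<?> awtPanel   = Class.forName("java.awt.Panel");
        Class<?> swingPanel = Class.forName("javax.swing.JPanel");
        System.out.println(awtPanel.getSimpleName() + " -> " + swingPanel.getSimpleName());
    }
}
```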

 

Speech APIs

-SAPI

In deciding which speech engine the group would use in implementing the voice-email gateway, we were impressed with the installer for Microsoft's SDK, the multiple working examples provided, and the SDK's test utilities. By running the test utilities, we were able to determine that the SDK was implementable and, more importantly, that it would run on our development platforms. Eloquence required changing registry keys, and we were not provided an installer. Eloquence also lacked relevant support on its web site, whereas Microsoft had extensive examples and documentation. This is not to say that the MS SAPI documentation was good. An example of how flawed it can be was revealed in our search to find out exactly what a variable called result referred to: it appeared in several functions, yet no explanation of the value it contained (beyond the fact that it was of type int) was in the SAPI documentation. A similar problem involved the register function, possibly related to Hregister; once again, there was no documentation as to what purpose this function served or how it is to be used. Examples proved equally cryptic. The following is a 10-line example we derived that converts text to speech:

    ...
    VTxtAuto voice;
    File ourFile;
    DirectSR audio;
    ...
    audio = new DirectSR();
    voice = new VTxtAuto();
    voice.iid.set(ourFile.toString());
    voice.Register(ourFile.toString(), "Blah");
    int voiceEnb = voice.getEnabled();
    voice.setEnabled(voiceEnb);
    voice.speak("Hello Telephony World", 1);
    ...

The equivalent example from the SAPI documentation is as follows:

 

 

BOOL BeginOLE() {
    HRESULT hRes;

    // Initialize OLE.
    if (FAILED(CoInitialize(NULL)))
        return ReleaseInterfaces("CoInitialize() failed.");

    // Create a Voice Text object.
    if (CoCreateInstance(CLSID_VTxt, NULL, CLSCTX_LOCAL_SERVER,
                         IID_IVoiceText, (LPVOID *) &gpIVTxt) != S_OK)
        return ReleaseInterfaces(
            "Error in CoCreateInstance for Voice Text interface.");

    // Get the address of the Voice Text attributes interface.
    hRes = gpIVTxt->QueryInterface(IID_IVTxtAttributes,
                                   (LPVOID FAR *) &gpIVTxtAttr);
    if (FAILED(hRes))
        return ReleaseInterfaces(
            "Failed to get Voice Text attributes interface.");

    // Create and register the Voice Text notification sink.
    gpVTxtNotifySink = new CIVTxtNotifySink;
    if (gpVTxtNotifySink == NULL)
        return ReleaseInterfaces(
            "Out of memory for Voice Text notification object.");

    hRes = gpIVTxt->Register(NULL, "SRClock", gpVTxtNotifySink,
                             IID_IVTxtNotifySink, VTXTF_ALLMESSAGES, NULL);
    if (FAILED(hRes))
        return ReleaseInterfaces(
            "Failed to register Voice Text notification sink.");

    return TRUE;
}

 

 

 

It is clear that the examples provided in the SAPI documentation are convoluted and do not get to the point.

 

Eloquence

As one of the two speech packages provided, Eloquence was rumored to be easier to incorporate into a project than Microsoft's SDK. Upon reviewing the API, we felt Eloquence would be our speech engine of choice, since it had a "readable" API and specified an 8 kHz sampling rate optimized for the phone. Upon attempting to use it on our own machines, however, we found the SDK installation procedure questionable in that it required manually altering registry keys which, the documentation states, should only be modified by ETI-Eloquence. The ETI-Eloquence Command Interface Specification was direct and explicit in explaining how to use Eloquence: on the first page of the specification there is a statement describing how to synthesize text using eciSpeakText(). This is in stark contrast to the Microsoft SAPI documentation, which is replete with examples that use cryptic variables and multi-line constructions that obscure the functions they attempt to showcase.

 

Protocols

Protocols we learned about included SMTP and POP3, in implementing the mail-related aspects of the project.
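SMTP in particular reduces to a short line-oriented dialogue over a TCP connection to port 25. A typical send session looks roughly like the following, with C: marking client lines and S: server lines (hostnames and addresses are placeholders):

```
S: 220 mail.example.edu SMTP ready
C: HELO gateway.example.edu
S: 250 mail.example.edu
C: MAIL FROM:<sender@example.edu>
S: 250 OK
C: RCPT TO:<recipient@example.edu>
S: 250 OK
C: DATA
S: 354 End data with <CRLF>.<CRLF>
C: Subject: voice message
C:
C: (message body, with any MIME attachment, goes here)
C: .
S: 250 OK, message accepted
C: QUIT
S: 221 closing connection
```

POP3 retrieval is similarly line-oriented (USER, PASS, LIST, RETR, QUIT), which is what made a hand-rolled implementation feasible before the mail APIs appeared.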

 

Standards

Standards we learned about include MIME and WAV.
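A MIME message carrying a recorded WAV attachment has roughly the following shape; the addresses, boundary string, and elided base64 data are placeholders, and the exact audio subtype (audio/wav vs. audio/x-wav) varied by mailer:

```
From: sender@example.edu
To: recipient@example.edu
Subject: voice message
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MSG-BOUNDARY"

--MSG-BOUNDARY
Content-Type: text/plain

A voice message is attached.
--MSG-BOUNDARY
Content-Type: audio/wav; name="message.wav"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="message.wav"

(base64-encoded WAV data)
--MSG-BOUNDARY--
```

Each part is separated by "--" plus the boundary, and the final boundary carries a trailing "--" to close the multipart body.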

Telephony APIs: TAPI/JTAPI

TAPI, provided by Microsoft, was studied to further understand this project. In doing so, it became evident that TAPI has better support for a computer-centric model, in which the telephone line is connected to a PC via a modem, replacing the telephone, whose functionality is reduced to a source of sound input and output. JTAPI implements a phone-centric model, where the phone is the "brain" and controls the calls. The Java Comm API, released November 18th (the previous version was a .7 beta), might be able to bridge the gap, in that one could write computer-centric telephony applications using this new API. An excellent resource on TAPI is Windows Telephony Programming by Chris Sells; although its code is written in C/C++, we found this book useful in understanding the ideas behind telephony. JTAPI provides an interesting contrast, for the above reason as well as in that the purpose of the API is to provide a framework for building applications. This framework is what groups like MIA have been asked to work with in implementing projects for Com S 519.

 

 

Developer Resources

In researching the caveats involved in implementing a voice-email gateway, we gained knowledge of how and where to search for information. We frequently visited www.msdn.microsoft.com, www.java.sun.com, http://developer.java.sun.com/developer/techDocs/, and several other online sites that we have mentioned in the text of this report. Familiarity with these resources was critical in finding new technologies and salient examples of technology we were not familiar with. Newsgroups were also useful for parsing other developers' experiences with various software solutions. For example, question number 43 in the newsgroup "nativemethods" dealt with interfacing a Java program to Microsoft's text-to-speech engine using the SAPI standard; problems regarding the use of JNI were related to the poster, along with a suggestion to use J/Direct. Newsgroup discussions were a developer resource that enabled our group to benefit from the insights of other developers working in the field.

 

Dealing with Large projects

Dealing with large projects has been a problem from the start. It is impossible to learn any one API well; instead, you learn ten APIs poorly, or just well enough to get things done. As the extensive list of APIs we needed to learn, or at least know about, shows, the number of technologies one must be aware of grows quickly with the size of the project. Modularity also becomes a thorny issue: it is not enough to implement your own work, it must also interact correctly with other people's software. Working in parallel with your partners becomes an effective way to accomplish tasks, and software tools like CVS, which lets one check files in and out, become critical.

Collaboration with other groups

Collaboration with other groups was poor at best. Management was not effective in providing resources, or in communicating the availability of resources, to the Apps groups. During the most critical periods of integration there was a vacuum of support. For example, I received the following email from management regarding testing times:

 

Date: Tue, 15 Dec 1998 04:06:14 -0500 (EST)

From: Anurag Sharma <as106@cornell.edu>

To: David Patariu <dnp4@cornell.edu>

Subject: Re: CS519 MIA, we have another machine

In the late afternoon and evening I believe

-Anu

On Tue, 15 Dec 1998, David Patariu wrote:

> What times are the MIA gateway available for final testing?

>

> I would like some time prior to the large group test.

>

Unfortunately, there was no group test. Another group was told by management that it was "every man for himself". We spent a large amount of time preparing the MIA machine in the lab for testing, with the proper applications and settings, something we should not have had to do.

Our code was supposed to interact with three other groups: data, signaling, and directory. Basic password authentication was not provided by directory, and directory completed the mail-forwarding lookup functionality too late (the morning of the demo) to realistically be incorporated into our structure.

-Dealing with Other Coursework

This class is running more like a 6-credit class, and thus has adversely affected other coursework. With the 0-or-5 grading scale in place, there is undue pressure to complete work for this class. Are we learning? Of course, but it is mostly self-directed work at the expense of learning in other classes. At an institution like Cornell, with some of the most renowned faculty in Computer Science, I would prefer to be working and learning in small project groups that interact directly with a faculty member, rather than doing work that, for all intents and purposes, could be done at any other institution, given the amount of time groups spend with the faculty members here.

 

5. What would you do differently next time: (1-2 pages) Everyone makes mistakes. But good learners don't repeat mistakes, they make all new ones! This project gave you a lot of freedom to make mistakes. What will you look out for the next time you undertake a large collaborative project?

The next time I would complete my null implementation. A null implementation is critical in showing that work was done towards the project, and that even though other parts of the overall project may not work, under the right circumstances the code that was written would work.

I might also do more of the sound implementation in C, since its control and record/playback features are more sophisticated.

I would interact directly with signals, and be more adamant about support from directory. Directory caused a part of our implementation to be incomplete. We have the logic, and we can give limited demonstrations that DTMF tone detection works, but this is a relatively straightforward application that could have been integrated.

Next time I intend to be more demanding of management. It is clear that we were let down by their lack of preparedness.

 

6. Interface that your team will provide to other teams or use: Please give the exact procedure calls that you will make or support. This is your final interface spec. C/C++/Java code is OK.

Our team does not provide any interfaces to other teams. In theory we could use the Jaudio package provided by one of the other application groups. That package would allow us to play audio clips, in the form of 8 kHz WAV files, over the network to a phone or another computer. It also provides the capability of recording sound clips, which would be useful for recording messages that users wish to have sent as mail attachments. The package was also incorporated into MIAphone, the testing tool we used to place calls.

Our application does not use the interface provided by directory for reasons explained in previous sections.

Our application would use the mia.signal.client.* package in order to accept call objects.

 

7. Advice for the course staff: What mistakes did we make in running this project? Please help us improve the course.

The course could be improved by making it a 5-credit course, which would entitle it to additional staff resources. The course would also benefit from better equipment: machines with Zip drives, and additional machines for testing telephony applications. It would have been nice to have a lab session where the TAs were available for questions.

 

8. What sources did you consult in working out the details of your project? URLs for Web pages are acceptable for references.

Windows Telephony Programming: A Developer's Guide to TAPI by Chris Sells, Addison-Wesley, 1998

Essential JNI: Java Native Interface by Rob Gordon, Prentice Hall PTR, 1998

Tricks of the Java Programming Gurus by Glenn Vanderburg, Sams Publishing, 1996

 

MSDN 6.0a Library – Understanding the Java/Com Integration Model

MSDN 6.0a Library – Package com.ms.directX

 

ETI-Eloquence SAPI Server, Release 4.0, 1998 Eloquent Technology, Inc.,

ETI-Eloquence Command Interface Specification, Release 4.0, 1998 Eloquent Technology, Inc.

 

The Java Tutorial, A Practical Guide For Programmers http://www.java.sun.com/docs/books/tutorial/

 

Dr. GUI and ATL, Part 1: A Very Simple ATL Component, October 12, 1998

http://msdn.microsoft.com/developer/news/drgui/093098.htm

 

 

Active Visual J++ by Scott Robert Ladd—Microsoft Press, 1997

 

Dr. GUI: Going Native With J/Direct, November 17 1997

http://premium.microsoft.com/msdn/library/welcome/dsmsdn/msdn_drguinat.htm