Talking Java!

Add speech capability to your Java 1.3 applications and applets

Why would you want to make your applications talk? For a start, it's fun, and suitable for fun applications like games. And there's a more serious accessibility side. I'm thinking here not just of those naturally disadvantaged when using a visual interface, but also those situations where it's impossible -- or even illegal -- to take your eyes off what you're doing.

Recently I've been working with some technologies to take HTML and XML information from the Web [see "Access the World's Biggest Database with Web DataBase Connectivity" (JavaWorld, March 2001)]. It occurred to me that I could plug that work and this idea together to build a talking Web browser. Such a browser would prove useful for listening to snippets of information from your favorite sites -- news headlines, for example -- just like listening to the radio while out walking your dog or driving to work. Of course, with current technology you'd have to carry around your laptop computer with your mobile phone attached, but that impractical scenario could well change in the near future with the arrival of Java-enabled smart phones like the Nokia 9210 (9290 in the US).

Perhaps more useful in the short term would be an email reader, also possible thanks to the JavaMail API. This application would check your inbox periodically, and your attention would be attracted by a voice from nowhere proclaiming "You have new mail, would you like me to read it to you?" In a similar vein, consider a talking reminder -- connected with your diary application -- that shouts out "Don't forget your meeting with the boss in 10 minutes!"

Assuming you're sold on those ideas, or have some good ideas of your own, we'll move on. I'll start by showing how to put my supplied zip file to work so you can get up and running right away; you can skip the implementation details if they seem like too much hard work.

Test drive the speech engine

To use the speech engine, you'll need to include the jw-0817-javatalk.zip file in your CLASSPATH and run the com.lotontech.speech.Talker class from the command line or from within a Java program.

To run it from the command line, type:

java com.lotontech.speech.Talker "h|e|l|oo"


To run it from a Java program, simply include two lines of code:

com.lotontech.speech.Talker talker=new com.lotontech.speech.Talker();
talker.sayPhoneWord("h|e|l|oo");


At this point you probably wonder about the format of the "h|e|l|oo" string you supply on the command line or provide to the sayPhoneWord(...) method. Let me explain.

The speech engine works by concatenating short sound samples that represent the smallest units of human -- in this case English -- speech. Those sound samples, called allophones, are labeled with a one-, two-, or three-letter identifier. Some identifiers are obvious and some not so obvious, as you can see from the phonetic representation of the word "hello."

  • h -- sounds as you would expect
  • e -- sounds as you would expect
  • l -- sounds as you would expect, but notice that I've reduced a double "l" to a single one
  • oo -- the long "o" sound at the end of "hello," not the short "o" in "bot" and not the "oo" in "too"
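
To make the string format concrete, here is a minimal sketch of how a phone word can be broken into its allophone identifiers. The PhoneWordSplitter class and its split(...) method are hypothetical names used only for illustration; they are not part of the Talker class in the supplied zip file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper for illustration; not part of com.lotontech.speech.
final class PhoneWordSplitter {

    /** Splits a phone word such as "h|e|l|oo" into its allophone identifiers. */
    static List<String> split(String phoneWord) {
        List<String> allophones = new ArrayList<>();
        // The "|" symbol separates one allophone identifier from the next.
        StringTokenizer tokens = new StringTokenizer(phoneWord, "|");
        while (tokens.hasMoreTokens()) {
            allophones.add(tokens.nextToken());
        }
        return allophones;
    }

    public static void main(String[] args) {
        System.out.println(split("h|e|l|oo")); // prints [h, e, l, oo]
    }
}
```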


Here is a list of the available allophones:

  • a -- as in cat
  • b -- as in cab
  • c -- as in cat
  • d -- as in dot
  • e -- as in bet
  • f -- as in frog
  • g -- as in frog
  • h -- as in hog
  • i -- as in pig
  • j -- as in jig
  • k -- as in keg
  • l -- as in leg
  • m -- as in met
  • n -- as in begin
  • o -- as in not
  • p -- as in pot
  • r -- as in rot
  • s -- as in sat
  • t -- as in sat
  • u -- as in put
  • v -- as in have
  • w -- as in wet
  • y -- as in yet
  • z -- as in zoo


  • aa -- as in fake
  • ay -- as in hay
  • ee -- as in bee
  • ii -- as in high
  • oo -- as in go


  • bb -- variation of b with different emphasis
  • dd -- variation of d with different emphasis
  • ggg -- variation of g with different emphasis
  • hh -- variation of h with different emphasis
  • ll -- variation of l with different emphasis
  • nn -- variation of n with different emphasis
  • rr -- variation of r with different emphasis
  • tt -- variation of t with different emphasis
  • yy -- variation of y with different emphasis


  • ar -- as in car
  • aer -- as in care
  • ch -- as in which
  • ck -- as in check
  • ear -- as in beer
  • er -- as in later
  • err -- as in later (longer sound)
  • ng -- as in feeding
  • or -- as in law
  • ou -- as in zoo
  • ouu -- as in zoo (longer sound)
  • ow -- as in cow
  • oy -- as in boy
  • sh -- as in shut
  • th -- as in thing
  • dth -- as in this
  • uh -- variation of u
  • wh -- as in where
  • zh -- as in Asian
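
Because a mistyped identifier is easy to miss, it can help to check a phone word against the list above before speaking it. The following is a sketch only: the AllophoneCheck class is a hypothetical helper, the set of identifiers is copied by hand from the list above, and letter case is ignored when looking up an identifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of com.lotontech.speech.
final class AllophoneCheck {

    // Identifiers transcribed from the article's allophone list.
    private static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l",
        "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z",
        "aa", "ay", "ee", "ii", "oo",
        "bb", "dd", "ggg", "hh", "ll", "nn", "rr", "tt", "yy",
        "ar", "aer", "ch", "ck", "ear", "er", "err", "ng", "or", "ou",
        "ouu", "ow", "oy", "sh", "th", "dth", "uh", "wh", "zh"));

    /** Returns the identifiers in the phone word that are not in the list. */
    static List<String> unknownAllophones(String phoneWord) {
        List<String> unknown = new ArrayList<>();
        for (String id : phoneWord.split("\\|")) {
            // Case is ignored for the lookup.
            if (!KNOWN.contains(id.toLowerCase(Locale.ROOT))) {
                unknown.add(id);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        System.out.println(unknownAllophones("h|e|l|oo")); // prints []
        System.out.println(unknownAllophones("q|u|i|ck")); // prints [q]
    }
}
```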


In human speech the pitch of words rises and falls throughout any spoken sentence. This intonation makes the speech sound more natural and more emotive, and it allows questions to be distinguished from statements. If you've ever heard Stephen Hawking's synthetic voice, you understand what I'm talking about. Consider these two sentences:

  • It is fake -- f|aa|k
  • Is it fake? -- f|AA|k


As you might have guessed, the way to raise the intonation is to use capital letters. You need to experiment with this a little, and my hint is that you should concentrate on the long vowel sounds.
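
As a starting point for that experimentation, the sketch below capitalizes every long vowel sound in a phone word. It is a crude heuristic, not a rule of the speech engine: the Intonation class and raiseLongVowels(...) method are hypothetical, and treating aa, ay, ee, ii, and oo as "the long vowel sounds" is an assumption based on the list of allophones.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of com.lotontech.speech.
final class Intonation {

    // Assumed set of long vowel identifiers, taken from the allophone list.
    private static final Set<String> LONG_VOWELS =
        new HashSet<>(Arrays.asList("aa", "ay", "ee", "ii", "oo"));

    /** Returns the phone word with every long vowel identifier capitalized. */
    static String raiseLongVowels(String phoneWord) {
        StringBuilder out = new StringBuilder();
        boolean first = true;
        for (String id : phoneWord.split("\\|")) {
            if (!first) {
                out.append('|');
            }
            first = false;
            // Capital letters raise the intonation of that allophone.
            out.append(LONG_VOWELS.contains(id) ? id.toUpperCase(Locale.ROOT) : id);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(raiseLongVowels("f|aa|k")); // prints f|AA|k
    }
}
```

In practice you would probably raise only one or two sounds near the end of a question rather than every long vowel.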

That's all you need to know to use the software, but if you're interested in what's going on under the hood, read on.

Implement the speech engine

Implementing the speech engine requires just one class with four methods. It uses the Java Sound API included with J2SE 1.3. I won't provide a comprehensive tutorial of the Java Sound API, but you'll learn by example. You'll find there's not much to it, and the comments tell you what you need to know.

Here's the basic definition of the Talker class:

package com.lotontech.speech;
import javax.sound.sampled.*;
import java.io.*;
import java.util.*;
import java.net.*;
public class Talker
{
  private SourceDataLine line=null;
}
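
For readers new to the Java Sound API, the following sketch shows the usual life cycle of a SourceDataLine: obtain it from AudioSystem, open it, start it, write sample bytes, then drain and close it. The LineSketch class is hypothetical, and the audio format (8 kHz, 16-bit, mono, signed, little-endian PCM) is an assumption chosen for illustration; it is not necessarily the format of the sound samples in the supplied zip file.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

// Hypothetical sketch for illustration; not the article's Talker class.
final class LineSketch {

    /** Assumed format: 8 kHz, 16-bit, mono, signed, little-endian PCM. */
    static AudioFormat assumedFormat() {
        return new AudioFormat(8000f, 16, 1, true, false);
    }

    /**
     * Plays raw PCM bytes in the assumed format. The array length must be
     * a whole number of frames (here, a multiple of 2 bytes).
     */
    static void play(byte[] pcm) throws LineUnavailableException {
        AudioFormat format = assumedFormat();
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);

        // Ask the audio system for an output line that supports the format.
        SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
        line.open(format);
        line.start();

        // write(...) blocks until the data has been handed to the line.
        line.write(pcm, 0, pcm.length);

        // drain() waits for queued audio to finish playing before closing.
        line.drain();
        line.close();
    }
}
```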


If you run Talker from the command line, the main(...) method below will serve as the entry point. It takes the first command line argument, if one exists, and passes it to the sayPhoneWord(...) method:

/**
 * Entry point: speaks the phonetic word given as the first
 * command-line argument, if there is one.
 */
public static void main(String[] args)
{
  Talker player=new Talker();
  if (args.length>0) player.sayPhoneWord(args[0]);
  // Terminate the JVM explicitly.
  System.exit(0);
}


The sayPhoneWord(...) method is called by main(...) above, or it may be called directly from your Java application or plug-in-supported applet. It looks more complicated than it is. Essentially, it simply steps through the word's allophones -- separated by "|" symbols in the input text -- and plays them one by one through a sound-output channel. To make it sound more natural, I merge the end of each sound sample with the beginning of the next one.
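
To illustrate the general idea of merging one sample into the next, here is a minimal sketch that overlaps the tail of one sample with the head of the following one by averaging the overlapping values. This is not the sayPhoneWord(...) implementation: the SampleMerge class is hypothetical, and the sketch assumes each byte is one signed 8-bit sample and that a simple average is an acceptable mix.

```java
// Hypothetical sketch for illustration; not the article's sayPhoneWord(...) code.
final class SampleMerge {

    /**
     * Joins two samples so that the last `overlap` values of `first` are
     * averaged with the first `overlap` values of `second`.
     * Assumes one signed 8-bit sample per byte.
     */
    static byte[] merge(byte[] first, byte[] second, int overlap) {
        // Clamp the overlap so it never exceeds either sample or drops below zero.
        int n = Math.max(0, Math.min(overlap, Math.min(first.length, second.length)));
        byte[] out = new byte[first.length + second.length - n];

        // Copy the part of the first sample that is not overlapped.
        System.arraycopy(first, 0, out, 0, first.length - n);

        // Average the overlapping region.
        for (int i = 0; i < n; i++) {
            int a = first[first.length - n + i];
            int b = second[i];
            out[first.length - n + i] = (byte) ((a + b) / 2);
        }

        // Copy the remainder of the second sample.
        System.arraycopy(second, n, out, first.length, second.length - n);
        return out;
    }

    public static void main(String[] args) {
        byte[] merged = merge(new byte[] {10, 20, 30, 40}, new byte[] {50, 60, 70}, 2);
        System.out.println(java.util.Arrays.toString(merged)); // prints [10, 20, 40, 50, 70]
    }
}
```

A longer overlap smooths the join more but also blurs the two sounds together, so the overlap length is something to tune by ear.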

Resources
  • "Add MP3 capabilities to Java Sound with SPI," Dan Becker (JavaWorld, November 2000)
    http://www.javaworld.com/jw-11-2000/jw-1103-mp3.html