Add MP3 capabilities to Java Sound with SPI

The Service Provider Interface adds functionality to Java Sound without recoding

The digital audio world has changed rapidly over the last ten years, introducing all sorts of new and exciting audio file formats: AU, AIFF, MIDI, and WAV, to name a few. The recent arrival of the MP3 file format has set the music world on fire, and the trend shows no sign of slowing as new, better-sounding, and more compact audio formats replace older, less efficient ones. How is a computer subsystem such as the Java Sound audio system able to cope with those changes?

Thanks to a new feature in Java 2 version 1.3 -- the Java Service Provider Interface (SPI) -- the JVM provides audio subsystem information at runtime. Java Sound uses the SPI at runtime to provide sound mixers, file readers and writers, and format conversion utilities to a Java Sound program. That allows older Java programs, even Java 1.02 programs, to take advantage of the newly added functions with no changes and no recompiling. Indeed, more functions can be added to Java Sound to take advantage of new file formats, popular compression methods, or even hardware-based sound processors.

In this article, we'll look at the SPI illustrated with a real-world example: Java Sound extended to read, convert, and play MP3 sound files.

Note: To download the complete source code for this article, see Resources.

To understand the Service Provider Interface (SPI), it helps to think of a JVM as a provider of services to a Java program -- the consumer of those services. The consumer uses a known interface to request a JVM-provided service. For instance, with Java Sound the Java program asks to play an audio file through one of the public sound methods. In Java 2 version 1.3, the AudioSystem queries itself to see if it can handle the given sound file type. If it can, the sound is played. If it cannot, an exception is thrown. Older Java audio programs that use java.applet or other pre-Java Sound packages see one exception type; newer Java Sound programs that use the javax.sound packages typically see javax.sound.sampled.UnsupportedAudioFileException. Either way, the JVM is telling you it cannot provide the requested service.
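
To make the consumer's side concrete, here is a minimal sketch of a program asking Java Sound whether it recognizes some audio data. The AudioProbe class and its method names are invented for illustration; only the AudioSystem calls belong to the javax.sound.sampled API. The sketch assumes a stock JDK with no extra audio providers installed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper class, not part of the article's source code.
class AudioProbe {

    /** Returns the detected file type name, or "unsupported" if no installed reader claims the data. */
    static String probe(byte[] data) throws IOException {
        try {
            // ByteArrayInputStream supports mark/reset, which the file readers rely on.
            AudioFileFormat format = AudioSystem.getAudioFileFormat(new ByteArrayInputStream(data));
            return format.getType().toString();
        } catch (UnsupportedAudioFileException e) {
            // The JVM is telling us it cannot provide the requested service.
            return "unsupported";
        }
    }

    /** Builds a tiny in-memory WAV file (silence) so the probe has something valid to inspect. */
    static byte[] tinyWav() throws IOException {
        AudioFormat pcm = new AudioFormat(8000f, 16, 1, true, false);
        byte[] samples = new byte[1600]; // 800 frames of 16-bit mono silence
        AudioInputStream in = new AudioInputStream(new ByteArrayInputStream(samples), pcm, 800);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }
}
```

Passing AudioProbe.tinyWav() to AudioProbe.probe() should report the WAVE type, while bytes that no installed reader recognizes come back as "unsupported". Installing an additional reader can change that second answer without any change to this code.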

In Java 2 version 1.2, the sound subsystem was enhanced to handle audio files of many types: WAV, AIFF, MIDI, and most AU types. With that enhancement -- as if by magic -- the older programs that use java.applet or other pre-Java Sound packages were able to handle new audio file types. That development represented a blessing to Java audio users, but it still did not allow users to extend the JVM. Java audio programs were still limited to the audio file types provided by the JVM maker.

With Java 2 version 1.3's SPI, we see an architected method of extending the JVM. Java Sound knows how to query installed service providers and, when presented with an audio file, one of the service providers may indicate that it knows how to read the audio file type or knows how to convert it. Then the sound subsystem uses that service provider to play the sound.

Next, we examine how to add new service providers to take advantage of one popular audio file type: MP3 (MPEG Layer 3), the audio type defined in the Moving Picture Experts Group (MPEG) ISO standards released several years ago.

Preparing new services

Service providers add services to the JVM by supplying the class files that perform the service and listing those services in a JAR file's special META-INF/services directory. That directory lists all service providers, and JVM subsystems look for additional services there. With that information in mind, let's take a look at how Java Sound's implementation provides audio file readers for the standard sampled audio file types: WAV, AIFF, and AU.

The JRE's important rt.jar file, located in the jre/lib directory of a Java installation, contains most of the JRE's runtime Java classes. If you unzip the rt.jar file, you will find that it contains a META-INF/services directory, inside of which you'll find several files that are named with a javax.sound prefix. One of those files -- javax.sound.sampled.spi.AudioFileReader -- contains a list of classes that provide the reading capability to the Java Sound subsystem. Upon opening that UTF-8-encoded file, you will see:

# Providers for audio file reading

The classes listed in that file are the service providers that give the Java Sound subsystem its audio file reading capability. The subsystem instantiates those classes, uses them to describe the audio file data format, and gets an AudioInputStream from the file. Similarly, META-INF/services contains other SPI files to enumerate MIDI devices, mixers, sound banks, format converters, and other pieces of the Java Sound subsystem.

The advantage of that architecture: the Java Sound subsystem becomes extensible. To be more specific, other JAR files added to the JRE classpath may contain other service providers that provide additional services. The audio subsystem can query all the service providers and match the appropriate service with the consumer's request. To the consumer, how the services become available and are queried remains completely transparent. Consequently, with the right service providers, older programs can now run with new audio file types -- a big feature.
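
On Java 6 and later, the public java.util.ServiceLoader class understands the same META-INF/services convention, so we can use it to see which audio file readers a given JVM has registered. This is only a sketch for inspection: ProviderLister is a made-up name, ServiceLoader did not exist in Java 2 version 1.3, and Java Sound's own internal lookup is not necessarily implemented this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import javax.sound.sampled.spi.AudioFileReader;

// Hypothetical helper class, not part of the article's source code.
class ProviderLister {

    /** Returns the class names of every AudioFileReader provider visible to this JVM. */
    static List<String> audioFileReaderClassNames() {
        List<String> names = new ArrayList<>();
        // ServiceLoader instantiates each registered provider through its no-argument constructor.
        for (AudioFileReader reader : ServiceLoader.load(AudioFileReader.class)) {
            names.add(reader.getClass().getName());
        }
        return names;
    }
}
```

Printing the result of ProviderLister.audioFileReaderClassNames() before and after adding a provider JAR to the classpath is a quick way to confirm that a new reader has been picked up.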

Let's now move from the theoretical to the concrete by examining how to provide a new service: MP3 audio files.

Implementing the SPI

In this section, we will go step by step through a concrete example of extending the Java Sound audio subsystem using the SPI. To get started, there are two basic classes that link an MP3 decoder to the Java Sound subsystem so that it can play MP3 files:

  • The BasicMP3FileReader (extends AudioFileReader) knows how to read MP3 files
  • The BasicMP3FormatConversionProvider (extends FormatConversionProvider) knows how to convert an MP3 stream to one the Java Sound subsystem can play

The two classes let Java Sound know that MP3 capability is available.

Note: For the purposes of this article, I've kept the classes extremely simple. Many types of encoded MPEG audio exist, but the basic MP3 service provided in this article supports only MPEG version 1 or 2, layer 3. It does not support multichannel movie soundtracks. For a full-fledged MPEG decoder, investigate the open source Tritonus Java Sound implementation developed by Matthias Pfisterer, listed in Resources.

Implementation: Part 1, the BasicMP3FileReader

We begin by implementing the BasicMP3FileReader class, which extends the abstract class javax.sound.sampled.spi.AudioFileReader and requires us to implement the following methods:

  • public abstract AudioFileFormat getAudioFileFormat( InputStream stream ) throws UnsupportedAudioFileException, IOException;
  • public abstract AudioFileFormat getAudioFileFormat( URL url ) throws UnsupportedAudioFileException, IOException;
  • public abstract AudioFileFormat getAudioFileFormat( File file ) throws UnsupportedAudioFileException, IOException;
  • public abstract AudioInputStream getAudioInputStream( InputStream stream ) throws UnsupportedAudioFileException, IOException;
  • public abstract AudioInputStream getAudioInputStream( URL url ) throws UnsupportedAudioFileException, IOException;
  • public abstract AudioInputStream getAudioInputStream( File file ) throws UnsupportedAudioFileException, IOException;

Notice that all the methods throw UnsupportedAudioFileException and IOException, which signal to Java Sound that problems exist with the MP3 file. Those exceptions should be thrown whenever a file is unreadable, bytes do not match, or sample rates or data sizes seem out of whack.

Also notice the two groups of methods to implement. The first group provides an AudioFileFormat object from one of three inputs: InputStream, URL, or File. The goal of each getAudioFileFormat() method is to return an AudioFileFormat object that describes the encoding, sample rate, sample size, number of channels, and other attributes of the audio stream. The code contains the details, but in summary it reads bytes from the stream and tests them to ensure that the stream is, in fact, an MP3 stream, that it declares its sample rate, and that all the necessary fields are present.

Since that SPI code provides support for a new encoding, we have to invent a class to represent it: BasicMP3Encoding. That simple class contains a static final field that describes the new MP3 encoding, much as the javax.sound.sampled.AudioFormat.Encoding class describes the existing PCM, ALAW, and ULAW encodings.

We also implement the BasicMP3FileFormatType class in a manner similar to javax.sound.sampled.AudioFileFormat.Type. The BasicMP3Encoding class is shown below:

import javax.sound.sampled.AudioFormat;
public class BasicMP3Encoding extends AudioFormat.Encoding {
   // Shared constant that identifies MP3-encoded audio
   public static final AudioFormat.Encoding MP3 = new BasicMP3Encoding( "MP3" );
   public BasicMP3Encoding( String encodingName ) {
      super( encodingName );
   }
}
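
The article does not list BasicMP3FileFormatType, so the following is only a guess at its shape, written by analogy with BasicMP3Encoding. The class name Mp3FileFormatTypeSketch and the "mp3" file extension are assumptions, and the class is left package-private here for brevity.

```java
import javax.sound.sampled.AudioFileFormat;

// Hypothetical stand-in for the article's BasicMP3FileFormatType.
class Mp3FileFormatTypeSketch extends AudioFileFormat.Type {

    // Shared constant that identifies MP3 files, in the style of AudioFileFormat.Type.WAVE.
    static final AudioFileFormat.Type MP3 = new Mp3FileFormatTypeSketch("MP3", "mp3");

    Mp3FileFormatTypeSketch(String name, String extension) {
        super(name, extension);
    }
}
```

A file reader can then pass Mp3FileFormatTypeSketch.MP3 to the AudioFileFormat constructor when it describes a file it has recognized.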

BasicMP3FileReader's second group of methods provides an AudioInputStream from the same inputs. Since an InputStream can be pulled from a URL or File, we can use the getAudioInputStream() method with the InputStream parameter to implement the other two methods.

This is shown here:

public AudioInputStream getAudioInputStream( URL url )
   throws UnsupportedAudioFileException, IOException {
   InputStream inputStream = url.openStream();
   try {
      return getAudioInputStream( inputStream );
   } catch ( UnsupportedAudioFileException e ) {
      inputStream.close();   // release the stream before reporting the failure
      throw e;
   } catch ( IOException e ) {
      inputStream.close();
      throw e;
   }
}

The stream is tested by using the getAudioFileFormat( inputStream ) method to ensure it is an MP3 stream. Then we create a new generic AudioInputStream from the MP3 stream. For further details, read the source file.
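
Pulling those pieces together, a complete reader might look roughly like the sketch below. It is not the article's BasicMP3FileReader: the class name, the constants, and the header test are assumptions made for illustration. It inspects only the first four bytes for an MPEG-1 or MPEG-2 Layer III frame header, so it would reject files that begin with an ID3 tag, and a real provider class would have to be public so that Java Sound can instantiate it.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import javax.sound.sampled.spi.AudioFileReader;

// Hypothetical stand-in for the article's BasicMP3FileReader.
class SketchMp3FileReader extends AudioFileReader {
    static final AudioFormat.Encoding MP3_ENCODING = new AudioFormat.Encoding("MP3");
    static final AudioFileFormat.Type MP3_TYPE = new AudioFileFormat.Type("MP3", "mp3");
    // Sample rates by header index: row 0 for MPEG-1, row 1 for MPEG-2.
    private static final float[][] RATES = {{44100f, 48000f, 32000f}, {22050f, 24000f, 16000f}};

    @Override
    public AudioFileFormat getAudioFileFormat(InputStream stream)
            throws UnsupportedAudioFileException, IOException {
        byte[] h = new byte[4];
        int n = 0;
        stream.mark(h.length);
        try {
            while (n < h.length) {
                int r = stream.read(h, n, h.length - n);
                if (r < 0) break;
                n += r;
            }
        } finally {
            stream.reset(); // leave the stream where we found it, match or no match
        }
        int b1 = h[1] & 0xFF, b2 = h[2] & 0xFF, b3 = h[3] & 0xFF;
        boolean sync = n == 4 && (h[0] & 0xFF) == 0xFF && (b1 & 0xE0) == 0xE0;
        int version = (b1 >> 3) & 0x3;      // 3 = MPEG-1, 2 = MPEG-2
        int layer = (b1 >> 1) & 0x3;        // 1 = Layer III
        int bitrateIndex = (b2 >> 4) & 0xF; // 15 is invalid
        int rateIndex = (b2 >> 2) & 0x3;    // 3 is reserved
        if (!sync || version < 2 || layer != 1 || bitrateIndex == 15 || rateIndex == 3) {
            throw new UnsupportedAudioFileException("not an MPEG-1/2 Layer III stream");
        }
        int channels = ((b3 >> 6) & 0x3) == 3 ? 1 : 2; // channel mode 3 = single channel
        AudioFormat format = new AudioFormat(MP3_ENCODING, RATES[3 - version][rateIndex],
                AudioSystem.NOT_SPECIFIED, channels,
                AudioSystem.NOT_SPECIFIED, AudioSystem.NOT_SPECIFIED, false);
        return new AudioFileFormat(MP3_TYPE, format, AudioSystem.NOT_SPECIFIED);
    }

    @Override
    public AudioFileFormat getAudioFileFormat(URL url)
            throws UnsupportedAudioFileException, IOException {
        try (InputStream in = new BufferedInputStream(url.openStream())) {
            return getAudioFileFormat(in);
        }
    }

    @Override
    public AudioFileFormat getAudioFileFormat(File file)
            throws UnsupportedAudioFileException, IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            return getAudioFileFormat(in);
        }
    }

    @Override
    public AudioInputStream getAudioInputStream(InputStream stream)
            throws UnsupportedAudioFileException, IOException {
        AudioFileFormat fileFormat = getAudioFileFormat(stream); // throws if not MP3
        return new AudioInputStream(stream, fileFormat.getFormat(), AudioSystem.NOT_SPECIFIED);
    }

    @Override
    public AudioInputStream getAudioInputStream(URL url)
            throws UnsupportedAudioFileException, IOException {
        return openOrClose(new BufferedInputStream(url.openStream()));
    }

    @Override
    public AudioInputStream getAudioInputStream(File file)
            throws UnsupportedAudioFileException, IOException {
        return openOrClose(new BufferedInputStream(new FileInputStream(file)));
    }

    private AudioInputStream openOrClose(InputStream in)
            throws UnsupportedAudioFileException, IOException {
        try {
            return getAudioInputStream(in);
        } catch (UnsupportedAudioFileException | IOException e) {
            in.close(); // close only on failure; on success the caller owns the stream
            throw e;
        }
    }
}
```

Note the asymmetry between the two groups: the getAudioFileFormat() overloads can close their streams as soon as the header has been inspected, whereas the getAudioInputStream() overloads must leave the stream open on success because the returned AudioInputStream reads from it.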

Now that we have implemented the AudioFileReader, we are halfway to our goal. Let's look at how to implement the second half of our service provider, the FormatConversionProvider.

Implementation: Part 2, the BasicMP3FormatConversionProvider

Next, we implement BasicMP3FormatConversionProvider, which extends the abstract class javax.sound.sampled.spi.FormatConversionProvider. A format conversion provider converts from a source to a target audio format. To implement BasicMP3FormatConversionProvider, we must implement the following methods:

  • public abstract AudioFormat.Encoding[] getSourceEncodings();
  • public abstract AudioFormat.Encoding[] getTargetEncodings();
  • public abstract AudioFormat.Encoding[] getTargetEncodings( AudioFormat srcFormat );
  • public abstract AudioFormat[] getTargetFormats( AudioFormat.Encoding targetEncoding, AudioFormat sourceFormat );
  • public abstract AudioInputStream getAudioInputStream( AudioFormat.Encoding targetEncoding, AudioInputStream sourceStream );
  • public abstract AudioInputStream getAudioInputStream( AudioFormat targetFormat, AudioInputStream sourceStream );

As you can see, we have three groups of methods. The first group simply enumerates the source and target encodings that the format-conversion provider supports. The BasicMP3FormatConversionProvider class contains some large static arrays that describe the input and output formats supported by the underlying MPEG decoder.

For instance, the source formats are given below. The source encodings are simply derived from those formats when the class instantiates. Whenever someone calls the getSourceEncodings() method, the source encoding array is returned.

protected static final AudioFormat [] SOURCE_FORMATS = {
   // encoding, rate, bits, channels, frameSize, frameRate, big endian
   // (-1 means the value is not specified)
   new AudioFormat( BasicMP3Encoding.MP3,  8000.0F, -1, 1, -1, -1, false ),
   new AudioFormat( BasicMP3Encoding.MP3,  8000.0F, -1, 2, -1, -1, false ),
   new AudioFormat( BasicMP3Encoding.MP3, 11025.0F, -1, 1, -1, -1, false ),
   new AudioFormat( BasicMP3Encoding.MP3, 11025.0F, -1, 2, -1, -1, false ),
   // ... remaining entries not shown in this excerpt
};
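
One way to derive the encoding array from a format array is sketched below. EncodingTable and encodingsOf() are illustrative names, not taken from the article's source, and the real class may well do this differently.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import javax.sound.sampled.AudioFormat;

// Hypothetical helper class, not part of the article's source code.
class EncodingTable {

    /** Collects the distinct encodings used by the given formats, in first-seen order. */
    static AudioFormat.Encoding[] encodingsOf(AudioFormat[] formats) {
        Set<AudioFormat.Encoding> unique = new LinkedHashSet<>();
        for (AudioFormat format : formats) {
            unique.add(format.getEncoding());
        }
        return unique.toArray(new AudioFormat.Encoding[0]);
    }
}
```

Computing the array once, for example in a static initializer, means getSourceEncodings() can return it without rescanning the format table on every call.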

BasicMP3FormatConversionProvider's second group of methods, containing the getTargetFormats() method, proves rather tricky. We want getTargetFormats() to return the target AudioFormats that can be created from the given source AudioFormat. Additionally, the target encoding is given, and each returned AudioFormat must be of that encoding. To perform that tricky maneuver, the BasicMP3FormatConversionProvider creates a hashtable to help speed the mapping. The hashtable maps the source format to another hashtable of possible target encodings. The target encodings each point to a set of target audio formats. If you find that difficult to visualize, just remember that the format-conversion provider contains data structures to quickly return the target AudioFormats for a given source AudioFormat.
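
The nested lookup can be pictured with the following sketch, which uses java.util.HashMap rather than the article's hashtables. TargetFormatIndex and its methods are invented names. One assumption deserves attention: AudioFormat does not override equals(), so the outer map is keyed by a string built from the source format's encoding, sample rate, and channel count instead of by the AudioFormat object itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.sound.sampled.AudioFormat;

// Hypothetical helper class, not part of the article's source code.
class TargetFormatIndex {
    // source format key -> target encoding -> target formats
    private final Map<String, Map<AudioFormat.Encoding, List<AudioFormat>>> bySource = new HashMap<>();

    /** Builds a value-based key, since two equal-looking AudioFormat objects are not equals(). */
    private static String key(AudioFormat format) {
        return format.getEncoding() + "/" + format.getSampleRate() + "/" + format.getChannels();
    }

    /** Records that the given source format can be converted to the given target format. */
    void register(AudioFormat source, AudioFormat target) {
        bySource.computeIfAbsent(key(source), k -> new HashMap<>())
                .computeIfAbsent(target.getEncoding(), k -> new ArrayList<>())
                .add(target);
    }

    /** Returns the registered targets of the requested encoding, or an empty array if there are none. */
    AudioFormat[] getTargetFormats(AudioFormat.Encoding targetEncoding, AudioFormat sourceFormat) {
        Map<AudioFormat.Encoding, List<AudioFormat>> byEncoding = bySource.get(key(sourceFormat));
        List<AudioFormat> targets = byEncoding == null ? null : byEncoding.get(targetEncoding);
        return targets == null ? new AudioFormat[0] : targets.toArray(new AudioFormat[0]);
    }
}
```

A provider would call register() once for each supported source and target pair when the class loads, then answer getTargetFormats() with two map lookups instead of a scan over every supported format.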

The third group of methods, two versions of getAudioInputStream(), provides a decoded audio stream from the given input MP3 stream. Simply put, the conversion provider checks that the conversion is supported and, if it is, returns a decoded linear audio-input stream from the given encoded MP3 audio stream. If the conversion is not supported, an IllegalArgumentException is thrown. At that point, our service provider code must actually start decoding the MPEG data stream. As such, it's where the rubber meets the road, as illustrated below:

if ( isConversionSupported( targetFormat, audioInputStream.getFormat() )) {
   return new DecodedMpegAudioInputStream( targetFormat, audioInputStream );
}
throw new IllegalArgumentException( "conversion not supported" );
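
To show how the three groups of methods fit around that guard, here is a skeleton provider. It is a sketch under several assumptions, not the article's BasicMP3FormatConversionProvider: the class name is invented, it offers only 16-bit signed little-endian PCM at the source's sample rate and channel count, and the decoder itself is left as a placeholder that throws UnsupportedOperationException.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.spi.FormatConversionProvider;

// Hypothetical skeleton; a real provider class would be public and contain a working decoder.
class SketchMp3ConversionProvider extends FormatConversionProvider {
    static final AudioFormat.Encoding MP3 = new AudioFormat.Encoding("MP3");
    private static final AudioFormat.Encoding PCM = AudioFormat.Encoding.PCM_SIGNED;

    // Group 1: enumerate the supported source and target encodings.
    @Override
    public AudioFormat.Encoding[] getSourceEncodings() {
        return new AudioFormat.Encoding[] {MP3};
    }

    @Override
    public AudioFormat.Encoding[] getTargetEncodings() {
        return new AudioFormat.Encoding[] {PCM};
    }

    @Override
    public AudioFormat.Encoding[] getTargetEncodings(AudioFormat sourceFormat) {
        return MP3.equals(sourceFormat.getEncoding())
                ? getTargetEncodings() : new AudioFormat.Encoding[0];
    }

    // Group 2: map a source format and target encoding to concrete target formats.
    @Override
    public AudioFormat[] getTargetFormats(AudioFormat.Encoding targetEncoding,
                                          AudioFormat sourceFormat) {
        if (!PCM.equals(targetEncoding) || !MP3.equals(sourceFormat.getEncoding())) {
            return new AudioFormat[0];
        }
        // Assumed output: 16-bit signed little-endian PCM at the source's rate and channel count.
        return new AudioFormat[] {
            new AudioFormat(sourceFormat.getSampleRate(), 16, sourceFormat.getChannels(), true, false)
        };
    }

    // Group 3: guard, then hand the stream to the decoder.
    @Override
    public AudioInputStream getAudioInputStream(AudioFormat.Encoding targetEncoding,
                                                AudioInputStream sourceStream) {
        AudioFormat[] targets = getTargetFormats(targetEncoding, sourceStream.getFormat());
        if (targets.length == 0) {
            throw new IllegalArgumentException("conversion not supported");
        }
        return getAudioInputStream(targets[0], sourceStream);
    }

    @Override
    public AudioInputStream getAudioInputStream(AudioFormat targetFormat,
                                                AudioInputStream sourceStream) {
        // isConversionSupported() is inherited and consults getTargetFormats().
        if (!isConversionSupported(targetFormat, sourceStream.getFormat())) {
            throw new IllegalArgumentException("conversion not supported");
        }
        return decode(targetFormat, sourceStream);
    }

    /** Placeholder for the step that wraps the source in a decoding stream. */
    protected AudioInputStream decode(AudioFormat targetFormat, AudioInputStream sourceStream) {
        throw new UnsupportedOperationException("plug in an MPEG decoder here");
    }
}
```

Because the inherited isConversionSupported() methods are built on getTargetEncodings() and getTargetFormats(), keeping those two methods accurate is what makes the guard in getAudioInputStream() trustworthy.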