Adding Voice to Java EE With SIP Servlets

by Prasad Subramanian

Session Initiation Protocol (SIP) is a signaling protocol that is used to set up, modify, and terminate a session between two endpoints. SIP can be used to set up a two-party call, a multi-party call, or even a multicast session for Internet calls, multimedia calls, and multimedia distribution. JSR 116: SIP Servlet API is a server-side interface describing a container of SIP components or services. SIP servlets, servlets that run in a SIP container, are similar to HTTP servlets, but also support the SIP protocol. Together, SIP and SIP servlets are behind many popular telecommunications-based applications that provide services such as Voice-over-IP (VoIP), instant messaging, presence and buddy list management, as well as web conferencing.

SIP and SIP servlets are also important in the enterprise. Combined with Java EE technology, SIP servlets can be used to add rich media interactions to enterprise applications. JSR 289: SIP Servlet v1.1 updates the SIP Servlet API and defines a standard application programming model to mix SIP servlets and Java EE components. SIP servlets are going to play an even bigger part in building the next generation of telecommunications services.

This Tech Tip covers some of the basic concepts underlying SIP and SIP servlets. It also presents a sample application that uses SIP servlets and HTTP servlets to provide VoIP phone service.

What is SIP?

An easy way to describe SIP is to examine a usage scenario. Let's say a user identified as A wants to set up a call with a user identified as B. In a telecommunications setting, user A and B would communicate through what are called user agents. One example of a user agent is a soft phone -- a software program for making telephone calls over the Internet. Another example is a VoIP Phone -- a phone that uses VoIP. Here are the steps that need to happen to set up the call:

  1. A invites B to start a conversation. As part of the invitation, A indicates what media it is capable of supporting.
  2. B receives the invitation, sends an immediate response to A, and then evaluates the invitation.
  3. When B is ready to accept the invitation, it sends an acknowledgment to A. As part of the acknowledgment, B indicates what media it supports.
  4. A examines the acknowledgment it receives from B and determines if the media supported by B and A are the same. If A and B support the same media, a call is set up between A and B. The media specified in the invitation facilitates the call.

Figure 1 illustrates the steps in setting up a call.

Figure 1. Steps in Setting Up a Call
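
The media check in step 4 can be sketched in a few lines of Java. This is only a toy model of the negotiation idea, not SIP itself: the class and method names are hypothetical, media formats are plain strings, and it assumes a call can proceed as long as A and B share at least one format.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the media check in call setup; all names are illustrative. */
final class CallSetupSketch {

    /** Returns the formats both parties support, in A's order of preference. */
    static Set<String> commonMedia(List<String> offeredByA, List<String> supportedByB) {
        Set<String> common = new LinkedHashSet<>(offeredByA);
        common.retainAll(supportedByB); // keep only what B also supports
        return common;
    }

    /** Assumption: one shared format is enough to set up the call. */
    static boolean canSetUpCall(List<String> offeredByA, List<String> supportedByB) {
        return !commonMedia(offeredByA, supportedByB).isEmpty();
    }
}
```

For example, if A offers PCMU, PCMA, and G729 and B supports G729 and PCMU, commonMedia() returns PCMU and G729, and the call can be set up.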

SIP provides a standardized way of carrying out these steps. It does this by defining specific request methods, responses, response codes, and headers for signaling and call control. The protocol has been standardized by the Internet Engineering Task Force (IETF) under RFC 3261 and is now accepted as the standard signaling protocol for the 3rd Generation Partnership Project (3GPP) and as a permanent element in the IP Multimedia Subsystem (IMS) architecture.

How is SIP related to HTTP?

People often ask if SIP uses HTTP as the underlying protocol. The answer is no. SIP is a protocol that operates at the same layer as HTTP, that is, the application layer, and uses TCP, UDP, or SCTP as the underlying protocol. However, SIP does have a lot of similarities with HTTP. For example, like HTTP, SIP is text based and user readable. Also like HTTP, SIP uses a request-response mechanism with specific methods, response codes, and headers. A notable difference between HTTP and SIP is that the request-response mechanism is asynchronous in SIP -- a request does not need to be followed by a corresponding response. In fact, a SIP request could result in one or more requests being generated.

SIP is a peer-to-peer protocol. This means that a user agent can act as a server as well as a client. This is another difference between SIP and HTTP. In HTTP, a client is always a client and a server is always a server.

SIP supports the following request methods and response codes:

Request methods:

  • REGISTER. Used by a client to register an address with a SIP server.
  • INVITE. Indicates that the user or service is being invited to participate in a session. The body of this message includes a description of the session to which the user or service is being invited.
  • ACK. Confirms that the client has received a final response to an INVITE request. This method is only used with INVITE requests.
  • CANCEL. Used to cancel a pending request.
  • BYE. Sent by a user agent client to indicate to the server that it wishes to terminate the call.
  • OPTIONS. Used to query a server about its capabilities.

Response codes:

  • 1xx: Provisional. The request was received and is still being processed.
  • 2xx: Success. The action was successfully received, understood, and accepted.
  • 3xx: Redirection. Further action is required to process this request.
  • 4xx: Client Error. The request contains incorrect syntax or cannot be fulfilled at this server.
  • 5xx: Server Error. The server failed to fulfill an apparently valid request.
  • 6xx: Global Failure. The request cannot be fulfilled at any server.
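
Because the first digit of a status code identifies its class, classifying a response is a simple integer division. The following sketch illustrates this; the class and method names are hypothetical, and the class labels follow RFC 3261, where 2xx is the success class.

```java
/** Maps a SIP status code to the response class named by its first digit. */
final class SipStatusClass {

    static String classify(int statusCode) {
        if (statusCode < 100 || statusCode > 699) {
            throw new IllegalArgumentException("Not a SIP status code: " + statusCode);
        }
        switch (statusCode / 100) {
            case 1:  return "Provisional";
            case 2:  return "Success";
            case 3:  return "Redirection";
            case 4:  return "Client Error";
            case 5:  return "Server Error";
            default: return "Global Failure"; // only 6xx remains after the range check
        }
    }

    /** Any response from 200 upward is final; a 1xx response is not. */
    static boolean isFinal(int statusCode) {
        return statusCode >= 200 && statusCode <= 699;
    }
}
```

For instance, 180 (Ringing) is classified as Provisional and is not final, while 486 (Busy Here) is a final Client Error response.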

Session Description Protocol

Session Description Protocol (SDP) is a format for describing the media format and type to be used in a multimedia session. SIP uses SDP as a payload in its messages to facilitate the exchange of capabilities between various user agents. For example, an SDP body might specify the codecs supported by the user agent and the protocol to be used, such as Real-time Transport Protocol (RTP).
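
To make this concrete, here is a minimal sketch that reads the transport protocol and codec names out of an SDP body. The sample session description, addresses, and class name are made up for illustration, and the parsing covers only the m= and a=rtpmap lines rather than the full SDP grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: pull the transport and codecs out of an SDP body. */
final class SdpSketch {

    // Hypothetical session description; every value is illustrative.
    static final String SAMPLE = String.join("\r\n",
        "v=0",
        "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
        "s=-",
        "c=IN IP4 192.0.2.10",
        "t=0 0",
        "m=audio 49170 RTP/AVP 0 8",
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:8 PCMA/8000",
        "");

    /** Returns the protocol field of the first m= line, or null if there is none. */
    static String transport(String sdp) {
        for (String line : sdp.split("\r?\n")) {
            if (line.startsWith("m=")) {
                // Layout: m=<media> <port> <proto> <format list>
                String[] fields = line.substring(2).split(" ");
                return fields.length >= 3 ? fields[2] : null;
            }
        }
        return null;
    }

    /** Returns the codec descriptions from a=rtpmap lines, such as "PCMU/8000". */
    static List<String> codecs(String sdp) {
        List<String> result = new ArrayList<>();
        for (String line : sdp.split("\r?\n")) {
            int space = line.indexOf(' ');
            if (line.startsWith("a=rtpmap:") && space > 0) {
                result.add(line.substring(space + 1).trim());
            }
        }
        return result;
    }
}
```

For the sample above, transport() yields RTP/AVP and codecs() yields PCMU/8000 and PCMA/8000.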

SIP Message

Figure 2 shows the composition of a SIP message. There are three major parts:

  • Request Line. Specifies the request method, address, and SIP version.
  • Headers. Specify data about the session or call to be set up or terminated.
  • Message Body. Provides the payload, that is, the SDP, describing the media for the session.

Figure 2. Composition of a SIP Message
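
The three parts can be seen by splitting a raw request in code. The sketch below is a simplified illustration, not a complete SIP parser: the class name, addresses, and header values are hypothetical, and it ignores details such as repeated headers and folded header lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch of splitting a SIP request into request line, headers, and body. */
final class SipMessageSketch {

    // Hypothetical SDP payload used as the message body.
    static final String SAMPLE_BODY =
          "v=0\r\n"
        + "o=alice 2890844526 2890844526 IN IP4 192.0.2.10\r\n"
        + "s=-\r\n"
        + "c=IN IP4 192.0.2.10\r\n"
        + "t=0 0\r\n"
        + "m=audio 49170 RTP/AVP 0\r\n";

    // Hypothetical INVITE; Content-Length is computed from the body above.
    static final String SAMPLE_INVITE =
          "INVITE sip:bob@example.com SIP/2.0\r\n"
        + "Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds\r\n"
        + "To: Bob <sip:bob@example.com>\r\n"
        + "From: Alice <sip:alice@example.com>;tag=1928301774\r\n"
        + "Call-ID: a84b4c76e66710@pc33.example.com\r\n"
        + "CSeq: 314159 INVITE\r\n"
        + "Content-Type: application/sdp\r\n"
        + "Content-Length: " + SAMPLE_BODY.length() + "\r\n"
        + "\r\n"
        + SAMPLE_BODY;

    final String method;
    final String requestUri;
    final String version;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    private SipMessageSketch(String method, String requestUri, String version, String body) {
        this.method = method;
        this.requestUri = requestUri;
        this.version = version;
        this.body = body;
    }

    static SipMessageSketch parse(String raw) {
        // An empty line separates the headers from the message body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split >= 0 ? raw.substring(0, split) : raw;
        String body = split >= 0 ? raw.substring(split + 4) : "";

        String[] lines = head.split("\r\n");
        String[] requestLine = lines[0].split(" "); // method, address, SIP version
        if (requestLine.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + lines[0]);
        }
        SipMessageSketch message =
            new SipMessageSketch(requestLine[0], requestLine[1], requestLine[2], body);

        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) { // header name before the first colon, value after it
                message.headers.put(lines[i].substring(0, colon).trim(),
                                    lines[i].substring(colon + 1).trim());
            }
        }
        return message;
    }
}
```

Parsing SAMPLE_INVITE gives INVITE as the method, sip:bob@example.com as the address, and SIP/2.0 as the version, with the SDP text available as the body.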

The SIP Servlet Model

The SIP servlet programming model is based on the servlet programming model. It brings programming in SIP closer to Java EE. Servlets are server-side objects that process incoming requests and send an appropriate response to the client. They are typically deployed in a servlet container and have a well-defined life cycle. The servlet container is responsible for managing the life cycle of the servlets within the container and managing resources related to technologies such as JNDI and JDBC that the servlet uses. The servlet container also manages network connections for servlets.

As mentioned earlier, SIP servlets are similar to HTTP servlets, except that they process SIP requests. They do this by defining specific methods to process each of the SIP request methods. For example, HTTP servlets define the doPost() method, which is invoked by the service() method, to handle POST requests. By comparison, SIP servlets define a doInvite() method, which is also invoked by the service() method, to handle INVITE requests.
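
The dispatch idea can be sketched without a SIP container. The classes below are a toy stand-in, not the real javax.servlet.sip.SipServlet class: the names ToySipServlet and CallHandler are hypothetical, requests are reduced to a method name, and the 501 default reply is a choice made for this sketch only.

```java
/** Toy stand-in for a SIP servlet base class; not the real javax.servlet.sip API. */
abstract class ToySipServlet {

    /** Entry point: routes a request to the handler for its SIP method. */
    final String service(String method) {
        switch (method) {
            case "INVITE":  return doInvite();
            case "BYE":     return doBye();
            case "OPTIONS": return doOptions();
            default:        return "501 Not Implemented";
        }
    }

    // Default handlers; a subclass overrides only the methods it cares about.
    protected String doInvite()  { return "501 Not Implemented"; }
    protected String doBye()     { return "501 Not Implemented"; }
    protected String doOptions() { return "501 Not Implemented"; }
}

/** A servlet that only handles INVITE, the way a call-setup servlet might. */
final class CallHandler extends ToySipServlet {
    @Override
    protected String doInvite() {
        return "200 OK";
    }
}
```

Here new CallHandler().service("INVITE") reaches the overridden doInvite(), while a BYE falls through to the default handler. A real SIP servlet follows the same pattern by extending SipServlet and overriding doInvite(SipServletRequest).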

JSR 116 defined SIP Servlet API 1.0. It specified:

  • An API for the SIP servlet programming model.
  • The responsibilities of the SIP servlet container.
  • How SIP servlets interface with HTTP servlets and Java EE components.

The initial SIP Servlet API specification is being revised by JSR 289: SIP Servlet v1.1.

SIP Servlet API -- Key Concepts

The key concepts that underlie SIP servlets are similar to those that underlie HTTP servlets. The following sections briefly describe some of those concepts.

SipServletRequest and SipServletResponse

The request-response methodology in SIP is similar to that for HTTP servlets. A request is represented by a SipServletRequest object and a response by a SipServletResponse object. However, when the container calls a SIP servlet's service() method, only one of the ServletRequest and ServletResponse arguments is non-null. That's because SIP requests and responses arrive as separate messages, and a request does not necessarily result in a symmetric response. Both interfaces extend a common super interface, SipServletMessage, which defines the methods that are common to SipServletRequest and SipServletResponse objects.

Figure 3 illustrates the hierarchy of the SipServletRequest and SipServletResponse objects.

Figure 3. Hierarchy of SipServletRequest and SipServletResponse Objects
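
The following sketch mirrors this hierarchy with toy types to show how a handler can tell a request from a response when exactly one argument is non-null. All type names here (ToyMessage, ToyRequest, ToyResponse, ToyDispatcher) are hypothetical simplifications; the real interfaces live in the javax.servlet.sip package and carry many more methods.

```java
/** Toy counterpart of the common super interface shared by requests and responses. */
interface ToyMessage {
    String callId();
}

final class ToyRequest implements ToyMessage {
    private final String method;
    private final String callId;

    ToyRequest(String method, String callId) {
        this.method = method;
        this.callId = callId;
    }

    String method() { return method; }
    @Override public String callId() { return callId; }
}

final class ToyResponse implements ToyMessage {
    private final int status;
    private final String callId;

    ToyResponse(int status, String callId) {
        this.status = status;
        this.callId = callId;
    }

    int status() { return status; }
    @Override public String callId() { return callId; }
}

final class ToyDispatcher {

    /** Exactly one argument must be non-null: a message is a request or a response. */
    static String service(ToyRequest request, ToyResponse response) {
        if ((request == null) == (response == null)) {
            throw new IllegalArgumentException(
                "exactly one of request and response must be non-null");
        }
        return request != null
            ? "request " + request.method() + " in call " + request.callId()
            : "response " + response.status() + " in call " + response.callId();
    }
}
```

Calling ToyDispatcher.service(new ToyRequest("INVITE", "abc"), null) takes the request branch, and passing a ToyResponse with a null request takes the response branch.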

Servlet Context

The servlet context as defined in the servlet specification also applies to SIP servlets. The SIP servlet specification defines specific context attributes that are used to store and retrieve information and interfaces specific to SIP servlets. The servlet context can be shared with HTTP servlets within the same application. This is explained in the section Converged Context and Converged Application.

Deployment Descriptor

An XML-based deployment descriptor is used to describe the SIP servlets, the rules for invoking them, as well as the resources and environment properties used in the application. This descriptor is in a sip.xml file and is similar to the web.xml file used in HTTP servlets. The sip.xml file is defined by an XML schema.

SIP Application Packaging

SIP applications have the same packaging structure as web applications. They are packaged in the JAR format with a file extension of .sar (SIP archive) or .war (web archive).

Converged Context and Converged Application

An application may use both SIP and HTTP servlets to create a service. To allow for HTTP and SIP servlets being in the same application package, the SIP servlet specification defines a ConvergedContext object. This object holds the servlet context shared by both HTTP and SIP servlets and provides the same view of the application to SIP and HTTP servlets in terms of servlet context attributes, resources, and JNDI namespaces.

When an application includes both SIP and HTTP servlets, it is known as a converged application. This is in contrast to a SIP-only application, which is called a SIP application. A converged application is similar in structure to a SIP application except that it has a web.xml file as a deployment descriptor in addition to a sip.xml file. In SIP Servlet API 1.1 (JSR 289), the concept of converged applications has been extended to also cover enterprise applications. Enterprise applications can now include a SIP application or a converged application as a module. This type of enterprise application is called a converged enterprise application.

SIP Sessions

The SIP servlet specification defines SipSession objects to represent a session over SIP in the same way as HttpSession objects represent sessions over HTTP. Because a single application such as a converged application may have sessions over HTTP and SIP, the specification also defines SipApplicationSession, which is a session object at the application level. The SipApplicationSession object acts as a parent to HTTP and SIP sessions (that is, protocol sessions) in an application.

Annotations

SIP Servlet API 1.1 aims to align SIP servlets with Java EE 5. As a result, the specification introduces the use of annotations defined by Java EE 5 within SIP servlets and listeners. It also defines custom annotations to represent interfaces defined by the SIP servlet specification, including @SipServlet, @SipListener, @SipApplication, and @SipApplicationKey.

Project SailFin - an Open Source SIP Application Server

A SIP servlet container can be standalone, that is, supporting only SIP servlets, or it can be a converged container supporting both HTTP and SIP servlets. However, for most enterprise uses, a SIP servlet container needs to be a converged container within an application server. Project SailFin is an effort to produce an open source implementation of a SIP servlet container using the GlassFish application server. The project is being developed with Sun and Ericsson as the major contributors. SailFin, the SIP servlet container implementation in GlassFish, supports SIP Servlet API 1.0 and aims to support SIP Servlet API 1.1 when the specification is finalized.

The CallSetup Sample Application
