The Session Initiation Protocol (SIP), defined in RFC 3261 [6], is an application-level signaling protocol for setting up, modifying, and terminating real-time sessions between participants over an IP data network. SIP can support any type of single-media or multimedia session, including teleconferencing.
SIP is just one component in the set of protocols and services needed to support multimedia exchanges over the Internet. SIP is the signaling protocol that enables one party to place a call to another party and to negotiate the parameters of a multimedia session. The actual audio, video, or other multimedia content is exchanged between session participants using an appropriate transport protocol. In many cases, the transport protocol to use is the Real-Time Transport Protocol (RTP). Directory access and lookup protocols are also needed.
The key driving force behind SIP is to enable Internet telephony, also referred to as Voice over IP (VoIP). There is wide industry acceptance that SIP will be the standard IP signaling mechanism for voice and multimedia calling services. Further, as older Private Branch Exchanges (PBXs) and network switches are phased out, industry is moving toward a voice networking model that is SIP signaled, IP based, and packet switched, not only in the wide area but also on the customer premises [2, 3].
SIP supports five facets of establishing and terminating multimedia communications:
|User location: Users can move to other locations and access their telephony or other application features from remote locations.|
|User availability: This step involves determination of the willingness of the called party to engage in communications.|
|User capabilities: In this step, the media and media parameters to be used are determined.|
|Session setup: Point-to-point and multiparty calls are set up, with agreed session parameters.|
|Session management: This step includes transfer and termination of sessions, modifying session parameters, and invoking services.|
SIP employs design elements developed for earlier protocols. SIP is based on an HTTP-like request/response transaction model. Each transaction consists of a client request that invokes a particular method, or function, on the server and at least one response. SIP uses most of the header fields, encoding rules, and status codes of HTTP. This provides a readable text-based format for displaying information. SIP incorporates the use of a Session Description Protocol (SDP), which defines session content using a set of types similar to those used in Multipurpose Internet Mail Extensions (MIME).
SIP Components and Protocols
A system using SIP can be viewed as consisting of components defined on two dimensions: client/server and individual network elements. RFC 3261 defines client and server as follows:
|Client: A client is any network element that sends SIP requests and receives SIP responses. Clients may or may not interact directly with a human user. User agent clients and proxies are clients.|
|Server: A server is a network element that receives requests in order to service them and sends back responses to those requests. Examples of servers are proxies, user agent servers, redirect servers, and registrars.|
The individual elements of a standard SIP configuration include the following:
|User Agent: The user agent resides in every SIP end station. It acts in two roles: as a user agent client (UAC), which issues SIP requests, and as a user agent server (UAS), which receives SIP requests and generates responses that accept, reject, or redirect them.|
|Redirect Server: The redirect server is used during session initiation to determine the address of the called device. The redirect server returns this information to the calling device, directing the UAC to contact an alternate Uniform Resource Identifier (URI). A URI is a generic identifier used to name any resource on the Internet. The URL used for Web addresses is a type of URI. See RFC 2396 [1] for more detail.|
|Proxy Server: The proxy server is an intermediary entity that acts as both a server and a client for the purpose of making requests on behalf of other clients. A proxy server primarily plays the role of routing, meaning that its job is to ensure that a request is sent to another entity closer to the targeted user. Proxies are also useful for enforcing policy (for example, making sure a user is allowed to make a call). A proxy interprets, and, if necessary, rewrites specific parts of a request message before forwarding it.|
|Registrar: A registrar is a server that accepts REGISTER requests and places the information it receives in those requests (the SIP address and associated IP address of the registering device) into the location service for the domain it handles.|
|Location Service: A location service is used by a SIP redirect or proxy server to obtain information about a callee's possible location(s). For this purpose, the location service maintains a database of SIP-address/IP-address mappings.|
The various servers are defined in RFC 3261 as logical devices. They may be implemented as separate servers configured on the Internet or they may be combined into a single application that resides in a physical server.
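The division of labor between the registrar and the location service can be illustrated with a minimal in-memory sketch. The class names, the dictionary layout, and the addresses below are assumptions made for illustration; RFC 3261 treats both elements as logical roles and does not prescribe an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LocationService:
    """In-memory stand-in for a location service (hypothetical design)."""
    bindings: dict = field(default_factory=dict)

    def lookup(self, sip_address):
        # Return a copy of every known contact, or an empty list if none.
        return list(self.bindings.get(sip_address, []))


@dataclass
class Registrar:
    """Stores the bindings carried by REGISTER requests in the location service."""
    location_service: LocationService

    def register(self, sip_address, contact):
        contacts = self.location_service.bindings.setdefault(sip_address, [])
        if contact not in contacts:
            contacts.append(contact)


service = LocationService()
registrar = Registrar(service)
# Illustrative SIP address and placeholder contact address.
registrar.register("sip:bob@biloxi.com", "192.0.2.10:5060")
print(service.lookup("sip:bob@biloxi.com"))
```

A proxy or redirect server would call only `lookup`; the registrar is the one element that writes to the store.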
Figure 1: SIP Components and Protocols
Figure 1 shows how some of the SIP components relate to one another and the protocols that are employed. A user agent acting as a client (in this case UAC Alice) uses SIP to set up a session with a user agent that acts as a server (in this case UAS Bob). The session initiation dialogue uses SIP and involves one or more proxy servers to forward requests and responses between the two user agents. The user agents also make use of the SDP, which is used to describe the media session.
The proxy servers may also act as redirect servers as needed. If redirection is done, a proxy server needs to consult the location service database, which may or may not be colocated with a proxy server. The communication between the proxy server and the location service is beyond the scope of the SIP standard. The Domain Name System (DNS) is also an important part of SIP operation. Typically, a UAC makes a request using the domain name of the UAS, rather than an IP address. A proxy server needs to consult a DNS server to find a proxy server for the target domain.
SIP often runs on top of the User Datagram Protocol (UDP) for performance reasons, and provides its own reliability mechanisms, but may also use TCP. If a secure, encrypted transport mechanism is desired, SIP messages may alternatively be carried over the Transport Layer Security (TLS) protocol.
Associated with SIP is the SDP, defined in RFC 2327 [4]. SIP is used to invite one or more participants to a session, while the SDP-encoded body of the SIP message contains information about what media encodings (for example, voice, video) the parties can and will use. After this information is exchanged and acknowledged, all participants are aware of the participants' IP addresses, available transmission capacity, and media type. Then, data transmission begins, using an appropriate transport protocol. Typically, RTP is used. Throughout the session, participants can make changes to session parameters, such as new media types or new parties to the session, using SIP messages.
SIP Uniform Resource Identifiers
A resource within a SIP configuration is identified by a URI. Examples of communications resources include the following:
|A user of an online service|
|An appearance on a multiline phone|
|A mailbox on a messaging system|
|A telephone number at a gateway service|
|A group (such as "sales" or "help desk") in an organization|
SIP URIs have a format based on e-mail address formats, namely user@domain. There are two common schemes. An ordinary SIP URI is of the form:
sip:user@domain
The URI may also include a password, port number, and related parameters. If secure transmission is required, "sip:" is replaced by "sips:". In the latter case, SIP messages are transported over TLS.
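To make the URI structure concrete, the following sketch splits a SIP or SIPS URI into the parts just mentioned. It is a simplified illustration, not a full RFC 3261 grammar: the regular expression, the function name `parse_sip_uri`, and the sample addresses are assumptions, and URI headers (the part after a "?") are not handled.

```python
import re

# Simplified pattern: scheme, optional user[:password]@, host, optional port,
# and optional ;name=value parameters.
SIP_URI = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^:@;]+)(?::(?P<password>[^@;]*))?@)?"
    r"(?P<host>[^:;]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<params>(?:;[^;]+)*)$"
)


def parse_sip_uri(uri):
    """Return the components of a SIP/SIPS URI as a dictionary."""
    match = SIP_URI.match(uri)
    if match is None:
        raise ValueError(f"not a SIP URI: {uri!r}")
    parts = match.groupdict()
    params = {}
    for item in parts.pop("params").split(";"):
        if item:
            name, _, value = item.partition("=")
            params[name] = value
    parts["port"] = int(parts["port"]) if parts["port"] else None
    parts["params"] = params
    # "sips" signals that the message must travel over TLS.
    parts["secure"] = parts["scheme"] == "sips"
    return parts


print(parse_sip_uri("sips:alice:secret@atlanta.com:5061;transport=tcp"))
```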
Examples of Operation
The SIP specification is quite complex; the main document, RFC 3261, is 269 pages long. To give some feel for its operation, we present a few examples.
Figure 2 shows a successful attempt by user Alice to establish a session with user Bob, whose URI is bob@biloxi.com. Alice's UAC is configured to communicate with a proxy server (the outbound server) in its domain and begins by sending an INVITE message to the proxy server that indicates its desire to invite Bob's UAS into a session (1); the server acknowledges the request (2). Although Bob's UAS is identified by its URI, the outbound proxy server needs to account for the possibility that Bob is not currently available or that Bob has moved. Accordingly, the outbound proxy server should forward the INVITE request to the proxy server that is responsible for the domain biloxi.com. The outbound proxy thus consults a local DNS server to obtain the IP address of the biloxi.com proxy server (3), by asking for the DNS SRV resource record that contains information on the proxy server for biloxi.com.
Figure 2: SIP Successful Call Setup
The DNS server responds (4) with the IP address of the biloxi.com proxy server (the inbound server). Alice's proxy server can now forward the INVITE message to the inbound proxy server (5), which acknowledges the message (6). The inbound proxy server now consults a location server to determine Bob's location (7), and the location server responds with Bob's location, indicating that Bob is signed in, and therefore available for SIP messages (8).
The proxy server can now send the INVITE message on to Bob (9). A ringing response is sent from Bob back to Alice (10, 11, 12) while the UAS at Bob is alerting the local media application (for example, telephony). When the media application accepts the call, Bob's UAS sends back an OK response to Alice (13, 14, 15).
Finally, Alice's UAC sends an acknowledgement message to Bob's UAS to confirm the reception of the final response (16). In this example, the ACK is sent directly from Alice to Bob, bypassing the two proxies. This occurs because the endpoints have learned each other's address from the INVITE/200 (OK) exchange, which was not known when the initial INVITE was sent. The media session has now begun, and Alice and Bob can exchange data over one or more RTP connections.
Figure 3: SIP Presence Example
The next example (Figure 3) makes use of two message types that are not yet part of the SIP standard but that are documented in RFC 2848 [5] and are likely to be incorporated in a later revision of SIP. These message types support telephony applications. Suppose that in the preceding example, Alice was informed that Bob was not available. Alice's UAC can then issue a SUBSCRIBE message (1), indicating that it wants to be informed when Bob is available.
This request is forwarded through the two proxies in our example to a PINT (Public Switched Telephone Network [PSTN]-Internet Networking) server (2, 3). A PINT server acts as a gateway between an IP network from which comes a request to place a telephone call and a telephone network that executes the call by connecting to the destination telephone. In this example, we assume that the PINT server logic is colocated with the location service. It could also be the case that Bob is attached to the Internet rather than a PSTN, in which case the equivalent of PINT logic is needed to handle SUBSCRIBE requests. In this example, we assume the latter and assume that the PINT functionality is implemented in the location service. In any case, the location service authorizes subscription by returning an OK message (4), which is passed back to Alice (5, 6). The location service then immediately sends a NOTIFY message with Bob's current status of not signed in (7, 8, 9), which Alice's UAC acknowledges (10, 11, 12).
Figure 4 continues the example of Figure 3. Bob signs on by sending a REGISTER message to the proxy in its domain (1). The proxy updates the database at the location service to reflect registration (2). The update is confirmed to the proxy (3), which confirms the registration to Bob (4). The PINT functionality learns of Bob's new status from the location server (here we assume that they are colocated) and sends a NOTIFY message containing Bob's new status (5), which is forwarded to Alice (6, 7). Alice's UAC acknowledges receipt of the notification (8, 9, 10).
Figure 4: SIP Registration and Notification Example
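The subscribe-then-notify behavior of Figures 3 and 4 can be modeled in a few lines. This is a toy sketch that assumes, as the text does, that the notification logic is colocated with the location service; the class name `PresenceService`, its methods, and the status strings are hypothetical, and no SIP messages are actually exchanged.

```python
class PresenceService:
    """Toy model of a location service that also handles subscriptions."""

    def __init__(self):
        self.signed_in = set()   # users with a current registration
        self.watchers = {}       # target user -> list of subscribers
        self.sent = []           # log of (watcher, target, status) notifications

    def subscribe(self, watcher, target):
        self.watchers.setdefault(target, []).append(watcher)
        # As in Figure 3, a NOTIFY with the current status follows at once.
        self._notify(watcher, target)

    def register(self, user):
        self.signed_in.add(user)
        # As in Figure 4, every watcher learns of the new status.
        for watcher in self.watchers.get(user, []):
            self._notify(watcher, user)

    def _notify(self, watcher, target):
        status = "signed in" if target in self.signed_in else "not signed in"
        self.sent.append((watcher, target, status))


service = PresenceService()
service.subscribe("alice", "bob")   # Figure 3: Bob is not yet available
service.register("bob")             # Figure 4: Bob signs on
print(service.sent)
```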
As was mentioned, SIP is a text-based protocol with a syntax similar to that of HTTP. There are two different types of SIP messages, requests and responses. The format difference between the two types of messages is seen in the first line. The first line of a request has a method, defining the nature of the request and a Request-URI, indicating where the request should be sent. The first line of a response has a response code. All messages include a header, consisting of a number of lines, each line beginning with a header label. A message can also contain a body such as an SDP media description.
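Because the two message types differ only in their first line, a parser can tell them apart by inspecting it. The sketch below assumes well-formed input with single spaces between fields; the function name and the returned dictionary layout are illustrative choices.

```python
def parse_start_line(line):
    """Classify a SIP start line as a request line or a status line."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed start line: {line!r}")
    if parts[0].startswith("SIP/"):
        # Response: version, numeric response code, reason phrase.
        version, code, reason = parts
        return {"kind": "response", "version": version,
                "code": int(code), "reason": reason}
    # Request: method, Request-URI, version.
    method, request_uri, version = parts
    return {"kind": "request", "method": method,
            "request_uri": request_uri, "version": version}


print(parse_start_line("INVITE sip:bob@biloxi.com SIP/2.0"))
print(parse_start_line("SIP/2.0 200 OK"))
```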
For SIP requests, RFC 3261 defines the following methods:
|REGISTER: Used by a user agent to notify a SIP configuration of its current IP address and the URLs for which it would like to receive calls|
|INVITE: Used to establish a media session between user agents|
|ACK: Confirms reliable message exchanges|
|CANCEL: Terminates a pending request, but does not undo a completed call|
|BYE: Terminates a session between two users in a conference|
|OPTIONS: Solicits information about the capabilities of the callee, but does not set up a call|
For example, the header of message (1) in Figure 2 might look like the following:
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP 22.214.171.124:5060
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
CSeq: 314159 INVITE
The first line contains the method name (INVITE), a SIP URI, and the version number of SIP that is used. The lines that follow are a list of header fields. This example contains the minimum required set.
The Via headers show the path the request has taken in the SIP configuration (source and intervening proxies), and are used to route responses back along the same path. As the INVITE message leaves, there is only the header inserted by Alice. The line contains the IP address (22.214.171.124), port number (5060), and transport protocol (UDP) that Alice wants Bob to use in his response.
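The way Via entries accumulate on the request and then steer the response can be sketched as a simple stack. The helper names are hypothetical, the proxy host names are taken from the response example in this article, and Alice's address is a placeholder from the documentation range.

```python
def add_via(via_stack, sent_by, transport="UDP"):
    """Return a new Via list with this element's entry placed on top."""
    return [f"SIP/2.0/{transport} {sent_by}"] + list(via_stack)


def response_hops(via_stack):
    """List the addresses a response visits, topmost Via first."""
    return [via.split(" ", 1)[1] for via in via_stack]


# Alice's UAC inserts the first Via; each proxy then adds its own on top.
vias = add_via([], "192.0.2.1:5060")              # placeholder for Alice
vias = add_via(vias, "bigbox3.site3.atlanta.com")  # outbound proxy
vias = add_via(vias, "server10.biloxi.com")        # inbound proxy
print(response_hops(vias))
```

Reading the list top to bottom gives the return path: the response goes first to the inbound proxy, then to the outbound proxy, and finally to Alice.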
The Max-Forwards header limits the number of hops a request can make on the way to its destination. It consists of an integer that is decremented by one by each proxy that forwards the request. If the Max-Forwards value reaches 0 before the request reaches its destination, it is rejected with a 483 (Too Many Hops) error response.
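A proxy's handling of this header amounts to a bounds check followed by a decrement. In the sketch below, the exception class and function name are illustrative; only the 483 response code comes from the specification.

```python
class TooManyHops(Exception):
    """Signals that the request must be rejected with a 483 response."""
    status_code = 483


def next_max_forwards(value):
    """Return the Max-Forwards value a proxy would send onward."""
    if value <= 0:
        # The hop budget is spent: reject instead of forwarding.
        raise TooManyHops("483 Too Many Hops")
    return value - 1


hops = 3
while True:
    try:
        hops = next_max_forwards(hops)
        print("forwarded with Max-Forwards:", hops)
    except TooManyHops as error:
        print("rejected:", error)
        break
```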
The To header field contains a display name (Bob) and a SIP or SIPS URI (sip:bob@biloxi.com) toward which the request was originally directed. The From header field also contains a display name (Alice) and a SIP or SIPS URI (sip:alice@atlanta.com) that indicate the originator of the request. This header field also has a tag parameter that contains a random string (1928301774) that was added to the URI by the UAC. It is used to identify the session.
The Call-ID header field contains a globally unique identifier for this call, generated by the combination of a random string and the host name or IP address. The combination of the To tag, From tag, and Call-ID completely defines a peer-to-peer SIP relationship between Alice and Bob and is referred to as a dialog.
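One way to picture a dialog identifier is as a three-part key. In this sketch the tuple layout and helper names are assumptions, and the Call-ID value is a made-up placeholder; the two tags are the ones used in the examples in this article. Each side stores its own tag as the local tag, so the two endpoints hold mirror-image keys for the same dialog.

```python
from typing import NamedTuple


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def dialog_id_at_uac(call_id, from_tag, to_tag):
    # The caller generated the From tag, so that is its local tag.
    return DialogId(call_id, from_tag, to_tag)


def dialog_id_at_uas(call_id, from_tag, to_tag):
    # The callee generated the To tag, so that is its local tag.
    return DialogId(call_id, to_tag, from_tag)


call_id = "example-call-id@192.0.2.1"   # placeholder value
alice = dialog_id_at_uac(call_id, "1928301774", "a6c85cf")
bob = dialog_id_at_uas(call_id, "1928301774", "a6c85cf")
print(alice)
print(bob)
```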
The CSeq or Command Sequence header field contains an integer and a method name. The CSeq number, a traditional sequence number, is initialized at the start of a call (314159 in this example) and incremented for each new request within a dialog. The CSeq is used to distinguish a retransmission from a new request.
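A receiver can use the CSeq value to spot retransmissions by remembering which number and method pairs it has already seen for a call. The following is a deliberately simplified sketch: the class name and return strings are hypothetical, and real implementations match transactions on more than these fields.

```python
class CSeqTracker:
    """Remembers (number, method) pairs already seen for each call."""

    def __init__(self):
        self._seen = {}

    def classify(self, call_id, number, method):
        seen = self._seen.setdefault(call_id, set())
        if (number, method) in seen:
            return "retransmission"
        seen.add((number, method))
        return "new"


tracker = CSeqTracker()
print(tracker.classify("call-1", 314159, "INVITE"))   # first arrival
print(tracker.classify("call-1", 314159, "INVITE"))   # same request again
print(tracker.classify("call-1", 314160, "BYE"))      # next request in the dialog
```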
The Contact header field contains a SIP URI for direct communication between user agents. Whereas the Via header field tells other elements where to send the response, the Contact header field tells other elements where to send future requests for this dialog.
The Content-Type header field indicates the type of the message body. The Content-Length header field gives the length in octets of the message body.
The SIP response types defined in RFC 3261 are in the following categories:
|Provisional (1xx): The request was received and is being processed.|
|Success (2xx): The action was successfully received, understood, and accepted.|
|Redirection (3xx): Further action needs to be taken in order to complete the request.|
|Client Error (4xx): The request contains bad syntax or cannot be fulfilled at this server.|
|Server Error (5xx): The server failed to fulfill an apparently valid request.|
|Global Failure (6xx): The request cannot be fulfilled at any server.|
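Since the category is determined by the first digit of the response code, classification reduces to integer division. The function name and the range check in this sketch are illustrative choices.

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(code):
    """Map a SIP response code to its category name."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP response code: {code}")
    return RESPONSE_CLASSES[code // 100]


for code in (180, 200, 483):
    print(code, response_class(code))
```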
For example, the header of message (13) in Figure 2 might look like the following:
SIP/2.0 200 OK
Via: SIP/2.0/UDP server10.biloxi.com
Via: SIP/2.0/UDP bigbox3.site3.atlanta.com
Via: SIP/2.0/UDP 22.214.171.124:5060
To: Bob <sip:bob@biloxi.com>;tag=a6c85cf
From: Alice <sip:alice@atlanta.com>;tag=1928301774
CSeq: 314159 INVITE
The first line contains the version number of SIP that is used and the response code and name. The lines that follow are a list of header fields. The Via, To, From, Call-ID, and CSeq header fields are copied from the INVITE request. (There are three Via header field values: one added by Alice's SIP UAC, one added by the atlanta.com proxy, and one added by the biloxi.com proxy.) Bob's SIP phone has added a tag parameter to the To header field. This tag is incorporated by both endpoints into the dialog and is included in all future requests and responses in this call.
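The copy-and-tag step performed by the UAS can be sketched as follows. The function name, the representation of headers as (name, value) pairs, and the Call-ID and Via values in the usage lines are assumptions made for illustration.

```python
COPIED_HEADERS = {"Via", "To", "From", "Call-ID", "CSeq"}


def build_response(request_headers, code, reason, to_tag):
    """Build response text from a request's (name, value) header pairs."""
    lines = [f"SIP/2.0 {code} {reason}"]
    for name, value in request_headers:
        if name not in COPIED_HEADERS:
            continue  # headers such as Max-Forwards are not echoed
        if name == "To" and "tag=" not in value:
            # The UAS contributes its own tag to the To header field.
            value = f"{value};tag={to_tag}"
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


invite = [
    ("Via", "SIP/2.0/UDP 192.0.2.1:5060"),       # placeholder address
    ("Max-Forwards", "70"),
    ("To", "Bob <sip:bob@biloxi.com>"),
    ("From", "Alice <sip:alice@atlanta.com>;tag=1928301774"),
    ("Call-ID", "example-call-id@192.0.2.1"),    # placeholder value
    ("CSeq", "314159 INVITE"),
]
print(build_response(invite, 200, "OK", "a6c85cf"))
```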
Session Description Protocol
The Session Description Protocol (SDP), defined in RFC 2327, describes the content of sessions, including telephony, Internet radio, and multimedia applications. SDP includes information about:
|Media streams: A session can include multiple streams of differing content. SDP currently defines audio, video, data, control, and application as stream types, similar to the MIME types used for Internet mail.|
|Addresses: SDP indicates the destination addresses, which may be a multicast address, for a media stream.|
|Ports: For each stream, the UDP port numbers for sending and receiving are specified.|
|Payload types: For each media stream type in use (for example, telephony), the payload type indicates the media formats that can be used during the session.|
|Start and stop times: These apply to broadcast sessions, for example, a television or radio program. The start, stop, and repeat times of the session are indicated.|
|Originator: For broadcast sessions, the originator is specified, with contact information. This may be useful if a receiver encounters technical difficulties.|
Although SDP provides the capability to describe multimedia content, it lacks the mechanisms by which two parties agree on the parameters to be used. RFC 3264 [7] remedies this lack by defining a simple offer/answer model, by which two parties exchange SDP messages to reach agreement on the nature of the multimedia content to be transmitted.
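The core idea of the offer/answer exchange can be sketched by reading the media lines of an offer and keeping only the formats the answerer supports. This is a rough illustration rather than an RFC 3264 implementation: the sample SDP body, the function names, and the supported-format table are assumptions, and a real answer would also signal a rejected stream (by setting its port to zero) instead of simply listing no formats.

```python
def media_formats(sdp_text):
    """Map each media type in an SDP body to its list of payload types."""
    media = {}
    for line in sdp_text.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <transport> <format list>
            kind, _port, _proto, *formats = line[2:].split()
            media[kind] = formats
    return media


def answer_formats(offer, supported):
    """Keep, per media type, the offered formats the answerer supports."""
    return {
        kind: [fmt for fmt in formats if fmt in supported.get(kind, [])]
        for kind, formats in offer.items()
    }


OFFER = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0 8 97
m=video 51372 RTP/AVP 31
"""

offer = media_formats(OFFER)
print(answer_formats(offer, {"audio": ["0", "97"]}))
```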
[1] T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax," RFC 2396, August 1998.
[2] S. Borthick, "SIP Services: Slowly Rolling Forward," Business Communications Review, June 2002.
[3] S. Borthick, "SIP for the Enterprise: Work in Progress," Business Communications Review, February 2003.
[4] M. Handley and V. Jacobson, "SDP: Session Description Protocol," RFC 2327, April 1998.
[5] S. Petrack and L. Conroy, "The PINT Service Protocol: Extensions to SIP and SDP for IP Access to Telephone Call Services," RFC 2848, June 2000.
[6] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: Session Initiation Protocol," RFC 3261, June 2002.
[7] J. Rosenberg and H. Schulzrinne, "An Offer/Answer Model with the Session Description Protocol," RFC 3264, June 2002.
[8] H. Schulzrinne and J. Rosenberg, "The Session Initiation Protocol: Providing Advanced Telephony Access Across the Internet," Bell Labs Technical Journal, October-December 1998.
[9] Figures 2 through 4 are adapted from ones developed by Professor H. Charles Baker of Southern Methodist University.
WILLIAM STALLINGS is a consultant, lecturer, and author of over a dozen books on data communications and computer networking. He also maintains a computer science resource site for CS students and professionals at http://www.WilliamStallings.com/StudentSupport.html. He has a PhD in computer science from M.I.T. His latest book is Computer Networks, with Internet Protocols and Technology (Prentice Hall, 2003). His home in cyberspace is http://www.WilliamStallings.com and he can be reached at firstname.lastname@example.org