Guide to Cisco Systems' VoIP Infrastructure Solution for SIP
Chap 1: Overview of the Session Initiation Protocol

Table of Contents

Overview of the Session Initiation Protocol

Overview of the Session Initiation Protocol

This chapter provides an overview of SIP. It includes the following sections:

Introduction to SIP

Session Initiation Protocol (SIP) is the Internet Engineering Task Force's (IETF's) standard for multimedia conferencing over IP. SIP is an ASCII-based, application-layer control protocol (defined in RFC 2543) that can be used to establish, maintain, and terminate calls between two or more end points.

Like other VoIP protocols, SIP is designed to address the functions of signaling and session management within a packet telephony network. Signaling allows call information to be carried across network boundaries. Session management provides the ability to control the attributes of an end-to-end call.

SIP provides the capabilities to:

  • Determine the location of the target end point—SIP supports address resolution, name mapping, and call redirection.

  • Determine the media capabilities of the target end point—Via Session Description Protocol (SDP), SIP determines the "lowest level" of common services between the end points. Conferences are established using only the media capabilities that can be supported by all end points.

  • Determine the availability of the target end point—If a call cannot be completed because the target end point is unavailable, SIP determines whether the called party is already on the phone or did not answer in the allotted number of rings. It then returns a message indicating why the target end point was unavailable.

  • Establish a session between the originating and target end point—If the call can be completed, SIP establishes a session between the end points. SIP also supports mid-call changes, such as the addition of another end point to the conference or the changing of a media characteristic or codec.

  • Handle the transfer and termination of calls—SIP supports the transfer of calls from one end point to another. During a call transfer, SIP simply establishes a session between the transferee and a new end point (specified by the transferring party) and terminates the session between the transferee and the transferring party. At the end of a call, SIP terminates the sessions between all parties.

Conferences can consist of two or more users and can be established using multicast or multiple unicast sessions.

Note   The term conference means an established session (or call) between two or more end points. In this document, the terms conference and call are used interchangeably.

Components of SIP

SIP is a peer-to-peer protocol. The peers in a session are called User Agents (UAs). A user agent can function in one of the following roles:

Typically, a SIP end point is capable of functioning as both a UAC and a UAS, but functions only as one or the other per transaction. Whether the endpoint functions as a UAC or a UAS depends on the UA that initiated the request.

From an architecture standpoint, the physical components of a SIP network can be grouped into two categories: clients and servers. Figure 1-1 illustrates the architecture of a SIP network.

Note   In addition, the SIP servers can interact with other application services, such as Lightweight Directory Access Protocol (LDAP) servers, location servers, a database application, or an extensible markup language (XML) application. These application services provide back-end services such as directory, authentication, and billing services.

Figure 1-1: SIP Architecture

SIP Clients

SIP clients include:

SIP Servers

SIP servers include:

How SIP Works

SIP is a simple, ASCII-based protocol that uses requests and responses to establish communication among the various components in the network and to ultimately establish a conference between two or more end points.

Users in a SIP network are identified by unique SIP addresses. A SIP address is similar to an e-mail address and is in the format of The user ID can be either a user name or an E.164 address.

Users register with a registrar server using their assigned SIP addresses. The registrar server provides this information to the location server upon request.

When a user initiates a call, a SIP request is sent to a SIP server (either a proxy or a redirect server). The request includes the address of the caller (in the From header field) and the address of the intended callee (in the To header field). The following sections provide simple examples of successful, point-to-point calls established using a proxy and a redirect server.

Over time, a SIP end user might move between end systems. The location of the end user can be dynamically registered with the SIP server. The location server can use one or more protocols (including finger, rwhois, and LDAP) to locate the end user. Because the end user can be logged in at more than one station and because the location server can sometimes have inaccurate information, it might return more than one address for the end user. If the request is coming through a SIP proxy server, the proxy server will try each of the returned addresses until it locates the end user. If the request is coming through a SIP redirect server, the redirect server forwards all the addresses to the caller in the Contact header field of the invitation response.

For more information, see RFC 2543—SIP: Session Initiation Protocol, which can be found at

Using A Proxy Server

If a proxy server is used, the caller UA sends an INVITE request to the proxy server, the proxy server determines the path, and then forwards the request to the callee (as shown in Figure 1-2).

Figure 1-2: SIP Request Through a Proxy Server

The callee responds to the proxy server, which in turn, forwards the response to the caller (as shown in Figure 1-3).

Figure 1-3: SIP Response Through a Proxy Server

The proxy server forwards the acknowledgments of both parties. A session is then established between the caller and callee. Real-time Transfer Protocol (RTP) is used for the communication between the caller and the callee (as shown in Figure 1-4).

Figure 1-4: SIP Session Through a Proxy Server

Using a Redirect Server

If a redirect server is used, the caller UA sends an INVITE request to the redirect server, the redirect server contacts the location server to determine the path to the callee, and then the redirect server sends that information back to the caller. The caller then acknowledges receipt of the information (as shown in Figure 1-5).

Figure 1-5: SIP Request Through a Redirect Server

The caller then sends a request to the device indicated in the redirection information (which could be the callee or another server that will forward the request). Once the request reaches the callee, it sends back a response and the caller acknowledges the response. RTP is used for the communication between the caller and the callee (as shown in Figure 1-6).

Figure 1-6: SIP Session Through a Redirect Server

SIP Versus H.323

In addition to SIP, there are other protocols that facilitate voice transmission over IP. One such protocol is H.323. H.323 originated as an International Telecommunications Union (ITU) multimedia standard and is used for both packet telephony and video streaming. The H.323 standard incorporates multiple protocols, including Q.931 for signaling, H.245 for negotiation, and Registration Admission and Status (RAS) for session control. H.323 was the first standard for call control for VoIP and is supported on all Cisco Systems' voice gateways.

SIP and H.323 were designed to address session control and signaling functions in a distributed call control architecture. Although SIP and H.323 can also be used to communicate to limited intelligence end points, they are especially well-suited for communication with intelligent end points.

Table 1-1 provides a brief comparison of SIP and H.323.

Table 1-1: SIP versus H.323
Aspect SIP H.323




Network intelligence and services

Provided by servers (Proxy, Redirect, Registrar)

Provided by gatekeepers

Model used



Signaling protocol


TCP (UDP is optional in Version 3)

Media protocol



Code basis


Binary (ASN.1 encoding)

Other protocols used

IETF/IP protocols, such as SDP, HTTP/1.1, IPmc, and MIME

ITU / ISDN protocols, such as H.225, H.245, and H.450

Vendor interoperability



Although SIP messages are not directly compatible with H.323, both protocols can coexist in the same packet telephony network if a device that supports the interoperability is available.

For example, a call agent could use H.323 to communicate with gateways and use SIP for inter-call agent signaling. Then, after the bearer connection is set up, the bearer information flows between the different gateways as an RTP stream.