May 1995
An abridged version of this paper was presented at Engineering InterOp,
Las Vegas, March 1995.
1.0 Introduction
2.0 ATM Network Operation
3.0 ATM Signaling and Addressing
4.0 ATM Routing Protocols
5.0 LAN Emulation
6.0 Native Mode Protocols
7.0 Multiprotocol Over ATM
8.0 Wide Area Network Internetworking
9.0 Conclusions
10.0 References
Appendix A: A Survey of ATM Traffic Management
Appendix B: Status of Key ATM Standards and Specifications
These benefits, however, come at a price. Contrary to common misconceptions, ATM is a very complex technology, perhaps the most complex ever developed by the networking industry. While the structure of ATM cells and cell switching do facilitate the development of hardware intensive, high performance ATM switches, the deployment of ATM networks requires the overlay of a highly complex, software intensive, protocol infrastructure. This infrastructure is required both to allow individual ATM switches to be linked into a network, and to allow such networks to internetwork with the vast installed base of existing local and wide area networks.
This paper is a survey of this protocol infrastructure. It starts by discussing the unique features of ATM networks -- such as their connection oriented nature -- which contribute to the complexity of ATM protocols. The fact that ATM is connection oriented implies the need for ATM specific signaling protocols and addressing structures, as well as protocols to route ATM connection requests across the ATM network. These ATM protocols, in turn, influence the manner in which existing higher layer protocols can operate over ATM networks. The latter can be done in a number of different ways, each with its own advantages and characteristics, which will be discussed.
The remainder of this paper is organized as follows:
This paper assumes familiarity with the fundamentals of ATM technology, including the ATM layer protocols and cell formats, and the operation of ATM switching systems. Many sources are available which describe these aspects of ATM systems -- [McDysan], [Minoli], and [Prycker] are good sources for such background information.
Many of the protocols described in this paper were still under development, as of the time of writing, and aspects of their operation may change by the time the protocols are finalized. Consult the latest versions of the referenced specifications for the most current information.
Figure 1: ATM Network Interfaces
*2* ATM does not have an analog of the redundant physical links provided by FDDI, with its dual attached stations. Hence any end-system requiring a redundant connection to an ATM network will need to support two separate UNIs, and either operate one link in a standby mode, or perform local connection level load sharing between the links.
*3* In NNI cells, unlike UNI cells, there is no Generic Flow Control (GFC) field, and the first four bits of the cell are used by an expanded (12 bit) VPI field. Since the GFC is rarely used, however (its use is not defined, for instance, in the ATM Forum UNI specifications), there is, in practice, no functional difference between UNI and NNI cells, other than in the fact that the latter can support a larger VPI space.
*4* For this reason, the connection between a private ATM switch and a public ATM switch is a UNI -- known as a Public UNI -- since these switches do not typically exchange NNI information (refer to Section 4.5).
As noted above, ATM networks are fundamentally connection oriented. This means that a virtual circuit needs to be set up across the ATM network prior to any data transfer. ATM circuits are of two types: virtual paths, identified by virtual path identifiers (VPI); and virtual channels, identified by the combination of a VPI and a virtual channel identifier (VCI). A virtual path is a bundle of virtual channels, all of which are switched transparently across the ATM network on the basis of the common VPI. All VCIs and VPIs, however, have only local significance across a particular link, and are remapped, as appropriate, at each switch. In normal operation, switches allocate all UNI connections within VPI=0; the use of other virtual paths is discussed later in this paper.
The basic operation of an ATM switch is very simple: to receive a cell across a link on a known VCI or VPI value; to look up the connection value in a local translation table to determine the outgoing port (or ports) of the connection and the new VPI/VCI value of the connection on that link; and to then retransmit the cell on that outgoing link with the appropriate connection identifiers.
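The lookup-and-relabel step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Cell`, `TranslationTable`, and `switch_cell` are hypothetical, the table is keyed on the full VPI/VCI pair (virtual channel switching only), and the port and identifier values are invented for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Cell(NamedTuple):
    vpi: int
    vci: int
    payload: bytes


# (input port, VPI, VCI) -> list of (output port, new VPI, new VCI).
# More than one entry in the list models a point-to-multipoint branch.
TranslationTable = Dict[Tuple[int, int, int], List[Tuple[int, int, int]]]


def switch_cell(table: TranslationTable, in_port: int, cell: Cell) -> List[Tuple[int, Cell]]:
    """Look up the incoming connection and relabel the cell for each outgoing link.

    A cell arriving on an unknown connection matches no entry and is dropped.
    """
    branches = table.get((in_port, cell.vpi, cell.vci), [])
    return [(out_port, Cell(vpi, vci, cell.payload)) for out_port, vpi, vci in branches]


# Hypothetical table contents, set up in advance by some external mechanism.
table: TranslationTable = {
    (1, 0, 32): [(2, 0, 77)],              # one outgoing port
    (1, 0, 33): [(2, 0, 40), (3, 0, 41)],  # replicated to two ports
}

unicast = switch_cell(table, 1, Cell(0, 32, b"data"))
replicated = switch_cell(table, 1, Cell(0, 33, b"data"))
unknown = switch_cell(table, 1, Cell(0, 99, b"data"))
```

Note that the payload is never inspected; only the connection identifiers change from link to link, which is what keeps the per-cell work small.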
Figure 2: ATM Switch Operations
The switch operation is so simple because external mechanisms set up the local translation tables prior to the transmittal of any data. The manner in which these tables are set up determines the two fundamental types of ATM connections:
Figure 3: Virtual Circuit and Virtual Path Switching
ATM signaling is initiated by an ATM end-system that desires to set up a connection through an ATM network; signaling packets are sent on a well known*5* virtual channel, VPI=0, VCI=5. The signaling is routed through the network, from switch to switch*6*, setting up the connection identifiers as it goes*7*, until it reaches the destination end system. The latter can either accept and confirm the connection request, or can reject it, clearing the connection. Note that because the connection is set up along the path of the connection request, the data also flows along this same path.
Figure 4: Connection Setup through ATM Signaling (SVC)
*6* Strictly, the signaling requests are passed between the signaling or call control processes associated with the switches, and it is these that set up the connection through the switches. In general, however, for the sake of robustness and performance, most vendors will integrate the call control capability into each switch, rather than supporting it on an off-board processor.
*7* The connection identifiers (that is, VPI/VCI values) for a particular connection are typically allocated, across any given link, by the node to which the request is sent, as opposed to the requesting node. Connection identifiers -- with typically the same VPI/VCI values -- are always allocated in each direction of a connection, but the traffic parameters in each direction can be different; in particular, the bandwidth in one direction could be zero.
In the following section we discuss the ATM signaling protocols, while Section 4.0 discusses the ATM routing protocols that actually route ATM connection requests across the ATM network. Before this, the different types of ATM connection that can be set up, either as SVCs or PVCs, are discussed.
There are two fundamental types of ATM connections:
Figure 5: Types of ATM Connections
What is notably missing from these types of ATM connections is an analog to the multicasting or broadcasting*9* capability common in many shared medium LAN technologies such as Ethernet or Token Ring. In such technologies, multicasting allows multiple end systems both to receive data from multiple other systems, and to transmit data to these multiple systems. Such capabilities are easy to implement in shared media technologies such as LANs, where all nodes on a single LAN segment must necessarily process all packets sent on that segment. The obvious analog in ATM to a multicast LAN group would be a (bidirectional) multipoint-to-multipoint connection. Unfortunately, this obvious solution cannot be implemented when using AAL5, the most common ATM Adaptation Layer (AAL) used to transmit data across ATM networks.
Unlike AAL 3/4*10*, with its Message Identifier (MID) field (see [Forum1]), AAL 5 does not have any provision within its cell format for the interleaving of cells from different AAL5 packets on a single connection. This means that all AAL5 packets sent to a particular destination across a particular connection must be received in sequence, with no interleaving between the cells of different packets on the same connection, or the destination reassembly process would not be able to reconstruct the packets.
*10* Despite the problems that AAL 5 has with multicast support, it is not really feasible to use AAL 3/4 for data transport instead. This is because AAL 3/4 is a much more complex protocol than AAL 5 and would lead to much more complex and expensive implementations; indeed, AAL 5 was developed specifically to replace AAL 3/4. In any case, while the MID field of AAL 3/4 could preclude cell interleaving problems, allowing for bidirectional, multipoint-to-multipoint connections, this would also require some mechanism for ensuring that all nodes in the connection use a unique MID value. There is no such mechanism currently in existence or development; the number of possible nodes within a given multicast group would also be severely limited due to the small size of the MID space.
This is why ATM AAL 5 point-to-multipoint connections can only be unidirectional, for if a leaf node were to transmit an AAL 5 packet onto the connection, it would be received by both the root node and all other leaf nodes. However, at these nodes, the packet sent by the leaf could well be interleaved with packets sent by the root, and possibly other leaf nodes; this would preclude the reassembly of any of the interleaved packets. Clearly, this is not acceptable.
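The interleaving problem can be made concrete with a toy model. The sketch below is a deliberate simplification, not the AAL 5 format: each cell carries only a payload chunk and an end-of-packet flag, the chunk size is four bytes rather than a real cell payload, and the trailer, padding, and CRC are omitted. The function names `segment` and `reassemble` are hypothetical.

```python
from typing import Iterable, List, Tuple

# (payload chunk, end-of-packet flag). There is no per-packet identifier,
# which is the property of AAL 5 that matters here.
Cell = Tuple[bytes, bool]


def segment(packet: bytes, size: int = 4) -> List[Cell]:
    """Split a packet into cells, flagging only the final cell."""
    chunks = [packet[i:i + size] for i in range(0, len(packet), size)]
    return [(chunk, i == len(chunks) - 1) for i, chunk in enumerate(chunks)]


def reassemble(cells: Iterable[Cell]) -> List[bytes]:
    """Concatenate cells on one connection until an end-of-packet flag is seen."""
    packets: List[bytes] = []
    buffer = b""
    for payload, last in cells:
        buffer += payload
        if last:
            packets.append(buffer)
            buffer = b""
    return packets


root_cells = segment(b"ROOTDATA")
leaf_cells = segment(b"leafdata")

# Cells of each packet arrive contiguously: both packets are recovered.
in_sequence = reassemble(root_cells + leaf_cells)

# Cells from the two senders alternate on the same connection: the receiver
# cannot tell them apart, and neither packet is recovered intact.
interleaved = reassemble([root_cells[0], leaf_cells[0], root_cells[1], leaf_cells[1]])
```

In this model `in_sequence` holds the two original packets, while `interleaved` holds one packet that mixes both senders' data and one truncated fragment.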
Notwithstanding this problem, ATM does require some form of multicast capability, since most existing protocols, being developed initially for LAN technologies, rely upon the existence of a low-level multicast/broadcast facility. Three methods have been proposed for solving this problem:
Figure 6: Multicast Server Operation
*12* The multicast server could also connect to each of the destinations using point-to-point connections, and replicate the packets before transmission. In general, however, ATM networks can perform replication, through point-to-multipoint connections, much more efficiently.
Figure 7: Multicast Through Overlaid Point-to-Multipoint Connections
The last mechanism requires each node to maintain N connections for each group, where N is the total number of transmitting nodes within the group, while the multicast server mechanism requires only two connections. The last mechanism also requires a registration process that tells each node joining a group which other nodes belong to it, so that the new node can form its own point-to-multipoint connection. The other nodes (see below) also need to learn of the new node so that they can add it to their own point-to-multipoint connections. The multicast server mechanism is more scalable in terms of connection resources, but has the problem of requiring a centralized resequencer, which is both a potential bottleneck and a single point of failure.
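The scaling difference stated above is simple arithmetic, shown here as a short sketch. The function name and the mechanism labels `"overlaid"` and `"server"` are hypothetical; the counts themselves (N connections per node versus two) are taken directly from the text.

```python
def connections_per_node(transmitters: int, mechanism: str) -> int:
    """Connections one group member must maintain, per the counts given in the text."""
    if transmitters < 1:
        raise ValueError("a group needs at least one transmitting node")
    if mechanism == "overlaid":
        # One connection per transmitting node in the group.
        return transmitters
    if mechanism == "server":
        # Two connections, regardless of group size.
        return 2
    raise ValueError(f"unknown mechanism: {mechanism}")


# Print the per-node cost for a few group sizes.
for n in (2, 10, 100):
    print(n, connections_per_node(n, "overlaid"), connections_per_node(n, "server"))
```

The per-node cost of the overlaid approach grows linearly with the number of transmitters, while the server approach stays constant, at the price of the centralized server.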
In short, there is, as yet, no ideal solution within ATM for multicast. Higher layer protocols within ATM networks use both the latter two solutions for multicast, as will be discussed later in this paper. This is one example of why internetworking existing protocols with ATM is so complex. Most current protocols, particularly those developed for LANs, implicitly assume a network infrastructure very similar to existing LAN technologies -- that is, a shared medium, connectionless technology with implicit broadcast mechanisms. As noted above, ATM violates all of these assumptions. In later sections the mechanisms used to work around these problems will be discussed.
Before proceeding, this brief survey of ATM networking will conclude with a mention of the Interim Local Management Interface (ILMI) protocol. The ILMI protocol uses SNMP format packets across the UNI (and also across NNI links, as discussed later) to access an ILMI Management Information Base (MIB) associated with the link, within each node. The ILMI protocol is run across a well known virtual channel, VPI=0, VCI=16. The ILMI protocol allows adjacent nodes to determine various characteristics of the other node -- for example, the size of each other's connection space, the type of signaling used, hooks for network management autodiscovery, and so on. One of its most useful features, address registration, greatly facilitates the administration of ATM addresses and is discussed in the next section. The ILMI will likely be extended in the future to support other autoconfiguration capabilities, such as for group addressing, as discussed later.
The UNI 3.1 specification is based upon Q.2931, a public network signaling protocol developed by the International Telecommunications Union-Telecommunications Sector*14* (ITU-T), which, in turn, was based upon the Q.931 signaling protocol used with Narrowband ISDN (N-ISDN). The ATM signaling protocols run on top of a Service Specific Connection Oriented Protocol (SSCOP), defined by the ITU-T Recommendations Q.2100, Q.2110, and Q.2130. This is a data link protocol that guarantees delivery through the use of windows and retransmissions*15*.
*14* Known formerly as the CCITT.
*15* Note that in general, ATM does not offer an assured service -- cells are not retransmitted by ATM devices upon loss, for instance, since it is assumed that higher layers (such as TCP) will handle reliable delivery, if this is what the application requires. This also makes ATM devices much simpler, faster, and cheaper. Refer to [Partridge3] for a discussion of reliable delivery in ATM networks. ATM signaling requires the assured delivery guarantees of SSCOP since it does not run on any standard higher layer protocol like TCP, and the signaling state machines can be made much simpler if assured delivery can be assumed.
ATM signaling uses the 'one-pass' method of connection set-up, which is the model used in all common telecommunications networks (e.g. the telephone network). That is, a connection request from the source end-system is propagated through the network, setting up the connection as it goes, until it reaches the final destination end-system. The routing of the connection request -- and hence of any subsequent data flow -- is governed by the ATM routing protocols (e.g. the P-NNI protocols discussed in the following section). Such protocols route the connection request based upon both the destination address, and the traffic and QoS parameters requested by the source end-system. The destination end-system may choose to accept or reject the connection request, but since the call routing is based purely on the parameters in the initial connection request message, the scope for negotiation of connection parameters between source and destination -- which may, in turn, affect the connection routing -- is limited.
A number of message types are defined in the UNI 3.0/3.1 specification, together with a number of state machines defining the operation of the protocol, cause error codes defining reasons for connection failure, and so forth. Data elements used in the signaling protocol -- addresses, for instance -- are carried within Information Elements (IE) within the signaling packets.
In overview, a source end-system wishing to set up a connection will formulate and send into the network, across its UNI, a Setup message, containing the destination end-system address, desired traffic and QoS parameters, various IEs defining particular desired higher layer protocol bindings (see Section 6.2.1) and so forth. This Setup message is sent to the first, ingress switch, across the UNI, which responds with a local Call Proceeding acknowledgment. The ingress switch will then invoke an ATM routing protocol, as discussed in the following section, to propagate the signaling request across the network to the egress switch, to which the destination end-system is attached.
This egress switch will then forward the Setup message to the end-system, across its UNI. The latter may choose to either accept or reject the connection request; in the former case, it returns a Connect message, back through the network, along the same path, to the requesting source end-system. Once the source end-system receives and acknowledges the Connect message, either node can then start transmitting data on the connection. If the destination end-system rejects the connection request, it returns a Release message, which is also sent back to the source end-system, clearing the connection (e.g. any allocated connection identifiers) as it proceeds. Release messages are also used by either of the end-systems, or by the network, to clear an established connection.
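The message sequence described above can be traced with a small simulation. This is a sketch of the narrative only, not of the UNI 3.1 state machines: the function `setup_call`, the node names, and the representation of each message as a (sender, receiver, name) tuple are assumptions made for illustration, and per-hop acknowledgments, timers, and the messages actually exchanged between switches are omitted.

```python
from typing import List, Tuple

Event = Tuple[str, str, str]  # (sender, receiver, message name)


def setup_call(path: List[str], accept: bool) -> List[Event]:
    """Trace a one-pass connection setup.

    path lists the nodes in order: source end-system, ingress switch,
    any transit switches, egress switch, destination end-system.
    """
    if len(path) < 3:
        raise ValueError("path needs a source, at least one switch, and a destination")

    source, ingress = path[0], path[1]
    events: List[Event] = [
        (source, ingress, "SETUP"),
        (ingress, source, "CALL PROCEEDING"),  # local acknowledgment only
    ]

    # The request is propagated hop by hop toward the destination.
    for sender, receiver in zip(path[1:], path[2:]):
        events.append((sender, receiver, "SETUP"))

    # The reply retraces the same path in the opposite direction.
    reply = "CONNECT" if accept else "RELEASE"
    for upstream, downstream in reversed(list(zip(path, path[1:]))):
        events.append((downstream, upstream, reply))

    if accept:
        events.append((source, ingress, "CONNECT ACKNOWLEDGE"))
    return events


accepted = setup_call(["A", "S1", "S2", "B"], accept=True)
rejected = setup_call(["A", "S1", "S2", "B"], accept=False)
```

Because the reply follows the path taken by the request, the connection state left behind at each switch lies along that same path, which is why the data subsequently flows over it.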
The ATM Forum greatly simplified the Q.2931 protocol, but also extended it to add support for point-to-multipoint connection set up. In particular, UNI 3.1 allows for a root node to set up a point-to-multipoint connection, and to subsequently add a leaf node. While a leaf node can autonomously leave such a connection, it cannot add itself.
The ATM Forum is currently working on new signaling capabilities, which will be released in the second half of 1995 as part of its UNI 4.0 specification [Forum3]. UNI 4.0 will add support for, amongst other things, leaf-initiated joins to a multipoint connection. While some would like to use this to allow for true multipoint-to-multipoint connections, it should be noted that signaling support for such connections does not imply the existence of a suitable mechanism for such connections. At the time of this writing, it is not clear that UNI 4.0 will have any better solution for multicast within ATM than what exists today.
The most important contribution of UNI 3.0/3.1 in terms of internetworking across ATM was its addressing structure. Any signaling protocol, of course, requires an addressing scheme to allow the signaling protocol to identify the source and destination of connections. The ITU-T has long settled upon the use of telephone number-like E.164 addresses as the addressing structure for public ATM (B-ISDN) networks. Since E.164 addresses are a public (and expensive) resource, and cannot typically be used within private networks, the ATM Forum extended ATM addressing to include private networks. In developing such a private network addressing scheme for UNI 3.0/3.1, the ATM Forum evaluated two fundamentally different models for addressing.
These two models differed in the way in which the ATM protocol layer was viewed in relation to existing protocol layers, in particular, existing network layer protocols such as IP, IPX, and so on. These existing protocols all have their own addressing schemes and associated routing protocols. One proposal was to also use these same addressing schemes within ATM networks. Hence ATM endpoints would be identified by existing network layer addresses (such as IP addresses), and ATM signaling requests would carry such addresses. Existing network layer routing protocols (such as IGRP and OSPF [Dickie]) would also be used within the ATM network to route the ATM signaling requests, since these requests, using existing network layer addresses, would essentially look like connectionless packets.
This model was known as the peer model, since it essentially treats the ATM layer as a peer of existing network layers.
Figure 8: Peer Model of ATM Addressing
An alternate model sought to decouple the ATM layer from any existing protocol, defining for it an entirely new addressing structure. By implication, all existing protocols would operate over the ATM network. For this reason, the model is known as the subnetwork or overlay model. This mode of operation is, in fact, the manner in which such protocols as IP operate over protocols like X.25 or over dial-up lines. The overlay model requires the definition of both a new addressing structure, and an associated routing protocol. All ATM systems would need to be assigned an ATM address in addition to any higher layer protocol addresses they would also support. The ATM addressing space would be logically disjoint from the addressing space of whatever protocol would run over the ATM layer, and typically would not bear any relationship with it. Hence, all protocols operating over an ATM subnet would also require some form of ATM address resolution protocol to map higher layer addresses (such as IP addresses) to their corresponding ATM addresses.
Note that the peer model does not require such address resolution protocols. By using existing routing protocols, the peer model also may have precluded the need for the development of a new ATM routing protocol.
Nonetheless, it was the overlay model that was finally chosen by the ATM Forum for use with UNI 3.0/3.1 signaling. Among other reasons, the peer model, while simplifying end-system address administration, greatly increases the complexity of ATM switches, since they must essentially act like multiprotocol routers and support address tables for all current protocols, as well as all of their existing routing protocols. Current routing protocols, being originally developed for current LAN and WAN networks, also do not map well into ATM or allow use of ATM's unique QoS properties.
Figure 9: Overlay Model of ATM Addressing
Perhaps most importantly, the overlay model, by decoupling ATM from other higher protocol layers, allows each to be developed independently of the other. This is very important from a practical engineering viewpoint -- as will be seen, both ATM and evolving higher layer protocols are extremely complex and coupling their development would likely have slowed the deployment of ATM quite considerably. Though there is a price to pay for such layering, in the need for disjoint address spaces and routing protocols, and in possibly suboptimal end-to-end routing*16*, the practical benefits arguably greatly exceed the theoretical costs.
Given the choice of the overlay model, the ATM Forum then defined an address format for private networks based on the syntax of an OSI Network Service Access Point (NSAP) address. Note, however, that an ATM address is not an NSAP, despite the similar structure; while in common usage such addresses are often referred to as "NSAP addresses," they are better described as ATM private network addresses, or ATM end-point identifiers, and identify not NSAPs, but subnetwork points of attachment.
The 20-byte NSAP format ATM addresses are designed for use within private ATM networks, while public networks typically use E.164 addresses that are formatted as defined by ITU-T. The Forum did specify, however, an NSAP encoding for E.164 addresses. This encoding will be used to carry E.164 addresses within private networks, and some private networks may also use it as the basis of their own addressing. Such networks may base their own (NSAP format) addressing on the E.164 address of the public UNI to which they are connected, taking the address prefix from the E.164 number and identifying local nodes by the lower order bits.
All NSAP format ATM addresses consist of three components: an Authority and Format Identifier (AFI), which identifies the type and format of the Initial Domain Identifier (IDI); the IDI, which identifies the address allocation and administration authority; and the Domain Specific Part (DSP), which contains actual routing information. The Q.2931 protocol defines source and destination address fields for signaling requests, and also defines subaddress fields for each; the use of the latter is explored later in this paper.
There are three formats of private ATM addressing that differ by the nature of the AFI and IDI:
Figure 10: ATM Private Network Address Formats
The ATM Forum recommends that organizations or private network service providers use either the DCC or ICD formats to form their own numbering plan. Organizations that want to obtain ATM addresses would do so through the same mechanism used to obtain NSAP addresses (for example, through a local address administration body -- in the US, this is ANSI). Once obtained, such addresses can be used for both ATM addresses and also, if desired, for NSAP addressing*17*.
In real NSAPs, the DSP is typically subdivided into a fixed hierarchy that consists of a Routing Domain (RD), an Area identifier (AREA), and an End System Identifier (ESI). The ATM Forum, however, has combined the RD and AREA fields into a single High-Order DSP (HO-DSP) field, which is then used to support flexible, multi-level addressing hierarchies for prefix-based routing protocols. No rigid boundary exists within the HO-DSP; instead, a range of addressing hierarchies will be supported, using prefix masks, as with IP subnets. This is described in more detail in Section 4.0.
Figure 11: Address Registration Using the ILMI Protocol
The ESI field is specified to be a 48-bit MAC address, as administered by the IEEE. This facilitates the support of both LAN equipment, which is typically hardwired with such addresses, and of such LAN protocols as IPX, which rely on MAC addresses. The final, one octet, Selector (SEL) field is meant to be used for local multiplexing within end-stations and has no network significance.
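The field layout described in this section can be illustrated with a small parser. This is a sketch under stated assumptions: the AFI values (0x39 for DCC, 0x47 for ICD, 0x45 for E.164) and the IDI lengths (2, 2, and 8 octets) are not given in the text above and are assumed here from the UNI 3.1 address formats; the function name and the sample address are hypothetical.

```python
from typing import Dict, Tuple, Union

# ASSUMPTION: AFI value -> (format name, IDI length in octets).
# These values are not stated in the surrounding text.
ADDRESS_FORMATS: Dict[int, Tuple[str, int]] = {
    0x39: ("DCC", 2),
    0x47: ("ICD", 2),
    0x45: ("E.164", 8),
}


def parse_atm_address(address: bytes) -> Dict[str, Union[str, int, bytes]]:
    """Split a 20-octet NSAP format ATM address into its fields."""
    if len(address) != 20:
        raise ValueError("an NSAP format ATM address is 20 octets long")
    afi = address[0]
    if afi not in ADDRESS_FORMATS:
        raise ValueError(f"unrecognized AFI: {afi:#04x}")
    name, idi_length = ADDRESS_FORMATS[afi]
    return {
        "format": name,
        "afi": afi,
        "idi": address[1:1 + idi_length],
        # The HO-DSP fills whatever remains of the first 13 octets.
        "ho_dsp": address[1 + idi_length:13],
        "esi": address[13:19],  # 48-bit MAC address
        "sel": address[19],     # one octet, local significance only
    }


# A made-up ICD format address, for illustration only.
sample = bytes.fromhex("47" "0091" "81000000000000000001" "00000c123456" "00")
fields = parse_atm_address(sample)
```

The last seven octets (ESI and SEL) always sit in the same position, so only the split between IDI and HO-DSP depends on the address format.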
To facilitate the administration and configuration of ATM addresses into ATM end systems across UNI, the ATM Forum defined an address registration mechanism using the ILMI. This allows an ATM end-system to inform an ATM switch across the UNI, of its unique MAC address, and to receive the remainder of the node's full ATM address in return. This mechanism not only facilitates the autoconfiguration of a node's ATM addressing, but may also be extended, in the future, to allow for the autoconfiguration of other types of information (such as higher layer addresses and server addresses).
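The effect of address registration can be sketched as follows. This models only the outcome described above, not the ILMI protocol itself (which uses SNMP format packets): the class `SwitchPort`, its `register` method, and the helper `full_address` are hypothetical names, and the 13-octet prefix length is inferred from the 20-octet address less the 6-octet ESI and 1-octet SEL.

```python
from typing import Set


class SwitchPort:
    """Switch side of a UNI, holding the prefix it hands out to attached end-systems."""

    def __init__(self, network_prefix: bytes) -> None:
        if len(network_prefix) != 13:
            raise ValueError("network prefix must be 13 octets")
        self.network_prefix = network_prefix
        self.registered_esis: Set[bytes] = set()

    def register(self, esi: bytes) -> bytes:
        """Record the end-system's MAC address and return the rest of its address."""
        if len(esi) != 6:
            raise ValueError("ESI must be a 6-octet MAC address")
        self.registered_esis.add(esi)
        return self.network_prefix


def full_address(prefix: bytes, esi: bytes, sel: int = 0) -> bytes:
    """End-system side: combine the received prefix with the local ESI and SEL."""
    return prefix + esi + bytes([sel])


# Made-up prefix and MAC address, for illustration only.
port = SwitchPort(bytes.fromhex("47009181000000000000000001"))
esi = bytes.fromhex("00000c123456")
address = full_address(port.register(esi), esi)
```

Neither side needs the complete address configured by hand: the switch knows only its prefix, the end-system knows only its MAC address, and the exchange gives both the full 20 octets.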
Note that the addressing formats defined in UNI 3.0/3.1 identify only single end-points. These can also be used to set up point-to-multipoint connections because in UNI 3.0/3.1 such connections are set up a leaf at a time, using unicast addressing. UNI 4.0 will add support for group addresses, and will permit point-to-multipoint connections to be set up to multiple leaves in one request.
The notion of an anycast address will also be supported in UNI 4.0. A well known anycast address, which may be shared by multiple end systems, is used to route a request to a node providing a particular service [Partridge1], and not to identify the particular node per se. A call made to an anycast address is routed to the "nearest" end-system that registered itself with the network to provide the associated service. Anycast is a powerful mechanism for autoconfiguration and operation of networks since it precludes the need for manual configuration or service location protocols. While few details of ATM group addressing have yet been determined, the ATM Forum has decided that anycast will be addressed as a special case of group addressing.
Specifically, nodes will use an extension of the ILMI address registration mechanism to inform the network that they support a particular group address (note that this is the opposite of the normal address registration mechanism). As part of this registration, the node also informs the network of the desired scope of registration, that is, the extent of the network to which the existence of the multicast node should be advertised (as part of the ATM routing protocols -- see below). This scope is administrative (such as within a single building, within the local site, or within the enterprise network). The network must map this information through administrative policy to the ATM routing protocol's own hierarchy. Once a node has registered its membership within a multicast group, other nodes may set up connections to these nodes.
If the requesting node initiates a point-to-multipoint connection to the group address, the network will connect all nodes that are registered with that particular ATM address. Conversely, if the requesting node specifies a point-to-point connection, the network will set up a connection to the "nearest" registered node. In this way, anycast can be supported as a special case of group addressing, and a new addressing format is not required. However, many details of this procedure, including the format of the group addresses, had yet to be specified as of the time of writing. Routing aspects of group addressing are discussed in Section 4.4.
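The distinction drawn above between the two connection types can be sketched as a single selection function. Since the procedure had not been specified at the time of writing, everything here is an assumption made for illustration: the function name, the node names, and in particular the integer "distance" used to decide which registered node is "nearest".

```python
from typing import Dict, List


def resolve_group(members: Dict[str, int], connection_type: str) -> List[str]:
    """Choose which registered nodes a call to a group address should reach.

    members maps each node registered for the group address to a
    hypothetical distance from the requesting node.
    """
    if not members:
        return []
    if connection_type == "point-to-multipoint":
        # Multicast behavior: connect every registered node.
        return sorted(members)
    if connection_type == "point-to-point":
        # Anycast behavior: connect only the nearest registered node.
        return [min(members, key=members.get)]
    raise ValueError(f"unknown connection type: {connection_type}")


registered = {"server-a": 3, "server-b": 1, "server-c": 2}
multicast_targets = resolve_group(registered, "point-to-multipoint")
anycast_target = resolve_group(registered, "point-to-point")
```

The same registration data serves both cases; only the requested connection type differs, which is why no separate anycast address format is needed.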
In practice, the question is moot -- much of the controversy arises both from limitations of the OSI model, and from an incomplete understanding of the complexities of practical network operation. The basic OSI model did not incorporate the concept of overlay networks, where one network layer must overlay another, though such concepts were later added as addenda to the model. As we discussed in the previous section, such a model is often used where one type of network protocol must be carried transparently across another. Today, for instance, such layer 3 protocols as IP and IPX are often carried (tunneled) across other network layer protocols like X.25 -- or the telephone network, for instance -- since this is generally much simpler than attempting to interoperate the protocols through a protocol gateway.
As noted in the previous section, the ATM overlay model was chosen so as to separate, and hence facilitate, the engineering efforts involved in completing the ATM layer protocols, as well as the efforts needed to modify existing protocols to operate with ATM. The overlay model also simplifies switch operation, at the arguable cost of redundancy in protocol functions and suboptimality in routing. As we will discuss later, the overlay model also leverages the existing installed application base, and facilitates future application portability, since it builds upon and extends today's ubiquitous network layer protocol infrastructure. Such trade-offs were felt by the Forum to be defensible, but in no way detract from the fact that ATM is indeed a full fledged network layer protocol -- one, indeed, that is perhaps at least as complex as any that exists today.
What makes ATM a network layer protocol is indeed the very complexity of its addressing and routing protocols, and this is independent of the fact that other network layer protocols are run over ATM -- indeed, as we will discuss later, the LAN Emulation protocols actually operate a MAC layer protocol over ATM, but this does not make ATM a physical layer.
A related issue that also causes confusion is the notion of "flat addressing" and whether or not ATM can be used to build a "simpler" network, in some sense, than today's network layer protocol based routed internetworks. This issue is coupled to the layering issue discussed above because some, as noted, draw a correspondence between ATM and layer 2 MAC protocols. As it happens, the latter do indeed have a flat address space -- that is, 48 bit MAC addresses -- and it is true that MAC layer internetworking devices -- that is, MAC bridges -- do offer "plug and play" capabilities, and do not require the complex configuration of layer 3 internetworking devices (that is, routers).
This simplicity comes from the fact that since MAC addresses are indeed flat -- that is, they have no logical hierarchy -- packets must be flooded throughout the network, using bridging protocols. While this requires no network configuration, it also greatly reduces the scalability -- and stability -- of such bridged networks. A hierarchical address space, together with address assignment policies that minimize (flat) host routes, permits the use of address aggregation, where reachability for entire sets of end systems can be summarized by a single address prefix (or, equivalently, by subnet masks). Coupled with a routing protocol that disseminates such address prefixes, hierarchical addressing precludes the need for flooding, and greatly reduces the amount of reachability information that must be exchanged.
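Address aggregation can be illustrated with a longest-prefix lookup. This is a minimal sketch, not any particular routing protocol: addresses and prefixes are written as hexadecimal strings, so prefixes fall on four-bit boundaries rather than arbitrary bit lengths, and the function name, prefixes, and next-hop labels are all hypothetical.

```python
from typing import Dict, Optional


def longest_prefix_match(routes: Dict[str, str], address: str) -> Optional[str]:
    """Return the next hop for the most specific prefix covering the address."""
    best: Optional[str] = None
    for prefix in routes:
        if address.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return routes[best] if best is not None else None


# Two entries summarize reachability for every end system beneath them.
routes = {
    "47009181": "port 1",      # an entire organization
    "4700918100a1": "port 2",  # one site within it, reached a different way
}

padding = "0" * 14
host = "00000c123456" + "00"  # ESI and SEL digits

at_site = longest_prefix_match(routes, "4700918100a1" + padding + host)
elsewhere = longest_prefix_match(routes, "4700918100b2" + padding + host)
unreachable = longest_prefix_match(routes, "39" + "0" * 38)
```

The table stays at two entries however many end systems share each prefix, whereas a flat address space would need one entry (or flooding) per end system.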
Protocols with hierarchical, aggregatable address spaces do indeed generally require more configuration for address and subnet assignment, but by the same token this very hierarchy permits the operation of routing protocols, and hence the deployment of much more scalable and stable networks. Flat addressing, by definition, precludes routing and requires bridging, with consequent lack of scalability.
Indeed, very few networks, outside of bridged LANs, actually have a truly flat address space. The telephone network, for instance, which is often thought of as a flat network, actually incorporates a very structured hierarchy within its address space (that is, country code, area code, and so on), and it is only this rigid hierarchy that has permitted the telephone network to scale globally as it has. ATM networks certainly do not have a flat address space -- indeed, as discussed in the previous section, the ATM address space has scope for an unprecedented level of hierarchical structure, and this structure is exploited in the ATM routing protocols we discuss below to support greater degrees of scalability within ATM networks than is possible within any other network.
Much of the discussion about flat addressing and ATM actually revolves around the perception that ATM networks can be made easier to administer than existing layer 3 networks. It is true that, for historical reasons, few efforts were made in the development of many current network layer protocols to facilitate ease of administration, though many such efforts are being made today, for instance as with the Dynamic Host Configuration Protocol (DHCP) [Droms], in the case of IP. Ease of administration argues not for flat addressing, however, but for a systematic focus on supporting autoconfiguration within protocols, as is now being done for the IP Next Generation (IPng or IPv6) protocol. This has been a prime focus for the ATM Forum from its inception, and by building on such mechanisms as the ILMI, most of the protocols developed for ATM, as we will discuss later in the paper, do incorporate such support.
The ATM Forum has an ongoing effort to define a Private NNI (P-NNI) protocol. The goal is to define NNI protocols for use within private ATM networks -- or, more specifically, within networks that use NSAP format ATM addresses. Public networks that use E.164 numbers for addressing will be interconnected using a different NNI protocol stack based upon the ITU-T B-ISUP signaling protocol and the ITU-T MTP Level 3 routing protocol. This work, being carried out by the Broadband Inter-Carrier Interface (B-ICI) subworking group of the ATM Forum [Forum4], and other international standards bodies, is not discussed further in this paper.
The P-NNI protocol consists of two components: the first is a P-NNI signaling protocol used to relay ATM connection requests within the networks, between the source and destination UNI. The UNI signaling request is mapped into NNI signaling at the source (ingress) switch. The NNI signaling is remapped back into UNI signaling at the destination (egress) switch*18*.
The P-NNI protocols operate between ATM switching systems (which can represent either physical switches or entire networks*19* operating as a single P-NNI entity), which are connected by P-NNI links. P-NNI links can be physical links or virtual, "multi-hop" links. A typical example of a virtual link is a virtual path that connects two nodes together. Since all virtual channels, including the connection carrying the P-NNI signaling, would be carried transparently through any intermediate switches between these two nodes on this virtual path, the two nodes are logically adjacent in relation to the P-NNI protocols.
Figure 12: UNI and NNI Signaling
*19* A private ATM network might use proprietary NNI protocols internally, and use the P-NNI protocol for external connectivity and interoperability.
The ILMI protocol, first defined for use across UNI links, will also be used across both physical and virtual NNI links; enhancements to the ILMI MIBs allow for automatic recognition of NNI versus UNI links, and of private versus public UNI.
The current P-NNI signaling protocol [Cherukuri] being developed by the ATM Forum is an extension of UNI signaling and incorporates additional Information Elements (IE) for such NNI-related parameters as Designated Transit Lists (DTL). P-NNI signaling is carried across NNI links on the same virtual channel, VCI=5, which is used for signaling across the UNI. The VPI value depends on whether the NNI link is physical or virtual.
The second component of the P-NNI protocol is a virtual circuit routing protocol. This is used to route the signaling request through the ATM network. This is also the route on which the ATM connection is set up, and along which the data will flow. The operation of routing a signaling request through an ATM network, somewhat paradoxically, given ATM's connection oriented nature, is superficially similar to that of routing connectionless packets within existing network layer protocols (such as IP). This is due to the fact that prior to connection set up, there is, of course, no connection for the signaling request to follow.
As such, a VC routing protocol can use some of the concepts underlying many of the connectionless routing protocols that have been developed over the last few years. However, the P-NNI protocol is much more complex than any existing routing protocol. This complexity arises from two goals of the protocol: to allow for much greater scalability than what is possible with any existing protocol, and to support true QoS-based routing.
The current state of the P-NNI protocols will be examined by looking at the manner in which the protocol tackles these challenges. It should be noted, however, that the ATM Forum is not currently scheduled to complete the "P-NNI Phase 1" protocol [Forum5] until August 1995. In the interim, the ATM Forum has defined a so called "P-NNI Phase 0" protocol, the Interim Inter-Switch Signaling Protocol (IISP) [Forum6]. This protocol will be examined after the Phase 1 protocol. Finally, multicast routing, how private and public ATM networks internetwork, and implementation considerations for P-NNI are discussed. Note, however, that since the P-NNI Phase 1 Protocol is still under development, the description given here may change before the specification is finalized.
Both the P-NNI Phase 1 protocol and the IISP protocol currently interface with, and support the capabilities of, UNI 3.0/3.1 signaling only. In particular, neither of these protocols will support such aspects of UNI 4.0 signaling as leaf-initiated joins, group addressing, or ABR connection parameter negotiation. Such functionality will be added to the P-NNI protocols as part of a possible future P-NNI Phase 2 protocol specification.
To deliver such QoS guarantees, ATM switches implement a function known as connection admission control (CAC). Whenever a connection request is received by the switch, the switch performs the CAC function. That is, based upon the traffic parameters and requested QoS of the connection, the switch determines whether setting up the connection violates the QoS guarantees of established connections (for example, by excessive contention for switch buffering). The switch accepts the connection only if violations of current guarantees are not reported. CAC is a local switch function, and is dependent on the architecture of the switch and local decisions on the strictness of QoS guarantees.
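The CAC decision can be sketched as follows. This is a deliberately simple model that assumes peak-rate allocation on a single link; as noted above, real CAC algorithms are switch specific, so the class, its fields, and the admission rule here are illustrative assumptions only.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    capacity_cps: int                              # link capacity, cells per second
    admitted: list = field(default_factory=list)   # peak rates already guaranteed

    def admit(self, peak_cps: int) -> bool:
        """Accept a connection only if existing guarantees still hold."""
        if sum(self.admitted) + peak_cps > self.capacity_cps:
            return False                           # would violate current guarantees
        self.admitted.append(peak_cps)
        return True

link = Link(capacity_cps=100_000)
first = link.admit(60_000)    # fits within the link capacity
second = link.admit(50_000)   # would exceed capacity, so it is rejected
```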
The VC routing protocol must ensure that a connection request is routed along a path that leads to the destination and has a high probability of meeting the QoS requested in the connection set up -- that is, of traversing switches whose local CAC will not reject the call.
Figure 13: Connection Admission Control
To do this, the protocol uses a topology state routing protocol in which nodes flood QoS and reachability information so that all nodes obtain knowledge about reachability within the network and the available traffic resources within the network. Such information is passed within P-NNI topology state packets (PTSP), which contain various type-length-value (TLV) encoded P-NNI topology state elements (PTSE). This is similar to current link state routing protocols such as OSPF. Unlike these, however, which only have rudimentary support for QoS, the P-NNI protocol supports a large number of link and node state parameters that are transmitted by nodes to indicate their current state at regular intervals, or when triggered by particular events.
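The general idea of carrying type-length-value elements inside a packet can be sketched as below. The 2-byte type and 2-byte length fields, the type codes, and the function names are assumptions chosen for illustration; the actual PTSP and PTSE formats are defined in [Forum5] and are not reproduced here.

```python
import struct

def encode_tlv(type_code: int, value: bytes) -> bytes:
    """Prefix a value with a hypothetical 2-byte type and 2-byte length."""
    return struct.pack("!HH", type_code, len(value)) + value

def decode_tlvs(data: bytes):
    """Walk a buffer of concatenated TLVs and return (type, value) pairs."""
    offset = 0
    elements = []
    while offset < len(data):
        type_code, length = struct.unpack_from("!HH", data, offset)
        offset += 4                                   # skip the header just read
        elements.append((type_code, data[offset:offset + length]))
        offset += length                              # skip the value
    return elements

# Two made-up elements concatenated into one packet body.
packet = encode_tlv(1, b"\x00\x10") + encode_tlv(2, b"abc")
```

A receiver that does not recognize a type code can still skip the element using its length, which is the main attraction of TLV encoding for an extensible protocol.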
There are two types of link parameters: non-additive link attributes used to determine whether a given network link or node can meet a requested QoS; and additive link metrics that are used to determine whether a given path, consisting of a set of concatenated links and nodes (with summed link metrics), can meet the requested QoS.
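The distinction between the two kinds of parameter can be sketched in a few lines of Python. The field names `available_bw` and `delay` are placeholders chosen for illustration, not the parameter names used by the P-NNI specification.

```python
def path_meets_qos(path, min_bandwidth, max_delay):
    """Check a candidate path (a list of link dictionaries) against a request."""
    # Non-additive attribute: every link must satisfy it individually.
    if any(link["available_bw"] < min_bandwidth for link in path):
        return False
    # Additive metric: the sum over the concatenated links must satisfy it.
    return sum(link["delay"] for link in path) <= max_delay

path = [
    {"available_bw": 50, "delay": 3},
    {"available_bw": 20, "delay": 4},
]
```

With this path, a request for bandwidth 20 and delay 7 passes, a request for bandwidth 30 fails on the second link alone, and a request for delay 6 fails only once the link delays are summed.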
The current set of link metrics are:
The current set of link attributes are:
All network nodes can obtain an estimate of the current state of the entire network through flooded PTSPs that contain such information as listed above. Unlike most current link state protocols, the P-NNI protocol advertises not only link metrics, but also nodal information. Typically, PTSPs include bidirectional information about the transit behavior of particular nodes based upon entry and exit port, and current internal state. This is particularly important in cases where the node represents an aggregated network (that is, a peer group -- see below). In such a case, the node metrics must attempt to approximate the state of the entire aggregated network. This internal state is often at least as important as that of the connecting links for QoS routing purposes.
The need to aggregate network elements and their associated metric information also has important consequences on the accuracy of such information, as discussed below.
Two approaches are possible for routing a connection through the network: hop-by-hop routing and source routing. Hop-by-hop routing is used by most current network layer protocols such as IP or IPX, where a packet is routed at any given node only to another node -- the "next hop" -- closer to the final destination. In source routing, the initial node in the path determines the entire route to the final destination.
Hop-by-hop routing is a good match for current connectionless protocols because they impose little packet processing at each intermediate node. The P-NNI protocol, however, uses source routing for a number of reasons. For instance, it is very difficult to do true QoS-based routing with a hop-by-hop protocol since each node needs to perform local CAC and evaluate the QoS across the entire network to determine the next hop. Hop-by-hop routing also requires a standard route determination algorithm at each hop to preclude the danger of looping.
However, in a source-routed protocol, only the first node would ideally need to determine a path across the network, based upon the requested QoS and its knowledge of the network state, which is gained from the PTSPs. It could then insert a full source routed path into the signaling request that would route it to the final destination. Ideally, intermediate nodes would only need to perform local CAC before forwarding the request. Also, since it is easy to preclude loops when calculating a source route, a particular route determination algorithm does not need to be standardized, leaving this as another area for vendor differentiation.
This description is only ideal, however. In practice, the source routed path that is determined by a node can only be a best guess. This is because in any practical network, any node can have only an imperfect approximation to the true network state because of the necessary latencies and periodicity in PTSP flooding. As discussed in the next section, the need for hierarchical summarization of reachability information also means that link parameters must also be aggregated. Aggregation is a "lossy" process, and necessarily leads to inaccuracies. Furthermore, as noted above, CAC is a local matter. In particular, this means that the CAC algorithm performed by any given node is both system dependent and open to vendor differentiation.
The P-NNI protocol tackles these problems by defining a Generic CAC (GCAC) algorithm. This is a standard function that any node can use to calculate the expected CAC behavior of another node, given that node's advertised additive link metrics, described above, and the requested QoS of the new connection request. The GCAC is an algorithm that was chosen to provide a good prediction of a typical node-specific CAC algorithm, while requiring a minimum number of link state metrics. Individual nodes can control the degree of stringency of the GCAC calculation involving the particular node by controlling the degree of laxity or conservativeness in the metrics advertised by the node.
The GCAC actually uses the additive metrics described above; indeed these metrics were selected to support the GCAC algorithm chosen for the P-NNI protocol. Individual nodes (physical or logical) will need to determine and then advertise the values of these parameters for themselves, based upon their internal structure and loading. Note, however, that the P-NNI Phase 1 GCAC algorithm is primarily designed for CBR and VBR connections; variants of the GCAC are used depending upon the type of QoS guarantees requested and the types of link metrics available, yielding greater or lesser degrees of accuracy.
The only form of GCAC done for UBR connections is to determine whether a node can support such connections. For ABR connections, a check is made to determine whether the link or node is authorized to carry any additional ABR connections and to ensure that the ACR for the ABR traffic class for the node is greater than the Minimum Cell Rate specified by the connection.
The details of the GCAC are described in [Forum5].
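The UBR and ABR checks described above are simple enough to sketch directly. The dictionary keys and the function name are assumptions made for illustration; the CBR and VBR variants of the GCAC are more involved and are not attempted here.

```python
def gcac_best_effort(advert, service, mcr=0):
    """Predict whether a node would accept a UBR or ABR connection.

    advert: the state another node has advertised (hypothetical field names).
    mcr:    the Minimum Cell Rate requested by an ABR connection.
    """
    if service == "UBR":
        # Only check that the node supports UBR connections at all.
        return advert["supports_ubr"]
    if service == "ABR":
        # The node must be allowed to carry more ABR connections, and its
        # ACR for the ABR class must be greater than the requested MCR.
        return advert["abr_allowed"] and advert["abr_acr"] > mcr
    raise ValueError("only UBR and ABR are covered by this sketch")

advert = {"supports_ubr": True, "abr_allowed": True, "abr_acr": 1000}
```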
Using the GCAC, a node presented with a connection request (which passes its own CAC) processes the request as follows:
This, however, is not the end of the story. Each node in the path still performs its own CAC on the routed request because its own state may have changed since it last advertised its state within the PTSP used for the GCAC at the source node. Its own CAC algorithm is also likely to be somewhat more accurate than the GCAC. Hence, notwithstanding the GCAC, there is always the possibility that a connection request may fail CAC at some intermediate node. This becomes even more likely in large networks with many levels of hierarchy, since QoS information cannot be accurately aggregated in such cases. To allow for such cases, without excessive connection failures and retries, the P-NNI protocol also supports the notion of crankback.
Crankback is where a connection which is blocked along a selected path is rolled back to an intermediate node, earlier in the path. This intermediate node*23* attempts to discover another path to the final destination, using the same procedure as the original node, but uses newer, or hopefully more accurate network state. This is another mechanism that can be much more easily supported in a source-routed protocol than in a hop-by-hop protocol.
Figure 14: Operation of Crankback
One of the concerns with P-NNI route generation is that most commonly used routing algorithms (such as Dijkstra calculations) were designed for single, cumulative metrics such as link weightings or counts. Since P-NNI uses a number of complex link parameters for link pruning, path selection may often not generate any acceptable paths. In such cases, sophisticated algorithms may use a technique known as fallback, where particular attributes (such as delay) are selectively relaxed, and paths are recalculated in order to find a path that meets some minimal set of desired attributes. In general, path selection, like CAC, is an area with considerable scope for vendor differentiation.
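One possible shape for pruning followed by fallback is sketched below. It assumes a graph whose links carry an administrative weight, an available bandwidth, and a delay; links that fail the bandwidth or per-link delay requirement are pruned before a Dijkstra calculation on the weight, and the delay requirement is dropped if no path survives. The graph layout, the choice of which attribute to relax, and the function names are all assumptions, since path selection is left to the implementer.

```python
import heapq

def shortest_path(graph, src, dst, min_bw, max_link_delay=None):
    """Dijkstra on weight over links that survive attribute pruning.

    graph: {node: [(neighbor, weight, available_bw, delay), ...]}
    """
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nbr, weight, bw, delay in graph.get(node, []):
            if bw < min_bw:
                continue                      # pruned: not enough bandwidth
            if max_link_delay is not None and delay > max_link_delay:
                continue                      # pruned: link is too slow
            if nbr not in visited:
                heapq.heappush(heap, (cost + weight, nbr, path + [nbr]))
    return None

def select_path(graph, src, dst, min_bw, max_link_delay):
    """Return (path, relaxed); relaxed is True if fallback was needed."""
    path = shortest_path(graph, src, dst, min_bw, max_link_delay)
    if path is not None:
        return path, False
    # Fallback: relax the delay attribute and recalculate.
    path = shortest_path(graph, src, dst, min_bw)
    return path, path is not None

graph = {
    "A": [("B", 1, 100, 5), ("C", 1, 100, 50)],
    "B": [("D", 1, 10, 5)],
    "C": [("D", 1, 100, 5)],
}
```

In this example a request for bandwidth 50 with a per-link delay limit of 10 finds no path at first, because the only fast route lacks bandwidth; relaxing the delay limit yields the route through C.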
The key to such a scalable protocol is hierarchical network organization, with summarization of reachability information between levels in the hierarchy. Protocols such as OSPF implement such mechanisms, but only implement two levels of hierarchy, which is inadequate for very large networks. The P-NNI protocol, however, uses the 20-byte NSAP addresses to identify levels in the network hierarchy to support an almost limitless number of levels: a maximum of 105 (one for each possible prefix length from 0 to 104 bits, 104 being the number of bits in the 13 high-order bytes of the NSAP address, excluding the ESI and SEL fields), though no more than a half dozen or so will likely ever need to be used, and even then only within the very largest, global networks.
To support this hierarchy, the P-NNI model defines a uniform network model at each level of the hierarchy. The P-NNI hierarchical model explains how each level of the hierarchy operates, how multiple devices or nodes at one level can be summarized into the higher level, and how information is exchanged between levels. The model is recursive in that the same mechanisms used at one level are also used at the next level.
Each level in the hierarchy consists of a set of logical nodes, interconnected by logical links. At the lowest level, each logical node represents a physical switching system consisting of a single physical switch, or a network of switches that internally operate a proprietary NNI protocol and support the P-NNI protocol for external connectivity. At this lowest level, each switching system must be assigned a unique ATM NSAP address.
Nodes within a given level are grouped into sets known as peer groups. A peer group is defined as a collection of nodes that all obtain the identical topological database and exchange full link state information with each other. While all nodes within a peer group have complete state information on each other, peer groups cannot be extended too widely since this would lead to excessive PTSP traffic and processing. Hence, peer groups are organized hierarchically and are associated with a higher level parent peer group.
Within its parent peer group, each peer group is represented, by default, as a single logical node, known as the logical group node. Within the parent peer group, the logical group node acts as a normal node, exchanging PTSPs with the other nodes within the parent peer group. The peer groups represented by logical group nodes within a parent group are known as the child peer groups of that group.
Figure 15: The P-NNI Network Hierarchy Model
Normally, peer groups are identified by strict prefixes of private ATM addresses. At the lowest level, where switching systems consist of actual switches, and where by default, all end systems connected to a switch obtain their network address prefix from that of the switch (which implies that end system reachability defaults to switch reachability), the default peer group ID is the high-order 12 bytes of the switch NSAP address. This allows for up to 256 switches within this lowest level peer group, without requiring any manual configuration of peer group IDs of the switches or configuration of the end systems.
At higher levels, the default for a peer group ID is a prefix on a lower level peer group ID. The peer group ID of a parent must be shorter than the prefix of its child peer group ID; this makes it easy to determine the relationship between two peer groups, and precludes the formation of a peer group hierarchy loop. Hence, the peer group ID becomes smaller as the hierarchical level becomes larger.
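The prefix relationships described above can be sketched as follows. The sketch works at byte granularity for simplicity, whereas prefixes may in general be any number of bits long; the function names and the sample address are assumptions for illustration.

```python
def default_peer_group_id(switch_address: bytes) -> bytes:
    """Default lowest level peer group ID: the high-order 12 bytes."""
    return switch_address[:12]

def is_ancestor(candidate: bytes, peer_group_id: bytes) -> bool:
    """A parent ID must be a strictly shorter prefix of the child ID."""
    return (len(candidate) < len(peer_group_id)
            and peer_group_id.startswith(candidate))

# One byte separates the 13-byte switch prefix from the 12-byte default
# peer group ID, which is where the figure of 256 switches comes from.
switches_per_group = 2 ** (8 * (13 - 12))

address = bytes(range(20))            # a made-up 20-byte address
group = default_peer_group_id(address)
parent = group[:8]                    # a hypothetical higher level group
```

Because a parent ID is always strictly shorter than its child's, a group can never be its own ancestor, which is what rules out hierarchy loops.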
Nodes within a peer group are identified by a 22-byte node identifier. At the lowest level, this is essentially the same as the switching system's ATM address. At higher levels, the node ID (which now identifies logical group nodes) includes two level indicators that indicate the hierarchical level (that is, prefix length) of both the associated peer group and the child peer group, plus the peer group ID.
In addition to nodes, the P-NNI protocol also requires that links be identified since links between peer groups need to be identified in PTSPs and may also be optionally specified in DTLs. Since ATM link attributes can be asymmetrical (since connections may be asymmetrical), links are identified by a combination of a transmitting node ID and a locally assigned port ID. Nodes exchange such port IDs between themselves (using the Hello protocol discussed below) and hence together identify particular links. In practice, link identification is somewhat more complex, since multiple physical or virtual links*24* may need to be aggregated. (Refer to [Forum5] for more details.)
Each peer group elects a single node*25* within the group to perform the functions of the logical group node. This node, known as the peer group leader (PGL), is selected through an election mechanism based upon a "leadership priority" and each switch's node ID. Each PGL is identified by a unique ATM address; if a node acts as a PGL within multiple levels of peer groups, then it must have a unique ATM address at each of those levels.
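A minimal sketch of such an election is given below. It assumes that the node with the highest leadership priority wins and that ties are broken by the highest node ID; that ordering is an assumption of this sketch, and the actual election procedure is defined in [Forum5].

```python
def elect_pgl(nodes):
    """Pick a peer group leader from (leadership_priority, node_id) pairs.

    Assumes the highest priority wins, with the highest node ID as tie-breaker.
    """
    priority, node_id = max(nodes, key=lambda n: (n[0], n[1]))
    return node_id

candidates = [(10, b"\x01"), (50, b"\x02"), (50, b"\x03")]
leader = elect_pgl(candidates)
```

Since every node in the peer group holds the same topology database, each can run the same comparison locally and arrive at the same leader.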
PGLs within each peer group have the responsibility of formulating*26* and exchanging PTSPs with their peer nodes within the parent peer group to inform those nodes of the child group's reachability and attributes*27*. Similarly, recursive information obtained by the PGL about the parent group and that group's parent groups is then fed down by the PGL into the child group. The child nodes can then obtain knowledge about the full network hierarchy, in order to construct full source routes.
*25* However, the information advertised by the logical group node is a function of the state of the entire peer group, and is hence independent of the identity of the PGL.
*26* This also requires the PGL to determine, based upon the PTSPs exchanged within the peer group, and local (unspecified) algorithms, the corresponding link state parameters for the entire aggregated peer group.
*27* This does not mean, however, that PGLs need to process all requests traversing the peer group -- this is done only by the border nodes of the peer group through which a connection request enters and leaves the peer group, and the intermediate switches connecting the two, as described below. A border node, however, could also act as a PGL.
Note, however, that the information that is fed down from the top level peer group all the way to the lowest level groups represents more and more aggregated (summarized) information. Hence, at the lowest level, each node will have full information about its own peer group, aggregated information about its parent group, more aggregated information about its "grandparent" group, and so forth. In order for PGLs to communicate with each other, however, they must have reachability information about the way in which the peer groups are linked together. This information is gathered by the P-NNI bootstrap procedure, using the P-NNI Hello protocol operating across P-NNI links.
P-NNI Links -- be they physical or virtual -- are further categorized within the P-NNI model. Horizontal, or inside, links connect two nodes within the same peer group. Exterior links connect nodes within a peer group to other exterior nodes that do not operate the P-NNI protocol. Outside links connect together two border nodes within two different peer groups, where border nodes are those nodes within a peer group that have links to nodes -- "outside neighbors" -- within other peer groups.
Nodes first discover each other through a P-NNI Hello protocol in which nodes exchange Hello packets at regular intervals*28* with their immediate neighbor nodes.
If two neighbors discover that they are within the same peer group, by comparison of their peer group IDs, they start to send PTSPs to each other and synchronize their reachability databases. Once the nodes have synchronized their databases, they flood PTSPs throughout the peer group (i.e. across horizontal links) to ensure rapid convergence.
The P-NNI Hello packets and PTSPs are sent on a well known virtual channel, VCI=18 within VPI=0 for physical links, and within the appropriate VPI value for logical links. Mechanisms such as flooding, sequence numbers, "lock-step" acknowledgments, and checksums are used (instead of an ATM-specific data link protocol, such as SSCOP) to ensure reliable and timely delivery of PTSPs. As with other link state protocols, PTSPs are sent at regular intervals or when triggered by a significant event*29* (such as a quantum of change within bandwidth allocation on a link).
*29* Specifically, a PTSP is triggered by a significant change in any topology information group (TIG), of which six are currently defined: nodal information, internal reachable ATM addresses, external reachable ATM addresses, pairwise nodal metrics, horizontal links, and uplinks. A "hold-down" timer is used to ensure that PTSPs are not sent at unacceptably high rates. The P-NNI specification defines what a "significant" change is for each of the particular TIGs -- refer to [Forum5] for more details.
Two border nodes will also discover each other, across an outside link, through the Hello protocol, which will show that the two nodes have different peer group IDs. Two border nodes exchange peer group ID information across an outside link to determine the lowest level at which the ancestors of the two nodes are themselves peers (i.e. the two nodes must, by definition, have in common some ancestor, be it a parent, grandparent, etc.). Each border node then determines that the outside link is an uplink to that outside ancestor peer group. The two border nodes exchange metric information about the outside link in the Hello protocol, then advertise the uplink, and its characteristics, throughout their respective peer groups using PTSPs.
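The comparison made by two neighbors can be sketched as below. The sketch assumes that each node knows the list of peer groups it belongs to, from its own group upward, and uses dotted strings as stand-ins for peer group IDs; both the representation and the function names are assumptions for illustration.

```python
def classify_link(local_pg, remote_pg):
    """Neighbors in the same peer group share an inside (horizontal) link."""
    return "inside" if local_pg == remote_pg else "outside"

def lowest_common_peer_group(local_ancestry, remote_ancestry):
    """Find the lowest level group that contains both nodes.

    Each ancestry lists peer group IDs from the node's own group upward.
    """
    remote = set(remote_ancestry)
    for pg in local_ancestry:         # walk upward from the lowest level
        if pg in remote:
            return pg
    return None

node_a = ["A.1.1", "A.1", "A"]
node_b = ["A.2.3", "A.2", "A"]
node_c = ["A.1.2", "A.1", "A"]
```

Here nodes a and b first meet in group A, whereas a and c already share the lower level group A.1, so the outside link between a and c would be treated as an uplink at that lower level.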
At higher levels of the P-NNI hierarchy, multiple outside links may be aggregated together into fewer logical uplinks, but information about the binding between logical uplinks and their constituent outside links must be advertised so that nodes can map a logical inter-peer group link into a physical link.
Border nodes also exchange information about the PGLs of their own peer groups. This allows the PGLs of groups that discover that they are within the same parent peer group to set up connections to each other, across the identified uplinks, and start exchanging their own Hellos and PTSPs. They then discover the existence of yet higher level peer groups, until all nodes discover their entire network hierarchy. Through fed-down PTSPs, containing summarized reachability and uplink information, the PGLs discover full network state. A full example of P-NNI bootstrapping and discovery is given in [Forum5] and [Swallow].
Once full state information is obtained by all nodes, they can then use this to route signaling requests. When a signaling request is received across a UNI by an ingress switch -- the DTL originator -- the switch will use a shortest path algorithm, such as a Dijkstra calculation, to determine one or more paths that connect the source node to the desired destination, using the algorithm described in the previous section. This calculation will create a hierarchically complete source route, that is, a set of DTLs, which will have: a full, detailed path within the source node's own peer group; a less detailed path within the parent peer group; and even less detail on higher level peer groups, terminating in the lowest level peer group*30* which is an ancestor of both the source and destination nodes.
These DTLs are arranged in a stack within the P-NNI signaling request where each DTL contains the path elements for one level in the hierarchy. This comprises a list of node and, optionally, link IDs, together with a pointer that indicates which element in the list is to be processed next. Within a given peer group, that peer group's DTL is processed by nodes until it reaches a node that is a border node to the next peer group on the path. At this point, the DTL of that peer group is exhausted, since the final element in that DTL is the ID of the border node. The border node then removes that DTL, notes that the next DTL points to the neighbor peer group (possibly at a different level in the hierarchy), and forwards it to its peer border node within that neighbor peer group.
Once the request arrives at that border node within that neighbor peer group, that node discovers that the request must be routed through that node's peer group. Typically, however, the original DTL only has aggregated information about this neighbor peer group. The border node then constructs one or more new DTLs, describing how to route the request through its peer group, and pushes them onto the top of the stack of DTLs. In this way, the request is forwarded to a border node within this peer group, which performs a similar function for the next peer group in the path, and so on, until the final destination peer group is reached.
At this point, the (ingress) border node will construct a DTL that routes the request to the switch on which the destination end system is attached. There, the final switch -- the DTL terminator -- re-maps the request into UNI signaling and forwards it across the appropriate UNI link. DTLs are hence only created by the source node and by border nodes. Other intermediate nodes only process DTLs and move the DTL pointer forward and pass the request to the next node on the path.
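The handling of the DTL stack can be sketched as follows. The sketch models each DTL as a list of node IDs with a pointer, omits link IDs, and uses dotted names for nodes; the class and function names are assumptions for illustration rather than the encoding used in P-NNI signaling.

```python
from dataclasses import dataclass

@dataclass
class DTL:
    elements: list      # node (or logical group node) IDs at one level
    pointer: int = 0    # index of the element currently being processed

    def exhausted(self) -> bool:
        return self.pointer == len(self.elements) - 1

def forward(stack):
    """Advance the DTL stack and return the next element to route toward."""
    while stack and stack[-1].exhausted():
        stack.pop()                   # remove DTLs that have been completed
    if not stack:
        return None                   # nothing left: this is the DTL terminator
    top = stack[-1]
    top.pointer += 1
    return top.elements[top.pointer]

def enter_peer_group(stack, local_path):
    """An ingress border node pushes a detailed DTL for its own peer group."""
    stack.append(DTL(list(local_path)))

# Source route built by node A.1: detail within group A, then only "B".
stack = [DTL(["A", "B"]), DTL(["A.1", "A.2", "A.3"])]
hops = [forward(stack), forward(stack), forward(stack)]   # A.2, A.3, then B
enter_peer_group(stack, ["B.1", "B.2"])                   # done at border node B.1
hops.append(forward(stack))                               # B.2
hops.append(forward(stack))                               # None at the terminator
```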
Figure 16: DTL Processing in Connection Setup
Crankback works within this same mechanism; to make the previous description more precise, connections can only be cranked back to nodes that actually create and insert DTLs into a request -- the original source node, or ingress border nodes. Such nodes maintain state information about all requests that they have forwarded until the connection set up is confirmed, or a connection reject is received from the destination end system. If, however, an intermediate node rejects the call (for example, due to local CAC), then the call is rerouted back along the path that it followed to that node to the last node to insert a DTL. If possible, this node then recalculates a new path across its own peer group, avoiding the node that rejected the call, and re-forwards the request.
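The behavior of a DTL-creating node during crankback can be sketched as below. The sketch assumes the node can enumerate candidate paths across its peer group in order of preference and models each node's local CAC as a simple predicate; both are simplifications made for illustration.

```python
def route_with_crankback(candidate_paths, cac_ok):
    """Try paths in order, avoiding any node that has rejected the call.

    candidate_paths: paths the DTL creator can compute, most preferred first.
    cac_ok(node):    models the local CAC decision of each node on a path.
    Returns (path, blocked_nodes); path is None if every attempt fails.
    """
    blocked = set()
    for path in candidate_paths:
        if blocked & set(path):
            continue                          # skip paths through a blocking node
        rejecting = next((n for n in path if not cac_ok(n)), None)
        if rejecting is None:
            return path, blocked              # every node accepted the call
        blocked.add(rejecting)                # cranked back: remember the node
    return None, blocked

paths = [["S", "X", "D"], ["S", "X", "Y", "D"], ["S", "Z", "D"]]
result = route_with_crankback(paths, lambda node: node != "X")
```

In this example node X rejects the first attempt, the second candidate is skipped because it also traverses X, and the call succeeds on the third.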
Good examples of the operation of both P-NNI routing and crankback are given in [Forum5] and are highly recommended, since a proper description of the P-NNI procedures is outside the scope of this paper.
While the procedures outlined here can be scaled to very large networks, it should be noted that the aggregation used to ensure such scalability also fundamentally works against the QoS routing properties of ATM. This is because the QoS metrics discussed in the previous section must also be aggregated to match the aggregation of network topology inherent in the network hierarchy; aggregation, however, is a fundamentally "lossy" process. At the lowest level, such metrics might yield information about the state of particular switch and link combinations. At higher levels, the same metrics must attempt to approximate the "average" state of entire networks, which consists of many individual switches.
Clearly such aggregated information will be much less accurate than information about individual switches. This problem is exacerbated by the fact that at higher levels entire peer groups are represented by single nodes (that is, logical group nodes). Advertising metrics about such nodes implies an assumption about the symmetry and compactness of the topology of the child peer group and its traffic flows, which is very unlikely to be accurate in practice.
To ameliorate this problem, the P-NNI protocol allows a peer group to be modeled at higher levels, for advertising purposes, not as a single node but as a "complex node," with an internal structure. The Phase 1 P-NNI protocol allows complex nodes to be modeled as a star of nodes that consists of a "pseudo-node" connected to a group of border nodes across "pseudo-links," each with an identical radius*31* for each link parameter. These nodes need not necessarily correspond to any actual physical node, but the hope is that the "radius" advertised for this abstract network better represents the metrics across the actual peer network than modeling it as a single node would. Modeling peer groups in this fashion requires much more information to be advertised and modeled within PTSPs. There are more complex and possibly more accurate ways to model a peer group other than a star (such as a mesh or spanning tree). Future phases of the P-NNI protocol might allow for these alternate models of complex nodes.
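Under the star model, one plausible reading is that crossing the peer group between two border nodes costs one pseudo-link in and one pseudo-link out. The sketch below applies that reading to an additive metric; the function name and the treatment of a request that enters and leaves at the same border node are assumptions for illustration.

```python
def star_transit_cost(radius, entry_border, exit_border):
    """Additive metric for crossing a peer group modeled as a uniform star."""
    if entry_border == exit_border:
        return 0                  # assumed: no traversal of the pseudo-node
    return 2 * radius             # entry pseudo-link plus exit pseudo-link

cost = star_transit_cost(5, "border-1", "border-2")
```

Because every pseudo-link has the same radius, the model reports the same cost for every pair of border nodes, which is exactly the kind of symmetry assumption that richer complex node models would relax.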
In addition to summarized addresses, a number of other elements of reachability information are also carried within PTSP. Routes to external networks, reachable across exterior links, are advertised as external addresses. Peer groups may also include nodes with non-aggregatable addresses, which must also be advertised, as must registered group and anycast addresses. Generally none of these types of information can be summarized, since they fall outside the scope of the default P-NNI address hierarchy.
Note that the scope of advertisement of the group addresses is a function of how the network administrator maps the administrative scope of a registered node to the corresponding P-NNI hierarchy.
The P-NNI protocol also has support for "soft permanent virtual connection" set-up [Grossman]. The latter is a means of setting up PVCs and permanent virtual paths (PVP) using P-NNI procedures. Through network management, a PVC or PVP is established only across the source and destination UNI, but not across the entire network. Then, through network management the first (ingress) switch is instructed to route a connection across the network to the destination (egress switch) using P-NNI. This is done with the usual P-NNI procedures, but hooks in the signaling instruct the destination switch to terminate the connection on the pre-established PVC/PVP, rather than forwarding a UNI signaling request to the destination end-system.
Given the need to use permanent connections (because end-systems do not support signaling, for instance), soft connection set-up is a much more convenient and reliable way to set up such connections than hop-by-hop configuration. This also allows permanent connections to be set up with a specific QoS using the P-NNI procedures.
4.3 The IISP Protocol
While the P-NNI Phase 1 protocol is extremely powerful, it is also quite
complex. For this reason, the ATM Forum's work on the protocol is unlikely to
be completed until the second half of 1995. Actual interoperable
implementations are unlikely to be widely deployed until well into 1996. For
instance, as of the time of writing, many vendors have yet to fully
roll out implementations of UNI 3.0 signaling, despite the fact that this
standard was completed in September 1993. Clearly, the P-NNI Phase 1 protocol
is much more complex than UNI 3.0.
Unfortunately, without a P-NNI protocol, there is no standard way for users to build interoperable multivendor ATM networks. Many users are not willing to wait until 1996 for such interoperability since they have pressing needs to test multiple vendors' switches within the ATM test beds that they are currently running. To solve this short-term problem, Cisco Systems proposed to the ATM Forum that it develop a very simple, UNI-based signaling protocol for switch interoperability [Alles1].
Originally designated the P-NNI Phase 0 protocol, this was later renamed the Interim Inter-Switch Signaling Protocol (IISP) to avoid confusion with the P-NNI Phase 1 protocol. This protocol was recently completed and approved by the ATM Forum [Forum6]. The IISP, as the name suggests, is essentially a signaling protocol for inter-switch communication. Given the fact that the UNI 3.0/3.1 signaling procedures are essentially symmetrical, it uses UNI signaling for switch-to-switch communication, with nodes arbitrarily taking the role of the network and user side across particular switch-to-switch links (known as IISP links).
Signaling requests are routed between switches using configured address prefix tables within each switch, which precludes the need for a VC routing protocol. These tables are configured with the address prefixes that are reachable through each port on the switch. When a signaling request is received by a switch, either across a UNI or an IISP link, the switch checks the destination ATM address against the prefix table and notes the port with the longest prefix match. It then forwards the signaling request across that port using UNI procedures.
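The prefix table lookup described above can be sketched as follows. This is a minimal illustration rather than anything defined by the IISP specification: addresses are assumed to be hexadecimal strings, matching is done per hexadecimal digit, and the table contents and port numbers are made up.

```python
from typing import Optional


def longest_prefix_port(prefix_table: dict, dest_address: str) -> Optional[int]:
    """Return the port whose configured prefix is the longest match for
    dest_address, or None if no configured prefix matches."""
    best_port, best_len = None, -1
    for prefix, port in prefix_table.items():
        if dest_address.startswith(prefix) and len(prefix) > best_len:
            best_port, best_len = port, len(prefix)
    return best_port


# Hypothetical, manually configured table: address prefix -> outgoing port.
prefix_table = {
    "470091": 1,       # everything under this prefix is reachable via port 1
    "4700918100": 2,   # a more specific prefix reachable via port 2
    "39": 3,
}

print(longest_prefix_port(prefix_table, "4700918100AB"))  # more specific entry wins
print(longest_prefix_port(prefix_table, "470091FF"))      # only the shorter prefix matches
print(longest_prefix_port(prefix_table, "45"))            # no match: request cannot be forwarded
```

A linear scan is adequate for the small, manually configured tables that IISP assumes; a larger table would normally use a trie.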
The IISP protocol is very simple and does not require modification to UNI 3.0/3.1 signaling or any new VC routing protocol. It can leverage current development efforts on UNI signaling and hence can be deployed very quickly. The IISP, however, does not have anywhere near the same scalability as the Phase 1 protocol. For instance, manually configuring prefix tables limits its applicability to networks with only a small number of nodes. This is adequate for now, given that most ATM switches today are deployed in small test beds and not in large scale production networks.
IISP implementations will not be interoperable with P-NNI Phase 1 implementations*32* because IISP only uses UNI and not NNI signaling. Users will need to upgrade their switches when P-NNI Phase 1 becomes available. This was deliberately done to simplify the specification and accelerate the deployment of IISP, and to emphasize its interim nature.
The IISP also does not support QoS-based routing, although nodes may implement CAC; it does not support crankback, though nodes can be configured with redundant or alternate paths (the selection of such paths being a local matter). These limitations of the IISP, however, are not as restrictive as might first be imagined. While the Phase 1 protocol has extensive support for QoS routing, this is required only for routing VBR and CBR connections, where end systems can request a specific QoS. End systems that request either Unspecified Bit Rate (UBR) or Available Bit Rate (ABR) connections, however, can specify only very limited QoS capabilities. As such, the P-NNI protocol metrics do not apply to such connections, which must be routed using some other criteria -- such as shortest path*33*.
*33* Some have proposed that the P-NNI protocol should attempt some sort of network load balancing for UBR and ABR connections by routing such connections along paths with the smallest number of such pre-established connections. It is not clear what benefits this would provide since one link may have a large number of such connections, each of which uses little bandwidth; another link may have a few such connections that use very large amounts of bandwidth.
Most data traffic on ATM networks will likely use UBR or ABR connections in the short to medium term, since higher layer protocols cannot specify QoS (and hence cannot make use of VBR connections). Given these factors, it is likely that IISP will be widely deployed prior to the final specification and deployment of the P-NNI Phase 1 protocol, though it will certainly be supplanted by the latter as it becomes available.
The only difference is that the signaling procedures will ensure that no new connections are set up across a link for a particular add-leaf request if a branch of the point-to-multipoint connection already exists across that link. Ideally, a new branch of the tree will be added only at the point "closest" to the new leaf, where the connection must branch off to the new leaf. In terms of the P-NNI Phase 1 operation, this may impact the selection of possible routes during the route pruning phase.
Through this support of point-to-multipoint connections, the P-NNI Phase 1 and IISP protocols will support existing UNI 3.0/3.1 multicast mechanisms such as multicast servers and overlaid point-to-multipoint connections.
With UNI 4.0, support will need to be added for group addressing. Reachability information about registered group addresses can be advertised within PTSP in the Phase 1 protocol, and can be configured within the IISP protocol. This does not address, however, the support of such new UNI 4.0 mechanisms as leaf-initiated joins and the addition of multiple leaves in a single point-to-multipoint connection request. Such issues were deferred by the P-NNI group to a possible Phase 2 effort.
This effort may tackle ways to automatically configure*34* groups of ATM end-points into some form of multicast group, based upon their registration of membership within the multicast group. Support will also be needed for a multicast routing protocol to allow for point-to-multipoint connections to group addresses, since the P-NNI protocols will then need to generate a source rooted tree linking the source to each of the leaves. Such a protocol may build upon such existing multicast protocols as Protocol Independent Multicast (PIM) [Deering2].
Currently, many public network service providers are considering the deployment of public ATM networks, which will offer an ATM interconnect service across public UNI to private ATM systems. In the first instance, it is likely that the service offered across such networks will not be a pure ATM service, but will be ATM-based variants of such existing WAN technologies as Frame Relay or the Switched Multimegabit Data Service (SMDS). These services will be described in Section 8.0. Here, however, we consider private-public ATM internetworking, assuming that the public network does indeed offer a native ATM service.
The first problem likely to be faced with such internetworking is that, for various technical, administrative, and tariffing reasons, it is likely that the majority of initial public ATM services will not support switched virtual connections across public UNI*35*. This is a cause for concern since most private ATM networks primarily use SVCs. A method must be found to at least convey ATM signaling information between two private network switching systems across the public network, even if the public network does not process the signaling information. One way in which this might be done is through a technique known as "Permanent Virtual Path (PVP) tunneling." In this method, two private ATM networks are linked across the public network using a virtual path in which the public network transparently trunks the entire collection of virtual channels in the VP between the two sites.
Signaling requests from one private network at the Public UNI would then be mapped into the appropriate virtual channel (that is, VCI=5) within the VP from the usual (VPI=0, VCI=5) virtual channel by the egress private network switch, and carried transparently across to the ingress switch in the other private network. At this point, this switch would map the signaling request back into the usual channel and propagate it across the destination network. Note that if the two networks were also running the P-NNI (or IISP) protocols, then this PVP across the public network would be treated as a virtual link. Hence the link between the private and public network would simultaneously be a Public UNI and a virtual P-NNI link. The only change PVP tunneling requires in normal node operation is that procedures must be used by the ingress and egress switches to allocate particular channels within the PVP to particular connection requests (as opposed to VPI=0, which is the normal operation), as they are passed.
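The channel mapping that PVP tunneling asks of the private network switches can be sketched as below. The class and method names are hypothetical, as are the tunnel VPI value and the VCI range used for data channels (the sketch assumes VCIs below 32 are reserved); only the signaling channel (VPI=0, VCI=5) comes from the text.

```python
SIGNALING_VCI = 5  # the usual signaling channel is (VPI=0, VCI=5)


class PvpTunnel:
    """Hypothetical view of one PVP across the public network, as seen by
    the private network switch at either end."""

    def __init__(self, tunnel_vpi: int, first_vci: int = 32, last_vci: int = 1023):
        self.tunnel_vpi = tunnel_vpi
        self._free_vcis = list(range(first_vci, last_vci + 1))

    def signaling_out(self, vpi: int, vci: int) -> tuple:
        """Map the usual signaling channel into VCI=5 of the tunnel VP."""
        if (vpi, vci) != (0, SIGNALING_VCI):
            raise ValueError("not the usual signaling channel")
        return (self.tunnel_vpi, SIGNALING_VCI)

    def signaling_in(self, vpi: int, vci: int) -> tuple:
        """Map signaling arriving inside the tunnel back to the usual channel."""
        if (vpi, vci) != (self.tunnel_vpi, SIGNALING_VCI):
            raise ValueError("not the tunnel signaling channel")
        return (0, SIGNALING_VCI)

    def allocate_channel(self) -> tuple:
        """Pick a channel inside the PVP for a new connection request."""
        if not self._free_vcis:
            raise RuntimeError("no free channels left in the PVP")
        return (self.tunnel_vpi, self._free_vcis.pop(0))


tunnel = PvpTunnel(tunnel_vpi=7)
print(tunnel.signaling_out(0, 5))
print(tunnel.allocate_channel())
```

The two switches at the ends of the tunnel would also have to agree on which side allocates channels, which this sketch does not address.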
While PVP tunneling does at least allow for signaling to be passed across the public network, it still requires manual configuration (such as through subscription) of connections across the Public UNI. To eliminate this restriction and permit ubiquitous connectivity (at least within the policy and administrative restrictions imposed by the public network service provider), signaling needs to be supported across the Public UNI. One complexity in doing this, however, is P-NNI internetworking, or the lack thereof, across the Public UNI.
It is likely that most public network service providers will not, in fact, support the P-NNI protocol within their networks, since they usually do not wish to display their internal network structure to users. As discussed above, public networks typically operate only with E.164 numbers, not NSAP format private ATM addresses, and internally run their own NNI protocols. This raises two issues: how private networks can obtain reachability information about the public network and how private network addresses can be carried through the public network.
With respect to the first problem, there have been proposals that variants of border routing protocols such as the Inter-Domain Routing Protocol (IDRP) be used to insert public network connectivity information into P-NNI networks as external routes. Alternatively, it has been proposed that the entire public network could be viewed as a single peer group within the P-NNI hierarchy. In general, however, it is likely that public networks will not offer, at least initially, any kind of reachability information at all to private networks. The likely result is that private networks will treat the public network as a subnetwork and will simply tunnel signaling requests across it, much as current network layer protocols run across such networks as X.25 or across dial-up networks.
Such tunneling may use the subaddress fields defined in the UNI signaling procedures. At the egress switch from a private network, prior to forwarding the signaling request across the public network, the egress switch will move the destination NSAP format address into the destination subaddress field and will replace the destination address field with the E.164 address that corresponds to the Public UNI of the switch which provides the ingress to the destination private network*36*; correspondingly, the source NSAP format address will be moved into the source subaddress field, and replaced with the E.164 number of the egress node's Public UNI.
This signaling request will then be forwarded into the public network, which will then route it, using the destination E.164 number, across to the destination public UNI, using internal NNI protocols. At the ingress switch to the destination private network, the ingress switch will move the destination and source NSAP addresses back into the main address fields, and will process the request as normal. Note that this procedure would be needed to make the initial connection, even if the private networks were to subsequently tunnel the P-NNI protocol across the public network.
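The address re-mapping at the two private network switches can be sketched as follows. The `Setup` record, its field names, the exact-match mapping table, and all address values are hypothetical; a real signaling message carries these values in information elements, and the table would more likely be keyed by address prefix.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Setup:
    """Hypothetical subset of the address fields in a signaling request."""
    called: str
    calling: str
    called_subaddress: Optional[str] = None
    calling_subaddress: Optional[str] = None


def egress_remap(msg: Setup, nsap_to_e164: dict, local_e164: str) -> Setup:
    """Leaving the private network: move the NSAP format addresses into the
    subaddress fields and put E.164 numbers into the main address fields."""
    return replace(
        msg,
        called=nsap_to_e164[msg.called],  # Public UNI serving the destination
        called_subaddress=msg.called,
        calling=local_e164,               # this switch's own Public UNI
        calling_subaddress=msg.calling,
    )


def ingress_remap(msg: Setup) -> Setup:
    """Entering the destination private network: restore the NSAP format
    addresses to the main address fields."""
    return replace(
        msg,
        called=msg.called_subaddress,
        calling=msg.calling_subaddress,
        called_subaddress=None,
        calling_subaddress=None,
    )


# Made-up addresses and a manually configured mapping table.
mapping = {"NSAP-DEST": "E164-DEST-UNI"}
original = Setup(called="NSAP-DEST", calling="NSAP-SRC")
in_public = egress_remap(original, mapping, local_e164="E164-SRC-UNI")
restored = ingress_remap(in_public)
print(in_public)
print(restored == original)
```

While the request is inside the public network, only the E.164 numbers appear in the main address fields, so the public network can route it with its internal NNI protocols.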
Figure 17: Address Re-mapping at Public UNI
The remaining issue with this method is how the private network switches obtain the information to map destination NSAP format addresses to the E.164 numbers of the UNI through which they are reachable. In the first instance, this will almost certainly be done through manual configuration, much as is done today for dial-up lines, for instance. In the future, there have been proposals for a public network directory service, which private network nodes could query to obtain such mappings. In general, however, as of the time of writing, there is little consensus on how public network ATM internetworking would be carried out, and it is likely that variants of all of the schemes discussed above will be deployed, depending upon local public network provider policies.
4.5.1 Firewalls
One unresolved issue with regard to any method of public network ATM connectivity is that of firewalls. Firewalls are the logical filters that multiprotocol routers implement today to control and restrict access to particular parts of networks. For instance, they might allow FTP access from the public network into a private network, but might preclude Telnet access. Such firewalls today are integral to network security, and while firewalls are implemented throughout networks, they are most common at connection points to the public network. Firewalls are implemented today in routers, which can process not only the layer 3 header information on packets, but can also look at higher layer fields, such as TCP port numbers, in order to determine the information needed to implement the firewalls.
It is not at all clear, however, just how, or whether, it might be possible to implement firewalls in an ATM environment. The problem is that once an ATM connection is set up, no intermediate devices generally interpret or process any of the information sent down that connection; doing so would make them not ATM switches but packet switches. Once a connection is set up between two end nodes, any data could be sent down that connection without visibility to network administration. While firewalls or other security mechanisms could be implemented in the end systems, it is not likely to be a practical solution for most end systems.
There have been proposals that firewall filtering within ATM networks should be done at connection set-up time and not on the transmitted data. Special information elements would be defined within the signaling messages to indicate the actual higher layer application binding that the connection wishes to make (for example, to telnet or to FTP). Then the intermediate switches could filter such connection set-ups based on higher layer information, source, and destination addresses, and so on.
ATM address filtering may be of particular use at the boundary between a private ATM network and a public or shared WAN network. Address filtering could be used at such points to allow connections to be made only to and from particular, trusted addresses (for example, a remote site of the same administration), and preclude general connectivity. Such firewalls may be of particular use in conjunction with higher level controls (see Section 6.3), though all address based filtering techniques are also vulnerable to spoofing attacks.
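A boundary switch applying such an address filter at connection set-up time might behave roughly as sketched below. The function name, the use of address prefixes, and the address values are assumptions made for illustration; no such filter is defined in the specifications discussed here.

```python
def permit_setup(remote_address: str, trusted_prefixes: list) -> bool:
    """Allow a connection set-up across the boundary only if the address of
    the remote party falls under a trusted prefix.

    The check relies entirely on the address carried in the signaling
    request, so it offers no protection against a spoofed address."""
    return any(remote_address.startswith(prefix) for prefix in trusted_prefixes)


# Made-up prefix standing in for a remote site of the same administration.
trusted = ["47009181"]

print(permit_setup("4700918100AB", trusted))  # remote site: allowed
print(permit_setup("39123456", trusted))      # anything else: refused
```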
While such techniques may have some utility, they are limited by the fact that little prevents an end system from lying about the use to which a connection will be put, since ATM connections generally terminate at lower levels within end system protocol stacks, and not at the actual applications*37*. Therefore, once a connection is set up, a node could send packets of any protocol type down the connection, and have these demultiplexed at the destination to any supported application, regardless of the identity of the application for which the connection was ostensibly set up.
The only feasible solution to this problem appears to be to add cryptographic based authentication mechanisms to ATM signaling. Some preliminary work on such security mechanisms has been discussed at the ATM Forum, and elsewhere, but it is likely to be some time before they are fully specified or deployed. In the meantime, many network administrators continue to use routers as security firewalls, particularly at public network boundaries, even to connect two ATM networks to each other. While this has clear performance and service limitations, many network administrators prefer such a solution to eliminating all existing firewall protections.
Given these considerations, it is likely that the ATM switching systems that use commercial processors for P-NNI calculation could only support call set-up rates of a few hundred connections per second, if that. Each of these could experience significant call set-up latencies, perhaps exceeding hundreds of milliseconds, within large networks. These ATM routing latencies would be increased by any additional address resolutions that may need to be performed to map higher layer addresses to ATM addresses, as described in the following sections.
To reduce these set-up latencies, which could significantly degrade perceived network responsiveness, many services operating over ATM have defined, or may define, default data paths that allow data to be transmitted pending the successful set-up of direct data paths, or for the transmission of small amounts of data, the volume of which does not justify the cost and latency of a connection set-up. This characteristic will be noted in many of the higher layer services we describe next.
5.0 LAN Emulation
The following sections will discuss the internetworking of existing protocols
across ATM networks. Given the vast installed base of LANs and WANs today and
the network and link layer protocols operating on these networks, a key to
ATM success will be the ability to allow for interoperability between these
technologies and ATM. Few users will tolerate the presence of islands of ATM
without connectivity to the remainder of the enterprise network. The key to
such connectivity is the use of the same network layer protocols, such as IP
and IPX, on both existing networks and on ATM, since it is the function of
the network layer to provide a uniform network view to higher level protocols
and applications.
There are, however, two fundamentally different ways of running network layer protocols across an (overlay mode) ATM network. In one method, known as native mode operation, address resolution mechanisms are used to map network layer addresses directly into ATM addresses, and the network layer packets are then carried across the ATM network. Native mode protocols will be examined in the next section. The alternate method of carrying network layer packets across an ATM network is known as LAN emulation (LANE). The ATM Forum has recently completed a Phase 1 LAN Emulation specification [Forum7]. This section discusses the rationale for LAN emulation and describes the operation of the protocol.
Figure 18: Methods of ATM Internetworking
As the name suggests, the function of the LANE protocol is to emulate a local area network on top of an ATM network. Specifically, the LANE protocol defines mechanisms for emulating either an IEEE 802.3 Ethernet or an 802.5 Token Ring LAN.*38*
What LAN emulation means is that the LANE protocol defines a service interface for higher layer (that is, network layer) protocols, which is identical to that of existing LANs, and that data sent across the ATM network are encapsulated in the appropriate LAN MAC packet*39* format. It does not mean that any attempt is made to emulate the actual media access control protocol of the specific LAN concerned (that is, CSMA/CD for Ethernet or token passing for 802.5).
*39* The LANE protocol supports a range of maximum packet (MPDU) sizes, corresponding to maximum size Ethernet, and 4 Mbps and 16 Mbps Token Ring packets, and to the value of the default MPDU for IP over ATM (see Section 6.2). Typically the appropriate MPDU will be used depending upon what type of LAN is being emulated -- and is supported on the LAN switches bridged to the ELAN. An ELAN with only native ATM hosts, however, may optionally use any of the available MPDU sizes, even if this does not correspond to the actual MPDU in a real LAN of the type being emulated. All LECs within a given ELAN must use the same MPDU size.
In other words, the LANE protocols make an ATM network look and behave like an Ethernet or Token Ring LAN -- albeit one operating much faster than a real such network.
Figure 19: Physical and Emulated LANs
The rationale for doing this is that it requires no modifications to higher layer protocols to enable their operation over an ATM network. Since the LANE service presents the same service interface of existing MAC protocols to network layer drivers (for example, an NDIS- or ODI-like driver interface), no changes are required in those drivers. The intention is to accelerate the deployment of ATM, since considerable work remains to be done in fully defining native mode operation for the plethora of existing network layer protocols.
It is envisaged that the LANE protocol will be deployed in two types of ATM-attached equipment:
a. ATM Network Interface Cards (NIC): ATM NICs will implement the LANE protocol and interface to the ATM network, but will present the current LAN service interface to the higher level protocol drivers within the attached end system. The network layer protocols on the end system will continue to communicate as if they were on a known LAN, using known procedures. They will, however, be able to use the vastly greater bandwidth of ATM networks.
b. Internetworking and LAN Switching Equipment: The second class of network gear that will implement LANE will be ATM-attached LAN switches and routers. These devices, together with directly attached ATM hosts, equipped with ATM NICs, will be used to provide a virtual LAN service, where ports on the LAN switches will be assigned to particular virtual LANs, independent of physical location [Cisco]. LAN emulation is a particularly good fit to the first generation of LAN switches that effectively act as fast multiport bridges, since LANE is essentially a protocol for bridging across ATM. Internetworking equipment, such as routers, will also implement LANE to allow for virtual LAN internetworking, as will be discussed later.
Note that the LANE protocol does not directly impact ATM switches. LANE, as with most of the other ATM internetworking protocols we will discuss later in this paper, builds upon the overlay model. As such, the LANE protocols operate transparently over and through ATM switches, using only standard ATM signaling procedures. ATM switches may well be used as convenient platforms upon which to implement some of the LANE server components, which we discuss below, but this is independent of the cell relay operation of the ATM switches themselves. This logical decoupling is one of the great advantages of the overlay model, since it allows ATM switch designs to proceed independently of the operation of overlying internetworking protocols, and vice versa.
Figure 20: LANE Protocol Architecture
The basic function of the LANE protocol is to resolve MAC addresses into ATM addresses. By doing so, it actually implements a protocol for MAC bridging on ATM, hence the close fit with current LAN switches. The goal of LANE is to perform such address mappings so that LANE end systems can set up direct connections between themselves and forward data. The element that adds significant complexity to LANE, however, is supporting LAN switches -- that is, LAN bridges. The function of a LAN bridge, as defined in [ISO] and [IEEE], is to shield LAN segments from each other. While bridges learn about MAC addresses on the LAN segments to which they are connected, such information is not propagated. How LANE resolves this problem will be discussed shortly.
Note that while the current LANE specification defines two types of emulated LANs, one for Ethernet, and one for Token Ring, it does not permit direct connectivity between a LEC that implements an Ethernet ELAN and one that implements a Token Ring ELAN. In other words, LANE does not attempt to solve the mixed media bridging problem, which is particularly intractable for Ethernet-to-Token Ring interconnection. Two such ELANs can only be interconnected through an ATM router that acts as a client on each ELAN, as discussed below.
The LANE protocol does not specify where any of the server components described here should be located; any device or devices with ATM connectivity would suffice. For the purposes of reliability and performance, however, it is likely that most vendors will implement these server components on networking equipment, such as ATM switches or routers, rather than on a workstation or host. This also applies to all other ATM server components described in the remainder of this paper.
The LANE protocol specifies only the operation of the LAN Emulation User to Network Interface (LUNI) between a LEC and the network providing the LANE service. This may be contrasted with the "LAN Emulation NNI" (LNNI) interface, which operates between the server components within a single ELAN system. The Phase 1 LANE protocols specify only the LUNI operation; furthermore, the Phase 1 LANE protocol does not allow for the standard support of multiple LESs or BUSs within an ELAN. Hence these components represent both single points of failure and potential bottlenecks. The interactions between each of the server components in the LANE Phase 1 protocol are currently left unspecified, and will be implemented in a proprietary manner by vendors.
Figure 21: LANE Protocol Interfaces
The ATM Forum is currently working on a Phase 2 LANE protocol, which will specify LNNI protocols, so as to allow for redundant LESs and replicated BUSs [Alles2], in order to address concerns about these limitations. The LNNI protocols will specify open interfaces between the various LANE server entities -- LES/LES, LES/LECS, and BUS/BUS -- and will allow for hierarchies of BUSs for greater scalability*40* within ELANs. This work is not expected to be completed until 1996, however.
The Phase 1 LANE entities communicate with each other using a series of ATM connections. LECs maintain separate connections for data transmission and control traffic.
The control connections are as follows:
Figure 22: LANE Control Connections
The data connections are as follows:
Figure 23: LANE Data Connections
5.2.1 Initialization and Configuration
Upon initialization (such as power up), the LEC must first obtain its own ATM address (typically, this will be through address registration). The LEC then sets up a configuration-direct connection to the LECS. To do this, the LEC must first find the location of the LECS by either: using a defined ILMI procedure to determine the LECS address; using a well-known LECS address; or using a well-known permanent connection to the LECS (VPI=0, VCI=17).
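The three ways of locating the LECS can be sketched as a simple fallback chain. The order shown (ILMI first, then the well-known address, then the well-known PVC) is an assumption, as are the function names and the two callables, which stand in for the ILMI query and for an attempt to set up the configuration-direct VCC. The well-known address is left as a placeholder.

```python
from typing import Callable, Optional

WELL_KNOWN_LECS_PVC = (0, 17)  # (VPI, VCI) of the well-known permanent connection


def locate_lecs(
    ilmi_query: Callable[[], Optional[str]],
    try_connect: Callable[[str], bool],
    well_known_address: str,
) -> tuple:
    """Return ('svc', atm_address) if a switched connection to the LECS can
    be set up, otherwise ('pvc', (vpi, vci)) for the well-known PVC."""
    # 1. Ask the switch for the LECS address through ILMI.
    address = ilmi_query()
    if address is not None and try_connect(address):
        return ("svc", address)
    # 2. Try the well-known LECS address.
    if try_connect(well_known_address):
        return ("svc", well_known_address)
    # 3. Fall back to the well-known permanent connection.
    return ("pvc", WELL_KNOWN_LECS_PVC)


# Stand-ins for a switch that returns no LECS address and a network in
# which the well-known address is not reachable.
result = locate_lecs(
    ilmi_query=lambda: None,
    try_connect=lambda address: False,
    well_known_address="WELL-KNOWN-LECS-ADDRESS",  # placeholder, not a real address
)
print(result)
```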
After finding the location of the LECS, the LEC will establish the configuration-direct VCC to the LECS. Once connected, a configuration protocol is used by the LECS to inform the LEC of the information it requires to connect into its target ELAN. This includes the ATM address of the LES, the type of LAN being emulated, maximum packet size on the ELAN, and the ELAN name (a text string for display purposes). The LECS is generally configured by network management with this information, which effectively indicates the virtual LAN (where a virtual LAN corresponds to an ELAN) to which the LEC belongs.
5.2.2 Joining and Registration
Once the LEC obtains the LES address, it may optionally clear the configuration-direct VCC to the LECS; then it sets up the control-direct VCC to the LES. Once this is done, the LES assigns the LEC a unique LEC Identifier (LECID). The LEC then registers its own MAC and ATM addresses with the LES. It may optionally also register any other MAC addresses*41* for which it is proxying -- such as learned addresses in the case of a spanning tree bridge.
The LES then sets up, back to the LEC, the control-distribute VCC. The control direct and distribute VCCs can then be used by the LEC for the LAN Emulation ARP (LE_ARP) procedure for requesting the ATM address that corresponds to a particular MAC address. To do this, the LEC formulates an LE_ARP and sends it to the LES. If the LES recognizes this mapping (because some LEC registered the relevant MAC address) it may choose to reply directly on the control-direct VCC. If not, it forwards the request on the control-distribute VCC to solicit a response from a LEC that knows the requested MAC address.
The typical reason why the LES would not know a mapping is because the address is "behind" a MAC bridge, and the bridge may not have registered the address*42*. An ATM NIC, on the other hand, would presumably only support one or a small number of MAC addresses, all of which could easily be registered. Typically, any MAC address not known to the LES would be found only in a LEC within a bridge, and not within a NIC, and only the LECs within such devices need necessarily receive re-directed LE_ARPs.
To accommodate this, a LEC may register with the LES as a "proxy" node, indicating that it may proxy for other addresses and needs to obtain LE_ARPs. The LES then has the option of setting up the control distribute VCCs so that LE_ARPs are only sent to such proxy LECs -- for example, through two point-to-multipoint connections, one connecting the LES to all of the proxy nodes, and one to all of the non-proxy nodes. This is not a requirement, however, and the LES may choose to simply distribute the LE_ARP to all LECs.
In any case, if a LEC can respond to an LE_ARP, because it is proxying for that address, it responds to the LES on the control direct VCC. The LES will then forward this response back either only to the requesting LEC, or, optionally, on the control distribute VCC to all LECs*43*, so that all LECs can learn and cache the particular address mapping (and hence perhaps save future LE_ARPs).
*42* Since bridge tables may have thousands of entries that are continuously being learned, aged out, moved, and so on, a bridge typically would only register static entries.
*43* If the LES maintains two control distribute VCCs, one to proxy nodes, and one to non-proxy nodes, it would then need to replicate such responses before forwarding onto each connection.
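The registration and LE_ARP handling described above can be sketched from the point of view of the LES. This is a simplified model, not the LANE message formats: the class, its methods, and the addresses are hypothetical, and a proxy LEC's bridge table is represented by a plain set of MAC addresses held alongside the LES for the sake of the example.

```python
class Les:
    """Hypothetical, in-memory model of a LAN Emulation Server."""

    def __init__(self):
        self.registered = {}  # MAC address -> ATM address of the registering LEC
        self.proxies = {}     # ATM address of a proxy LEC -> MACs behind it
        self._next_lecid = 1

    def join(self, atm_address, mac, bridged_macs=None):
        """Assign a LECID and register the LEC's own MAC address. A LEC that
        passes bridged_macs joins as a proxy; those addresses are NOT
        registered, mirroring a bridge that registers only static entries."""
        lecid = self._next_lecid
        self._next_lecid += 1
        self.registered[mac] = atm_address
        if bridged_macs is not None:
            self.proxies[atm_address] = set(bridged_macs)
        return lecid

    def le_arp(self, mac):
        """Return (atm_address, how_resolved) for the requested MAC address."""
        if mac in self.registered:
            # Known mapping: the LES may answer on the control-direct VCC.
            return self.registered[mac], "answered by LES"
        # Unknown mapping: forward on the control-distribute VCC and let a
        # proxy LEC that knows the address answer.
        for atm_address, macs in self.proxies.items():
            if mac in macs:
                return atm_address, "answered by proxy LEC"
        return None, "unresolved"


les = Les()
les.join("ATM-NIC-1", "00:00:0c:00:00:01")
les.join("ATM-BRIDGE-1", "00:00:0c:00:00:02", bridged_macs={"00:00:0c:00:00:99"})

print(les.le_arp("00:00:0c:00:00:01"))
print(les.le_arp("00:00:0c:00:00:99"))
print(les.le_arp("00:00:0c:00:00:77"))
```

In both resolved cases the requesting LEC learns an ATM address to which it can set up a direct connection; for an address behind a bridge, that address is the bridge's own.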
To complete initialization, a LEC uses this LE_ARP mechanism to determine the ATM address of the BUS. It does this by sending an LE_ARP for the MAC broadcast address to the LES, which responds with the BUS's ATM address. The LEC then sets up the multicast send VCC to the BUS. The BUS, in turn, sets up the multicast forward VCC back to the LEC, typically by adding the LEC as a leaf to a