The Internet Protocol Journal - Volume 7, Number 2

Content Networks

by Christophe Deleuze

The Internet is constantly evolving, in both usage patterns and underlying technologies. In the last few years, there has been growing interest in content-networking technologies. A variety of systems can be grouped under this name, but they all share the ability to access objects in a location-independent manner. Doing so implies a shift in the way communications take place on the Internet.

The Classic Internet Model
The Internet protocol stack comprises three layers, shown in Figure 1. The network layer is implemented by IP and various routing protocols. Its job is to bring datagrams hop by hop to their destination host, as identified by the destination IP address. IP is "best effort," meaning that no guarantee is made about the correct delivery of datagrams to the destination host.

The transport layer provides an end-to-end communication service to applications. Currently two services are available: a reliable ordered byte stream transport, implemented by the Transmission Control Protocol (TCP), and an unreliable message transport, implemented by the User Datagram Protocol (UDP).

Figure 1: The Three Layers of the Internet Protocol Stack



Above the transport layer lies the application layer, which defines application message formats and communication semantics. The Web uses a client-server application protocol called Hypertext Transfer Protocol (HTTP) [10].

A design principle of the Internet architecture is the "end-to-end principle," which states that everything that can be done in the end hosts should be done there, and not in the network itself [8]. That is why IP service is so crude, and transport and application layer protocols are implemented only in the end hosts.

Application objects, such as Web pages, files, etc. (we will simply call these "objects"), are identified by URLs. (Actually, URLs identify "resources" that can be mapped to different objects called "variants." A variant is identified by a URL and a set of request header values, but in order to keep things simple we will not consider this in the following.) URLs for Web objects have the form http://host:port/path. This means that the server application lives on the host with host name (or possibly IP address) host, listens on TCP port port (80 by default), and knows the object under the name path. Thus URLs, as their name implies, tell where the object can be found. To access such an object, a TCP connection is opened to the server running on the specified host and port, and the object named path is requested.
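To make this location-dependent access model concrete, the following sketch (in Python, purely for illustration) fetches an object by hand: it opens a TCP connection to the host and port named in the URL and requests the path over HTTP. The host www.example.com is a placeholder.

    import socket

    # Fetch http://www.example.com:80/index.html "by hand": connect to the
    # host and port given in the URL and request the named path over HTTP.
    host, port, path = "www.example.com", 80, "/index.html"

    sock = socket.create_connection((host, port))
    request = ("GET {} HTTP/1.1\r\n"
               "Host: {}\r\n"
               "Connection: close\r\n\r\n").format(path, host)
    sock.sendall(request.encode("ascii"))

    response = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk
    sock.close()

    print(response.split(b"\r\n", 1)[0])   # for example: b'HTTP/1.1 200 OK'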

Content Networks
Content networks aim to provide location-independent access to objects, most commonly because they handle some kind of (possibly dynamic) replication of the objects. By design, URLs are not suited to identifying objects available in several places in the network.

Handling such replication and location-independent access usually involves breaking the end-to-end principle at some point. Communication is no longer managed end to end: intermediate network elements operating at the application layer (the most common type being the "proxy") are involved in the communication. (Content networks are not the only case in which this principle is violated.)

In the same way that IP routers relay IP datagrams (that is, network layer protocol data units), routing them to their destination according to network layer information, those application layer nodes relay application messages, using application layer information (such as content URLs) to decide where to send them. This is often called content routing.
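The following toy routing table illustrates the idea; it is not the algorithm of any particular product, and the proxy names are hypothetical. The next hop is chosen by matching the requested URL against configured prefixes rather than by looking at the destination IP address.

    # Toy content-routing table: the longest matching URL prefix wins.
    ROUTES = {
        "http://www.example.com/video/": "proxy-video.example.net",
        "http://www.example.com/":       "proxy-www.example.net",
        "http://":                       "parent-proxy.example.net",  # default route
    }

    def next_hop(url):
        best = max((prefix for prefix in ROUTES if url.startswith(prefix)), key=len)
        return ROUTES[best]

    print(next_hop("http://www.example.com/video/clip.mpg"))   # proxy-video.example.net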

So the goal of a content network is to manage replication, which involves two different tasks: distribution ensures the copying and synchronization of the instances of an object from an origin server to various replica servers, and redirection allows users to find an instance of the object (possibly the one closest to them). (By "replica," we mean any server of any kind other than the origin that is able to serve an instance of the object. This term often has a narrower meaning, not applying, for example, to caching proxies.) This is illustrated in Figure 2.

Figure 2: Elements of a Content Network



Various kinds of content networks exist, differing in the extent to which they handle these tasks and in the mechanisms they use to do so. There are many possible ways to classify them. In this article, we use a classification based on who owns and administers the content network. We thus find three categories: content networks owned by network operators, content providers, and users.

Network Operators' Content Networks
Network operators (also called Internet Service Providers, or ISPs) often install caching proxies in order to save bandwidth [11]. Clients send their requests for objects to the proxy instead of the origin server. The proxy keeps copies of popular objects in its cache and can answer directly if it has the requested object in cache. (To be precise, such a caching proxy does not cache objects, but server responses.) If it does not, it gets the object from the origin server, possibly stores a copy in its cache, and sends it back to the client.
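The decision a caching proxy makes for each request can be summarized by the following sketch, a deliberate simplification that ignores HTTP cache-control headers, response metadata, and cache eviction:

    import urllib.request

    cache = {}   # URL -> stored response body

    def handle_request(url):
        """Serve from the cache if possible, otherwise fetch from the origin."""
        if url in cache:                                  # cache hit: answer directly
            return cache[url], "HIT"
        body = urllib.request.urlopen(url).read()         # cache miss: ask the origin server
        cache[url] = body                                 # possibly keep a copy
        return body, "MISS"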

This caching proxy scheme can be used recursively, making those proxies contact parent proxies for requests they cannot fulfill from their local store. Such hierarchies of caching proxies actually amount to content-distribution trees. This makes sense if the network topology is tree-like, although there are some drawbacks, including the fact that less popular objects (those not found in any cache) experience delays, which increase with the depth of the tree. Another problem arises with origin servers whose closest node in the tree is not the root: requests for their content are forced to take a detour through the hierarchy.

The Squid caching proxy [5] can be configured to choose the parent proxy to query for a request based on the domain name of the requested URL (or to fetch the object directly from the origin server). This allows setting up multiple logical trees over the set of proxies, a limited form of content routing.
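For example, a Squid configuration along the following lines sends requests for some domains to one parent and the rest to another. The parent host names are hypothetical; see the Squid documentation for the cache_peer and cache_peer_domain directives and their options.

    # Two hypothetical parents, reached on HTTP port 3128 and ICP port 3130
    cache_peer parent-eu.example.net parent 3128 3130
    cache_peer parent-us.example.net parent 3128 3130

    # Choose the parent from the domain of the requested URL
    cache_peer_domain parent-eu.example.net .example.fr .example.de
    cache_peer_domain parent-us.example.net .example.com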

Such manual configuration is cumbersome, especially because domain names do not necessarily match network topology (and most actually do not). Thus the administrator must know where origin servers are in the network to use this feature effectively.

The same effects can be achieved, to some extent, in an automatic and dynamic fashion using ICP, the Internet Cache Protocol [16, 15]. ICP allows a mesh of caching proxies to cooperate by exchanging hints about the objects they have in cache, so that a proxy missing an object can find a close proxy that has it. One advanced feature of ICP makes it possible to select, among a mesh of proxies, the one that has the smallest Round-Trip Time (RTT) to the origin server.
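The sketch below builds and sends an ICP_OP_QUERY following the message format of RFC 2186. It is a rough illustration only: the addresses in the header are left at zero, the sibling host name is hypothetical, and 3130 is the port conventionally used for ICP.

    import socket
    import struct

    ICP_OP_QUERY = 1    # opcode values from RFC 2186 (2 = HIT, 3 = MISS)
    ICP_VERSION = 2

    def build_icp_query(url, request_number=1):
        """Build an ICP_OP_QUERY: 20-byte header followed by the query payload."""
        payload = struct.pack("!I", 0) + url.encode("ascii") + b"\0"   # requester address + URL
        header = struct.pack("!BBHIII",
                             ICP_OP_QUERY, ICP_VERSION, 20 + len(payload),
                             request_number,
                             0,                     # options
                             0)                     # option data
        header += struct.pack("!I", 0)              # sender host address
        return header + payload

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)
    sock.sendto(build_icp_query("http://www.example.com/index.html"),
                ("sibling-cache.example.net", 3130))
    try:
        reply, _ = sock.recvfrom(4096)
        print("reply opcode:", reply[0])            # 2 means the sibling has the object
    except socket.timeout:
        print("no ICP reply")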

One design flaw of ICP is that it identifies objects with URLs. We mentioned previously that a URL actually identifies a resource that can be mapped to several different objects called variants. Thus information provided by ICP is of little use for resources that have multiple variants. However, in practice most resources have only one variant, so this weakness does little harm.

Users normally configure their browsers to use a proxy, although automatic configuration is sometimes possible. A client can use multiple proxies with protocols such as the Cache Array Routing Protocol (CARP) [14]. To avoid configuration issues altogether, a common trend is for ISPs to deploy interception proxies: network elements such as routers running the Cisco Web Cache Communication Protocol (WCCP) [6, 7] redirect HTTP traffic to the proxy without the users' knowledge. The proxy then answers client requests while pretending to be the origin server. This poses numerous problems, as discussed in [12].
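CARP makes the choice of proxy a deterministic function of the URL, so no query traffic between caches is needed. The sketch below shows the general idea using an ordinary hash rather than the hash and load-factor weighting defined by CARP itself; the proxy names are hypothetical.

    import hashlib

    PROXIES = ["proxy1.example.net", "proxy2.example.net", "proxy3.example.net"]

    def select_proxy(url):
        """Hash the URL against every member of the array; the highest score wins,
        so a given URL always maps to the same proxy without any shared state."""
        return max(PROXIES, key=lambda proxy: hashlib.md5((proxy + url).encode()).digest())

    print(select_proxy("http://www.example.com/index.html"))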

Caching proxies have limited support for ensuring object consistency. Either the origin server gives an expiration date, or the proxy estimates the object lifetime based on the last modification time, using a heuristic known as adaptive TTL (Time To Live).
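A minimal version of the adaptive TTL heuristic is sketched below; the 10-percent factor and the one-day cap are typical tuning values, not part of any standard.

    def freshness_lifetime(now, last_modified, factor=0.1, cap=86400):
        """Seconds during which a cached copy may be served without revalidation:
        an object unchanged for a long time is assumed to stay unchanged, so it
        may be kept for a fraction of its current age (up to a cap)."""
        return min(factor * (now - last_modified), cap)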

Content Providers' Content Networks
Unlike ISPs, whose main goal is to save bandwidth, content providers want to make their content widely available to users while staying in control of the delivery (including ensuring that users are not delivered stale objects). We can again roughly classify such content networks into three subcategories:
Server farms: Locally deployed content networks aimed at providing more delivery capacity and high availability of content

Mirror sites: Distributed content networks making content available in different places, thus allowing users to get the content from a close mirror

Content-Delivery Networks (CDNs): Shared content networks operated for the benefit of numerous content providers, allowing them to get their content replicated to a large number of servers around the world at lower cost.

Server Farms
Server farms consist of a load-balancing device (which we will call a switch) that receives client requests and dispatches them to a set of servers (the physical servers). The whole system appears to the outside world as a single logical server. The goal of a server farm is to provide scalable and highly available service. The switch monitors the physical servers and uses various load metrics in its dispatching algorithm. Because the switch is a single point of failure, a second switch is usually set up in hot-standby failover mode, as shown in Figure 3.

Figure 3: Server Farm



Some switches are called Layer 4 switches (4 is the number of the transport layer in the OSI Reference Model), meaning that they look at network and transport layer information in the first packet of a connection to decide to which physical server the incoming connection should be handed. They establish state associating the connection with the chosen physical server and use it to relay all packets of the connection. The exact way the packets are sent to the physical servers varies. It usually involves some form of manipulation of IP and TCP headers in the packets (as Network Address Translation [NAT] does) or IP encapsulation. These tricks are not necessary if all the physical servers live on the same LAN.
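A toy model of this behavior is shown below: the first packet of a connection is assigned a physical server, and the mapping is remembered in a connection table so that every later packet of the same connection goes to the same server. The addresses are hypothetical, and a real switch would use load metrics rather than a random pick.

    import random

    SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]     # physical servers
    connections = {}                                    # 4-tuple -> chosen server

    def forward(src_ip, src_port, dst_ip, dst_port, is_syn=False):
        key = (src_ip, src_port, dst_ip, dst_port)
        if is_syn or key not in connections:
            connections[key] = random.choice(SERVERS)   # decision made on the first packet
        return connections[key]                         # later packets reuse the stored state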

More complex Layer 7 switches (7 is the number of the application layer in the OSI Reference Model) look at application layer information, such as URL and HTTP request headers. They are sometimes called content switches. On a TCP connection, application data is available only after the connection has been opened. A proxy application on the switch must thus accept the connection from the client, receive the request, and then open another connection with the selected physical server and forward the request. When the response comes back, it must copy the bytes from the server connection to the client connection.
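In software, this behavior looks roughly like the following sketch, single-threaded and greatly simplified: it assumes the whole request arrives in one read, and the dispatching rule and server addresses are hypothetical.

    import socket

    def choose_server(request_line):
        """Pick a physical server from application layer information (here, the URL path)."""
        path = request_line.split()[1]
        return ("10.0.0.2", 80) if path.startswith("/images/") else ("10.0.0.1", 80)

    listener = socket.socket()
    listener.bind(("", 8080))
    listener.listen(5)

    while True:
        client, _ = listener.accept()                   # accept the client connection
        request = client.recv(65536)                    # read the HTTP request
        first_line = request.decode("ascii", "replace").splitlines()[0]
        server = socket.create_connection(choose_server(first_line))
        server.sendall(request)                         # forward the request
        while True:                                     # splice: copy server bytes to the client
            data = server.recv(65536)
            if not data:
                break
            client.sendall(data)
        server.close()
        client.close()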

Such a splice of TCP connections consumes far more resources in the switch than the simple packet manipulation occurring in Layer 4 switches. Bytes arrive on one connection and are handed to the proxy application, which copies them to the other connection; all of this involves multiple kernel-to-user-mode memory copies and CPU context switches. Various optimizations are implemented in commercial products. The simplest one is to perform the splice in kernel mode: after it has sent the request to the physical server, the proxy application asks the kernel to splice the two connections, and forgets about them. Bytes are then copied between the connections directly by the kernel, instead of being handed to the proxy application and back to the kernel.

It is even possible to actually merge the two TCP connections, that is, simply relay packets at the network layer to establish a direct TCP connection between the client and the physical server. This requires manipulating TCP sequence numbers (in addition to addresses and ports) when relaying packets, because the two connections will not have used the same initial sequence numbers. This can be much more complex (or even impossible) to perform if TCP options differ in the two connections.
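The core of the trick is a constant offset applied to every sequence (and acknowledgment) number, computed from the two Initial Sequence Numbers (ISNs), as in this small sketch:

    def translate_seq(seq, isn_client_side, isn_server_side):
        """Rewrite a sequence number from the client-side connection so that it is
        valid on the server-side connection; arithmetic is modulo 2**32 because
        TCP sequence numbers wrap around."""
        offset = (isn_server_side - isn_client_side) % 2**32
        return (seq + offset) % 2**32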

Mirror Sites
In such a content network, a set of servers is installed at various places in the Internet and defined as mirrors of the master server. Synchronization is most commonly performed periodically (often every night), using FTP or specialized tools such as rsync [4].

Redirection is performed by the users themselves for most sites. The master server, to which the user initially connects, displays a list of mirrors with geographic information and suggests that users choose a mirror close to themselves, by simply clicking on the associated link.

This process can sometimes be automated. One trick is to store the user's choice in a cookie, so that the next time the user connects to the master site, the information provided in the cookie is used to issue an HTTP redirect (a server response asking the client to retry the request on a new URL) to the previously selected site.
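A sketch of such a master server is shown below. The mirror names are hypothetical, and for brevity the cookie is set on the first visit rather than when the user actually picks a mirror from the list.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class MasterHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            cookies = self.headers.get("Cookie", "")
            if "mirror=" in cookies:
                mirror = cookies.split("mirror=")[1].split(";")[0]
                self.send_response(302)                               # HTTP redirect
                self.send_header("Location", "http://" + mirror + self.path)
                self.end_headers()
            else:
                self.send_response(200)
                self.send_header("Set-Cookie", "mirror=mirror-eu.example.org")
                self.end_headers()
                self.wfile.write(b"Please pick a mirror close to you ...")

    HTTPServer(("", 8000), MasterHandler).serve_forever()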

Other schemes involve trying to find which of the mirrors is closest to the user based on information provided in the user request (such as preferred language) or indicated by network metrics. Such schemes were not very common for simple mirror sites, but today many commercial products allowing for this kind of "global load balancing" are available.

In any case (except if redirection is automatic and Domain Name System [DNS] based—this is discussed in the next section) the URLs of objects change across mirrors.

CDNs
Most content providers cannot afford to operate numerous mirror sites, because having servers in many places around the world is expensive. Operators of CDNs own a large replication infrastructure (Akamai, the biggest one, claims to have 15,000 servers) and are paid by content providers to distribute their content. By sharing the infrastructure among many providers, CDNs are able to offer very large reach at an affordable cost.

CDN servers do not store the entire sites of all the content providers, but rather cache a subset according to local client demand. Such servers are called surrogates. They manage their disk store the way proxies do, and serve content to clients the way mirrors do (that is, unlike proxies, they act as an authoritative source for the content they deliver).

Because the number of surrogates can be so large, and because of the argument that "no user configuration is necessary," CDNs typically include complex redirection systems that perform automatic and user-transparent redirection to the selected surrogate. The selection is based on information about surrogate load and on network metrics collected in various ways, such as routing protocol information, RTTs measured by network probes, and so on. The client is made to connect to the selected surrogate either by sending it an HTTP redirect message or by using the DNS: when the client tries to resolve the host name of the URL into an IP address to connect to, it is given the address of the selected surrogate instead. Using the DNS ensures that the URL is the same for all copies of an object; in this case, CDNs actually turn URLs into location-independent identifiers.
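The DNS part of the scheme can be pictured with the following sketch: a name server under the CDN's control answers the same host name with a different address depending on which resolver is asking. The surrogate data and the cost function are invented for the illustration; real systems plug into an authoritative DNS server and use much richer measurements.

    SURROGATES = {
        "192.0.2.10":   ("Europe",        0.3),   # address -> (region, current load)
        "198.51.100.7": ("North America", 0.5),
    }

    def estimated_rtt(resolver_ip, surrogate_ip):
        return 10.0        # placeholder: real CDNs use probes and routing information

    def resolve(hostname, client_resolver_ip):
        """Return the address of the surrogate judged best for this client."""
        def cost(addr):
            region, load = SURROGATES[addr]
            return estimated_rtt(client_resolver_ip, addr) + 100 * load
        return min(SURROGATES, key=cost)

    print(resolve("www.example.com", "203.0.113.9"))   # 192.0.2.10 with these numbers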

In addition to this proxy-like on-demand distribution, content can also be "pushed" to surrogates proactively. Synchronization can be performed by sending invalidation messages (or updated objects) to the surrogates.

CDN principles are also being used in private intranets for building Enterprise CDNs (ECDNs).

Users' Content Networks
User-operated content networks are better known as Peer-to-Peer (P2P) networks. In these networks, the costly replication infrastructure of other content networks is replaced by the users themselves, who make some of their storage and processing capacity available to the P2P network. Thus no large investment is needed, and no single party controls the content network.

One advantage P2P networks have over other content networks is that they are usually built as overlay networks and do not strive for transparent integration with the current Web. Thus they are free to build new distribution (some of them allow downloading files from multiple servers in parallel) and redirection mechanisms from scratch, and even to use their own namespace instead of being stuck with HTTP and URLs.

P2P networks basically handle the distribution part of replication in a straightforward way: the more popular an object is, the more users will have a copy of it, thus the more copies of the object will be available on the network. More complex mechanisms can be involved, but this is the basic idea.

The redirection part of replication is more problematic in most current P2P networks. It can be handled by a central directory, as in Napster: every user first connects to a central server, updates the directory with the locally available objects, and then looks up in the directory the locations of the objects the user wants to access. Of course, such a central directory poses major scalability and robustness problems.
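The directory itself reduces to a simple mapping, as in the sketch below; what makes the approach fragile is not the data structure but the fact that it lives on a single server. Peer addresses and object names are invented for the example.

    from collections import defaultdict

    directory = defaultdict(set)           # object name -> addresses of peers holding it

    def register(peer, object_names):
        for name in object_names:
            directory[name].add(peer)

    def lookup(name):
        return sorted(directory[name])     # the requesting peer then downloads directly

    register("203.0.113.5:6699", ["song.mp3"])
    print(lookup("song.mp3"))              # ['203.0.113.5:6699']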

Gnutella and Freenet, for example, use a distributed searching strategy instead of a centralized directory. A node queries neighbors that themselves query neighbors, and so on until either a node with the requested object is found or a limit on the resources consumed by the search has been hit. Although there is no single point of failure, such a scheme is no more scalable than the central directory. It seems easy to perform denial-of-service attacks by flooding the network with requests. Additionally, a search may fail to find an object even though some node holds it.
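The following sketch of such a flooding search over a toy topology shows both properties: every query spreads to many nodes, and a hop limit that is too small makes the search miss objects that do exist in the network.

    NEIGHBORS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
    STORE = {"D": {"file.iso"}}            # only node D holds the object

    def search(node, name, ttl, seen=None):
        """Forward the query to all neighbors until a holder is found or the TTL runs out."""
        seen = seen if seen is not None else set()
        if node in seen or ttl < 0:
            return None
        seen.add(node)
        if name in STORE.get(node, set()):
            return node
        for neighbor in NEIGHBORS[node]:
            hit = search(neighbor, name, ttl - 1, seen)
            if hit:
                return hit
        return None

    print(search("A", "file.iso", ttl=3))  # 'D'
    print(search("A", "file.iso", ttl=1))  # None: the object exists but is out of reach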

These examples are primitive and have serious flaws, but much research work is being performed on this topic; refer to [13] for a summary.

Although they are currently used mainly for very specific file-sharing applications, P2P networks do provide new and valuable concepts and techniques. For example, Edge Delivery Network is a commercially available software-based ECDN inspired by Freenet. Various projects use a scatter/gather distribution scheme, useful for very large files: users download several file chunks in parallel from other users that are currently downloading, thus avoiding the use of the server's resources for long periods of time.

Some projects attempt to integrate P2P principles in the current Web architecture and protocols. Examples are [3] and [1].

Conclusion
Content networks have been designed and deployed as ad hoc solutions to specific problems arising in the current architecture of the network. Caching proxies lack proper means of ensuring consistency, CDNs trick the DNS into turning URLs into location-independent identifiers, and P2P networks are mostly limited to file-sharing applications.

Content networks implement mechanisms to ensure distribution of content to various locations, and redirection of users to a close copy. They often have to break the end-to-end principle in order to do so, mainly because current protocols assume each object is available in only one statically defined location.

Probably the first step in building efficient distribution and redirection mechanisms for an effective replication architecture is setting up a proper replication-aware namespace. Applications would pass an object name to a name resolution service and be given back one or more locations for this object. The need for such a location-independent namespace was anticipated a long time ago: URLs are actually defined as one kind of Uniform Resource Identifier (URI), another kind being Uniform Resource Names (URNs), which are intended to provide such namespaces. A URN IETF working group [2] has been active for a long time, and recently published a set of RFCs (3401 to 3406).

Work on the topic of content networking has also been performed by the now closed Web Replication and Caching (WREC) IETF working group, which issued a taxonomy in [9]. An interesting survey of current work on advanced content networks is [13].

References
[2] IETF URN Working Group: http://www.ietf.org/html.charters/urn-charter.html

[3] Open Content Network: http://www.open-content.net

[4] Rsync: http://rsync.samba.org

[5] Squid Internet Object Cache: http://www.squid-cache.org

[6] M. Cieslak and D. Forster, "Web Cache Coordination Protocol V1.0," Expired Internet Draft, draft-forster-wrec-wccp-v1-00.txt, Cisco Systems, July 2000.

[7] M. Cieslak, D. Forster, G. Tiwana, and R. Wilson, "Web Cache Coordination Protocol V2.0," Expired Internet Draft, draft-wilson-wrec-wccp-v2-00.txt, Cisco Systems, July 2000.

[8] D. D. Clark, "The Design Philosophy of the DARPA Internet Protocols," Computer Communication Review, Volume 18, No. 4, August 1988. Originally published in Proceedings of SIGCOMM '88.

[9] I. Cooper, I. Melve, and G. Tomlinson, "Internet Web Replication and Caching Taxonomy," RFC 3040, January 2001.

[10] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1," RFC 2616, June 1999.

[11] G. Huston, "Web Caching," The Internet Protocol Journal, Volume 2, No. 3, September 1999.

[14] V. Valloppillil and K. W. Ross, "Cache Array Routing Protocol v1.0," Expired Internet Draft, draft-vinod-carp-v1-03.txt, February 1998.

[15] D. Wessels and K. Claffy, "Application of Internet Cache Protocol (ICP), Version 2," RFC 2187, September 1997.

[16] D. Wessels and K. Claffy, "Internet Cache Protocol (ICP), Version 2," RFC 2186, September 1997.

CHRISTOPHE DELEUZE holds a Ph.D. degree in computer science from Université Pierre et Marie Curie, Paris. He worked on quality-of-service architectures in packet networks, and then spent three years in a start-up company designing CDN systems. He has also been a teacher. E-mail: christophe.deleuze@free.fr