The Internet Protocol Journal - Volume 7, Number 3

Letters to the Editor

Content Networks

Dear Editor,

Christophe Deleuze's article on Content Networks (The Internet Protocol Journal, Volume 7, Number 2, June 2004) made me realize that there are very different ways to look at this issue. I would like to use the term Content Addressable Network for a network that is used to retrieve information not by specifying its location but by specifying the identity of the content itself. The term points to similar concepts in electronics (Content Addressable Memory) and storage (Content Addressable Storage). One could argue that a Content Addressable Network is in fact a distributed Content Addressable Storage.

In a very real sense the Internet already is content addressable. Several of my non-IT friends use the "Search" field in the Google toolbar even for regular URLs, forgoing the Address field in their browsers. In doing so, they simply ignore the distinction between content and location. It usually gets them where they want to go.

Let's define content as a static binary object, for example, a document, picture, song, or movie. How can we identify content if not by location? We can create a hash of the object as a handle or placeholder. (A hash is the result of a calculation that takes the whole object as input. A good hashing algorithm ensures that if you change a bit in the object, at least one bit in its hash changes too.) If we know the placeholder, we can retrieve a copy of the original object, even if we don't know the location of any of the copies out there on the net. I could mail you the hash of a paper, song, or movie and you would be able to retrieve a copy, although not necessarily from the same place as where I got it. (You might have to pay to get it though!)
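As a minimal sketch of such a handle, the snippet below hashes a static object with Python's standard hashlib module. The choice of SHA-256, the function name content_handle, and the sample byte strings are illustrative assumptions, not part of the proposal itself.

```python
import hashlib


def content_handle(data: bytes) -> str:
    """Return a hex digest usable as a location-independent handle.

    SHA-256 is an assumption made for this sketch; any strong hash
    over the whole object would serve the same purpose.
    """
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample objects that differ in a single character.
original = b"An example document body."
altered = b"An example document body!"

# Identical bytes give the same handle, wherever the copy was found.
print(content_handle(original) == content_handle(bytes(original)))

# A changed object gives a different handle.
print(content_handle(original) == content_handle(altered))
```

The handle depends only on the bytes of the object, so two copies fetched from different servers yield the same string.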

Suppose that the Google bot, while traversing the Internet to build its index, calculates the hash for each object it encounters. It can then build an index of all hash codes, relating them to the URLs where they were found. (This requires no change in Google: the hash is just one more word it found in the document.) We can then google a hash code to find all occurrences of the object. (You can simulate this today by selecting a line of text from a document and launching a search for that sequence of words. Google will often find multiple copies. Just one line of text is an extremely poor hash, so you may get a few false hits, but in my experience not many.)
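The index described above could look roughly like the following sketch. The crawled dictionary stands in for whatever the bot fetched; the URLs, the function names build_index and lookup, and the use of SHA-256 are all hypothetical.

```python
import hashlib
from collections import defaultdict


def build_index(crawled):
    """Map each content hash to the set of URLs where that content was seen.

    `crawled` is a hypothetical dict of URL -> raw bytes fetched by the bot.
    """
    index = defaultdict(set)
    for url, body in crawled.items():
        index[hashlib.sha256(body).hexdigest()].add(url)
    return index


def lookup(index, digest):
    """Return every known location of the object with this hash, sorted."""
    return sorted(index.get(digest, ()))


# Hypothetical crawl result: two copies of one object plus one other object.
crawled = {
    "http://example.org/paper.pdf": b"paper",
    "http://mirror.example.net/p.pdf": b"paper",
    "http://example.com/other": b"other",
}

index = build_index(crawled)
print(lookup(index, hashlib.sha256(b"paper").hexdigest()))
```

Searching for the hash returns both locations of the same object, which is the "all occurrences" query in miniature.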

Simply by adding these hashes, we have turned the Internet into a Content Addressable Network. If our purpose is to make ourselves independent of any single copy on any particular server, this is all we need. For other applications, the objective is to optimize the network paths to the servers that hold a copy of our object (for example, a movie). We need a metric that tells us which of the listed locations is "closest" to our point of entry. This is complicated by the fact that the Internet is a weird space. The shortest route between Amsterdam and Brussels might well go via London or Paris.

Fortunately, there is a database that keeps track of all the available routes and their cost: the Border Gateway Protocol (BGP) routing table. BGP divides the Internet into chunks called Autonomous Systems (ASes) and tracks the cost of the routes to each AS. If the Google bot recorded the AS along with each URL, our client system could query our local BGP router (or a proxy holding a copy of its database) to find the AS and thus the copy that is closest in terms of network costs. Note that these costs also reflect policy rules such as peering arrangements between ISPs.
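A rough sketch of the client-side choice follows. It assumes the index returns (URL, AS number) pairs and that the local router or proxy can supply a per-AS cost, for instance an AS-path length; the function closest_copy, the as_cost table, the URLs, and the AS numbers (taken from the range reserved for documentation) are all illustrative.

```python
def closest_copy(copies, as_cost):
    """Pick the URL whose origin AS has the lowest cost from our vantage point.

    copies  -- hypothetical list of (url, as_number) pairs from the index
    as_cost -- hypothetical dict of as_number -> cost, as a local BGP
               router or proxy might report it
    Copies in an AS with no known route are skipped.
    """
    reachable = [(as_cost[asn], url) for url, asn in copies if asn in as_cost]
    if not reachable:
        return None
    return min(reachable)[1]


copies = [
    ("http://a.example/movie", 64496),
    ("http://b.example/movie", 64497),
    ("http://c.example/movie", 64498),  # no route known for this AS
]
as_cost = {64496: 4, 64497: 2}

print(closest_copy(copies, as_cost))
```

Because the cost table comes from the local router, two clients in different networks may legitimately pick different copies of the same object.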

If our objective is to dynamically optimize the load on the servers, we cannot avoid querying (a local subset of) these servers for a bid. Distributing the load over servers in different time zones may sometimes be more important than keeping the transports local. Our client should select a server that is not too busy but no further away than necessary.
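One way to read "not too busy but no further away than necessary" is sketched below. The Bid record, the load scale from 0.0 to 1.0, the max_load threshold, and the server names are assumptions made for illustration; a real bidding protocol would have to define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """Hypothetical reply from a server asked to bid for a request."""
    server: str
    load: float        # 0.0 (idle) to 1.0 (saturated)
    network_cost: int  # lower means closer


def select_server(bids, max_load=0.8):
    """Choose the nearest server whose load is acceptable.

    If every server is above max_load, fall back to the least loaded one.
    """
    bids = list(bids)
    if not bids:
        return None
    acceptable = [b for b in bids if b.load <= max_load]
    if acceptable:
        return min(acceptable, key=lambda b: (b.network_cost, b.load))
    return min(bids, key=lambda b: b.load)


bids = [
    Bid("amsterdam.example", load=0.95, network_cost=1),
    Bid("brussels.example", load=0.40, network_cost=2),
    Bid("tokyo.example", load=0.10, network_cost=9),
]

print(select_server(bids).server)
```

Here the nearest server is passed over because it is too busy, and the next nearest is preferred to the idle but distant one.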

The Content Networks discussed by Christophe Deleuze were created as commercial offerings that require no cooperation from the clients, in every sense an operator's approach. They are restricted to the case where all copies of the object are published by a single entity. The way ahead is to create protocols for requesting network cost for a list of sites, and service costs from a list of servers, independent of the nature of the object and the servers that hold copies of it.

It may seem more efficient to let the publisher add the hash code to the objects. HTML files would be labeled with a <MD5= tag, obviating the need for bots and users (for "content bookmarks") to do the calculation. This would allow publishers to change content without changing the hash, to correct typos or remove scenes deemed unsuitable for local viewers. But it would no doubt result in fake objects, purporting to be copies of popular objects but peddling dubious commercial proposals. Creating fake objects is more difficult if the hash code is calculated by an independent and unrecognizable bot, although I'm sure the problem is not completely solved with that.
—Ernst Lopes Cardozo
Aranea Consult BV, The Netherlands

e.lopes.cardozo@aranea.nl



IPJ Article Identification

Hi,

I noticed that the IPJ page footer says only "The Internet Protocol Journal" and gives neither the Volume/Issue number nor the issue date. That makes it a bit hard to correctly reference a given article when you only have a copy of that article and not the whole issue. I propose that you add something like (from the August issue of CACM):

Communications of the ACM August 2004/Vol. 47, No. 8

(I only checked the archived PDF files but I suppose the hardcopy has the same problem.)
—Örjan Petersson
orjan.petersson@logcode.com

We could certainly add the Volume/Issue identifier to the footer, but since this would have to be done retroactively for all 26 issues to date, it is probably better to use our soon-to-be-deployed ASCII index. This will allow you to find any article with a simple search. A short sample of the index is shown below.

The Internet Protocol Journal Volume 1, 1998

Article                                         Author(s)               Page
----------------------------------------------------------------------------

* Volume 1, No. 1, June 1998:

What Is a VPN? - Part I                         Ferguson/Huston            2
SSL: Foundation for Web Security                William Stallings         20
Book Review: Groupware                          Dave Crocker              31
Book Review: High-Speed Networks                Neophytos Iacovou         33

* Volume 1, No. 2, September 1998:

What Is a VPN? - Part II                        Ferguson/Huston            2
Reliable Multicast Protocols and Applications   C. Kenneth Miller         19
Layer 2 and Layer 3 Switch Evolution            Thayumanavan Sridhar      38
Book Review: Gigabit Ethernet                   Ed Tittel                 44

* Volume 1, No. 3, December 1998:

Security Comes to SNMP: SNMPv3                  William Stallings          2
CATV Internet Technology                        Mark Laubach              13
Digital TV                                      George Abe                27
I Remember IANA                                 Vint Cerf                 38
Book Review: Internet Messaging                 Dave Crocker              40
Book Review: Web Security                       Richard Perlman           42
Book Review: Internet Cryptography              Frederick M. Avolio       44

—Ole J. Jacobsen
Editor and Publisher
ole@cisco.com