Web Caching by Duane Wessels, ISBN 1-56592-536-X, O'Reilly, June 2001.
It's always a pleasure to read a technical book written by someone who has not just studied the topic, but has been so involved that he has spent years living and breathing the subject. Such books do more than just describe the technology, because they are invariably able to add a dimension of deeper insight and interest, and in so doing, bring the topic to life for the reader. Duane Wessels' experiences in the Harvest project, and then as self-confessed "Chief Procrastinator" in the Squid Web cache project, certainly place him in the category of an author who has lived the topic. The outcome is a well-researched and very readable book on the topic of Web caching.
Web caching has been an integral part of the architecture of the World Wide Web since its inception, and is now a broad topic encompassing a range of approaches, a range of technologies, and a range of deployment issues for the end consumer, the content publisher, and the service provider intermediaries. The book starts with a clear introduction that outlines the elements of the architecture of the Web, and describes the terminology used within the book. This section also provides a basic introduction to the operation of the Hypertext Transfer Protocol (HTTP) and describes the various forms of Web caches that are in use today.
The way in which a cache interprets the directives at the header of a delivered Web object is described in some detail. I learned something unexpected here, in that a Web object that includes a directive of the form "Cache-control: no-cache" is defined in RFC 2616 as allowing a cache to store a copy of the object and use it, subject to revalidation, for subsequent requests. It seems that if you really want the object not to be stored in a cache, then "no-store" is what you are after, because "no-cache" allows the object to be cached! As well as describing the definition of the cache control directives, this section provides a clear explanation of how document ageing is defined, and when a cache server determines that a cached object should be checked against the original to ensure that the cached copy remains a faithful reproduction.
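The distinction between "no-cache" and "no-store", and the comparison of an object's age with its freshness lifetime, can be sketched in a few lines of Python. This is a simplified illustration rather than a complete RFC 2616 implementation: the function names and the returned strings are invented for this example, only the no-store, no-cache, and max-age directives are handled, and the field-name form of no-cache is ignored.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. no-store) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def cache_decision(cache_control, age_seconds):
    """Return what a hypothetical cache may do with a response of the given age."""
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        # The only directive here that forbids keeping a copy at all.
        return "do not store"
    if "no-cache" in d:
        # The copy may be stored, but must be checked with the origin first.
        return "store, revalidate before every reuse"
    max_age = d.get("max-age")
    if max_age is not None and age_seconds < int(max_age):
        # Still fresh: the age is below the freshness lifetime.
        return "store, serve without revalidation"
    # Stale, or no explicit lifetime given. Real caches may also consult
    # Expires or apply heuristics; this sketch simply revalidates.
    return "store, revalidate before reuse"
```

For example, cache_decision("max-age=60", 90) reports that the stored copy is stale and should be revalidated, while cache_decision("no-cache", 0) shows that the object may still be stored.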
Caching has its champions and its detractors, and the book attempts to present both perspectives in a balanced fashion. On the positive side, caching is seen as an effective way to improve the performance of the delivery of Web-based services, and to relieve network and server load. The claim is made here that a large busy cache can achieve a hit ratio of some 70 percent. Don't get too enthusiastic, however, because a more common achieved ratio is somewhere between 30 and 40 percent.
On the negative side are the ever-present issue of the accuracy of the cache, the inability of a content provider to track content access, and the issue of the integrity of the cache in the face of service attacks that are directed at the cached copy of the content.
The Politics of Caching
This section of the book intrigued me, because it is certainly rare to see a technical book address the various social implications of the technology. The study includes the issues of privacy, request blocking, copyright control, content integrity, cache busting, and the modifications to the trust model in the presence of cache intermediaries. The book exposes the tension between the content provider, the user, and the service provider. The content provider would generally like to exercise some control over tracking who is accessing the content, how each client uses the content, and how each client navigates through the Web site. The user is interested in efficiency of content delivery, and also has to place a high level of trust in the integrity of the content-delivery system. The service provider is also interested in rapid delivery of content, as well as managing network load. Third parties, such as regulatory or law-enforcement bodies, may be interested in ensuring that the content originator is unambiguously traceable, and that various regulations with respect to content are enforced by content originators and service providers.
From this overview, the book moves on to more practical topics, and first describes how to configure browsers to take advantage of caches. It also covers how various proxy auto-configurators work. One topic that has generated some attention is that of interception caching, where a user's Web-browser commands are intercepted by a provider cache without the direct knowledge of the user or the user's browser. The techniques of implementing such interception caches are described, including a description of the operation of the Web Cache Coordination Protocol (WCCP), policy routing, and firewall interception. Interception caching, or transparent caching, has generated its fair share of controversy in the past, and the book does take the time to clearly describe the issues associated with this caching approach.
The other practical topic is advice to server operators and content providers on how to make servers and content work in a predictable fashion with caches, including a description of which HTTP reply headers affect cacheability. This section explains how to build a cache-friendly Web site, and motivates this with reasons why a content provider would want to ensure that content is readily cacheable. It also includes some practical advice on how a content provider can still receive hit counts and site navigation information while allowing the content of a site to be cached.
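As a rough illustration of reply headers that affect cacheability, the sketch below builds a set of HTTP/1.1 headers that give a cache explicit freshness information. The review does not list the headers the book covers; the ones shown here (Date, Last-Modified, Expires, and Cache-Control with max-age) are standard HTTP/1.1 headers chosen for this example, and the function name and parameters are hypothetical.

```python
from email.utils import formatdate


def cacheable_headers(last_modified_epoch, max_age_seconds, now_epoch):
    """Build reply headers that tell a cache how long the object stays fresh."""
    return {
        # When the reply was generated, in HTTP date format (always GMT).
        "Date": formatdate(now_epoch, usegmt=True),
        # Lets a cache revalidate later with If-Modified-Since.
        "Last-Modified": formatdate(last_modified_epoch, usegmt=True),
        # Absolute expiry time, understood by HTTP/1.0 caches.
        "Expires": formatdate(now_epoch + max_age_seconds, usegmt=True),
        # Relative freshness lifetime, understood by HTTP/1.1 caches.
        "Cache-Control": f"max-age={max_age_seconds}",
    }
```

A server that omits all of these leaves each cache to guess how long the object may be reused, which is one source of the unpredictable behavior that such advice aims to avoid.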
Fun with Caches—Cache Hierarchies and Clusters
Although caches can operate in a standalone configuration, it is possible to interconnect caches so that a cache will refer to another cache in the event of a cache miss, rather than directly refer to the origin server. I gather that the author is not overly keen on such an approach, given that the arguments against such configurations consume five times as much space as the arguments in favor! The alternative to a strict hierarchy is a set of cooperating peer caches, together with an intercache protocol to allow a cache to efficiently query its peers for an object. The book describes the Internet Cache Protocol (ICP); the Cache Array Routing Protocol (CARP), which is pointed out to be an algorithm, not a protocol, despite its name; the Hypertext Caching Protocol (HTCP); and Cache Digests. The description of the scenarios where each approach would be preferred is a helpful addition to this section. Cache clusters are also described; if I have a criticism of the book, it is that this section is too terse. I was looking for more details of cache-balancing and content-distribution techniques.
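The remark that CARP is an algorithm rather than a protocol is easier to appreciate with a small sketch: each client computes, with no messages exchanged, which member of a cache array should hold a given URL. The code below illustrates the general idea of highest-score hashing only. It is not the CARP hash function (MD5 is substituted here for convenience), it ignores load factors, and the function and host names are hypothetical.

```python
import hashlib


def pick_cache(url, members):
    """Choose the array member with the highest combined hash score for url."""
    def score(member):
        # Combine the member name and the URL into a single score.
        digest = hashlib.md5((member + url).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return max(members, key=score)


members = ["cache-a.example", "cache-b.example", "cache-c.example"]
chosen = pick_cache("http://www.example.com/index.html", members)
```

Because the choice depends only on the URL and the member names, every client that knows the same member list picks the same cache, and removing one member only moves the URLs that were assigned to it.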
The final section of the book looks at the tasks associated with designing, benchmarking, and operating cache servers. How much disk space is enough for a cache? How much memory? Where should the caches be placed in the network? What aspects of the cache operation should you monitor? And if you are considering purchasing caches, what aspects of the cache should you carefully examine?
This is not a book about how to build a cache, although if you are considering doing that it's a good place to start your research. Nor is it a book about every detail on how to operate a cache. But if you are operating a cache, it will be useful. Although it's not a book about how to operate a Web server, if you are operating a Web server, then caches will attempt to store your content, and this book will help you configure your server to interoperate predictably with caches.
The Web is a large part of today's Internet, and Web caches can make the Web faster, more efficient, and more resilient. If you want to understand how caches work and understand how you can use caches to improve the user's experience rather than making things worse, then this book is essential reading.
IPSec: The New Security Standard for the Internet, Intranets, and Virtual Private Networks, by Naganand Doraswamy and Dan Harkins, ISBN 0-13-011898-2, 1999, Prentice Hall PTR Web Infrastructure series. http://www.phptr.com
We all know that Internet security is a major concern. Evolving technologies such as Virtual Private Networks (VPNs) are making it easier to deploy secure networks at low cost. VPN technology is based upon encryption techniques that make use of different algorithms. Most of these algorithms are specified in the form of Requests for Comments (RFCs). Though RFCs provide the minute details, they are not exactly lively reading. This is where the IP Security (IPSec) book comes in handy. The authors have done their best to explain IPSec technology in layman's language, although one encounters a lot of technical jargon in this book.
The book is divided into three parts. Part I gives a history of cryptography, covers cryptographic techniques and tools, and provides overviews of TCP/IP and IPSec. Authentication methods such as Public Key Infrastructure (PKI), RSA, and DSA are discussed. Key exchange methods such as Diffie-Hellman and RSA Key Exchange are discussed, along with their advantages and disadvantages. IPSec architecture is explored in the IP Security Overview section, which describes the security services provided by IPSec, how packets are constructed and processed, and the interaction of IPSec processing with policy. The IPSec protocols, Authentication Header (AH) and Encapsulating Security Payload (ESP), are the basic ingredients that the IPSec stack uses to provide security. Both AH and ESP can be operated in either transport mode or tunnel mode.
Part II offers a detailed analysis of IPSec, the different modes, IPSec implementation, ESP, AH, and the Internet Key Exchange (IKE). The authors do a good job of describing the IPSec road map, which defines how various components within IPSec interact with each other. The packet formats of the different IPSec protocols are discussed in detail in Chapter 4. ESP, AH, and IKE are discussed in depth in Chapters 5 through 7.
Part III deals with most of the deployment issues concerned with IPSec; policy definition, policy management, implementation architecture, and end-to-end security are discussed in this section. Chapter 11 discusses the future of IPSec and what it means to the world of security. Though IPSec may be thought of as a totally secure method of communication, it has its conflicts when it comes to Network Address Translation (NAT), multicasting, and key management in a multicast environment.
Although the authors have done a good job delivering the IPSec concept, understanding this text requires more than basic computer and communication concepts. One should understand hacking and different types of Internet attacks. OSI layer details and a packet-level understanding of every layer within the OSI model are a must.
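For readers unfamiliar with the Diffie-Hellman exchange mentioned among the key exchange methods in Part I, the toy sketch below shows the arithmetic involved. It is purely illustrative and is not taken from the book: the function names are hypothetical, and the tiny prime used in the example offers no security. Real deployments use very large primes or elliptic-curve groups, and authenticate the exchange.

```python
def dh_public(g, p, private):
    """Compute the public value g^private mod p to send to the peer."""
    return pow(g, private, p)


def dh_shared(peer_public, p, private):
    """Combine the peer's public value with our private value."""
    return pow(peer_public, private, p)


# Toy parameters, far too small for real use: prime p = 23, generator g = 5.
p, g = 23, 5
alice_private, bob_private = 6, 15

alice_public = dh_public(g, p, alice_private)   # sent to Bob in the clear
bob_public = dh_public(g, p, bob_private)       # sent to Alice in the clear

# Each side derives the same secret without ever transmitting it.
alice_secret = dh_shared(bob_public, p, alice_private)
bob_secret = dh_shared(alice_public, p, bob_private)
```

Both sides end up with the same value even though only the public values crossed the network, which is the property that key exchange protocols such as IKE build upon.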
—Manohar Chandrashekar, WorldCom Inc