The Internet Protocol Journal - Volume 10, No. 1

Writing Internet Drafts and RFCs Using XML

by Marshall T. Rose, Dover Beach Consulting, Inc. and Carl Malamud, Public Resource, Inc.

What is the work product of the Internet Engineering Task Force (IETF)? Some cynical observers might suggest "many fine lunches or dinners," but we argue that those niceties are merely the means to an end. The goal of the IETF is to provide open standards for the Internet community, and those standards are memorialized as written documents called Requests for Comments (RFCs).

In general, two organizations control the publication of documents as RFCs:

  • The Internet Engineering Steering Group (IESG) determines which documents are suitable for publication as RFCs, typically by chartering working groups, reviewing their progress (by reading their work-in-progress Internet Drafts), and ultimately approving their documents for publication.
  • The RFC Editor strives for "quality, clarity, and consistency of style and format," and has developed a particular editorial style. The latest RFC that documents this style, RFC 2223 [1], is about a decade old; a somewhat more current version can be found in a text file maintained by the RFC Editor [4].

For a more detailed discussion of the interaction between these two organizations, consult RFC 3932 [2].

As an organization, the IETF excels at "eating its own dog food," including its work product: just as a protocol specification describes interactions on the wire but does not dictate the programming language used for implementation, so too, the IETF has not really cared which document preparation tools are used. The IESG worries about technical quality, and the RFC Editor worries about stylistic consistency (and, to be fair, technical quality as well). This policy works because of the careful choices made by the early Internet community, and in particular the RFC Editor, with respect to the "final form" footprint of the documents. (A discussion of these design decisions is far beyond the scope of this short article—for now, we note that it is hard to argue with success.)

An unfortunate side effect of this focus on stylistic consistency is that, for many years, the RFC Editor has had to recode documents for consistent formatting. Internally, the RFC Editor used nroff [5] for this purpose, and sophisticated authors wishing to minimize RFC Editor "downtime" tended to use the same nroff boilerplate. The nroff text-formatting program has many strengths, but it can also be fairly viewed as a textual "assembly language," with the result that authors spent a lot of time dealing with low-level formatting concerns.

In some limited cases, the high degree of formatting-specific expertise is warranted, but for the vast majority of documents, the high entry cost is not.

From Assembly Language to Markup

In early 1999 we were working at a startup company, and we needed a way to organize, search, and retrieve information from documents. We decided to use a markup language for this purpose, and we chose the RFC series as one of the testing grounds for the technology because we were already familiar with it. Although today everyone knows what the Extensible Markup Language (XML) is, at the time there were only two widely known markup languages for authoring: SGML and HTML.

The "SG" in SGML is an abbreviation for Standard Generalized and not Simple Generic. SGML is used for the formatting of a great many books; further, it is used in large projects with long lifetimes. Although truly excellent from an "enumerate every possibility" standpoint, it has a very high cost of entry, making it difficult to use for anything other than specialized applications.

In contrast, the Hypertext Markup Language (HTML) embodies elegance of design, but in the absence of Cascading Style Sheets (CSS) it is a presentation language, not unlike nroff in many respects. In other words, we needed something with the structural richness of SGML and the elegant simplicity of HTML. The newly invented XML seemed to meet these requirements.

This process led us to develop a language based on XML, which captured high-level RFC constructs (for example, authorship information) and largely ignored presentational concerns. The result is called the 2629 format [3] (also known as the "xml2rfc format," named after the initial processor for this language).

The Advantages of Markup

To understand the advantages of this approach, let's look at one example: references. Like most archival series, the RFC series has a very rigorous, yet unstructured, syntax for citations. Although this consistency is good for readers of RFCs, achieving it with tools such as nroff was often the hardest part of creating a new document. With the 2629 format, the <reference> element contains a small number of subordinate elements that capture all the semantics of the reference, and the XML processor takes this information and produces a properly formatted citation.
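To make this concrete, here is a minimal Python sketch of what a processor might do with such an element. The <reference> below uses the RFC 2629 element names (front, title, author, date, seriesInfo) and mirrors reference [1] of this article, with the organization elements left empty. The format_citation function is hypothetical: it approximates the RFC Editor's citation style and is not xml2rfc's actual code.

```python
import xml.etree.ElementTree as ET

# A 2629-format <reference>; values mirror reference [1] of this article.
REFERENCE_XML = """
<reference anchor="RFC2223">
  <front>
    <title>Instructions to RFC Authors</title>
    <author initials="J." surname="Postel"><organization/></author>
    <author initials="J." surname="Reynolds"><organization/></author>
    <date month="October" year="1997"/>
  </front>
  <seriesInfo name="RFC" value="2223"/>
</reference>
"""


def format_citation(ref):
    """Render a <reference> element as an RFC-style citation (hypothetical helper)."""
    front = ref.find("front")
    names = []
    for i, author in enumerate(front.findall("author")):
        initials, surname = author.get("initials"), author.get("surname")
        # First author is written "Surname, I."; later authors "I. Surname".
        names.append(f"{surname}, {initials}" if i == 0 else f"{initials} {surname}")
    if len(names) == 1:
        who = names[0]
    elif len(names) == 2:
        who = f"{names[0]} and {names[1]}"
    else:
        who = ", ".join(names[:-1]) + ", and " + names[-1]

    parts = [who, f'"{front.findtext("title").strip()}"']
    # Each <seriesInfo> becomes e.g. "RFC 2223".
    parts += [f"{s.get('name')} {s.get('value')}" for s in ref.findall("seriesInfo")]
    date = front.find("date")
    parts.append(f"{date.get('month')} {date.get('year')}")
    return ", ".join(parts) + "."


print(format_citation(ET.fromstring(REFERENCE_XML.strip())))
```

Because the author supplies only the fields, the punctuation and ordering rules live in one place: changing the house style means changing the processor, not every document.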

Further, because this information is structured, it is possible to develop automated bibliographic databases for a wide range of data sources. In fact, using the XML "include" mechanism, a document author usually includes just a pointer to the reference, and lets the processor do all the complicated work.
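The include idea can be sketched as follows, modeled on the <?rfc include="..."?> processing-instruction form that xml2rfc accepts. The author writes only a pointer, and the processor substitutes the full <reference> from a bibliography. Here the bibliography is a hypothetical in-memory dictionary standing in for an online database, the entry name reference.RFC.2629 is illustrative, and a regular expression expands the pointer before parsing.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an online bibliographic database, keyed by the
# name used in the include pointer. Fields mirror reference [3].
BIBLIOGRAPHY = {
    "reference.RFC.2629": (
        '<reference anchor="RFC2629"><front>'
        "<title>Writing I-Ds and RFCs using XML</title>"
        '<author initials="M." surname="Rose"><organization/></author>'
        '<date month="June" year="1999"/></front>'
        '<seriesInfo name="RFC" value="2629"/></reference>'
    ),
}

# Matches pointers of the form <?rfc include="NAME"?>.
INCLUDE_PI = re.compile(r'<\?rfc\s+include="([^"]+)"\s*\?>')


def resolve_includes(source):
    """Replace each include pointer with the full <reference> it names."""
    def expand(match):
        name = match.group(1)
        if name not in BIBLIOGRAPHY:
            raise KeyError(f"no bibliography entry for {name!r}")
        return BIBLIOGRAPHY[name]
    return INCLUDE_PI.sub(expand, source)


# What the author writes: only a pointer inside the references section.
AUTHOR_SOURCE = """<references>
  <?rfc include="reference.RFC.2629"?>
</references>"""

refs = ET.fromstring(resolve_includes(AUTHOR_SOURCE))
for ref in refs.findall("reference"):
    print(ref.get("anchor"), ref.findtext("front/title"))
```

An unknown pointer raises KeyError rather than silently producing an empty reference, which is the behavior an author would likely want from an automated bibliography.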

A second advantage is that processors can produce different kinds of output. Some people prefer to view their documents in HTML rather than in the canonical textual format. Julian Reschke has written a library of XSLT stylesheets that convert the 2629 format to various flavors of HTML (Strict, Transitional, XHTML, and so on); in that output, for example, references are hyperlinked inline, allowing for easy traversal of citations. Still others prefer the Portable Document Format (PDF) for printing. Combining one of Julian's XSLT stylesheets with the truly excellent Prince [6] XML/CSS processor yields high-quality, printer-ready output.
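The hyperlinking of citations can be sketched in a few lines. Reschke's stylesheets are written in XSLT; the Python below is only a stand-in that shows the same transformation idea on a single <t> paragraph, turning each <xref target="..."/> into an HTML anchor. The function name and the bracketed link text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny 2629-format paragraph that cites a reference by its anchor.
SOURCE = '<t>The style is documented in <xref target="RFC2223"/>.</t>'


def t_to_html(t):
    """Render a <t> paragraph as an HTML <p>, turning <xref> into a link.

    This sketch handles only <xref> children and rejects anything else.
    """
    p = ET.Element("p")
    p.text = t.text
    for child in t:
        if child.tag != "xref":
            raise ValueError(f"unsupported element <{child.tag}>")
        target = child.get("target")
        link = ET.SubElement(p, "a", href=f"#{target}")
        link.text = f"[{target}]"
        link.tail = child.tail  # keep the text that follows the citation
    return ET.tostring(p, encoding="unicode")


print(t_to_html(ET.fromstring(SOURCE)))
```

The link target #RFC2223 assumes that the generated references section gives each entry an id matching its anchor, which is what makes inline traversal of citations possible.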

However, the primary advantage is that the "high-level" approach allows the author to focus more on content and less on format: a processor can enforce the vast majority of the esoterica associated with the RFC Editorial style, including:

  • Inserting required boilerplate (and in particular, the desired revision of the boilerplate)
  • Checking for mandatory sections such as "Security Considerations" or "Normative References"
  • Generating a specialized table of contents, etc.
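Two of these checks are easy to sketch. The Python below, assuming 2629-style <section title="..."> elements nested under <middle>, reports missing mandatory sections and builds a numbered table of contents. The REQUIRED_SECTIONS list is a hypothetical simplification of the RFC Editor's actual rules, and boilerplate insertion is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified list of mandatory top-level sections.
REQUIRED_SECTIONS = ["Security Considerations"]

DOC = """<rfc>
  <middle>
    <section title="Introduction">
      <t>Background.</t>
      <section title="Terminology"><t>Definitions.</t></section>
    </section>
    <section title="Protocol Overview"><t>Details.</t></section>
  </middle>
</rfc>"""


def missing_sections(rfc):
    """Return the required top-level section titles absent from <middle>."""
    titles = {s.get("title") for s in rfc.iterfind("middle/section")}
    return [name for name in REQUIRED_SECTIONS if name not in titles]


def table_of_contents(rfc):
    """Number nested <section> elements and indent them by depth."""
    lines = []

    def walk(parent, prefix):
        for i, sec in enumerate(parent.findall("section"), start=1):
            number = f"{prefix}{i}."
            depth = number.count(".") - 1
            lines.append(f"{'   ' * depth}{number}  {sec.get('title')}")
            walk(sec, number)

    walk(rfc.find("middle"), "")
    return lines


rfc = ET.fromstring(DOC)
print("missing:", missing_sections(rfc))
print("\n".join(table_of_contents(rfc)))
```

Run against this toy document, the checker flags the absent Security Considerations section, and the table of contents lists sections 1, 1.1, and 2, with the nested entry indented.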

To Infinity and Beyond

After we published RFC 2629, something unexpected happened: people outside the IETF started using the 2629 format for their own projects. Most credit for this side effect goes to the universality of the canonical textual format. That said, some authors are using the 2629 format to write books (converting it to SGML, which is sent to the publisher), business plans, and software documentation, and even to create a new series of non-IETF technical documents. The common thread seems to be the desire for a simple yet structured way to author documents.

Over the last few years, a large number of XML editing programs have been deployed, and many of these support the 2629 format. These editors offer two advantages: first, they provide a natural paradigm for editing nested content; second, sophisticated editors can be integrated into an automated workflow. (Having said that, the authors still use Emacs and vi as their XML editors.)

A good example of the use of XML editors is a "plug-in" for the XMLMind Editor [7]. This plug-in, written by Bill Fenner, provides a variety of services to the author, such as graphical editing of sections, templates for common constructs, and validation of references.

Over the last 10 years, the 2629 format has evolved in true IETF fashion, based on rough consensus and running code. We originally created the format for our own convenience, and we have been more than pleased to see it used first by an informal community of developers and writers, and more recently by the IETF Secretariat, the tools team, the administrative entity, and the RFC Editor.

Today, many people use a common high-level markup language for writing RFCs. The next step in this natural evolution will be to make the repository of XML-tagged RFCs available to those involved in document distribution, so that RFC repositories can take advantage of the metadata to create search engines, alternative formats, and any other value-added constructs that would be of use to the community. (At present the RFC Editor prefers input in the 2629 format, but ultimately runs a processor that generates nroff for final "tweaking." In the near future, we hope that the xml2rfc textual output can be tuned to avoid this final step.)

To find out more, go to the xml2rfc Website [8] or visit the official directory of IETF authoring tools [9].

References

[1] Postel, J. and J. Reynolds, "Instructions to RFC Authors," RFC 2223, October 1997.

[2] Alvestrand, H., "The IESG and RFC Editor Documents: Procedures," BCP 92, RFC 3932, October 2004.

[3] Rose, M., "Writing I-Ds and RFCs using XML," RFC 2629, June 1999.

[4] Reynolds, J. and R. Braden, "Instructions to Request for Comments (RFC) Authors," August 2004.

[5] Ossanna, J., "Nroff/Troff User's Manual," UNIX Programmer's Manual – Volume 2 (Bell Laboratories), 1979. http://en.wikipedia.org/wiki/Nroff

[6] http://princexml.com/

[7] http://www.xmlmind.com/xmleditor/

[8] http://xml.resource.org

[9] http://tools.ietf.org/inventory/author-tools.shtml

CARL MALAMUD is the co-founder of Public Resource, a nonprofit public-benefit engineering firm. Exploring the Internet was published in 1992 as a book, but today would be called a blog. "Geek of the Week" was published in 1993 as an audio file available for download with FTP, but today would be called a podcast. E-mail: carl@media.org

MARSHALL T. ROSE is Principal of Dover Beach Consulting, Inc. He has authored 9 books, 74 RFCs, and 4 patents. With respect to his work on the 2629 format, he claims "self defense." E-mail: mrose@dbc.mtview.ca.us