The Internet Protocol Journal, Volume 11, No. 2

Letters to the Editor

IDNs

The DNS protocol is 8-bit clean ("Internationalizing the Domain Name System," IPJ, Volume 11, No. 1, March 2008), even if some DNS clients and servers are not. The hardest thing about changing any Internet protocol is coordinating clients and servers during the transition.

And yet, with the DNS, no transition is needed to support UTF-8 domain names. If you want to publish a UTF-8 domain name, then run a name server that supports UTF-8. If you want to be able to access domain names in your own language, switch to DNS software that supports it. Implementations that are 8-bit clean are already available; ordinary market mechanisms will handle the rest.
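To make "8-bit clean" concrete at the wire level, here is a minimal Python sketch that packs a UTF-8 name into the length-prefixed label format of RFC 1035. The helper name `to_wire` and the sample name are hypothetical, and the sketch does not show whether any particular resolver or server will accept such octets.

```python
def to_wire(name: str) -> bytes:
    """Pack a domain name into RFC 1035 wire format using raw UTF-8 octets."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        # Each label is limited to 63 octets; the limit counts bytes, not characters.
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label must be 1..63 octets, got {len(raw)}")
        out.append(len(raw))   # one length octet
        out += raw             # followed by the label octets, unmodified
    out.append(0)              # the root label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)


# "bücher.example": the u-umlaut occupies two octets (0xC3 0xBC) in UTF-8.
wire = to_wire("b\u00fccher.example")
print(wire)
```

Nothing in this format reserves the high bit of a label octet, which is the sense in which the protocol itself is 8-bit clean.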

Punycode is a gross hack that makes my stomach roil. You know it, I know it, any engineer will agree with you, so how did it get through the IETF?

The argument for where to stop internationalization does not spread to protocol:// because it's "gobbledygook" in English, too. Dots are a completely arbitrary character used to separate the hierarchy. There's plenty of space at the top for UTF-8 names.

The real problem with IDN is homoglyphs.

-Russ Nelson
nelson@crynwr.com

The author responds:

It would certainly make more sense in terms of design elegance and minimalism within the DNS if the label that was stored in the DNS was precisely the same label that was used in the interface between applications and the DNS client software. There is something rather clumsy about the approach that stores an encoded version of a canonical version of the label value, and relies on the application being capable of performing the stringprep and encoding functions in consistent and uniform ways. The resultant limitations on what can actually sit in DNS labels on a language-by-language basis are, in part, an outcome of the potential indeterminism of this canonicalization function.

But indeterminism is not a tolerable outcome of the DNS. The DNS is not a guessing game, and inconsistencies in the mapped transforms that are provided by the DNS trigger intolerable insecurities in the networked environment. So the nameprep profiles and the related restrictions on allowable Unicode code points are unavoidable if we want to avoid this indeterminism in the DNS.
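The canonicalization problem can be sketched in Python, whose built-in `idna` codec implements the IDNA 2003 rules (RFC 3490), including nameprep. The sample strings and the helper name `to_ace` are illustrative assumptions. Three visually equivalent spellings of one label differ as code point sequences but collapse to a single stored form once nameprep is applied.

```python
import unicodedata


def to_ace(label: str) -> str:
    """Apply nameprep and the ASCII Compatible Encoding to one label."""
    return label.encode("idna").decode("ascii")


composed = "caf\u00e9"      # e-acute as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent
uppercase = "CAF\u00c9"     # same word in upper case

# Without canonicalization these are three different strings.
print(len({composed, decomposed, uppercase}))

# NFKC normalization, one step of nameprep, merges the first two forms.
print(unicodedata.normalize("NFKC", decomposed) == composed)

# After the full nameprep step, all three map to one label.
print({to_ace(s) for s in (composed, decomposed, uppercase)})
```

If clients skipped this step, or applied it inconsistently, the same typed name could resolve to different stored labels.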

So if nameprep is required in any case, then what we are left with to consider is the decision to use the Punycode ASCII Compatible Encoding (ACE) to map Unicode labels into the Letter-Digit-Hyphen (LDH) subset of ASCII. But is the Punycode ACE really that much of a problem? Within the overall IDN framework the Punycode algorithm is not so complex that the risk of incorrect implementations is significant, the algorithm is not processor-intensive, and the outcome does not inflate the encoded labels to an impossible length. The advantage of Punycode is that the DNS servers do not require modification, and the clients that manipulate IDNs require additional nameprep functions in any case. Punycode was evidently intended to be the least-impact approach, one that spared DNS servers from a potential requirement for modification.
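As a rough illustration of the ACE mapping, the following Python sketch uses the standard library's `idna` codec (IDNA 2003, which wraps Punycode and adds the `xn--` prefix). The helper names `to_ace` and `from_ace` and the sample labels are assumptions made for this example.

```python
import string

LDH = set(string.ascii_letters + string.digits + "-")


def to_ace(label: str) -> str:
    """Unicode label -> ASCII Compatible Encoding (nameprep, then Punycode)."""
    return label.encode("idna").decode("ascii")


def from_ace(ace: str) -> str:
    """ASCII Compatible Encoding -> Unicode label."""
    return ace.encode("ascii").decode("idna")


for label in ("b\u00fccher", "m\u00fcnchen", "example"):
    ace = to_ace(label)
    # The encoded form uses only LDH characters, so existing servers store it as-is.
    assert set(ace) <= LDH
    print(label, "->", ace, f"({len(label)} chars -> {len(ace)} octets)")
```

A label that is already plain ASCII passes through unchanged, which is why only the clients that handle IDNs need the extra logic.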

To me, this solution appears to be a design tradeoff, insofar as the ACE approach circumvents the observed problem of non-8-bit clean DNS servers sitting within the deployed DNS, and does not in and of itself demand novel roles and functions on the part of the clients of the DNS in addition to what was already necessitated by the IDN nameprep function. However, at the same time it creates an annoying inconsistency in the overall framework of the design of the DNS, where certain labels in the DNS are intended to trigger a Punycode transform into an equivalent Unicode string while other labels are meant to be used without further transforms applied.

My judgment of the short-term path of least risk sits with the ACE approach as adopted for IDNs, but at the same time I agree with Russ' discomfort that the path that preserves the long-term essential broad utility and function of the DNS through consistency of design and application sits in an 8-bit clean DNS without the adornment of any form of an ACE.

And, yes, I agree with Russ that the most significant problem with IDNs is homoglyphs, because of continued reliance on an underlying approach of "appearance is everything" in terms of the integrity of the DNS as an identity framework.
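The homoglyph problem can be seen in a few lines of Python. The two labels chosen here are illustrative assumptions: one is plain ASCII, the other replaces the Latin "a" with the Cyrillic letter U+0430, which most fonts render identically.

```python
import unicodedata

latin = "example"
mixed = "ex\u0430mple"   # third character is CYRILLIC SMALL LETTER A

# The two labels look alike on screen but are different strings.
print(latin == mixed)

# The code point names reveal the difference that the glyphs hide.
print(unicodedata.name(latin[2]))
print(unicodedata.name(mixed[2]))

# They therefore map to different labels in the DNS.
print(latin.encode("idna"))
print(mixed.encode("idna"))
```

Nameprep does not help here: both labels are valid and canonical, so the DNS resolves each one consistently, and only the human reader is misled.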

—Geoff Huston
gih@apnic.net

More IDNs

The LDH restriction referred to in "Internationalizing the Domain Name System" (IPJ, Volume 11, No. 1, March 2008) was relaxed in RFC 1123 [1] to allow a host name to begin with either a letter or a digit.

—Andrew Friedman

[1] R. Braden, Editor, "Requirements for Internet Hosts—Application and Support," RFC 1123, October 1989.

The author responds:

My thanks to Andrew for pointing this out. It has been commonly recounted that this relaxation of the LDH convention was associated with the successful registration of the DNS name 3com.com and that the RFC paperwork was revised following this registration. Since then, the most visible names with leading digits that rely on this "liberal" revision of LDH have been telephone number mapping name sets, including the venerable tpc.int domain of the early 1990s and, more recently, ENUM. As for names with leading hyphens, I don't believe that we are at the point of allowing Morse code into the DNS yet, but I'm sure that someone somewhere is working on it!
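The difference between the original LDH rule and the RFC 1123 relaxation can be sketched with two regular expressions in Python. The pattern and function names are hypothetical, and the sketch checks a single label only, not a full host name.

```python
import re

# Original convention: a label starts with a letter, ends with a letter or
# digit, and has only letters, digits, and hyphens in between (1..63 octets).
STRICT_LDH = re.compile(r"[A-Za-z]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# RFC 1123 relaxation: the first character may also be a digit.
RELAXED_LDH = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_strict_ldh(label: str) -> bool:
    return STRICT_LDH.fullmatch(label) is not None


def is_relaxed_ldh(label: str) -> bool:
    return RELAXED_LDH.fullmatch(label) is not None


for label in ("example", "3com", "-morse"):
    print(label, is_strict_ldh(label), is_relaxed_ldh(label))
```

Under these patterns a leading digit passes only the relaxed rule, while a leading hyphen fails both.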

—Todd Hansen, UCSD/SDSC
tshansen@hpwren.ucsd.edu

—Geoff Huston, APNIC
(--. . --- ..-. ..-.)

We want to hear from You

Your feedback is important to us. Please send your comments and suggestions to ipj@cisco.com. And don't forget to visit our Website at http://www.cisco.com/ipj where you can read or download back issues, update and renew your subscription, and find articles using our index files. We also encourage you to participate in our online forum at http://ipjforum.org