The DNS protocol is 8-bit clean ("Internationalizing the Domain Name System," IPJ, Volume 11, No. 1, March 2008), even if some DNS clients and servers are not. The hardest thing about changing any Internet protocol is coordinating clients and servers during the transition.
And yet, with the DNS, no transition is needed to support UTF-8 domain names. If you want to publish a UTF-8 domain name, then run a name server that supports UTF-8. If you want to be able to access domain names in your own language, switch to DNS software that supports it. Implementations that are 8-bit clean are already available; ordinary market mechanisms will handle the rest.
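To make "8-bit clean" concrete, here is a minimal sketch of how a name with a raw UTF-8 label would be laid out in DNS wire format, where each label is a length octet followed by arbitrary octets. The function name `encode_name` and the sample name are illustrative assumptions, not part of any particular DNS implementation.

```python
def encode_name(name: str) -> bytes:
    """Encode a domain name in DNS wire format, keeping labels as raw UTF-8.

    Hypothetical helper for illustration: no nameprep, no Punycode.
    """
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        # The wire format limits each label to 63 octets.
        if not 0 < len(raw) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        out.append(len(raw))  # length octet
        out += raw            # label octets, any 8-bit values
    out.append(0)             # zero-length root label terminates the name
    # A complete name is limited to 255 octets on the wire.
    if len(out) > 255:
        raise ValueError("name too long")
    return bytes(out)


wire = encode_name("b\u00fccher.example")  # "bücher.example"
```

Nothing in this layout restricts label octets to ASCII; whether a given client or server handles such octets correctly is a separate question.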
Punycode is a gross hack that makes my stomach roil. You know it, I know it, any engineer will agree with you, so how did it get through the IETF?
The argument for where to stop internationalization does not extend to protocol:// because that is "gobbledygook" in English, too. Dots are a completely arbitrary character used to separate the levels of the hierarchy. There's plenty of space at the top for UTF-8 names.
The real problem with IDN is homoglyphs.
The author responds:
It would certainly make more sense in terms of design elegance and minimalism within the DNS if the label that was stored in the DNS was precisely the same label that was used in the interface between applications and the DNS client software. There is something rather clumsy about the approach that stores an encoded version of a canonical version of the label value, and relies on the application being capable of performing the stringprep and encoding functions in consistent and uniform ways. The resultant limitations on what can actually sit in DNS labels on a language-by-language basis are, in part, an outcome of the potential indeterminism of this canonicalization function.
But indeterminism is not a tolerable outcome of the DNS. The DNS is not a guessing game, and inconsistencies in the mapped transforms that are provided by the DNS trigger intolerable insecurities in the networked environment. So the nameprep profiles and the related restrictions on allowable Unicode code points are unavoidable if we want to avoid this indeterminism in the DNS.
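The kind of indeterminism at issue can be sketched with two strings that render identically but differ in their code points. The example below uses Python's `unicodedata` module; the `canonical` function is a simplified stand-in for nameprep (NFKC normalization plus lowercasing), assumed here for illustration, and it omits nameprep's prohibited-character and bidirectional checks.

```python
import unicodedata

composed = "caf\u00e9"     # "é" as one code point (U+00E9)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (U+0301)

# Both display as "café", yet they are different strings and different octets.
same_string = composed == decomposed
same_octets = composed.encode("utf-8") == decomposed.encode("utf-8")


def canonical(label: str) -> str:
    """Simplified stand-in for nameprep: NFKC normalization, then lowercase."""
    return unicodedata.normalize("NFKC", label).lower()


# After canonicalization the variants collapse to a single form.
agree = canonical(composed) == canonical(decomposed) == canonical("CAF\u00c9")
```

Without a step like this, two clients could submit different queries for what a user sees as the same name.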
So if nameprep is required in any case, then what we are left with to consider is the decision to use the Punycode ASCII Compatible Encoding (ACE) to map Unicode labels into the Letter-Digit-Hyphen (LDH) subset of ASCII. But is the Punycode ACE really that much of a problem? Within the overall IDN framework, the Punycode algorithm is not so complex that the risk of incorrect implementations is significant, the algorithm is not processor-intensive, and the outcome does not inflate the encoded labels to an impossible length. The advantage of Punycode is that the DNS servers do not require modification, and the clients that manipulate IDNs require additional nameprep functions in any case, so Punycode was evidently intended to be the least-impact approach, one that spared DNS servers from a potential requirement for modification.
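As a rough illustration of the encoding step, the sketch below uses the `punycode` and `idna` codecs from Python's standard library (the latter implements the IDNA 2003 ToASCII operation, including nameprep). The sample label is an assumption chosen for illustration.

```python
label = "b\u00fccher"  # "bücher"

# Punycode alone: the ASCII characters come first, then an encoded suffix.
puny = label.encode("punycode")

# The ACE form stored in the DNS is the "xn--" prefix plus the Punycode output.
ace = b"xn--" + puny

# The standard library's "idna" codec applies nameprep and then the same encoding.
via_idna = label.encode("idna")

# The transform is reversible, and the size overhead for this label is modest.
round_trip = puny.decode("punycode")
utf8_len, ace_len = len(label.encode("utf-8")), len(ace)

print(ace, utf8_len, ace_len)
```

For this label the ACE form is an LDH-only string, so it can pass through servers that were never modified for IDNs.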
To me, this solution appears to be a design tradeoff, insofar as the ACE approach circumvents the observed problem of non-8-bit-clean DNS servers sitting within the deployed DNS, and does not in and of itself demand novel roles and functions on the part of the clients of the DNS in addition to what was already necessitated by the IDN nameprep function. However, at the same time it creates an annoying inconsistency in the overall framework of the design of the DNS, where certain labels in the DNS are intended to trigger a Punycode transform into an equivalent Unicode string while other labels are meant to be used without further transforms applied.
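The inconsistency can be sketched as the per-label branch a client has to take when presenting a name. The helper names `display_label` and `display_name` are hypothetical, and the sketch assumes well-formed input; a real client would also validate the decoded result.

```python
ACE_PREFIX = "xn--"


def display_label(label: str) -> str:
    """Decode a label only if it carries the ACE prefix (hypothetical helper)."""
    if label.lower().startswith(ACE_PREFIX):
        # This label is a signal to apply the Punycode transform.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    # Every other label is used exactly as stored.
    return label


def display_name(name: str) -> str:
    """Apply the per-label rule across a dotted name."""
    return ".".join(display_label(part) for part in name.split("."))


shown = display_name("xn--bcher-kva.example")
```

Two labels in the same name thus follow different rules, depending solely on their first four characters.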
My judgment of the short-term path of least risk sits with the ACE approach as adopted for IDNs, but at the same time I agree with Russ' discomfort that the path that preserves the long-term essential broad utility and function of the DNS through consistency of design and application sits in an 8-bit clean DNS without the adornment of any form of an ACE.
And, yes, I agree with Russ that the most significant problem with IDNs is homoglyphs, because of continued reliance on an underlying approach of "appearance is everything" in terms of the integrity of the DNS as an identity framework.
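A small sketch of the homoglyph problem, assuming an illustrative pair of labels: one written entirely in Latin letters, the other with the Cyrillic letter "а" (U+0430) substituted for the Latin "a". The `scripts` function is a crude heuristic invented for this example (it reads the first word of each character's Unicode name) and is not a real script-detection API.

```python
import unicodedata

latin = "paypal"
mixed = "p\u0430yp\u0430l"  # Cyrillic "а" (U+0430) in place of Latin "a"

# The two labels are visually near-identical in many fonts but are distinct names.
identical = latin == mixed


def scripts(label: str) -> set:
    """Crude heuristic: take the first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label}


# Mixed-script labels are one signal a client could use to warn the user.
latin_scripts = scripts(latin)
mixed_scripts = scripts(mixed)

# In the DNS they are stored as unrelated labels: one plain, one ACE-encoded.
latin_ace = latin.encode("idna")
mixed_ace = mixed.encode("idna")
```

The DNS itself treats these as unrelated labels; the confusion arises only where a person judges identity by appearance.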