by Torbjörn Eklöv, Interlan Gefle AB, and Stephan Lagerholm, Secure64 Software Corp.
As a reader of The Internet Protocol Journal, you are probably familiar with the Domain Name System (DNS) "cache poisoning" techniques discovered a few years ago. And you have most likely heard that Domain Name System Security Extensions (DNSSEC) [0, 13, 14, 15] is the long-term cure. But you might not know exactly what challenges are involved with DNSSEC and what experience the early adopters have gathered and documented. Perhaps you waited with your own rollout until you could gather more documentation about operational experiences when rolling out DNSSEC.
Stephan Lagerholm and Torbjörn Eklöv are DNS architects with significant DNSSEC experience. Torbjörn lives in Sweden and has helped several municipalities, as well as other organizations, sign their zones. Stephan Lagerholm lives in Dallas, Texas, and has been involved in implementing DNSSEC at several U.S. federal agencies. This article summarizes their experiences, including lessons learned from implementing the technology in production environments, and discusses associated operational concerns.
A plethora of information about DNSSEC and cache poisoning attacks is available on the Internet, so we will not repeat it, but we think it is important to state where DNSSEC is today.
During the last few years the number of deployments, as well as the size and importance of the signed domains, has increased significantly. One of the main reasons for the adoption of DNSSEC during the past year was that the U.S. Office of Management and Budget (OMB) issued a mandate requiring the signing of the .gov domain at the beginning of the year. U.S. federal agencies were mandated to sign their domains by the end of 2009. Some agencies have already implemented the technology, whereas others are still working on it.
Acceptance of DNSSEC technology is also reaching outside of the U.S. government. Top Level Domains (TLDs) around the globe have announced DNSSEC initiatives. To mention a few, Afilias signed .org and Neustar recently announced signing of .us. Several Country Code TLDs (ccTLDs), including .nl and .de, announced that DNSSEC implementation is a work in progress. VeriSign has announced that it is working on signing the largest TLDs, namely .com and .net. Finally, the Internet Corporation for Assigned Names and Numbers (ICANN) along with VeriSign released a timeline for signing the root zone. And of course, the pioneer .se is in its fourth year as a signed TLD.
Several vendors have released software and products to support and make the signing of zones easier. A range of different products is now available on the market.
DNS professionals now have a broad choice of technology—from collections of open-source signing scripts to advanced systems with full automation and support for Federal Information Processing Standard (FIPS)-certified cryptography.
DNSSEC might significantly affect operations unless it is carefully implemented because it requires some changes to the underlying DNS protocol. Those changes are, in fact, the first significant changes that have been made to the DNS protocol since it was invented, and they might sometimes fool old systems into believing that the packets are illegal. DNSSEC also introduces new operational tasks such as rolling the keys and re-signing the zone. Such tasks must be performed at regular intervals. Furthermore, as with any new technology, there are misconceptions about how to interpret the RFC standard.
The First Bug Reported
In late summer 2007, Torbjörn Eklöv convinced the municipality of Gävle in Sweden of the benefits of DNSSEC. He proudly signed what is believed to be the first municipality zone in the world, gavle.se. At first, everything worked fine. A week or so later, Gävle received reports from citizens who could not reach the municipality's websites. It turned out that a new version of Berkeley Internet Name Domain (BIND) was rolled out by a large service provider and that this version of BIND introduced a rather odd bug that affected DNSSEC. The result of the bug was that home users with some home routers and firewalls could not reach any signed domains.
Some people who heard about the problem at gavle.se wrongly believed that DNSSEC caused the problem and that DNSSEC is broken. However, this assumption is not true; DNSSEC worked as expected, but a bug in a particular version of BIND caused the problem. The problem triggered some research on how home routers handle DNSSEC. Stiftelsen för Internetinfrastruktur, the organization that runs the .se TLD, issued a report describing how commonly used home routers and firewalls handled the new protocol changes in DNS. Later, Nominet, which administers the .uk TLD, issued a similar report. In addition, DENIC, which administers the .de TLD, researched the same subject. The results are all discouraging; only 9 out of 38 tested home gateways supported DNSSEC correctly in the most recent reports.
A Birds of a Feather (BoF) session was held at the 76th meeting of the Internet Engineering Task Force (IETF) in Hiroshima to discuss the problems involving home gateways. We look forward to seeing progress in this area.
Preparing Your Firewall for DNSSEC
Most problems with DNSSEC are related to firewalls. Make sure to involve your security and networking administrators so that they can make the required changes before taking DNSSEC into production.
Two types of firewall problems are most common:
The first involves the Transmission Control Protocol (TCP). There is a misconception among firewall vendors and security administrators that DNS queries use the User Datagram Protocol (UDP) and that zone transfers use TCP. Unfortunately, this assumption is not entirely true. DNS queries first try UDP, but revert to TCP if no response is received for the initial UDP query or if the response lacks important information because it is truncated. The possibility of something in the path blocking the response to the initial query is much higher with DNSSEC because of the increased size of the responses.
For DNSSEC to work correctly, it is mandatory that you open your firewall for both TCP and UDP over port 53.
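The fallback rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a resolver implementation: the function names are hypothetical, and the only protocol fact used is that the truncation (TC) flag is bit 0x0200 of the flags field in the 12-byte DNS header.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the DNS header flags field


def is_truncated(response: bytes) -> bool:
    """Return True if the DNS response header has the TC bit set."""
    if len(response) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def choose_transport(udp_response):
    """Pick the transport for the next attempt (hypothetical helper).

    None stands for "no UDP reply arrived"; as described in the text,
    both a missing reply and a truncated reply lead to a retry over TCP.
    """
    if udp_response is None or is_truncated(udp_response):
        return "tcp"
    return "udp"
```

If TCP over port 53 is blocked, every query that takes the "tcp" branch fails, which is why signed zones expose firewall rules that unsigned zones never exercised.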
The second problem is related to the IP Buffer Reassembly size. The authors of the DNSSEC standard realized that a potential problem might exist with TCP queries. TCP puts a higher burden on the DNS servers. (TCP is much more expensive to process than UDP.) To avoid too much TCP traffic, the authors made the EDNS0 extension mandatory for DNSSEC. EDNS0 is one of the Extension Mechanisms for DNS (EDNS), a standard that, among other things, allows a client to signal that it is capable of receiving DNS replies over UDP that are larger than the previous limit of 512 bytes. Some firewalls are not aware of the fact that the EDNS0 standard allows for larger packets and they either block any DNS packet using EDNS0, or block any DNS packet larger than the 512 bytes regardless of the EDNS0 signaling.
Other firewalls allow for the large packets by default, whereas a few vendors require the firewall to be manually configured to do so. Any device in the path that does packet inspection at the application layer must be aware of the EDNS0 standard to be able to make a correct decision about whether to forward the packet or not. ICANN has summarized the status of EDNS0 support in some commonly used firewalls.
Note that it is not enough to test that your firewall allows large incoming DNS replies by sending DNS queries to the Internet. You must also test that an external source can receive large DNS replies that your DNS server is sending. One way of doing so is to use an open DNSSEC-aware resolver [8, 9].
Test and configure your firewall to allow for use of EDNS0 and for DNS packets larger than 512 bytes over UDP.
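To make the EDNS0 signaling concrete, the following Python sketch builds a query that advertises a UDP payload size larger than 512 bytes and sets the DNSSEC OK (DO) bit. It uses only the standard library; the function names, the default payload size of 4096 bytes, and the fixed message ID are assumptions chosen for illustration.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_dnssec_query(name: str, qtype: int = 48, udp_size: int = 4096,
                       msg_id: int = 0x1234) -> bytes:
    """Build a DNS query carrying an EDNS0 OPT record with the DO bit set.

    qtype 48 is DNSKEY; udp_size and msg_id are illustrative defaults.
    """
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT pseudo-record: root owner name, TYPE 41, CLASS holds the UDP
    # payload size, TTL holds extended flags (DO bit = 0x8000), RDLENGTH 0.
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0x00008000, 0)
    return header + question + opt
```

The resulting bytes can be sent to a name server over UDP port 53 with the standard socket module. A reply larger than 512 bytes that arrives intact suggests that the path handles EDNS0; a timeout or a truncated reply points to a device in between that does not.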
Preparing Your Slaves
Setting up DNSSEC involves substantial changes to the master name server so it can sign and serve the signed data. However, it is easy to foresee that the slaves must be upgraded, too. The slaves are much easier to upgrade and operate because they never produce signatures.
They are secondary systems that transfer data from the primary server and respond to DNS queries. But the slaves must understand how to respond to queries requesting signed data.
Slaves must be upgraded to BIND 9.3 or later to understand the Next Secure (NSEC) standard. NSEC is a method to provide authenticated denial of existence for DNS resource records. The newer Next Secure 3 (NSEC3) standard introduces some additional requirements for the slaves. If you use NSEC3, you must upgrade the slaves to BIND 9.6 or later. Version 3 of Name Server Daemon (NSD) and any version of Secure64 DNS Authority/Signer can do both NSEC and NSEC3. Windows Server 2008 R2 for the x86-64 architecture supports DNSSEC as a master, slave, and validating resolver. However, we recommend limiting the use of the Windows platform to slaves and for domains using NSEC. Our opinion is that it is very hard to implement DNSSEC on Windows, and we suggest that you wait until Microsoft offers a sensible Graphical User Interface (GUI) and support for NSEC3. Note that the Itanium version of Windows 2008 R2 supports neither DNS nor DNSSEC.
Make sure your slaves can handle the version of DNSSEC you intend to use.
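The BIND version requirements stated above are easy to encode as a small check when you audit a list of slaves. The sketch below is illustrative only: the function names are hypothetical, the thresholds are the ones given in this article, and the version string is assumed to start with "major.minor".

```python
import re

# Minimum BIND versions as stated in the text: 9.3 for NSEC, 9.6 for NSEC3.
MIN_BIND = {"NSEC": (9, 3), "NSEC3": (9, 6)}


def parse_version(text: str) -> tuple:
    """Extract (major, minor) from a version string such as '9.6.1-P1'."""
    match = re.match(r"(\d+)\.(\d+)", text.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def bind_supports(version: str, denial: str) -> bool:
    """Return True if this BIND version can serve zones using NSEC or NSEC3."""
    return parse_version(version) >= MIN_BIND[denial]
```

Comparing integer tuples rather than strings matters here: as strings, "9.10" would sort before "9.6".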
If the slaves are administered by another party, contact the administrator before you begin DNSSEC implementation. Make sure the slaves are running a version capable of DNSSEC. Stephan helped a large U.S. federal agency sign its domains. The agency used one of the major federal contractors to run its slave servers. After multiple attempts to reach somebody that understood DNS and DNSSEC, Stephan finally learned that the slaves were running BIND 9.2.3 and that the contractor had no plans to upgrade. The only alternative for the agency was to in-source the slaves and run them itself.
If your slaves are administered by another party, make sure you know if and what version of DNSSEC that party supports before you start implementing.
Communicate with Your Parent
TLDs allow you to communicate with them in two ways:
Most problems described in the following paragraphs apply to both models, but those involving multiple registries are obviously applicable only to the Registrant–Registrar–Registry model.
Establishing a Chain of Trust in DNSSEC involves uploading one or more public keys to the parent. Ultimately the parent publishes a Delegation Signer (DS) record, a smaller fingerprint that can be constructed from the DNSKEY record. To upload your keys, you must use a registrar that supports DNSSEC. If your registrar does not support DNSSEC, you need to move your domains to another registrar (or convince your current registrar to start supporting DNSSEC). It usually takes a few days or up to a week to move a domain from one registrar to another.
Make sure that your registrar supports DNSSEC. If it does not, move your domain to a registrar that supports DNSSEC before you begin signing your zone.
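To illustrate how the DS fingerprint relates to the DNSKEY, the Python sketch below computes a SHA-256 DS digest. It follows the construction in RFC 4034 (a hash over the owner name in canonical, lower-case wire format followed by the DNSKEY RDATA), with SHA-256 as the digest algorithm. The function names and the tiny key used in any example call are placeholders, not values from a real zone.

```python
import base64
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode an owner name in canonical (lower-case) wire format."""
    out = b""
    for label in name.lower().rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int,
                 public_key_b64: str) -> bytes:
    """Assemble DNSKEY RDATA: flags, protocol, algorithm, public key."""
    return (struct.pack("!HBB", flags, protocol, algorithm)
            + base64.b64decode(public_key_b64))


def ds_digest_sha256(owner: str, flags: int, protocol: int, algorithm: int,
                     public_key_b64: str) -> str:
    """Return the SHA-256 DS digest as upper-case hex."""
    data = name_to_wire(owner) + dnskey_rdata(flags, protocol, algorithm,
                                              public_key_b64)
    return hashlib.sha256(data).hexdigest().upper()
```

Because the algorithm number is part of the hashed data, a DS record computed with the wrong algorithm looks well formed but never matches the key in the zone.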
Some registrars allow registration under multiple TLDs. However, just because a registrar handles DNSSEC for one TLD does not mean that it handles DNSSEC for all TLDs it serves. For example, several registrars in Sweden support DNSSEC for .se but not for .org or .us.
Make sure that your registrar handles DNSSEC under the TLD in question.
Most registrars offer you the opportunity to use their name server instead of your own. The service is either offered for free or for an additional cost. The registrar typically provides a web interface where you can change your zone data. This service is a good and useful choice if your domains are uncomplicated and small. Larger and more complex domains are better operated on your own servers.
Some registrars that provide this type of service can handle DNSSEC only if you use their name servers and not your own name servers. These registrars can establish the chain of trust with the parent only if the zone is under their control. They lack a user interface for uploading a DS key that you generate on your own name servers.
If you intend to use your own name servers, make sure that your registrar supports this deployment model, and allows you to upload a DS record for further distribution to the registry.
In theory, the child zone system should create the DS record fingerprint and upload it to the parent. In practice, some registrars require you to upload the DNSKEY record to them. They then create the DS record for you. (This practice is bad because the registrar must know the hash algorithm used to construct the DS record, which it might not know.) The DNSKEY record comes in several different formats, depending on the platform you used to create the keys (BIND, Microsoft, NSD, Secure64, etc.). The formats have minor differences, and you might have to convert the DNSKEY into a format that the registrar accepts.
Not everything works smoothly, even with the correct DNSKEY format. The logic at one registrar's website was to deny uploading of DNSKEYs unless the optional Time To Live (TTL) field existed. (The TTL value is useless in the DNSKEY context because the parent overrides this value with its own TTL). You may have to manually change your DNSKEY before uploading it to comply with the checks that the registrar performs.
If your registrar requires you to upload the DNSKEY, make sure that your solution can generate the requested format. If not, you need to manually change the fields with a text editor.
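Instead of editing the record by hand, a short script can normalize it. The Python sketch below parses a DNSKEY record in zone-file presentation format and prints it again with or without the TTL field. The helper names and the default TTL of 3600 seconds are assumptions; what a given registrar actually accepts has to be checked against its own documentation.

```python
def parse_dnskey_line(line: str) -> dict:
    """Parse one DNSKEY record in zone-file presentation format."""
    text = line.split(";", 1)[0]  # drop a trailing comment, if any
    tokens = text.replace("(", " ").replace(")", " ").split()
    idx = [t.upper() for t in tokens].index("DNSKEY")
    ttl = None
    for tok in tokens[1:idx]:  # optional TTL and class fields
        if tok.isdigit():
            ttl = int(tok)
    flags, protocol, algorithm = (int(t) for t in tokens[idx + 1:idx + 4])
    return {
        "owner": tokens[0],
        "ttl": ttl,
        "flags": flags,
        "protocol": protocol,
        "algorithm": algorithm,
        # Base64 key material may be split across whitespace; rejoin it.
        "key": "".join(tokens[idx + 4:]),
    }


def format_dnskey(rec: dict, include_ttl: bool = True,
                  default_ttl: int = 3600) -> str:
    """Re-emit the record on one line, adding a TTL if the input had none."""
    fields = [rec["owner"]]
    if include_ttl:
        fields.append(str(rec["ttl"] if rec["ttl"] is not None else default_ttl))
    fields += ["IN", "DNSKEY", str(rec["flags"]), str(rec["protocol"]),
               str(rec["algorithm"]), rec["key"]]
    return " ".join(fields)
```

For example, a multi-line record without a TTL can be turned into the single-line form with a TTL that the registrar described above insisted on.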
As noted previously, some registrars are performing too many checks and irrelevant checks before accepting and creating the secure delegation. Other registrars do not check at all or have limited checks that do not work as expected. For example, some registrars assume that your key is created using a certain algorithm, and they do not double-check it prior to creating a DS record. One registrar created a bogus DS record if you uploaded a DNSKEY with upper-case characters in the domain name. The bogus DS record looked valid, and troubleshooting to find this error took hours.
Another example is keys created with Webmin, a graphical tool that you can use for signing zones. Webmin defaults to using the less-common Digital Signature Algorithm (DSA) for its DNSKEYs. The registrar did not complain when uploading the Webmin key, and it created a bogus DS record by assuming that it was an RSA key.
It is hard for a registrant to do anything about errors at the registrars. The best you can do is to make sure that you upload the correct key with the correct parameters such as algorithm, key length, key-id, etc. If something goes wrong, you might have to change the keys in production. Rolling the keys to the same algorithm and key length is relatively easy—but changing your keys to another algorithm adds extra complexity. It is an interesting exercise to change to another algorithm in production, but it is something we recommend avoiding if possible.
Double-check the DNSKEY/DS so that it is created with the correct parameters prior to uploading it.
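One way to double-check is to recompute the key tag from your DNSKEY and compare it, together with the algorithm number, against the DS record that the registry publishes. The Python sketch below uses the key tag calculation from RFC 4034, Appendix B (valid for algorithms other than RSA/MD5). The function names and the placeholder key are assumptions for illustration; the algorithm mnemonics are the IANA names.

```python
import base64
import struct

# A few DNSSEC algorithm numbers and their IANA mnemonics.
ALGORITHM_NAMES = {3: "DSA", 5: "RSASHA1", 7: "RSASHA1-NSEC3-SHA1",
                   8: "RSASHA256", 10: "RSASHA512"}


def dnskey_rdata(flags: int, protocol: int, algorithm: int,
                 public_key_b64: str) -> bytes:
    """Assemble DNSKEY RDATA: flags, protocol, algorithm, public key."""
    return (struct.pack("!HBB", flags, protocol, algorithm)
            + base64.b64decode(public_key_b64))


def key_tag(rdata: bytes) -> int:
    """Compute the key tag over the DNSKEY RDATA (RFC 4034, Appendix B)."""
    total = 0
    for i, byte in enumerate(rdata):
        # Even-indexed bytes form the high octet, odd-indexed the low octet.
        total += byte if i & 1 else byte << 8
    total += (total >> 16) & 0xFFFF
    return total & 0xFFFF


def ds_matches_dnskey(ds_key_tag: int, ds_algorithm: int, flags: int,
                      protocol: int, algorithm: int,
                      public_key_b64: str) -> bool:
    """Check that a DS record's key tag and algorithm agree with a DNSKEY."""
    rdata = dnskey_rdata(flags, protocol, algorithm, public_key_b64)
    return ds_algorithm == algorithm and ds_key_tag == key_tag(rdata)
```

A mismatch in the algorithm field is exactly the symptom produced when a registrar assumes RSA for a key that was generated with DSA. A full check would also recompute the digest, which this sketch leaves out.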
Communicate with Your Children
If you have sub-domains in your domain, you must make sure that you can accept and publish the DS records that your children upload to you. This situation is not a problem if you use zone files in text format—you can simply insert the DS record using your favorite editor. But it might be a problem if you are using an Internet Protocol Address Management (IPAM) system. In that case make sure that it can insert DS records into the zones that are managed by the system. Some IPAM systems do not support insertion of DS records correctly.
Make sure that your IPAM system can insert DS records into your zones.
A common strategy among organizations with high-availability requirements for their critical servers is to use a global load balancer, which is basically a DNS server that responds differently depending on the status of the service in question. For example, assume a load balancer can respond to a question for www.example.com with 192.0.2.1 and 192.0.2.2 if both web servers are up. If .1 becomes unavailable, the load balancer notices a failure and responds only with .2. In order to use a global load balancer, you must delegate www as a sub-domain to its own DNS process.
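The behavior in this example can be sketched as a function that filters a pool of addresses by health status. This is a toy model, assuming a hypothetical in-memory pool and health table; a real global load balancer probes the servers and speaks the DNS protocol itself.

```python
def healthy_answers(name: str, pools: dict, is_up: dict) -> list:
    """Return the addresses for name whose servers are currently up."""
    return [addr for addr in pools.get(name, []) if is_up.get(addr, False)]


# Hypothetical data matching the example in the text.
POOLS = {"www.example.com": ["192.0.2.1", "192.0.2.2"]}
```

With both servers up, healthy_answers("www.example.com", POOLS, {"192.0.2.1": True, "192.0.2.2": True}) returns both addresses; marking 192.0.2.1 as down leaves only 192.0.2.2. Because the answer set changes at run time, every variant of it needs a valid signature.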
When DNSSEC is implemented, you must make sure that the load balancer can handle DNSSEC (and not that many do); otherwise it is impossible to sign the responses for those resources. Unfortunately, these resources are the most critical ones for your environment and would benefit the most from DNSSEC signing.
Make sure that your load balancers support DNSSEC. If they do not, have an alternative strategy.
Rolling the Keys
You should change the DNSKEYs regularly and when you think the keys are compromised. The process of doing so is called rolling the keys. There are normally two different keys in DNSSEC, the Key Signing Keys (KSKs) and the Zone Signing Keys (ZSKs). Rolling the ZSK is an internal process and does not require communication with the parent. Rolling the KSK, on the other hand, requires the parent to publish a new DS record. 
There is no standard yet that describes how the communication between the parent and the child should occur when a key is rolled. Early DNSSEC-capable registrars used a web interface that allowed their registrants to upload and manipulate the DNSSEC information. With a web interface, each domain must be handled separately and there is no easy way to automate the interaction.
The web interface works for a handful of domains but becomes very cumbersome when you have many domains. For those types of organizations, it is important to make sure that there is some kind of Application Programming Interface (API) or script access to the registrar. This interface allows the organization to upload new DS records during the rollover in a convenient way.
Make sure that your registrar supports automation through an API if you have many domains.
Scripting with an API as described previously is one way of communicating with the registrar. Another way of achieving the same type of automation is for the parent (or registrar) to monitor the child for any changes to the DNSKEY records.
Note that the chain of trust is still intact during a nonemergency rollover. The parent can securely poll the child and grab the new DNSKEY records and convert them into DS records. The polling from the parent to each signed child needs to occur regularly so that a rollover is picked up quickly. This regularity of polling makes the scheme best for domains with fewer delegations (in the order of thousands, not millions—consider how much bandwidth an hourly polling of 15 million children would require).
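The bandwidth question at the end of the paragraph above is simple arithmetic. The Python sketch below works it through under a loudly stated assumption: each poll (query plus signed DNSKEY response) is taken to cost about 1,500 bytes. That figure is a guess for illustration, not a measurement, and the function name is hypothetical.

```python
def polling_bandwidth_mbps(children: int, polls_per_hour: float,
                           bytes_per_poll: int) -> float:
    """Average bandwidth, in megabits per second, needed to poll all children."""
    bits_per_hour = children * polls_per_hour * bytes_per_poll * 8
    return bits_per_hour / 3600 / 1_000_000


# Assumption: roughly 1,500 bytes per poll (query plus signed response).
ASSUMED_BYTES_PER_POLL = 1500
```

Under that assumption, polling_bandwidth_mbps(15_000_000, 1, ASSUMED_BYTES_PER_POLL) comes to about 50 Mbit/s of sustained traffic, whereas 5,000 children polled hourly need well under 0.1 Mbit/s.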
Automation is a good thing, but make sure you understand the implications when opting for automatic detection of key rollovers. The automation scripts are not fail-safe. It has been reported that early versions of such scripts under some circumstances wrongly assumed that a key rollover occurred and deleted the DS record, thus breaking the chain of trust.
Understand the implication when opting for automatic detection, addition, and deletion of DS records.
Management of DNSSEC
Without DNSSEC, you are not bound to any particular registrar; you can switch to a new registrar fairly easily. With DNSSEC, this situation changes. First of all, if you let the registrar sign the zone on your behalf, the registrar will be in charge of the key used to sign your zone. Extracting your key so that it can be imported to another registrar is not always straightforward (also remember that there is really no incentive for your previous registrar to help you because you just discontinued its service). An alternative is to unsign the zone before you change registrars, but that option might not always be a viable one. The lack of standards makes it hard to change registrars on a signed domain that is in production.
You must tell your new registrar that you are using DNSSEC, and you must make sure that the registrar supports it. If not, the registrar might accept the transfer but be unable to publish the DNSKEY records. The result would be a DS record published by the registry but no corresponding DNSKEY records at the child, making the zone "security lame" and causing failed validation.
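The "security lame" condition can be expressed as a comparison between what the parent publishes and what the child serves. The Python sketch below is a simplified model, assuming that each record is reduced to a (key tag, algorithm) pair; a real validator also checks the DS digest and the signatures over the DNSKEY set. The function name and status strings are hypothetical.

```python
def chain_status(parent_ds: set, child_dnskeys: set) -> str:
    """Classify a delegation from (key_tag, algorithm) pairs.

    parent_ds: pairs taken from the DS records at the registry.
    child_dnskeys: pairs computed from the DNSKEY records the child serves.
    """
    if not parent_ds:
        return "insecure"  # no DS at the parent: unsigned delegation
    if parent_ds & child_dnskeys:
        return "secure"  # at least one DS points at a published key
    return "security lame"  # DS exists, but no matching DNSKEY
```

Running such a check before and after a registrar transfer shows whether the DS at the registry still points at a key that the new name servers publish.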
The same types of problems exist if you are running your own name servers. If you change your master server, make sure that you transfer the secret keys as well. Signing with new keys will not work unless you flush out the old keys with rollovers and upload a new DS record to your parent.
Have a plan ready for how to transfer your keys to a new master server.
It is important to adjust your signature validity periods and the Start of Authority (SOA) timers so that they match your organizational requirements and operational practices. All too often, SOA expire values and signature validity periods are too short.
Unless you are restricted by guidelines saying otherwise, you should strive to set the timers reasonably high. Set the timers so that your zones can cope with an outage as long as the longest period that the system might be unattended.
For example, if you know that your top DNS administrator usually has three weeks of vacation in July, you could consider setting the times so that the zone can survive four weeks of downtime. If you are confident in your signing solution and are monitoring your signatures carefully, you might set it a little bit lower.
Signature lifetime is a trade-off between security (low signature lifetimes) and convenience (high signature lifetimes). Setting a really high signature lifetime is convenient from an operational perspective but is less secure. Some organizations such as the IETF use an excessive signature lifetime of one year (dig ietf.org DNSKEY +dnssec | grep RRSIG). This lifetime is clearly not recommended, and they should know better.
Carefully set your signature lifetimes and SOA times to reflect your organization's operational requirements and practices.
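A small script can turn this advice into a routine check. The Python sketch below takes the expiration timestamp of an RRSIG record (the YYYYMMDDHHMMSS field shown in dig output) and the SOA expire value in seconds, and tests whether both cover a given unattended period. The function names and the example values are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(stamp: str) -> datetime:
    """Parse an RRSIG timestamp in YYYYMMDDHHMMSS form (always UTC)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def signature_survives(expiration: str, now: datetime,
                       unattended_days: int) -> bool:
    """True if the signature stays valid for the whole unattended period."""
    remaining = parse_rrsig_time(expiration) - now
    return remaining >= timedelta(days=unattended_days)


def soa_expire_survives(soa_expire_seconds: int, unattended_days: int) -> bool:
    """True if slaves keep serving the zone for the whole unattended period."""
    return soa_expire_seconds >= unattended_days * 86400
```

For the four-week example above, the SOA expire value must be at least 28 × 86,400 = 2,419,200 seconds, and every signature in the zone must have at least 28 days left at the moment the system is left unattended.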
A Note on Validation
This article has focused on the authoritative part of DNSSEC. That part includes signing resource records and serving DNS data. The operational challenges with signing data are much greater than the challenges of validating data. To validate data, the only thing you need to do regularly is update your trust anchor file. Make sure you do so. Torbjörn reports several outages when the .se DNSKEY used in the .se trust anchor expired in January 2010. We look forward to the work being done in this area to automate the process.
DNSSEC has been deployed and put into production for several large and critical domains. It is not hard to implement DNSSEC, but doing so introduces some operational challenges. Those challenges exist both during the implementation phase, when the zone is being signed for the first time, and during the operation of the zone. Make sure you understand the possible effects of implementation and plan ahead. The following checklist summarizes the most important pitfalls with DNSSEC: