There are three fundamental problems with the legacy DNS system: it is vulnerable to denial-of-service attacks, its name resolution adds significant latency, and it cannot propagate dynamic updates quickly.
There are significant, though less fundamental, problems as well: the system is vulnerable to abuse by namespace operators and to compromise from the inside, and it depends heavily on manual setup and administration, which is expensive and error-prone.
Security of Legacy DNS:
What makes the legacy DNS vulnerable to denial-of-service attacks?
Legacy DNS assigns names to a designated, typically small, set of nameservers. As of this writing (March 2005), all names in the etrade.com domain, for instance, are served by two machines. A failure of those two machines is sufficient to disable all access to the etrade.com domain.
Doesn't replication in legacy DNS address failures?
Legacy DNS supports nameserver replication, but in practice nameowners do not employ, or lack the resources for, extensive replication. More than 79% of names are served by just two nameservers. While two servers can guard against random failures such as a disk crash, it is relatively easy to bring both down simultaneously with a targeted attack. The network itself is a shared point of failure: in many cases, replicated nameservers sit behind the same bottleneck links.
In a DoS attack, isn't it possible to determine which queries are legitimate packets sent by users and which are synthetic queries? Can't a server tell which queries are part of a DoS attack and simply ignore them?
No. For one, many legitimate queries stem from automated programs. For another, DNS is a lightweight, connectionless protocol - it was not designed to authenticate the originator of a query. It is trivial to launch a DoS attack in which the stream of synthetic queries sent to the targeted nameservers is indistinguishable from the queries of a real user.
How can a namespace operator abuse DNS? What is an insider attack?
In legacy DNS, a response is trusted purely based on the server from which it originates. Thus the operator of a server has full authority over the domains that the server serves. It is possible for a namespace operator to abuse this trust, for instance, by manufacturing bogus responses to queries in order to direct a clickstream to their own servers. An overt approach, such as VeriSign's SiteFinder, is likely to create a large backlash, but a compromised insider can misdirect clients to malicious servers with little chance of being exposed.
For instance, someone at VeriSign, which operates the .COM servers, or someone at an ISP that serves VeriSign, could manufacture false responses to queries from a particular client (say, a high net worth individual) and redirect them to a fake site in order to acquire their passwords. End-to-end encryption is ineffective when one of the ends can be relocated through DNS.
What can a determined attacker do to or through DNS?
About 17% of the nameservers on the Internet contain a significant vulnerability. A determined attacker can compromise nameservers and redirect web surfers to servers of their choice. Even without compromising servers, an attacker that launches a DoS attack can bring down parts or all of the namespace hierarchy with relative ease.
Is it really possible to take down the Internet by taking down DNS in practice?
Yes. The preceding discussion points out the theoretical vulnerabilities. And there is a simple proof of how practical and easy such attacks are: Someone tried to take down DNS and almost succeeded in October 2002. Specifically, a person or group of people whose identities are still unknown launched a coordinated distributed denial-of-service attack against the root nameservers. The attack disabled seven out of the thirteen servers by inundating them with requests, presumably from "zombie" machines under the attacker's control. We were logged on at the time and could not contact many web sites, despite the claims in the preceding article that end-users did not notice a slowdown. Simple web browsing was very painful. The article states "many believe that if the attack was sustained for a longer period, the effects could have been catastrophic." The root servers have been replicated more widely since the attack, but their resilience to massive attacks remains untested and a similar attack could take place at any moment. It would almost certainly succeed if it targeted a smaller portion of the DNS namespace, e.g. .MIL.
Are DNS poisoning attacks common, and can they affect end users?
Yes, they are common and can affect end users. This article on how attackers are using DNS poisoning to spread malware, and the ensuing discussion, illustrate the point. Even recently, New York's oldest ISP got domain-jacked, and someone hijacked the UK's Tory party website and loaded it with hardcore pornography.
What if we widely deployed DNSSEC? Would that fix these problems?
It would fix some, but not the most significant, of the security problems. If DNSSEC could be widely deployed, it would be harder for attackers to forge records. However, DoS attacks would still be possible, as DNSSEC also relies on the same static nameserver hierarchy as legacy DNS. It might actually be easier to launch DoS attacks with DNSSEC, as the servers have to do more work in response to queries. Most DNSSEC implementations rely on online secrets, so a compromised host would lead to compromised records and redirection attacks similar to the ones described above. DNSSEC would certainly increase DNS name resolution latency, exacerbating the performance problems mentioned below. And it would not help with dynamic updates.
Performance of Legacy DNS:
How fast is legacy DNS in resolving names? Is its performance a bottleneck?
Recent studies estimate that about 10 to 30 percent of the time spent downloading web pages is spent on DNS resolution [Bent & Voelker 02, Wills and Shang 00, Huitema and Weerahandi 00].
The DNS latency is quite visible when you click on a link you have not visited before and there is a long delay while your browser displays "Resolving name."
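For a rough sense of how large this latency is on your own connection, the following minimal sketch (not part of CoDoNS, and using arbitrary placeholder hostnames) times resolutions through whatever resolver your machine is configured to use; note that repeated lookups may be answered from a local cache and appear faster than a cold resolution.

    import socket
    import time

    # Hostnames here are arbitrary placeholders; substitute names you actually visit.
    names = ["www.example.com", "www.example.org", "www.example.net"]

    for name in names:
        start = time.time()
        try:
            socket.getaddrinfo(name, 80)   # triggers a DNS lookup via the configured resolver
            print("%s: %.1f ms" % (name, (time.time() - start) * 1000))
        except socket.gaierror as err:
            print("%s: resolution failed (%s)" % (name, err))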
Dynamic Updates in Legacy DNS:
What are dynamic updates? Why should one care whether they can be performed at all, or how fast they propagate?
Dynamic updates refer to unanticipated modifications to name-IP bindings. They are critical for fast response to emergencies.
Suppose that you own www.foo.com, a financial brokerage where millions of dollars are traded every hour. For whatever reason (flooding in the server room, electricity outage, terrorist attack, backhoe over the wrong set of wires, etc.), your servers in NYC are unreachable. You will need to immediately move your service to your backup facility in NJ, designed specifically for these kinds of circumstances.
You can't do on-the-fly service relocation with legacy DNS. Your clients will have cached the IP addresses of your servers and will not discover that your service has relocated until the cache timeout expires. You could pick a very low timeout, in which case your clients will exert undue load on your nameservers and incur unnecessary delays, even when everything is working smoothly. In practice, many services use relatively long timeouts to avoid the DNS load and latency; ETrade, for instance, uses DNS timeouts ranging up to half an hour. Such a long disconnection can have significant financial consequences.
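As an illustration of the trade-off, here is a hypothetical zone-file fragment (the name and address are placeholders); a nameowner must choose one of the two TTLs, trading failover speed against query load:

    ; Choose one of the following for www.foo.com (192.0.2.10 is a placeholder address):
    www.foo.com.    60    IN  A  192.0.2.10   ; 60-second TTL: clients notice a relocation
                                              ; within a minute, but re-query constantly
    www.foo.com.  1800    IN  A  192.0.2.10   ; 30-minute TTL: light query load, but a
                                              ; relocation may go unnoticed for half an hour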
Verisign has recently announced "rapid updates" to the top level domains. What does this mean? Does it fix the dynamic update problem mentioned above?
Verisign operates the servers for .COM and .NET. As of this writing (July 2004), Verisign updates the name to IP address bindings on these hosts twice a day. Starting on September 8, 2004, Verisign plans to update this data more frequently.
What this means is that, if you buy a new name, it will be visible soon after purchase. It does NOT mean that changes to an existing name will be visible to clients. Legacy DNS is not cache-coherent; it relies on timeouts - changes to a name may not be visible to clients who have previously accessed and cached a given record for as long as the timeout dictates. In most cases, this is on the order of 24 hours.
In short, rapid updates are good to have for new name owners, but they do not address the dynamic update problem as they do not allow existing servers to be quickly relocated to alternative addresses.
Manual Administration in Legacy DNS:
What kinds of errors stem from the manual setup and administration of the legacy DNS?
DNS is fragile because it requires consistent configurations across multiple machines, but relies on humans to maintain that consistency. As a result, it suffers from broken delegation chains, lame delegations, and misconfigured servers. A broken delegation chain is a series of nameserver referrals that, when followed as the DNS standard dictates, does not lead to a successful resolution of the name to an IP address or to an NXDOMAIN response; such chains arise when a nameserver is down, when the name of a nameserver is misspelled, or when there is a cycle in the delegation chain. A lame delegation is a delegation of a domain to a nameserver that is not configured to serve that domain. Misconfigured servers are sets of servers that disagree about the name to IP address binding of a domain name, where the disagreement is non-superficial (if the disagreement is only about which subset of replicas a name maps to, it is superficial; if a server claims a host resides at an address where it clearly does not, it is a misconfiguration).
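The following hypothetical records (all names invented for illustration) show what two of these error classes look like in practice:

    ; In the parent (.com) zone, foo.com is delegated to two nameservers:
    foo.com.   IN  NS  ns1.foo.net.
    foo.com.   IN  NS  ns2.foo.net.
    ; If ns2.foo.net answers for other zones but was never configured with
    ; foo.com, the second record is a lame delegation. If the first record
    ; had instead been typed "nsl.foo.net" (a misspelling that resolves to
    ; nothing), following the chain from the root would fail - a broken
    ; delegation chain.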
Are errors stemming from manual setup significant?
Yes. Approximately 1% of the domain names we surveyed suffer from a manual configuration error. Such errors are not confined to the "unwashed masses;" even the top-500 most popular domain names suffer from such errors. Others [Pappas et al. 04] have examined the prevalence of such errors and independently conclude that they are common and significant. Typically, these errors remain latent because they are masked by a correctly functioning server. They often surface following the failure of a primary server.
What's the root cause behind these problems with the legacy DNS?
The security, performance and dynamic update problems all stem from the static, hierarchical nature of the legacy DNS. A given domain is entrusted to a small set of servers, whose failure, compromise or misuse affects the entire hierarchy beneath it. The delegations are fragile, relying solely on physical handoff from server to server, and the validity of a response is authenticated by nothing more than its source IP address.
Can the DNS security and performance really be this bad? Do you have more detailed data?
Unfortunately, it is this bad. The SIGCOMM paper titled "Impact of Configuration Errors on DNS Robustness" by Pappas et al. provides a more detailed analysis, as do our papers.
CoDoNS:
How can we improve the security, performance and functionality of the legacy DNS?
The legacy DNS needs a safety net. It's all too easy to compromise servers, misdirect clients, and launch DoS attacks on the system. We need a backup in case portions of the DNS hierarchy are attacked, again.
How does CoDoNS help?
CoDoNS tolerates denial-of-service attacks by dynamically distributing the load across a large collection of servers. It provides better performance than legacy DNS. And it allows updates to be disseminated within a few seconds throughout the entire system.
What does it take to use CoDoNS?
Relatively little. Just point your resolver to the DNS server on this list that is physically closest to you. You can point your resolver by editing /etc/resolv.conf on a Linux box, by modifying your DNS server settings in Windows, or by changing a setting in your DHCP server for clients who receive their DNS settings through DHCP.
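For example, on a Linux machine the change amounts to a one-line edit (the address below is a placeholder from the documentation range, not a real CoDoNS server; substitute the address of the nearest server from the list):

    # /etc/resolv.conf
    nameserver 192.0.2.53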
What will I notice when I make the switch over?
You will see the same namespace as the legacy DNS, except your resolutions should be faster on average.
How does CoDoNS resist denial-of-service attacks?
CoDoNS is a flat, peer-to-peer system that acts as a single, coordinated cache. Excess load is shed to peers automatically by replicating the affected records. Failed nodes are automatically dropped from the peer network and replaced by adjacent nodes that take over their functionality.
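The sketch below conveys the general idea of load shedding through replication in a few lines of Python; it is a simplification for illustration, not the actual CoDoNS replication algorithm, and the threshold and class names are invented:

    from collections import defaultdict

    HOT_THRESHOLD = 1000   # queries per interval; purely illustrative

    class Node:
        def __init__(self, name, peers=None):
            self.name = name
            self.peers = peers or []          # adjacent nodes in the overlay
            self.cache = {}                   # record name -> record data
            self.hits = defaultdict(int)

        def lookup(self, record, resolve_upstream):
            self.hits[record] += 1
            if record not in self.cache:
                self.cache[record] = resolve_upstream(record)
            # Shed load: once a record is hot, push copies to peers so that
            # later queries are absorbed before they ever reach this node.
            if self.hits[record] > HOT_THRESHOLD:
                for peer in self.peers:
                    peer.cache.setdefault(record, self.cache[record])
            return self.cache[record]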
How does query resolution latency compare with CoDoNS versus legacy DNS?
At several educational institutions with good connections to the Internet (i.e. under the best possible circumstances), the median time to resolve a name using the legacy DNS is around 36ms. In contrast, the median time to resolve a name using CoDoNS is around 6ms. Your mileage may vary depending on your distance to a CoDoNS server - if there is no CoDoNS server near where you are on the Internet, let us know; we would like to have servers evenly covering all connection points.
How does CoDoNS provide fast updates?
CoDoNS keeps track of every copy of every DNS record (efficiently, based on the underlying structure of the peer network). When a record needs to be updated, CoDoNS distributes the new record to all nodes that have a copy of the old record. A single integer per object is sufficient to locate all copies of an object. In contrast, legacy DNS caches are opportunistic and uncoordinated - a complete traversal of all name caches would be required to find every copy of a given object.
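A minimal sketch of the idea, assuming for illustration that a record's home node keeps an explicit set of the peers caching it (the real system derives this from the structure of the peer network, and the class and method names here are invented):

    class HomeNode:
        def __init__(self):
            self.records = {}    # name -> current record data
            self.holders = {}    # name -> set of peers caching the record

        def fetch(self, name, peer):
            # A peer caching the record registers itself as a holder.
            self.holders.setdefault(name, set()).add(peer)
            return self.records[name]

        def update(self, name, new_data):
            self.records[name] = new_data
            # Push the new record to every known copy immediately,
            # rather than waiting for timeouts as legacy DNS caches do.
            for peer in self.holders.get(name, set()):
                peer.receive_update(name, new_data)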
How does CoDoNS help with manual configuration errors?
CoDoNS is a self-configuring overlay. Authenticity of records is checked based on cryptographic signatures. Problems stemming from bad delegations and misconfigured servers are eliminated entirely by replacing physical delegations with cryptographic equivalents. The network automatically detects and heals around any failed nodes.
What's the relationship between CoDoNS and DNSSEC?
CoDoNS implements the DNSSEC standard. All records in CoDoNS are signed by a CoDoNS master key. A CoDoNS deployment would provide a head-start for wider DNSSEC adoption - there would be a set of servers from which signed, tamperproof names can be acquired without having to wait for nameowners to switch to DNSSEC.
Will we need to purchase new names if the world switches over to CoDoNS?
No. CoDoNS serves the same name hierarchy as legacy DNS. The namespace will remain identical to what it was before.
Does CoDoNS support DNS trickery, such as dynamic server selection the way Akamai does it?
Yes. These tricks are often proprietary and need to be performed on designated nameservers that the nameowner trusts. CoDoNS detects low-TTL records and keeps a forwarding pointer to the originating server. CoDoNS makes no effort to move this functionality out of such trusted nameservers; instead, it ships the queries to the designated nameservers.
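A rough sketch of that behavior (the threshold, field names, and the upstream-query function are assumptions for illustration, not CoDoNS internals):

    LOW_TTL_THRESHOLD = 30   # seconds; illustrative cutoff for server-selection records

    def answer_query(name, cache, query_designated_nameserver):
        entry = cache.get(name)
        if entry is not None and not entry["forward"]:
            return entry["data"]    # ordinary record: serve from the cooperative cache
        # Unknown record, or one previously marked low-TTL: forward the
        # query to the nameowner's designated nameserver every time.
        data, ttl = query_designated_nameserver(name)
        cache[name] = {"data": data, "forward": ttl < LOW_TTL_THRESHOLD}
        return data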
CoDoNS DNS Survey:
Questions related to our DNS survey have their own FAQ.
CoDoNS Deployment:
I am interested in running a CoDoNS node locally. Can I join the CoDoNS overlay? What is required?
ISPs are welcome to join CoDoNS. Membership in the overlay is free and open. All you need is a modest server box (>1GHz processor, >512MB RAM, >2GB disk, CD or DVD-ROM drive, static IP address). Please let us know by email that you would like to join, and we will give you an update on project status and on when we expect to ship you a bootable CD.