Interestingly, US government sites are generally well-run, but we found 321 .GOV sites and 504 .MIL sites with vulnerabilities, including:
There is no clear pattern to the vulnerabilities. The problems seem randomly distributed over the space of names.
You claim that only 17% of nameservers are running old software with known exploits, yet 30% of names are vulnerable. How could that be?
It all has to do with the role the nameservers play in name delegations. A simple example might help. Suppose there are only two nameservers and 100 domain names on earth. Server A serves ten names, and server B serves ninety names. If B is running an old version of BIND with a known exploit, 50% of nameservers have known exploits, yet 90% of names are vulnerable. A slightly more complicated example involves three nameservers, A, B and C. A serves 10 names, B serves 89 names, and C serves B's name. If C has a known exploit, 33% of nameservers are vulnerable, yet 90% of names are potentially affected: the 89 names B serves, plus B's own name.
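The arithmetic in both examples can be checked with a short sketch. The function name `affected_share` and the dictionary layout are illustrative assumptions, not part of the survey's tooling; the sketch simply treats a server as tainted if it, or any server its name depends on, has a known exploit.

```python
def affected_share(names_served, depends_on, exploitable):
    """Return (share of servers with a known exploit, share of names affected).

    names_served: server -> number of names it serves
    depends_on:   server -> set of servers that serve this server's own name
    exploitable:  set of servers running software with a known exploit
    """
    tainted = set(exploitable)
    changed = True
    while changed:  # propagate taint along dependencies until nothing changes
        changed = False
        for server, deps in depends_on.items():
            if server not in tainted and tainted & deps:
                tainted.add(server)
                changed = True
    total_names = sum(names_served.values())
    affected_names = sum(n for s, n in names_served.items() if s in tainted)
    return len(exploitable) / len(names_served), affected_names / total_names


# First example: A serves 10 names, B serves 90, B is exploitable.
two_servers = affected_share({"A": 10, "B": 90}, {}, {"B"})

# Second example: C serves B's name, and only C is exploitable.
three_servers = affected_share({"A": 10, "B": 89, "C": 1}, {"B": {"C"}}, {"C"})
```

In the second call, B is never exploited directly; it is counted because its name resolves through C.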
Won't DNSSEC deployment fix the security problems in DNS?
No. DNSSEC is better than nothing, but it's not a complete fix to the DNS security problems. In our SIGCOMM'04 and IMC'05 papers, we identified two separate but intertwined problems about the shape of the DNS dependency graph. One problem is that there are many nameservers with known exploits which allow scripted attacks against nameservers, while another problem is that the name delegation graph has small graph cuts which allow denial-of-service attacks. Clever attackers can combine these two types of attacks to extend their reach. DNSSEC addresses the former problem, but it does nothing to protect against denial-of-service attacks against DNS servers.
Don't glue records keep client resolvers from having to explore large portions of the name delegation graph, and therefore avoid the problems you mention?
The short answer is no, the glue does not eliminate the dependencies in delegation graphs, and will not by itself compensate for vulnerabilities.
Glue records are non-authoritative IP address bindings (A records) that are provided for the nameservers (NS records) that appear in a DNS response. In essence, a nameserver (let's call it DNS0.tld) providing glue to a client C says "The authoritative name server for the host you are looking for (www.SITE.tld) is DNS1.tld. I don't know SITE's IP address, you should ask DNS1 instead. And by the way, DNS1 is at the IP address 1.2.3.4." The last statement is made by virtue of the glue record.
DNS1's name is typically delegated to some other nameserver, DNS2.tld. DNS0, by providing the address of DNS1, makes it unnecessary for client C to consult DNS2. (Incidentally, the client should be wary of trusting glue records unconditionally, as they are non-authoritative. A well-known cache poisoning attack (documented here) works by tricking clients into believing glue records for all time and for all queries. Glue should be trusted only for the lookup in question and only for the duration of that lookup.)
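As a rough illustration of trusting glue for one lookup only, the sketch below models a referral and a resolver that uses the glue address to reach DNS1 but never stores it. The `Referral` and `Resolver` classes, the `ask` callback, and the answer address 192.0.2.10 are all hypothetical stand-ins, not a real resolver API.

```python
from dataclasses import dataclass, field


@dataclass
class Referral:
    """Hypothetical model of DNS0's referral: an NS name plus glue."""
    ns_name: str
    glue: dict = field(default_factory=dict)  # NS name -> IP, non-authoritative


class Resolver:
    def __init__(self):
        # Holds only answers returned by the server we were referred to.
        self.cache = {}

    def resolve(self, name, referral, ask):
        # Use the glue address for this lookup only; never store it in the cache.
        ns_ip = referral.glue.get(referral.ns_name)
        if ns_ip is None:
            raise LookupError(f"no glue for {referral.ns_name}")
        answer = ask(ns_ip, name)  # `ask` stands in for a real network query
        self.cache[name] = answer
        return answer


def fake_ask(ns_ip, name):
    # Pretend DNS1 (reached at the glue address) answers for www.SITE.tld.
    return "192.0.2.10"


referral = Referral("DNS1.tld", {"DNS1.tld": "1.2.3.4"})
resolver = Resolver()
site_ip = resolver.resolve("www.SITE.tld", referral, fake_ask)
```

After the call, the cache holds the answer for www.SITE.tld but no binding for DNS1.tld, so a later lookup cannot be steered by stale or poisoned glue from this one.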
A resolver that trusts glue records can, indeed, avoid independently exploring the portion of the name delegation graph that lies behind DNS1. To a naive observer, it may seem as if the problem is solved. But the place where poisoning takes place merely got moved out of the client resolvers.
So why doesn't glue by itself solve the problem?
The information in the glue record has to come from somewhere. It is true that the client does not have to discover it independently, thanks to the glue. But the glue provider, DNS0, had to acquire the binding somehow, initially as well as periodically as it expires.
There are only three ways in which a nameserver can acquire records that serve as glue:
Glue-chasing nameservers are completely vulnerable to problems of transitive trust. In fact, attacks are even easier to launch against glue-chasing nameservers, as the clever attacker need only launch her attack when glue is about to expire. She can time her attack, DoS any non-compromisable nameservers that serve DNS1's name, force DNS0 to ask a nameserver she has compromised about DNS1's binding, and thus extend her reach to all clients that consult DNS0 for names served by DNS1, without breaking into DNS1.
An attacker who has compromised DNS2 via a remote exploit can poison DNS0 unless the glue is transferred via a zone transfer, the zone transfer is signed, and the signing keys are kept offline. That's three separate conditions, and all of them must hold. If the keys are kept online, for instance, an attacker who has broken into DNS2 can simply modify the database to change the IP address for DNS1, sign it and pass it on to DNS0, which will serve poisoned glue and allow the attacker to hijack SITE, without having to break into DNS1.
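The three conditions combine as a plain conjunction, which a minimal sketch makes explicit. The function and parameter names here are assumptions chosen for illustration.

```python
def glue_can_be_poisoned(via_zone_transfer: bool,
                         transfer_signed: bool,
                         keys_offline: bool) -> bool:
    """True if an attacker who controls DNS2 can feed bad glue to DNS0.

    Poisoning is ruled out only when all three safeguards hold at once.
    """
    return not (via_zone_transfer and transfer_signed and keys_offline)


# Signed zone transfers with the signing keys left online: still poisonable.
keys_online_case = glue_can_be_poisoned(True, True, False)

# All three safeguards in place.
fully_protected_case = glue_can_be_poisoned(True, True, True)
```

Of the eight possible configurations, only one leaves DNS0 safe from a compromised DNS2.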
With glue-chasing disabled, signed zone transfers, and offline keys, it is possible for there to be vulnerable paths in the delegation graph that are never exercised. External observers of DNS servers cannot tell whether zone transfers are used, whether the records are signed, or whether the keys are offline, though an attacker who breaks into nameservers can find out easily. Readers can decide what the chances are that someone who is running a version of BIND with a known exploit has correctly set up signed zone transfers with offline keys. We conservatively consider all names downstream of such hosts potentially affected.
So, no, glue is not a panacea.
You seem to imply that nameservers in the .EDU domain which play a large role in name delegation graphs are dangerous. I know the folks who operate the servers at X.EDU and they do a terrific job!
Being at an educational domain ourselves, we also know some of the same folks and realize first-hand how hard they work under competing time pressures. The issue is not that educational nameservers are more vulnerable. It's that such nameservers should not play a large role in the resolution of unaffiliated names. An educational institution (say, University of Oregon) has no fiduciary responsibility to people who own DNS names (say, in Ukraine), yet may well be in a position to control large sections of that namespace (University of Oregon appears in the dependency graph of all names in the Ukrainian namespace). This creates two problems: educational nameservers become more prominent targets because they play a large role in DNS dependency graphs, and they pose a legal liability for the university should a nameserver get compromised.
Your survey examined BIND version numbers as reported by the nameservers. The nameservers might be reporting incorrect version numbers.
True, we did not break into the nameservers to verify the presence of exploits, as that is illegal. The default behavior for BIND is to truthfully report version numbers. While it is possible for a production nameserver to pretend to have a flaw when, in fact, it does not, this requires extra effort and makes little sense. Perhaps some of the ~27000 nameservers we identified with known exploits are honeypots; chances are small that all of them are.
We, an inside group of DNS system administrators, knew about these problems already.
Let's suppose such a cabal existed, knew about the problems but did not take steps to address them (which amounts to criminal negligence, but no matter). It isn't sufficient for some people to be aware of potential problems with transitive trust in DNS. The architecture of DNS implies that the namesystem will not be secure until all administrators are aware of these problems and take active steps to avoid them.
Your CoDoNS system proposes to use a peer-to-peer distributed hash table (DHT) to serve DNS. Why would a DHT make sense for serving DNS?
DNS is already a large distributed hash table, albeit with poor failure resilience against DoS attacks, slow lookup performance, and no support for unplanned record updates. It requires substantial manual effort to create a secure namespace. Administering DNS is not only difficult and expensive, but manual administration can lead to inconsistencies and errors. These are not surprising, given that DNS was designed over 25 years ago when we did not know much about building failure-resilient, high performance, self-organizing distributed systems. We now know how to do better, and the time is ripe for rethinking the architecture of the naming system.