Lecture notes by Tom Roeder
Methods for authenticating people differ significantly from those for authenticating machines and programs, and this is because of the major differences in the capabilities of people versus computers. Computers are great at doing large calculations quickly and correctly, and they have large memories into which they can store and later retrieve Gigabytes of information. Humans don't. So we need to use different methods to authenticate people. In particular, the cryptographic protocols we've already discussed are not well suited if the principal being authenticated is a person (with all the associated limitations).
All approaches for human authentication rely on at least one of the following:
The idea here is that you know a secret --- often called a password --- that nobody else does. Thus, knowledge of a secret distinguishes you from all other individuals. And the authentication system simply needs to check to see if the person claiming to be you knows the secret.
Unfortunately, use of secrets is not a panacea. If the secret is entered at some sort of keyboard, an eavesdropper ("shoulder surfing") might see the secret being typed. For authenticating machines, we used challenge/response protocols to avoid sending a secret (key) over the wire where it could be intercepted by a wiretapper. But we can't force humans to engage in a challenge/response protocol on their own, because people cannot be expected to do cryptographic calculations.
Furthermore, people will tend to choose passwords that are easy to remember, which usually means that the password is easy to guess. Or they choose passwords that are difficult to guess but are also difficult to remember (so the passwords must be written down and then are easy for an attacker to find).
Even if a password is not trivial to guess, it might succumb to an offline search of the password space. An offline search needs some way to check a guess without using the system itself, and some methods used today for storing passwords do provide such a way. (See below.)
Finally, changing a password requires human intervention. Thus, compromised passwords could remain valid for longer than is desirable. And there must be some mechanism for resetting the password (because passwords will get forgotten and compromised). This mechanism could itself be vulnerable to social-engineering attacks, which rely on convincing a human with the authority to change or access information that it is necessary to do so.
With all these concerns about passwords, you might wonder what is required for a password to be considered a good one. There are three dimensions, and they interact so that strengthening one can be used to offset a weakness in another.
When passwords are used for authenticating a user, the system must have a way to check whether the password entered is valid. Simply storing a file with the list of usernames and associated passwords, however, is a bad idea because if the confidentiality of this file were ever compromised all would be lost. (Similarly, backup copies of this file would have to be afforded the same level of protection, since people rarely ever change their passwords.) Better not to store actual passwords on-line. So instead we might compute a cryptographic hash of the password, and store that. Now, the user enters a password; the system computes a hash of that password; and the system then compares that hash with what has been stored in the password file.
Even when password hashes instead of actual passwords are what is being stored, the integrity of this file of hashes must still be protected. Otherwise an attacker could insert a different hash (for a password the attacker knows) and log into the system using that new password.
The problem with having a password file that is not confidential --- even if cryptographic hashes are what is being stored --- is the possibility of offline dictionary attacks. Here, the attacker computes the hash of every word in some dictionary and then compares each hash with the stored password hashes. If any match, the attacker has learned a password. An alternative to confidentiality for defending against offline dictionary attacks is use of salt. Salt is a random number that is associated with a user and is added to that user's password when the hash is computed. With high probability, a given pair of users will not have the same salt value. And the system stores both h(password + salt) and the salt for each account.
Salt does not make it more difficult for an attacker to guess the password for a given account, since the salt for each account is stored in the clear. What salt does, however, is make it harder for the attacker to perpetrate an offline dictionary attack against all users. When salt is used, all the words in the dictionary would have to be rehashed for every user. What formerly could be seen as a "wholesale" attack has been transformed into a "retail" one.
Salt is used in most UNIX implementations. The salt in early versions of UNIX was 12 bits, and it was formed from the system time and the process identifier when an account is created. Unfortunately, 12 bits is hopelessly small, nowadays. Even an old PC can perform 13,000 crypt/sec, which means such a PC so can hash a 20k word dictionary with every possible value of a 12 bit salt in 1 hour.
Another defense against offline dictionary attacks is to use secret salt (invented by Manber and independently by Abadi and Needham). In this scheme, we select a small set of possible "secret salt" values from a large space. The password file then stores for each user: userid, h(password, public salt, secret salt), public salt. Note that the value of the secret salt used in computing the hash is not saved anyplace. When secret salt is being employed, a user login involves having the system guess the value of secret salt that was used in computing the stored, hashed password; the guess involves checking through the possible secret salt values. The effect is to make computing a hashed password very expensive for attackers.
We now outline several widely-used password systems.
Note that if upper and lower case were both allowed, then there would be (2 x 26) + 10 = 62 possible characters and thus 627 = 3,512,614,606,208 possible values, which is 100 times greater than the LanMan value.
The NT hash is somewhat better. In the NT operating system, there was still a 14 character limit, although this limit was removed in Windows 2000 and XP. The password is then passed through 48 iterations of MD4 to get a 128 bit hash. This hash is stored in the system, but no salt is used at all.
Given schemes that make passwords hard to guess, an attacker might be tempted to try theft. The attack is: install some some sort of program to produce a window that resembles a login prompt or otherwise invites the user to reveal a password. Users will then type their passwords into this program, where the password is saved for later use by the attacker.
How can you defend against such attacks? What we would like is some way for a user to determine the pedigree of any window purporting to be a login prompt. If each point in the pedigree is trusted, then the login prompt window must be trusted and it is safe to enter a password. This idea is called a trusted path.
To implement a trusted path, the keyboard driver recognizes a certain key sequence (Ctl-Alt-Del in Windows) and always then transfers control to some trusted software that displays a (password prompt) window and reads the contents. Users are educated to type passwords only into windows that appear after typing that special key sequence.
Notice, however, that this scheme requires that a trusted keyboard driver is executing. So, that means the system must be running an operating system that is trusted to prevent keyboard driver substitutions. One might expect that rebooting the machine would be a way to ensure that a trusted operating system is executing (presuming you trust whatever operating system is installed), but what if the OS image on the disk had been altered by an attacker? So, one must be certain that the operating system software stored on the disk has not been modified, too. But even that's not enough. What about the boot loader, which might have been altered to read a boot block from a non-standard location on the disk? And so it goes. Even if you start each session by booting from your own fresh OS CD, a ROM or even the hardware might have been hacked by an attacker. Physical security of the hardware then must also have been maintained. In the end, though, to the extent that you can trust all layers from the hardware to the keyboard driver, the resulting trusted path provides a way to defend against attacks implemented by programs that attempt to steal passwords by spoofing.
Instead of basing authentication on something a principal knows and can forget, maybe we should base it on something the principal has. Various token/card technologies support authentication along these lines. For all, 2-factor authentication becomes important --- an authentication process that involves 2 independent means of authenticating the principal. So, we might require that a principal not only possess a device but also know some secret password (often known as a PIN, or personal identification number). Without 2-factor authentication, stealing the device would allow an attacker to impersonate the owner of the device; with 2-factor authentication, the attacker would still have another authentication burden to overcome.
Here are examples of technologies for authentication based on something a principal might possess:
Short PINs are problematic. First, they admit guessing attacks. Banks defend against this by limiting the number of guesses before they will confiscate the card. Second there is the matter of how to check if a PIN that has been entered is the correct one. Storing the PIN on the card's magnetic stripe is not a good idea because a thief who steals the card can easily determine the associated PIN (and then subvert the 2-factor authentication protocol). Storing an encrypted copy of the PIN on the card's magnetic stripe does not exhibit this vulnerability, though.
There are two types of RF proximity cards: passive and active. The former is not powered, and use the RF energy from the requester to reply with whatever information is being stored by the card. The latter is powered and broadcasts information, allowing anyone who is in range and has a receiver to query the card. You could imagine that if RF tags are put into passports, then some people might start carrying them in special Faraday-cage passport holders, because now an interloper can learn about someone without the victim's knowledge (or permission).
One prevalent form of smartcard is the RSA secure id. It continuously displays encrypted time; and each RSA secure id encrypts with a different key. Whoever has an RSA secure id card responds to server challenges by typing the encrypted time (so, in effect, it is secret) --- a server, knowing what key is associated with each user's card, can then authenticate a user. (The server must be somewhat generous with respect to what times it will accept. Accept too many and replay attacks become possible; accept too few and message delivery delays and execution times prevent people from authenticating themselves).
Since people forget things and lose things, one might contemplate basing an authentication scheme for humans on something that a person is. After all, we recognize people we interact with not because of some password protocol but because of how they look or how they sound --- "something they are". Authentication based on "something you are" will employ behavioral and physiological characteristics of the principal. These characteristics must be easily measured accurately and preferably are things that are difficult to spoof. For example, we might use
Methods to subvert a fingerprint reader give some indication of the difficulties of deploying unsupervised biometric sensors as the sole means of authenticating humans. Attacks include:
There are several well known problems with biometric-based authentication schemes:
The literature on biometric authentication uses the following vocabulary to characterize what a scheme does and how well it works:
Having looked at all these methods for authentication, we can see that as a secondary form of authentication (but not identification!) biometrics might be promising. The most likely form of authentication in the future, however, will be a combination of something you have and something you know. Passwords will be around for a long time yet.