Lecture notes by
Borislav Deianov
Authentication is the process by which the identity of someone or something gets established. We distinguish between two cases -- machines and humans -- because the capabilities of these two classes are quite different. Humans are not good at memorizing things or calculating. (Humans are also cranky, have a narrow operating temperature range, and are expensive to maintain). Machines have large memories and greater computational prowess, but have a different set of weaknesses.
All approaches for authenticating humans are based on three general techniques:
Passwords are a form of secret. The consequence of equating identity with knowledge of a secret is that if someone learns the secret then they can steal the identity. With this in mind, we see that password-based authentication exhibits the following vulnerabilities:
There are two kinds of password-guessing attacks -- online and offline. We consider them separately.
In an online attack, the system itself is used to determine whether guesses are right. To protect against online attacks, we can slow down the rate of guessing or allow only a limited number of guesses (example: 3 wrong guesses of a PIN and the ATM eats your card). Allowing only a limited number of guesses before an account is locked-out has the disadvantage that it enables a denial of service attack. Specifically, the attacker locks legitimate users out of the system by trying to guess those users' passwords. We need to find a compromise between two possible evils: Evil 1 is when somebody, who should not be able to, accesses the system; evil 2 is when somebody, who should be able to access the system, cannot.
In an offline attack, some other method is used to verify the correctness of a guess (not the system itself) before that guess is actually used in attacking the system. The form of attack will depend on how passwords are stored.
One simple approach is to store username/password pairs in a file. To penetrate such a system, we only need to read the password file. Therefore the password file must not be readable. But now the security of the entire system depends on the protection of this one file, including protection of back-up tapes that might contain the file, etc. This scheme is unsatisfactory.
A better idea is to store only encrypted passwords. We store a file containing username/encrypted password pairs. On login, the typed password is encrypted with a cryptographic hash function and compared with what is stored password in the file. A cryptographic hash function h has the following properties:
Storing encrypted passwords is still vulnerable to what is called a dictionary attack. Suppose that the file with encrypted passwords is readable. If the attacker knows what hash function was used, then he/she can hash every word in a dictionary and compare the result to the file with encrypted passwords, thus finding all passwords that are words from the dictionary.
To defend against dictionary attacks we can use salt. "Salt" in cryptography is random stuff you add to plaintext before encrypting. Now in the password file we store: username, rrrr, h(password + rrrr). Here rrrr is the salt. And the dictionary attack no longer works, since the attacker will need to hash every word with every possible salt. For example, UNIX uses 12 bits of salt, thus making a dictionary attack 2048 times more difficult than without the salt. However, the PC on Boris's desktop can perform approximately 13,000 crypt()'s per second. This translates to hashing a 20,000 word dictionary with every 12-bit salt combination in under an hour. An offline attack is quite plausible now, since people don't change their passwords every hour. The obvious solutions are to use a more (longer) salt or have the system refuse crackable passwords.
How long should a password be? Today, guessing a 64 bit random bit string is considered intractably hard. How many characters of password is equivalent to 64 bits? If you use upper and lower case letters, and 10 digits you get approximately 6 bits per character. That suggests strings of 13 characters should work. Unfortunately, people can't remember arbitrary 13-character sequences. English words -- which are far from arbitrary sequences -- carry 1.2 bits of information per character, on average. This translates into 49 characters or 10 words of English text to get the desired 64 bits.