Passwords, part 2

Stage 1: Create

Who creates the password?

The user: This is the normal case. Unfortunately, users tend to create weak passwords.
The system: A strong password is generated by the system and assigned to the user. Unfortunately, these passwords are rarely memorable, making them inconvenient to use. Users might resort to writing them down in places near their machines.
The administrator: Administrators must either generate the password or select it manually. Either way, we're back in one of the cases above.

Here are the top five examples of weak passwords chosen by users in 2012:

password
123456
12345678
abc123
qwerty

Those are consistent with older password hacks. For example, in 2010, Gawker Media (the parent of several big blog sites) was hacked. Of 250,000 disclosed passwords, about 1% were "123456" and another 1% were "password".

All this raises the question: how can we characterize "strong" passwords? They need to be passwords that are hard for attackers to guess. It turns out we already have such a characterization from our study of cryptography. Recall that the security level of an algorithm is the base-2 logarithm of the maximum number of guesses required to break the algorithm by brute-force attack. When we talked about encryption schemes, the guesses were to find the key, and we implicitly assumed that keys were chosen uniformly at random from the space of all keys. For example, a 128-bit key is from a space that requires 2^128 guesses to search exhaustively, so its security level is 128 bits.

Using entropy to measure password strength. We can use the idea of the number of guesses required for brute force search for passwords. But passwords aren't bit strings; they're character strings. That makes the math a little more complicated. Suppose there are N characters to choose from, and the password is of length L. Then there are N^L possible passwords. We want to find the security level H of that space. That is, we want an H such that 2^H is equal to the number of possible passwords. (Why use the letter H? Because the concept we're describing is known in the field of Information Theory as Shannon entropy, for which the letter H is traditionally used. And from now on, we'll write "entropy" instead of "security level" when we're talking about passwords.) Let's solve for that H:

      N^L = 2^H
  log N^L = log 2^H
  L log N = H log 2
        H = L (log N / log 2)
        H = L log_2 N

So if passwords are chosen uniformly at random from the lower-case Latin alphabet of 26 characters, the entropy of an 8-character password is 8 lg 26 ≈ 37.6 bits (writing lg for log_2). That's very low compared to the minimum security level for keys! Is it enough? According to a 2006 NIST report, the minimum level is 14 bits, and 30 is comfortable. But that guidance assumes an online attack model, in which attackers interactively guess passwords against the live system. In an offline attack, in which attackers have direct access to the password database, a higher level of security is necessary.
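To make the formula H = L lg N concrete, here is a minimal Python sketch. The function names and the two guess rates are our own assumptions, chosen only to illustrate the gap between online and offline attacks; they are not taken from the NIST report.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy H = L * log2(N) of a password chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def seconds_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Worst-case time to try all 2**bits candidates at a fixed guess rate."""
    return 2 ** bits / guesses_per_second


h = entropy_bits(26, 8)
print(f"{h:.1f} bits")  # 37.6 bits

# Hypothetical guess rates, for illustration only.
ONLINE_RATE = 10      # guesses/second against a rate-limited server
OFFLINE_RATE = 1e9    # guesses/second against a stolen password database

print(f"online:  {seconds_to_exhaust(h, ONLINE_RATE):.3g} seconds")
print(f"offline: {seconds_to_exhaust(h, OFFLINE_RATE):.3g} seconds")
```

Under these assumed rates the same 37.6 bits lasts eight orders of magnitude longer online than offline, which is why the acceptable entropy depends on the attack model.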

The previous paragraph began by assuming that passwords are chosen uniformly at random from the space of all passwords: for example, that the password is just as likely to be "iZ8#j" as "12345". But humans just don't choose randomly. So the entropy of human-chosen passwords is effectively much less than it would be if the passwords were chosen by a machine. Suppose, e.g., that the average high-school graduate has a vocabulary of around 50,000 words [Nagy and Anderson; Pinker "The Language Instinct"]. What if this person chooses an English word as a password? There will be lg 50,000 ≈ 15.6 bits of entropy. That's low! And it assumes that users choose uniformly at random over their entire vocabulary, which isn't likely either.
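We can see the effect of non-uniform choice by computing Shannon entropy directly, H = -Σ p lg p, for a skewed distribution. The sketch below is illustrative only: the skew (half of all users picking one of the 100 most popular words) is a made-up assumption, not measured data.

```python
import math


def shannon_entropy(probabilities):
    """H = -sum(p * log2(p)) over outcomes with nonzero probability."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


VOCAB = 50_000   # vocabulary size used in the text
POPULAR = 100    # hypothetical number of "favorite" words

# Every word equally likely.
uniform = [1 / VOCAB] * VOCAB

# Hypothetical skew: half the probability mass sits on the popular words,
# and the other half is spread evenly over the rest of the vocabulary.
skewed = ([0.5 / POPULAR] * POPULAR
          + [0.5 / (VOCAB - POPULAR)] * (VOCAB - POPULAR))

print(f"uniform: {shannon_entropy(uniform):.1f} bits")  # 15.6 bits
print(f"skewed:  {shannon_entropy(skewed):.1f} bits")   # 12.1 bits
```

Even this mild skew costs several bits, and an attacker who tries the popular words first succeeds half the time within 100 guesses.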

The aforementioned NIST report uses the following heuristic for the entropy of user-selected passwords drawn from the full keyboard:

The entropy of the first character is 4 bits.
The entropy of the next 7 chars is 2 bits per character.
The entropy of the characters 9..20 is 1.5 bits per character.
The entropy of characters 21+ is 1 bit per character.
If the user is forced to use both upper-case and non-alphabetic characters, give a flat bonus of 6 bits of entropy.
Give a bonus of 0..6 bits, usually about 4, for checking against a dictionary. This check must prevent simple transformations of any word in an unabridged English dictionary of at least 50,000 words.
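The heuristic above is easy to mechanize. The following is a minimal sketch that follows the rules exactly as listed in these notes; the function and parameter names are our own, and the caller supplies the dictionary bonus, since the heuristic gives only a range for it.

```python
def nist_entropy_estimate(length, composition_bonus=False, dictionary_bonus=0.0):
    """Estimate the entropy (in bits) of a user-chosen password.

    length            -- number of characters in the password
    composition_bonus -- True if upper-case and non-alphabetic characters are forced
    dictionary_bonus  -- 0..6 bits awarded for dictionary checking
    """
    if not 0 <= dictionary_bonus <= 6:
        raise ValueError("dictionary bonus must be between 0 and 6 bits")
    if length < 1:
        return 0.0

    bits = 4.0                                   # first character
    bits += 2.0 * max(0, min(length, 8) - 1)     # characters 2..8
    bits += 1.5 * max(0, min(length, 20) - 8)    # characters 9..20
    bits += 1.0 * max(0, length - 20)            # characters 21 and up
    if composition_bonus:
        bits += 6.0
    return bits + dictionary_bonus


print(nist_entropy_estimate(8))                                            # 18.0
print(nist_entropy_estimate(8, composition_bonus=True, dictionary_bonus=4))  # 28.0
```

Compare the 18 bits estimated for a user-chosen 8-character password with the 37.6 bits of a randomly chosen 8-character lower-case password.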

Other heuristics have been proposed, summarized in Schneider and in Bishop. "Simple transformations" above could include deleting vowels, capitalizing some letters, adding suffixes or prefixes, replacing letters with look-alike numbers (leet speak), and more.
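A dictionary check that resists such transformations might look roughly like the sketch below. It is a toy under stated assumptions: the look-alike table, the set of affix characters, and all function names are ours, and a real checker would cover many more transformations and use a full dictionary.

```python
import re
import string

# Assumed look-alike substitutions; real leet speak has many more.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a",
                            "5": "s", "7": "t", "@": "a", "$": "s", "!": "i"})
# Characters commonly added as prefixes or suffixes.
AFFIX_CHARS = string.digits + string.punctuation


def strip_vowels(word):
    """Return the word with all lower-case vowels deleted."""
    return re.sub(r"[aeiou]", "", word)


def candidate_words(password):
    """Undo capitalization, affixes, and look-alike substitutions."""
    lowered = password.lower()
    forms = {lowered, lowered.strip(AFFIX_CHARS)}
    forms |= {f.translate(LOOKALIKES) for f in forms}
    forms |= {f.strip(AFFIX_CHARS) for f in forms}
    return forms


def is_simple_transformation(password, dictionary):
    """True if the password reduces to a dictionary word or its vowel-free form."""
    words = {w.lower() for w in dictionary}
    devoweled = {strip_vowels(w) for w in words}
    return any(f and (f in words or f in devoweled)
               for f in candidate_words(password))


toy_dictionary = {"password", "monkey", "dragon"}  # stand-in for a 50,000-word list
print(is_simple_transformation("P@ssw0rd1", toy_dictionary))  # True
print(is_simple_transformation("psswrd", toy_dictionary))     # True
print(is_simple_transformation("iZ8#j", toy_dictionary))      # False
```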

Beyond entropy. Weir et al. (2010) show experimentally that the NIST entropy estimates don't do a good job of predicting how long it will take attackers to crack passwords. Kelley et al. (2012) show that, despite the Weir et al. result, passwords chosen according to the most comprehensive NIST requirements (mixtures of character kinds, no dictionary words, sufficiently long, etc.) are indeed the passwords that are hardest to crack; call these comprehensive passwords. So the NIST recommendations reach the right conclusion, even if the metric they use isn't valid. But comprehensive passwords are hard to remember and hated by users, leading them to reuse passwords or modify them in predictable ways. Could we do better? Here are three options that have been explored:

Mandatory randomness: The user chooses part of the password (perhaps poorly); the system chooses the other part of the password (randomly). The system can mitigate weak choices by users, but there is a danger that users begin choosing even weaker passwords in reaction to the mandated randomness.
Passphrases: Users choose long passwords but without any requirements as to the kinds of characters used. Kelley et al. show that passphrases are quite close in difficulty to comprehensive passwords for cracking, and Komanduri et al. (2011) show that passphrases are easier for humans to create and remember.

Password wallets or managers: Users store their passwords in an electronic wallet, which they open with a single master password. This technique enables users to have many high-strength passwords while having to remember only one.
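The first two options can be sketched in a few lines using Python's `secrets` module, which draws from a cryptographically strong random source. The word list, the suffix alphabet, and the function names below are hypothetical and chosen only for illustration; in particular, the passphrases here are system-generated rather than user-chosen.

```python
import secrets
import string

# Hypothetical tiny word list; a real system would use a much larger one.
WORDS = ["correct", "horse", "battery", "staple",
         "orbit", "maple", "river", "candle"]


def random_passphrase(words, count=4):
    """Concatenate `count` words chosen uniformly at random from `words`."""
    return "".join(secrets.choice(words) for _ in range(count))


def add_mandatory_randomness(user_part, random_chars=4):
    """Append system-chosen random characters to the user's chosen part."""
    alphabet = string.ascii_lowercase + string.digits  # 36 symbols
    suffix = "".join(secrets.choice(alphabet) for _ in range(random_chars))
    return user_part + suffix


print(random_passphrase(WORDS))
print(add_mandatory_randomness("fluffy"))
```

With these assumed parameters the system-chosen suffix contributes 4 lg 36 ≈ 20.7 bits no matter how weak the user's part is.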

Beyond passwords

Could we replace passwords with a different authentication mechanism? Bonneau et al. (2012) develop criteria against which to judge proposed new mechanisms:

Security:
- Physical observation: shoulder surfing, video recording, sound recording, thermal imaging
- Targeted impersonation: acquaintance or skilled investigator
- Online guessing: server constrains per-user rate of guess attempts
- Offline guessing: attacker's computational resources constrained
- Internal observation: attacker compromises channels, even any crypto on those channels (keyboard, SSL)
- Leaks: compromise at one account doesn't affect others
- Phishing: simulation of real server doesn't yield credentials that can be used against the real server
- Theft: physical object can't be used by another user
- Trusted third party: none
- Privacy: explicit consent, unlinkable
Usability:
- Memoryless: humans don't have to remember secrets
- Scalable for users: any burden should scale to hundreds of accounts
- Nothing to carry: no hardware required, or at least not hardware user doesn't already always carry
- Physically effortless: no typing or physical motions, or at least not beyond pushing a button or speaking
- Easy to learn: no training or reminding
- Efficient: time to authenticate is short, and time to enroll is at least reasonable
- Infrequent errors: low false reject rate
- Easy recovery from loss: including latency, convenience, and assured recovery
Deployability:
- Accessible: physical disabilities and conditions don't prevent use
- Cost: negligible per user, or at least plausible for startups with no per-user revenue stream
- Server compatible: nothing special required on server/verifier end
- Browser compatible: doesn't require (non-standard) plugins
- Mature: well tested and fielded beyond research
- Non-proprietary: published openly and not encumbered by IP

Evaluating many proposed schemes for replacing passwords, Bonneau et al. conclude that though they generally offer better security, they tend to offer worse deployability, and usability is sometimes better and sometimes worse. It seems that passwords are here to stay, at least for now. Bonneau et al. observe that most of the schemes that compare favorably to passwords involve single sign on.

Single sign on

With single sign on (SSO), a user enrolls with many service providers (SPs) and shares authentication secrets (e.g., a password) with each SP, but authenticates only once to the SSO service. Thereafter, the SSO manages authentication. Note that the SSO can trivially impersonate the user, so the SSO has to be trusted.

Variants of SSO include true SSO, in which the SSO does authentication and the SPs simply trust the SSO when it asserts the identity of a user, and pseudo SSO, in which the SSO impersonates the user to the SP through the SP's own native authentication mechanism. Either way, the SSO could be local to the user's machine or could be running as a remote or proxy service.
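The difference between the two variants can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the class and method names are invented, the assertion is just an HMAC tag under a key shared between the SSO and the SP, and there is no freshness or replay protection. Deployed systems are considerably more elaborate.

```python
import hashlib
import hmac


class ServiceProvider:
    """Toy SP with native password login that also trusts one SSO key."""

    def __init__(self, sso_key):
        self._sso_key = sso_key
        self._passwords = {}  # toy only: a real SP would store salted hashes

    def enroll(self, user, password):
        self._passwords[user] = password

    def native_login(self, user, password):
        return user in self._passwords and self._passwords[user] == password

    def accept_assertion(self, user, tag):
        """True SSO: believe the SSO's claim if its tag verifies."""
        expected = hmac.new(self._sso_key, user.encode(), hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)


class SingleSignOn:
    """Toy SSO service supporting both variants."""

    def __init__(self, sso_key):
        self._sso_key = sso_key
        self._logged_in = set()
        self._wallet = {}  # (user, SP object id) -> password, for pseudo SSO

    def login(self, user):
        self._logged_in.add(user)  # stands in for the one real authentication

    def store(self, user, sp, password):
        self._wallet[(user, id(sp))] = password

    def true_sso(self, user, sp):
        """Assert the user's identity; the SP never sees a password."""
        if user not in self._logged_in:
            return False
        tag = hmac.new(self._sso_key, user.encode(), hashlib.sha256).digest()
        return sp.accept_assertion(user, tag)

    def pseudo_sso(self, user, sp):
        """Replay the stored password through the SP's native login."""
        if user not in self._logged_in:
            return False
        return sp.native_login(user, self._wallet[(user, id(sp))])


key = b"hypothetical key shared by SSO and SP"
sp = ServiceProvider(key)
sso = SingleSignOn(key)
sp.enroll("alice", "s3cret")
sso.store("alice", sp, "s3cret")
sso.login("alice")
print(sso.true_sso("alice", sp), sso.pseudo_sso("alice", sp))  # True True
```

In both paths the SSO holds everything needed to act as alice, which is the sense in which it must be trusted.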

Password managers are an example of a typically local pseudo SSO offering a limited degree of automation. Browsers that remember passwords and synch them across machines are an example of something approaching a proxy pseudo SSO. Examples of proxy true SSOs include Kerberos and third-party authentication by Google/Facebook credentials. Local true SSOs are harder to exemplify, as they necessitate the remote SP trusting the user's machine not to lie about the user's identity; a trusted cryptographic co-processor might be needed here to ensure that the user cannot subvert the local SSO.

Exercises

A user is required to choose a 4-digit PIN. The allowed digits are 0..9. Assume the user chooses the PIN randomly. What is the entropy of such a PIN?
Continuing the previous exercise, the user is now required to enter their 4-digit PIN on an unusual keypad with five buttons, each of which is labeled with two digits:
```
+---+---+---+---+---+
|1*2|3*4|5*6|7*8|9*0|
+---+---+---+---+---+
```
To enter either a "1" or a "2" on this keypad, the user presses the "1*2" button only one time. Hence, the system cannot distinguish between "1" and "2". The same is true for each of the other pairs of digits.

What is the entropy of a randomly-selected 4-digit PIN as it would be entered on this keypad?
Let X be such that an X-digit PIN chosen randomly from digits 0..9 has entropy equal to that of a 4-digit PIN chosen randomly from the keypad in the previous exercise. Determine what X is to the nearest integer.
According to the NIST SP 800-63 (2008) heuristics, what is the entropy of a 10-character password chosen (non-randomly) by a user from a standard US keyboard? Assume the user isn't forced to use any upper-case or non-alphabetic characters, and that no dictionary checking is done.
Which of the following policies will produce the highest-strength passwords? Which policy do you think will produce passwords that are easiest to remember? Use the NIST heuristics to evaluate policy 2.
- Policy 1: Users are assigned randomly-generated 6-character passwords, where each character is a lower-case Latin letter (i.e., a-z).
- Policy 2: Users choose their own passwords, which must be at least 12 characters long, where each character may be any character from the full keyboard. Users are not required to use any upper-case or non-alphabetic characters, and no dictionary checking need be done.
- Policy 3: Users are assigned randomly-generated passphrases, where each passphrase is the concatenation of four words randomly chosen from a system dictionary of 2,000 very common words (e.g., "correcthorsebatterystaple").
Consider this claim: "Policies that require user-chosen passwords to include upper-case and non-alphabetic characters are not useful, because they do not make passwords harder to guess: once the attacker learns the policy, she can adjust her guessing strategy accordingly." Evaluate that claim.
Choose any three of the potential replacements for passwords discussed in Bonneau et al. (2012). Analyze each replacement against the criteria of security, usability, and deployability. Do you agree with the assessment made by Bonneau et al. in their Table I?