I recently completed a project requiring a few thousand pre-set-up user accounts. For their passwords, I decided to implement XKCD-style passwords instead of the usual collection of random characters.
I specifically wanted to avoid letting the users choose their own passwords. They would probably use the same password as a related, sensitive system. Keeping the two systems isolated was desirable.
I used a fairly complete dictionary that I pulled from a mailing list (the exact URL eludes me, unfortunately).
Dictionaries contain a lot of offensive words; we’d prefer not to use them for passwords. The line for what counts as ‘offensive’ is fuzzy, though. The plural of ‘ball’, ‘balls’, can be offensive when combined with the right (or wrong) modifiers.
What is offensive varies across cultures. For example, the Brits have a lot of words which are sort of funny and inoffensive if you’re a native English speaker (e.g. ‘spatchcock’), but which can look rude if you aren’t. A large proportion of my users were not native English speakers, so many of the funny British-isms had to go.
Even after screening for offensive individual words, it’s possible to get weird combinations of words that have meaning. After stripping out the obvious swear words and other dangerous (but non-sweary) words, I generated a bunch of random passwords and skimmed through them by hand. This turned up some other dangerous combinations, like ‘hate indian’. Not good.
Once, I encountered this problem with a random character password; somehow the string ‘cute’ snuck into a female user’s password.
Dictionaries contain lots of obscure words and non-words, like ’re’ or ‘b’. They have no meaning to me and thus no recall value.
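The word-level screening described above can be sketched roughly as follows. This is a minimal illustration, not the code from the project: the names `BLOCKLIST`, `filter_words` and `make_passphrase`, the length limits and the four-word default are all assumptions made for the example. It only filters individual words, so the generated passphrases still need to be skimmed by hand for bad combinations.

```python
import secrets

# Hypothetical blocklist for illustration; a real one would be much longer
# and reviewed by hand for the audience in question.
BLOCKLIST = {"hate", "balls"}


def filter_words(words, min_len=4, max_len=8, blocklist=BLOCKLIST):
    """Keep lowercase alphabetic words of a memorable length that are not blocked."""
    kept = set()
    for word in words:
        word = word.strip().lower()
        if not word.isalpha():
            continue  # drops entries containing apostrophes, digits or hyphens
        if not (min_len <= len(word) <= max_len):
            continue  # drops fragments such as "re" or "b", and very long words
        if word in blocklist:
            continue
        kept.add(word)
    # De-duplicated and sorted, so the resulting list is reproducible
    return sorted(kept)


def make_passphrase(words, count=4):
    """Join `count` words picked with a cryptographically secure RNG."""
    return " ".join(secrets.choice(words) for _ in range(count))
```

Using `secrets` rather than `random` matters here, because `random` is not intended for generating credentials.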
Dictionary files are big. They take a long time to read from disk and use a lot of memory. I elected to read the dictionary every time I created a user, which took a good fraction of a second each time.
It would have been smarter to keep it in memory and put some effort into freeing that memory when done. Better yet, generate the passwords offline.
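One way to keep the dictionary in memory is to cache the result of the first read. The sketch below assumes a plain-text word list with one word per line; `load_words` is a hypothetical helper name, not something from the project.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_words(path):
    """Read the word list from disk once; later calls reuse the cached tuple."""
    text = Path(path).read_text(encoding="utf-8")
    # A tuple is immutable, so callers cannot accidentally modify the cached copy
    return tuple(line.strip() for line in text.splitlines() if line.strip())
```

Calling `load_words.cache_clear()` once the accounts have been created drops the cached copy so that the memory can be reclaimed.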
Random word passwords look different to normal passwords, and most users have never encountered a passphrase before.
Many users didn’t recognise the string as a password at all; they emailed saying that they hadn’t received a password, or thought that it was part of a sentence that had been mistyped, or asked what the words meant. One thought that it was a cryptic word puzzle that they had to solve and that once solved, that answer would be the password.
Other users didn’t know where to put it. I had to draw a diagram showing that the passphrase should be typed in just like a regular password. It seems almost comical, in hindsight, but this genuinely reduced the number of support emails that I received.
Long passwords are easy to mistype. I elected not to show the user’s password as they typed, but I think that might have been a mistake.
A lot of users don’t type the spaces. Some users will type in all caps.
If the password is long and difficult to enter, users will just copy and paste it. This somewhat defeats the purpose of providing a memorable password.
Even copy and paste has its problems. A lot of users will select too many or too few characters at the start and end of the password string, and if they can’t see the password when they paste it, they can’t see the error.
I added a new Django auth backend to strip spaces and lowercase everything. It almost eliminated the “my password isn’t working” emails. I strongly recommend it.
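The normalisation itself is only a couple of lines. The sketch below is framework-free so that it runs on its own: `hashlib.pbkdf2_hmac` stands in for whatever password hasher the framework provides, and the function names and iteration count are assumptions for the example. In Django, the same `normalise` step would presumably be applied inside a custom backend's `authenticate` method before the password is checked. Whatever the setup, the stored hash has to be built from the normalised form too, otherwise nothing will match.

```python
import hashlib
import hmac
import os


def normalise(passphrase):
    """Remove all whitespace and lowercase: ' Correct Horse' becomes 'correcthorse'."""
    return "".join(passphrase.split()).lower()


def hash_passphrase(passphrase, salt=None, iterations=100_000):
    """Hash the normalised passphrase; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", normalise(passphrase).encode("utf-8"), salt, iterations
    )
    return salt, digest


def check_passphrase(attempt, salt, digest):
    """Compare an attempt to the stored digest, ignoring case and whitespace."""
    _, candidate = hash_passphrase(attempt, salt)
    return hmac.compare_digest(candidate, digest)
```

With this in place, `correct horse battery staple`, `CORRECT HORSE BATTERY STAPLE` and a copy-pasted version with a stray trailing newline all check out as the same passphrase.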
Logging your failed password attempts (securely!) will help a lot with diagnosing these problems.
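What “securely” means will depend on the system. One possible reading, sketched below, is to log a description of the failed attempt rather than the attempt itself. The logger name and the chosen fields are assumptions for the example, and even these fields leak a little (the length, for instance), so whether that trade-off is acceptable is a judgment call.

```python
import logging

logger = logging.getLogger("auth.failures")  # hypothetical logger name


def describe_attempt(attempt):
    """Summarise a failed attempt without recording its characters."""
    return {
        "length": len(attempt),
        "spaces": attempt.count(" "),
        "has_upper": any(c.isupper() for c in attempt),
        # True when copy and paste picked up leading or trailing whitespace
        "outer_whitespace": attempt != attempt.strip(),
    }


def log_failed_attempt(username, attempt):
    """Write the summary, never the raw attempt, to the log."""
    logger.warning("login failed for %s: %s", username, describe_attempt(attempt))
```

A summary like this is enough to tell a missing-spaces problem from a caps-lock problem or a sloppy paste.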
Quantifying security differences is tricky at the best of times.
In this case, it probably wasn’t worth it. It wasn’t a system that required a high level of security. Once everyone had logged in at least once, there were no more complaints – but a lot of people (0.5%?) had trouble entering that password correctly once.
In hindsight, perhaps this is a policy best restricted to your own personal password security and not enforced on other people.
Jeff Preshing’s xkcd Password Generator. I recommend that you go and mash the ‘Generate Another’ button. The wordlist is dangerously small (a few hundred, I’m guessing), but it’s still easy to generate offensive passwords.
Correct Horse Battery Staple: Another, slightly more paranoid option.
xkpasswd: More paranoia again.
I find the addition of numbers and punctuation to be a bit odd; the whole point of using words is that you get sufficient entropy for your password without having to resort to difficult-to-memorise features such as numbers and punctuation.
Choosing A Secure Password by Bruce Schneier. Long, but an excellent read.
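The point about words supplying enough entropy on their own can be checked with a little arithmetic. Assuming each word is picked uniformly and independently from the list, a passphrase has (number of words) × log2(list size) bits of entropy. The 2048-word list below matches the 11 bits per word used in the xkcd comic. The 62-character alphabet for a random password is an assumption (upper case, lower case and digits).

```python
import math


def passphrase_bits(wordlist_size, word_count):
    """Entropy in bits of `word_count` words drawn uniformly from the list."""
    return word_count * math.log2(wordlist_size)


def random_string_bits(alphabet_size, length):
    """Entropy in bits of `length` characters drawn uniformly from the alphabet."""
    return length * math.log2(alphabet_size)


print(passphrase_bits(2048, 4))    # 44.0
print(random_string_bits(62, 8))   # about 47.6
```

So four plain words from a modest list land in the same range as eight random letters and digits. Each extra word adds another 11 bits.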
For a later group of users, I used standard Django random passwords (a sequence of 8 random numbers and letters).
Not one user complained that their password was not being accepted. A few couldn’t figure out where to type it in (even with the helpful image!) but they could all readily identify the password.