How to Memorize a Random 60-Bit String

User-generated passwords tend to be memorable, but not secure. A random, computer-generated 60-bit string is much more secure. However, users cannot memorize random 60-bit strings. In this paper, we investigate methods for converting arbitrary bit strings into English word sequences (both prose and poetry), and we study their memorability and other properties.


Introduction
Passwords chosen by users (e.g., "Scarlet%2") are easy to remember, but not secure (Florencio and Herley, 2007). A more secure method is to use a system-assigned 60-bit random password, such as 0010100010100...00101001. However, this string is hard to memorize. In this paper, we convert such strings into English phrases, in order to improve their memorability, using natural language processing to select fluent passphrases.
Our methods are inspired by an XKCD cartoon that proposes to convert a randomly-chosen 44-bit password into a short, nonsensical sequence of English words. The proposed system divides the 44-bit password into four 11-bit chunks, and each chunk provides an index into a 2048-word English dictionary. XKCD's example passphrase is correct horse battery staple. The four-word sequence is nonsense, but it is easier to memorize than the 44-bit string, and XKCD hypothesizes that users can improve memorability by building an image or story around the four words.
In this paper, we investigate other methods for converting a system-generated bit string into a memorable sequence of English words. Our methods produce whole sentences, e.g.
Fox news networks are seeking views from downtown streets.
as well as short poems, e.g.
Diversity inside replied, Soprano finally reside.
We also move to 60-bit passwords, for better security. One source claims: "As of 2011, available commercial products claim the ability to test up to 2,800,000,000 passwords a second on a standard desktop computer using a high-end graphics processor." If this is correct, a 44-bit password would take one hour to crack, while a 60-bit password would take 11.3 years.
[Table 1: Comparison of methods that convert system-assigned 60-bit strings into English word sequences. Average word lengths range from 4 (XKCD) to 15 (First Letter Mnemonic). Average character lengths include spaces. LM score refers to the log probability assigned by a 5-gram English language model trained on the Gigaword corpus. Capacity tells how many English word sequences are available for an individual 60-bit input string.]

Our concrete task is as follows:
• Input: A random 60-bit string.
• Output: An English word sequence with two properties:
- It is memorable.
- We can deterministically recover the original input 60-bit string from it.
This implies that we map 2^60 distinct bit strings into 2^60 distinct English sequences. If a user memorizes the English word sequence supplied to them, then they have effectively memorized the 60-bit string.

Password Generation Methods
We now describe our baseline password generation method, followed by four novel methods. In Section 3 we experimentally test their memorability.

XKCD Baseline
Our baseline is a version of XKCD. Instead of a 2048-word dictionary, we use a 32,768-word dictionary. We assign each word a distinct 15-bit code.
At runtime, we take a system-assigned 60-bit code and split it into four 15-bit sequences. We then substitute each 15-bit segment with its corresponding word. By doing this, we convert a random 60-bit code into a 4-word password.
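The baseline amounts to four shift-and-mask operations. Here is a minimal sketch; the word list is a stand-in for the real 32,768-word English dictionary (any list of 2^15 distinct words works):

```python
# Sketch of the XKCD-style baseline: split a 60-bit integer into
# four 15-bit chunks, each indexing a 2^15-entry dictionary.
# The word list below is a placeholder, not the paper's dictionary.
import secrets

WORDS = [f"word{i:05d}" for i in range(2 ** 15)]  # stand-in dictionary
INDEX = {w: i for i, w in enumerate(WORDS)}

def encode(bits60: int) -> list:
    """Convert a 60-bit code into a 4-word password."""
    return [WORDS[(bits60 >> shift) & 0x7FFF] for shift in (45, 30, 15, 0)]

def decode(words) -> int:
    """Deterministically recover the original 60-bit code."""
    bits60 = 0
    for w in words:
        bits60 = (bits60 << 15) | INDEX[w]
    return bits60

password = secrets.randbits(60)
assert decode(encode(password)) == password
```

Because each word carries exactly 15 bits and the chunks are independent, the mapping is a bijection between 60-bit strings and 4-word sequences.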
The first row of Table 1 shows three sample XKCD passwords, along with other information, such as the average number of characters (including spaces).

First Letter Mnemonic
XKCD passwords are short but nonsensical, so we now look into methods that instead create longer but fluent English sentences. We might hope to guarantee fluency by selecting sentences from an already-existing text corpus, but no corpus is large enough to contain 2^60 (∼10^18) distinct sentences. Therefore, we must be able to synthesize new English strings.
In our first sentence generation method (First Letter Mnemonic), we store our input 60-bit code in the first letter of each word. We divide the 60-bit code into 4-bit sections, e.g., '0100-1101-1101-...'. Each 4-bit sequence type corresponds to an English letter or two, per Table 2. We build a word-confusion network (or "sausage lattice") by replacing each 4-bit code with all English words that start with a corresponding letter. This yields about 10^74 paths, some good (is my frog. . . ) and some bad (income miner feast. . . ).
To select the most fluent path, we train a 5-gram language model with the SRILM toolkit (Stolcke, 2002) on the English Gigaword corpus. SRILM also includes functionality for extracting the best path from a confusion network. Table 1 shows sample sentences generated by the method. Perhaps surprisingly, even though the sentences are much longer than XKCD's (15 words versus 4 words), the n-gram language model (LM) score is a bit better. The sentences are locally fluent, but not perfectly grammatical.
We can easily reconstruct the original 60-bit code by extracting the first letter of each word and applying the Table 2 mapping in reverse.
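Both directions can be sketched in a few lines. The 4-bit-to-letter table below is hypothetical (the paper's Table 2 plays this role but is not reproduced here); what matters is that no letter appears under two codes, so decoding by first letters is unambiguous:

```python
# Hypothetical stand-in for Table 2: each 4-bit value owns one or
# two initial letters, and no letter is shared between values.
NIBBLE_TO_LETTERS = {
    0x0: "ab", 0x1: "cd", 0x2: "ef", 0x3: "gh", 0x4: "ij",
    0x5: "kl", 0x6: "mn", 0x7: "op", 0x8: "qr", 0x9: "st",
    0xA: "u", 0xB: "v", 0xC: "w", 0xD: "x", 0xE: "y", 0xF: "z",
}
LETTER_TO_NIBBLE = {c: n for n, letters in NIBBLE_TO_LETTERS.items()
                    for c in letters}

def candidate_letters(bits60: int) -> list:
    """For each of the fifteen 4-bit chunks, the letters a word in
    that slot of the confusion network may start with."""
    return [NIBBLE_TO_LETTERS[(bits60 >> s) & 0xF]
            for s in range(56, -1, -4)]

def decode(words) -> int:
    """Recover the 60-bit code from the words' initial letters."""
    k = 0
    for w in words:
        k = (k << 4) | LETTER_TO_NIBBLE[w[0].lower()]
    return k

k = 0x123456789ABCDEF               # an arbitrary 60-bit code
words = [letters[0] + "..." for letters in candidate_letters(k)]
assert decode(words) == k
```

The real system fills each slot with every dictionary word starting with an allowed letter and lets the 5-gram LM pick the most fluent path through the resulting lattice.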

All Letter Method
Most of the characters in the previous methods seem "wasted", as only the word-initial letters bear information relevant to reconstructing the original 60-bit string. Our next technique (All Letter Method) non-deterministically translates every bit into an English letter, per Table 3. Additionally, we non-deterministically introduce a space (or not) between each pair of letters. This yields 4 · 10^84 possible output strings per input, 3 · 10^56 of which consist of legal English words. From those 3 · 10^56 strings, we choose the one that yields the best word 5-gram score.
It is not immediately clear how to process a letter-based lattice with a word-based language model. We solve this search problem by casting it as one of machine translation from bit strings to English. We create a phrase translation table by pairing each English word with a corresponding "bit phrase", using Table 3 in reverse. Sample entries include:

din ||| 1 0 1
through ||| 1 0 0 0 0 0 0
yields ||| 1 0 0 1 1 1

We then use the Moses machine translation toolkit (Koehn et al., 2007) to search for the 1-best translation of our input 60-bit string, using the phrase table and a 5-gram English LM, disallowing re-ordering. Table 1 shows that these sentences are shorter than the mnemonic method's (11.8 words versus 15 words), without losing fluency.
Given a generated English sequence, we can deterministically reconstruct the original 60-bit input string, using the above phrase table in reverse.
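Both directions can be sketched as follows. The letter-to-bit assignment is invented for illustration (the paper's Table 3 partitions the alphabet differently), so the bit phrases will not match the sample entries above:

```python
# Hypothetical stand-in for Table 3 in reverse: every letter maps
# to exactly one bit (here simply a-m -> 0, n-z -> 1), so each
# English word corresponds to a unique "bit phrase". Spaces carry
# no bits.
LETTER_TO_BIT = {c: ("0" if c <= "m" else "1")
                 for c in "abcdefghijklmnopqrstuvwxyz"}

def phrase_entry(word: str) -> str:
    """One Moses phrase-table line: word ||| bit phrase."""
    return word + " ||| " + " ".join(LETTER_TO_BIT[c] for c in word)

def decode(sentence: str) -> str:
    """Deterministically recover the bit string from output words."""
    return "".join(LETTER_TO_BIT[c] for c in sentence if c != " ")
```

Because each letter carries exactly one bit, concatenating the bit phrases of the output words (ignoring spaces) reproduces the input string exactly.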

Frequency Method
Sentence passwords from the previous method contain 70.8 characters on average (including spaces). Classic studies by Shannon (1951) and others estimate that printed English may ultimately be compressible to about one bit per character. This implies we might be able to produce shorter output (60 characters, including spaces) while maintaining normal English fluency. Our next technique (Frequency Method) modifies the phrase table by assigning short bit codes to frequent words, and long bit codes to infrequent words. For example:

din ||| 0 1 1 0 1 0 1 0 0
through ||| 1 1 1 1
yields ||| 0 1 0 1 1 1 0 1

Note that the word din is now mapped to a 9-bit sequence rather than a 3-bit one. More precisely, we map each word to a random bit sequence of length max(1, −α × log P(word) + β). By varying α and β we can move between smooth but long sentences (α = 1 and β = 0) and XKCD-style phrases (α = 0 and β = 15). Table 1 shows example sentences we obtain with α = 2.5 and β = −2.5, yielding sentences of 9.7 words on average.
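The code-length rule can be sketched directly. The probabilities below are illustrative rather than Gigaword estimates, and the log base (2 here) and truncation are our assumptions:

```python
import math

def code_length(p_word: float, alpha: float, beta: float) -> int:
    """Bits assigned to a word: max(1, -alpha * log P(word) + beta).
    Log base 2 and truncation to an integer are assumptions made
    for this sketch."""
    return max(1, int(-alpha * math.log2(p_word) + beta))

# alpha=0, beta=15 reproduces XKCD: every word costs 15 bits.
assert code_length(0.5, 0, 15) == 15
# alpha=1, beta=0 approximates the Shannon code length -log2 P(word).
assert code_length(1 / 1024, 1, 0) == 10
```

Very frequent words bottom out at the 1-bit minimum, which is what lets a 60-bit input come out near 60 characters of fluent English.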

Poetry
In ancient times, people recorded long, historical epics using poetry, to enhance memorability. We follow this idea by turning each system-assigned 60-bit string into a short, distinct English poem. Our format is the rhyming iambic tetrameter couplet:
• The poem contains two lines of eight syllables each.
• Lines are in iambic meter, i.e., their syllables have the stress pattern 01010101, where 0 represents an unstressed syllable, and 1 represents a stressed syllable. We also allow 01010100, to allow a line to end in a word like Angela.
• The two lines end in a pair of rhyming words. Words rhyme if their phoneme sequences match from the final stressed vowel onwards. We obtain stress patterns and phoneme sequences from the CMU pronunciation dictionary.
Monosyllabic words cause trouble, because their stress often depends on context (Greene et al., 2010). For example, eighth is stressed in eighth street, but not in eighth avenue. This makes it hard to guarantee that automatically-generated lines will scan as intended. We therefore eject all monosyllabic words from the vocabulary, except for six unstressed ones (a, an, and, the, of, or).
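The rhyme test can be sketched as follows; the three pronunciations are simplified CMU-style entries hard-coded for the example, with stress digits on the vowels:

```python
# Miniature CMU-style pronunciation entries (vowels carry stress
# digits: 1 = primary stress). A real system loads the full
# CMU pronunciation dictionary instead.
PRON = {
    "create": ["K", "R", "IY0", "EY1", "T"],
    "debate": ["D", "AH0", "B", "EY1", "T"],
    "invite": ["IH2", "N", "V", "AY1", "T"],
}

def rhyme_part(word):
    """Phonemes from the final primary-stressed vowel onwards,
    e.g. the EY-T rhyme class shared by create and debate."""
    phones = PRON[word]
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return tuple(phones[i:])
    return tuple(phones)

def rhymes(a: str, b: str) -> bool:
    return rhyme_part(a) == rhyme_part(b)

assert rhymes("create", "debate")        # both end EY1 T
assert not rhymes("create", "invite")    # EY1 T vs. AY1 T
```

Grouping words by this rhyme part yields the rhyme classes (like EY-T) used in the FSA construction below.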
Here is the scansion of a sample poem password line: The le-gen-da-ry Ja-pan-ese.

Meter and rhyme constraints make it difficult to use the Moses machine translation toolkit to search for fluent output, as we did above; the decoder state must be augmented with additional short- and long-distance information (Genzel et al., 2010).
Instead, we build a large finite-state acceptor (FSA) with a path for each legal poem. In each path, the second line of the poem is reversed, so that we can enforce rhyming locally.
The details of our FSA construction are as follows. First, we create a finite-state transducer (FST) that maps each input English word onto four sequences that capture its essential properties, e.g.:

create -> 0 1
create -> 0 1 EY-T
create -> 1r 0r
create -> EY-T 1r 0r

Here, EY-T represents the rhyme-class of words like create and debate. The r indicates a stress pattern in the right-to-left direction.
We then compose this FST with an FSA that only accepts sequences of the form:

0 1 0 1 0 1 0 1 X X 1r 0r 1r 0r 1r 0r 1r 0r

where the two X's are identical rhyme classes (e.g., EY-T and EY-T).
It remains to map an arbitrary 60-bit string onto a path in the FSA. Let k be the integer representation of the 60-bit string. If the FSA contains exactly 2^60 paths, we can easily select the kth path using the following method. At each node N of the FSA, we store the total number of paths from N to the final state; this takes linear time if we visit states in reverse topological order. We then traverse the FSA deterministically from the start state, using k to guide the path selection.
Our FSA actually contains 2^79 paths, far more than the required 2^60. We can say that the information capacity of the English rhyming iambic tetrameter couplet is 79 bits! Some outputs are very good:

Sophisticated potentates misrepresenting Emirates.
The supervisor notified the transportation nationwide.
while others are very bad:

The shirley emmy plebiscite complete suppressed unlike invite
The shirley emmy plebiscite complaints suppressed unlike invite
The shirley emmy plebiscite complaint suppressed unlike invite

Fortunately, because our FSA contains over a million times the required 2^60 paths, we can avoid these bad outputs. For any particular 60-bit string, we have a million poems to choose from, and we output only the best one.
More precisely, given a 60-bit input string k, we extract not only the kth FSA path, but also the (k + i · 2^60)th paths, with i ranging from 1 to 999,999. We explicitly list out these paths, reversing the second half of each, and score them with our 5-gram LM. We output the poem with the 1-best LM score. Table 1 shows sample outputs.
To reconstruct the original 60-bit string k, we first find the FSA path corresponding to the user-recalled English string (with its second half reversed). We use depth-first search to find this path. Once we have the path, it is easy to determine which numbered path it is, lexicographically speaking, using the node-labeling scheme above to recover k.
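Both directions of the path-numbering scheme can be sketched on a toy acceptor (the graph below is hypothetical; the real FSA encodes iambic couplets):

```python
# Toy FSA: node -> ordered arcs (label, next_node). "F" is final.
GRAPH = {
    "S": [("a", "M"), ("b", "M")],
    "M": [("x", "F"), ("y", "F"), ("z", "F")],
    "F": [],
}

# Count paths to the final state at every node, visiting nodes in
# reverse topological order (linear time, as described above).
COUNT = {}
for node in ["F", "M", "S"]:
    arcs = GRAPH[node]
    COUNT[node] = 1 if not arcs else sum(COUNT[n] for _, n in arcs)

def kth_path(k: int) -> list:
    """Deterministically select the kth path (0-indexed) from S."""
    node, labels = "S", []
    while GRAPH[node]:
        for label, nxt in GRAPH[node]:
            if k < COUNT[nxt]:           # the kth path goes via nxt
                labels.append(label)
                node = nxt
                break
            k -= COUNT[nxt]              # skip all paths via nxt
    return labels

def path_index(labels) -> int:
    """Inverse: recover k from a path's labels, lexicographically."""
    node, k = "S", 0
    for label in labels:
        for lab, nxt in GRAPH[node]:
            if lab == label:
                node = nxt
                break
            k += COUNT[nxt]
    return k

assert COUNT["S"] == 6                   # six paths in this toy FSA
assert all(path_index(kth_path(k)) == k for k in range(COUNT["S"]))
```

In the full system, kth_path turns a 60-bit integer into a poem skeleton, and path_index recovers the integer from the user-recalled poem.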

Experiments
We designed two experiments to compare our methods.
The first experiment tests the memorability of passwords. We asked participants to memorize a password from a randomly selected method and recall it two days later.

[Table 4: Memorability of passwords generated by our methods. "Recalls" indicates how many participants returned to type their memorized English sequences, and "Correct Recalls" tells how many sequences were accurately remembered.]

To give more options to users,
we let them select from the 10-best passwords according to the LM score for a given 60-bit code. Note that this flexibility is not available for XKCD, which produces only one password per code. 62 users participated in this experiment, 44 returned to recall the password, and 22 successfully recalled the complete password. Table 4 shows that the Poetry and XKCD methods yield passwords that are easiest to remember.

Table 5: User preference by method.
Method Name        User preference
XKCD               5%
All Letter Method  39%
Frequency Method   37%
Poetry             19%
In the second experiment, we present a separate set of users with passwords from each of the four methods. We ask which they would prefer to use, without requiring any memorization. Table 5 shows that users prefer sentences over poetry, and poetry over XKCD.

Analysis

Table 4 shows that the Poetry and XKCD methods yield passwords that are easiest to memorize, while complete sentences generated by the All Letter and Frequency Methods are harder. At the same time, Table 5 shows that people like the sentences better than XKCD, so it seems that they overestimate their ability to memorize a sentence of 10-12 words. Here are typical mistakes (S = system-generated, R = as recalled by user):

(S) Still looking for ruben sierra could be in central michigan
(R) I am still looking for ruben sierra in central michigan

(S) That we were required to go to college more than action movies
(R) We are required to go to college more than action movies

(S) No dressing allowed under canon law in the youth group
(R) No dresses allowed under canon law for youth groups

Users remember the gist of a sentence very well, but have trouble reproducing the exact wording. Post-experiment interviews reveal this to be partly an effect of overconfidence. Users put little mental work into memorizing sentences, beyond choosing among the 10-best alternatives presented to them. By contrast, they put much more work into memorizing an XKCD phrase, actively building a mental image or story to connect the four otherwise unrelated words.

Future Directions
We can often automatically determine that a user-recalled sequence is wrong. For example, when we go to reconstruct the 60-bit input string from a user-recalled sequence, we may find that we get a 62-bit string instead. We can then automatically prod the user into trying again, but we find that this is not effective in practice. An intriguing direction is automatic error correction, i.e., taking the user-recalled sequence and finding the closest match among the 2^60 English sequences producible by the method. Of course, it is a challenge to do this with the 1-best outputs of an MT system that uses heuristic beam search, and we must also ensure that security is maintained.
We may also investigate new ways to re-rank n-best lists. Language model scoring is a good start, but we may prefer vivid, concrete, or other types of words, or we may use text data associated with the user (papers, emails) for secure yet personalized password generation.

Related Work

Gasser (1975), Crawford and Aycock (2008), and Shay et al. (2012) describe systems that produce meaningless but pronounceable passwords, such as "tufritvi". However, their systems can only assign ∼2^30 distinct passwords. Jeyaraman and Topkara (2005) suggest generating a random sequence of characters, and finding a mnemonic for it in a text corpus. A limited corpus means they again have a small space of system-assigned passwords. We propose a similar method in Section 2.2, but we automatically synthesize a new mnemonic word sequence. Kurzban (1985) and Shay et al. (2012) use a method similar to XKCD with small dictionaries. This leads to longer nonsense sequences that can be difficult to remember.

Conclusion
We introduced several methods for generating secure passwords in the form of English word sequences. We learned that long sentences are seemingly easy to remember, but actually hard to reproduce, and we also learned that our poetry method produced relatively short, memorable passwords that are liked by users.