Cryptopals: 1-3 in python

Challenge 1-3 Single-byte XOR cipher

Description

The hex encoded string:

1
1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

… has been XOR’d against a single character. Find the key, decrypt the message.

You can do this by hand. But don’t: write code to do it for you.

How? Devise some method for “scoring” a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

Achievement Unlocked

1
You now have our permission to make "ETAOIN SHRDLU" jokes on Twitter.

Solution

Ahhhhh…. this one was REALLY fun.

Stipulations:

  • XOR a given string to produce a result
  • We don’t know the key
  • The key is a single character
  • We don’t know the result

Assumptions:

  • The key is a printable character (pretty likely)
  • The result is english (seems reasonable based on the ETAOIN SHRDLU comment)

Approach:

I felt like the easiest approach here is to brute force this. We have a small distribution of potential characters. Even including non printable characters gives us only 256 potential characters, but it’s not likely to be a null because of how C strings terminate. Regardless, I figured i’ll just loop through all the printable characters and XOR the string, then return a tuple with score, and another tuple nested inside that has the key and XOR’d string. Once I have that data, I can use heapq to get the key with the highest score (score based on english language letter frequency).

My scoring function looks like this. Each time we xor a string, we parse it out looking for letters used in the english language. The higher the frequency, the higher the score.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
letter_multipliers = {
    'e': 10,
    't': 9,
    'a': 8,
    'o': 7,
    'i': 6,
    'n': 5,
    's': 4,
    'r': 3,
    'h': 2,
    'd': 1,
}

def score_string(str):
    score = 0
    for i in range(0, len(str), 1):
        c = str[i]
        if c in letter_multipliers:
            score += letter_multipliers[c]
    return score

Then to get the most likely candidate, we just push everything into a heap and get the tuple with the highest score.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
score = score_string(ascii_actual)
heapq.heappush(word_scores,
    (
        score,
        (
            key,
            ascii_actual
        )
    )
)
most_likely = heapq.nlargest(1, word_scores) # this one is the most likely

Full code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
# convert.py
import sys
import heapq
import string

sys.path.append('lib')

from cryptopals import xor_string_with_key, hex_to_dec

original_string = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
expected_length = 68

word_scores = []
letter_multipliers = {
    'e': 10,
    't': 9,
    'a': 8,
    'o': 7,
    'i': 6,
    'n': 5,
    's': 4,
    'r': 3,
    'h': 2,
    'd': 1,
}

def score_string(str):
    score = 0
    for i in range(0, len(str), 1):
        c = str[i]
        if c in letter_multipliers:
            score += letter_multipliers[c]
    return score

def main():
    print("original: {}").format(original_string)

    for i in range(33, 127, 1):
        key = hex(i).replace("0x", "")
        hex_actual = xor_string_with_key(original_string, key)
        if len(hex_actual) < expected_length:
            print("{} < {}").format(len(hex_actual), expected_length)
            return
        ascii_actual = ""
        for i in range(0, len(hex_actual), 2):
            h = hex_actual[i:i+2]
            # print("{}: {}").format(i, h)
            d = hex_to_dec(h)
            c = chr(d)
            ascii_actual += c
        score = score_string(ascii_actual)
        heapq.heappush(word_scores,
            (
                score,
                (
                    key,
                    ascii_actual
                )
            )
        )
    most_likely = heapq.nlargest(1, word_scores)
    print("most likely...")
    print(most_likely)


if __name__ == "__main__":
    main()

Which yields the following output:

1
2
3
4
$ python convert.py
original: 1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
most likely...
[(93, ('58', "Cooking MC's like a pound of bacon"))]
Licensed under CC BY-NC-SA 4.0
Built with Hugo
Theme Stack designed by Jimmy