Challenge 1-3 Single-byte XOR cipher
Description
The hex encoded string:
1
|
1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
|
… has been XOR’d against a single character. Find the key, decrypt the message.
You can do this by hand. But don’t: write code to do it for you.
How? Devise some method for “scoring” a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.
Achievement Unlocked
1
|
You now have our permission to make "ETAOIN SHRDLU" jokes on Twitter.
|
Solution
Ahhhhh…. this one was REALLY fun.
Stipulations:
- XOR a given string to produce a result
- We don’t know the key
- The key is a single character
- We don’t know the result
Assumptions:
- The key is a printable character (pretty likely)
- The result is english (seems reasonable based on the ETAOIN SHRDLU comment)
Approach:
I felt like the easiest approach here is to brute force this. We have a small distribution of potential characters. Even including non printable characters gives us only 256 potential characters, but it’s not likely to be a null because of how C strings terminate. Regardless, I figured i’ll just loop through all the printable characters and XOR the string, then return a tuple with score, and another tuple nested inside that has the key and XOR’d string. Once I have that data, I can use heapq to get the key with the highest score (score based on english language letter frequency).
My scoring function looks like this. Each time we xor a string, we parse it out looking for letters used in the english language. The higher the frequency, the higher the score.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
letter_multipliers = {
'e': 10,
't': 9,
'a': 8,
'o': 7,
'i': 6,
'n': 5,
's': 4,
'r': 3,
'h': 2,
'd': 1,
}
def score_string(str):
score = 0
for i in range(0, len(str), 1):
c = str[i]
if c in letter_multipliers:
score += letter_multipliers[c]
return score
|
Then to get the most likely candidate, we just push everything into a heap and get the tuple with the highest score.
1
2
3
4
5
6
7
8
9
10
11
|
score = score_string(ascii_actual)
heapq.heappush(word_scores,
(
score,
(
key,
ascii_actual
)
)
)
most_likely = heapq.nlargest(1, word_scores) # this one is the most likely
|
Full code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
|
# convert.py
import sys
import heapq
import string
sys.path.append('lib')
from cryptopals import xor_string_with_key, hex_to_dec
original_string = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
expected_length = 68
word_scores = []
letter_multipliers = {
'e': 10,
't': 9,
'a': 8,
'o': 7,
'i': 6,
'n': 5,
's': 4,
'r': 3,
'h': 2,
'd': 1,
}
def score_string(str):
score = 0
for i in range(0, len(str), 1):
c = str[i]
if c in letter_multipliers:
score += letter_multipliers[c]
return score
def main():
print("original: {}").format(original_string)
for i in range(33, 127, 1):
key = hex(i).replace("0x", "")
hex_actual = xor_string_with_key(original_string, key)
if len(hex_actual) < expected_length:
print("{} < {}").format(len(hex_actual), expected_length)
return
ascii_actual = ""
for i in range(0, len(hex_actual), 2):
h = hex_actual[i:i+2]
# print("{}: {}").format(i, h)
d = hex_to_dec(h)
c = chr(d)
ascii_actual += c
score = score_string(ascii_actual)
heapq.heappush(word_scores,
(
score,
(
key,
ascii_actual
)
)
)
most_likely = heapq.nlargest(1, word_scores)
print("most likely...")
print(most_likely)
if __name__ == "__main__":
main()
|
Which yields the following output:
1
2
3
4
|
$ python convert.py
original: 1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
most likely...
[(93, ('58', "Cooking MC's like a pound of bacon"))]
|