Featured image of post Cryptopals 1-1

Cryptopals 1-1

But first, a warning…

I have a full time job as a staff software enginer at a startup, and 2 young kids. It’s close to impossible for me to find time to just sit down, and write, so a lot of the things I write will be very short, and to the point… I may even leave things out as I type with a 3 year old sitting on my lap asking about pancakes, and if it’s the weekend. ha ha ha ha. She likes our Saturdays :)

My coding style

I do validation. I write tests. I throw exceptions. I write explicit calls to functions in namespaces every time… I don’t use using.

In this challenge, we’re actually building up a library of functions, and algorithms that we can re-use later in other challenges… So I will be creating classes. Probably with static functions to start, and then as things solidify, linking them together into a lib. Just in case you’re wondering why I did something a specific way. I’m always open to a discussion, and don’t claim to write the most optimizied code… but I do want to write the most easy to understand/modify, with error handling included out of the box, and tests to confirm it works as expected.

Finished code

The final code can be found on github. I did not go through the trouble of making this first challenge into a nice lib yet since the challenge IS the encoding. As part of the next challenge, I will do all the good stuff. Maybe I’ll use CMake or something to build everything. Not sure what I feel like doing at the moment, and this is supposed to be an exercise for me to have fun since I am rarely challenged with real work anymore.

Cryptopals

Cryptopals is a set of challenges centered around gaining an understanding of modern cryptography (not crypto currency), and how to break it. One of their tenets is “always operate on raw bytes never on encoded strings”. The first challenge is about IMPLEMENTING hex parsing, and base64 encoding, yet for some reason I still see people post solutions where they just call a base64 decoding function. What did you learn by calling a function :facepalm:.

I hope someone sees this, and decides to try it. I’ve done some in python, but ultimately, I have always wanted to come back and do these in c++. I may re-do them in rust afterwards as well. It’s hard to say. Doing these in c++ is self serving for another project I’m working on. I happen to just really enjoy rust, and would like to get really good at it.

The old adage w/cryptopals is that if you want to REALLY learn a language well, you do the cryptopals challenges in it!

So here we go!

Set 1, Challenge 1

Cryptopals 1-1 Hex to Base64, involves the following tasks:

  • Take the following hex encoded string 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d, and convert it to binary as plaintext.
  • Yes, this involves writing a simple algorithm to parse, and convert the hex to ascii.
  • At that point, take the plaintext and base64 encode it, returning ciphertext.
  • WRITE THE BASE64 ENCODING/DECODING algorithm LOL… otherwise, what is the point of doing this challenge? You called a function? greeeeaaaaaat…..

Parsing hex string to ascii

The goal here is to take a hex string (groups of 2 chars), convert them to an int value (ascii), then convert that value to a char.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
class Util {
    public:
        static std::string hex_to_ascii(std::string input_string);
};

std::string Util::hex_to_ascii(std::string input_string) {
    // generally speaking, handling invalid data is best done before you operate on it
    // also, yeah, I throw exceptions :)
    if (input_string.length() % 2 != 0) throw std::invalid_argument("string is not hex");
    std::string plaintext {""};

    // iterate over the string in blocks of 2 chars to get hex, then convert to char
    for (auto I = 0; I < input_string.length(); I += 2) {
        std::string hex = "0x" + input_string.substr(i, 2);
        unsigned int ascii = std::stoul(hex, nullptr, 16);
        plaintext.push_back(char(ascii));
    }

    return plaintext;
}

Base64 encoding

When implementing the base64 encoding algorithm, you just need to read the spec. It can be the official RFC, or it can be a wiki if it explains enough for you to implement it. In this particular case, the wikipedia article is fine to reference. When we get to the challenge involving hamming distance, I’ll illustrate why it’s critical that you READ THE FUCKING MANUAL (RTFM), or RFC, so that you can succeed.

https://en.wikipedia.org/wiki/Base64#Implementations_and_history

The background is straight forward. Methods for transporting binary data in message groups, and the like were limited in the past. The easiest way not to lose information (read: ensure accuracy), was to encode it in common characters that could be visually represented. The opposite of the whitespace programming language. LOL.

The algorithm is actually pretty simple. Generate a able of characters from the following set A-Za-z0-9+/. Then assign them a value from 0-63 (64 values). Hence, base 64 encoding. Since we only need 64 characters to represent these values, we can use the first 6 bits of a byte to encode this data.

Each bit position in a binary number is a double of the previous number. This means 1 byte, which is 8 bits, has the following numberical values. The top row is position, the bottom row is the value of the bit.

Dont forget that 0 is a value too! So 255 is 256 bits of unique information, as a value of 63 us 64 unique values as well.

Practically speaking, these means an optimal implementation might be as simple as:

  • take 3 bytes (characters) from the input stream
  • concatenate their bits (3 bytes = 24 bits)
  • since we are base 64 encoding, we only need 6 bits for each value
  • 24 bits / 6 bit per char = 4 chars
  • our final encoded product should be 4 individual bytes with the 2 left most bits always set to 0 (max 63)
  • for each byte, we will get a char from the table at the position corresponding to the value

Setup

Let’s scaffold out our class, types, and methods.

I’m choosing to add a 64th character, =, which is used as padding. Then I’m creating a const definition of PADDING which is equal to that position in the string I get back from Base64::get_char_table(). I will probably change this to be a member later, but that requires initialization, and right now all my methods are static. We can always optimize after we know how we’ll be using our new class right?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
#define PADDING 64;

enum base64_operation {
    encode,
    decode
};

class Base64 {
    public:
        static std::string get_char_table();
        static std::string encode(std::string plaintext);
        static std::string decode(std::string ciphertext);
};

std::string Base64::get_char_table() {
    return std::string {"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="};
}

Awesome! We have a base class, with some functions for getting our char_table, which will take a 6 bit value, and convert it to a printable character from the list. This is the basis of how we’ll choose which character to represent our binary data.

Encoding algorithm

Since it IS possible for us to receive a string that is not divisible by 3, we need to test/peek ahead to see if we have a char at that point before we operate on it. This is why we check if i+1, i+2 is there before we operate on them.

FYI: There is another way to do this, where we could iterate over the string buffer, popping 1 char at a time and just storing the remainder bits, but it’s more difficult to comprehend than this super simplified example IMHO.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
std::string Base64::encode(std::string plaintext) {
    std::string ciphertext {""};
    std::string char_table = Base64::get_char_table();

    // iterate over the plaintext in blocks of 3 chars
    // input: 3 chars * 8 bytes = 24 bits
    // base64 is 6 bits per char not 8 bits
    // output: 24 bits / 6 bits = 4 chars
    auto end = plaintext.length();
    for (auto I = 0; I < end; I += 3) {
        std::uint8_t b1, b2, b3, b4;

        // 01001001 00100111 01101101 - first 3 chars
        // 010010010010011101101101 - binary data (24 bits)
        // 010010|010010|011101|101101 - split into 6 bit chunks
        // bytes: 00010010 00010010 00011101 00101101 - base64 output of first 3 chars. note the 00 prefix as bytes are always 8 bits
        // int: 18-18-29-45

        // each byte ->
        // start w/all 0s
        // | gives us any set value
        // & gives us the ability to ONLY give us values where BOTH are set
        // 0x3f is 63, which means we will only ever set 6 bits :thumbsup:
        // 00100111
        // | plaintext[i+1] >> 4;  // want last 2 c1  + first 4 of c2

        b1 = plaintext[i] >> 2;
        if (i+1 >= end) {
            b2 = (plaintext[i] << 4) & 0x3f;
            b3 = PADDING;
            b4 = PADDING;
        } else if (i+2 >= end) {
            b2 = ((plaintext[i] << 4) | plaintext[i+1] >> 4) & 0x3f; 
            b3 = (plaintext[i+1] << 2) & 0x3f;
            b4 = PADDING;
        } else {
            b2 = ((plaintext[i] << 4) | plaintext[i+1] >> 4) & 0x3f;
            b3 = ((plaintext[i+1] << 2) | plaintext[i+2] >> 6) & 0x3f;
            b4 = plaintext[i+2] & 0x3f;
        }

        // take the int value of our bits, and pull a char 
        // from the table to append to the ciphertext string
        ciphertext.push_back(char_table[b1]);
        ciphertext.push_back(char_table[b2]);
        ciphertext.push_back(char_table[b3]);
        ciphertext.push_back(char_table[b4]);
    }

    return ciphertext;
}

Base64 decoding

I actually feel that decoding is easier, since there is no way to represent base64 w/o the length being divisible by 4. Remeber that we always pad out to 4 characters during encoding… well, at least for the RFC I implemented. There are apparently more than one protocol now.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
std::string Base64::decode(std::string ciphertext) {
    std::string plaintext {""};
    std::string char_table = Base64::get_char_table();
    // impossible to not have at least 4 char, or it's an invalid base64 string
    if (ciphertext.length() % 4 != 0) throw std::invalid_argument("invalid base64 string");

    for (auto i = 0; i < ciphertext.length(); i += 4) {
        std::size_t b1 = char_table.find(ciphertext[i]);
        std::size_t b2 = char_table.find(ciphertext[i+1]);
        std::size_t b3 = char_table.find(ciphertext[i+2]);
        std::size_t b4 = char_table.find(ciphertext[i+3]);

        / padding should be all 0s
        if (b3 == 64) b3 = 0;
        if (b4 == 64) b4 = 0;

        char c1, c2, c3;
        
        c1 = (b1 << 2) | b2 >> 4;
        c2 = (b2 << 4) | b3 >> 2;
        c3 = (b3 << 6) | b4;

        plaintext.push_back(char(c1));
        // drop padding bytes, as they're not required
        if (b3 != 0) plaintext.push_back(char(c2));
        if (b4 != 0) plaintext.push_back(char(c3));
    }

    return plaintext;
}

Let’s write some tests!

I encoded 6 strings with lengths from 1-6 characters using the command base64. I put those known good values in this script, and made it easy to execute.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
bool run_tests() {
    // initialize all the shit
    bool tests_passed = true;
    std::vector<std::tuple<std::string, std::string, int>> test_cases;
    test_cases.push_back(std::tuple<std::string, std::string, int>("a",        "YQ==",     base64_operation::encode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("ab",       "YWI=",     base64_operation::encode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("abc",      "YWJj",     base64_operation::encode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("abcd",     "YWJjZA==", base64_operation::encode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("abcde",    "YWJjZGU=", base64_operation::encode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("abcdef",   "YWJjZGVm", base64_operation::encode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("YQ==",     "a",        base64_operation::decode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("YWI=",     "ab",       base64_operation::decode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("YWJj",     "abc",      base64_operation::decode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("YWJjZA==", "abcd",     base64_operation::decode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("YWJjZGU=", "abcde",    base64_operation::decode));
    test_cases.push_back(std::tuple<std::string, std::string, int>("YWJjZGVm", "abcdef",   base64_operation::decode));
    
    std::cout << "running tests..." << std::endl;

    for (auto &test_case : test_cases) {
        std::string input = std::get<0>(test_case);
        std::string expected = std::get<1>(test_case);
        bool operation = std::get<2>(test_case);
        std::string operation_name = (operation) ? "encoding" : "decoding";
        std::string actual;
        
        if (operation == base64_operation::encode) {
            actual = Base64::encode(input);
        } else {
            actual = Base64::decode(input);
        }
        
        std::cout << "TEST ";
        if (actual == expected) {
            std::cout << "PASS";
        } else {
            std::cout << "FAIL... " << operation_name << std::endl; 
            std::cout << "    > expected: " << expected << std::endl;
            std::cout << "    > actual  : " << actual;
            tests_passed = false;
        }
        std::cout << std::endl;
    }

    return tests_passed;
}

which yields the following output:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
$ clear && g++ solution.cpp -o solution &&  ./solution
running tests...
TEST PASS
TEST PASS
TEST PASS
TEST PASS
TEST PASS
TEST PASS
TEST PASS
TEST PASS
TEST PASS
TEST PASS
TEST PASS
TEST PASS

Yay!

Challenge 1-1 Solution

Here’s the general solution/text code so you can see output

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#include "iostream"
#include "string"
#include "bitset"
#include "vector"
#include "tuple"
#include "stdlib.h"

...

int main() {
    std::string hex_input_string {"49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"};
    std::string expected_base64_ciphertext {"SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"};
    std::string actual_base64_ciphertext;
    std::string result;
    bool is_match = false;

    try {
        std::string plaintext = Util::hex_to_ascii(hex_input_string);
        actual_base64_ciphertext = Base64::encode(plaintext);
        std::string decoded_plaintext = Base64::decode(actual_base64_ciphertext);

        std::cout << "hex_input_string:           " << hex_input_string << std::endl;
        std::cout << "expected_base64_ciphertext: " << expected_base64_ciphertext << std::endl;
        std::cout << "actual_base64_ciphertext:   " << actual_base64_ciphertext << std::endl;
        std::cout << "expected_plaintext:         " << plaintext << std::endl;
        std::cout << "decoded_plaintext:          " << decoded_plaintext << std::endl;
        std::cout << "ciphertext matches:         ";
        if (expected_base64_ciphertext == actual_base64_ciphertext) std::cout << "yes";
        else std::cout << "no";
        std::cout << std::endl;
        std::cout << "plaintext matches:          ";
        if (plaintext == decoded_plaintext) std::cout << "yes";
        else std::cout << "no";
        std::cout << std::endl;

        run_tests();

        return is_match;
    } catch (const std::exception& ex) {
        std::cerr << "Error occurred: " << ex.what() << std::endl;
        return 1;
    }
}

Yields the following output:

1
2
3
4
5
6
7
8
$ clear && g++ solution.cpp -o solution &&  ./solution
hex_input_string:           49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
expected_base64_ciphertext: SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
actual_base64_ciphertext:   SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
expected_plaintext:         I'm killing your brain like a poisonous mushroom
decoded_plaintext:          I'm killing your brain like a poisonous mushroom
ciphertext matches:         yes
plaintext matches:          yes
Built with Hugo
Theme Stack designed by Jimmy