I already solved this a long time ago in python. I haven’t had a lot of personal time in general to blog. I’ve been writing down fun ideas for posts, and believe me, I have a TON in the works but unfortunately I don’t have much finished material to release yet.
Better Content?
As part of a cleanup effort of my backup drive, I’ve been merging, and deleting repos, and files as I consolidate things. I stumbled across many articles from my previous blogs/sites, and will be releasing that content as i touch it up or rewrite it.
One of my focuses needs to be on grammar, correct spelling, and content. I was going to use being busy as an excuse to just let the posts fly, but as I want these posts to be a reflection of me, and the way I approach problems, I don’t think it makes sense to just release them without checking them more thoroughly.
Regarding Hamming Distance
I did a reasonably good write up on hamming distance previously, but decided rathern than link it here, I would rewrite it.
To that end, I would describe hamming distance as “the sum difference between two values”.
In Cryptopals challenge 1-6, we are asked to compute the Hamming Distance between two strings. I will say this over, and over, and over again…. It is mentioned in the first exercise for Cryptopals that we ALWAYS OPERATE ON BITS.
As is expected, a solution written by different software engineers, can yield different results if the understanding about the GOAL is not agreed upon. This is where things like User Stories, and Test Driven Development can be crucial planning tools used to help deliver the correct solution, which could also be called the agreed upon solution.
Research, and planning the algorithm
When possible, I will read descriptions, RFCs, protocols, etc to understand how a piece of software should behave. In this case, since hamming distance is a simple calculation, I headed to wikipedia first to read about what it is.
I will say that having gone back to wikipedia this morning to talk through what makes the algorithms there wrong, I can see that the description is significantly better than it used to be.
Definition
The Hamming distance between two equal-length strings of symbols is the number of positions at which the corresponding symbols are different.
The symbols may be letters, bits, or decimal digits, among other possibilities. For example, the Hamming distance between:
- “karolin” and “kathrin” is 3.
- “karolin” and “kerstin” is 3.
- “kathrin” and “kerstin” is 4.
- 0000 and 1111 is 4.
- 2173896 and 2233796 is 3.
So given the updated definition, I would say the algorithms on wikipedia are correct when determining the hamming distance of the characters as the symbol, but we want the sum distance of hamming distance using bits as the symbol or metric.
Input
String 1:
|
|
String 2:
|
|
Expected Output
|
|
Using bytes as the distance metric in Python
This is the original algorithm I found on wikipedia, which determines the distance in CHARs which is not helpful for our use case.
|
|
Yields the following output:
|
|
Using bits as the distance metric in C++
|
|
Yields the following output:
|
|
Table Showing Distance Calculation
- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
s1 | 01110100 | 01101000 | 01101001 | 01110011 | 00100000 | 01101001 | 01110011 | 00100000 | 01100001 | 00100000 | 01110100 | 01100101 | 01110011 | 01110100 |
s2 | 01110111 | 01101111 | 01101011 | 01101011 | 01100001 | 00100000 | 01110111 | 01101111 | 01101011 | 01101011 | 01100001 | 00100001 | 00100001 | 00100001 |
sum | 2 | 3 | 1 | 2 | 2 | 3 | 1 | 5 | 2 | 4 | 3 | 2 | 3 | 4 |