Wednesday, February 23, 2011

Dispatching The D'Agapeyeff Cipher



If the D'Agapeyeff Cipher is indeed a numerical version of the ADFGX cipher, then I now possess the ability to check the entire likely keyspace . . . and in under a lifetime . . . with help from a fellow crypto-enthusiast.

As I outlined in my last post, the beauty of a fractionated cipher like the ADFGX is that each permutation of the columns will produce a unique phi value (or IC if you'd rather talk in terms of a ratio).

As I pointed out in my last post, for any even length transposition key there are only:

N! / ((N/2)! + (N/2)!) unique ways to create a phi value.


14! / (7! + 7!) = 8.7 million ways to create a phi value for the D'Agapeyeff cipher when treated as a fractionated transposition.  I refer to each of these 8.7 million permutations as "key variants".  From each key variant I can obtain all 7! = 5040 permutations of the column pairings.

For instance, let's say this is the correct key: 4, 5, 9, 11, 0, 1, 7, 13, 8, 3, 2, 12, 6, 10 and generates a phi value of 574.

I don't need to find this exact key to find the solution, I just need to find one of the 5,040 variations of the same column pairings, say:  9, 11, 4, 5, 7, 13, 0, 1, 6, 10, 2, 12, 8, 3.  This also generates the same phi value because all the columns are pairing up the same to create the same "letters".  If I were to check all 5,040 permutations of this key, I'd eventually come across 4, 5, 9, 11, 0, 1, 7, 13, 8, 3, 2, 12, 6, 10 and through log tetragraph scoring arrive at the correct plaintext after hill-climbing the substitution key.

Of course, I don't know the phi value of the plaintext in question, but I do know that it should fall into some acceptable range for English.  While 634 is expected for English, the range actually spans from 480 to 780.  I took a random sample of many different books from Project Gutenberg and randomly pulled different 98 characters of text and tested their phi values.  As you can imagine, 75% of all the phi values fall between 540 and 660.

For the D'Agapeyeff cipher, only 0.001% of the 8.7 million key variants fall within the 540 to 660 range.  This represents 74 key variants.  If the D'Agapeyeff Cipher has a solution in this keyspace, I'll need only to check 74 * 5040 = 372,960 total keys to find the solution.  This is much better than 87,178,291,200.

And what if the D'Agapeyeff cipher has a phi value closer to 500?  There are 3,744 key variants with a phi value greater than or equal to 500.  This represents 18,869,760 total keys to be checked.  While this would take approximately 100 days to check each key in the solver, that's much better than 800 years.


Friday, February 4, 2011

Digging For D'Agapeyeff DNA

What is the biggest obstacle to solving a 14 column fractionated transposition with an underlying substitution?

The solution space.  It's not feasible to check and score the entire space.  Or is it?

Let's start with a smaller example before leaping into the ADFGX cipher.

There is a dance tonight, and you have to form dance partners from a group of 4 people.  Two men:  Bob and Tom.  Two women:  Jenny and Sara.

There are N! ways to make dance partners, or in this case 4! = 24.  However there are only (N! / ( (N/2)! + (N/2)!) ) = 6 ways to make unique pairs where the order matters.  Bob & Jenny, Tom & Sara.  If you flip the men and women:  Jenny & Bob, Sara & Tom.  If you change the dance partners:  Tom & Jenny, Bob & Sara.  And if you flip the men and women:  Jenny & Tom, Sara & Bob.  The other two unique pairs are pairing the men together and the women together, and flipping their order.

How does this apply to the ADFGX cipher and my theory surrounding the D'Agapeyeff cipher?  For that we'll need one more statistic.  Phi value.  With the ADFGX cipher, each permutation of the column order produces unique pairs of columns that come together to make one "letter".  Even when you don't know the underlying substitution, you can still calculate the Phi value for the ciphertext.

Consider a 14x14 completed filled transposition grid.  There are 196 characters total which will eventually pair together to form 98 plaintext alphabetic characters.  As I've stated previously here, the expected Phi value for 98 letters from English text is 634 (98 * 97 * 0.0667).

Although there are 14! ways to arrange the columns, there are only 14! / ( (14/2)! + (14/2)! ) = 8,648,640 unique ways to calculate a Phi value.  Going back to a smaller example:

1,2 and 3,4 = unique phi value, and will result in the correct plaintext in the substitution phase.
2,1 and 4,3 = same phi value and will in fact produce the same plaintext in the substitution phase.  All you've done is flip all the row and column coordinates from the polybius square.

3,4 and 1,2 = same phi value, but will not result in the correct plaintext during the substitution phase because the letters are not in the correct order.
4,3 and 2,1 = same phi value, but will also not result in the correct plaintext during the substitution phase because the letters are not in the correct order.

So you might be asking, what does this accomplish?  One, for an even numbered transposition key, completely filled rectangle the ADFGX cipher has two correct keys.  Secondly, if you have any permutation of the correct key, you need only try (N/2)! permutations in the substitution solving phase to yield the correct plaintext.  For the 4 column example, I need only try (4/2)! = 2 permutations of the best potential keys to find the solution and correct plaintext.

Back to identifying the best potential keys.  Here is where the Phi value is key.  It's basically like the DNA fingerprint for the correct plaintext.  If 634 is the Phi value you'd expect to see for a 98 character plaintext you'd need only check potential solutions within a certain proximity to that value.

Let's go back to the 14 column example and it's 8,648,640 unique Phi values.  If I generated 8,648,640 random permutations of the 14 columns and had perfect distribution, within that pile of Phi values, one would contain some permutation of the correct transposition key.  It would have all the correct column pairings but be in the wrong order.  I'd need to try all (N/2)! or 7! or 5,040 permutations of that key in a substitution solver to yield the correct plaintext.

Think of it this way, you are looking for a toothpick in a large pile of toothpicks, however there are 5,040 variations of each of the 8,648,640 toothpicks.  Randomly drawing, let's say 20,000,000 toothpicks of the pile should give you at least 1 or 2 of the 5,040 variations of the toothpick you are looking for.  And on top of it, only 0.002% of the toothpicks look like yours.

What I found on trial runs with a known plaintext is that only about 40 high phi values show up in those 8,648,640 phi values.  Thus far, in limited testing with different plaintexts and polybius squares the correct phi value for my plaintext has always shown up in the top 40 unqiue phi values.  Meaning I'd need only check 40 * 5040 = 201,600 possible transposition keys for mono-alphabetic substitution.

In my view this is far superior to starting with a hill-climber and trying to maximize a scoring function on 99.99998% of potential keys that never have a chance to yield a correct plaintext.