Friday, July 1, 2011

Kryptos: A Potential K4 Algorithm



Before I begin -  Thank you for reading, and comments are welcome and appreciated.  Also, please be as forgiving as possible with my use of terminology.  I do my best not to misuse jargon.


The facts as stated by the men who designed Kryptos and it's encryption methods:
  • K4 uses an unknown algorithm (at least at the time it was created)
  • The English in K4 is masked
  • The cryptographer gave the artist ways to modify existing systems for use in Kryptos
  • Each step forward in Kryptos represents a more challenging cipher than the previous one
  • The biggest clue is in plain view 

For quite some time I've been concentrating on these facts.  Masking English can be done in many ways using modern crypto techniques, but one easy way to accomplish this in classical cryptography is to use fractionated substitution.  This can be accomplished using a Polybius square or a straddling checkerboard.  Examples are the bifid, trifid, CM Bifid, ADFGX, and other monome-dinome type ciphers based on the straddling checkerboard.

Fractionated substitution is good as a precursor to a transposition step.  Being able to diffuse the pieces that make up a plaintext letter, in a systematic format, is an algorithm.

The algorithms for ciphers such as the bifid are well known, with their weaknesses well documented and their general solutions published.  They no longer represent a challenge to the cryptographic community given enough ciphertext.

However, the concept for some of these ciphers fit all the criteria for K4 should they be modified in some way:
  • They mask the English by breaking the relationships between the ciphertext characters and plaintext letters.
  • This type of cipher, substitution plus transposition, fits the criteria of being more challenging than the previous layers of the Kryptos puzzle (Morse Code, Vignere, Double Row-Column Transposition)

That leaves only one problem . . . what is the algorithm?  If it's unknown, we can't solve for it, right?

For that I look to the last fact in our list.  It's in plain view.

Could it be that the algorithm isn't a secret, but instead placed in front of us in the familiar dYAhR out of alignment letters?  I believe so.  Let me explain.

As noted, the traditional algorithms for bifids are known.  One such method includes placing the row coordinates (from the Polybius square substitution) for each plaintext letter on a top line, while placing all the column coordinates a line below. An example of a period 7 bifid encipherment is included here (Polybius square not shown):

S H A R P E N
1 2 1 5 3 2 3
1 1 5 4 1 5 3

Pairing up the numbers on the top row, followed by the bottom row yields the following number pairs and corresponding ciphertext characters:


1 2 1 5 3 2 3 1 1 5 4 1 5 3
Z
A
J
P
A
M
B

When deciphering a bifid by hand  you  use the known algorithm to place half-letters in a particular order, and use those half-letters to attempt decryption.  Since this is not a reflective cipher, we don't have access to the coordinate numbers from the polybius square.  A letter followed by a 1 represents a row coordinate from the polybius square and a 2 represents a column coordinate:

Z1 Z2 A1 A2 J1 J2 P1
P2 A1 A2 M1 M2 B1 B2

As you can see, in the third position, what is called a "natural" appears in the reconstructed ciphertext.  Reading vertically, A1 A2 results in a plaintext "A" revealing itself.  The bifid leaks information in the form of naturals throughout the course of decryption.  This is possible in all the odd numbered locations where it reads 1,2 vertically.

Now, let's consider a bifid with a different algorithm.  Take the pattern from the dYAhR pattern.  Down, Up, Up, Down, Up.  Let's encode 5 letters at a time (10 numbers) from our plaintext "SHARPEN":

S
H
A
R
P
1 1 2 1 1 5 5 4 3 1











1 2
1
5 4
1
1

1
5

3










1 2 1 5 4 1 1 1 5 3
Z
A
M
S
B

Now let's consider the potential decryption, since we now know the algorithm, we can work backwards:


Z1 Z2
A1
A2 M1
M2
S1

S2
B1

B2

The first plaintext letter is made up of S1 Z1.  The second letter Z2 S2.  The third A1 B1.  The fourth A2 M1.  The fifth B2 M2.

This bifid doesn't leak.  No possibilities of naturals - ever.  Previously, not only did our bifid have a chance to leak plaintext letters, it only had two patterns available to the cryptanalyst.  12 and 21.  This bifid has three patterns:  11, 22, and 21.

What this means to the cryptanalyst is that there are 75 ways each plaintext letter could be enciphered.  Essentially, what results for a 97 character cipher is a homophonic representation of the plaintext.

This system also presents a bigger challenge than the known bifid due to two of the patterns potentially only helping place letters in the same row, or in the same column, but never giving the full link to row and column.

When solving bifids, such as the ones that appear in the Cryptogram, you are generally given a crib to get started.  Consider our previous example "SHARPEN" with only 5 letters given as a crib:




A R P E N
Z1 Z2 A1 A2 J1 J2 P1
P2 A1 A2 M1 M2 B1 B2


What we are able to glean from this crib is that J & P share a row, M & P share a column, N & P share a row, and N & B share a column.

Given the same type of crib for the dYAhR bifid, we are only able to take a fraction of that information:






A
R
P

Z1 Z2
A1
A2 M1
M2
S1

S2
B1

B2


A shares a row with A, nothing to glean from the "21" pattern, and P shares a column with M.

As you can see, this modified bifid cipher is slightly more cryptographically secure than the traditional bifid algorithm and uses an algorithm previously not documented.

It combines what's best about the bifid - digraphic substitution -  and improves it's strength.  While at the same time it improving its known weakness - naturals.

In the end what's left is a new algorithm to cryptanalyze and create a general solution for - just like Friedman and Callimahos presented in their Military Cryptanalytics series.

Some concluding remarks:

  • On all 26 letters are present in K4.  If I/J or U/V (or any other two letters) share a square, it would be very easy for the cryptographer to alternate which letter he chose to put in the final ciphertext.  In the end, the letters themselves don't matter anyway, just the underlying relationships.

  • On the statistical phenomenon seen in K4 - using only 5 characters to encipher the plaintext typically yields between 3 to 8 instances of double letters in the final ciphertext.  On many occasions these can fall on the same intervals, but overall they are random.
  • By flattening the distribution of 1s, 2s, 3s, 4s, and 5s in the intermediate ciphertext you will have a higher probability of lowering the final Index of Coincidence (IC).  It is all dependent on the Polybius square chosen and the plaintext.  A well mixed Polybius square with some low, medium, and high frequency letters in each row will accomplish this.

  • Lastly, it should be noted that without a crib this cipher is very tough to crack by hand.  I would imagine that perhaps the morse code contains a crib, much like VIRTUALLY INVISIBLE for K2.  Or perhaps LAYERTWO was meant to be a crib for K4 prior to the "BERLIN" release.


Thursday, June 30, 2011

And Waiting . . .

Status Update:  925 key variants checked.  4,662,000 total keys checked.  20.5% complete.

3584 remaining key variants.  18,063,360 total keys remaining.  79.5% remaining.

Tuesday, May 31, 2011

Still Waiting for D'Agapeyeff

Today I calculated the percent of the likely key space I've checked to date.  I have checked:

564 Key Variants
2,842,560 Transposition Keys

The likely keyspace, as I've defined it, is any permutation of column pairings with a phi value over 500.

4509 Total Key Variants
22,725,360 Total Transposition Keys

Thus far I've checked only 12.51% of the likely key space.  It's slow going, but I hope to be rewarded for my patience.

Tuesday, May 10, 2011

Waiting for D'Agapeyeff

Waiting for a solution to present itself is both exciting and challenging.  I've never been accused of being patient, but in this endeavor it's my only choice.

This entire process reminds me quite a bit of the play Waiting for Godot.

D'Agapeyeff will not be here today, but . . .
he will surely be here tomorrow . . . right?

Monday, April 25, 2011

Another Short Update

Over the past month I've spent some time ensuring that I'll be able to see this theory through to completion.

That means two things.  The first is that I now I have a completely automated program that solves ADFGX ciphers and has been tested on known ciphertexts.  This program now only needs a list of key variants to check, which I've automated with another process.  For the D'Agapeyeff cipher I've chosen the top 4,509 key variants for testing which represent phi values 498 through 600.  As previously stated I've eliminated all keys from 538 through 600 and feel confident that those do not contain a solution for this cipher.

Testing now continues on phi values 536 down to 498.  This process will take several months.

Tuesday, March 15, 2011

Quick Update

I see that quite a few people stumble across this space on a given week, so I figured I'd give an update since it's been nearly a month since my last post.

As noted earlier, when treating the D'Agapeyeff cipher as a numerical version of ADFGX (treating the 6-0 digits as nulls) the max attainable phi value is 600 which is associated with an IC of 1.64 (6.3%).

From tests I've run, the fact that the IC tops somewhere below the value expected for English leads me to believe that if the D'Agapeyeff cipher is in ADFGX form, the plaintext has a lower IC much like the cipher D'Agapeyeff decoded from his friend in the cryptanalysis section of his book.

Thus far I've eliminated all key variants from 600 down to 538.  This represents 215 key variants and 1,083,600 total keys.  The program I'm working with allows me to input one key variant, and it then checks the 5,040 permutations of the 7 column pairings and scores the potential plaintext with log tetragraphs.  It's a slick program, but does require me to input keys.  If I make it down to a phi value of 520 with no plaintext the key distinct variants do get tougher to distinguish.  And by tougher I just mean time consuming.  I've not come up with a great programmatic way to sort through all the keys that fall into a given phi/IC and pick out just the unique column pairings.

That's all for now.

Wednesday, February 23, 2011

Dispatching The D'Agapeyeff Cipher



If the D'Agapeyeff Cipher is indeed a numerical version of the ADFGX cipher, then I now possess the ability to check the entire likely keyspace . . . and in under a lifetime . . . with help from a fellow crypto-enthusiast.

As I outlined in my last post, the beauty of a fractionated cipher like the ADFGX is that each permutation of the columns will produce a unique phi value (or IC if you'd rather talk in terms of a ratio).

As I pointed out in my last post, for any even length transposition key there are only:

N! / ((N/2)! + (N/2)!) unique ways to create a phi value.


14! / (7! + 7!) = 8.7 million ways to create a phi value for the D'Agapeyeff cipher when treated as a fractionated transposition.  I refer to each of these 8.7 million permutations as "key variants".  From each key variant I can obtain all 7! = 5040 permutations of the column pairings.

For instance, let's say this is the correct key: 4, 5, 9, 11, 0, 1, 7, 13, 8, 3, 2, 12, 6, 10 and generates a phi value of 574.

I don't need to find this exact key to find the solution, I just need to find one of the 5,040 variations of the same column pairings, say:  9, 11, 4, 5, 7, 13, 0, 1, 6, 10, 2, 12, 8, 3.  This also generates the same phi value because all the columns are pairing up the same to create the same "letters".  If I were to check all 5,040 permutations of this key, I'd eventually come across 4, 5, 9, 11, 0, 1, 7, 13, 8, 3, 2, 12, 6, 10 and through log tetragraph scoring arrive at the correct plaintext after hill-climbing the substitution key.

Of course, I don't know the phi value of the plaintext in question, but I do know that it should fall into some acceptable range for English.  While 634 is expected for English, the range actually spans from 480 to 780.  I took a random sample of many different books from Project Gutenberg and randomly pulled different 98 characters of text and tested their phi values.  As you can imagine, 75% of all the phi values fall between 540 and 660.

For the D'Agapeyeff cipher, only 0.001% of the 8.7 million key variants fall within the 540 to 660 range.  This represents 74 key variants.  If the D'Agapeyeff Cipher has a solution in this keyspace, I'll need only to check 74 * 5040 = 372,960 total keys to find the solution.  This is much better than 87,178,291,200.

And what if the D'Agapeyeff cipher has a phi value closer to 500?  There are 3,744 key variants with a phi value greater than or equal to 500.  This represents 18,869,760 total keys to be checked.  While this would take approximately 100 days to check each key in the solver, that's much better than 800 years.


Friday, February 4, 2011

Digging For D'Agapeyeff DNA

What is the biggest obstacle to solving a 14 column fractionated transposition with an underlying substitution?

The solution space.  It's not feasible to check and score the entire space.  Or is it?

Let's start with a smaller example before leaping into the ADFGX cipher.

There is a dance tonight, and you have to form dance partners from a group of 4 people.  Two men:  Bob and Tom.  Two women:  Jenny and Sara.

There are N! ways to make dance partners, or in this case 4! = 24.  However there are only (N! / ( (N/2)! + (N/2)!) ) = 6 ways to make unique pairs where the order matters.  Bob & Jenny, Tom & Sara.  If you flip the men and women:  Jenny & Bob, Sara & Tom.  If you change the dance partners:  Tom & Jenny, Bob & Sara.  And if you flip the men and women:  Jenny & Tom, Sara & Bob.  The other two unique pairs are pairing the men together and the women together, and flipping their order.

How does this apply to the ADFGX cipher and my theory surrounding the D'Agapeyeff cipher?  For that we'll need one more statistic.  Phi value.  With the ADFGX cipher, each permutation of the column order produces unique pairs of columns that come together to make one "letter".  Even when you don't know the underlying substitution, you can still calculate the Phi value for the ciphertext.

Consider a 14x14 completed filled transposition grid.  There are 196 characters total which will eventually pair together to form 98 plaintext alphabetic characters.  As I've stated previously here, the expected Phi value for 98 letters from English text is 634 (98 * 97 * 0.0667).

Although there are 14! ways to arrange the columns, there are only 14! / ( (14/2)! + (14/2)! ) = 8,648,640 unique ways to calculate a Phi value.  Going back to a smaller example:

1,2 and 3,4 = unique phi value, and will result in the correct plaintext in the substitution phase.
2,1 and 4,3 = same phi value and will in fact produce the same plaintext in the substitution phase.  All you've done is flip all the row and column coordinates from the polybius square.

3,4 and 1,2 = same phi value, but will not result in the correct plaintext during the substitution phase because the letters are not in the correct order.
4,3 and 2,1 = same phi value, but will also not result in the correct plaintext during the substitution phase because the letters are not in the correct order.

So you might be asking, what does this accomplish?  One, for an even numbered transposition key, completely filled rectangle the ADFGX cipher has two correct keys.  Secondly, if you have any permutation of the correct key, you need only try (N/2)! permutations in the substitution solving phase to yield the correct plaintext.  For the 4 column example, I need only try (4/2)! = 2 permutations of the best potential keys to find the solution and correct plaintext.

Back to identifying the best potential keys.  Here is where the Phi value is key.  It's basically like the DNA fingerprint for the correct plaintext.  If 634 is the Phi value you'd expect to see for a 98 character plaintext you'd need only check potential solutions within a certain proximity to that value.

Let's go back to the 14 column example and it's 8,648,640 unique Phi values.  If I generated 8,648,640 random permutations of the 14 columns and had perfect distribution, within that pile of Phi values, one would contain some permutation of the correct transposition key.  It would have all the correct column pairings but be in the wrong order.  I'd need to try all (N/2)! or 7! or 5,040 permutations of that key in a substitution solver to yield the correct plaintext.

Think of it this way, you are looking for a toothpick in a large pile of toothpicks, however there are 5,040 variations of each of the 8,648,640 toothpicks.  Randomly drawing, let's say 20,000,000 toothpicks of the pile should give you at least 1 or 2 of the 5,040 variations of the toothpick you are looking for.  And on top of it, only 0.002% of the toothpicks look like yours.

What I found on trial runs with a known plaintext is that only about 40 high phi values show up in those 8,648,640 phi values.  Thus far, in limited testing with different plaintexts and polybius squares the correct phi value for my plaintext has always shown up in the top 40 unqiue phi values.  Meaning I'd need only check 40 * 5040 = 201,600 possible transposition keys for mono-alphabetic substitution.

In my view this is far superior to starting with a hill-climber and trying to maximize a scoring function on 99.99998% of potential keys that never have a chance to yield a correct plaintext.

Wednesday, January 26, 2011

Since I've Got The Time

Genetic search algorithms and hill-climbers are all very serious business, so I thought I'd try out a couple of educated guesses on the transposition key for the D'Agapeyeff Cipher for fun.   Why?  Why not?  I've definitely got the time.  Sure it's a 1 : 87 billion guess, but it's EDUCATED! :)

I started with a few of my usual assumptions:

1.  Strip out the odd digits from the ciphertext (all the 5, 6, 7, 8, 9, and 0 digits)
2.  Write the ciphertext vertically into 14 columns (14x14 grid)
3.  Assume that "04" near the middle of the cipher text marks the last character in the cryptogram

That leaves us looking for a 14 character transposition key, where the last column of the original matrix is what I like to call C7 (Column 7 after transposition).

What do we know about Alexander D'Agapeyeff?  We know he was a cartographer, born in Russia, and lived (and died) in England.  We also know that he showed us an example of substitution + fractionated transposition in his Codes and Ciphers book:

A cartographer living in the UK you say?  I believe it.  Mr. D'Agapeyeff used MANCHEST(E)R as the keyword in his example.  As you can see, he removed the 2nd instance of "E" to give a resulting 9 columns for the transposition key width.

Let's assume for a moment that he's playing by the same rules with his challenge cipher.

We're looking for a key phrase with 14 unique letters.  Not exactly a simple task.

NOT EXACTLY A SIMPLE TASK.  There are 21 characters in that sentence, but only 14 unique.  But let's revisit the mind of a cartographer.  A man who made maps for a living.  No doubt his mind is littered with names of places.  Towns, villages, cities, countries, etc.

Let's also revisit one of the first assumptions:

"That leaves us looking for a 14 character transposition key, where the last column of the original matrix is what I like to call C7 (Column 7 after transposition)."
 Okay, so we need a key phrase where the last unique character is somewhere near the middle of the alphabet.  Why is that?

Remember when a key is finished being transposed, essentially what you have is:

MANCHESTR to ACEHMNRST

If we "know" (and I use that term loosely, remember this is just for fun) the last letter of the keyword alphabetizes to the 7th position there needs to be 6 letters preceding it and 7 following it in the alphabet.

ABCDEF     GHIJKLMNOPQRS     TUVWXYZ

We can rule out A-F being the last unique letter in the key phrase, as well as T-Z.  That leaves us with G-S as the possibilities.  But let's be realistic.  I highly doubt all of ABCDEF or TUVWXYZ are all used in the keyphrase.  We are likely looking for a key phrase where the last unique letter is IJKLMNOPQ.

So back to D'Agapeyeff's keyword of choice in his book.  MANCHEST(E)R.

MANCHESTER UNITED KINGDOM.  15 unique characters.  Probably not it (I tried)

But hey, there are a ton of cities in the UK for a map maker to draw from, and there are 8 unique letters just in "UNITED K(IN)G(D)OM".  Added bonus that M and O are the last unique letters provided that they both don't show up in the city name you choose.  Take for instance Liverpool.

LIVERPOOL, UNITED KINGDOM

LIVERPO(OL)UN(I)T(E)DK(IN)G(DO)M.  14 unique characters.  When alphabetized:

DEGIKLMNOPRTUV.  M is the last unique character and falls into the 7th spot when reordered for transposition.  Beautiful.  Cipher Solved!  Re-arrange the columns, pair up the digits to form the 2 digit numbers from the polybius square used for substitution and plug into my mono-alphabetic substitution solver
, and . . . . nope, garbage.  Coventry fit the bill as well, but no dice there either.

But hey, it wasn't hard to try with a spreadsheet setup to do all the leg work.  Realistically I checked the phi-test value (380 something) and it told me it wasn't English.

Tuesday, January 11, 2011

D'Agapeyeff Cipher: Searching For A Solution

It's been almost  two months since my last post on the D'Agapeyeff Cipher, and I guess you could say that I've moved from "interested in this cipher" to "actively looking for a solution". As far as I can tell, there aren't many in this category, but I am interested in discussing this cipher with other interested parties, so please feel free to contact me.

In my last post from November 24, 2010 I attempted to identify the "row-columns" and "column-columns" that exist from using a polybius square for the substitution phase of the encryption.

I did do some due diligence and check the frequency counts for each digit of a 7x28 matrix, but saw no compelling evidence to suggest this shape was used.

My focus then remains on the 14x14 transposition matrix. Once the columns are paired (un-fractionated?) there will be a 7x14 (98 character) simple substitution cipher remaining. The trick is first solving the the transposition encryption independently of the substitution.

Tiago Rodrigues does a great job summarizing this strategy on his D'Agapeyeff.com website.


Back to the 14 columns, and the transposition space:

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14
5 4 4 3 5 4 3 4 2 5 5 3 2 2
2 4 5 5 4 5 5 1 2 3 2 3 3 4
2 4 3 5 3 5 2 1 5 2 4 5 2 2
5 2 2 1 2 3 1 1 1 4 4 2 2 2
1 4 4 3 5 2 4 4 5 4 4 3 2 3
2 3 1 1 4 2 3 3 1 1 1 4 2 5
1 1 5 5 3 2 3 5 4 2 3 2 3 1
4 3 4 5 1 3 2 4 1 4 5 2 2 4
1 1 5 5 5 2 5 5 2 2 4 2 5 5
4 1 3 5 1 1 1 4 5 2 1 5 4 4
1 4 5 4 4 1 3 5 4 4 1 2 5 5
4 4 5 2 5 2 3 5 5 1 5 3 2 1
5 2 5 2 4 1 3 2 5 1 2 5 1 2
4 2 3 2 5 4 4 4 1 3 4 2 3 2


I surmised that columns C3, C7, C12, and C13 were likely to be "row-columns" while C1, C8, C11, and C14 were likely to be "column-columns" based on differing frequency counts for each of the 5 ciphertext characters.  The remaining six columns were sorted into row-columns or column-columns based on their frequency distribution.


Another member of the D'Agapeyeff cipher Yahoo! forum continued with some analysis of his own based on my assumptions to date.  That analysis is located here.  The outcome is similar to my findings with the caveat that C4 and C10 don't seem to favor being a row or a column.

Additionally this analysis output the largest Phi values possible - where 584 is the maximum value and is attained when the pairs are:

C1 - C6
C2 - C12
C5 - C7
C8 - C14
C9 - C13
C10 - C4
C11 - C3

If you recall, 634 is the expected Phi value for English text where N=98.  584, as well as several other sets of column pairings fall within an acceptable range, however no value of Phi(o) exceeds 634.

Assume for a moment that these are the correct column pairings, what remains is a 7! transposition which when in the correct order will leave a very simple substitution cipher.

Perhaps we're making headway against this as yet unsolved cipher.