Wednesday, November 24, 2010

Identifying Odd & Even Numbered Columns: D'Agapeyeff Cipher

In the last post I identified the mostly likely columns to have occupied even numbered positions in the original matrix:  C3, C7, C12, and C13.  I'm going to isolate these columns for purposes of this post.






As I referenced in a previous post, Friedman outlined a general solution for solving the ADFG(V)X cipher that relied on extrapolating data from cryptograms in the same key.  I'm going to use that same technique/assumption to try and tease out which three columns also belong in the same family.

If all four of these columns were even numbered in the original matrix, they should begin to show us a pattern of what the other even numbered columns look like by way of frequency counts.



  
If our assumption is correct, we can see a pattern emerging here.  The other even numbered columns should show few instances of 1 or 4.  Let's check the original matrix.  I've added counts for the other cipher numbers.


I want to first check the columns that have the next highest amounts of 3s.  C2, C4, C5, C6 and C10.

Of those five columns, one seems to fit the profile exactly:  C4.  It has only three instances of ciphertext 1 and ciphertext 4, while at the same time has six instances of ciphertext 2 and five instances of ciphertext 5.

The next most likely suspects are C6 and C14.  Not only do they have the fewest ciphertext 1s and ciphertext 4s (five each), they both have the next highest counts of ciphertext 2s (five each).

At this point, I feel comfortable moving forward to try and reconstruct the original pairs of columns.  But I'm a realist, I've only taken the possible transpositions from 14! or 87,178,291,200 down to 7! * 7! or 25,401,600.  So don't worry, I'm still daunted.


Tuesday, November 23, 2010

Back At It: D'Agapeyeff Cipher

To recap my assumptions before we dive in:

1.  The D'Agapeyeff Cipher consists of numbers 1-5 derived from a polybius square, numbers 6-9 which are nulls to disguise the ciphertext, and the number zero used as a termination point and to pad out the cipher.
 "The cipher is of course easily made out, but if every third, fourth, or fifth letter, as may be previously arranged, is a dummy inserted after a message has been put into cipher, it is then extremely difficult to decipher unless you are in the secret."

2.  The D'Agapeyeff Cipher can be written into a 14 x 14 square matrix.

3.  Phi tests on the cipher text as pairs of numbers running column-wise, row-wise, or pairing digits before and after the zero near the middle fail for mono-alphabetic substitution.  Therefore, columnar-transposition must be in play.

4.  Once removing the nulls from the ciphertext, the remaining digits should be placed into columns of 14 digits:



5.  What remains is a substitution followed by fractionated transposition.


There are the assumptions, and now for a little preliminary analysis on the digits to set the table:

Count of 1s     33
Count of 2s     46
Count of 3s     29
Count of 4s     43
Count of 5s     45

One more assumption we'll need to make is that the polybius square was keyed with a keyword and that the most infrequently used letters fell into a row together (example below):


In this case, "VWXYZ" fall into a row together and their frequencies in the English language would mean two things for our impending analysis.  First of which is that 5, in this example above, would be the least represented number in the ciphertext.  Second, 5 would be more likely to appear as an even numbered column rather than an odd numbered column before the columnar transposition occurred.

Using this information and applying it to our cipher text, it would appear that ciphertext 3 represents this row coordinate of these infrequently used letters as it only appears 29 times.  Based on the frequency count for each digit, we may be looking at a polybius square that looks something like this:


It's helpful to disassociate the numerical order that these figures represent normally.

Let's take a look again at our ciphertext.  I've added column headers for referencing the columns and shaded all instances of 3.



Since it's likely that 3 was used more often than not as a second number in a pair rather than a first we should be able to identify which columns were likely odd numbered columns in the original matrix (the first number in a pair) and which columns were likely even numbered columns in the original matrix (the second number in the pair).


There are four columns that contain more instances of 3 than most:  C7, C12, C3, and C13.  Interestingly enough the column containing the digit 4 that was previous attached to zero (04) seems to slot in as an even numbered column from the original matrix.  My hypothesis is that "04" was used to signify the last column in the original matrix.

There are five columns that contain zero or one instances of 3.  These columns are the most probable to be odd numbered columns from the original matrix.

To be continued . . . .

Monday, November 15, 2010

Is The D'Agapeyeff Cipher The ADFGX Cipher?

As a continuation of my previous post, I decided to break the D'Agapeyeff cipher into two smaller 7x7 squares.

To achieve this meant pairing up the digits in some fashion.  In one test used the "04" spot in the original ciphertext to mark the break point for the first 49 pairs and second 49 pairs.

In the second test, I used the "04" as the stopping point for the first digit in the pairs and all the numbers after the "04" as the second digit.

As mentioned in that post, I did remove all the 6-9 digits and the zeros before pairing up the remaining 196 digits.

My assumption was that if it was enciphered in this manner the same polybius square would have been used as there would be no need to complete two squares.  When comparing the frequencies of the pairs of numbers I didn't see enough commonalities between the two sets of 49 pairs to continue pursuing this method.

One example would be that in one group of 49 pairs the most frequent number was 22, but 22 wasn't represented at all in the other 49 pairs.

This brought me back to the drawing board, and I went back to review D'Agapeyeff's section on substitution plus transposition and my corresponding notes, which read:

I wonder if he understood how complex this concept is when introducing fractionated letters during the transposition step?

Look again at his example from the book:
This brings me back to one of my original hypotheses, although I was hoping it not to be this complex.  A completed filled 14x14 square with fractionated substitution.  To put it very simply, we'd be looking at the ADFGX cipher or as it became after the polybius square was expanded to accommodate numeric characters, the ADFGVX cipher.  This cipher was employed by the German military during World War I.

If you've read Friedman's Military Cryptanalysis, Part IV, Transposition and Fractionating Systems then you may recall there a couple different ways to solve the ADFGX cipher.  During the war they relied on high volumes of traffic and special circumstances such as messages with the same beginning, same ending, or completely filled rectangles.  One thing remains constant with all those special circumstance solutions:   high volume.

We have only one cryptogram, but we do have a couple things working for us.

  1. There is a completely filled rectangle.  196 factors into 4x49, 7x28, or 14x14
  2. If we assume the 14x14 square was used, we have an even number of columns which also reduces the cryptanalytic work
Let's consider the steps.  The 98 character plaintext was converted to digits via a polybius square resulting in pairs of numbers for each letter.  Consider the plaintext:

The bridge is out on the southeast road.  Use the bridge on the road north of the city for attack.  Hold position two day more.

The intermediate text was then inscribed into a 14x14 square horizontally as shown in D'Agapeyeff's example above.  A keyword/phrase is inscribed in the column headers for transposition.


Transposition is then completed via alphabetizing the columns:


The final ciphertext can be written out be reading column-wise (vertically) starting with the first column.  Nulls may be inserted every other character to re-create the appearance of the D'Agapeyeff cipher.

What does this mean in terms of attack strategies for the D'Agapeyeff cipher?  If we can assume the cipher was written on a 14x14 square we can assume there are 7 columns which represent the row coordinates from the polybius square and 7 columns which represent the column coordinates from the polybius square.

Unfortunately, we don't have additional ciphers to draw upon, so if this is the method by which the cipher was constructed it will be tough to crack.  Furthermore, if D'Agapeyeff's polybius square didn't have the low frequency letters "VWXYZ" in a row together as in my example the cipher becomes impervious to attack fairly quickly.

I've done some initial calculations assuming a 14x14 transposition rectangle (square) and there is evidence to suggest that VWXYZ or some combination of low frequency letters may have ended up in the third row, or D'Agapyeff may have changed the order of the row/column coordinates from the traditional 12345 to something like 34512.  I'll go into more details after further testing.

Thursday, November 11, 2010

Phi Test Results On D'Agapeyeff Cipher

The follow are results from phi tests I ran.  The first is with the original ciphertext, where each of the 392 numbers are paired after removing the final three zeros.


The observed score, Phi(o), in all three tests indicate the frequencies of the number pairs resemble English text.  This holds true when breaking the ciphertext into two 7x14 rectangles before and after the "04".




The second test I ran was on the ciphertext after removing the 6s, 7s, 8s, 9s, and all zeros. The remaining numbers (1-5) were then paired.


The observed phi value for the 98 number pairs was much closer to Phi(r) for random letters than for Phi(e) which is what would have been expected for English text.

Breaking the 98 pairs into two groups of 49, before and after the zero in the middle of the ciphertext yields a phi value of 80 for the first half and 136 for the second half.  While the first half appears to be random letters, the second half could certainly  be English text.



The next test I ran on the D'Agapeyeff ciphertext again relied on removing the nulls (6-0), however I instead paired the digits before and after the zero in the middle of the ciphertext..  Pairing the 1st number with the 99th, the 2nd with the 100th, etc.

Again, the phi test for all 98 number pair frequencies is much too low to be considered readable English text, however testing the first 49 pairs and second 49 pairs separately yields phi values that can be considered in line with English.

I did want to complete one other test to see what the results would be for a 49 character cipher similar in language to what D'Agapeyeff was writing in his book.  So I tested the phrase:

THEENEM
YISCOMI
NGFROMT
HENORTH
WESTFRI
DAYATFO
URPMHRT

"THE ENEMY IS COMING FROM THE NORTHWEST FRIDAY AT FOUR PM HRT"

The score was as follows:



The observed phi value was very similar to the observed values on the previous test.  I'm not using this as a proof for the other test, merely as a point of comparison for a short message.


In my next post I will include some of my observations from breaking the cipher into smaller rectangles/squares.



Wednesday, November 10, 2010

D'Agapeyeff Cipher As Two 7x7 Squares

Picking up where I left off last time, I want to continue testing the D'Agapeyeff cipher as a combination substitution-transposition cipher with nulls added to the final ciphertext.


As stated in the previous post, removing all the 6s, 7s, 8s, 9s and the zeros would leave 196 numbers or 98 number pairs.

Before:
75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000
After:
    52 251 21 414 14 544 44 243 13 114
    42 245 32 415 45 355 53 355 13 155
    55 422 25 432 54 315 14 545 45 532
    22 321 12 143 52 143 32 513 33 441
    11 435 45 455 24 225 15 141 25 455
    15 324 41 242 24 113 52 444 13 541
    15 243 35 234 22 252 35 223 22 223
    25 452 13 242 23 514 54 512 2
The question now becomes, how were these numbers written prior becoming the final ciphertext?  And why was there a zero near the middle?

Some ideas:
  1. The numbers are paired moving left to right, top to bottom (first pair 52).  The zero would mark the 49th pair, splitting the ciphertext into two sets of 49 pairs.  Or possibly two 7x7 squares.
  2. The numbers are paired using the first number (5) and the first number after the middle zero, the 99th overall number (4).  This would yield 98 total number pairs.
  3. The numbers were no longer paired when placed into 14x14 transposition table or two 7x14 tables.  They were fractionated much like the example on pg. 124-125 of the original D'Agapeyeff book.  Shown below.

The thing that steers me away from the use of a 14x14 table in both the conventional setup (no nulls used) and the setup I'm proposing is D'Agapeyeff's use of a keyword to reorder the columns.  Choosing a 14+ letter keyword doesn't seem as plausible, nor did D'Agapeyeff use a longer key phrase in any of his examples in the book.  You can't rule it out completely, but it seems less plausible.  It would seem to make sense that he was building off his own examples since this was included at the end of the book, asking the reader to test the skills they had just learned.

In the next post I'll post results from the phi tests I ran on the 7x7 squares of pairs of numbers.

Tuesday, November 9, 2010

The D'Agapeyeff Cipher

As I stated in my introductory post, I've taken an interest in this challenge cipher from the 1939 book "Codes and Ciphers" by Alexander D'Agapeyeff.

This cipher was included on the last page of the book and you can read more about it on Wikipedia.  Not a lot of other information exists about this cipher even though it appears on lists of the most famous unsolved codes.

Tiago Rodrigues' site does a nice job of analyzing the cipher text.

75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000

Before acquiring and reading D'Agapeyeff's book, I took a look at the cipher for the first time and just noted down what I saw that was interesting.

  1. The first thing that catches my eye about this cipher is the sheer number of 8s that appear.
  2. Zeros occur only twice.  Once to pad out the end of the cipher, and one other time near the middle of the cipher.
  3. The pattern of the numbers.  The first digit is a 6, 7, 8, or 9 and the second number in the pair is a 1, 2, 3, 4, 5.

As I said before, not much information regarding analysis or attempts to solve this cipher exists.  What little information I have seen normally suggests that this cipher should be analyzed in the 196 number pairs that exist when you remove the three zeros from the end (000).

That same school of thought then suggests a polybius square was used to encode the plaintext letters into 2 digit numbers.  This theory is statistically sound:

N = 196
Phi(r) = 1472
Phi(e) = 2549
Phi(o) = 2664
IC = 0.069

For those 196 number pairs the expected phi value is 1472 for random text, 2549 for English, and the observed phi value for the counts of these pairs of numbers is 2664.  This yields an IC of 0.069.  Perfectly in line with what you'd expect for English text that's undergone a substitution.

Here's the problem I have with this method of attack.  The setup for this allows the zero near the middle of the cipher to remain.  Why?  If "000" is removed from the tail-end of the ciphertext, shouldn't we assume the other zero was also performing the same function?

Let's go back to my first and third observations.  Plethora of 8s.  Number pattern.

D'Agapeyeff makes mention of several ideas in his book to make cryptanalysis more difficult.  One such idea is the insertion of nulls into the ciphertext.  If the pattern of 9/8/7/6 and 1/2/3/4/5 is man-made through the use of null values, perhaps the ciphertext is actually just made of up pairs of 1-5 from another sort of polybius square:

Removing all the 6s, 7s, 8s, 9s and the zeros would leave 196 numbers or 98 number pairs.

After removing the nulls


Number Pair Counts


This would suggest a 7x14 or 14x7 rectangle (less probable are 4x28 and 28x4).

However, when performing the phi test for mono-alphabeticity on the resulting 98 number pairs yields:


N = 98
Phi(r) =366
Phi(e) =634
Phi(o) = 402

According to the test, these frequencies more resemble random letters than readable text.  Perhaps the numbers are paired in another manner or the cipher is broken into smaller pieces.

This brings me back to my second observation, the zero near the middle.  It must indicate a break or something similar, otherwise why not just keep using 6s, 7s, 8s, and 9s and save the three zeros to pad out the end of the cipher.  To be continued next post . . .

Monday, November 8, 2010

Inagural Post

It seems like I should have something monumental to say in the first post here, ahem . . .

I've got nothing.  Where to begin?  A little background perhaps.

Somehow in the past year, I've gotten sucked into the world of codes, ciphers, and secret writing.  I find it all very fascinating and I hope one day to teach my son the ins and outs of how it all works.

I must say, this all started in a very innocent manner.  I stumbled back into a childhood hobby of collecting baseball cards when I heard about "The Ginter Code".  This site walks through the solution for this particular code, but the basis of it was there were symbols, 64 or 65ish in total, printed on a subset of 100 baseball cards from this particular set.  The idea was that you'd have to hunt through the cards and figure out how to decipher the hidden message(s).

I don't think I can put into words what I learned while working on that code.  I had no background in cryptography, nor had a I read anything on the subject, and for that I was handed my defeat.  However the biggest lesson I took away from that is that you need to learn how codes and ciphers are constructed before you can learn to take them apart.

Fresh off my drubbing in the 2009 Ginter Code challenge, I spent the next year reading whatever I could get my hands on, and another blogger (in the baseball card hobby) and solver of the 2008 Ginter Code challenge pointed me to this:






That bad boy is Kryptos.  A sculpture designed by artist Jim Sanborn, with help (on the ciphers) from Ed Scheidt.

I spent many hours in 2010 learning about Kryptos, joining the Yahoo community that works toward brainstorming and solving it, and making several friends along the way.  Even more importantly, I formed what I believe was a decently well thought out theory on how the final section may have been encrypted and tested that theory with the help of some very kind individuals from the Yahoo group.  While I don't spend as much time working on Kryptos of late, anyone who's looked at that darn sculpture can't seem to let it go.  Thoughts come and go, ideas come to me in my sleep . . . anyone reading this far probably understands what I'm talking about.

There you have it, the background on ciphers and me in the past year or so.  I went so far as to design my own code also utilizing baseball cards.  It was a fun endeavor, and someone did finally solve it after a few weeks.

I wanted to start this blog as a side outlet to share my thoughts on classical ciphers, code-breaking, code-making, and also share my thoughts on a couple as-yet-unbroken ciphers out there.

My current interest lies in the D'Agapeyeff cipher.  There is not a lot of information available about this cipher, although it's gone unbroken for almost 70 years.  I've got an idea about it and that will have to be my second post here.  Here's a shot of the ciphertext.