Tuesday, November 9, 2010

The D'Agapeyeff Cipher

As I stated in my introductory post, I've taken an interest in this challenge cipher from the 1939 book "Codes and Ciphers" by Alexander D'Agapeyeff.

The cipher was printed on the last page of the book, and you can read more about it on Wikipedia. Little else has been written about it, even though it appears on lists of the most famous unsolved codes.

Tiago Rodrigues' site does a nice job of analyzing the ciphertext.

75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000

Before acquiring and reading D'Agapeyeff's book, I looked at the cipher for the first time and simply noted down whatever seemed interesting.

  1. The first thing that catches my eye about this cipher is the sheer number of 8s that appear.
  2. Zeros occur in only two places: a run of three padding out the end of the cipher, and a single zero near the middle.
  3. The pattern of the numbers. The first digit of each pair is a 6, 7, 8, or 9, and the second is a 1, 2, 3, 4, or 5.
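If you'd like to check these observations yourself, the short Python sketch below counts the digits and tests the pair pattern. It's my own illustration, not anything from D'Agapeyeff's book; it pairs digits from the start of the ciphertext and assumes the trailing 000 is padding.

```python
from collections import Counter

# The ciphertext exactly as printed above, in groups of five.
CIPHERTEXT = """
75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000
"""

digits = "".join(CIPHERTEXT.split())
digit_counts = Counter(digits)

# Pair the digits from the start, ignoring the trailing "000" padding.
body = digits[:-3] if digits.endswith("000") else digits
pairs = [body[i:i + 2] for i in range(0, len(body), 2)]

# Pairs that break the "6-9 first, 1-5 second" pattern.
exceptions = [p for p in pairs if not (p[0] in "6789" and p[1] in "12345")]

if __name__ == "__main__":
    print(len(digits), "digits")
    for d in sorted(digit_counts):
        print(d, digit_counts[d])
    print(len(pairs), "pairs; exceptions to the pattern:", exceptions)
```

Run as-is, it finds 80 eights among the 395 digits and four zeros, and the only pair that breaks the pattern is 04, which holds the middle zero.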

As I said before, little has been published on analyzing or solving this cipher. What I have seen usually suggests treating it as the 196 two-digit pairs that remain once the three trailing zeros (000) are removed from the 395 digits.

The same school of thought suggests that a Polybius square was used to encode the plaintext letters as two-digit numbers. The statistics are consistent with this theory:

N = 196
Phi(r) = 1472
Phi(e) = 2549
Phi(o) = 2664
IC = 0.070

For those 196 pairs, the expected phi value is 1472 for random text and 2549 for English, while the observed phi value for the pair counts is 2664. Dividing that by N(N-1) = 196 x 195 = 38,220 gives an IC of about 0.070, right in line with what you'd expect from English text that has undergone a simple substitution.
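For anyone who wants to reproduce these numbers, here is a minimal Python sketch of the phi test. It assumes the conventional expected coincidence rates of 0.0385 per pair of positions for random text (about 1/26) and 0.0667 for English; with those constants the expected values come out within rounding of the figures above. The function name phi_test is just my own label.

```python
from collections import Counter

# The ciphertext exactly as printed above.
CIPHERTEXT = """
75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000
"""

# Conventional phi-test constants (assumed): expected coincidence rate per
# pair of positions for random 26-letter text and for English.
KAPPA_RANDOM = 0.0385
KAPPA_ENGLISH = 0.0667


def phi_test(symbols):
    """Return N, expected Phi for random and English text, observed Phi, and IC."""
    n = len(symbols)
    positions = n * (n - 1)
    observed = sum(f * (f - 1) for f in Counter(symbols).values())
    return {
        "N": n,
        "Phi(r)": KAPPA_RANDOM * positions,
        "Phi(e)": KAPPA_ENGLISH * positions,
        "Phi(o)": observed,
        "IC": observed / positions,
    }


digits = "".join(CIPHERTEXT.split())
body = digits[:-3] if digits.endswith("000") else digits  # drop the padding
pairs = [body[i:i + 2] for i in range(0, len(body), 2)]
result = phi_test(pairs)

if __name__ == "__main__":
    for name, value in result.items():
        print(name, "=", round(value, 4))
```

On the 196 pairs it reports Phi(o) = 2664 and an IC of about 0.0697.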

Here's the problem I have with this method of attack: it leaves the zero near the middle of the cipher in place. Why? If the "000" is removed from the tail end of the ciphertext as padding, shouldn't we assume the other zero performs a similar function?

Let's go back to my first and third observations: the plethora of 8s and the pattern of the numbers.

D'Agapeyeff mentions several ideas in his book for making cryptanalysis more difficult. One of them is inserting nulls into the ciphertext. If the 6/7/8/9 and 1/2/3/4/5 pattern is man-made through the use of nulls, perhaps the ciphertext is really just made up of pairs of the digits 1-5 from a different kind of Polybius square, a 5x5 one.
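To make the null idea concrete, here is a toy Python encoder that produces exactly this kind of digit pattern. Everything in it is a hypothetical model, not D'Agapeyeff's method: the plain alphabetical 5x5 square with I and J merged, the use of 6-9 as randomly chosen nulls in front of every real digit, and the names encrypt and decrypt.

```python
import random
import string

# Hypothetical 5x5 Polybius square: plain alphabet, I and J share a cell.
ALPHABET = string.ascii_uppercase.replace("J", "")
SQUARE = {ch: f"{i // 5 + 1}{i % 5 + 1}" for i, ch in enumerate(ALPHABET)}
REVERSE = {code: ch for ch, code in SQUARE.items()}
NULLS = "6789"


def encrypt(plaintext, rng):
    """Write each coordinate digit (1-5) after a randomly chosen null (6-9)."""
    out = []
    for ch in plaintext.upper().replace("J", "I"):
        if ch in SQUARE:  # skip spaces and punctuation
            for coord in SQUARE[ch]:
                out.append(rng.choice(NULLS) + coord)
    return "".join(out)


def decrypt(ciphertext):
    """Drop the nulls, then read the remaining digits two at a time."""
    coords = "".join(d for d in ciphertext if d in "12345")
    # Stop before a dangling final digit, if any.
    return "".join(REVERSE[coords[i:i + 2]] for i in range(0, len(coords) - 1, 2))


if __name__ == "__main__":
    rng = random.Random(1939)  # fixed seed so the run is repeatable
    ct = encrypt("codes and ciphers", rng)
    print(ct)
    print(decrypt(ct))
```

Under this model each letter becomes four digits (two coordinates, each with a null in front), so the 196 pairs of the real ciphertext would reduce to 98 coordinate pairs, i.e. 98 letters.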

Removing all the 6s, 7s, 8s, 9s, and zeros leaves 196 digits, or 98 pairs.
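The null removal is easy to script. This Python sketch treats 0, 6, 7, 8 and 9 as nulls (the hypothesis above, not a known fact), keeps the digits 1-5, and counts the resulting pairs.

```python
from collections import Counter

# The ciphertext exactly as printed above.
CIPHERTEXT = """
75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000
"""

# Keep only 1-5; whitespace, 0 and 6-9 are all discarded.
kept = "".join(d for d in CIPHERTEXT if d in "12345")
reduced_pairs = [kept[i:i + 2] for i in range(0, len(kept), 2)]
pair_counts = Counter(reduced_pairs)

if __name__ == "__main__":
    print(len(kept), "digits,", len(reduced_pairs), "pairs")
    for pair, count in sorted(pair_counts.items()):
        print(pair, count)
```

All 25 combinations from 11 to 55 occur; the most frequent pair is 22 (10 times), followed by 54 (8 times).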

(Figure: the ciphertext after removing the nulls)

(Figure: number pair counts)

Since 98 = 7 x 14 = 2 x 49, this would suggest a 7x14 or 14x7 rectangle (less probable are 2x49 and 49x2).
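Listing the divisors makes the possible rectangles explicit. A trivial Python helper (the name rectangles is hypothetical):

```python
def rectangles(n):
    """List every (rows, columns) shape that holds exactly n cells."""
    return [(r, n // r) for r in range(1, n + 1) if n % r == 0]


if __name__ == "__main__":
    print(rectangles(98))
```

For 98 this prints [(1, 98), (2, 49), (7, 14), (14, 7), (49, 2), (98, 1)].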

However, performing the phi test for monoalphabeticity on the resulting 98 pairs yields:


N = 98
Phi(r) = 366
Phi(e) = 634
Phi(o) = 402
IC = 0.042

According to the test, these frequencies look more like random letters than readable text: the observed 402 sits much closer to the random expectation (366) than to the English one (634). Perhaps the digits are paired some other way, or the cipher is broken into smaller pieces.
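The phi test on the reduced text can be reproduced with this self-contained Python sketch: it drops 0 and 6-9 as nulls, pairs the remaining digits, and applies the phi test with the conventional constants 0.0385 (random) and 0.0667 (English), which are assumptions on my part.

```python
from collections import Counter

# The ciphertext exactly as printed above.
CIPHERTEXT = """
75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000
"""

# Assumed constants: expected coincidence rate per pair of positions.
KAPPA_RANDOM = 0.0385
KAPPA_ENGLISH = 0.0667


def phi_test(symbols):
    """Return N, expected Phi for random and English text, observed Phi, and IC."""
    n = len(symbols)
    positions = n * (n - 1)
    observed = sum(f * (f - 1) for f in Counter(symbols).values())
    return {"N": n, "Phi(r)": KAPPA_RANDOM * positions,
            "Phi(e)": KAPPA_ENGLISH * positions,
            "Phi(o)": observed, "IC": observed / positions}


# Hypothesis under test: 0 and 6-9 are nulls; the digits 1-5 carry the message.
kept = "".join(d for d in CIPHERTEXT if d in "12345")
reduced_pairs = [kept[i:i + 2] for i in range(0, len(kept), 2)]
result = phi_test(reduced_pairs)

if __name__ == "__main__":
    for name, value in result.items():
        print(name, "=", round(value, 4))
```

It reports Phi(o) = 402 against expected values of about 366 (random) and 634 (English), an IC of roughly 0.042.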

This brings me back to my second observation: the zero near the middle. It must indicate a break or something similar; otherwise, why not keep using 6s, 7s, 8s, and 9s as nulls and save the three zeros for padding out the end of the cipher? To be continued in the next post...
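If the middle zero does mark a break, a natural first step is to split the ciphertext there. A minimal Python sketch, assuming the three trailing zeros are padding and the remaining zero is a separator rather than part of the message:

```python
# The ciphertext exactly as printed above.
CIPHERTEXT = """
75628 28591 62916 48164 91748 58464 74748 28483 81638 18174
74826 26475 83828 49175 74658 37575 75936 36565 81638 17585
75756 46282 92857 46382 75748 38165 81848 56485 64858 56382
72628 36281 81728 16463 75828 16483 63828 58163 63630 47481
91918 46385 84656 48565 62946 26285 91859 17491 72756 46575
71658 36264 74818 28462 82649 18193 65626 48484 91838 57491
81657 27483 83858 28364 62726 26562 83759 27263 82827 27283
82858 47582 81837 28462 82837 58164 75748 58162 92000
"""

digits = "".join(CIPHERTEXT.split())
body = digits.rstrip("0")                # drop the trailing padding zeros
before, _, after = body.partition("0")   # split at the remaining (middle) zero

if __name__ == "__main__":
    print(len(before), len(after))
```

This leaves 194 digits before the zero and 197 after it.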
