Friday, July 1, 2011

Kryptos: A Potential K4 Algorithm



Before I begin -  Thank you for reading, and comments are welcome and appreciated.  Also, please be as forgiving as possible with my use of terminology.  I do my best not to misuse jargon.


The facts as stated by the men who designed Kryptos and it's encryption methods:
  • K4 uses an unknown algorithm (at least at the time it was created)
  • The English in K4 is masked
  • The cryptographer gave the artist ways to modify existing systems for use in Kryptos
  • Each step forward in Kryptos represents a more challenging cipher than the previous one
  • The biggest clue is in plain view 

For quite some time I've been concentrating on these facts.  Masking English can be done in many ways using modern crypto techniques, but one easy way to accomplish this in classical cryptography is to use fractionated substitution.  This can be accomplished using a Polybius square or a straddling checkerboard.  Examples are the bifid, trifid, CM Bifid, ADFGX, and other monome-dinome type ciphers based on the straddling checkerboard.

Fractionated substitution is good as a precursor to a transposition step.  Being able to diffuse the pieces that make up a plaintext letter, in a systematic format, is an algorithm.

The algorithms for ciphers such as the bifid are well known, with their weaknesses well documented and their general solutions published.  They no longer represent a challenge to the cryptographic community given enough ciphertext.

However, the concept for some of these ciphers fit all the criteria for K4 should they be modified in some way:
  • They mask the English by breaking the relationships between the ciphertext characters and plaintext letters.
  • This type of cipher, substitution plus transposition, fits the criteria of being more challenging than the previous layers of the Kryptos puzzle (Morse Code, Vignere, Double Row-Column Transposition)

That leaves only one problem . . . what is the algorithm?  If it's unknown, we can't solve for it, right?

For that I look to the last fact in our list.  It's in plain view.

Could it be that the algorithm isn't a secret, but instead placed in front of us in the familiar dYAhR out of alignment letters?  I believe so.  Let me explain.

As noted, the traditional algorithms for bifids are known.  One such method includes placing the row coordinates (from the Polybius square substitution) for each plaintext letter on a top line, while placing all the column coordinates a line below. An example of a period 7 bifid encipherment is included here (Polybius square not shown):

S H A R P E N
1 2 1 5 3 2 3
1 1 5 4 1 5 3

Pairing up the numbers on the top row, followed by the bottom row yields the following number pairs and corresponding ciphertext characters:


1 2 1 5 3 2 3 1 1 5 4 1 5 3
Z
A
J
P
A
M
B

When deciphering a bifid by hand  you  use the known algorithm to place half-letters in a particular order, and use those half-letters to attempt decryption.  Since this is not a reflective cipher, we don't have access to the coordinate numbers from the polybius square.  A letter followed by a 1 represents a row coordinate from the polybius square and a 2 represents a column coordinate:

Z1 Z2 A1 A2 J1 J2 P1
P2 A1 A2 M1 M2 B1 B2

As you can see, in the third position, what is called a "natural" appears in the reconstructed ciphertext.  Reading vertically, A1 A2 results in a plaintext "A" revealing itself.  The bifid leaks information in the form of naturals throughout the course of decryption.  This is possible in all the odd numbered locations where it reads 1,2 vertically.

Now, let's consider a bifid with a different algorithm.  Take the pattern from the dYAhR pattern.  Down, Up, Up, Down, Up.  Let's encode 5 letters at a time (10 numbers) from our plaintext "SHARPEN":

S
H
A
R
P
1 1 2 1 1 5 5 4 3 1











1 2
1
5 4
1
1

1
5

3










1 2 1 5 4 1 1 1 5 3
Z
A
M
S
B

Now let's consider the potential decryption, since we now know the algorithm, we can work backwards:


Z1 Z2
A1
A2 M1
M2
S1

S2
B1

B2

The first plaintext letter is made up of S1 Z1.  The second letter Z2 S2.  The third A1 B1.  The fourth A2 M1.  The fifth B2 M2.

This bifid doesn't leak.  No possibilities of naturals - ever.  Previously, not only did our bifid have a chance to leak plaintext letters, it only had two patterns available to the cryptanalyst.  12 and 21.  This bifid has three patterns:  11, 22, and 21.

What this means to the cryptanalyst is that there are 75 ways each plaintext letter could be enciphered.  Essentially, what results for a 97 character cipher is a homophonic representation of the plaintext.

This system also presents a bigger challenge than the known bifid due to two of the patterns potentially only helping place letters in the same row, or in the same column, but never giving the full link to row and column.

When solving bifids, such as the ones that appear in the Cryptogram, you are generally given a crib to get started.  Consider our previous example "SHARPEN" with only 5 letters given as a crib:




A R P E N
Z1 Z2 A1 A2 J1 J2 P1
P2 A1 A2 M1 M2 B1 B2


What we are able to glean from this crib is that J & P share a row, M & P share a column, N & P share a row, and N & B share a column.

Given the same type of crib for the dYAhR bifid, we are only able to take a fraction of that information:






A
R
P

Z1 Z2
A1
A2 M1
M2
S1

S2
B1

B2


A shares a row with A, nothing to glean from the "21" pattern, and P shares a column with M.

As you can see, this modified bifid cipher is slightly more cryptographically secure than the traditional bifid algorithm and uses an algorithm previously not documented.

It combines what's best about the bifid - digraphic substitution -  and improves it's strength.  While at the same time it improving its known weakness - naturals.

In the end what's left is a new algorithm to cryptanalyze and create a general solution for - just like Friedman and Callimahos presented in their Military Cryptanalytics series.

Some concluding remarks:

  • On all 26 letters are present in K4.  If I/J or U/V (or any other two letters) share a square, it would be very easy for the cryptographer to alternate which letter he chose to put in the final ciphertext.  In the end, the letters themselves don't matter anyway, just the underlying relationships.

  • On the statistical phenomenon seen in K4 - using only 5 characters to encipher the plaintext typically yields between 3 to 8 instances of double letters in the final ciphertext.  On many occasions these can fall on the same intervals, but overall they are random.
  • By flattening the distribution of 1s, 2s, 3s, 4s, and 5s in the intermediate ciphertext you will have a higher probability of lowering the final Index of Coincidence (IC).  It is all dependent on the Polybius square chosen and the plaintext.  A well mixed Polybius square with some low, medium, and high frequency letters in each row will accomplish this.

  • Lastly, it should be noted that without a crib this cipher is very tough to crack by hand.  I would imagine that perhaps the morse code contains a crib, much like VIRTUALLY INVISIBLE for K2.  Or perhaps LAYERTWO was meant to be a crib for K4 prior to the "BERLIN" release.