Hi! Bit of an unusual post. I'm writing about my first open source tool, Ciphey!

Check Ciphey out here.

Ciphey answers the question:

"What does this encrypted text say?"

First, an important distinction.

I define encryption as:

Anything that you cannot read fluently

When normally encryption is defined as:

Text that has an algorithm applied so no one apart from the intended recipients can read it

My tool, Ciphey, isn't made for cryptographers. It's made for your aunt Linda who has never heard of the word cryptography but works in tech, so knows a little about some things.

Because of my target audience and my personal definition of encryption, Ciphey can automatically decrypt all of these:

  • Vigenère cipher
  • Affine cipher
  • Transposition Cipher
  • Pig Latin
  • Morse Code
  • Ascii
  • Binary
  • Base64
  • Hexadecimal
  • Caesar Cipher
  • Reverse (palindrome)
  • Sha512
  • MD5
  • Sha1
  • Sha384
  • Sha256

Now, I know that you're probably cringing. Sha1 can't be decrypted! And decrypting binary? That's not an encryption, it's encoding!

Refer back to my definition and target market.

Okay, so Ciphey is cool. You input encrypted text, and Ciphey decrypts it. How?

First, Ciphey is made up of 2 core components.

  • Language Checker

Language Checker aims to answer the question:

"Is this text English?"

(and in the future, other languages)

It does this by utilising two popular algorithms.

The first is Chi Squared.

Chi Squared

Chi Squared answers:

"How close is the frequency distribution of this text to the frequency distribution of English?"

Check out this cool video on the general idea by VSauce:

Chi-squared is very fast, but the accuracy isn't that good.

Onto the next algorithm.

Dictionary checker

What better way to check when something is English than to loop through an entire English dictionary and see how many dictionary words appear in the text?

The problem is that this is very slow. My dictionary is 500k words. You can't just loop through that every time you want to check when something is English.

Using both for fun and profit

This is where I use both of these algorithms, combined.

Chi squared tells me when something looks like English, and dictionary checker tells me when something consists of primarily English words.

Both together answer the question "is this English?" really well.

It's also a lot faster than normal.

Chi squared keeps a running average of all the scores it comes across. If it sees a score that's below 1 standard deviation, it goes into stage 2 of the algorithm - dictionary checker.

If 35% of the words in the string are English, it's likely to be English.

35% because of slang, passwords, usernames or program names can all exist in the text.

So Ciphey brute-forces all the ciphers?

Yes, but I like to call it Brute Force Enhanced.

Ciphey uses a deep neural network (DNN) trained on Harry Potter to guess how likely a text is to be encrypted using a method.

As an example, the DNN might predict that the text is 81% likely to be SHA1, 1% likely to be Caesar and so on.

Ciphey then runs all of the decryption modules using multi-threading in the order of most likely to least likely.

If at any moment a decryption returns True, as it has found the plain-text, Ciphey stops and returns the answer.

This method of brute-force enhanced as well as language checker means Ciphey is very fast.

The internal data packet

Decryption modules don't just return True. I have an internal data packet that's passed around.

{
    "lc": self.lc,
    "IsPlaintext?": True,
    "Plaintext": translated,
    "Cipher": "Caesar",
    "Extra Information": "The rotation used is {counter}"
}

Self.lc is Language Checker. When a decryption module is done, it passes Language Checker back to the parent. The parent then adds the LC it received to the LC it holds as an attribute. This means that we can keep the running average accurate and the overall program accurate.

What's next?

Ciphey needs a levels mode.

Sure, it's cool to encrypt text using 1 level of encryption. But what if someone uses 2 levels? Ciphey needs to solve this.

The way this would work is using files.

If the user enters the level command:

ciphey -l 3 "ha skjari isa"

Then Ciphey will run through 3 levels of decryption.

Every decryption Ciphey makes will be stored in a file. Ciphey will then have something like this:

for x in range(0, levels):
	for word in file.open(decryption_list.txt, 'r'):
    	one_level_of_decryption(word)

Ciphey needs more decryption methods

Ciphey supports a lot, but not enough for it to be considered super cool. I need to work on this.

Check Ciphey out here.