Ciphey - Automated Decryption Tool
Hi! Bit of an unusual post. I'm writing about my first open source tool, Ciphey!
Ciphey answers the question:
"What does this encrypted text say?"
First, an important distinction.
I define encryption as:
Anything that you cannot read fluently
Whereas encryption is normally defined as:
Text that has an algorithm applied so no one apart from the intended recipients can read it
My tool, Ciphey, isn't made for cryptographers. It's made for your aunt Linda who has never heard of the word cryptography but works in tech, so knows a little about some things.
Because of my target audience and my personal definition of encryption, Ciphey can automatically decrypt all of these:
- Vigenère cipher
- Affine cipher
- Transposition Cipher
- Pig Latin
- Morse Code
- Ascii
- Binary
- Base64
- Hexadecimal
- Caesar Cipher
- Reverse (palindrome)
- Sha512
- MD5
- Sha1
- Sha384
- Sha256
Now, I know that you're probably cringing. Sha1 can't be decrypted! And decrypting binary? That's not an encryption, it's encoding!
Refer back to my definition and target market.
Okay, so Ciphey is cool. You input encrypted text, and Ciphey decrypts it. How?
First, Ciphey is made up of 2 core components.
- Language Checker
Language Checker aims to answer the question:
"Is this text English?"
(and in the future, other languages)
It does this by utilising two popular algorithms.
The first is Chi Squared.
Chi Squared
Chi Squared answers:
"How close is the frequency distribution of this text to the frequency distribution of English?"
VSauce has a cool video on the general idea.
Chi-squared is very fast, but the accuracy isn't that good.
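To make the idea concrete, here is a minimal sketch of a chi-squared score over letter frequencies. It is not Ciphey's actual code: the function name `chi_squared` and the `ENGLISH_FREQ` table (approximate percentages) are assumptions made for illustration. A lower score means the text's letter distribution is closer to English.

```python
from collections import Counter

# Approximate relative letter frequencies for English, in percent.
# These values are assumed for illustration, not taken from Ciphey.
ENGLISH_FREQ = {
    "a": 8.17, "b": 1.49, "c": 2.78, "d": 4.25, "e": 12.70, "f": 2.23,
    "g": 2.02, "h": 6.09, "i": 6.97, "j": 0.15, "k": 0.77, "l": 4.03,
    "m": 2.41, "n": 6.75, "o": 7.51, "p": 1.93, "q": 0.10, "r": 5.99,
    "s": 6.33, "t": 9.06, "u": 2.76, "v": 0.98, "w": 2.36, "x": 0.15,
    "y": 1.97, "z": 0.07,
}


def chi_squared(text):
    """Return the chi-squared statistic of the text's letters vs English.

    Lower is closer to English. Text with no letters scores infinity.
    """
    letters = [c for c in text.lower() if c in ENGLISH_FREQ]
    if not letters:
        return float("inf")
    counts = Counter(letters)
    total = len(letters)
    score = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = total * pct / 100          # count we'd expect in English
        score += (counts[letter] - expected) ** 2 / expected
    return score
```

Comparing `chi_squared("this is a simple sentence")` with `chi_squared("qzxj kvq xjz")` shows the gap: rare letters like q, z and x blow the score up quickly.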
Onto the next algorithm.
Dictionary checker
What better way to check whether something is English than to loop through an entire English dictionary and see how many dictionary words appear in the text?
The problem is that this is very slow. My dictionary is 500k words. You can't just loop through that every time you want to check whether something is English.
Using both for fun and profit
This is where I use both of these algorithms, combined.
Chi squared tells me when something looks like English, and dictionary checker tells me when something consists of primarily English words.
Both together answer the question "is this English?" really well.
It's also a lot faster than running the dictionary checker on everything.
Chi squared keeps a running average of all the scores it comes across. If it sees a score that's more than 1 standard deviation below that average, the text moves on to stage 2 of the algorithm: the dictionary checker.
If 35% of the words in the string are English, it's likely to be English.
The threshold is only 35% because slang, passwords, usernames or program names can all exist in the text.
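The two stages can be sketched roughly like this. Everything here is an illustration under assumptions, not Ciphey's code: the class and method names are hypothetical, "below 1 standard deviation" is read as "more than one standard deviation below the running mean", and a `set` is used for word lookups instead of looping through the dictionary. The chi-squared score is passed in, so the sketch does not care how it was computed.

```python
import math


class TwoStageChecker:
    """Hypothetical two-stage check: cheap score first, dictionary second."""

    def __init__(self, dictionary, threshold=0.35):
        self.dictionary = set(dictionary)  # set membership avoids a full loop
        self.threshold = threshold         # 35% of words must be English
        self.scores = []                   # every chi-squared score seen so far

    def looks_promising(self, score):
        """Record the score; True if it is > 1 std dev below the running mean."""
        self.scores.append(score)
        if len(self.scores) < 2:
            return True  # assumption: with no history, always try stage 2
        mean = sum(self.scores) / len(self.scores)
        variance = sum((s - mean) ** 2 for s in self.scores) / len(self.scores)
        return score < mean - math.sqrt(variance)

    def dictionary_ratio(self, text):
        """Fraction of whitespace-separated words found in the dictionary."""
        words = text.lower().split()
        if not words:
            return 0.0
        hits = sum(1 for w in words if w.strip(".,!?;:'\"") in self.dictionary)
        return hits / len(words)

    def is_english(self, text, score):
        """Stage 1 filters on the score, stage 2 checks the dictionary."""
        if not self.looks_promising(score):
            return False
        return self.dictionary_ratio(text) >= self.threshold
```

Most wrong decryptions produce scores near the running average, so they are rejected in stage 1 and never reach the dictionary lookup.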
So Ciphey brute-forces all the ciphers?
Yes, but I like to call it Brute Force Enhanced.
Ciphey uses a deep neural network (DNN) trained on Harry Potter to guess how likely a text is to be encrypted using a given method.
As an example, the DNN might predict that the text is 81% likely to be SHA1, 1% likely to be Caesar and so on.
Ciphey then runs all of the decryption modules using multi-threading in the order of most likely to least likely.
If at any moment a decryption returns True, as it has found the plain-text, Ciphey stops and returns the answer.
Brute Force Enhanced, combined with the Language Checker, is what makes Ciphey very fast.
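Here is a rough sketch of that ordering-and-early-exit idea using Python's `concurrent.futures`. It is an illustration, not Ciphey's implementation: `brute_force_enhanced`, `try_reverse` and `try_rot13` are hypothetical names, the `probabilities` dict stands in for the DNN's output, and the "is it plaintext?" test is faked with a simple substring check. The result keys are borrowed from Ciphey's internal data packet.

```python
import codecs
from concurrent.futures import ThreadPoolExecutor, as_completed


def try_reverse(text):
    """Toy decrypter: reverse the string."""
    candidate = text[::-1]
    # Stand-in plaintext check; the real tool would ask the language checker.
    return {"IsPlaintext?": "hello" in candidate,
            "Plaintext": candidate, "Cipher": "Reverse"}


def try_rot13(text):
    """Toy decrypter: Caesar cipher with a fixed rotation of 13."""
    candidate = codecs.decode(text, "rot_13")
    return {"IsPlaintext?": "hello" in candidate,
            "Plaintext": candidate, "Cipher": "Caesar"}


def brute_force_enhanced(ciphertext, decrypters, probabilities, max_workers=4):
    """Submit decrypters most-likely-first; return the first success or None.

    decrypters:    name -> function(ciphertext) returning a result dict
    probabilities: name -> predicted likelihood (stand-in for the DNN)
    """
    order = sorted(decrypters,
                   key=lambda name: probabilities.get(name, 0.0),
                   reverse=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(decrypters[name], ciphertext) for name in order]
        for future in as_completed(futures):
            result = future.result()
            if result["IsPlaintext?"]:
                for pending in futures:
                    pending.cancel()  # only stops jobs that have not started
                return result
    return None
```

Calling `brute_force_enhanced("uryyb jbeyq", {"Reverse": try_reverse, "Caesar": try_rot13}, {"Caesar": 0.8, "Reverse": 0.1})` queues the Caesar module first. Note that `cancel()` cannot interrupt a decrypter that is already running, so a real tool would need its own stop signal for long-running modules.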
The internal data packet
Decryption modules don't just return True. I have an internal data packet that's passed around.
{
    "lc": self.lc,
    "IsPlaintext?": True,
    "Plaintext": translated,
    "Cipher": "Caesar",
    "Extra Information": "The rotation used is {counter}"
}
self.lc is the Language Checker. When a decryption module is done, it passes its Language Checker back to the parent. The parent then adds the LC it received to the LC it holds as an attribute. This keeps the running average, and so the overall program, accurate.
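One way "adding" two Language Checkers could work is by overloading `+` so that the scores from both are merged. This is a guess at the mechanism, not Ciphey's code: the class name `CheckerStats` and its methods are hypothetical, and the sketch assumes each decryption module starts with a fresh checker so no score is counted twice.

```python
class CheckerStats:
    """Hypothetical holder for the chi-squared scores a checker has seen."""

    def __init__(self, scores=None):
        self.scores = list(scores or [])

    def record(self, score):
        """Remember one more chi-squared score."""
        self.scores.append(score)

    def average(self):
        """Running average of all recorded scores (0.0 if none yet)."""
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def __add__(self, other):
        """Merge two checkers into a new one holding both sets of scores."""
        return CheckerStats(self.scores + other.scores)


# The parent folds the child's checker into its own after a module finishes.
parent = CheckerStats([10.0, 20.0])
child = CheckerStats()
child.record(30.0)
parent = parent + child
```

After the merge, `parent.average()` reflects the scores seen by both the parent and the child module.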
What's next?
Ciphey needs a levels mode.
Sure, it's cool to decrypt text that uses 1 level of encryption. But what if someone uses 2 levels? Ciphey needs to solve this.
The way this would work is using files.
If the user enters the level command:
ciphey -l 3 "ha skjari isa"
Then Ciphey will run through 3 levels of decryption.
Every decryption Ciphey makes will be stored in a file. Ciphey will then have something like this:
for level in range(levels):
    # assumes decryption_list.txt holds one candidate per line
    with open("decryption_list.txt", "r") as file:
        for word in file:
            one_level_of_decryption(word.strip())
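The levels idea can also be sketched in memory rather than with files, which is a swap made here purely to keep the example self-contained. Nothing below is Ciphey's code: `decrypt_levels`, the three toy decoders and the `is_plaintext` callback are all hypothetical. Each level feeds every candidate from the previous level through every decoder.

```python
import base64


def decode_base64(text):
    """Return the Base64-decoded text, or None if it is not valid Base64."""
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None


def decode_hex(text):
    """Return the hex-decoded text, or None if it is not valid hex."""
    try:
        return bytes.fromhex(text).decode("utf-8")
    except ValueError:
        return None


def decode_reverse(text):
    """Reverse the string."""
    return text[::-1]


DECODERS = [decode_base64, decode_hex, decode_reverse]


def decrypt_levels(ciphertext, levels, is_plaintext):
    """Apply up to `levels` rounds of decoding; return plaintext or None."""
    candidates = [ciphertext]
    for _ in range(levels):
        next_candidates = []
        for candidate in candidates:
            for decode in DECODERS:
                result = decode(candidate)
                if result is None:
                    continue              # this decoder did not apply
                if is_plaintext(result):
                    return result
                next_candidates.append(result)
        candidates = next_candidates      # input for the next level
    return None
```

The number of candidates grows with every level, which is one reason a cheap first-stage language check matters.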
Ciphey needs more decryption methods
Ciphey supports a lot, but not enough for it to be considered super cool. I need to work on this.