I wanted to write a piece on Blockchain, but understanding it requires understanding cryptography first. So I decided to defer the Blockchain post and write a post on cryptography.
Before you read this, I would like to mention that I am writing this piece with the objective of providing an overview of what encryption is and how it works today. Cryptography is a vast field and there are a lot of algorithms and decryption tools available to us. Encryption is all about mathematical logic, and much of mathematics is reversible.
3 X 4 = 12
12 / 4 = 3
The more complex the logic, the harder it is to reverse. Differentiation, for instance, is the reverse of integration. But more importantly, the advent of computers meant that brute-force computing became possible, making it ever easier to reverse a piece of logic. The faster computers became, the easier it became to break codes. If this kind of stuff really interests you, please read The Code Book by Simon Singh. It has the entire story of encryption and how it evolved.
The word cryptography is composed of two parts: ‘crypto’, which means secret, and ‘graphy’, which means writing. Often it is important to send messages that you do not want others to read. Simply put, you want the message to be read only by the intended recipient.
A cipher is a code, or a way of writing a secret. The simplest kind of encryption is the substitution cipher, in which each letter of the alphabet is replaced with another. The picture below shows a substitution where each letter is replaced with the letter 3 places ahead of it, so A becomes D, B becomes E and so on.
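For readers who like to see it run, here is a minimal Python sketch of the shift-by-3 substitution described above. The function name shift_cipher and the sample message are my own choices for illustration.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters 'A' to 'Z'


def shift_cipher(text: str, shift: int) -> str:
    """Replace each letter with the one `shift` places ahead, wrapping after Z."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            new_index = (ALPHABET.index(ch) + shift) % 26
            result.append(ALPHABET[new_index])
        else:
            result.append(ch)  # leave spaces and punctuation untouched
    return "".join(result)


secret = shift_cipher("ATTACK AT DAWN", 3)
print(secret)                    # DWWDFN DW GDZQ
print(shift_cipher(secret, -3))  # ATTACK AT DAWN
```

The key here is simply the number 3. Anyone who knows it can undo the cipher by shifting in the opposite direction.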
Now if you were to send this message to a friend, the only way for them to decrypt it is if the key is made available to them. Here is where a third person comes in. If you were living in medieval times, and a messenger was carrying the encrypted message, you would certainly not send the key across with the same messenger. If the message were intercepted, you would not want the interceptor to get the key as well! This introduces a third person: in order to send one message, you need two people, one who carries the message and one who carries the key. This is known as the key distribution problem.
This kind of substitution encryption is useless if you have to send a lot of text. Why?
Different letters occur at characteristic frequencies in the English language. You can learn more about it here. It is possible to conduct a statistical analysis of the text at hand, find the vowels and then guess the rest of the letters to reconstruct the message.
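A rough Python sketch of this statistical attack follows. It assumes the text was encrypted with a simple shift and that E is the most common letter in the original message, which holds for typical English but is only a guess for short texts. The function names and the sample ciphertext are made up for illustration.

```python
from collections import Counter


def letter_frequencies(text: str) -> list[tuple[str, int]]:
    """Count how often each letter occurs, most frequent first."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    return Counter(letters).most_common()


def guess_shift(ciphertext: str) -> int:
    """Guess the shift by assuming the most common ciphertext letter stands for E."""
    most_common_letter = letter_frequencies(ciphertext)[0][0]
    return (ord(most_common_letter) - ord("E")) % 26


# "MEET ME NEAR THE GREEN TREE" with every letter shifted 3 places ahead
ciphertext = "PHHW PH QHDU WKH JUHHQ WUHH"
print(letter_frequencies(ciphertext)[0])  # ('H', 9)
print(guess_shift(ciphertext))            # 3
```

The more text there is, the more reliable the counts become, which is why a long message is easier to break than a short one.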
The substitution cipher is a relatively simple cipher. It’s very easy to encrypt and very easy to decrypt. As you can see in the image above, it is formed using a simple formula. If you have had a brush with algebra and co-ordinate geometry, you know that formulas can quickly get pretty complex.
Since simple ciphers were easy to break, more and more complex ciphers were developed to keep messages as safe as possible. The Germans created the Enigma machine, which constantly changed the logic of its cipher and was, during the Second World War, second to none. It would not be unfair to say that the breaking of the Enigma, an effort in which Alan Turing played a leading role, contributed heavily to the Germans losing the war.
The Enigma was an amazing crypto engine. But an operator on the German side used to sign off every message with the name of his girlfriend. This gave the codebreakers enough data to analyse and come up with the key for each day.
If you are only substituting one letter for another, then no matter how complex the logic of the cipher, it can be broken given a sufficient quantity of data.
The only reason that Artificial Intelligence as a field did not find prime time application for 5 decades was because there was not sufficient data to train it! AI involves decoding what the human brain does.
Let us go digital.
The computer does not understand letters; it only understands numbers, and only zeros and ones at that. Everything is stored in binary form in a computer. Every letter has an associated number assigned by a standard known as ASCII. When every letter is coded as per ASCII, all data, all information, is reduced to a stream of numbers. An image is also recorded as numbers in a computer: every pixel is assigned a colour, and the colour codes are stored as numbers.
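To make this concrete, here is a small Python sketch that turns a message into its ASCII codes and then into zeros and ones. The helper names to_codes and to_bits are mine, chosen only for this example.

```python
def to_codes(text: str) -> list[int]:
    """Look up the number assigned to each character."""
    return [ord(ch) for ch in text]


def to_bits(text: str) -> list[str]:
    """Write each character's number as eight binary digits."""
    return [format(code, "08b") for code in to_codes(text)]


print(to_codes("Hi"))  # [72, 105]
print(to_bits("Hi"))   # ['01001000', '01101001']
```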
The only thing that needs to be encrypted, therefore, is numbers.
This makes things a little more convenient. Encryption is about using mathematical logic and creating complex formulas. A formula is used to change a piece of information we have into information that can only be read once the function is reversed. Having all information represented as numbers makes it easier to play with that information.
Computerisation has made encryption a lot faster, a lot more advanced and a lot more secure in many ways.
So what does the encryption that we use today look like?
Essentially, all problems in cryptography come back to key distribution. So let us imagine the internet as a postal department. If this postal department is really corrupt and opens every single piece of mail that I send through it, the message as well as the key will get opened at some time or the other. The only solution is not to send the key through this postal department, which implies meeting the recipient of the message in person.
But it is not always possible to meet.
Let us play with logic
There are three people: John, Jane and Jone. If John needs to send a gift to Jone through Jane, he can put it in a box and lock it. But then Jone needs to have the key in order to unlock the box and access the gift.
Suppose instead that John sends the box locked with his padlock, and Jone adds her own padlock and sends it back to John. John removes his lock and sends the box back to Jone, who is then able to open it in private and access the gift. At every point in the journey at least one lock was on the box, so it remained secure the whole time. But more importantly, the locks did not come off in the reverse of the order in which they were put on: John’s lock went on first and also came off first.
If we take the simple substitution cipher mentioned at the beginning, and my cipher replaces A with D while yours then replaces D with Q, I will no longer be able to remove my cipher from underneath yours.
The fundamental problem with most encryption is that it has to be decrypted in the reverse of the order in which it was encrypted. It follows the first on, last off format: the last cipher applied needs to come off first. Thankfully, padlocks do not work like that.
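The layering problem can be sketched in Python. I am assuming two ciphers here: my cipher is the shift by 3 from earlier, and your cipher is a hypothetical "mirror" substitution that swaps A with Z, B with Y and so on. The mirror cipher is my own choice for illustration and is not the D to Q example above.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(text: str, k: int) -> str:
    """My cipher: move every letter k places along the alphabet."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in text)


def mirror(text: str) -> str:
    """Your cipher: swap A with Z, B with Y, and so on. Applying it twice undoes it."""
    return "".join(ALPHABET[25 - ALPHABET.index(ch)] for ch in text)


locked = mirror(shift("GIFT", 3))      # my cipher goes on first, yours second
print(locked)                          # QORD

last_on_first_off = shift(mirror(locked), -3)
print(last_on_first_off)               # GIFT

first_on_first_off = mirror(shift(locked, -3))
print(first_on_first_off)              # MOLZ, not the original message
```

Removing the ciphers in padlock order leaves gibberish, which is exactly why the two-padlock trick does not carry over directly to this kind of cipher.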
Sending the box back and forth is inefficient and unfortunately mathematics does not work in this manner. Therefore it is essential to find another way.
The solution is to use an asymmetric key, much like the padlocks that can be pressed shut from the top but require a key to open. In this case only the owner of the lock has the key, and only that individual can open it. Let us assume that each of us had a lock of our own that was available at a store; John could buy the lock that Jone is selling and lock the box with it. Jone will be able to open it because only she has the key.
The challenge is to find a mathematical equivalent.
Public Key Encryption
How do we make this public lock available to one and all?
We turn to mathematics here. For those who know the entire details of it, I am sorry for bastardising this; but I am going to make this simple to understand.
In mathematics, doing this involves creating a one-way function. Most everyday mathematics is reversible. If a function doubles its input and the output is 42, anybody can work out that the original number was 21. This reversibility breaks down in very specific scenarios such as modulo operations, also known as modular or clock arithmetic.*
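Here is a small Python sketch contrasting the two kinds of function. The base 3 and the modulus 7 are arbitrary small numbers I picked for illustration; real systems use numbers hundreds of digits long.

```python
def double(x: int) -> int:
    """A reversible function: halve the output to get the input back."""
    return 2 * x


def clock_power(x: int, base: int = 3, modulus: int = 7) -> int:
    """Raise base to the power x, keeping only the remainder after dividing by modulus."""
    return pow(base, x, modulus)


def reverse_by_search(target: int, base: int = 3, modulus: int = 7):
    """The only obvious way back: try every input until one matches."""
    for x in range(1, modulus):
        if pow(base, x, modulus) == target:
            return x
    return None


print([double(x) for x in range(1, 7)])       # [2, 4, 6, 8, 10, 12]
print([clock_power(x) for x in range(1, 7)])  # [3, 2, 6, 4, 5, 1]
print(reverse_by_search(6))                   # 3
```

The doubled outputs rise steadily, so you can home in on the input. The modular outputs jump around with no visible pattern, so with a tiny modulus you can still try every input, but with a very large one that search is believed to be impractical.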
Now, in order to replicate the lock analogy above, we need to have a public key and a private key. The RSA algorithm, named after Ron Rivest, Adi Shamir and Leonard Adleman who came up with it, uses prime numbers to make this happen.
Now suppose a number N is the product of two prime numbers, a and b. Let us say I came up with a function that uses a, b and N. I could make the product, N, available to you so that you can encrypt the message, but it can only be decrypted if both a and b are known.
For those of you who are really interested in how the mathematics at the back of this works, you can find the exact process here.
Let us take two prime numbers, say, 17 and 13. The product is 221. This number becomes my public key and the message can be decoded only if 17 and 13 are known. Now you would think that, if you were given 221, you might be able to figure out that it was the product of these primes. You would be right.
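For the curious, here is a toy Python sketch of how 17 and 13 could be used in RSA-style arithmetic. It fills in details the text skips: in practice the public key is N together with a public exponent, and I have assumed e = 5 for illustration. The message 42 is also just an example. This is a teaching sketch, nowhere near a secure implementation.

```python
from math import gcd

p, q = 17, 13
n = p * q                 # 221, the number that is made public
phi = (p - 1) * (q - 1)   # 192, computable only if p and q are known

e = 5                     # public exponent, an assumed value for this sketch
assert gcd(e, phi) == 1   # e must share no factor with phi

d = pow(e, -1, phi)       # private exponent, 77 (needs Python 3.8 or later)


def encrypt(message: int) -> int:
    """Anyone can do this using only the public values n and e."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of d, and hence of p and q, can do this."""
    return pow(ciphertext, d, n)


ciphertext = encrypt(42)
print(decrypt(ciphertext))  # 42
```

Working out d requires phi, and working out phi requires the two primes. That is the sense in which the message can be decoded only if 17 and 13 are known.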
Now try figuring out a and b for 69,380,921. It will take you a long time, probably all afternoon.
I must mention here that this number is the product of two 4 digit prime numbers (I have just shaved a few hours off your quest). 128 bit encryption uses numbers that are 128 binary digits (bits) in length, 256 bit encryption uses numbers that are 256 bits in length, and so on. When you are dealing with numbers of this length, you are not looking at a few hours, but a few decades.
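What takes a person an afternoon is a brute-force search, and it can be sketched in a few lines of Python. The function name smallest_factor_pair is my own. It simply tries every odd candidate up to the square root of the number.

```python
from math import isqrt


def smallest_factor_pair(n: int):
    """Return (a, b) with a * b == n and a the smallest factor above 1, or None if n is prime."""
    if n > 2 and n % 2 == 0:
        return 2, n // 2
    for candidate in range(3, isqrt(n) + 1, 2):  # odd candidates only
        if n % candidate == 0:
            return candidate, n // candidate
    return None


print(smallest_factor_pair(221))         # (13, 17)
print(smallest_factor_pair(69_380_921))  # try it yourself
```

For 69,380,921 the loop needs at most a few thousand steps, so a computer finishes almost instantly. Each extra digit in the number multiplies the work, which is why this approach stops being realistic for numbers hundreds of digits long.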
Security expert Simson Garfinkel estimated that a 100 MHz Intel Pentium computer with 8 MB of RAM would take roughly fifty years to break 128 bit encryption.
Computing is moving
100 MHz and 8 MB of RAM is a joke today. Even your phones have 3 GHz and 3 GB of RAM running in them. This has made it necessary to move beyond even 512 bit encryption if you want to be safe today.
It is apparently possible to crack 512 bit encryption with AWS EC2 in just 4 hours today. Having said that, 512 bit encryption was first introduced in 1999.
That might not seem so long ago, but to put it in perspective:
• Google was a year old at the time
• Facebook did not exist
• Smartphones did not exist
• Apple was on the verge of a shutdown
• A 1 GB RAM system was the highest configuration desktop money could buy; most people got by with a 256 MB RAM system
Today, with cloud computing and all of its abundant resources, we require 2048 bit encryption to put breaking the encryption beyond the reach of intruders. Unfortunately, even banks use 512 bit encryption today.
It is hard for large organisations to move their entire infrastructure forward in line with the rate at which computing has been advancing. Nevertheless, without a proper solution, all of us would be left exposed. This explains part of the allure of bitcoin, which I will write about next.
* This is by far the most important idea, the one that makes almost all of modern encryption possible. I discuss the RSA algorithm only very briefly in this post, but there are several different algorithms available today. All of them take advantage of the one-way function.