Pycrypto is a Python module that provides cryptographic services. It is somewhat similar to JCE (Java Cryptography Extension) for Java. In our experience, JCE is more extensive, and its documentation is more complete. That being said, pycrypto is a pretty good module covering many aspects of cryptography.
In this article, we investigate using pycrypto’s implementation of AES for file encryption and decryption.
[Note: We have also covered AES file encryption and decryption in Java previously.]
2. Generating a Key
AES encryption needs a strong key. The stronger the key, the stronger your encryption. The key is probably the weakest link in the chain. By strong, we mean a key that is not easily guessed and has sufficient entropy (secure randomness).
That being said, for the sake of demonstration of AES encryption, we generate a random key using a rather simple scheme. Do not copy and use this key generation scheme in production code.
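AES accepts keys of 16, 24 or 32 bytes (AES-128, AES-192 and AES-256). We use a 16-byte key here.
import random

# Demonstration only (Python 2 syntax); random is not a secure key source.
key = ''.join(chr(random.randint(0, 0xFF)) for i in range(16))
print 'key', [x for x in key]
# prints: key ['+', 'Y', '\xd1', '\x9d', '\xa0', '\xb5', '\x02', '\xbf', ';', '\x15', '\xef', '\xd5', '}', '\t', ']', '9']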
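For comparison, here is a minimal sketch of drawing the key from the operating system's cryptographically secure random source instead. It is written for Python 3 (the snippets in this article use Python 2 syntax), and the helper name generate_key is our own invention, not part of pycrypto.

```python
import os

def generate_key(num_bytes=16):
    """Return num_bytes random bytes from the OS secure random source.

    generate_key is a hypothetical helper name used only for this sketch.
    """
    if num_bytes not in (16, 24, 32):
        raise ValueError("AES keys must be 16, 24 or 32 bytes long")
    return os.urandom(num_bytes)

key = generate_key()
print(len(key))  # 16
```

A key produced this way still has to be stored and shared securely; the sketch only addresses where the random bytes come from.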
3. Initialization Vector
In addition to the key, AES (in the CBC mode used here) also needs an initialization vector (IV). A fresh IV is generated for every encryption. Its purpose is to make the same data encrypt to different output each time, so that an attacker cannot use cryptanalysis to infer key data or message data.
A 16-byte initialization vector is required which is generated as follows.
iv = ''.join([chr(random.randint(0, 0xFF)) for i in range(16)])
The initialization vector must be transmitted to the receiver for proper decryption, but it need not be kept secret. It is written near the beginning of the output file, right after the 8 bytes that hold the original file size, so the receiver can read it before decrypting the actual data.
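The layout described here (8 bytes of file size followed by the 16-byte IV) can be sketched as follows. This is an illustration in Python 3; build_header is a hypothetical helper name, and the file size of 1000 is an arbitrary example value.

```python
import os
import struct

def build_header(file_size, iv):
    """Pack the original size (8 bytes, little-endian) followed by the IV.

    build_header is a hypothetical helper used only for this sketch.
    """
    if len(iv) != 16:
        raise ValueError("IV must be 16 bytes")
    return struct.pack('<Q', file_size) + iv

iv = os.urandom(16)              # unpredictable IV from the OS random source
header = build_header(1000, iv)  # 1000 is an arbitrary example size
print(len(header))  # 24
```

Neither field is secret, so the header is written as is, ahead of the encrypted data.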
4. Encrypting with AES
We now create the AES cipher and use it for encrypting a string (or a set of bytes; the data need not be text only).
The AES cipher is created with CBC Mode wherein each block is “chained” to the previous block in the stream. (You do not need to know the exact details unless you are interested. All you need to know is – use CBC mode).
Also, for AES encryption using pycrypto, you need to ensure that the data length is a multiple of 16 bytes. If it is not, pad the buffer and include the size of the original data at the beginning of the output, so the receiver can remove the padding after decrypting.
from Crypto.Cipher import AES

aes = AES.new(key, AES.MODE_CBC, iv)
data = 'hello world 1234'  # <- 16 bytes
encd = aes.encrypt(data)
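When the data is not already a multiple of 16 bytes, it has to be padded first. The sketch below (Python 3) pads with spaces, mirroring the file encryption code later in this article. The names BLOCK_SIZE and pad_to_block are ours, and space padding only works because the original length is stored separately.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pad_to_block(data, fill=b' '):
    """Pad data with fill bytes up to the next multiple of BLOCK_SIZE.

    pad_to_block is a hypothetical helper used only for this sketch.
    """
    remainder = len(data) % BLOCK_SIZE
    if remainder:
        data += fill * (BLOCK_SIZE - remainder)
    return data

padded = pad_to_block(b'hello world')  # 11 bytes in
print(len(padded))  # 16
```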
5. Decrypting with AES
Decryption requires the key that the data was encrypted with. You need to send the key to the receiver using a secure channel (not covered here).
In addition to the key, the receiver also needs the initialization vector. This can be communicated as plain text, no need for encryption here. One way to send this is to include it in the encrypted file, at the start, in plaintext form. We demonstrate this technique below (under File Encryption with AES). For now, we assume that the IV is available.
# Create a fresh cipher object for decryption, using the same key and IV.
adec = AES.new(key, AES.MODE_CBC, iv)
decd = adec.decrypt(encd)
print decd
# prints: hello world 1234
And that is how simple it is. Now read on to know how to encrypt files properly.
6. File Encryption with AES
We have three issues to consider when encrypting files using AES. We explain them in detail below.
The first step is to create the encryption cipher.
aes = AES.new(key, AES.MODE_CBC, iv)
6.1. Write the Size of the File
First we have to write the size of the file being encrypted to the output. This is required to remove any padding applied to the data while encrypting (check code below).
Determine the size of the file.
fsz = os.path.getsize(infile)
Open the output file and write the size of the file. We use the struct package for the purpose.
# Open in binary mode: the header and the encrypted data are raw bytes.
with open(encfile, 'wb') as fout:
    fout.write(struct.pack('<Q', fsz))
6.2. Save the Initialization Vector
As explained above, the receiver needs the initialization vector. Write the initialization vector to the output, again in clear text, right after the file size.
    fout.write(iv)  # still inside the with block that opened encfile
6.3. Adjust Last Block
The third issue is that AES encryption requires that each block being written be a multiple of 16 bytes in size. So we read, encrypt and write the data in chunks. The chunk size is required to be a multiple of 16.
sz = 2048
This means the last block written might require some padding applied to it. This is the reason why the file size needs to be stored in the output.
Here is the complete write code.
    # This runs inside the "with open(encfile, ...) as fout:" block opened earlier.
    with open(infile, 'rb') as fin:
        while True:
            data = fin.read(sz)
            n = len(data)
            if n == 0:
                break
            elif n % 16 != 0:
                data += ' ' * (16 - n % 16)  # <- padded with spaces
            encd = aes.encrypt(data)
            fout.write(encd)
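Putting sections 6.1 to 6.3 together, the whole write path might look like the sketch below. It is written for Python 3 and takes the cipher as a parameter. In real use that would be the object returned by AES.new(key, AES.MODE_CBC, iv), but here a stand-in class (IdentityCipher, our own invention, which does not encrypt anything) keeps the example runnable without pycrypto. The function name encrypt_stream is also hypothetical.

```python
import io
import struct

def encrypt_stream(cipher, iv, fin, fout, size, chunk_size=2048):
    """Write size, IV, then the encrypted, space-padded chunks of fin to fout.

    chunk_size must be a multiple of 16 so only the last chunk needs padding.
    """
    fout.write(struct.pack('<Q', size))  # 6.1: original file size
    fout.write(iv)                       # 6.2: IV in clear text
    while True:                          # 6.3: encrypt chunk by chunk
        data = fin.read(chunk_size)
        n = len(data)
        if n == 0:
            break
        if n % 16 != 0:
            data += b' ' * (16 - n % 16)  # pad the last chunk with spaces
        fout.write(cipher.encrypt(data))

class IdentityCipher:
    """Hypothetical stand-in for the AES cipher object; it does NOT encrypt."""
    def encrypt(self, data):
        return data

src = io.BytesIO(b'hello')
dst = io.BytesIO()
encrypt_stream(IdentityCipher(), b'\x00' * 16, src, dst, size=5)
print(len(dst.getvalue()))  # 40 (8 size bytes + 16 IV bytes + 16 data bytes)
```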
7. Decrypting File Using AES
Now we need to reverse the above process to decrypt the file using AES.
First, open the encrypted file and read the file size and the initialization vector. The IV is required for creating the cipher.
with open(encfile, 'rb') as fin:
    # struct.unpack always returns a tuple, so take its first element.
    fsz = struct.unpack('<Q', fin.read(struct.calcsize('<Q')))[0]
    iv = fin.read(16)
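As a small self-contained illustration of reading the header back, the Python 3 sketch below parses the size and IV from an in-memory buffer. The name read_header and the sample values are made up for the example. Note that struct.unpack returns a tuple even for a single field.

```python
import io
import struct

def read_header(fin):
    """Return (original_size, iv) read from the start of an encrypted stream.

    read_header is a hypothetical helper used only for this sketch.
    """
    size_field = fin.read(struct.calcsize('<Q'))        # 8 bytes
    (original_size,) = struct.unpack('<Q', size_field)  # unpack the 1-tuple
    iv = fin.read(16)
    if len(iv) != 16:
        raise ValueError("truncated header")
    return original_size, iv

# Made-up sample: size 1234, an IV of sixteen 0x01 bytes, then placeholder data.
sample = io.BytesIO(struct.pack('<Q', 1234) + b'\x01' * 16 + b'ciphertext...')
size, iv = read_header(sample)
print(size)  # 1234
```

After read_header returns, the stream is positioned at the first byte of encrypted data.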
Next create the cipher using the key and the IV. We assume the key has been communicated using some other secure channel.
aes = AES.new(key, AES.MODE_CBC, iv)
We also write the decrypted data to a “verification file”, so we can check the results of the encryption and decryption by comparing with the original file.
    # This runs inside the "with open(encfile, ...) as fin:" block opened earlier.
    with open(verfile, 'wb') as fout:
        while True:
            data = fin.read(sz)
            n = len(data)
            if n == 0:
                break
            decd = aes.decrypt(data)
            n = len(decd)
            if fsz > n:
                fout.write(decd)
            else:
                fout.write(decd[:fsz])  # <- remove padding on last block
            fsz -= n
Note that when the last block is read and decrypted, we need to remove the padding (if any has been applied). This is where we need the original file size.
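The padding removal can be seen in isolation in the following Python 3 sketch. As before, the cipher is passed in. In real use it would be the AES object created from the key and IV, while the IdentityCipher stand-in here (a hypothetical class that performs no decryption) only lets the example run without pycrypto. The name decrypt_stream is ours as well.

```python
import io

def decrypt_stream(cipher, fin, fout, remaining, chunk_size=2048):
    """Decrypt fin chunk by chunk, writing only `remaining` bytes in total."""
    while remaining > 0:
        data = fin.read(chunk_size)
        if not data:
            break
        decd = cipher.decrypt(data)
        fout.write(decd[:remaining])  # the slice drops padding on the last chunk
        remaining -= len(decd)

class IdentityCipher:
    """Hypothetical stand-in for the AES cipher object; it does NOT decrypt."""
    def decrypt(self, data):
        return data

body = io.BytesIO(b'hello' + b' ' * 11)  # 5 real bytes plus 11 bytes of padding
out = io.BytesIO()
decrypt_stream(IdentityCipher(), body, out, remaining=5)
print(out.getvalue())  # b'hello'
```

Slicing with the remaining byte count leaves full chunks untouched and trims only the final one, which is the same effect as the if/else in the loop above.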
And that is all there is to encrypting and decrypting a file using AES in Python. We need to generate or obtain a key, create the initialization vector, and write the original file size followed by the IV into the output file. This is followed by the encrypted data. Finally, decryption performs the same steps in reverse.