In this article, we will see how to calculate the MD5 hash of a large file in Python.
Using the hashlib module in Python, we can generate the MD5 hash of a file. The hashlib module provides various hash functions, including MD5, which simplifies the process of generating the hash.
Process the file in chunks
The goal is to generate an MD5 hash of a large file. Since a large file may not fit entirely into memory, we read it in multiple chunks. This avoids memory issues and lets us process the file incrementally.
The following example shows how to do this in Python:
import hashlib

def calculate_md5(filename, chunk_size=4096):
    md5 = hashlib.md5()
    # Open in binary mode so the raw bytes are hashed
    with open(filename, 'rb') as file:
        while True:
            data = file.read(chunk_size)
            # read() returns an empty bytes object at end of file
            if not data:
                break
            md5.update(data)
    return md5.hexdigest()

filename = 'path/to/your/file.ext'
md5_hash = calculate_md5(filename)
print(f"MD5 Hash: {md5_hash}")
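In the calculate_md5 function above, we open the file in binary mode ('rb') so that its raw bytes are read exactly as stored, without text decoding or newline translation. The update() method of the hash object also requires bytes. The default chunk_size is 4096 bytes, but you can change it to suit your requirements.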
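To illustrate that the chunk size affects only memory use and speed, not the resulting hash, here is a minimal sketch that hashes the same file with several chunk sizes. The helper digests_for_chunk_sizes, the temporary file, its contents, and the chosen sizes are assumptions made for this demonstration; calculate_md5 is repeated so the snippet runs on its own.

```python
import hashlib
import os
import tempfile


def calculate_md5(filename, chunk_size=4096):
    md5 = hashlib.md5()
    with open(filename, 'rb') as file:
        while True:
            data = file.read(chunk_size)
            if not data:
                break
            md5.update(data)
    return md5.hexdigest()


def digests_for_chunk_sizes(filename, chunk_sizes):
    """Hypothetical helper: map each chunk size to the MD5 hex digest it yields."""
    return {size: calculate_md5(filename, chunk_size=size) for size in chunk_sizes}


# Illustrative data only: about 1 MB of repeated bytes in a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example data\n" * 80_000)
    demo_path = tmp.name

try:
    results = digests_for_chunk_sizes(demo_path, [1024, 4096, 65536, 1024 * 1024])
    for size, digest in results.items():
        print(f"chunk_size={size:>8}: {digest}")
finally:
    # Clean up the temporary file created for the demonstration.
    os.remove(demo_path)
```

Every line printed should show the same digest. Larger chunks mean fewer read calls at the cost of a bigger buffer, so the best value depends on your files and hardware.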
We read the file in chunks using a while loop, updating the MD5 object with each chunk until the end of the file (EOF) is reached, at which point read() returns an empty bytes object. Finally, we return the MD5 hash as a hexadecimal string using the hexdigest() method.
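The read-until-EOF loop described above can also be written more compactly with the two-argument form of the built-in iter(), which keeps calling a function until it returns a sentinel value. The sketch below is one possible variant, not a replacement for the function above; the name calculate_md5_iter, the 65536-byte default, and the temporary test data are assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile


def calculate_md5_iter(filename, chunk_size=65536):
    """Hypothetical variant of calculate_md5 built on iter() with a sentinel."""
    md5 = hashlib.md5()
    with open(filename, 'rb') as file:
        # iter(callable, sentinel) calls file.read(chunk_size) repeatedly
        # and stops once it returns b'' (end of file).
        for data in iter(lambda: file.read(chunk_size), b''):
            md5.update(data)
    return md5.hexdigest()


# Quick self-check on a small temporary file (illustrative data only).
payload = b"hello, md5\n" * 10_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

try:
    chunked = calculate_md5_iter(demo_path)
    # Hashing the whole payload in one call should give the same digest.
    one_shot = hashlib.md5(payload).hexdigest()
    print(chunked == one_shot)
finally:
    os.remove(demo_path)
```

The self-check compares the chunked result against hashing the same bytes in a single call. If you are on Python 3.11 or later, hashlib.file_digest() offers a built-in way to hash an open binary file object and may be worth considering instead of a hand-written loop.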
Make sure you replace 'path/to/your/file.ext' in the example with the path to your own file.