A hash, or hash sum, is a fixed-length sequence of characters obtained by transforming some initial data (numbers, text, a file, etc.) using a special mathematical algorithm. The same data always yields the same sequence, but the sequence does not allow the data to be restored. The process of converting data into a hash is called hashing, and the hashing algorithm is known as the hash function. Most common hash functions output a large number that is usually written in hexadecimal form.
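As a rough illustration of the definition above, the following sketch hashes two inputs of very different sizes with SHA-256 from Python's standard `hashlib` module. The helper name `sha256_hex` and the sample inputs are chosen for this example only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# A 3-byte input and a 300,000-byte input (both arbitrary examples).
short_hash = sha256_hex(b"abc")
long_hash = sha256_hex(b"abc" * 100_000)

print(short_hash)
# Both hashes are 64 hexadecimal characters (256 bits), whatever the input size.
print(len(short_hash), len(long_hash))
```

Whatever the size of the input, the output has the same length, which is what "fixed-length" means here.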
Features of hash functions
A hash has the following properties:
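- Irreversibility. It is computationally infeasible to restore the initial data from the hash sum mathematically. Brute force (hashing candidate inputs until one matches) is practical only when the set of possible inputs is small, as with short or common passwords.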
- Determinism. Hashing the same initial data with the same hash function always produces the same output.
- Uniqueness. Hashing different initial data should produce a different hash, even if the difference is very slight. Because a hash has a fixed length while the input does not, two different inputs can in principle produce the same hash. If, say, two different passwords do so, it is called a hash collision. If collisions are likely, or can be produced deliberately, the hash function is unreliable.
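The properties listed above can be observed with a short experiment. This is a minimal sketch using SHA-256 from `hashlib`. The sample passwords and the helper `differing_bits` are assumptions made for illustration, and observing no collision on one pair of inputs does not prove that collisions cannot exist.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


first = hashlib.sha256(b"password1").digest()
again = hashlib.sha256(b"password1").digest()
other = hashlib.sha256(b"password2").digest()  # differs in one character

# Same input, same function: the digests are identical.
same_input_same_hash = first == again

# A slight change in the input changes many bits of the digest.
changed_bits = differing_bits(first, other)

print(same_input_same_hash)
print(changed_bits, "of", len(first) * 8, "bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to change when the input changes slightly.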
Areas of application
Data hashing is used in cryptography, as well as in verifying and storing information. The most common use cases include:
- Storing passwords and authentication. Typically, services store an array of passwords as hashes, denying both administrators and potential hackers direct access to them. To authenticate a user, the system hashes the password they enter and matches the resulting hash against the one stored for the corresponding username.
- Checking data for integrity. When sending messages and files, data may become corrupted, either accidentally (due to glitches) or intentionally. To detect this, the sender can forward the hash of the message to the recipient, who then matches it against the hash of the message they have received.
- Detecting malware. An expert or a security solution can compare the hash of a file against a database of hashes of malicious files. If it matches at least one hash in the database, the file is marked as malicious.
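The three use cases above could be sketched roughly as follows. All function names, sample values, and the in-memory "database" are hypothetical. The password example also goes beyond the text in one respect: it assumes a per-user random salt and the PBKDF2 key-derivation function (`hashlib.pbkdf2_hmac`) with an arbitrarily chosen iteration count, instead of a single plain hash.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    # Assumption: PBKDF2-HMAC-SHA256 with 100,000 iterations and a per-user salt.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    # Hash the entered password and compare it with the stored hash.
    return hmac.compare_digest(hash_password(password, salt), stored_hash)


def is_intact(received: bytes, expected_hex: str) -> bool:
    # The recipient hashes what arrived and compares it with the sender's hash.
    return hmac.compare_digest(hashlib.sha256(received).hexdigest(), expected_hex)


def is_known_malware(file_bytes: bytes, known_bad_hashes: set) -> bool:
    # A file is flagged if its hash matches at least one entry in the database.
    return hashlib.sha256(file_bytes).hexdigest() in known_bad_hashes


# Passwords: only the salt and the hash are stored, never the password itself.
salt = os.urandom(16)
stored = hash_password("correct horse", salt)
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))

# Integrity: the sender publishes the hash alongside the message.
message = b"quarterly report"
sent_hash = hashlib.sha256(message).hexdigest()
print(is_intact(message, sent_hash))
print(is_intact(b"quarterly rep0rt", sent_hash))

# Malware detection: a stand-in database holding one hypothetical sample.
known_bad = {hashlib.sha256(b"hypothetical malicious payload").hexdigest()}
print(is_known_malware(b"hypothetical malicious payload", known_bad))
print(is_known_malware(b"harmless file", known_bad))
```

`hmac.compare_digest` is used for the comparisons because it takes the same time whether or not the values match, which is a common precaution when comparing secrets.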
Popular hashing algorithms
- MD5
An algorithm that generates 128-bit hashes. Formerly used for data protection, it was deemed insufficiently reliable in 2011 because collisions are too easy to produce. Nevertheless, this algorithm is still used for checking the integrity of content and identifying malicious files.
- SHA-1
SHA is the acronym for Secure Hash Algorithm. There are several algorithms that make up the SHA family. SHA-1 creates hashes that are 160 bits long. Like MD5, it was originally used to protect data, but has since been replaced by more modern algorithms.
- SHA-2
An enhanced version of SHA-1, this algorithm comes in several versions — the best-known of which is SHA-256. It generates a 256-bit hash and is used in blockchain technology to verify transactions.
- SHA-3
This algorithm was approved and published in 2015. Though it belongs to the SHA family, it differs significantly from both SHA-1 and SHA-2. SHA-3 can generate hashes that are 224, 256, 384 or 512 bits long.
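The hash lengths mentioned in this section can be checked against Python's `hashlib`, which ships implementations of all four algorithms. This is a minimal sketch: the helper `digest_bits` is a name invented for this example, and it assumes a Python build in which MD5 has not been disabled by system policy.

```python
import hashlib

# Algorithm names as accepted by hashlib.new().
ALGORITHMS = [
    "md5", "sha1", "sha256",
    "sha3_224", "sha3_256", "sha3_384", "sha3_512",
]


def digest_bits(name: str) -> int:
    """Return the output length of the named hash function in bits."""
    return hashlib.new(name).digest_size * 8


sizes = {name: digest_bits(name) for name in ALGORITHMS}

for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
```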