What Is a Hash Function?
A cryptographic hash function is a mathematical algorithm that takes input data of any size and produces a fixed-size string of bytes. The output, called a hash or digest, acts as a practically unique fingerprint of the input: even a tiny change to the input produces a completely different output.
Hash functions have four fundamental properties that make them useful for security and data integrity:
Deterministic
The same input always produces the same output. If you hash "hello" today, you get exactly the same digest tomorrow, next week, or on any computer in the world. This consistency is critical for verification: you can hash a file now and compare the digest later to confirm nothing changed.
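A minimal sketch of that record-and-compare workflow using Python's standard `hashlib`. The helper name `sha256_of_file`, the sample file contents, and the 64 KiB chunk size are illustrative assumptions.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Write a small sample file (hypothetical content) and record its digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

recorded = sha256_of_file(path)

# Later: re-hash and compare. Because hashing is deterministic,
# an unchanged file always reproduces the recorded digest.
unchanged = sha256_of_file(path) == recorded
print("unchanged:", unchanged)

os.remove(path)
```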
One-Way Function
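You cannot feasibly reverse a hash to recover the original input. Given only a digest, there is no known computational shortcut for working backward to whatever produced it; the only option is to guess inputs and hash each one. This makes hashes useful for protecting stored secrets, although passwords need purpose-built slow hash functions (see Password Hashing: The Right Way).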
Fixed Output Size
Hash functions always produce output of the same length, regardless of input size. Whether you hash a single character or an entire movie, the digest length stays constant. This makes hashes predictable and easy to store and compare.
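A quick check with Python's `hashlib`: a one-byte input and a one-megabyte input give digests of identical length for each algorithm. The inputs are arbitrary placeholders.

```python
import hashlib

# Two inputs of wildly different sizes: one byte and one million bytes.
small = b"a"
large = b"a" * 1_000_000

lengths = {}
for name in ("md5", "sha1", "sha256", "sha512"):
    # The digest length depends only on the algorithm, never on the input size.
    lengths[name] = (
        len(hashlib.new(name, small).hexdigest()),
        len(hashlib.new(name, large).hexdigest()),
    )
    print(f"{name:>6}: {lengths[name][0]} vs {lengths[name][1]} hex characters")
```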
Avalanche Effect
A tiny change in input causes a dramatically different output. Change one character in your data and the new hash looks completely unrelated to the old one, with roughly half of its bits flipped. This means an attacker cannot home in on a target hash by making small adjustments to the input, because nearby inputs give no hint of being close.
Let's see a real example with SHA-256, one of the most common hash algorithms:
Input: "hello"
SHA-256 Output: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
Input: "hallo" (one character changed)
SHA-256 Output: d3751d33f9cd5049c4af2b462735457e4d3baf130bcbb87f389e349fbaeb20b9
Notice how changing just one letter produces a completely different hash. This is the avalanche effect in action.
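A short sketch that reproduces the example and counts how many of the 256 output bits differ between the two digests. The `differing_bits` helper is a hypothetical name introduced here.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"hallo").digest()

print(d1.hex())
print(d2.hex())
# For a well-behaved hash, roughly half of the 256 output bits should flip.
print(differing_bits(d1, d2), "of", len(d1) * 8, "bits differ")
```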
Key Properties: Understanding Security
Cryptographic hash functions must possess three critical security properties. These aren't academic theory; they're practical guarantees that secure systems rely on.
Pre-Image Resistance (First Preimage Attack)
Given a hash, it is computationally infeasible to find any input that produces it. This is the "one-way" property in action. For a random, high-entropy input, an attacker who obtains the hash has no better strategy than guessing, and the search space is far too large to exhaust. Low-entropy inputs such as passwords and PINs are another matter: with a fast hash, trying every likely candidate is cheap, which is why passwords need the slow algorithms described under Password Hashing.
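A sketch of why low-entropy inputs fall to guessing: a hypothetical 4-digit PIN hashed with plain SHA-256 is recovered by trying all 10,000 candidates, which takes a fraction of a second even in pure Python. The PIN value and the `brute_force_pin` helper are made up for illustration.

```python
import hashlib
from typing import Optional

# Hypothetical leaked hash of a 4-digit PIN stored with plain SHA-256.
leaked = hashlib.sha256(b"4821").hexdigest()


def brute_force_pin(target_hex: str) -> Optional[str]:
    """Try every 4-digit PIN until one hashes to the target digest."""
    for n in range(10_000):
        guess = f"{n:04d}".encode()
        if hashlib.sha256(guess).hexdigest() == target_hex:
            return guess.decode()
    return None


print(brute_force_pin(leaked))  # 4821
```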
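Collision Resistance (Collision Attack)
It is computationally infeasible to find any two different inputs that produce the same hash. Collisions must exist, since infinitely many possible inputs map to a finite set of digests, but nobody should be able to find one. This prevents forgery: an attacker who can produce collisions can create fake data with the same hash as legitimate data, making verification useless. This is why MD5 and SHA-1 are no longer used for security; researchers found practical collisions in both.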
Second Pre-Image Resistance
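Given a specific input and its hash, it is computationally infeasible to find a different input with the same hash. This prevents someone from swapping out data while preserving the hash. The difference from collision resistance is that the target is fixed: a collision attacker may choose both inputs freely, which makes the job far easier. For an ideal n-bit hash, finding a collision takes roughly 2^(n/2) attempts (the birthday bound), while finding a second preimage takes roughly 2^n.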
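To see the birthday bound in action, this sketch truncates SHA-256 to 24 bits (an assumption standing in for a weak 24-bit hash) and searches for a collision among counter-based inputs. Around 2^12 attempts, a few thousand, are typically enough, whereas a second preimage for a fixed 24-bit target would take around 2^24, roughly 16.7 million. The helper names are hypothetical.

```python
import hashlib


def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Keep only the first `bits` bits of SHA-256 to simulate a small hash."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 24):
    """Hash distinct inputs until two share the same truncated digest."""
    seen = {}
    i = 0
    while True:
        msg = str(i).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg
        i += 1


a, b, tries = find_collision(24)
print(a, b, "collided after", tries, "hashes")
```

The exact number of attempts depends on the inputs, but it stays near the square root of the output space rather than the full space.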
MD5: The Legacy Warning
MD5 (Message Digest 5) produces a 128-bit hash output, typically displayed as 32 hexadecimal characters.
Input: "hello"
MD5 Output: 5d41402abc4b2a76b9719d911017c592
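MD5 was designed by Ron Rivest and published in 1992 (RFC 1321) as a cryptographic hash function. For its time, it worked well, but cryptanalysis gradually exposed severe weaknesses. In 2004, researchers demonstrated practical collision attacks, and within a few years collisions could be generated on an ordinary laptop in under a minute. In 2012, the Flame malware used an MD5 collision to forge a Microsoft code-signing certificate.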
That said, MD5 remains acceptable for:
- Non-security checksums: detecting accidental file corruption in storage systems where attackers can't manipulate the data
- Legacy compatibility: systems where switching would require an expensive migration
- Simple change detection: flagging that a file has been modified, with no security implications
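For the non-security uses above, Python's `hashlib` accepts `usedforsecurity=False` (Python 3.9 and later) to mark an MD5 call as a plain checksum. This sketch records a checksum and later checks whether the data changed; the `has_changed` helper is hypothetical.

```python
import hashlib

data = b"hello"

# usedforsecurity=False documents that this is a plain checksum, not a
# security control; on restricted (e.g. FIPS) builds it is what allows MD5.
checksum = hashlib.md5(data, usedforsecurity=False).hexdigest()
print(checksum)  # 5d41402abc4b2a76b9719d911017c592


def has_changed(new_data: bytes, stored_checksum: str) -> bool:
    """Flag accidental modification by comparing MD5 checksums."""
    return hashlib.md5(new_data, usedforsecurity=False).hexdigest() != stored_checksum


print(has_changed(b"hello", checksum))   # False
print(has_changed(b"hello!", checksum))  # True
```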
SHA-1: Deprecated but Still Lingering
SHA-1 (Secure Hash Algorithm 1) produces a 160-bit hash, displayed as 40 hexadecimal characters. It replaced MD5 and was considered secure for years.
Input: "hello"
SHA-1 Output: aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
In 2017, Google and CWI Amsterdam published the SHAttered attack, the first practical SHA-1 collision. The researchers crafted two different PDF files with identical SHA-1 hashes, breaking the collision resistance property. The attack required significant computational resources, but it demonstrated that the algorithm was obsolete for security.
SHA-256: The Modern Standard
SHA-256 (Secure Hash Algorithm 256) produces a 256-bit hash, displayed as 64 hexadecimal characters. It's part of the SHA-2 family and is the current standard for security-critical applications.
Input: "hello"
SHA-256 Output: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
SHA-256 is fast enough for most applications while providing strong collision resistance, and no practical attacks against it are known. Use cases include:
- TLS certificates: securing HTTPS connections
- Bitcoin: the blockchain's core hashing algorithm
- File integrity verification: checksums for software downloads
- HMAC (message authentication): signing API requests
- General-purpose hashing: when you need a secure, standard algorithm
SHA-512: When You Need More
SHA-512 produces a 512-bit hash (128 hexadecimal characters). It's also part of SHA-2 and offers stronger collision resistance than SHA-256.
When should you choose SHA-512 over SHA-256?
- 64-bit systems: SHA-512 is often faster on 64-bit architectures because its operations work on 64-bit words
- Longer-term security: the 512-bit output provides a larger security margin for data whose integrity must remain verifiable for decades
- Organizational requirements: some compliance frameworks require longer SHA-2 digests (SHA-384 or SHA-512) for classified data
The tradeoff: SHA-512 digests are twice as long as SHA-256 digests, and SHA-512 is noticeably slower on 32-bit systems. For most modern applications, these differences are negligible.
SHA-3: A Different Construction
SHA-3, based on the Keccak algorithm, was standardized by NIST in 2015 (FIPS 202). Unlike SHA-2, which uses the traditional Merkle-Damgård construction, SHA-3 uses a "sponge" construction, a completely different design approach.
SHA-3 exists alongside SHA-2 because:
- Diversity: if a severe vulnerability is found in SHA-2, the world has a vetted alternative ready
- Research: it reflects more recent thinking in hash function design
- Resistance to length-extension attacks: SHA-3's design prevents a class of attack that affects SHA-256 and SHA-512 when they are misused as a keyed hash (for example, hashing a secret key concatenated with a message)
However, SHA-3 hasn't yet become dominant. SHA-2 remains the practical standard. Use SHA-3 if it's explicitly required, but SHA-256 will serve you well in nearly all cases.
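Where SHA-3 is required, Python's standard `hashlib` has offered it since Python 3.6, so the switch is small. This sketch simply contrasts SHA3-256 with SHA-256 on the same input.

```python
import hashlib

msg = b"hello"

sha2 = hashlib.sha256(msg).hexdigest()
sha3 = hashlib.sha3_256(msg).hexdigest()

# Same 256-bit output size, unrelated digests: SHA3-256 is a different
# algorithm (a Keccak sponge), not a new revision of SHA-256.
print(len(sha2), len(sha3))  # 64 64
print(sha2 == sha3)          # False
```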
RIPEMD-160: Bitcoin's Choice
RIPEMD-160 produces a 160-bit hash (40 hexadecimal characters). You won't encounter it often outside Bitcoin and blockchain systems, but it's worth knowing because of Bitcoin's prominence.
Bitcoin uses RIPEMD-160 as part of its address generation: it hashes the public key with SHA-256, then hashes that result with RIPEMD-160. This two-stage approach creates shorter addresses while maintaining security.
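The two-stage hash (often called HASH160) can be sketched in Python as below. The public key bytes are a placeholder, not a real key, and the sketch covers only the hashing step: a real Bitcoin address also adds a version byte and a Base58Check checksum. RIPEMD-160 availability depends on the OpenSSL build behind Python, so the code handles its absence.

```python
import hashlib


def hash160(public_key: bytes) -> bytes:
    """RIPEMD-160(SHA-256(public_key)), the two-stage hash described above."""
    sha = hashlib.sha256(public_key).digest()
    try:
        ripemd = hashlib.new("ripemd160", sha)
    except ValueError as exc:
        # Some OpenSSL 3.x builds only ship RIPEMD-160 in the "legacy" provider.
        raise RuntimeError("RIPEMD-160 is not available in this Python build") from exc
    return ripemd.digest()


# Placeholder 33-byte "compressed public key" (not a real key).
pubkey = bytes.fromhex("02" + "11" * 32)

try:
    print(hash160(pubkey).hex())  # 40 hex characters = 160 bits
except RuntimeError as err:
    print(err)
```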
Unless you're working with Bitcoin or similar blockchain systems, you'll rarely need RIPEMD-160. SHA-256 is the better choice for new applications.
Hashing vs. Encryption: A Critical Distinction
Developers frequently confuse hashing and encryption. Both transform data and the terms sound similar, but they are fundamentally different tools.
Hashing
- One-way: you cannot reverse the process to recover the original
- Deterministic: the same input always produces the same output
- No key: the process is the same every time (though HMAC adds a secret key)
- Use cases: password storage, file integrity, digital signatures
Encryption
- Reversible: with the correct key, you can decrypt to recover the original
- Key-dependent: the same plaintext encrypted with different keys produces different ciphertexts
- Provides confidentiality: only someone with the key can read the data
- Use cases: protecting sensitive data in transit or at rest
The wrong choice leads to security vulnerabilities. If you encrypt passwords, anyone who obtains the key can decrypt every one of them, and if you lose the key, users are locked out. If you hash passwords with a fast general-purpose hash and someone steals your database, they can brute-force the hashes offline. Password hashing therefore requires special algorithms that are deliberately slow.
Password Hashing: The Right Way
This is where many developers go wrong: you should not hash passwords with plain SHA-256 or any other fast general-purpose hash.
Why? Because general-purpose hash functions are built to be fast. SHA-256 hashes a short input in a fraction of a microsecond on modern hardware. An attacker with a copy of your password database can therefore test enormous numbers of guesses, hashing each one until they find a match: a single modern GPU can compute billions of SHA-256 hashes per second, and a rack of them far more.
Instead, use password hashing algorithms designed to be slow and resistant to attacks:
- bcrypt: the long-standing industry standard, with built-in salting and an adjustable cost factor
- scrypt: memory-hard, which makes GPU attacks expensive
- Argon2: winner of the 2015 Password Hashing Competition and the newest of the three, memory-hard and highly resistant to both GPU and ASIC attacks
These algorithms deliberately consume computing resources (and, for scrypt and Argon2, memory) so that each guess is expensive. A single hash is typically tuned to take tens to hundreds of milliseconds, which is negligible for one login but multiplies across billions of guesses, pushing brute-force attacks beyond practical limits.
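A minimal sketch of salted password hashing with scrypt, which Python exposes as `hashlib.scrypt` when built against OpenSSL 1.1 or later; bcrypt and Argon2 need third-party packages such as `bcrypt` and `argon2-cffi`. The cost parameters, the helper names, and the salt-plus-key storage layout are illustrative assumptions, not recommended production settings.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters: n=2**14, r=8 uses about 16 MiB per hash.
N, R, P = 2**14, 8, 1
SALT_LEN = 16


def hash_password(password: str) -> bytes:
    """Return a random 16-byte salt followed by a 32-byte scrypt key."""
    salt = os.urandom(SALT_LEN)
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt + key


def verify_password(password: str, stored: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    salt, expected = stored[:SALT_LEN], stored[SALT_LEN:]
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return hmac.compare_digest(key, expected)


stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("wrong guess", stored))                   # False
```

Real systems usually store the cost parameters alongside the salt so they can be raised later without invalidating existing hashes.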
HMAC: Keyed Hashing for Message Authentication
HMAC (Hash-based Message Authentication Code) combines a hash function with a secret key. Unlike a plain hash, HMAC proves both that a message came from someone holding the key and that it hasn't been tampered with.
HMAC-SHA256("secret_key", "message")
= 8bb7cf97fcfe16e0dac7c7ed05c10fa16e7dd3d8a09dd86f9c5cd1d4b88f0b63
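The HMAC computation above can be reproduced with Python's standard `hmac` module. The `verify` helper is a hypothetical name; it uses `hmac.compare_digest` so that verification time does not reveal how much of a forged tag was correct.

```python
import hashlib
import hmac

key = b"secret_key"
message = b"message"

tag = hmac.new(key, message, hashlib.sha256).hexdigest()
print(tag)  # 64 hex characters (256 bits)


def verify(key: bytes, message: bytes, received_tag: str) -> bool:
    """Recompute the HMAC and compare it with the received tag."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking where the tags first differ.
    return hmac.compare_digest(expected, received_tag)
```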
Only someone with the correct secret key can generate a valid HMAC for a given message. This makes HMAC ideal for:
- API request signing: proving a request came from you and wasn't modified
- JWT (JSON Web Tokens): HS256 (HMAC-SHA256) is one of the most widely used JWT signing algorithms
- Message authentication: authenticating messages in distributed systems without encrypting them
The difference from encryption: HMAC doesn't hide the message. Anyone can read it. But only someone with the key can produce a valid HMAC, proving authenticity.
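A minimal, hand-rolled sketch of an HS256 JWT built with only the standard library, to show where HMAC-SHA256 fits in the token and that the payload stays readable. The secret and claim values are hypothetical, and `sign_hs256`/`verify_hs256` are illustrative names. A production verifier must also check the `alg` header and claims such as `exp`, so real code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + b64url(json.dumps(payload, separators=(",", ":")).encode())
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def verify_hs256(token: str, secret: bytes) -> bool:
    """Check only the signature; real verifiers also validate alg and claims."""
    signing_input, _, sig = token.rpartition(".")
    mac = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(mac), sig)


token = sign_hs256({"sub": "user-123"}, b"hypothetical-secret")
print(verify_hs256(token, b"hypothetical-secret"))  # True
print(verify_hs256(token, b"wrong-secret"))         # False

# The payload is only encoded, not encrypted: anyone can read it.
payload_part = token.split(".")[1]
decoded = base64.urlsafe_b64decode(payload_part + "=" * (-len(payload_part) % 4))
print(decoded)  # b'{"sub":"user-123"}'
```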
Algorithm Comparison Table
| Algorithm | Output Bits | Hex Chars | Security Status | Primary Use Case |
|---|---|---|---|---|
| MD5 | 128 | 32 | Broken | Legacy, non-security checksums |
| SHA-1 | 160 | 40 | Deprecated | Git, legacy systems |
| SHA-256 | 256 | 64 | Secure | Standard choice, TLS, Bitcoin |
| SHA-512 | 512 | 128 | Secure | 64-bit systems, long-term security |
| SHA-3 | 224-512 | 56-128 | Secure | Future-oriented, specialized use |
| RIPEMD-160 | 160 | 40 | Secure | Bitcoin addresses |
| bcrypt | 184 | n/a (60-char encoded string incl. salt and cost) | Secure | Password hashing |
| scrypt | Variable | Variable | Secure | Memory-hard password hashing |
| Argon2 | Variable | Variable | Secure | Modern password hashing |
Practical Recommendations
For file integrity verification: Use SHA-256. It's fast, standard, and provides strong security guarantees.
For password storage: Use bcrypt, scrypt, or Argon2. Never use fast general-purpose hashes such as SHA-256 on their own.
For API authentication: Use HMAC-SHA256 to sign requests, proving they came from you and weren't modified.
For digital signatures: Use SHA-256 with RSA or ECDSA for cryptographic proof of authenticity and non-repudiation.
For cryptographic proofs (blockchain): SHA-256 is the standard choice. Bitcoin uses it; Ethereum uses Keccak-256, the original Keccak submission, which differs from the final SHA3-256 standard in its padding and so produces different digests.
When uncertain: Choose SHA-256. It has been vetted by the cryptographic community for more than two decades, is widely implemented on every major platform, and offers an excellent security-to-performance ratio.
Related Tools and Resources
Explore these online tools to see hashing in action:
- Hash Generator - compute SHA-256, SHA-512, MD5, and other hashes
- JWT Decoder - inspect and verify JWT tokens with HMAC-SHA256 signatures
- Base64 Encoder/Decoder - convert between binary and base64 (commonly used with hashes)
Additional learning resources:
- JWT Guide - JSON Web Tokens use HMAC-SHA256 for signing
- Hash Algorithms Compared - detailed technical comparison
- Web Security Guide - broader security context for hashing