What is a Hash Function?

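A cryptographic hash function is a mathematical algorithm that takes any amount of input data and produces a fixed-size string of bytes. The output, called a hash or digest, acts as a fingerprint of the input: even tiny changes to the input produce completely different outputs. Because the output size is fixed, different inputs that share a digest (collisions) must exist, but for a secure hash function they are computationally infeasible to find.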

Hash functions have four fundamental properties that make them useful for security and data integrity:

Deterministic

The same input always produces the same output. If you hash "hello" today, it produces the exact same digest tomorrow, next week, or on any computer worldwide. This consistency is critical for verification: you can hash a file and compare it later to confirm nothing changed.

One-Way Function

You cannot feasibly reverse the hash to recover the original input. If someone gives you a hash, there is no computationally practical way to work backward to what created it; the only general approach is to guess inputs and hash each guess. This makes hashes useful for password storage and data protection.

Fixed Output Size

Hash functions always produce the same length output regardless of input size. You can hash a single character or an entire movie, and the digest length stays constant. This makes hashes predictable and suitable for storage and comparison.
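A minimal sketch of the fixed-size property using Python's standard `hashlib` module. The sample inputs are arbitrary choices for illustration; whatever their length, the SHA-256 digest is 32 bytes.

```python
import hashlib

# Inputs ranging from empty to roughly one megabyte (arbitrary examples).
inputs = [b"", b"a", b"hello", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).digest()
    # SHA-256 always yields 32 bytes (256 bits), i.e. 64 hex characters.
    print(f"{len(data):>9} input bytes -> {len(digest)} digest bytes")

# Collect the distinct digest lengths seen; there should be exactly one.
digest_lengths = {len(hashlib.sha256(data).digest()) for data in inputs}
```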

Avalanche Effect

A tiny change in input causes a dramatically different output. Change one character in your data, and the hash is completely unrecognizable; on average, about half of the output bits flip. This makes it infeasible to incrementally forge a hash: you need the exact original data.

Let's see a real example with SHA-256, one of the most common hash algorithms:

Input: "hello"
SHA-256 Output: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Input: "hallo"  (changed one character)
SHA-256 Output: d3751d33f9cd5049c4af2b462735457e4d3baf130bcbb87f389e349fbaeb20b9

Notice how changing just one letter produces a completely different hash. This is the avalanche effect in action.
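To reproduce this comparison yourself, here is a small sketch using Python's `hashlib`. The helper `bit_difference` is a name introduced here for illustration; it counts how many of the 256 output bits differ between the two digests, which for a good hash should land near half.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

first = hashlib.sha256(b"hello").digest()
second = hashlib.sha256(b"hallo").digest()

print(hashlib.sha256(b"hello").hexdigest())
print(hashlib.sha256(b"hallo").hexdigest())
print(f"{bit_difference(first, second)} of {len(first) * 8} bits differ")
```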

Key Properties: Understanding Security

Cryptographic hash functions must possess three critical security properties. These aren't academic theory; they're practical guarantees that enable secure systems.

Pre-Image Resistance (First Preimage Attack)

Given a hash, it is computationally infeasible to find any input that produces it. This is the "one-way" property in action. In practical terms, if an attacker gets a hash of your password, they can't work backward to discover it through computation. Their only option is to guess candidates and hash each one until one matches, which is hopeless against long random inputs but realistic against weak or common passwords.

Collision Resistance (Collision Attack)

It is computationally infeasible to find any two different inputs that produce the same hash. This prevents forgery. If an attacker can construct a collision, they can prepare fake data with the same hash as legitimate-looking data, making verification useless. This is why we've stopped using MD5 and SHA-1 for security: researchers found practical collisions.

Second Pre-Image Resistance (Second Preimage Attack)

Given an original input and its hash, it is infeasible to find a different input producing the same hash. This prevents someone from swapping out data while preserving the hash. It differs from collision resistance in that the attacker does not get to choose both inputs freely: one input is fixed, and they must match its specific hash, which is a much harder problem.
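The gap between finding a collision and finding a second pre-image can be seen on a deliberately weakened hash. The sketch below truncates SHA-256 to 16 bits (the `tiny_hash` helper, the counter-based messages, and the target string are all made up for illustration) so that both searches finish quickly. A collision typically turns up after a few hundred hashes, while matching one fixed target typically takes tens of thousands.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> bytes:
    """Toy hash: only the first 2 bytes (16 bits) of SHA-256. Deliberately weak."""
    return hashlib.sha256(data).digest()[:2]

def find_collision():
    """Birthday search: any two distinct inputs that share a toy digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

def find_second_preimage(target: bytes):
    """Find a different input whose toy digest matches one fixed target input."""
    wanted = tiny_hash(target)
    for i in count():
        msg = str(i).encode()
        if msg != target and tiny_hash(msg) == wanted:
            return msg, i + 1

a, b, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(b"pay alice 10")
print(f"collision after {collision_tries} hashes: {a!r} vs {b!r}")
print(f"second preimage after {preimage_tries} hashes: {forged!r}")
```

With a full 256-bit digest the same searches would need on the order of 2^128 and 2^256 hashes respectively, which is why both are considered infeasible for SHA-256.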

MD5: The Legacy Warning

MD5 (Message Digest 5) produces a 128-bit hash output, typically displayed as 32 hexadecimal characters.

Input: "hello"
MD5 Output: 5d41402abc4b2a76b9719d911017c592

MD5 was designed in 1991 and published as RFC 1321 in 1992. For its time, it worked fine. However, cryptanalysis discovered severe weaknesses. In 2004, researchers demonstrated practical collision attacks. By 2012, it was trivially easy to generate two files with identical MD5 hashes.

Do not use MD5 for security-critical applications. Never hash passwords with MD5. Never use it for digital signatures, message authentication, or any security purpose where hash integrity matters.

That said, MD5 remains acceptable for:

  • Non-security checksums: detecting accidental file corruption in storage systems where attackers can't manipulate data
  • Legacy compatibility: systems where switching requires expensive migration
  • Simple change detection: flagging that a file has been modified, without security implications

SHA-1: Deprecated but Still Lingering

SHA-1 (Secure Hash Algorithm 1) produces a 160-bit hash, displayed as 40 hexadecimal characters. It replaced MD5 and was considered secure for years.

Input: "hello"
SHA-1 Output: aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d

The SHAttered attack, published in 2017 by researchers at Google and CWI Amsterdam, proved that SHA-1's collision resistance was broken in practice. The researchers crafted two different PDF files with identical SHA-1 hashes. The attack required significant computational resources, but it demonstrated the algorithm was obsolete for security.

Avoid SHA-1 for new projects. However, you'll encounter it in legacy systems, most notably in Git, where it's used for object identification. The Git project is gradually transitioning to SHA-256, but billions of existing repositories use SHA-1 hashes as commit identifiers.
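As an illustration of how Git uses SHA-1 for object identification, the sketch below computes the ID of a blob (file content) the way `git hash-object` does in a SHA-1 repository: Git hashes a short header (`blob`, a space, the content length in bytes, and a NUL byte) followed by the content itself. The function name `git_blob_id` is introduced here for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 object ID that Git assigns to a blob with the given content."""
    # Header format: b"blob <size>\0", then the raw file bytes.
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Compare with: echo 'hello world' | git hash-object --stdin
print(git_blob_id(b"hello world\n"))
```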

SHA-256: The Modern Standard

SHA-256 (Secure Hash Algorithm 256) produces a 256-bit hash, displayed as 64 hexadecimal characters. It's part of the SHA-2 family and is the current standard for security-critical applications.

Input: "hello"
SHA-256 Output: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

SHA-256 is fast enough for most applications while providing strong collision resistance. No practical attacks are known. Use cases include:

  • TLS certificates: securing HTTPS connections
  • Bitcoin: the blockchain's core hashing algorithm
  • File integrity verification: checksums for software downloads
  • HMAC (message authentication): signing API requests
  • General-purpose hashing: when you need a secure, standard algorithm

SHA-256 is your default choice. If you need a cryptographic hash and nothing else specifies otherwise, use SHA-256. It's fast, secure, widely implemented, and trusted by infrastructure worldwide.

SHA-512: When You Need More

SHA-512 produces a 512-bit hash (128 hexadecimal characters). It's also part of SHA-2 and offers stronger collision resistance than SHA-256.

When should you choose SHA-512 over SHA-256?

  • 64-bit systems: SHA-512 is often faster on 64-bit architectures because its operations work with 64-bit values
  • Longer-term security: the 512-bit output provides larger security margins for data whose integrity must remain verifiable for decades
  • Organizational requirements: some compliance frameworks mandate it for classified data

The tradeoff: SHA-512 produces longer hashes (twice the size of SHA-256) and is slightly slower on 32-bit systems. For most modern applications, this difference is negligible.

SHA-3: A Different Construction

SHA-3, based on the Keccak algorithm, was finalized in 2015 as a new NIST standard. Unlike SHA-2, which uses a traditional Merkle-Damgård construction, SHA-3 uses a "sponge" construction, a completely different design approach.

SHA-3 exists alongside SHA-2 because:

  • Diversity: if a severe vulnerability is found in SHA-2, the world has an alternative from a trusted source
  • Research: it represents more recent thinking in hash algorithm design
  • Resistance to length-extension attacks: SHA-3's design prevents a specific class of attack that affects SHA-256 and SHA-512 when they are used as a naive keyed hash (hashing a secret followed by the message)

However, SHA-3 hasn't yet become dominant. SHA-2 remains the practical standard. Use SHA-3 if it's explicitly required, but SHA-256 will serve you well in nearly all cases.
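In Python, switching between the two families is a one-line change, since `hashlib` exposes SHA-3 through the same interface as SHA-2. A brief sketch (the message is an arbitrary example):

```python
import hashlib

message = b"hello"

# Same interface, same 256-bit output size, entirely different internals.
sha2 = hashlib.sha256(message).hexdigest()
sha3 = hashlib.sha3_256(message).hexdigest()

print("SHA-256 :", sha2)
print("SHA3-256:", sha3)
```

The two digests have the same length but are unrelated, so they are not interchangeable: a system must agree on which algorithm it uses.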

RIPEMD-160: Bitcoin's Choice

RIPEMD-160 produces a 160-bit hash (40 hexadecimal characters). You won't encounter it often outside Bitcoin and blockchain systems, but it's worth knowing because of Bitcoin's prominence.

Bitcoin uses RIPEMD-160 as part of its address generation: it hashes the public key with SHA-256, then hashes that result with RIPEMD-160. This two-stage approach creates shorter addresses while maintaining security.

Unless you're working with Bitcoin or similar blockchain systems, you'll rarely need RIPEMD-160. SHA-256 is the better choice for new applications.

Hashing vs. Encryption: A Critical Distinction

Developers frequently confuse hashing and encryption. They sound similar and both transform data, but they're fundamentally different tools.

Hashing

  • One-way: you cannot reverse the process to recover the original
  • Deterministic: same input always produces same output
  • No key: a plain hash takes no secret, so anyone can compute it (HMAC, by contrast, adds a secret key)
  • Use cases: password storage (with dedicated password hashing algorithms), file integrity, digital signatures

Encryption

  • Reversible: with the correct key, you can decrypt to recover the original
  • Key-dependent: the same plaintext encrypted with different keys produces different ciphertexts
  • Provides confidentiality: only someone with the key can read the data
  • Use case: protecting sensitive data in transit or at rest

The wrong choice leads to security vulnerabilities. If you encrypt passwords, anyone who obtains the key can decrypt every one of them, and if you lose the key, users are locked out forever. If you hash passwords with a fast general-purpose hash and someone gains access to your database, they can brute-force the hashes offline. Password hashing requires special algorithms designed to be deliberately slow.

Password Hashing: The Right Way

This is where many developers go wrong: you should not use plain SHA-256 to hash passwords.

Why? Because general-purpose hash functions are fast. SHA-256 processes a short input in well under a microsecond. An attacker with access to your password database can hash guess after guess until one matches, and a single modern GPU can attempt billions of SHA-256 hashes per second.

Never hash passwords with plain SHA-256, SHA-1, or MD5. You will be compromised. The speed that makes these algorithms great for file integrity makes them terrible for password security.
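To make the risk concrete, here is a minimal sketch of an offline dictionary attack against an unsalted SHA-256 password hash. The leaked hash and the five-word list are invented for illustration; real attackers use wordlists with billions of entries and GPU-accelerated tools, but the logic is the same.

```python
import hashlib

def crack_unsalted_sha256(target_hex: str, candidates):
    """Return the first candidate whose SHA-256 matches target_hex, else None."""
    for guess in candidates:
        if hashlib.sha256(guess.encode()).hexdigest() == target_hex:
            return guess
    return None

# Hypothetical leaked database entry: unsalted SHA-256 of a weak password.
leaked = hashlib.sha256(b"hello").hexdigest()
wordlist = ["password", "123456", "qwerty", "hello", "letmein"]

print(crack_unsalted_sha256(leaked, wordlist))  # hello
```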

Instead, use password hashing algorithms designed to be slow and resistant to attacks:

  • bcrypt: industry standard, includes automatic salting, adaptive difficulty
  • scrypt: memory-hard, resistant to GPU attacks
  • argon2: newest standard, winner of the Password Hashing Competition, highly resistant to both GPU and ASIC attacks

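These algorithms deliberately consume computational resources (and, for scrypt and argon2, memory), making each guess expensive. A check that would take well under a microsecond with SHA-256 instead takes tens or hundreds of milliseconds, which is unnoticeable for a single login but pushes large-scale brute-force attacks beyond practical limits.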
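Of the three, scrypt is available in Python's standard library as `hashlib.scrypt` (it requires a Python build linked against OpenSSL 1.1 or newer). The sketch below shows the usual store-and-verify pattern. The cost parameters and the helper names `hash_password` and `verify_password` are assumptions chosen for illustration, not tuned recommendations; bcrypt and argon2 need third-party packages and are not shown.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumptions); tune them for your own hardware.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage, using a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

Because the salt is random per password, two users with the same password get different stored values, which defeats precomputed lookup tables.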

HMAC: Keyed Hashing for Message Authentication

HMAC (Hash-based Message Authentication Code) combines a hash function with a secret key. Unlike a plain hash, HMAC proves both that a message came from someone holding the key and that it hasn't been tampered with.

HMAC-SHA256("secret_key", "message")
= 8bb7cf97fcfe16e0dac7c7ed05c10fa16e7dd3d8a09dd86f9c5cd1d4b88f0b63

Only someone with the correct secret key can generate a valid HMAC for a given message. This makes HMAC ideal for:

  • API request signing: proving a request came from you and wasn't modified
  • JWT (JSON Web Tokens): HS256 (HMAC-SHA256) is the most widely used symmetric JWT signing algorithm
  • Message authentication: authenticating messages in distributed systems without encryption

The difference from encryption: HMAC doesn't hide the message. Anyone can read it. But only someone with the key can produce a valid HMAC, proving authenticity.
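A minimal signing and verification sketch using Python's standard `hmac` module. The key, the message, and the helper names `sign` and `verify` are placeholders for illustration; in practice the key would come from a secrets store rather than source code.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Compute the HMAC-SHA256 tag of message under key, as hex."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison to avoid timing leaks."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"secret_key"          # placeholder key for the example
message = b"message"
tag = sign(key, message)

print(verify(key, message, tag))               # True
print(verify(key, b"tampered message", tag))   # False
print(verify(b"wrong_key", message, tag))      # False
```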

Algorithm Comparison Table

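| Algorithm  | Output Bits | Output Characters       | Security Status | Primary Use Case                   |
|------------|-------------|-------------------------|-----------------|------------------------------------|
| MD5        | 128         | 32 hex                  | Broken          | Legacy, non-security checksums     |
| SHA-1      | 160         | 40 hex                  | Deprecated      | Git, legacy systems                |
| SHA-256    | 256         | 64 hex                  | Secure          | Standard choice, TLS, Bitcoin      |
| SHA-512    | 512         | 128 hex                 | Secure          | 64-bit systems, long-term security |
| SHA-3      | 224-512     | 56-128 hex              | Secure          | Future-oriented, specialized use   |
| RIPEMD-160 | 160         | 40 hex                  | Secure          | Bitcoin addresses                  |
| bcrypt     | 184         | 60 (encoded, not hex)   | Secure          | Password hashing                   |
| argon2     | Variable    | Variable                | Secure          | Modern password hashing            |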
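The output sizes in the table can be checked directly with Python's `hashlib`. This sketch covers only algorithms that the standard library normally provides; RIPEMD-160 is left out because its availability depends on the underlying OpenSSL build, and bcrypt and argon2 need third-party packages.

```python
import hashlib

algorithms = ["md5", "sha1", "sha256", "sha512", "sha3_256", "sha3_512"]

sizes = {}
for name in algorithms:
    h = hashlib.new(name, b"hello")
    # digest_size is in bytes; the hex form uses two characters per byte.
    sizes[name] = (h.digest_size * 8, len(h.hexdigest()))
    print(f"{name:<9} {sizes[name][0]:>4} bits  {sizes[name][1]:>4} hex chars")
```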

Practical Recommendations

For file integrity verification: Use SHA-256. It's fast, standard, and provides strong security guarantees.
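A sketch of file verification in Python. It reads the file in chunks so that large downloads don't have to fit in memory; the function names and the default chunk size are illustrative choices, and the expected digest would normally come from the publisher's checksum file.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read up to chunk_size bytes at a time until the file is exhausted.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_file(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return sha256_file(path) == expected_hex.strip().lower()
```

Hashing incrementally with `update` gives the same digest as hashing the whole content at once, so the chunk size affects only memory use and speed.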

For password storage: Use bcrypt, scrypt, or argon2. Never use general-purpose hashes.

For API authentication: Use HMAC-SHA256 to sign requests, proving they came from you and weren't modified.

For digital signatures: Use SHA-256 with RSA or ECDSA for cryptographic proof of authenticity and non-repudiation.

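For cryptographic proofs (blockchain): Use SHA-256 as the standard. Bitcoin uses it; Ethereum uses Keccak-256, the original Keccak design, which differs from the standardized SHA3-256 in its padding and therefore produces different digests.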

When uncertain: Choose SHA-256. It's been vetted by the cryptographic community for more than two decades, is widely implemented across all platforms, and provides an excellent security-to-performance ratio.
