What is a hash function?
A cryptographic hash function takes an input of any size and produces a fixed-length output (the "hash" or "digest"). The same input always produces the same output (deterministic), but even a tiny change to the input — flipping a single bit — produces a completely different hash (the avalanche effect).
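The determinism and avalanche properties described above can be observed directly. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_hex and differing_bits and the sample input are illustrative assumptions, not part of any library.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"hello world"
# Flip the lowest bit of the first byte: 'h' (0x68) becomes 'i' (0x69).
flipped = bytes([original[0] ^ 0x01]) + original[1:]

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()

# Same input, same output: hashing the original twice gives identical digests.
print(sha256_hex(original) == sha256_hex(original))

# One flipped input bit changes roughly half of the 256 output bits.
print(sha256_hex(original))
print(sha256_hex(flipped))
print(differing_bits(d1, d2), "of 256 bits differ")
```

For a well-behaved hash, the count of differing bits should land somewhere near 128, since each output bit behaves as if it flips with probability one half.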
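Hash functions are one-way: it's computationally infeasible to reconstruct the original input from the hash (a property called preimage resistance). This makes them essential building blocks for data integrity verification, digital signatures, and content-addressable storage systems like Git. They also underpin password storage, although a single fast hash is not enough there: passwords should go through a salted, deliberately slow scheme such as bcrypt, scrypt, Argon2, or PBKDF2.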
The output length is fixed regardless of input size: SHA-256 always produces 256 bits (64 hex characters) whether you hash a single character or a 10 GB file. Good hash functions distribute outputs uniformly across the entire output space, minimising the chance of two different inputs producing the same hash (a "collision").
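To illustrate the fixed output length, here is a small sketch that hashes a one-byte input and a much larger one. The hash_stream helper is a hypothetical name, and a 10 MB in-memory buffer stands in for a large file; feeding the hash in chunks is how a multi-gigabyte file would typically be processed without loading it all into memory.

```python
import hashlib
import io


def hash_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream chunk by chunk and return the SHA-256 hex digest."""
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()


small = hashlib.sha256(b"a").hexdigest()
large = hash_stream(io.BytesIO(b"a" * 10_000_000))  # stand-in for a large file

# Both digests are 64 hex characters (256 bits), regardless of input size.
print(len(small), len(large))
```

For a real file, the same helper could be called with an object returned by open(path, "rb").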
Hash algorithms compared
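MD5 (128-bit output) was once the standard but is now cryptographically broken: practical collision attacks exist. It's still used for non-security checksums (detecting accidental corruption in file downloads, cache keys) where speed matters more than collision resistance. It offers no protection against deliberate tampering, because an attacker can craft colliding files.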
SHA-1 (160-bit output) is deprecated for security purposes after researchers at CWI Amsterdam and Google demonstrated a practical collision in 2017 (the "SHAttered" attack). Git still uses SHA-1 for object hashing by default but is migrating to SHA-256. Don't use SHA-1 for anything security-sensitive.
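As an illustration of Git's object hashing, the sketch below reproduces how a blob ID is derived in a SHA-1 repository: Git hashes a short header ("blob", the content length in bytes, and a NUL byte) followed by the file content. The function name git_blob_id is an assumption made for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    # Header format: b"blob <size in bytes>\0", prepended to the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello world\n"))
```

The result should match what git hash-object prints for a file with the same bytes. Because the ID depends only on content, identical files share one object, which is what makes the storage content-addressable.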
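SHA-256 (256-bit output) is the current workhorse of cryptography. It's used in TLS certificates, Bitcoin mining, code signing, and most modern security protocols. No practical collision or preimage attacks are known. This is the default choice for general-purpose hashing; the main exception is password storage, which calls for a dedicated slow, salted scheme rather than a single fast hash.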
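SHA-512 (512-bit output) provides an even larger output space. It's often faster than SHA-256 on 64-bit processors for large inputs, because it works on 64-bit words and processes 128-byte blocks instead of 64-byte ones. The advantage disappears on CPUs with dedicated SHA-256 instructions (such as Intel SHA Extensions or the ARMv8 cryptography extensions), where SHA-256 is usually the faster of the two. Use SHA-512 when you need extra margin or when your protocol specifies it.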
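The four algorithms above can be compared side by side with a short script. This is a rough sketch: the helper names digest_bits and seconds_to_hash are hypothetical, the 1 MB payload is arbitrary sample data, and the timings depend heavily on the CPU and the underlying OpenSSL build, so treat them as indicative only. On systems running in a restricted (for example FIPS) mode, MD5 or SHA-1 may be unavailable.

```python
import hashlib
import timeit

ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def digest_bits(name: str) -> int:
    """Return the output size of a named algorithm in bits."""
    return hashlib.new(name).digest_size * 8


def seconds_to_hash(name: str, data: bytes, repeats: int = 20) -> float:
    """Time `repeats` full hashes of `data` with the named algorithm."""
    return timeit.timeit(lambda: hashlib.new(name, data).digest(), number=repeats)


sizes = {name: digest_bits(name) for name in ALGORITHMS}
payload = b"\x00" * 1_000_000  # 1 MB of arbitrary sample data

for name in ALGORITHMS:
    elapsed = seconds_to_hash(name, payload)
    print(f"{name:>6}: {sizes[name]:3d} bits, {elapsed:.4f} s for 20 x 1 MB")
```

Running this on the target hardware is the most reliable way to decide whether SHA-256 or SHA-512 is faster for a given workload.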
Hashing vs encryption
This is one of the most common points of confusion in security. Hashing is one-way: you can convert input to a hash, but you cannot convert the hash back to the input. There is no key, no decryption step, no reversal. It's designed to be irreversible.
Encryption is two-way: you use a key to encrypt data, and the same key (symmetric) or a paired key (asymmetric) to decrypt it. The original data is recoverable by design — that's the entire point.
Use hashing when you need to verify data without storing the original (passwords, integrity checks). Use encryption when you need to protect data while preserving the ability to read it later (messages, files, database fields).
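The "verify without storing the original" pattern for passwords can be sketched as follows. This example assumes PBKDF2-HMAC-SHA256 from Python's standard library as the slow, salted scheme; the function names, the 16-byte salt, and the iteration count are illustrative assumptions to be tuned for real deployments, not recommendations taken from the text above.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and policy


def hash_password(
    password: str, salt: Optional[bytes] = None, iterations: int = ITERATIONS
) -> Tuple[bytes, bytes]:
    """Derive a digest from a password; returns (salt, digest) for storage."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(
    password: str, salt: bytes, expected: bytes, iterations: int = ITERATIONS
) -> bool:
    """Re-derive the digest from the candidate password and compare."""
    _, candidate = hash_password(password, salt, iterations)
    # compare_digest runs in constant time to avoid leaking match position.
    return hmac.compare_digest(candidate, expected)


# Only the salt and digest are stored; the password itself never is.
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Verification works by repeating the one-way computation and comparing results, so nothing is ever decrypted. Data that must be read back later, such as a message or a database field, needs encryption with a key instead.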