Regex for email validation: why it's harder than you think


Every developer writes an email validation regex at some point. Most of them get it wrong — not because they're bad at regex, but because email addresses are far weirder than anyone expects.

The pattern everyone starts with

The first attempt usually looks something like this:

^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$

This matches john@example.com and rejects obvious nonsense. It feels correct. But it fails on completely valid addresses like john.doe@example.com (dots in the local part), john+tag@example.com (plus addressing, used by Gmail), and admin@sub.domain.example.com (multiple dots in the domain).
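A quick check makes the gap concrete. The sketch below runs the first-attempt pattern against the addresses mentioned above; the variable names are illustrative only.

```javascript
// The naive first-attempt pattern from above.
const naivePattern = /^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;

const samples = [
  "john@example.com",             // plain address
  "john.doe@example.com",         // dot in the local part
  "john+tag@example.com",         // plus addressing
  "admin@sub.domain.example.com", // multiple dots in the domain
];

// Pair each sample with whether the naive pattern accepts it.
const naiveResults = samples.map((address) => [address, naivePattern.test(address)]);

for (const [address, accepted] of naiveResults) {
  console.log(`${address}: ${accepted ? "accepted" : "rejected"}`);
}
```

Only the first address is accepted; the other three are rejected even though they are valid.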

A better pattern

Many production systems use something closer to this:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Let's break it down:

  • ^ and $ — anchor to the full string, no partial matches
  • [a-zA-Z0-9._%+-]+ — local part: letters, numbers, dots, underscores, percent signs, plus signs, hyphens
  • @ — the literal @ symbol
  • [a-zA-Z0-9.-]+ — domain: letters, numbers, dots, hyphens
  • \.[a-zA-Z]{2,}$ — TLD: a dot followed by at least 2 letters

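This handles the vast majority of real-world email addresses. It is close in spirit to the check behind HTML5's <input type="email">, although the pattern in the HTML specification allows more punctuation in the local part and does not require a dot in the domain.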
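As a rough check, the sketch below feeds this pattern the three addresses the first attempt rejected, then one address that shows how loose the character classes still are. The names are illustrative only.

```javascript
// The "better" pattern from this section.
const betterPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const previouslyRejected = [
  "john.doe@example.com",
  "john+tag@example.com",
  "admin@sub.domain.example.com",
];

const allAccepted = previouslyRejected.every((address) => betterPattern.test(address));
console.log(allAccepted); // true

// The classes do not constrain where dots appear, so consecutive dots
// in the local part (not valid in an unquoted local part) also pass.
const acceptsDoubleDot = betterPattern.test("john..doe@example.com");
console.log(acceptsDoubleDot); // true
```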

But it's still not perfect.

The addresses you're probably rejecting

Here are valid email addresses that the "better" pattern above rejects:

Quoted local parts: "john doe"@example.com is valid. Spaces, and even @ signs, are allowed inside double quotes in the local part. Almost nobody uses this, but it's legal.

IP address domains: user@[192.168.1.1] is valid. You can use an IP address instead of a domain name, enclosed in square brackets.

International characters: 用户@例え.jp is valid under the EAI (Email Address Internationalization) standard. Unicode is allowed in both the local part and the domain.

Very long TLDs: .photography, .international, .cancerresearch — TLDs can be much longer than 2-3 characters. The pattern [a-zA-Z]{2,} handles this, but patterns that cap at {2,4} or {2,6} will reject valid addresses.
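These cases are easy to confirm. The sketch below, using illustrative names, runs the "better" pattern against the three kinds of valid address described above and counts how many it rejects.

```javascript
const betterPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const validButUnusual = [
  '"john doe"@example.com', // quoted local part
  "user@[192.168.1.1]",     // IP address domain in square brackets
  "用户@例え.jp",             // international characters
];

const wronglyRejected = validButUnusual.filter((address) => !betterPattern.test(address));
console.log(wronglyRejected.length); // 3

// A long TLD is fine as long as the quantifier has no upper bound.
console.log(betterPattern.test("user@example.photography")); // true
```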

The RFC 5322 problem

The formal specification for email address syntax is RFC 5322, and the regexes that try to implement it fully are infamous. Here's a simplified version; the best-known attempt at full compliance, a Perl regex written against the older RFC 822, is over 6,000 characters long:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*
[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|
2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|
[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c
\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Nobody should use this in production. It's unmaintainable, it's slow on long inputs, and it still doesn't tell you whether the address actually receives mail.

What actually works in production

The pragmatic approach to email validation has three layers:

Layer 1: Basic format check (regex). Use a simple, permissive pattern like the one below. It catches typos and obviously invalid input. Don't try to be RFC-compliant: you'll reject fewer real users with a permissive pattern than with a strict one.

^[^\s@]+@[^\s@]+\.[^\s@]+$

This ultra-minimal pattern just checks: something, then @, then something with a dot, then something. It's deliberately permissive.
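To illustrate what "deliberately permissive" means in practice, the sketch below counts how many unusual but plausible addresses the pattern accepts and how many gross errors it rejects. The sample lists and names are illustrative only.

```javascript
const permissivePattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

const accepted = [
  "user@example.com",
  "john+tag@example.com",
  "用户@例え.jp",
  "user@[192.168.1.1]",
].filter((address) => permissivePattern.test(address));

const rejected = [
  "userexample.com",       // no @
  "user@example",          // no dot after the @
  "user name@example.com", // whitespace
  "user@@example.com",     // two @ signs
].filter((address) => !permissivePattern.test(address));

console.log(accepted.length, rejected.length); // 4 4
```

Note that a quoted local part containing a space, such as "john doe"@example.com, still fails because the pattern forbids whitespace anywhere.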

Layer 2: HTML5 validation. If you're building a web form, <input type="email"> gives you browser-native validation that covers the common cases. Its check is comparable to the "better" regex above but not identical: browsers also accept dotless domains such as user@localhost.

Layer 3: Verification email. The only way to know if an email address works is to send a message to it and see if the user responds. Every signup flow that matters uses email verification. This makes the regex layer a courtesy check, not a gate.

Common regex mistakes in email validation

Requiring a dot in the wrong place. admin@localhost is valid, and so is user@company on internal networks, so a pattern that demands a dot in the domain will reject intranet addresses. If you're validating for internet-facing use, requiring a dot in the domain is reasonable. Requiring one in the local part never is: user@example.com has none.
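One way to express that choice is to pick the pattern by deployment context. This is a minimal sketch; emailPatternFor and its internetFacing option are hypothetical names, not part of any library.

```javascript
// Hypothetical helper: choose a pattern based on where the address will be used.
function emailPatternFor({ internetFacing }) {
  return internetFacing
    ? /^[^\s@]+@[^\s@]+\.[^\s@]+$/ // domain must contain a dot
    : /^[^\s@]+@[^\s@]+$/;         // dotless domains such as localhost are allowed
}

const publicPattern = emailPatternFor({ internetFacing: true });
const intranetPattern = emailPatternFor({ internetFacing: false });

console.log(publicPattern.test("admin@localhost"));   // false
console.log(intranetPattern.test("admin@localhost")); // true
console.log(publicPattern.test("user@example.com"));  // true, with no dot in the local part
```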

Limiting TLD length. Before 2013, the longest TLDs in common use, such as .museum and .travel, were 6 characters, so patterns like [a-zA-Z]{2,6} were common. ICANN then introduced hundreds of new TLDs, many of them longer. Use {2,} with no upper limit.
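The effect of the cap is easy to see side by side. The sketch below compares a {2,6} variant of the "better" pattern with the uncapped one; the variable names are illustrative.

```javascript
const cappedTld = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}$/;
const uncappedTld = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const tldSamples = [
  "user@example.museum",        // 6 letters: fits under the cap
  "user@example.photography",   // 11 letters
  "user@example.international", // 13 letters
];

// Each row: [address, accepted by capped pattern, accepted by uncapped pattern]
const tldResults = tldSamples.map((address) => [
  address,
  cappedTld.test(address),
  uncappedTld.test(address),
]);

console.log(tldResults);
```

Only the .museum address passes the capped pattern; all three pass the uncapped one.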

Forgetting case insensitivity. Email addresses are case-insensitive in the domain part (per RFC) and conventionally case-insensitive in the local part (though technically the local part can be case-sensitive). Always match with the i flag or include both a-z and A-Z.
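A minimal sketch of both halves of that advice: match with the i flag, and if you normalize addresses for storage, lowercase only the domain. normalizeEmail is a hypothetical helper, and leaving the local part untouched is an assumption based on the case-sensitivity caveat above.

```javascript
// With the i flag, the character classes only need a-z.
const caseInsensitivePattern = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;

// Hypothetical helper: lowercase the domain, keep the local part as typed,
// because the local part may technically be case-sensitive.
function normalizeEmail(address) {
  const at = address.lastIndexOf("@");
  if (at === -1) return address; // not an address; leave it for validation to reject
  return address.slice(0, at) + "@" + address.slice(at + 1).toLowerCase();
}

console.log(caseInsensitivePattern.test("John.Doe@Example.COM")); // true
console.log(normalizeEmail("John.Doe@Example.COM"));              // John.Doe@example.com
```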

Not trimming whitespace. Users paste email addresses with leading or trailing spaces constantly. Trim the input before validating, not after.
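The order matters because an anchored pattern treats stray whitespace as part of the address. A short illustration, with an assumed pasted value:

```javascript
const formatPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Simulated paste with leading spaces and a trailing space and newline.
const pasted = "  user@example.com \n";

console.log(formatPattern.test(pasted));        // false: the whitespace fails the check
console.log(formatPattern.test(pasted.trim())); // true
```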

Rejecting plus addressing. user+tag@gmail.com is valid and widely used for filtering. Many users deliberately use plus addresses when signing up for services. Rejecting the + character loses you real signups.

The recommendation

For most applications, this is the right approach:

// Trim whitespace (input is assumed to hold the raw string the user entered)
const email = input.trim();

// Basic format check
const isValid = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);

// Then send a verification email

The regex's job is to catch gross errors — missing @, empty strings, spaces in the middle. Everything else is the verification email's job. Trying to perfectly validate email addresses with regex alone is a solved problem in the sense that everyone has agreed the solution is "don't try."
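For reference, here is the same check wrapped in a function and run against the gross errors just listed. looksLikeEmail is a hypothetical name, and coercing missing input to an empty string is an assumption about how the caller wants null or undefined handled.

```javascript
// Hypothetical wrapper around the recommended trim-then-test check.
function looksLikeEmail(input) {
  const email = String(input ?? "").trim();
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

console.log(looksLikeEmail("userexample.com"));       // false: missing @
console.log(looksLikeEmail(""));                      // false: empty string
console.log(looksLikeEmail("user name@example.com")); // false: space in the middle
console.log(looksLikeEmail("  user@example.com  "));  // true once trimmed
```

A true result here only means the input is worth sending a verification email to.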

Testing your patterns

Whatever pattern you choose, test it against a real set of edge cases:

| Address | Valid? | Why |
| --- | --- | --- |
| user@example.com | Yes | Standard address |
| user.name@example.com | Yes | Dots in local part |
| user+tag@example.com | Yes | Plus addressing |
| user@sub.domain.com | Yes | Subdomain |
| user@example.co.uk | Yes | Second-level domain under a country-code TLD |
| user@example.photography | Yes | Long TLD |
| user@123.123.123.123 | Depends | Bare IP address as domain (the RFC's literal form is bracketed: user@[123.123.123.123]) |
| user@localhost | Depends | No TLD (valid on intranets) |
| @example.com | No | Empty local part |
| user@ | No | Empty domain |
| user @example.com | No | Space in local part |
| user@@example.com | No | Double @ |

Paste these into the regex tester with your pattern to see which ones match.
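The same table can be turned into a small script. This is only a sketch: findMismatches is a hypothetical helper, and rows marked "Depends" are encoded as null and skipped, since the right answer for them is a policy decision.

```javascript
// Edge cases from the table; expected is true, false, or null for "Depends".
const edgeCases = [
  ["user@example.com", true],
  ["user.name@example.com", true],
  ["user+tag@example.com", true],
  ["user@sub.domain.com", true],
  ["user@example.co.uk", true],
  ["user@example.photography", true],
  ["user@123.123.123.123", null],
  ["user@localhost", null],
  ["@example.com", false],
  ["user@", false],
  ["user @example.com", false],
  ["user@@example.com", false],
];

// Hypothetical helper: list the addresses where the pattern disagrees with the table.
function findMismatches(pattern, cases) {
  return cases
    .filter(([, expected]) => expected !== null) // skip policy-dependent rows
    .filter(([address, expected]) => pattern.test(address) !== expected)
    .map(([address]) => address);
}

const permissive = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const naive = /^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;

console.log(findMismatches(permissive, edgeCases)); // []
console.log(findMismatches(naive, edgeCases));      // the four valid addresses it wrongly rejects
```

The permissive pattern agrees with every definite row, while the first-attempt pattern wrongly rejects the dotted, plus-addressed, subdomain, and .co.uk addresses.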