Regex tester

Test regular expressions with live match highlighting and capture group display. Presets for common patterns.

Flags: g = global, i = case-insensitive, m = multiline, s = dotAll, u = unicode

What are regular expressions?

Regular expressions (regex or regexp) are a pattern-matching language used across virtually every programming language, text editor, and command-line tool. They describe search patterns using a concise, symbolic syntax — letting you find, extract, validate, and replace text that matches complex rules in a single expression.

The core regex syntax is mostly universal, but implementations differ in what's called "dialect" or "flavour." PCRE (Perl Compatible Regular Expressions) is among the most feature-rich, used by PHP, many text editors, and tools like grep -P. JavaScript's ECMAScript regex engine (used by this tool) supports most common features including lookahead, lookbehind (since ES2018), named groups, and Unicode property escapes. POSIX regex, used in Unix utilities like sed and awk, has a more limited feature set.
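
As a rough illustration of the JavaScript features listed above, the sketch below exercises lookahead, lookbehind, named groups, and a Unicode property escape. The sample strings and variable names are made up for the example, and lookbehind assumes an engine that implements ES2018.

```javascript
// Lookahead: digits only when followed by "px".
const lookahead = "12px 30em".match(/\d+(?=px)/g);

// Lookbehind (ES2018): digits only when preceded by "$".
const lookbehind = "cost: $45, qty: 7".match(/(?<=\$)\d+/g);

// Named groups: read captures by name instead of by index.
const dateMatch = /(?<year>\d{4})-(?<month>\d{2})/.exec("released 2026-03");

// Unicode property escapes require the u flag.
const greek = "abc αβγ".match(/\p{Script=Greek}+/u);

console.log(lookahead);                                      // [ '12' ]
console.log(lookbehind);                                     // [ '45' ]
console.log(dateMatch.groups.year, dateMatch.groups.month);  // 2026 03
console.log(greek[0]);                                       // αβγ
```

The same patterns may need adjusting in other flavours; POSIX tools, for instance, do not support lookaround at all.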

Despite these differences, the fundamentals (character classes, quantifiers, anchors, and grouping) work much the same way everywhere. Learning regex once gives you a skill that transfers across languages and tools, from database queries to CI pipeline configurations.

Regex syntax quick reference

Pattern  Meaning                                    Example
.        Any character (except newline by default)  a.c → "abc", "a1c"
\d       Digit [0-9]                                \d{3} → "123", "456"
\w       Word character [a-zA-Z0-9_]                \w+ → "hello", "var_1"
\s       Whitespace (space, tab, newline)           a\sb → "a b", "a\tb"
[abc]    Any character in the set                   [aeiou] → vowels
[a-z]    Character range                            [A-Z] → uppercase letters
[^abc]   Any character NOT in the set               [^0-9] → non-digits
^        Start of string (or line with m flag)      ^Hello → starts with "Hello"
$        End of string (or line with m flag)        end$ → ends with "end"
*        Zero or more (greedy)                      ab*c → "ac", "abc", "abbc"
+        One or more (greedy)                       ab+c → "abc", "abbc"
?        Zero or one (optional)                     colou?r → "color", "colour"
{n}      Exactly n times                            \d{4} → "2026"
{n,m}    Between n and m times                      \d{2,4} → "12", "123", "1234"
()       Capture group                              (\d+)px → captures "12" from "12px"
(?:)     Non-capturing group                        (?:ab)+ → groups without capturing
|        Alternation (OR)                           cat|dog → "cat" or "dog"
\b       Word boundary                              \bword\b → whole word match
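
To see a few of these rows in action, here is a small sketch that runs selected patterns from the table against sample inputs. The inputs, the added ^ and $ anchors, and the checks array are illustrative choices, not part of the tool.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)].
const checks = [
  [/a.c/, "a1c", true],             // . matches any single character
  [/^colou?r$/, "colour", true],    // ? makes the "u" optional
  [/^colou?r$/, "colouur", false],  // ...but allows at most one "u"
  [/^ab*c$/, "ac", true],           // * accepts zero "b"s
  [/^ab+c$/, "ac", false],          // + needs at least one "b"
  [/^\d{2,4}$/, "12345", false],    // five digits exceed {2,4}
  [/\bword\b/, "a word here", true],
  [/\bword\b/, "swordfish", false], // no word boundary before "w"
];

const results = checks.map(([re, input]) => re.test(input));
checks.forEach(([re, input, expected], i) => {
  console.log(`${re.source} on ${JSON.stringify(input)}: ${results[i]} (expected ${expected})`);
});

// Capture group: the digits are available as group 1.
const px = /(\d+)px/.exec("width: 12px");
console.log(px[1]); // 12
```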

Common regex patterns

Email validation is one of the most frequently searched regex tasks, and one of the most misunderstood. A simple pattern like [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} handles most real-world emails (wrap it in ^ and $ when validating a whole field rather than searching inside text), but a fully RFC 5322-compliant regex is notoriously complex (thousands of characters long). In practice, validate the format loosely with regex and confirm the address exists by sending a verification email.
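
A minimal sketch of the loose check described above. The function name looksLikeEmail is hypothetical, and the ^ and $ anchors and the trim() call are additions so that the whole input has to match.

```javascript
// Loose format check only; it does not prove the address exists.
const EMAIL_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

function looksLikeEmail(input) {
  return EMAIL_RE.test(input.trim());
}

console.log(looksLikeEmail("ada@example.com")); // true
console.log(looksLikeEmail("ada@example"));     // false (no dot and TLD)
console.log(looksLikeEmail("not an email"));    // false
```

A passing result here is only a reason to send the verification email, not a replacement for it.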

URL matching can be straightforward for http:// and https:// links, but gets complicated with internationalised domain names, ports, query strings, and fragments. For most use cases, https?://[\w.-]+(?:\.[a-z]{2,})(?:/[^\s]*)? with the i flag works well enough, though the match stops at a port number such as :8080.
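
The URL pattern above could be used for extraction roughly as follows. In a JavaScript regex literal the forward slashes have to be escaped; the g and i flags, the function name extractUrls, and the sample text are assumptions made for this sketch.

```javascript
// Same pattern as in the text, with "/" escaped for the literal syntax.
const URL_RE = /https?:\/\/[\w.-]+(?:\.[a-z]{2,})(?:\/[^\s]*)?/gi;

function extractUrls(text) {
  // match() with the g flag returns all matches, or null when there are none.
  return text.match(URL_RE) || [];
}

const sample = "Docs at https://example.com/guide?x=1#top and http://sub.example.org today.";
console.log(extractUrls(sample));
// [ 'https://example.com/guide?x=1#top', 'http://sub.example.org' ]
```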

Extracting numbers from text is a common ETL task. Use \d+ for integers, or -?\d+\.?\d* to include negative numbers and decimals. For comma-formatted numbers like "1,234,567", try \d{1,3}(?:,\d{3})*(?:\.\d+)?.
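
The three number patterns can be compared side by side. This is a sketch with made-up sample strings; the final step, which strips commas before parsing, is an extra illustration rather than something the patterns do themselves.

```javascript
// Plain integers.
const integers = "Order 66 shipped 3 boxes".match(/\d+/g);

// Optional minus sign and optional decimal part.
const signed = "Temp fell from 12 to -3.5".match(/-?\d+\.?\d*/g);

// Comma-grouped thousands with an optional decimal part.
const grouped = "Revenue was 1,234,567.89 USD".match(/\d{1,3}(?:,\d{3})*(?:\.\d+)?/g);

console.log(integers); // [ '66', '3' ]
console.log(signed);   // [ '12', '-3.5' ]
console.log(grouped);  // [ '1,234,567.89' ]

// Remove the commas before converting the text to a number.
const revenue = parseFloat(grouped[0].replace(/,/g, ""));
console.log(revenue);  // 1234567.89
```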

Password strength validation typically uses lookaheads: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%]).{8,}$ requires at least one lowercase letter, one uppercase letter, one digit, one special character from the set !@#$%, and a minimum of 8 characters, all in a single pattern. Extend the last character class if other symbols should count as special.
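
Here is a short sketch of the lookahead pattern applied to a few candidate passwords. The candidates and the function name isStrong are invented for the example.

```javascript
// Each lookahead scans from the start of the string without consuming input;
// .{8,} then enforces the minimum length.
const STRONG_RE = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%]).{8,}$/;

function isStrong(password) {
  return STRONG_RE.test(password);
}

console.log(isStrong("Passw0rd!"));  // true
console.log(isStrong("password1!")); // false (no uppercase letter)
console.log(isStrong("Aa1!"));       // false (shorter than 8 characters)
console.log(isStrong("Passw0rd&"));  // false ("&" is not in [!@#$%])
```

A single pattern only yields pass or fail. If the form should tell users which rule failed, testing each rule with its own small regex is usually easier to maintain.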

Finding duplicate words is a clever use of backreferences: \b(\w+)\s+\1\b matches repeated words like "the the" or "is is". This is a favourite in proofreading and text cleanup workflows.
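
One possible way to use the backreference pattern for both detection and cleanup. The helper names are hypothetical, and the g and i flags are additions so that every occurrence is found and "The the" also counts as a repeat.

```javascript
// \1 must repeat whatever group 1 captured.
const DUPLICATE_RE = /\b(\w+)\s+\1\b/gi;

function findDuplicates(text) {
  return Array.from(text.matchAll(DUPLICATE_RE), (m) => m[0]);
}

function collapseDuplicates(text) {
  // Replace each "word word" with a single copy of the captured word.
  return text.replace(DUPLICATE_RE, "$1");
}

const draft = "This is is the the best";
console.log(findDuplicates(draft));     // [ 'is is', 'the the' ]
console.log(collapseDuplicates(draft)); // This is the best
```

The leading \b keeps "This is" from matching, because the "is" inside "This" does not start at a word boundary.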

Frequently asked questions

Why doesn't my regex work the same in JavaScript and Python?
Regular expression engines differ between languages. JavaScript historically lacked lookbehind support (added in ES2018), while Python has had it for years, though Python's re module only accepts fixed-width lookbehind patterns. Named capture group syntax differs: JavaScript uses (?<name>...) while Python uses (?P<name>...). Unicode handling, flag names, and behaviour around newlines also vary. For example, JavaScript's s (dotAll) flag corresponds to re.DOTALL in Python. Always test in the target language's engine; this tool uses JavaScript's ECMAScript regex implementation.
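
On the JavaScript side, the flag and named-group behaviour mentioned above looks roughly like this. It is a sketch with an invented two-line sample; the Python equivalents appear only as comments.

```javascript
const text = "first line\nsecond line";

// Without s, "." stops at the newline. With s (re.DOTALL in Python) it crosses it.
const withoutDotAll = /first.*second/.test(text);
const withDotAll = /first.*second/s.test(text);

// Without m, ^ matches only at the start of the string.
// With m (re.MULTILINE in Python) it also matches after each newline.
const firstWords = text.match(/^\w+/g);
const lineStarts = text.match(/^\w+/gm);

// JavaScript named group syntax; Python would write (?P<word>\w+).
const named = /(?<word>\w+)/.exec(text);

console.log(withoutDotAll, withDotAll); // false true
console.log(firstWords);                // [ 'first' ]
console.log(lineStarts);                // [ 'first', 'second' ]
console.log(named.groups.word);         // first
```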

How do I match a literal dot or bracket?
Special characters like ., [, (, *, +, and ? have meaning in regex syntax. To match them literally, escape with a backslash: \. matches a period, \[ matches a square bracket. Alternatively, place the character inside a character class: [.] matches a literal dot (most special characters lose their meaning inside []). Inside a character class, only ], \, ^ (at start), and - (between characters) need escaping.
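
When the literal text comes from user input, escaping by hand is error-prone. A commonly used helper (named escapeRegExp here, which is a convention rather than a built-in assumed by this sketch) prefixes every special character with a backslash before the string is handed to the RegExp constructor.

```javascript
// Prefix each regex metacharacter with a backslash ($& is the matched character).
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = new RegExp(escapeRegExp("$5.00 (approx.)"));
console.log(price.test("Total: $5.00 (approx.)")); // true
console.log(price.test("Total: $5x00 (approx.)")); // false

// Unescaped dot vs. escaped dot vs. dot inside a character class.
console.log(/a.c/.test("abc"));   // true  (dot matches any character)
console.log(/a\.c/.test("abc"));  // false (only a literal dot matches)
console.log(/a[.]c/.test("a.c")); // true
```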

What's the difference between greedy .* and lazy .*? quantifiers?
Both match "any character, zero or more times," but they differ in greediness. .* is greedy — it matches as much text as possible, then backtracks. .*? is lazy (or reluctant) — it matches as little as possible. Example: given <b>hello</b> <b>world</b>, the pattern <b>.*</b> matches the entire string (greedy), while <b>.*?</b> matches just <b>hello</b> (lazy). When extracting content between delimiters, lazy quantifiers are almost always what you want.
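
The example from this answer can be reproduced directly. The last line, which collects the text between each pair of tags, is an extra illustration using a capture group and the g flag.

```javascript
const html = "<b>hello</b> <b>world</b>";

// Greedy: runs to the last </b> in the string.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: stops at the first </b> it can reach.
const lazy = html.match(/<b>.*?<\/b>/)[0];

// Lazy plus a capture group and the g flag extracts each delimited piece.
const contents = Array.from(html.matchAll(/<b>(.*?)<\/b>/g), (m) => m[1]);

console.log(greedy);   // <b>hello</b> <b>world</b>
console.log(lazy);     // <b>hello</b>
console.log(contents); // [ 'hello', 'world' ]
```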

Is regex the right tool for parsing HTML?
No. This is one of the most famous answers in programming: regular expressions cannot reliably parse HTML because HTML allows arbitrarily deep nesting, and regex (in the formal computer science sense) can only match regular languages. A single pattern can't pair each opening tag with its matching closing tag at arbitrary depth, and it becomes brittle around self-closing elements, comments, and attribute values containing characters like > or quotes. For HTML parsing, use a proper DOM parser like the browser's built-in DOMParser, cheerio (Node.js), or BeautifulSoup (Python). Regex is fine for quick-and-dirty text extraction from known, simple structures, but not for general HTML processing.
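
A small sketch of why nesting defeats a single pattern: with the made-up inputs below, the lazy version cuts a nested element short, and the greedy version merges two sibling elements, so neither choice works for both shapes.

```javascript
const nested = "<div>outer <div>inner</div> tail</div>";
const siblings = "<div>one</div><div>two</div>";

// Lazy: stops at the first </div>, which belongs to the inner element.
const lazyNested = /<div>(.*?)<\/div>/.exec(nested)[1];

// Greedy: fine for the nested case...
const greedyNested = /<div>(.*)<\/div>/.exec(nested)[1];

// ...but it swallows both sibling elements as one.
const greedySiblings = /<div>(.*)<\/div>/.exec(siblings)[1];

console.log(lazyNested);     // outer <div>inner
console.log(greedyNested);   // outer <div>inner</div> tail
console.log(greedySiblings); // one</div><div>two
```

A DOM parser avoids the problem because it builds the element tree instead of scanning for delimiters.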