// Match. Extract. Validate.
PATTERNS ARE EVERYWHERE.
Email addresses, phone numbers, URLs, log entries—all have patterns. Regex is the universal language for describing and matching those patterns.
MASTER THE PATTERN.
Once you understand regex, you'll start noticing patterns everywhere. Validation, extraction, search and replace: regex handles all of them with one compact notation.
Lessons:
- What are regular expressions?
- Beginner: Match exact characters.
- Beginner: . ^ $ * + ? { } [ ] \ | ( )
- Beginner: [abc], [a-z], [^abc], \d \w \s
- Beginner: * + ? {n} {n,m}
- Beginner: ^ $ \b \B word boundaries.
- Intermediate: Capturing groups and backreferences.
- Intermediate: OR matching with |.
- Intermediate: Match special characters literally.
- Intermediate: Zero-width assertions.
- Advanced: Email, URL, IP, date validation.
- Advanced: grep, sed, awk, and more.
Regular expressions (regex) are patterns used to match text. Programming languages, command-line tools, and text editors use them to find and manipulate text.
1. What does regex stand for?
Most characters in regex match themselves. If you write "hello", it matches the string "hello".
# grep for literal "error" (case-sensitive, so "Error" and "ERROR" do not match)
grep 'error' logfile.txt
# Add -i to also match "Error" and "ERROR"
grep -i 'error' logfile.txt
# Match "127.0.0.1": escape each dot, because an unescaped . matches any character
grep '127\.0\.0\.1' /etc/hosts
# Default is case-sensitive
grep 'Error' file.txt   # only matches "Error"
# Use -i for case-insensitive matching
grep -i 'error' file.txt   # matches "Error", "ERROR", "error"
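You can try the case-sensitivity and literal-dot rules without a real log file. The sketch below pipes made-up sample lines into grep with printf. It assumes a POSIX shell and GNU grep; the variable name `log` and the sample text are hypothetical.

```shell
# Hypothetical log lines that differ only in letter case
log='error: disk full
Error: retrying
ERROR: gave up'

# Case-sensitive by default: prints only "error: disk full"
printf '%s\n' "$log" | grep 'error'

# -i ignores case; -c counts matching lines (prints 3)
printf '%s\n' "$log" | grep -ci 'error'

# An unescaped . matches any character, so both lines match (prints 2)
printf '127.0.0.1\n127a0b0c1\n' | grep -c '127.0.0.1'

# Escaped dots match only the real address (prints 1)
printf '127.0.0.1\n127a0b0c1\n' | grep -c '127\.0\.0\.1'
```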
1. By default, is regex case-sensitive?
These characters have special meaning in regex:
. Match any single character
^ Match at start of string
$ Match at end of string
* Match 0 or more of preceding
+ Match 1 or more of preceding
? Match 0 or 1 of preceding
{ } Match a specific number of repetitions
[ ] Match character class
\ Escape special character
| Alternation (OR)
( ) Group
\1 Backreference to group 1 (some tools, such as Perl, write $1 in replacement text)
# . matches any single character
grep 'c.t' file.txt   # matches "cat", "cot", "cut"
# * matches 0 or more of the preceding item
grep 'ab*c' file.txt   # matches "ac", "abc", "abbc"
# + matches 1 or more (needs -E; in basic grep, + is a literal character)
grep -E 'ab+c' file.txt   # matches "abc", "abbc" but NOT "ac"
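The sketch below feeds a few hypothetical words to grep, one per line. It uses -x so a pattern must match the entire line, which makes it easy to see exactly which strings each metacharacter accepts. It assumes GNU grep, where + acts as a quantifier only under -E.

```shell
# Hypothetical test words, one per line
words='ct
cat
cot
ac
abc
abbc'

# . stands for exactly one character: prints cat, cot (not ct)
printf '%s\n' "$words" | grep -x 'c.t'

# * allows zero or more b's: prints ac, abc, abbc
printf '%s\n' "$words" | grep -x 'ab*c'

# + needs -E and at least one b: prints abc, abbc
printf '%s\n' "$words" | grep -xE 'ab+c'
```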
1. What does . (dot) match in regex?
[abc] Match a, b, or c
[^abc] NOT a, b, or c
[a-z] Any lowercase letter
[A-Z] Any uppercase letter
[0-9] Any digit
[a-zA-Z] Any letter
[a-zA-Z0-9] Any alphanumeric
\d Digit [0-9]
\D Non-digit [^0-9]
\w Word character [a-zA-Z0-9_]
\W Non-word character
\s Whitespace (space, tab, newline)
\S Non-whitespace
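# Match any digit
grep '[0-9]' file.txt
grep -P '\d' file.txt   # \d needs Perl-compatible mode (-P)
# Match lines containing at least one non-digit character
grep '[^0-9]' file.txt
# Match a lowercase vowel
grep '[aeiou]' file.txt
# Match a character that is not a vowel
grep '[^aeiouAEIOU]' file.txt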
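With -o, grep prints each match on its own line instead of the whole line, which shows what a character class actually picks out. This sketch assumes GNU grep and a made-up sample sentence. It uses the POSIX class [[:digit:]] as a portable stand-in for \d.

```shell
# Hypothetical sample text
line='Order 42 shipped on day 7'

# [0-9]+ extracts each run of digits: prints 42, then 7
printf '%s\n' "$line" | grep -oE '[0-9]+'

# [[:digit:]] is the POSIX spelling of the same class
printf '%s\n' "$line" | grep -oE '[[:digit:]]+'

# [aeiou] finds each lowercase vowel; tr joins them: eieoa
printf '%s\n' "$line" | grep -o '[aeiou]' | tr -d '\n'
```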
1. What does \d match?
* 0 or more
+ 1 or more
? 0 or 1 (optional)
{n} Exactly n times
{n,} n or more times
{n,m} Between n and m times
# In basic grep, { } + ? are literal characters, so these examples use -E
# Match US phone number
grep -E '[0-9]{3}-[0-9]{3}-[0-9]{4}' file.txt
# Match IP address (simplified)
grep -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' file.txt
# Match optional s
grep -E 'colors?' file.txt   # matches "color" and "colors"
# 1 or more digits
grep -E '[0-9]+' file.txt
# Whitespace
grep -E '\s+' file.txt
# Greedy (matches as much as possible)
grep -oE '<.+>' file.txt   # one match from the first < to the last > on the line
# Non-greedy (matches as little as possible; lazy quantifiers need -P)
grep -oP '<.+?>' file.txt   # matches each tag separately
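The sketch below runs the quantifier examples on a small hypothetical input so you can see what -o extracts. It assumes GNU grep. For the greedy case it also shows the negated class [^>]+, a common portable ERE alternative to the lazy .+? (which needs -P).

```shell
# Hypothetical sample lines
text='Call 555-123-4567 or 55-1234-567
color and colors
<b>bold</b> text'

# {n} counts exact repetitions: prints 555-123-4567 only
printf '%s\n' "$text" | grep -oE '[0-9]{3}-[0-9]{3}-[0-9]{4}'

# ? makes the s optional: prints color, then colors
printf '%s\n' "$text" | grep -oE 'colors?'

# Greedy .+ runs to the LAST > on the line: prints <b>bold</b>
printf '%s\n' "$text" | grep -oE '<.+>'

# [^>]+ cannot cross a >, so each tag matches separately: <b>, then </b>
printf '%s\n' "$text" | grep -oE '<[^>]+>'
```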
1. What does {3,5} mean?
Anchors don't match characters—they match positions.
^ Start of line (grep checks each line separately; in most programming languages ^ means start of string unless multiline mode is on)
$ End of line (same caveat as ^)
\b Word boundary
\B Non-word boundary
# Lines starting with "error"
grep '^error' logfile.txt
# Lines ending with "failed"
grep 'failed$' logfile.txt
# Entire line is "error"
grep '^error$' logfile.txt
# Word "the" (not in "there")
grep '\bthe\b' file.txt
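Because anchors match positions rather than characters, the easiest way to see them is on a few hypothetical lines. This sketch assumes GNU grep, where \b is supported as an extension. It uses -on to print each whole-word hit with its line number.

```shell
# Hypothetical sample lines
lines='error: disk full
fatal error
error
the theory there
then the end'

# ^ anchors at line start: prints "error: disk full" and "error"
printf '%s\n' "$lines" | grep '^error'

# $ anchors at line end: prints "fatal error" and "error"
printf '%s\n' "$lines" | grep 'error$'

# ^...$ requires the whole line to be "error": count is 1
printf '%s\n' "$lines" | grep -c '^error$'

# \b skips "theory", "there", "then": prints 4:the and 5:the
printf '%s\n' "$lines" | grep -on '\bthe\b'
```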
1. What does ^ match?
Parentheses () create groups that capture matched text.
# Basic group (( ) and + need -E)
grep -E '(ab)+' file.txt   # matches "ab", "abab", "ababab"
# Capture group reference
# sed replacement: \1 refers to the first group
sed 's/\(capture\) \1/\1/g' file.txt
# Find repeated words (GNU grep accepts \1 in -E patterns)
grep -E '\b([a-zA-Z]+) \1\b' file.txt
# Replace with single word (sed)
sed -E 's/\b([a-zA-Z]+) \1\b/\1/g' file.txt
# PCRE (grep -P) named groups; the group name "word" is illustrative
grep -P '(?<word>[a-zA-Z]+) \k<word>' file.txt
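Here is the repeated-word idea applied to a few hypothetical lines. grep finds lines with a doubled word, and sed collapses each pair to one copy. This is a sketch assuming GNU grep and GNU sed, which both accept \b and back-references in -E patterns as extensions.

```shell
# Hypothetical sample text
text='this is is a test
no repeats here
the the end'

# \1 must repeat group 1's exact text: prints the first and last lines
printf '%s\n' "$text" | grep -E '\b([a-zA-Z]+) \1\b'

# Replace "word word" with "word": this is a test / no repeats here / the end
printf '%s\n' "$text" | sed -E 's/\b([a-zA-Z]+) \1\b/\1/g'
```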
1. What do parentheses create in regex?
The | operator matches either the expression before or after it.
# Match "error" or "warning"
grep -E 'error|warning' logfile.txt
# Match "cat" or "dog" or "bird"
grep -E 'cat|dog|bird' file.txt
# With grouping
grep -E 'gr(e|a)y' file.txt   # matches "grey" or "gray"
# Without parens (matches "abc" OR "xyz")
grep -E 'abc|xyz'
# With parens (matches "abc123" OR "xyz123")
grep -E '(abc|xyz)123'
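The sketch below shows that | has the lowest precedence. Without parentheses it splits the whole pattern in two, and parentheses confine it to one part. It uses -x (match the whole line) on hypothetical input and assumes GNU grep.

```shell
# Hypothetical log lines
log='warning: low disk
error: disk full
info: ok'

# Either word anywhere on the line: count is 2
printf '%s\n' "$log" | grep -cE 'error|warning'

# Grouping limits the choice to the middle letter: prints grey, gray
printf 'grey\ngray\ngruy\n' | grep -xE 'gr(e|a)y'

# No parens: "abc" OR "xyz123": prints abc, xyz123
printf 'abc\nabc123\nxyz123\n' | grep -xE 'abc|xyz123'

# Parens: ("abc" OR "xyz") then "123": prints abc123, xyz123
printf 'abc\nabc123\nxyz123\n' | grep -xE '(abc|xyz)123'
```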
1. What does | match in regex?
Use backslash \ to match special characters literally.
# Match literal dot
grep 'example\.com' file.txt
# Match literal asterisk
grep '\*' file.txt
# Match literal dollar sign
grep '\$' file.txt
# Match literal backslash
grep '\\' file.txt
# Match [ literally
grep '\[' file.txt
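# In basic grep (BRE), escape these to match them literally: . ^ $ * [ ] \
# With -E (extended), + ? { } | ( ) work as operators without backslashes, so escape them too when you want them literally
grep -E '\.' file.txt   # . still needs escaping in ERE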
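The same character can be literal in one grep flavor and special in the other. The sketch below shows this with + on hypothetical sample lines, assuming GNU grep: basic grep treats + as an ordinary character, while -E treats it as a quantifier unless you escape it.

```shell
# Hypothetical sample lines
lines='a+b
aab
x[1]'

# BRE: + is an ordinary character, so this finds the literal "a+b"
printf '%s\n' "$lines" | grep 'a+b'

# ERE: + means "one or more a's", so this matches "aab" instead
printf '%s\n' "$lines" | grep -E 'a+b'

# ERE with \+ : back to the literal "a+b"
printf '%s\n' "$lines" | grep -E 'a\+b'

# [ opens a bracket expression in both flavors; \[ matches it literally: x[1]
printf '%s\n' "$lines" | grep '\['
```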
1. How do you match a literal dot?
Lookahead and lookbehind check for patterns without consuming characters.
(?=pattern) Positive lookahead
(?!pattern) Negative lookahead
(?<=pattern) Positive lookbehind
(?<!pattern) Negative lookbehind
Examples
# Positive lookahead: password followed by =
grep -P 'password(?==)' file.txt
# Negative lookahead: not followed by =
grep -P 'password(?!=)' file.txt
# Positive lookbehind: preceded by Mr.
grep -P '(?<=Mr\.)\w+' file.txt
# Practical: extract the domain of an email address (lookbehind)
grep -oP '(?<=@)[^@\s]+' file.txt   # text after @, up to whitespace
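To see that lookarounds are zero-width, combine them with -o. Only the text outside the assertion is printed. This sketch assumes GNU grep built with PCRE support (-P), which some builds, such as BSD grep on macOS, lack. The sample lines are hypothetical.

```shell
# Hypothetical sample lines
text='password=secret
password: none
Mr.Smith and Mr.Jones
contact: alice@example.com'

# Positive lookahead: the = is checked but not printed: 1:password
printf '%s\n' "$text" | grep -noP 'password(?==)'

# Negative lookahead: only where no = follows: 2:password
printf '%s\n' "$text" | grep -noP 'password(?!=)'

# Positive lookbehind: word after "Mr.": prints Smith, then Jones
printf '%s\n' "$text" | grep -oP '(?<=Mr\.)\w+'

# Lookbehind on @: prints example.com
printf '%s\n' "$text" | grep -oP '(?<=@)[^@\s]+'
```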
Quiz
1. Do lookahead assertions consume characters?
- No (they're zero-width)
# These patterns use extended syntax (grep -E)
# Email
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
# URL
https?://[^[:space:]]+
# IPv4 address (does not check that each number is 0-255)
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
# YYYY-MM-DD
[0-9]{4}-[0-9]{2}-[0-9]{2}
# MM/DD/YYYY
[0-9]{2}/[0-9]{2}/[0-9]{4}
# DD.MM.YYYY
[0-9]{2}\.[0-9]{2}\.[0-9]{4}
# US phone format
[0-9]{3}-[0-9]{3}-[0-9]{4}
# International phone (simplified)
\+[0-9]{1,3}[0-9]{4,14}
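Validation means the whole input must match, not just part of it. With grep, -x gives you that. The sketch below checks hypothetical inputs against three of the patterns above, assuming GNU grep. It also exposes a limit of the IPv4 pattern: it accepts 999.999.999.999 because it checks the shape of the address, not the range of each number.

```shell
ipv4='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
date_iso='[0-9]{4}-[0-9]{2}-[0-9]{2}'
email='[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Prints 192.168.0.1 and 999.999.999.999 (shape only); 1.2.3 is rejected
printf '192.168.0.1\n999.999.999.999\n1.2.3\n' | grep -xE "$ipv4"

# Prints 2024-01-15; 2024-1-15 lacks the two-digit month
printf '2024-01-15\n2024-1-15\n' | grep -xE "$date_iso"

# Prints alice@example.com only
printf 'alice@example.com\nnot-an-email\n' | grep -xE "$email"
```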
1. What is a common email regex pattern for the @ domain?
# Basic grep (BRE)
grep 'pattern' file.txt
# Extended grep (ERE): less escaping needed
grep -E 'pattern+' file.txt
# Perl-compatible (most features)
grep -P 'pattern' file.txt
# Case insensitive
grep -i 'pattern' file.txt
# Invert match
grep -v 'pattern' file.txt
# Line numbers
grep -n 'pattern' file.txt
# Find and replace
sed 's/old/new/g' file.txt
# Extended regex
sed -E 's/(capture) this/\1 that/' file.txt
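The sketch below exercises a few of the grep flags and the sed substitution on hypothetical input. Note the effect of the g flag: without it, sed replaces only the first match on each line. It assumes GNU grep and GNU sed.

```shell
# Hypothetical log lines
log='ok
error: disk
ok'

# -n prefixes each match with its line number: 2:error: disk
printf '%s\n' "$log" | grep -n 'error'

# -v inverts the match; with -c it counts non-matching lines: 2
printf '%s\n' "$log" | grep -vc 'error'

# Without g, only the first "old" is replaced: new old
printf 'old old\n' | sed 's/old/new/'

# With g, every "old" is replaced: new new
printf 'old old\n' | sed 's/old/new/g'

# -E: ( ) captures without backslashes; \1 reuses the text: keep that
printf 'keep this\n' | sed -E 's/(keep) this/\1 that/'
```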
# Match pattern
awk '/pattern/ { print }' file.txt
# With condition on field
awk '$3 ~ /pattern/ { print }' file.txt
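The two awk forms differ in scope: /pattern/ tests the whole line, while $3 ~ /pattern/ tests only the third whitespace-separated field. The sketch below uses hypothetical records (time, host, level) and should behave the same in any POSIX awk.

```shell
# Hypothetical records: time, host, level
records='10:01 web1 error
10:02 web2 ok
10:03 db1 error'

# Whole-line match; print the host field: web1, then db1
printf '%s\n' "$records" | awk '/error/ { print $2 }'

# Match only in field 3; print the time field: 10:01, then 10:03
printf '%s\n' "$records" | awk '$3 ~ /^err/ { print $1 }'
```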
You've mastered regex! You now understand:
Regex is the universal language for pattern matching. Once you understand it, you'll find patterns everywhere—in validation, extraction, search, and replace operations.
Master regex and you'll handle text processing tasks that would otherwise require complex programming.