Regular Expressions
Summary: In this tutorial, you will learn to master pattern matching in PowerShell: regex syntax, the -match operator, Select-String, the [regex] class, named captures, and practical text-processing patterns.
Regular expressions (regex) are a powerful pattern-matching language for text processing. In PowerShell, regex integrates seamlessly with operators and cmdlets, enabling sophisticated text search, validation, and transformation.
This chapter teaches you regex from fundamentals to advanced patterns, focusing on PowerShell-specific features like the -match operator, Select-String cmdlet, and the [regex] .NET class.
Why Learn Regular Expressions
Text processing is everywhere:
- Log analysis: Extract error codes, timestamps, IP addresses
- Data validation: Verify email formats, phone numbers, credit cards
- Text transformation: Replace patterns, extract data
- File searching: Find files containing specific patterns
Without regex, you'd write brittle code with IndexOf, Substring, and fragile string manipulation. Regex provides a declarative language for pattern matching.
Regex Basics: Literal Matching
The simplest regex matches literal characters:
# Does the string contain "error"?
"Error occurred" -match "error" # True (case-insensitive by default)
"Success" -match "error" # False
# Case-sensitive matching
"Error occurred" -cmatch "error" # False (lowercase "e" doesn't match "E")
"Error occurred" -cmatch "Error" # True
# Case-insensitive (explicit)
"Error occurred" -imatch "ERROR" # True
PowerShell's -match operator:
- Returns True if the pattern matches anywhere in the string
- Case-insensitive by default (unlike many languages)
- Use -cmatch for case-sensitive matching
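One more behavior is worth knowing alongside the points above: when the left-hand side is a collection rather than a single string, -match acts as a filter and returns the matching elements instead of True or False. A minimal sketch, where $lines is made-up sample data:

```powershell
# $lines is illustrative sample data, not taken from a real log
$lines = 'Error: disk full', 'OK', 'error: timeout'

# Scalar on the left: the result is a Boolean
$isError = 'Error: disk full' -match 'error'

# Collection on the left: the result is the subset of elements that match
$errorLines = $lines -match 'error'

$isError            # True
$errorLines.Count   # 2
```

This is why an if statement around an array match can behave unexpectedly: a non-empty result is treated as true even though it is not a Boolean.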
Capturing Matched Text
When -match succeeds, PowerShell populates the automatic variable $Matches:
"Server: 192.168.1.100" -match "(\d+\.\d+\.\d+\.\d+)"
$Matches[0] # Full match: "192.168.1.100"
$Matches[1] # First capture group: "192.168.1.100"
# Capture only part of the match
"Error code: 404" -match "Error code: (\d+)"
$Matches[1] # "404"
# Named captures
"User: alice" -match "User: (?<username>\w+)"
$Matches.username # "alice"
$Matches['username'] # Same thing
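One pitfall with $Matches: it is only updated when a match succeeds, so after a failed match it still holds the result of the previous successful one. The sketch below uses invented sample strings to show the stale value and a safer way to read it:

```powershell
# A successful match populates $Matches
$null = 'Error code: 404' -match 'code: (?<code>\d+)'

# A failed match leaves the previous $Matches untouched
$second = 'No code here' -match 'code: (?<code>\d+)'
$stale = $Matches.code   # still "404" from the first match

# Safer: read $Matches only inside the success branch
$code = if ('No code here' -match 'code: (?<code>\d+)') { $Matches.code } else { $null }
```

Reading $Matches only inside an if block that tested the match avoids picking up leftovers from earlier lines.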
Character Classes: Matching Sets
Character classes match one character from a set:
# Basic classes
"cat" -match "[abc]" # True (contains 'c' or 'a')
"dog" -match "[abc]" # False
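# Ranges (-cmatch is used for letters, because -match ignores case and cannot tell a-z from A-Z)
"file1.txt" -match "[0-9]" # True (contains digit)
"fileA.txt" -cmatch "[a-z]" # True (contains lowercase letter)
"FILE1" -cmatch "[A-Z]" # True (contains uppercase letter)
"FILE1" -match "[a-z]" # True as well: -match is case-insensitive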
# Negated classes (NOT)
"file1.txt" -match "[^0-9]" # True (contains non-digit)
"12345" -match "[^0-9]" # False (only digits)
# Predefined classes
"file_123" -match "\d" # \d = digit [0-9]
"file_123" -match "\D" # \D = non-digit [^0-9]
"hello world" -match "\s" # \s = whitespace (space, tab, newline)
"hello" -match "\S" # \S = non-whitespace
"var_name" -match "\w" # \w = word character [a-zA-Z0-9_]
"@#$" -match "\W" # \W = non-word character
Common character classes:
- \d: Digit (0-9)
- \w: Word character (letters, digits, underscore)
- \s: Whitespace (space, tab, newline)
- .: Any character (except newline)
Escaping special characters:
Many characters have special meaning in regex (. * + ? [ ] { } ( ) ^ $ | \). To match them literally, escape with \:
# Wrong: . matches ANY character
"test.txt" -match "test.txt" # Matches "testAtxt" too!
# Correct: \. matches literal dot
"test.txt" -match "test\.txt" # True
"testAtxt" -match "test\.txt" # False
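# Escape other special characters (single quotes stop PowerShell from expanding $100 as a variable)
'Price: $100' -match '\$\d+' # \$ = literal dollar sign
"[ERROR]" -match "\[ERROR\]" # \[ \] = literal brackets
# Let .NET do the escaping for you
[regex]::Escape('test.txt') # Returns test\.txt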
Quantifiers: Repetition
Quantifiers specify how many times a pattern should repeat:
# * = 0 or more
"file" -match "file\d*" # Matches "file", "file1", "file123"
"" -match "a*" # True (0 'a' characters)
# + = 1 or more
"file" -match "file\d+" # False (no digits)
"file1" -match "file\d+" # True (at least one digit)
# ? = 0 or 1 (optional)
"color" -match "colou?r" # Matches "color" or "colour"
"honour" -match "honou?r" # True
# {n} = exactly n times
"file123" -match "\d{3}" # True (exactly 3 digits)
"file12" -match "\d{3}" # False
# {n,} = n or more times
"file123" -match "\d{2,}" # True (at least 2 digits)
# {n,m} = between n and m times
"192.168.1.1" -match "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" # IP address pattern
Greedy vs Lazy Quantifiers
By default, quantifiers are greedy — they match as much as possible:
# Greedy: .* matches as much as possible
"<div>Hello</div>" -match "<.*>"
$Matches[0] # "<div>Hello</div>" (entire string)
# Lazy: .*? matches as little as possible
"<div>Hello</div>" -match "<.*?>"
$Matches[0] # "<div>" (stops at first >)
Make any quantifier lazy by adding ?:
- *?: 0 or more (lazy)
- +?: 1 or more (lazy)
- ??: 0 or 1 (lazy)
- {n,}?: n or more (lazy)
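A lazy quantifier is not the only way to stop at the first closing delimiter. A negated character class that says "anything except the delimiter" often reads more clearly. The sketch below compares both on a made-up HTML fragment:

```powershell
# Sample markup invented for illustration
$html = '<b>bold</b> and <i>italic</i>'

# Lazy quantifier: stop at the first '>'
$lazy = [regex]::Matches($html, '<.+?>') | ForEach-Object Value

# Negated class: "anything except '>'" finds the same tags here
$negated = [regex]::Matches($html, '<[^>]+>') | ForEach-Object Value

$lazy -join ' '      # <b> </b> <i> </i>
$negated -join ' '   # <b> </b> <i> </i>
```

Both forms return the same four tags for this input. The negated class cannot run past a '>' at all, which makes its behavior easier to predict on longer input.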
Anchors: Position Matching
Anchors match positions, not characters:
# ^ = start of string
"hello" -match "^h" # True (starts with 'h')
"ahello" -match "^h" # False
# $ = end of string
"hello" -match "o$" # True (ends with 'o')
"hello " -match "o$" # False (ends with space)
# \b = word boundary
"hello world" -match "\bhello\b" # True (whole word)
"helloworld" -match "\bhello\b" # False (not a whole word)
# Full line matching
"192.168.1.1" -match "^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$" # True
"IP: 192.168.1.1" -match "^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$" # False (extra text)
When to use anchors:
- ^ and $: Validate entire string format (emails, phone numbers)
- \b: Match whole words only (avoid partial matches)
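By default ^ and $ refer to the start and end of the whole string, which matters when a single string holds several lines. The inline option (?m) switches to multiline mode so the anchors also match at line breaks. A small sketch with an invented three-line string:

```powershell
# Three lines in one string, separated by newline characters
$text = "INFO start`nERROR disk full`nINFO done"

# Default: ^ only matches at the very start of the string
$default = $text -match '^ERROR'         # False

# (?m) multiline mode: ^ and $ also match at each line break
$multiline = $text -match '(?m)^ERROR'   # True

# Collect every line that starts with INFO
$infoLines = [regex]::Matches($text, '(?m)^INFO.*$') | ForEach-Object Value
$infoLines.Count                         # 2
```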
Groups and Capturing
Parentheses () create groups for capturing or applying quantifiers:
# Capture groups extract parts of the match
"Error code: 404" -match "Error code: (\d+)"
$Matches[1] # "404"
# Multiple captures
"Date: 2026-02-11" -match "(\d{4})-(\d{2})-(\d{2})"
$Matches[1] # "2026" (year)
$Matches[2] # "02" (month)
$Matches[3] # "11" (day)
# Named captures (much clearer!)
"Date: 2026-02-11" -match "(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})"
$Matches.year # "2026"
$Matches.month # "02"
$Matches.day # "11"
# Non-capturing groups (?: ) — group without capturing
"file123.txt" -match "file(?:\d+)\.txt" # Groups digits but doesn't capture
# Groups with quantifiers
"192.168.1.1" -match "^(\d{1,3}\.){3}\d{1,3}$" # Match IP pattern (repeat "nnn." 3 times)
Named captures are powerful for extracting structured data:
$logLine = "[2026-02-11 14:32:05] ERROR: Connection timeout"
$pattern = '^\[(?<date>[\d-]+) (?<time>[\d:]+)\] (?<level>\w+): (?<message>.+)$'
if ($logLine -match $pattern) {
    [PSCustomObject]@{
        Date = $Matches.date
        Time = $Matches.time
        Level = $Matches.level
        Message = $Matches.message
    }
}
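The same pattern scales to many lines. The sketch below loops over a few invented log lines, skips the one that does not match, and combines the date and time captures into a DateTime value. The $logLines data and the Timestamp property name are assumptions made for this example:

```powershell
# Sample lines invented for illustration
$logLines = @(
    '[2026-02-11 14:32:05] ERROR: Connection timeout'
    '[2026-02-11 14:32:09] INFO: Retry succeeded'
    'malformed line'
)
$pattern = '^\[(?<date>[\d-]+) (?<time>[\d:]+)\] (?<level>\w+): (?<message>.+)$'

$entries = foreach ($line in $logLines) {
    if ($line -match $pattern) {
        [PSCustomObject]@{
            # Combine the two captures into a DateTime for sorting and filtering
            Timestamp = [datetime]::ParseExact(
                "$($Matches.date) $($Matches.time)",
                'yyyy-MM-dd HH:mm:ss',
                [cultureinfo]::InvariantCulture)
            Level     = $Matches.level
            Message   = $Matches.message
        }
    }
}

$entries.Count                                        # 2
($entries | Where-Object Level -eq 'ERROR').Message   # Connection timeout
```

Lines that fail the match are dropped silently here. Whether to warn, count, or collect them is a design choice that depends on the log source.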
Alternation: OR Logic
Use | to match one pattern or another:
# Match "cat" or "dog"
"I have a cat" -match "cat|dog" # True
"I have a bird" -match "cat|dog" # False
# Match file extensions
"document.pdf" -match "\.(pdf|docx|txt)$" # True
"image.png" -match "\.(pdf|docx|txt)$" # False
# Match multiple error levels
"[WARNING] Disk space low" -match "\[(ERROR|WARNING|INFO)\]"
$Matches[1] # "WARNING"
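Alternation has the lowest precedence of any regex operator, so anchors next to | bind to only one alternative unless the alternatives are grouped. A short sketch of the difference:

```powershell
# Without a group, ^ applies only to "cat" and $ only to "dog"
$ungrouped = 'catalog' -match '^cat|dog$'        # True: the string starts with "cat"

# With a group, both anchors apply to both alternatives
$grouped   = 'catalog' -match '^(cat|dog)$'      # False
$exact     = 'dog' -match '^(cat|dog)$'          # True
```

When the captured text is not needed, a non-capturing group such as ^(?:cat|dog)$ gives the same grouping without adding an entry to $Matches.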
Common Practical Patterns
Email Validation
$email = "user@example.com"
$emailPattern = '^[\w\.-]+@[\w\.-]+\.\w{2,}$'
$email -match $emailPattern # True
# Breakdown:
# ^ — Start of string
# [\w\.-]+ — Username (letters, digits, dot, dash)
# @ — Literal @
# [\w\.-]+ — Domain name
# \. — Literal dot
# \w{2,} — TLD (2+ letters)
# $ — End of string
Phone Numbers
# US format: (123) 456-7890 or 123-456-7890
$phone = "(555) 123-4567"
$phonePattern = '^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$'
$phone -match $phonePattern # True
# Extract parts
$phone -match '^\(?(?<area>\d{3})\)?[-.\s]?(?<prefix>\d{3})[-.\s]?(?<line>\d{4})$'
$Matches.area # "555"
$Matches.prefix # "123"
$Matches.line # "4567"
IP Addresses
$ip = "192.168.1.100"
$ipPattern = '^(\d{1,3}\.){3}\d{1,3}$'
$ip -match $ipPattern # True
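The simple pattern above also accepts out-of-range octets such as 999. One option, sketched here with a hypothetical helper named Test-IPv4Address, is to let the regex check only the shape and to do the 0-255 check with ordinary integer comparison:

```powershell
# Test-IPv4Address is a hypothetical helper written for this example
function Test-IPv4Address {
    param([string]$Address)

    # Step 1: the regex checks only the shape (four dot-separated digit runs)
    if (-not ($Address -match '^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$')) {
        return $false
    }

    # Step 2: plain arithmetic checks the range of each captured octet
    foreach ($i in 1..4) {
        if ([int]$Matches[$i] -gt 255) { return $false }
    }
    return $true
}

Test-IPv4Address '192.168.1.100'   # True
Test-IPv4Address '192.168.1.300'   # False
```

The next pattern takes the other route and encodes the range inside the regex itself.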
# More precise (0-255 range) - complex but accurate
$ipPrecise = '^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
"192.168.1.300" -match $ipPrecise # False (300 > 255)
URLs
$url = "https://www.example.com/path?query=value"
$urlPattern = '^https?://[\w\.-]+(?:/[\w\.-]*)*(?:\?[\w=&]*)?$'
$url -match $urlPattern # True
# Extract parts
$url -match '^(?<protocol>https?)://(?<domain>[\w\.-]+)(?<path>/[\w\.-]*)*(?<query>\?[\w=&]*)?$'
$Matches.protocol # "https"
$Matches.domain # "www.example.com"
$Matches.path # "/path"
$Matches.query # "?query=value"
Select-String: PowerShell's Grep
Select-String searches files and text for regex patterns:
# Search file for pattern
Select-String -Path "app.log" -Pattern "ERROR"
# Case-sensitive
Select-String -Path "app.log" -Pattern "ERROR" -CaseSensitive
# Multiple files
Select-String -Path "*.log" -Pattern "Connection timeout"
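# Recursive search (Select-String has no -Recurse parameter; let Get-ChildItem walk the tree)
Get-ChildItem -Path "C:\Logs" -Filter "*.log" -Recurse | Select-String -Pattern "ERROR"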
# Show context (lines before/after)
Select-String -Path "app.log" -Pattern "ERROR" -Context 2,3 # 2 lines before, 3 after
# Multiple patterns
Select-String -Path "app.log" -Pattern "ERROR|WARNING|CRITICAL"
# Invert match (lines that DON'T match)
Select-String -Path "app.log" -Pattern "DEBUG" -NotMatch
Processing Select-String Results
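# Get matches (avoid naming the variable $matches, which shadows the automatic $Matches)
$results = Select-String -Path "*.log" -Pattern "Error code: (\d+)"
# A foreach statement cannot be piped, so use ForEach-Object in the pipeline
$results | ForEach-Object {
    [PSCustomObject]@{
        File = $_.Filename
        Line = $_.LineNumber
        Text = $_.Line
        ErrorCode = $_.Matches[0].Groups[1].Value
    }
} | Format-Table -AutoSize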
# Extract all IP addresses from logs
Select-String -Path "access.log" -Pattern "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" -AllMatches |
ForEach-Object { $_.Matches.Value } |
Sort-Object -Unique
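Select-String also accepts strings from the pipeline, which makes it easy to try a pattern without a file on disk. The sketch below runs the same IP extraction against a few invented lines held in a variable:

```powershell
# Sample lines standing in for file content
$lines = @(
    'GET /index.html from 10.0.0.5'
    'GET /about.html from 10.0.0.7'
    'healthcheck ok'
    'POST /login from 10.0.0.5'
)

$ips = $lines |
    Select-String -Pattern '\d{1,3}(?:\.\d{1,3}){3}' -AllMatches |
    ForEach-Object { $_.Matches.Value } |
    Sort-Object -Unique

$ips   # 10.0.0.5 and 10.0.0.7
```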
The [regex] Class: Advanced Operations
PowerShell's [regex] class provides more control than operators:
# Create regex object
$pattern = [regex]'ERROR: (?<code>\d+)'
# Test match
$pattern.IsMatch("ERROR: 404") # True
# Get match details
$match = $pattern.Match("ERROR: 404")
$match.Success # True
$match.Value # "ERROR: 404"
$match.Groups['code'].Value # "404"
# Find all matches
$text = "ERROR: 404, ERROR: 500, ERROR: 503"
$pattern.Matches($text) | ForEach-Object {
    $_.Groups['code'].Value
} # Output: 404, 500, 503
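One difference from the operators is easy to miss: the [regex] class follows .NET defaults and is case-sensitive, while -match ignores case. The sketch below shows the difference and two ways to opt in to case-insensitive matching:

```powershell
# -match ignores case by default
$viaOperator = 'error: 404' -match 'ERROR'                                # True

# [regex] follows .NET defaults and is case-sensitive
$viaClass = ([regex]'ERROR').IsMatch('error: 404')                        # False

# Opt in with an inline flag or a RegexOptions value
$inlineFlag = ([regex]'(?i)ERROR').IsMatch('error: 404')                  # True
$withOption = [regex]::new('ERROR', 'IgnoreCase').IsMatch('error: 404')   # True
```

When moving a pattern from -match to [regex], adding (?i) to the pattern keeps the behavior the same.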
Replace with Regex
# Simple replacement
$text = "Phone: 123-456-7890"
$text -replace "\d{3}-\d{3}-\d{4}", "[REDACTED]" # "Phone: [REDACTED]"
# Use capture groups in replacement
$text = "Date: 2026-02-11"
$text -replace "(\d{4})-(\d{2})-(\d{2})", '$3/$2/$1' # "Date: 11/02/2026"
# Named captures in replacement
$text = "Hello, Alice!"
$text -replace "Hello, (?<name>\w+)!", 'Greetings, ${name}.' # "Greetings, Alice."
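# Replace with expression (single quotes keep $50 from being expanded as a variable)
$text = 'Price: $50'
[regex]::Replace($text, '\$(\d+)', {
    param($match)
    # Round to avoid floating-point noise such as 55.00000000000001
    '$' + [math]::Round([int]$match.Groups[1].Value * 1.10, 2)
}) # "Price: $55" (10% increase)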
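Quoting matters in the replacement string, because both PowerShell and the regex engine give $ a meaning. A brief sketch of the three cases that come up most often, using an invented sample string:

```powershell
$text = 'Total: 50 USD'

# Single quotes pass $1 through to the regex engine
$single = $text -replace '(\d+) USD', 'USD $1'    # "Total: USD 50"

# Double quotes let PowerShell expand $1 first (an unset variable, so empty)
$double = $text -replace '(\d+) USD', "USD $1"    # "Total: USD " (capture lost)

# $$ is a literal dollar sign; $0 is the whole match
$dollar = $text -replace '\d+', '$$$0'            # "Total: $50 USD"
```

Using single quotes for both the pattern and the replacement is the simplest way to avoid the second case.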
Split with Regex
# Split by multiple delimiters
$text = "apple,banana;cherry|date"
$text -split '[,;|]' # Array: apple, banana, cherry, date
# Split by whitespace (any amount)
"word1 word2 word3" -split '\s+' # Array: word1, word2, word3
# Split and keep delimiters
$text = "Part1:Part2:Part3"
[regex]::Split($text, '(:)') # Array: Part1, :, Part2, :, Part3
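The -split operator takes two optional arguments after the delimiter: a maximum number of substrings and an options string. The sketch below shows a limit of 2 and the SimpleMatch option, which treats the delimiter as literal text instead of a regex:

```powershell
# Limit the number of pieces: the last element keeps the remainder
$pair = 'key=value=with=equals' -split '=', 2
# $pair[0] is 'key', $pair[1] is 'value=with=equals'

# SimpleMatch treats the delimiter as literal text, so '.' needs no escaping
# (a limit of 0 means "return every piece")
$parts = '192.168.1.1' -split '.', 0, 'SimpleMatch'
# $parts has 4 elements: 192, 168, 1, 1
```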
Performance Considerations
Regex can be expensive. Optimize where possible:
# Pattern given as a string: it has to be looked up (or re-parsed) on every iteration
1..1000 | ForEach-Object {
    "test$_" -match "test\d+"
}
# Parse once, reuse the [regex] object
$pattern = [regex]'test\d+'
1..1000 | ForEach-Object {
    $pattern.IsMatch("test$_")
}
# Compiled regex: slower to construct, faster per match
$pattern = [regex]::new('test\d+', [System.Text.RegularExpressions.RegexOptions]::Compiled)
Reserve RegexOptions.Compiled for patterns that run many times in a loop. Compilation has an up-front cost, so measure before relying on it.
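Whether a compiled regex pays off depends on the pattern, the input size, and the PowerShell version, so it is worth timing both forms on realistic data. A minimal sketch using Measure-Command, with an arbitrary sample size of 20000 strings:

```powershell
# Build sample input once so string creation is not part of the timing
$inputs = 1..20000 | ForEach-Object { "test$_" }

$operatorTime = Measure-Command {
    foreach ($s in $inputs) { $null = $s -match 'test\d+' }
}

$regex = [regex]::new('test\d+', [System.Text.RegularExpressions.RegexOptions]::Compiled)
$compiledTime = Measure-Command {
    foreach ($s in $inputs) { $null = $regex.IsMatch($s) }
}

'{0:N1} ms with -match, {1:N1} ms with a compiled [regex]' -f $operatorTime.TotalMilliseconds, $compiledTime.TotalMilliseconds
```

The numbers vary between machines and runs, so repeat the measurement a few times before drawing a conclusion.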
Exercises
Parse Apache-style log files:
192.168.1.100 - - [11/Feb/2026:14:32:05 +0000] "GET /api/users HTTP/1.1" 200 1234
Extract IP, date, method, path, status code, and size into objects.
Solution:
function Parse-ApacheLog {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string]$LogLine
    )
    begin {
        # Pattern breaks down:
        # IP: \d{1,3}(\.\d{1,3}){3}
        # Date: \[([^\]]+)\]
        # Method/Path: "(\w+) ([^"]+) HTTP
        # Status: (\d{3})
        # Size: (\d+)
        # The here-string body and its closing '@ must start at column 0
        $pattern = @'
^(?<ip>\d{1,3}(?:\.\d{1,3}){3})\s+-\s+-\s+\[(?<date>[^\]]+)\]\s+"(?<method>\w+)\s+(?<path>[^"]+)\s+HTTP/[\d\.]+"\s+(?<status>\d{3})\s+(?<size>\d+)
'@
    }
    process {
        if ($LogLine -match $pattern) {
            [PSCustomObject]@{
                IPAddress = $Matches.ip
                DateTime = $Matches.date
                Method = $Matches.method
                Path = $Matches.path
                StatusCode = [int]$Matches.status
                Bytes = [int]$Matches.size
            }
        }
        else {
            Write-Warning "Failed to parse: $LogLine"
        }
    }
}
# Test
$logs = @(
    '192.168.1.100 - - [11/Feb/2026:14:32:05 +0000] "GET /api/users HTTP/1.1" 200 1234'
    '192.168.1.101 - - [11/Feb/2026:14:32:10 +0000] "POST /api/login HTTP/1.1" 401 89'
    '192.168.1.102 - - [11/Feb/2026:14:32:15 +0000] "GET /images/logo.png HTTP/1.1" 404 0'
)
$parsed = $logs | Parse-ApacheLog
$parsed | Format-Table -AutoSize
# Analyze
$parsed | Group-Object StatusCode | Select-Object Name, Count
$parsed | Where-Object { $_.StatusCode -ge 400 } | Format-Table
Create validators for:
- Email addresses
- Phone numbers (US format)
- Credit card numbers (basic format, not real validation)
- URLs
Solution:
function Test-DataFormat {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Data,
        [Parameter(Mandatory)]
        [ValidateSet('Email', 'Phone', 'CreditCard', 'URL')]
        [string]$Type
    )
    $patterns = @{
        Email = '^[\w\.-]+@[\w\.-]+\.\w{2,}$'
        Phone = '^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$'
        CreditCard = '^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$'
        URL = '^https?://[\w\.-]+(?:/[\w\.-]*)*(?:\?[\w=&]*)?$'
    }
    $result = $Data -match $patterns[$Type]
    [PSCustomObject]@{
        Data = $Data
        Type = $Type
        IsValid = $result
        Pattern = $patterns[$Type]
    }
}
# Test cases
$testData = @(
    @{ Data = 'user@example.com'; Type = 'Email' }
    @{ Data = 'invalid-email'; Type = 'Email' }
    @{ Data = '(555) 123-4567'; Type = 'Phone' }
    @{ Data = '555-123-4567'; Type = 'Phone' }
    @{ Data = '12345'; Type = 'Phone' }
    @{ Data = '1234-5678-9012-3456'; Type = 'CreditCard' }
    @{ Data = '1234 5678 9012 3456'; Type = 'CreditCard' }
    @{ Data = 'https://www.example.com/path?q=test'; Type = 'URL' }
    @{ Data = 'not-a-url'; Type = 'URL' }
)
$results = $testData | ForEach-Object {
    Test-DataFormat -Data $_.Data -Type $_.Type
}
$results | Format-Table -AutoSize
# Summary
Write-Host "`nValidation Summary:" -ForegroundColor Cyan
$results | Group-Object Type | ForEach-Object {
    $valid = ($_.Group | Where-Object IsValid).Count
    $total = $_.Count
    Write-Host "$($_.Name): $valid/$total valid" -ForegroundColor $(if ($valid -eq $total) { 'Green' } else { 'Yellow' })
}
Create a function that:
- Redacts sensitive data (SSNs, credit cards, emails)
- Replaces with [REDACTED]
- Handles multiple patterns
Solution:
function Protect-SensitiveData {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string]$Text,
        [Parameter()]
        [ValidateSet('SSN', 'CreditCard', 'Email', 'Phone', 'All')]
        [string[]]$RedactTypes = 'All'
    )
    begin {
        # Define patterns. [ordered] fixes the order in which they are applied,
        # so the stricter SSN and card patterns run before the looser phone pattern.
        $patterns = [ordered]@{
            SSN = @{
                Pattern = '\b\d{3}-\d{2}-\d{4}\b'
                Label = '[SSN REDACTED]'
            }
            CreditCard = @{
                Pattern = '\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'
                Label = '[CARD REDACTED]'
            }
            Email = @{
                Pattern = '\b[\w\.-]+@[\w\.-]+\.\w{2,}\b'
                Label = '[EMAIL REDACTED]'
            }
            Phone = @{
                Pattern = '\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
                Label = '[PHONE REDACTED]'
            }
        }
        # Determine which patterns to apply
        if ($RedactTypes -contains 'All') {
            $typesToRedact = $patterns.Keys
        }
        else {
            $typesToRedact = $RedactTypes
        }
    }
    process {
        $redacted = $Text
        $redactionsMade = @()
        foreach ($type in $typesToRedact) {
            $pattern = $patterns[$type].Pattern
            $label = $patterns[$type].Label
            # Named $found rather than $matches to avoid shadowing the automatic $Matches
            $found = [regex]::Matches($redacted, $pattern)
            if ($found.Count -gt 0) {
                $redactionsMade += "$type ($($found.Count) occurrence(s))"
                $redacted = $redacted -replace $pattern, $label
            }
        }
        [PSCustomObject]@{
            Original = $Text
            Redacted = $redacted
            RedactionsMade = $redactionsMade -join ', '
        }
    }
}
# Test
$testText = @"
Customer contact:
Name: John Doe
Email: john.doe@example.com
Phone: (555) 123-4567
SSN: 123-45-6789
Credit Card: 4532-1234-5678-9010
Please reach out via email or phone for assistance.
"@
Write-Host "Original Text:" -ForegroundColor Yellow
Write-Host $testText
Write-Host "`nRedacted Text:" -ForegroundColor Cyan
$result = Protect-SensitiveData -Text $testText
Write-Host $result.Redacted
Write-Host "`nRedactions Made:" -ForegroundColor Green
Write-Host $result.RedactionsMade
# Test selective redaction
Write-Host "`n--- Redact Emails Only ---" -ForegroundColor Magenta
$emailOnly = Protect-SensitiveData -Text $testText -RedactTypes Email
Write-Host $emailOnly.Redacted
Summary
Regular expressions are essential for text processing in PowerShell:
- Basic patterns: Literals, character classes (\d, \w, \s), quantifiers (*, +, ?, {n,m})
- Anchors: ^ (start), $ (end), \b (word boundary)
- Groups: Capture with (), name with (?<name>...), extract with $Matches
- Operators: -match, -replace, -split (case-insensitive), -cmatch (case-sensitive)
- Select-String: Search files with context, multiple patterns, and recursion (via Get-ChildItem -Recurse)
- [regex] class: Compile patterns, fine control over matching and replacement
Master regex to handle log parsing, data validation, text transformation, and complex string operations efficiently.
Written by the ShellRAG Team