Text Processing with grep, sed, and awk

Summary: in this tutorial, you will learn to use grep for searching patterns and sed for stream editing, two essential text processing tools for the command line.

Unix was designed around the philosophy of small, specialized tools that do one thing well and can be combined. Three tools embody this philosophy perfectly for text manipulation:

  • grep: Search for patterns in text
  • sed: Transform text with stream editing
  • awk: Analyze and process structured data

Together, these tools can handle virtually any text processing task—from log analysis to data transformation, configuration file manipulation to report generation.

Why master these tools:

  • Speed: Process gigabytes of text in seconds
  • Ubiquity: Available on every Unix-like system
  • Composability: Chain them with pipes for complex workflows
  • Power: Replace hundreds of lines of code with one-liners
  • Essential skill: Core to system administration, DevOps, and data processing

This tutorial takes you from basic searches to advanced text transformations.

grep — Search for Patterns

grep stands for Global Regular Expression Print. It searches input line by line, printing lines that match a pattern.

Why grep matters:

  • Fast: Optimized for searching large files
  • Regex support: Match complex patterns, not just literal strings
  • Recursive: Search entire directory trees
  • Filter pipelines: Essential for filtering command output

Basic Search Operations

# Search for a word in a file
grep "error" logfile.txt
 
# What this does:
# - Reads logfile.txt line by line
# - Prints each line containing "error"
# - Case-sensitive by default
 
# Example output (a line containing only uppercase "ERROR" would NOT match):
# [2026-02-11 10:24:12] error in module auth.py
 
# Search in multiple files
grep "TODO" *.py
 
# Output includes filename:
# app.py:# TODO: Add error handling
# utils.py:# TODO: Optimize this function
 
# Search recursively in directories
grep -r "function" src/
 
# -r (recursive): Searches all files under src/
# Useful for searching entire codebases
 
# Case-insensitive search
grep -i "error" logfile.txt
 
# -i flag: Matches error, Error, ERROR, ErRoR, etc.
# Essential when you don't know the exact case
 
# Show line numbers
grep -n "error" logfile.txt
 
# Output with line numbers (still case-sensitive, so "ERROR" lines are skipped):
# 42:[2026-02-11 10:24:12] error in module auth.py
 
# Count matching lines
grep -c "error" logfile.txt
 
# Output: 1
# The number of matching LINES (not total occurrences); the lines themselves are not printed
 
# Show only matching part (not whole line)
grep -o "error[0-9]*" logfile.txt
 
# If line contains "error123", prints just "error123", not entire line
# Useful for extracting specific patterns from verbose output
 
# Invert match (show lines that DON'T match)
grep -v "debug" logfile.txt
 
# Prints all lines EXCEPT those containing "debug"
# Filters out noise in logs
 
# Show context around matches
grep -B 2 "error" logfile.txt    # 2 lines Before match
grep -A 3 "error" logfile.txt    # 3 lines After match
grep -C 2 "error" logfile.txt    # 2 lines Context (before and after)
 
# Example with -i -C 1 (-i is needed for the uppercase ERROR line to match):
# [2026-02-11 10:23:44] INFO: Connecting to database
# [2026-02-11 10:23:45] ERROR: Connection failed    <-- Match
# [2026-02-11 10:23:46] INFO: Retrying connection
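The flags above are easiest to compare against a small throwaway file. The following is a minimal sketch; the log contents and the variable names are made up for illustration.

```shell
# Hypothetical sample log, written to a temp file for illustration
log=$(mktemp)
cat > "$log" <<'EOF'
[2026-02-11 10:23:44] INFO: Connecting to database
[2026-02-11 10:23:45] ERROR: Connection failed
[2026-02-11 10:24:12] error in module auth.py
[2026-02-11 10:24:13] debug: retry scheduled
EOF

# Case-sensitive: only the lowercase "error" line matches
sensitive=$(grep -c "error" "$log")

# Case-insensitive: "ERROR" and "error" both match
insensitive=$(grep -ci "error" "$log")

# Line numbers of the case-insensitive matches
numbered=$(grep -in "error" "$log" | cut -d: -f1 | tr '\n' ' ')

# Number of lines left after filtering out debug lines
without_debug=$(grep -vc "debug" "$log")

echo "case-sensitive: $sensitive, case-insensitive: $insensitive"
echo "matching line numbers: $numbered"
echo "lines without debug: $without_debug"
rm -f "$log"
```

With this sample, the case-sensitive count is 1 and the case-insensitive count is 2, which is the difference the -i flag makes.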
 

Common option combinations:

# Search recursively, case-insensitive, show line numbers
grep -rin "TODO" src/
 
# List only filenames containing pattern
grep -rl "import React" src/
 
# -l: Just filenames, not the matching lines
# Useful when you want to know WHICH files, not WHERE in the files
 
# Count matches across all files
grep -rc "error" logs/
 
# Output per file:
# logs/app.log:42
# logs/auth.log:13
# logs/db.log:0
 
# Show filenames that DON'T contain pattern
grep -rL "test" src/
 
# Files without any "test" in them
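As a rough sketch of how -l, -L, and -c differ on a directory tree, the example below builds a tiny hypothetical src/ folder (the file names and contents are assumptions) and captures what each flag reports.

```shell
# Hypothetical directory tree for illustration
dir=$(mktemp -d)
mkdir -p "$dir/src"
printf 'import React\nconst a = 1\n' > "$dir/src/App.js"
printf 'const b = 2\n' > "$dir/src/util.js"

# -l: names of files that contain the pattern
with_react=$(cd "$dir" && grep -rl "import React" src/)

# -L: names of files that do NOT contain the pattern
without_react=$(cd "$dir" && grep -rL "import React" src/)

# -c: per-file count of matching lines (sorted for stable output)
counts=$(cd "$dir" && grep -rc "const" src/ | sort)

echo "with: $with_react"
echo "without: $without_react"
echo "$counts"
rm -rf "$dir"
```

Each variable answers a different question: which files match, which do not, and how many lines match per file.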
 

Regular Expressions in grep

Regular expressions (regex) let you match patterns, not just literal text.

Basic regex (default in grep):

# Anchor: Start of line (^)
grep "^Error" logfile.txt
# Matches: Error at start
# Doesn't match:   Error (leading spaces)
# Doesn't match: An Error occurred
 
# Anchor: End of line ($)
grep "done$" logfile.txt
# Matches: Process completed done
# Doesn't match: done processing (has text after)
 
# Empty lines
grep "^$" file.txt
# ^ = start, $ = end, nothing between = empty line
 
# Character class: Any character in brackets
grep "err[01]r" file.txt
# Matches: err0r, err1r
# Doesn't match: error, err2r
 
# Character class: Range
grep "[0-9]" file.txt          # Any digit
grep "[A-Z]" file.txt          # Any uppercase letter
grep "[a-zA-Z]" file.txt       # Any letter
grep "[^0-9]" file.txt         # NOT a digit (^ inverts)
 
# Wildcard: Any single character (.)
grep "c.t" file.txt
# Matches: cat, cot, cut, c9t
# Doesn't match: ct (need exactly one character)
 
# Repetition: Zero or more (*)
grep "ab*c" file.txt
# Matches: ac (zero b's), abc (one b), abbc (two b's), abbbc...
# * applies to the PREVIOUS character
 
grep "colou*r" file.txt
# Matches: color (zero u), colour (one u), colouur (two u's)
 
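# Word boundary (\b is not part of POSIX regex but is supported by GNU grep;
# grep -w "cat" is the portable way to match whole words)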
grep "\bcat\b" file.txt
# Matches: "the cat sat" (whole word)
# Doesn't match: "category" or "bobcat"
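To see the anchors, the wildcard, and the star in action, here is a small sketch run against a hypothetical word list (the list itself is invented for illustration).

```shell
# Hypothetical input lines for illustration
words=$(mktemp)
printf '%s\n' 'Error at start' '  Error indented' 'all done' 'done early' '' 'ac' 'abbc' 'cat' 'ct' > "$words"

# ^ anchors to the start of the line, so the indented line is skipped
starts=$(grep -c "^Error" "$words")

# $ anchors to the end of the line, so "done early" is skipped
ends=$(grep -c "done$" "$words")

# ^$ matches empty lines only
empty=$(grep -c "^$" "$words")

# b* allows zero or more b's between a and c
stars=$(grep "ab*c" "$words" | tr '\n' ' ')

# . requires exactly one character between c and t
dots=$(grep "c.t" "$words" | tr '\n' ' ')

echo "starts=$starts ends=$ends empty=$empty"
echo "ab*c matched: $stars"
echo "c.t matched: $dots"
rm -f "$words"
```

Note that "ct" is not matched by c.t, while "ac" is matched by ab*c, because the star accepts zero repetitions and the dot does not.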
 

Extended regex (grep -E):

Extended regex adds operators such as +, ?, | and {n,m} without needing backslashes. Use the -E flag (the older egrep command is deprecated in GNU grep):

# OR: Match multiple alternatives (|)
grep -E "error|warning|fatal" logfile.txt
# Matches any line with error OR warning OR fatal
 
# One or more (+)
grep -E "ab+c" file.txt
# Matches: abc, abbc, abbbc (one or more b's)
# Doesn't match: ac (need at least one b)
 
# Zero or one (?)
grep -E "colou?r" file.txt
# Matches: color (zero u), colour (one u)
# Doesn't match: colouur (more than one u)
 
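# Exact count {n}
grep -E "[0-9]{3}" file.txt
# Matches: lines containing 3 consecutive digits (like 123, 456)
# The pattern is unanchored, so longer runs such as 12345 also match
 
# Range {n,m}
grep -E "[0-9]{2,4}" file.txt
# Matches: lines containing a run of 2 to 4 digits (12, 123, 1234)
 
# At least n {n,}
grep -E "[0-9]{3,}" file.txt
# Matches: lines containing 3 or more consecutive digits (123, 1234, 12345...)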
 
# Phone numbers
grep -E "[0-9]{3}-[0-9]{4}" file.txt
# Matches: 555-1234 format
 
grep -E "\([0-9]{3}\) [0-9]{3}-[0-9]{4}" file.txt
# Matches: (555) 123-4567 format
# Note: Parentheses need escaping with backslash
 
# Email addresses (simple pattern)
grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" file.txt
# Breaks down:
# [a-zA-Z0-9._%+-]+ = username part (one or more valid chars)
# @ = literal @
# [a-zA-Z0-9.-]+ = domain name
# \. = literal dot (escaped)
# [a-zA-Z]{2,} = TLD (at least 2 letters)
 
# URLs
grep -E "(https?://)[^ ]+" file.txt
# https? = http or https (? makes s optional)
# [^ ]+ = one or more non-space characters
 
# Lines starting with capital letter
grep -E "^[A-Z]" file.txt
 
# Lines that are only digits
grep -E "^[0-9]+$" file.txt
 
# Grouping with parentheses
grep -E "(error|warning): [0-9]+" logfile.txt
# Matches "error: 123" or "warning: 456"
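The sketch below contrasts ?, + and {n} on a few invented lines. It also uses -o to show that a counted pattern such as [0-9]{3} is unanchored and will happily match inside a longer run of digits.

```shell
# Hypothetical sample lines for illustration
data=$(mktemp)
printf '%s\n' 'color' 'colour' 'colouur' 'id 12345' 'call 555-1234' 'warning: 7' > "$data"

# ? makes the preceding character optional (zero or one u)
optional=$(grep -E "colou?r" "$data" | tr '\n' ' ')

# + requires at least one occurrence (one or more u)
plus=$(grep -E "colou+r" "$data" | tr '\n' ' ')

# {3} is unanchored: -o shows it matching inside longer digit runs
three=$(grep -oE "[0-9]{3}" "$data" | tr '\n' ' ')

# Alternation inside a group, followed by a number
grouped=$(grep -E "(error|warning): [0-9]+" "$data")

echo "colou?r: $optional"
echo "colou+r: $plus"
echo "[0-9]{3}: $three"
echo "grouped: $grouped"
rm -f "$data"
```

To require exactly three digits and nothing more, anchor the pattern (for example ^[0-9]{3}$) or surround it with non-digit context.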
 

Perl-compatible regex (grep -P):

# -P enables Perl-compatible regex (PCRE)
# Available in GNU grep when built with PCRE support; BSD/macOS grep does not provide it
 
# IP addresses
grep -P "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" logfile.txt
# \d = digit (shorthand for [0-9])
 
# Non-greedy matching
grep -oP "start.*?end" file.txt
# *? = match as few characters as possible
# -o prints only the matched text, which makes the shorter match visible
 
# Lookahead
grep -P "password(?=\s+\w+)" file.txt
# Match "password" only if followed by whitespace and word
 

Practical grep Patterns

# Find comment lines in config files
grep "^#" config.txt
grep -E "^#|^$" config.txt      # Comments or empty lines (-E makes | portable)
 
# Find non-comment, non-empty lines (actual config)
grep -Ev "^#|^$" config.txt
# -E: Extended regex for |
# -v: Invert (NOT matching)
# This is extremely useful for reading config files!
 
# Find all Python function definitions
grep -n "^def " *.py                              # Top-level functions in this directory
grep -rn --include="*.py" "^[[:space:]]*def " .   # Recursive, allowing leading whitespace
 
# Find TODOs, FIXMEs, HACKs in codebase
grep -rnE "TODO|FIXME|HACK|XXX" src/
 
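# Better: Show filename, line number, and marker type
grep -rnoE "TODO|FIXME|HACK" src/ | \
    awk -F: '{print $1":"$2" ["$3"]"}'
# -o prints only the marker, so $3 is the type
# Avoid --color=always here: its escape codes would end up inside awk's fields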
 
# Extract all email addresses from text
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt
 
# Extract all IPv4 addresses
grep -oE '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' logfile.txt
 
# Find lines with specific word count
grep -E "^(\w+\s+){4}\w+$" file.txt    # Lines with exactly 5 words
 
# Find shebang lines in scripts
find . -type f -name "*.sh" -exec grep -l "^#!/bin/bash" {} \;
 
# Find processes (exclude grep itself)
ps aux | grep "[n]ginx"
# The [n] trick: grep searches for "[n]ginx" but the grep process
# itself shows "grep [n]ginx" which doesn't match the pattern!
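Two of the patterns above, stripping comments from a config file and extracting an email address, can be sketched on a hypothetical config file. The file contents and the example.com address are assumptions made for the demonstration.

```shell
# Hypothetical config file for illustration
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Main settings
port=8080

# Admin contact
admin=ops@example.com
EOF

# Keep only active settings: drop comment lines and empty lines
active=$(grep -Ev "^#|^$" "$conf")

# Pull out just the email address with -o
email=$(grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' "$conf")

echo "$active"
echo "email: $email"
rm -f "$conf"
```

The first command leaves only the two key=value lines; the second prints the address without the surrounding "admin=" text.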
 

Analyzing logs with grep:

# Count error types
grep -oE "ERROR:[A-Z_]+" app.log | sort | uniq -c | sort -rn
# -o: Only matching part
# sort: Group identical errors
# uniq -c: Count occurrences
# sort -rn: Sort by count, descending
 
# Find errors in specific time range
grep "2026-02-11 14:" app.log | grep -i error
 
# Find failed login attempts
grep "Failed password" /var/log/auth.log | awk '{print $11}' | sort | uniq -c | sort -rn
# Shows which IPs have the most failed attempts
# $11 assumes the traditional syslog line layout; "invalid user" lines shift the IP to $13
 
# Track response times of 1 second or more (assumes integer milliseconds)
grep -E "response_time: [0-9]{4,}" app.log
# [0-9]{4,} = 4+ digits = 1000+ milliseconds = 1+ seconds
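The counting pipeline is easier to follow with concrete input. This sketch assumes a made-up log format that contains ERROR:NAME tokens and a response_time field in integer milliseconds.

```shell
# Hypothetical application log for illustration
log=$(mktemp)
cat > "$log" <<'EOF'
2026-02-11 14:01:02 ERROR:DB_TIMEOUT query failed
2026-02-11 14:05:10 ERROR:AUTH_FAILED bad token
2026-02-11 14:07:33 ERROR:DB_TIMEOUT query failed
2026-02-11 15:00:00 INFO request ok response_time: 1250
2026-02-11 15:00:01 INFO request ok response_time: 87
EOF

# Most frequent error type: extract, group, count, sort by count, keep the top row
top=$(grep -oE "ERROR:[A-Z_]+" "$log" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2 " x" $1}')

# Requests at or above 1000 ms (four or more digits)
slow=$(grep -cE "response_time: [0-9]{4,}" "$log")

echo "top error: $top"
echo "slow requests: $slow"
rm -f "$log"
```

The sort before uniq -c matters: uniq only collapses adjacent duplicates, so unsorted input would produce several partial counts for the same error.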
 

sed — Stream Editor

sed is a stream editor—it reads text, transforms it according to commands, and outputs the result. Think of it as programmable find-and-replace on steroids.

Why sed matters:

  • Stream processing: Handles files too large for memory
  • In-place editing: Modify files directly
  • Automation: Scriptable text transformations
  • Powerful patterns: Regex-based substitution
  • Essential for: Config file updates, data cleaning, batch processing

Substitution (s command)

The most common sed operation is substitution:

# Basic syntax: sed 's/pattern/replacement/' file
 
# Replace first occurrence on each line
echo "cat cat cat" | sed 's/cat/dog/'
# Output: dog cat cat
# Only the FIRST "cat" on the line is replaced
 
# Replace ALL occurrences (g flag = global)
echo "cat cat cat" | sed 's/cat/dog/g'
# Output: dog dog dog
# All "cat"s are replaced
 
# Case-insensitive replacement (I flag, a GNU sed extension)
echo "Error ERROR error" | sed 's/error/WARNING/gI'
# Output: WARNING WARNING WARNING
# I flag: Matches any case; the replacement text is inserted exactly as written
 
# Replace only nth occurrence
echo "cat cat cat cat" | sed 's/cat/dog/2'
# Output: cat dog cat cat
# Only the 2nd "cat" is replaced
 
# Replace from nth occurrence onward (GNU sed; POSIX leaves combining a number with g unspecified)
echo "cat cat cat cat" | sed 's/cat/dog/2g'
# Output: cat dog dog dog
# Replaces 2nd and subsequent occurrences
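As a quick side-by-side, the sketch below runs the same input through the no-flag, g, and numeric forms of the s command and stores each result (the variable names are illustrative only).

```shell
line="cat cat cat cat"

# No flag: only the first match on the line is replaced
first=$(echo "$line" | sed 's/cat/dog/')

# g flag: every match on the line is replaced
all=$(echo "$line" | sed 's/cat/dog/g')

# Numeric flag: only the third match is replaced
third=$(echo "$line" | sed 's/cat/dog/3')

echo "first: $first"
echo "all:   $all"
echo "third: $third"
```

This prints "dog cat cat cat", "dog dog dog dog", and "cat cat dog cat" respectively.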
 

Why use sed instead of text editor:

  • Automate repetitive edits across many files
  • Process files larger than available RAM
  • Script transformations for reproducibility
  • Work on remote servers without GUI

Addressing: Operate on Specific Lines

# Line number addressing
sed '3s/old/new/' file.txt       # Only line 3
sed '5,10s/old/new/g' file.txt   # Lines 5-10
sed '1s/old/new/' file.txt       # First line
sed '$s/old/new/' file.txt       # Last line ($)
 
# Pattern addressing: Lines matching pattern
sed '/ERROR/s/level=[0-9]/level=5/' logfile.txt
# Only on lines containing "ERROR", replace level=X with level=5
 
# Range between patterns
sed '/START/,/END/s/foo/bar/g' file.txt
# From line with START to line with END, replace foo with bar
 
# Negation: Lines NOT matching pattern
sed '/^#/!s/foo/bar/g' file.txt
# On non-comment lines, replace foo with bar
 
# Multiple ranges
sed '1,5s/old/new/g; 10,15s/foo/bar/g' file.txt
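The three addressing styles can be compared on one small input. The file below is hypothetical; each command applies the same substitution but restricts it to different lines.

```shell
# Hypothetical input for illustration
input=$(mktemp)
cat > "$input" <<'EOF'
# foo header
foo one
START
foo two
END
foo three
EOF

# Line-number address: only line 2 is changed
by_line=$(sed '2s/foo/bar/' "$input")

# Pattern range: only lines from START through END are changed
by_range=$(sed '/START/,/END/s/foo/bar/' "$input")

# Negated address: every line except comment lines is changed
by_negation=$(sed '/^#/!s/foo/bar/' "$input")

printf '%s\n---\n%s\n---\n%s\n' "$by_line" "$by_range" "$by_negation"
rm -f "$input"
```

In every variant the "# foo header" line is left alone, but for different reasons: it is not line 2, it is outside the START/END range, and it is excluded by the negated /^#/ address.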
 

Deletion (d command)

# Delete lines matching pattern
sed '/^#/d' config.txt
# Remove all comment lines
 
sed '/^$/d' file.txt
# Remove all empty lines
 
# Delete specific line numbers
sed '3d' file.txt                # Delete line 3
sed '1d' file.txt                # Delete first line (skip header)
sed '$d' file.txt                # Delete last line
sed '2,5d' file.txt              # Delete lines 2-5
sed '10,$d' file.txt             # Delete line 10 to end
 
# Delete range between patterns
sed '/START/,/END/d' file.txt
# Delete from START line through END line
 
# Common: Remove comments and empty lines
sed '/^#/d; /^$/d' config.txt
# Or in one pattern:
sed -E '/^(#|$)/d' config.txt
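A short sketch of the d command on a hypothetical config file (contents invented for illustration), covering pattern-based and line-number-based deletion:

```shell
# Hypothetical config file for illustration
conf=$(mktemp)
printf '%s\n' '# comment' 'name=web' '' 'port=80' '# trailing comment' > "$conf"

# Drop comment lines and empty lines
cleaned=$(sed '/^#/d; /^$/d' "$conf")

# Drop the first line, then look at what is now on top
no_header=$(sed '1d' "$conf" | head -n 1)

# Delete from line 3 to the end, keeping only the first two lines
first_two=$(sed '3,$d' "$conf")

echo "$cleaned"
echo "first line after 1d: $no_header"
rm -f "$conf"
```

None of these commands modify the file itself; without -i, sed only writes the filtered text to stdout.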
 

In-Place Editing (-i option)

By default, sed prints to stdout. Use -i to edit files directly:

# Edit file in-place (DANGEROUS without backup!)
sed -i 's/old/new/g' file.txt
# GNU sed syntax; BSD/macOS sed requires a suffix argument, which may be empty: sed -i '' ...
 
# Edit with backup (RECOMMENDED)
sed -i.bak 's/old/new/g' file.txt
# Creates file.txt.bak as backup, modifies file.txt
 
# Backup with custom extension
sed -i.backup 's/old/new/g' file.txt
# Creates file.txt.backup
 
# Edit multiple files
sed -i.bak 's/old/new/g' *.txt
# Creates .bak files for each .txt file
 

⚠️ Always use -i.bak for in-place edits

Always create backups with -i.bak when testing sed commands. In-place editing is permanent—if your pattern is wrong, you can corrupt files. Test on a copy first, or always keep backups:

# Safe workflow:
sed -i.bak 's/pattern/replacement/g' file.txt
# Check if it worked:
diff file.txt.bak file.txt    # Old version first, so ">" lines show the new text
# If good, remove backup:
rm file.txt.bak
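The safe workflow can be rehearsed in a scratch directory before touching real files. This is a sketch under the assumption that the attached-suffix form -i.bak is supported, which holds for both GNU and BSD sed; the file name and contents are placeholders.

```shell
# Scratch directory so no real files are touched
work=$(mktemp -d)
printf 'old value\nold again\n' > "$work/file.txt"

# Preview first: without -i nothing on disk changes
preview=$(sed 's/old/new/g' "$work/file.txt")

# Apply in place, keeping the original as file.txt.bak
sed -i.bak 's/old/new/g' "$work/file.txt"

edited=$(cat "$work/file.txt")
backup=$(cat "$work/file.txt.bak")

# cmp -s is silent and only reports through its exit status
if cmp -s "$work/file.txt" "$work/file.txt.bak"; then
    status="unchanged"
else
    status="changed"
fi

echo "status: $status"
rm -rf "$work"
```

Running the command once without -i and reading the preview is often enough to catch a wrong pattern before any file is rewritten.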
 

Advanced sed Operations

# Insert text before line (i command)
sed '3i\This is a new line' file.txt
# Insert "This is a new line" BEFORE line 3
# (This one-line form is a GNU extension; POSIX sed expects a newline after the backslash)
 
# Insert before pattern match
sed '/^## Section/i\--- New content ---' file.txt
 
# Append text after line (a command)
sed '3a\This comes after line 3' file.txt
 
# Append after pattern
sed '/ERROR/a\[Alert sent to admin]' logfile.txt
 
# Replace entire line (c command)
sed '3c\This replaces line 3 completely' file.txt
 
# Replace matching lines
sed '/old config/c\new config line' config.txt
 
# Print specific lines (p command with -n)
sed -n '5p' file.txt              # Print only line 5
sed -n '10,20p' file.txt          # Print lines 10-20
sed -n '/ERROR/p' file.txt        # Print lines with ERROR (like grep)
sed -n '/START/,/END/p' file.txt  # Print between patterns
 
# -n: Suppress automatic printing (only print what p command specifies)
 
# Multiple operations (-e flag or semicolon)
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
sed 's/foo/bar/g; s/baz/qux/g' file.txt    # Equivalent
 
# Read sed commands from file
sed -f script.sed file.txt
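Here is a brief sketch of -n with p, chained -e expressions, and the a command on a four-line hypothetical file. The append example uses the backslash-newline spelling, which should also work on sed implementations that lack the GNU one-line form.

```shell
# Hypothetical input for illustration
notes=$(mktemp)
printf '%s\n' 'one' 'two' 'three' 'four' > "$notes"

# -n with p prints only the addressed lines
middle=$(sed -n '2,3p' "$notes")

# Two -e expressions are applied in order to every line
renamed=$(sed -e 's/one/1/' -e 's/four/4/' "$notes")

# Portable spelling of "a": backslash, newline, then the text to append
appended=$(sed '2a\
inserted after two' "$notes")

echo "$middle"
echo "$renamed"
echo "$appended"
rm -f "$notes"
```

Forgetting -n with p is a common slip: every line is then printed once by default and the addressed lines a second time.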
 

Using Different Delimiters

When patterns contain slashes, use different delimiters to avoid escaping:

# Standard (messy with paths):
sed 's/\/usr\/local\/bin/\/opt\/bin/g' config.txt
 
# Using | as delimiter (cleaner):
sed 's|/usr/local/bin|/opt/bin|g' config.txt
 
# Using # as delimiter:
sed 's#/usr/local/bin#/opt/bin#g' config.txt
 
# Using @ as delimiter:
sed 's@http://@https://@g' urls.txt
 
# Rule: The character after 's' becomes the delimiter
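To confirm that the delimiter choice changes only readability and not the result, this small sketch runs the same path substitution both ways on an illustrative PATH-style string.

```shell
path_line='PATH=/usr/local/bin:/usr/bin'

# Default / delimiter: every slash in the path must be escaped
escaped=$(echo "$path_line" | sed 's/\/usr\/local\/bin/\/opt\/bin/')

# | as delimiter: the path can be written as-is
piped=$(echo "$path_line" | sed 's|/usr/local/bin|/opt/bin|')

echo "$escaped"
echo "$piped"
```

Both commands print PATH=/opt/bin:/usr/bin. Whichever delimiter is chosen must itself be escaped if it appears in the pattern or the replacement.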
 

Capture Groups and Back-References

Capture parts of the matched pattern and reuse them:

# Basic capture (use \( \) in sed)
echo "John Smith" | sed 's/\([A-Z][a-z]*\) \([A-Z][a-z]*\)/\2, \1/'
# Output: Smith, John
# \1 = first capture (John), \2 = second capture (Smith)
 
# With extended regex (-E or -r)
echo "John Smith" | sed -E 's/([A-Z][a-z]*) ([A-Z][a-z]*)/\2, \1/'
# Cleaner syntax without escaping parentheses
 
# Reformat phone numbers
echo "555-1234" | sed -E 's/([0-9]{3})-([0-9]{4})/(\1) \2/'
# Output: (555) 1234
 
# Extract and reformat dates
echo "2026-02-11" | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/'
# Output: 11/02/2026 (DD/MM/YYYY)
 
# Wrap a whole line in HTML tags
echo "important" | sed 's/\(.*\)/<strong>\1<\/strong>/'
# Output: <strong>important</strong>
 
# Duplicate first word on each line
sed -E 's/^(\w+)/\1 \1/' file.txt
# "hello world" becomes "hello hello world"
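Back-references combine well with an alternate delimiter. The sketch below reorders a date and swaps a key=value pair; the input strings are examples only.

```shell
# Reorder YYYY-MM-DD into DD/MM/YYYY; | as delimiter avoids escaping the slashes
date_out=$(echo "2026-02-11" | sed -E 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\3/\2/\1|')

# Swap the two sides of a key=value pair
swapped=$(echo "host=example" | sed -E 's/^([^=]+)=(.*)$/\2=\1/')

echo "$date_out"
echo "$swapped"
```

Groups are numbered by the position of their opening parenthesis, left to right, so \1 is always the first group regardless of where it is reused in the replacement.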
 

Practical sed Examples

# Remove HTML tags
sed 's/<[^>]*>//g' page.html
# <[^>]*> = < followed by any non-> characters, then >
 
# Remove leading whitespace
sed 's/^[[:space:]]*//' file.txt
 
# Remove trailing whitespace
sed 's/[[:space:]]*$//' file.txt
 
# Remove both leading and trailing whitespace
sed 's/^[[:space:]]*//; s/[[:space:]]*$//' file.txt
 
# Add line numbers
sed = file.txt | sed 'N;s/\n/\t/'
# First sed adds line numbers, second merges with content
 
# Double-space a file
sed G file.txt
# G: Appends a newline plus the (empty) hold space to each line, leaving a blank line after it
 
# Remove duplicate consecutive lines
sed '$!N; /^\(.*\)\n\1$/!P; D' file.txt
# Same result as uniq: each run of identical adjacent lines is reduced to one
 
# Comment out lines matching pattern
sed '/ServerName/s/^/#/' apache.conf
# Adds # at start of lines with ServerName
 
# Uncomment lines
sed '/ServerName/s/^#//' apache.conf
# Removes # from start of ServerName lines
 
# Add prefix to every line
sed 's/^/>> /' file.txt
# Adds ">> " to start of each line
 
# Add suffix to every line
sed 's/$/ [done]/' file.txt
# Adds " [done]" to end of each line
 
# Convert Windows line endings to Unix
sed 's/\r$//' windows_file.txt
# Removes the trailing carriage return (\r is understood by GNU sed, not by every sed)
 
# Convert Unix to Windows line endings
sed 's/$/\r/' unix_file.txt
 
# Remove everything after a pattern on each line
sed 's/#.*//' config.txt
# Removes comments (everything from # onward); note this also cuts a # that appears inside a value
 
# Extract content between tags
sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' page.html
# Extracts content between <title> tags
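Several of the one-liners above can be checked on inline strings without creating any files. This is a minimal sketch with invented sample text.

```shell
# Trim leading and trailing whitespace
trimmed=$(printf '   padded text  \n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')

# Comment out a matching line, then uncomment it again
commented=$(echo "ServerName example.com" | sed '/ServerName/s/^/#/')
restored=$(echo "$commented" | sed '/ServerName/s/^#//')

# Strip simple HTML tags
plain=$(echo '<p>Hello <b>world</b></p>' | sed 's/<[^>]*>//g')

# Extract the text between <title> tags; -n with the p flag prints only lines where the substitution happened
title=$(echo '<head><title>My Page</title></head>' | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p')

echo "[$trimmed]"
echo "$commented -> $restored"
echo "$plain"
echo "$title"
```

These regex-based tag patterns are fine for quick cleanup of simple markup, but they are not a substitute for a real HTML parser when tags span lines or contain > inside attributes.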
 

Written by the ShellRAG Team

The ShellRAG editorial team writes practical, beginner-friendly Bash Shell tutorials with tested code examples and real-world use cases. Every article is technically reviewed for accuracy and updated regularly.
