AWK — Pattern Processing Language
Summary: in this tutorial, you will learn to master awk for structured text processing: field extraction, patterns, built-in variables, and combining grep, sed, and awk in pipelines.
While grep finds lines and sed transforms them, AWK is a complete pattern-processing language designed for structured text. It excels at extracting and manipulating data organized in fields and records — think CSV files, log entries, command output, and tabular data.
Why awk matters:
- Field processing: Built for tabular data (columns/fields)
- Pattern-action: Elegant syntax for conditional processing
- Built-in variables: Automatic line counting, field splitting
- Programming features: Variables, arrays, functions, math
- Perfect for: Log analysis, CSV processing, report generation, data transformation
How awk Works
awk processes input line by line with this model:
- Read a line (called a "record")
- Split into fields (columns) separated by delimiter
- Check pattern — does this line match?
- Execute action — if pattern matches, run the code
# Basic syntax: awk 'pattern { action }' file
# Print entire line
awk '{print $0}' file.txt
# $0 = the whole line (same as cat)
# Print first field
echo "Alice 30 Engineer" | awk '{print $1}'
# Output: Alice
# $1 = first field (default delimiter: whitespace)
# Print multiple fields
echo "Alice 30 Engineer" | awk '{print $1, $3}'
# Output: Alice Engineer
# Comma adds space between fields
# Print fields in different order
echo "Alice 30 Engineer" | awk '{print $3, $1}'
# Output: Engineer Alice
# Print with custom text
echo "Alice 30 Engineer" | awk '{print "Name:", $1, "Age:", $2}'
# Output: Name: Alice Age: 30
Field numbering:
# Fields: $1, $2, $3, ...
# $0 = entire line
# $NF = last field
# $(NF-1) = second-to-last field
echo "one two three four" | awk '{print $NF}'
# Output: four
echo "one two three four" | awk '{print $(NF-1)}'
# Output: three
echo "one two three four" | awk '{print $2, $NF}'
# Output: two four
Field Separator (-F option)
By default, awk splits on whitespace. Change with -F:
# CSV file (comma-separated)
echo "Alice,30,Engineer" | awk -F, '{print $1}'
# Output: Alice
echo "Alice,30,Engineer" | awk -F, '{print $2, $3}'
# Output: 30 Engineer
# Colon-separated (like /etc/passwd)
awk -F: '{print $1, $7}' /etc/passwd
# Print username and shell
# Tab-separated
awk -F'\t' '{print $1, $3}' data.tsv
# Multiple-character separator
echo "name::value" | awk -F:: '{print $2}'
# Output: value
# Regex separator (spaces or commas)
echo "Alice, 30, Engineer" | awk -F'[, ]+' '{print $1, $3}'
# Output: Alice Engineer
Output Field Separator (OFS)
Control how output fields are separated:
# Default OFS is a space
echo "Alice,30,Engineer" | awk -F, '{print $1, $2, $3}'
# Output: Alice 30 Engineer (spaces between)
# Set custom OFS
echo "Alice,30,Engineer" | awk -F, 'BEGIN{OFS="\t"} {print $1, $2, $3}'
# Output: Alice 30 Engineer (tabs between)
# Convert CSV to pipe-delimited
awk -F, 'BEGIN{OFS="|"} {print $1, $2, $3}' data.csv
# Convert to TSV
awk -F, 'BEGIN{OFS="\t"} {$1=$1; print}' data.csv
# $1=$1 forces awk to rebuild $0 with new OFS
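Why the $1=$1 trick is needed: awk only rejoins $0 with OFS when some field is assigned. A quick side-by-side comparison:

```shell
# print alone does not rebuild the record, so the original separators survive:
echo "a,b,c" | awk -F, 'BEGIN{OFS="|"} {print}'
# Output: a,b,c
# Assigning any field (even to itself) forces awk to rejoin $0 using OFS:
echo "a,b,c" | awk -F, 'BEGIN{OFS="|"} {$1=$1; print}'
# Output: a|b|c
```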
Patterns: When to Execute Actions
# No pattern = match all lines
awk '{print $1}' file.txt
# Pattern: Regex match
awk '/error/ {print $0}' logfile.txt
# Print lines containing "error" (like grep)
# Pattern: NOT matching
awk '!/comment/ {print $0}' file.txt
# Print lines NOT containing "comment"
# Pattern: Field comparison
awk '$3 > 50 {print $1, $3}' scores.txt
# Print name and score where score > 50
# Numeric comparisons: ==, !=, <, <=, >, >=
awk '$2 == 100 {print $1}' data.txt
awk '$3 <= 50 {print $0}' data.txt
awk '$4 != 0 {print $0}' data.txt
# String comparison
awk '$1 == "Alice" {print $0}' users.txt
awk '$2 != "admin" {print $0}' users.txt
# Regex match on specific field
awk '$2 ~ /^A/ {print $0}' file.txt
# Field 2 starts with A (~ means "matches")
awk '$3 !~ /test/ {print $0}' file.txt
# Field 3 does NOT contain "test"
# Multiple conditions with && (AND) and || (OR)
awk '$2 > 50 && $3 < 100 {print $0}' data.txt
awk '$1 == "Alice" || $1 == "Bob" {print $0}' users.txt
# Line number conditions (NR = line number)
awk 'NR == 1 {print "Header:", $0}' file.txt
# First line only
awk 'NR > 1 {print $0}' data.csv
# Skip header (all lines except first)
awk 'NR >= 10 && NR <= 20 {print $0}' file.txt
# Lines 10-20
# Number of fields condition
awk 'NF > 3 {print $0}' file.txt
# Lines with more than 3 fields
awk 'NF == 0 {print "Empty line at", NR}' file.txt
# Find empty lines
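To see several of these patterns working against the same data, here is a throwaway sample file (the /tmp path and its contents are made up for the demo):

```shell
# Build a small sample: name score role
printf 'Alice 92 admin\nBob 45 user\nCarol 78 user\n' > /tmp/demo.txt

awk '$2 > 50 {print $1}' /tmp/demo.txt
# Alice
# Carol
awk '$3 == "user" && $2 < 60 {print $1}' /tmp/demo.txt
# Bob
awk '$3 ~ /^a/ {print $1, "is an", $3}' /tmp/demo.txt
# Alice is an admin
```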
Built-in Variables
| Variable | Description | Example |
|---|---|---|
| $0 | Entire current line | {print $0} |
| $1, $2, ... | Individual fields | {print $1, $3} |
| NR | Current line number (across all input) | {print NR, $0} |
| NF | Number of fields in current line | {print "Fields:", NF} |
| FNR | Line number within the current file | FNR == 1 {print FILENAME} |
| FS | Field separator (input) | BEGIN{FS=","} |
| OFS | Output field separator | BEGIN{OFS="\t"} |
| RS | Record separator (default: newline) | BEGIN{RS=";"} |
| ORS | Output record separator | BEGIN{ORS="\n\n"} |
| FILENAME | Name of current input file | {print FILENAME, $0} |
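NR keeps counting across all input files, while FNR restarts at 1 for each file. A quick way to see the difference (the temp files are created just for the demo):

```shell
printf 'a\nb\n' > /tmp/f1.txt
printf 'c\n'    > /tmp/f2.txt
awk '{print FILENAME, NR, FNR}' /tmp/f1.txt /tmp/f2.txt
# /tmp/f1.txt 1 1
# /tmp/f1.txt 2 2
# /tmp/f2.txt 3 1
# FNR > 1 therefore skips the first line of EVERY file -- handy for headers:
awk 'FNR > 1' /tmp/f1.txt /tmp/f2.txt
# b
```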
# Print line numbers
awk '{print NR, $0}' file.txt
# 1 first line
# 2 second line
# Print line and field count
awk '{print "Line", NR, "has", NF, "fields"}' file.txt
# Print last field on each line
awk '{print $NF}' file.txt
# Print second-to-last field
awk '{print $(NF-1)}' file.txt
# Print filename with each line
awk '{print FILENAME ":", $0}' file1.txt file2.txt
BEGIN and END Blocks
BEGIN runs before processing any lines. END runs after all lines.
# Print header before processing
awk 'BEGIN {print "=== Processing started ==="} {print $1} END {print "=== Done ==="}' file.txt
# Set variables in BEGIN
awk 'BEGIN {FS=","; OFS="\t"} {print $1, $2}' data.csv
# Calculate sum
awk '{sum += $1} END {print "Total:", sum}' numbers.txt
# Adds up first column, prints total at end
# Calculate average
awk '{sum += $1; count++} END {print "Average:", sum/count}' numbers.txt
# Count matching lines
awk '/error/ {count++} END {print "Errors:", count}' logfile.txt
# Statistics
awk 'BEGIN {min=999999; max=0}
{sum += $1; count++;
if ($1 > max) max = $1;
if ($1 < min) min = $1}
END {print "Min:", min, "Max:", max, "Avg:", sum/count}' numbers.txt
Variables and Operators
# Variables (no declaration needed)
awk '{total += $3} END {print total}' data.txt
# Arithmetic operators: +, -, *, /, %, ^
awk '{print $1, $2, $1 + $2}' numbers.txt
awk '{print $1, $2, $1 * $2}' numbers.txt
awk '{print $2 ^ 2}' numbers.txt # Exponent (square)
# Increment/decrement
awk '{count++; print count, $0}' file.txt
awk '{sum += $1} END {print sum}' numbers.txt
# String concatenation (just space)
awk '{name = $1 " " $2; print name}' file.txt
awk '{print $1 $2}' file.txt # No space = concatenate directly
# Assignment
awk '{doubled = $1 * 2; print $1, doubled}' numbers.txt
Conditional Statements (if/else)
# if statement
awk '{
if ($3 >= 90)
print $1, "A"
else if ($3 >= 80)
print $1, "B"
else if ($3 >= 70)
print $1, "C"
else
print $1, "F"
}' scores.txt
# Ternary operator
awk '{grade = ($3 >= 60) ? "Pass" : "Fail"; print $1, grade}' scores.txt
# Check field existence
awk '{
if (NF >= 3)
print $1, $3
else
print $1, "N/A"
}' data.txt
Loops
# for loop: Print all fields one per line
awk '{for (i=1; i<=NF; i++) print $i}' file.txt
# for loop with custom range
awk '{for (i=1; i<=10; i++) print i * $1}' numbers.txt
# while loop
awk '{
i = 1
while (i <= NF) {
print $i
i++
}
}' file.txt
# Loop through array (associative array)
awk '{words[$1]++} END {for (word in words) print word, words[word]}' file.txt
# Counts occurrences of first field
Arrays (Associative)
awk arrays are hash maps (key-value pairs):
# Count occurrences
awk '{count[$1]++} END {for (item in count) print item, count[item]}' data.txt
# Sum by category
awk '{sum[$1] += $2} END {for (cat in sum) print cat, sum[cat]}' data.txt
# Example: Count HTTP status codes
awk '{status[$9]++} END {for (code in status) print code, status[code]}' access.log
# Multi-dimensional arrays (using concatenation)
awk '{key = $1 "," $2; data[key] += $3} END {for (k in data) print k, data[k]}' file.txt
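Two details worth knowing when working with awk arrays: the in operator tests whether a key exists without creating it (simply reading count["pear"] would create an empty entry), and delete removes an entry:

```shell
awk 'BEGIN {
    count["apple"] = 3
    if ("apple" in count) print "apple:", count["apple"]
    if (!("pear" in count)) print "pear: not counted"
    delete count["apple"]
    print (("apple" in count) ? "still there" : "removed")
}'
# Output:
# apple: 3
# pear: not counted
# removed
```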
String Functions
# length(): String length
awk '{print length($0)}' file.txt # Line length
awk 'length($0) > 80 {print $0}' file.txt # Long lines
# toupper() / tolower()
awk '{print toupper($1)}' file.txt
awk '{print tolower($0)}' file.txt
# substr(string, start, length)
awk '{print substr($1, 1, 3)}' file.txt # First 3 chars
awk '{print substr($0, 5)}' file.txt # From position 5 to end
# index(string, substring): Find position
awk '{print index($0, "error")}' file.txt
# Returns position of "error" in line (0 if not found)
# split(string, array, delimiter): Split string
awk '{n = split($0, arr, ","); print arr[1], arr[n]}' file.txt
# Split by comma, print first and last
# gsub(regex, replacement, target): Global substitution
awk '{gsub(/old/, "new"); print}' file.txt
# Like sed s/old/new/g
awk '{gsub(/old/, "new", $2); print}' file.txt
# Replace only in field 2
# sub(regex, replacement, target): First substitution only
awk '{sub(/old/, "new"); print}' file.txt
# match(string, regex): Test if matches
awk '{if (match($0, /[0-9]+/)) print "Contains number"}' file.txt
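One more useful detail: sub() and gsub() return the number of substitutions they made, so gsub can double as a match counter (logfile.txt below is a placeholder name, as elsewhere in this tutorial):

```shell
echo "foo bar foo baz foo" | awk '{n = gsub(/foo/, "FOO"); print n, $0}'
# Output: 3 FOO bar FOO baz FOO
# Replacing a pattern with itself counts matches without changing the text:
awk '{total += gsub(/error/, "error")} END {print total+0}' logfile.txt
```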
Formatted Output with printf
printf gives precise control over output format:
# printf format: printf "format", var1, var2, ...
# Basic printf
awk '{printf "%s %d\n", $1, $2}' file.txt
# %s = string, %d = integer, \n = newline
# Fixed-width columns
awk '{printf "%-20s %10s %10s\n", $1, $2, $3}' file.txt
# %-20s = left-aligned, 20 chars wide
# %10s = right-aligned, 10 chars wide
# Decimal numbers
awk '{printf "%.2f\n", $1}' numbers.txt
# %.2f = 2 decimal places
# Format table with header
awk 'BEGIN {
printf "%-15s %10s %10s\n", "Name", "Sales", "Commission"
printf "%-15s %10s %10s\n", "----", "-----", "----------"
}
{
commission = $2 * 0.10
printf "%-15s %10.2f %10.2f\n", $1, $2, commission
total += commission
}
END {
printf "%-15s %10s %10.2f\n", "", "TOTAL:", total
}' sales.txt
# Format specifiers:
# %s - string
# %d - integer
# %f - floating point
# %.2f - float with 2 decimals
# %10s - string, 10 chars wide, right-aligned
# %-10s - string, 10 chars wide, left-aligned
# %010d - integer, 10 chars, zero-padded
Practical awk Examples
# Print specific columns from CSV
awk -F, '{print $1, $3}' data.csv
# Sum a column
awk -F, '{sum += $3} END {printf "Total: $%.2f\n", sum}' sales.csv
# Calculate average
awk '{sum += $1; n++} END {print sum/n}' numbers.txt
# Find min and max
awk 'NR==1 {min=max=$1}
{if ($1<min) min=$1; if ($1>max) max=$1}
END {print "Min:", min, "Max:", max}' numbers.txt
# Count unique values in column
awk '{count[$1]++} END {print length(count)}' data.txt
# Print lines longer than 80 characters
awk 'length > 80' file.txt
# Remove duplicate lines (preserving order)
awk '!seen[$0]++' file.txt
# Print every 3rd line
awk 'NR % 3 == 0' file.txt
# Transpose rows and columns (assumes every row has the same number of fields)
awk '{
for (i=1; i<=NF; i++) {
arr[i] = arr[i] ? arr[i] OFS $i : $i
}
}
END {
for (i=1; i<=NF; i++) print arr[i]
}' data.txt
# Print lines between two patterns
awk '/START/,/END/' file.txt
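Note that the range pattern above prints the START and END marker lines themselves. When you want only the lines between the markers, the usual variant is a flag variable:

```shell
# Exclusive version: /START/ turns the flag on (and skips the marker line),
# /END/ turns it off before the bare "flag" test is reached
awk '/START/ {flag=1; next} /END/ {flag=0} flag' file.txt
```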
# Skip first line (header)
awk 'NR > 1 {print $0}' data.csv
# Print with line numbers (like nl)
awk '{printf "%5d %s\n", NR, $0}' file.txt
# Extract field names from CSV header
awk -F, 'NR==1 {for(i=1;i<=NF;i++) print i, $i}' data.csv
# Convert CSV to JSON (simple)
awk -F, 'NR==1 {
for(i=1; i<=NF; i++) header[i] = $i
next
}
{
printf "{"
for(i=1; i<=NF; i++) {
printf "\"%s\":\"%s\"", header[i], $i
if (i < NF) printf ","
}
print "}"
}' data.csv
# Analyze web server log: Top 10 IPs
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
# Count HTTP status codes
awk '{status[$9]++} END {for (s in status) printf "%s: %d\n", s, status[s]}' access.log | sort
# Calculate response time statistics
awk '{
sum += $10
count++
if ($10 > max) max = $10
if (min == 0 || $10 < min) min = $10
}
END {
print "Requests:", count
print "Min:", min "ms"
print "Max:", max "ms"
print "Avg:", sum/count "ms"
}' response_times.log
Combining grep, sed, and awk
The real power emerges when you chain these tools:
# Find errors, extract timestamps, format as CSV
grep "ERROR" app.log | \
sed 's/\[//g; s/\]//g' | \
awk '{print $1 "," $2 "," $NF}'
# Process log: Filter, clean, analyze
grep "ERROR" app.log | \
sed 's/^.*\[//' | \
sed 's/\].*//' | \
awk '{hour = substr($2, 1, 2); count[hour]++} END {for (h in count) print h":00 -", count[h], "errors"}' | \
sort
# Find all function definitions, sort, number them
grep -rn "def " *.py | \
sed 's/:.*def /: /' | \
awk -F: '{printf "%3d. %s:%s\n", NR, $1, $2}'
# Clean and transform CSV data
grep -v "^#" data.csv | \
sed '/^$/d' | \
awk -F, '$3 > 1000 {print $1 "," $3}' | \
sort -t, -k2 -rn
# Find TODO comments with context
grep -rn "TODO" src/ | \
sed 's/:/ | /' | \
awk -F'|' '{printf "%-40s %s\n", $1, $2}'
# Analyze failed SSH attempts (the IP is field 11 in typical entries;
# the position shifts on "invalid user" lines, which carry extra words)
grep "Failed password" /var/log/auth.log | \
awk '{print $11}' | \
sort | uniq -c | sort -rn | \
awk '{printf "%15s: %d attempts\n", $2, $1}' | \
head -10
# Process access log: Top user agents
awk -F'"' '{print $6}' access.log | \
sed 's/^[[:space:]]*//' | \
sort | uniq -c | sort -rn | \
head -10
# Extract email addresses, clean, count domains
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt | \
awk -F@ '{print $2}' | \
sort | uniq -c | sort -rn
Exercises
Task 1: Find all lines in /etc/ssh/sshd_config that are actual configuration (not comments, not empty lines).
Task 2: Search your home directory for all shell scripts (files with .sh extension or #!/bin/bash shebang).
Task 3: In a log file, find all lines with ERROR or WARNING, show 2 lines of context before each match, and display line numbers.
Solution:
# Task 1: Active config lines
grep -Ev '^#|^$' /etc/ssh/sshd_config
# Task 2: Find shell scripts
find ~ -type f -name "*.sh"
# Or by shebang:
find ~ -type f -exec grep -l "^#!/bin/bash" {} \;
# Task 3: Errors with context and line numbers
grep -nB 2 -E "ERROR|WARNING" logfile.txt
Task 1: You have a file urls.txt with HTTP URLs. Convert all to HTTPS (in-place with backup).
Task 2: Given a file with "LastName, FirstName" format, convert to "FirstName LastName" format.
Task 3: Remove all trailing whitespace from all Python files in current directory (with backups).
Solution:
# Task 1: HTTP to HTTPS
sed -i.bak 's|http://|https://|g' urls.txt
# Task 2: Swap name format
sed -E 's/^([^,]+), *(.+)/\2 \1/' names.txt > names_fixed.txt
# Or in-place:
sed -i.bak -E 's/^([^,]+), *(.+)/\2 \1/' names.txt
# Task 3: Remove trailing whitespace from Python files
sed -i.bak 's/[[:space:]]*$//' *.py
Given employees.csv:
Name,Department,Salary
Alice,Engineering,95000
Bob,Marketing,72000
Carol,Engineering,88000
Dave,Marketing,68000
Eve,Engineering,102000
Frank,Sales,75000
Task 1: Print names and salaries of Engineering employees only.
Task 2: Calculate average salary per department.
Task 3: Find the highest-paid employee.
Task 4: Generate a report showing each employee's name and what percentage their salary is of the maximum salary.
Solution:
# Task 1: Engineering employees
awk -F, 'NR > 1 && $2 == "Engineering" {printf "%-10s $%d\n", $1, $3}' employees.csv
# Task 2: Average salary by department
awk -F, 'NR > 1 {
sum[$2] += $3
count[$2]++
}
END {
for (dept in sum) {
printf "%-15s $%.2f\n", dept, sum[dept]/count[dept]
}
}' employees.csv | sort
# Task 3: Highest-paid employee
awk -F, 'NR > 1 {
if ($3 > max) {
max = $3
name = $1
dept = $2
}
}
END {
printf "Highest paid: %s (%s) - $%d\n", name, dept, max
}' employees.csv
# Task 4: Salary as percentage of maximum
awk -F, 'NR > 1 {
salary[$1] = $3
if ($3 > max) max = $3
}
END {
for (name in salary) {
pct = (salary[name] / max) * 100
printf "%-10s $%-6d (%5.1f%% of max)\n", name, salary[name], pct
}
}' employees.csv | sort -t'$' -k2 -rn
Task: Analyze an Apache/Nginx access log to generate a report showing:
- Total requests
- Unique visitors
- Top 5 most requested pages
- Top 5 IP addresses by request count
- Count of each HTTP status code
Access log format:
192.168.1.100 - - [11/Feb/2026:10:30:45 +0000] "GET /index.html HTTP/1.1" 200 1234
Solution:
#!/bin/bash
# analyze_access_log.sh
logfile="$1"
echo "=== Access Log Analysis ==="
echo ""
# Total requests
echo "1. Total Requests:"
wc -l < "$logfile"
echo ""
# Unique visitors
echo "2. Unique Visitors:"
awk '{print $1}' "$logfile" | sort -u | wc -l
echo ""
# Top 5 pages
echo "3. Top 5 Most Requested Pages:"
awk '{print $7}' "$logfile" | sort | uniq -c | sort -rn | head -5
echo ""
# Top 5 IPs
echo "4. Top 5 IP Addresses:"
awk '{print $1}' "$logfile" | sort | uniq -c | sort -rn | head -5 | \
awk '{printf "%15s: %d requests\n", $2, $1}'
echo ""
# HTTP status codes
echo "5. HTTP Status Codes:"
awk '{print $9}' "$logfile" | sort | uniq -c | sort -rn | \
awk '{printf " %s: %d\n", $2, $1}'
Alternative one-liner approach:
# All in one pipeline
cat access.log | tee \
>(wc -l | xargs echo "Total requests:") \
>(awk '{print $1}' | sort -u | wc -l | xargs echo "Unique visitors:") \
>(awk '{print $9}' | sort | uniq -c | sort -rn) \
> /dev/null
Summary
You now have a working command of the three essential text-processing tools:

grep — Search:
- Basic: grep pattern file
- Recursive: grep -r pattern dir/
- Extended regex: grep -E 'pattern1|pattern2'
- Context: grep -C 2 pattern file
- Invert: grep -v pattern file
- Use for: Finding patterns, filtering pipelines

sed — Transform:
- Substitute: sed 's/old/new/g' file
- Delete: sed '/pattern/d' file
- In-place: sed -i.bak 's/old/new/g' file
- Address range: sed '1,10s/old/new/g' file
- Capture groups: sed -E 's/(.*)-(.*)/\2-\1/' file
- Use for: Find-and-replace, text transformation, file editing

awk — Analyze:
- Print fields: awk '{print $1, $3}' file
- Field separator: awk -F, '{print $1}' file.csv
- Pattern-action: awk '$3 > 50 {print $1}' file
- Sum/average: awk '{sum+=$1} END {print sum/NR}'
- Arrays: awk '{count[$1]++} END {for (i in count) print i, count[i]}'
- Use for: Column processing, calculations, reports
Combining them:
grep pattern file | sed 's/old/new/g' | awk '{print $1, $3}' | sort -u
Next steps:
- Practice on real log files
- Build text processing pipelines
- Automate report generation
- Learn more regex patterns
In the next tutorial, you'll learn process management—controlling running programs, background jobs, and system services.
Written by the ShellRAG Team
The ShellRAG editorial team writes practical, beginner-friendly Bash Shell tutorials with tested code examples and real-world use cases. Every article is technically reviewed for accuracy and updated regularly.