AWK — Pattern Processing Language
Summary: in this tutorial, you will learn to master awk for structured text processing: field extraction, patterns, built-in variables, and combining grep, sed, and awk in pipelines.
While grep finds lines and sed transforms them, AWK is a complete pattern-processing language designed for structured text. It excels at extracting and manipulating data organized in fields and records — think CSV files, log entries, command output, and tabular data.
Why awk matters:
- Field processing: Built for tabular data (columns/fields)
- Pattern-action: Elegant syntax for conditional processing
- Built-in variables: Automatic line counting, field splitting
- Programming features: Variables, arrays, functions, math
- Perfect for: Log analysis, CSV processing, report generation, data transformation
How awk Works
awk processes input line by line with this model:
- Read a line (called a "record")
- Split into fields (columns) separated by delimiter
- Check pattern — does this line match?
- Execute action — if pattern matches, run the code
# Basic syntax: awk 'pattern { action }' file
# Print entire line
awk '{print $0}' file.txt
# $0 = the whole line (same as cat)
# Print first field
echo "Alice 30 Engineer" | awk '{print $1}'
# Output: Alice
# $1 = first field (default delimiter: whitespace)
# Print multiple fields
echo "Alice 30 Engineer" | awk '{print $1, $3}'
# Output: Alice Engineer
# Comma adds space between fields
# Print fields in different order
echo "Alice 30 Engineer" | awk '{print $3, $1}'
# Output: Engineer Alice
# Print with custom text
echo "Alice 30 Engineer" | awk '{print "Name:", $1, "Age:", $2}'
# Output: Name: Alice Age: 30
Field numbering:
# Fields: $1, $2, $3, ...
# $0 = entire line
# $NF = last field
# $(NF-1) = second-to-last field
echo "one two three four" | awk '{print $NF}'
# Output: four
echo "one two three four" | awk '{print $(NF-1)}'
# Output: three
echo "one two three four" | awk '{print $2, $NF}'
# Output: two four
Field Separator (-F option)
By default, awk splits on whitespace. Change with -F:
# CSV file (comma-separated)
echo "Alice,30,Engineer" | awk -F, '{print $1}'
# Output: Alice
echo "Alice,30,Engineer" | awk -F, '{print $2, $3}'
# Output: 30 Engineer
# Colon-separated (like /etc/passwd)
awk -F: '{print $1, $7}' /etc/passwd
# Print username and shell
# Tab-separated
awk -F'\t' '{print $1, $3}' data.tsv
# Multiple-character separator
echo "name::value" | awk -F:: '{print $2}'
# Output: value
# Regex separator (spaces or commas)
echo "Alice, 30, Engineer" | awk -F'[, ]+' '{print $1, $3}'
# Output: Alice Engineer
Output Field Separator (OFS)
Control how output fields are separated:
# Default OFS is a space
echo "Alice,30,Engineer" | awk -F, '{print $1, $2, $3}'
# Output: Alice 30 Engineer (spaces between)
# Set custom OFS
echo "Alice,30,Engineer" | awk -F, 'BEGIN{OFS="\t"} {print $1, $2, $3}'
# Output: Alice 30 Engineer (tabs between)
# Convert CSV to pipe-delimited
awk -F, 'BEGIN{OFS="|"} {print $1, $2, $3}' data.csv
# Convert to TSV
awk -F, 'BEGIN{OFS="\t"} {$1=$1; print}' data.csv
# $1=$1 forces awk to rebuild $0 with new OFS
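Why the $1=$1 trick is needed: awk only rejoins $0 with OFS when some field is assigned. A quick side-by-side comparison:

```shell
# print alone does not rebuild the record, so the original separators survive:
echo "a,b,c" | awk -F, 'BEGIN{OFS="|"} {print}'
# Output: a,b,c
# Assigning any field (even to itself) forces awk to rejoin $0 using OFS:
echo "a,b,c" | awk -F, 'BEGIN{OFS="|"} {$1=$1; print}'
# Output: a|b|c
```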
Patterns: When to Execute Actions
# No pattern = match all lines
awk '{print $1}' file.txt
# Pattern: Regex match
awk '/error/ {print $0}' logfile.txt
# Print lines containing "error" (like grep)
# Pattern: NOT matching
awk '!/comment/ {print $0}' file.txt
# Print lines NOT containing "comment"
# Pattern: Field comparison
awk '$3 > 50 {print $1, $3}' scores.txt
# Print name and score where score > 50
# Numeric comparisons: ==, !=, <, <=, >, >=
awk '$2 == 100 {print $1}' data.txt
awk '$3 <= 50 {print $0}' data.txt
awk '$4 != 0 {print $0}' data.txt
# String comparison
awk '$1 == "Alice" {print $0}' users.txt
awk '$2 != "admin" {print $0}' users.txt
# Regex match on specific field
awk '$2 ~ /^A/ {print $0}' file.txt
# Field 2 starts with A (~ means "matches")
awk '$3 !~ /test/ {print $0}' file.txt
# Field 3 does NOT contain "test"
# Multiple conditions with && (AND) and || (OR)
awk '$2 > 50 && $3 < 100 {print $0}' data.txt
awk '$1 == "Alice" || $1 == "Bob" {print $0}' users.txt
# Line number conditions (NR = line number)
awk 'NR == 1 {print "Header:", $0}' file.txt
# First line only
awk 'NR > 1 {print $0}' data.csv
# Skip header (all lines except first)
awk 'NR >= 10 && NR <= 20 {print $0}' file.txt
# Lines 10-20
# Number of fields condition
awk 'NF > 3 {print $0}' file.txt
# Lines with more than 3 fields
awk 'NF == 0 {print "Empty line at", NR}' file.txt
# Find empty lines
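To see several of these patterns working against the same data, here is a throwaway sample file (the /tmp path and its contents are made up for the demo):

```shell
# Build a small sample: name score role
printf 'Alice 92 admin\nBob 45 user\nCarol 78 user\n' > /tmp/demo.txt

awk '$2 > 50 {print $1}' /tmp/demo.txt
# Alice
# Carol
awk '$3 == "user" && $2 < 60 {print $1}' /tmp/demo.txt
# Bob
awk '$3 ~ /^a/ {print $1, "is an", $3}' /tmp/demo.txt
# Alice is an admin
```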
Built-in Variables
| Variable | Description | Example |
|---|---|---|
| $0 | Entire current line | {print $0} |
| $1, $2, ... | Individual fields | {print $1, $3} |
| NR | Current line number (across all input) | {print NR, $0} |
| NF | Number of fields in current line | {print "Fields:", NF} |
| FNR | Line number within the current file | FNR == 1 {print FILENAME} |
| FS | Field separator (input) | BEGIN{FS=","} |
| OFS | Output field separator | BEGIN{OFS="\t"} |
| RS | Record separator (default: newline) | BEGIN{RS=";"} |
| ORS | Output record separator | BEGIN{ORS="\n\n"} |
| FILENAME | Name of current input file | {print FILENAME, $0} |
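NR keeps counting across all input files, while FNR restarts at 1 for each file. A quick way to see the difference (the temp files are created just for the demo):

```shell
printf 'a\nb\n' > /tmp/f1.txt
printf 'c\n'    > /tmp/f2.txt
awk '{print FILENAME, NR, FNR}' /tmp/f1.txt /tmp/f2.txt
# /tmp/f1.txt 1 1
# /tmp/f1.txt 2 2
# /tmp/f2.txt 3 1
# FNR > 1 therefore skips the first line of EVERY file -- handy for headers:
awk 'FNR > 1' /tmp/f1.txt /tmp/f2.txt
# b
```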
# Print line numbers
awk '{print NR, $0}' file.txt
# 1 first line
# 2 second line
# Print line and field count
awk '{print "Line", NR, "has", NF, "fields"}' file.txt
# Print last field on each line
awk '{print $NF}' file.txt
# Print second-to-last field
awk '{print $(NF-1)}' file.txt
# Print filename with each line
awk '{print FILENAME ":", $0}' file1.txt file2.txt
BEGIN and END Blocks
BEGIN runs before processing any lines. END runs after all lines.
# Print header before processing
awk 'BEGIN {print "=== Processing started ==="} {print $1} END {print "=== Done ==="}' file.txt
# Set variables in BEGIN
awk 'BEGIN {FS=","; OFS="\t"} {print $1, $2}' data.csv
# Calculate sum
awk '{sum += $1} END {print "Total:", sum}' numbers.txt
# Adds up first column, prints total at end
# Calculate average
awk '{sum += $1; count++} END {print "Average:", sum/count}' numbers.txt
# Count matching lines
awk '/error/ {count++} END {print "Errors:", count}' logfile.txt
# Statistics
awk 'BEGIN {min=999999; max=0}
{sum += $1; count++;
if ($1 > max) max = $1;
if ($1 < min) min = $1}
END {print "Min:", min, "Max:", max, "Avg:", sum/count}' numbers.txt
Variables and Operators
# Variables (no declaration needed)
awk '{total += $3} END {print total}' data.txt
# Arithmetic operators: +, -, *, /, %, ^
awk '{print $1, $2, $1 + $2}' numbers.txt
awk '{print $1, $2, $1 * $2}' numbers.txt
awk '{print $2 ^ 2}' numbers.txt # Exponent (square)
# Increment/decrement
awk '{count++; print count, $0}' file.txt
awk '{sum += $1} END {print sum}' numbers.txt
# String concatenation (just space)
awk '{name = $1 " " $2; print name}' file.txt
awk '{print $1 $2}' file.txt # No space = concatenate directly
# Assignment
awk '{doubled = $1 * 2; print $1, doubled}' numbers.txt
Conditional Statements (if/else)
# if statement
awk '{
if ($3 >= 90)
print $1, "A"
else if ($3 >= 80)
print $1, "B"
else if ($3 >= 70)
print $1, "C"
else
print $1, "F"
}' scores.txt
# Ternary operator
awk '{grade = ($3 >= 60) ? "Pass" : "Fail"; print $1, grade}' scores.txt
# Check field existence
awk '{
if (NF >= 3)
print $1, $3
else
print $1, "N/A"
}' data.txt
Loops
# for loop: Print all fields one per line
awk '{for (i=1; i<=NF; i++) print $i}' file.txt
# for loop with custom range
awk '{for (i=1; i<=10; i++) print i * $1}' numbers.txt
# while loop
awk '{
i = 1
while (i <= NF) {
print $i
i++
}
}' file.txt
# Loop through array (associative array)
awk '{words[$1]++} END {for (word in words) print word, words[word]}' file.txt
# Counts occurrences of first field
Arrays (Associative)
awk arrays are hash maps (key-value pairs):
# Count occurrences
awk '{count[$1]++} END {for (item in count) print item, count[item]}' data.txt
# Sum by category
awk '{sum[$1] += $2} END {for (cat in sum) print cat, sum[cat]}' data.txt
# Example: Count HTTP status codes
awk '{status[$9]++} END {for (code in status) print code, status[code]}' access.log
# Multi-dimensional arrays (using concatenation)
awk '{key = $1 "," $2; data[key] += $3} END {for (k in data) print k, data[k]}' file.txt
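Two details worth knowing when working with awk arrays: the in operator tests whether a key exists without creating it (simply reading count["pear"] would create an empty entry), and delete removes an entry:

```shell
awk 'BEGIN {
    count["apple"] = 3
    if ("apple" in count) print "apple:", count["apple"]
    if (!("pear" in count)) print "pear: not counted"
    delete count["apple"]
    print (("apple" in count) ? "still there" : "removed")
}'
# Output:
# apple: 3
# pear: not counted
# removed
```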
String Functions
# length(): String length
awk '{print length($0)}' file.txt # Line length
awk 'length($0) > 80 {print $0}' file.txt # Long lines
# toupper() / tolower()
awk '{print toupper($1)}' file.txt
awk '{print tolower($0)}' file.txt
# substr(string, start, length)
awk '{print substr($1, 1, 3)}' file.txt # First 3 chars
awk '{print substr($0, 5)}' file.txt # From position 5 to end
# index(string, substring): Find position
awk '{print index($0, "error")}' file.txt
# Returns position of "error" in line (0 if not found)
# split(string, array, delimiter): Split string
awk '{n = split($0, arr, ","); print arr[1], arr[n]}' file.txt
# Split by comma, print first and last
# gsub(regex, replacement, target): Global substitution
awk '{gsub(/old/, "new"); print}' file.txt
# Like sed s/old/new/g
awk '{gsub(/old/, "new", $2); print}' file.txt
# Replace only in field 2
# sub(regex, replacement, target): First substitution only
awk '{sub(/old/, "new"); print}' file.txt
# match(string, regex): Test if matches
awk '{if (match($0, /[0-9]+/)) print "Contains number"}' file.txt
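One more useful detail: sub() and gsub() return the number of substitutions they made, so gsub can double as a match counter (logfile.txt below is a placeholder name, as elsewhere in this tutorial):

```shell
echo "foo bar foo baz foo" | awk '{n = gsub(/foo/, "FOO"); print n, $0}'
# Output: 3 FOO bar FOO baz FOO
# Replacing a pattern with itself counts matches without changing the text:
awk '{total += gsub(/error/, "error")} END {print total+0}' logfile.txt
```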
Formatted Output with printf
printf gives precise control over output format:
# printf format: printf "format", var1, var2, ...
# Basic printf
awk '{printf "%s %d\n", $1, $2}' file.txt
# %s = string, %d = integer, \n = newline
# Fixed-width columns
awk '{printf "%-20s %10s %10s\n", $1, $2, $3}' file.txt
# %-20s = left-aligned, 20 chars wide
# %10s = right-aligned, 10 chars wide
# Decimal numbers
awk '{printf "%.2f\n", $1}' numbers.txt
# %.2f = 2 decimal places
# Format table with header
awk 'BEGIN {
printf "%-15s %10s %10s\n", "Name", "Sales", "Commission"
printf "%-15s %10s %10s\n", "----", "-----", "----------"
}
{
commission = $2 * 0.10
printf "%-15s %10.2f %10.2f\n", $1, $2, commission
total += commission
}
END {
printf "%-15s %10s %10.2f\n", "", "TOTAL:", total
}' sales.txt
# Format specifiers:
# %s - string
# %d - integer
# %f - floating point
# %.2f - float with 2 decimals
# %10s - string, 10 chars wide, right-aligned
# %-10s - string, 10 chars wide, left-aligned
# %010d - integer, 10 chars, zero-padded
Practical awk Examples
# Print specific columns from CSV
awk -F, '{print $1, $3}' data.csv
# Sum a column
awk -F, '{sum += $3} END {printf "Total: $%.2f\n", sum}' sales.csv
# Calculate average
awk '{sum += $1; n++} END {print sum/n}' numbers.txt
# Find min and max
awk 'NR==1 {min=max=$1}
{if ($1<min) min=$1; if ($1>max) max=$1}
END {print "Min:", min, "Max:", max}' numbers.txt
# Count unique values in column
awk '{count[$1]++} END {print length(count)}' data.txt
# Print lines longer than 80 characters
awk 'length > 80' file.txt
# Remove duplicate lines (preserving order)
awk '!seen[$0]++' file.txt
# Print every 3rd line
awk 'NR % 3 == 0' file.txt
# Transpose rows and columns (assumes every row has the same number of fields)
awk '{
for (i=1; i<=NF; i++) {
arr[i] = arr[i] ? arr[i] OFS $i : $i
}
}
END {
for (i=1; i<=NF; i++) print arr[i]
}' data.txt
# Print lines between two patterns
awk '/START/,/END/' file.txt
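Note that the range pattern above prints the START and END marker lines themselves. When you want only the lines between the markers, the usual variant is a flag variable:

```shell
# Exclusive version: /START/ turns the flag on (and skips the marker line),
# /END/ turns it off before the bare "flag" test is reached
awk '/START/ {flag=1; next} /END/ {flag=0} flag' file.txt
```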
# Skip first line (header)
awk 'NR > 1 {print $0}' data.csv
# Print with line numbers (like nl)
awk '{printf "%5d %s\n", NR, $0}' file.txt
# Extract field names from CSV header
awk -F, 'NR==1 {for(i=1;i<=NF;i++) print i, $i}' data.csv
# Convert CSV to JSON (simple)
awk -F, 'NR==1 {
for(i=1; i<=NF; i++) header[i] = $i
next
}
{
printf "{"
for(i=1; i<=NF; i++) {
printf "\"%s\":\"%s\"", header[i], $i
if (i < NF) printf ","
}
print "}"
}' data.csv
# Analyze web server log: Top 10 IPs
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
# Count HTTP status codes
awk '{status[$9]++} END {for (s in status) printf "%s: %d\n", s, status[s]}' access.log | sort
# Calculate response time statistics
awk '{
sum += $10
count++
if ($10 > max) max = $10
if (min == 0 || $10 < min) min = $10
}
END {
print "Requests:", count
print "Min:", min "ms"
print "Max:", max "ms"
print "Avg:", sum/count "ms"
}' response_times.log
Combining grep, sed, and awk
The real power emerges when you chain these tools:
# Find errors, extract timestamps, format as CSV
grep "ERROR" app.log | \
sed 's/\[//g; s/\]//g' | \
awk '{print $1 "," $2 "," $NF}'
# Process log: Filter, clean, analyze
grep "ERROR" app.log | \
sed 's/^.*\[//' | \
sed 's/\].*//' | \
awk '{hour = substr($2, 1, 2); count[hour]++} END {for (h in count) print h":00 -", count[h], "errors"}' | \
sort
# Find all function definitions, sort, number them
grep -rn "def " *.py | \
sed 's/:.*def /: /' | \
awk -F: '{printf "%3d. %s:%s\n", NR, $1, $2}'
# Clean and transform CSV data
grep -v "^#" data.csv | \
sed '/^$/d' | \
awk -F, '$3 > 1000 {print $1 "," $3}' | \
sort -t, -k2 -rn
# Find TODO comments with context
grep -rn "TODO" src/ | \
sed 's/:/ | /' | \
awk -F'|' '{printf "%-40s %s\n", $1, $2}'
# Analyze failed SSH attempts (the IP is field 11 in typical entries;
# the position shifts on "invalid user" lines, which carry extra words)
grep "Failed password" /var/log/auth.log | \
awk '{print $11}' | \
sort | uniq -c | sort -rn | \
awk '{printf "%15s: %d attempts\n", $2, $1}' | \
head -10
# Process access log: Top user agents
awk -F'"' '{print $6}' access.log | \
sed 's/^[[:space:]]*//' | \
sort | uniq -c | sort -rn | \
head -10
# Extract email addresses, clean, count domains
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt | \
awk -F@ '{print $2}' | \
sort | uniq -c | sort -rn
Exercises
Task 1: Find all lines in /etc/ssh/sshd_config that are actual configuration (not comments, not empty lines).
Task 2: Search your home directory for all shell scripts (files with .sh extension or #!/bin/bash shebang).
Task 3: In a log file, find all lines with ERROR or WARNING, show 2 lines of context before each match, and display line numbers.
Solution:
# Task 1: Active config lines
grep -Ev '^#|^$' /etc/ssh/sshd_config
# Task 2: Find shell scripts
find ~ -type f -name "*.sh"
# Or by shebang:
find ~ -type f -exec grep -l "^#!/bin/bash" {} \;
# Task 3: Errors with context and line numbers
grep -nB 2 -E "ERROR|WARNING" logfile.txt
Task 1: You have a file urls.txt with HTTP URLs. Convert all to HTTPS (in-place with backup).
Task 2: Given a file with "LastName, FirstName" format, convert to "FirstName LastName" format.
Task 3: Remove all trailing whitespace from all Python files in current directory (with backups).
Solution:
# Task 1: HTTP to HTTPS
sed -i.bak 's|http://|https://|g' urls.txt
# Task 2: Swap name format
sed -E 's/^([^,]+), *(.+)/\2 \1/' names.txt > names_fixed.txt
# Or in-place:
sed -i.bak -E 's/^([^,]+), *(.+)/\2 \1/' names.txt
# Task 3: Remove trailing whitespace from Python files
sed -i.bak 's/[[:space:]]*$//' *.py
Given employees.csv:
Name,Department,Salary
Alice,Engineering,95000
Bob,Marketing,72000
Carol,Engineering,88000
Dave,Marketing,68000
Eve,Engineering,102000
Frank,Sales,75000
Task 1: Print names and salaries of Engineering employees only.
Task 2: Calculate average salary per department.
Task 3: Find the highest-paid employee.
Task 4: Generate a report showing each employee's name and what percentage their salary is of the maximum salary.
Solution:
# Task 1: Engineering employees
awk -F, 'NR > 1 && $2 == "Engineering" {printf "%-10s $%d\n", $1, $3}' employees.csv
# Task 2: Average salary by department
awk -F, 'NR > 1 {
sum[$2] += $3
count[$2]++
}
END {
for (dept in sum) {
printf "%-15s $%.2f\n", dept, sum[dept]/count[dept]
}
}' employees.csv | sort
# Task 3: Highest-paid employee
awk -F, 'NR > 1 {
if ($3 > max) {
max = $3
name = $1
dept = $2
}
}
END {
printf "Highest paid: %s (%s) - $%d\n", name, dept, max
}' employees.csv
# Task 4: Salary as percentage of maximum
awk -F, 'NR > 1 {
salary[$1] = $3
if ($3 > max) max = $3
}
END {
for (name in salary) {
pct = (salary[name] / max) * 100
printf "%-10s $%-6d (%5.1f%% of max)\n", name, salary[name], pct
}
}' employees.csv | sort -t'$' -k2 -rn
Task: Analyze an Apache/Nginx access log to generate a report showing:
- Total requests
- Unique visitors
- Top 5 most requested pages
- Top 5 IP addresses by request count
- Count of each HTTP status code
Access log format:
192.168.1.100 - - [11/Feb/2026:10:30:45 +0000] "GET /index.html HTTP/1.1" 200 1234
Solution:
#!/bin/bash
# analyze_access_log.sh
logfile="$1"
echo "=== Access Log Analysis ==="
echo ""
# Total requests
echo "1. Total Requests:"
wc -l < "$logfile"
echo ""
# Unique visitors
echo "2. Unique Visitors:"
awk '{print $1}' "$logfile" | sort -u | wc -l
echo ""
# Top 5 pages
echo "3. Top 5 Most Requested Pages:"
awk '{print $7}' "$logfile" | sort | uniq -c | sort -rn | head -5
echo ""
# Top 5 IPs
echo "4. Top 5 IP Addresses:"
awk '{print $1}' "$logfile" | sort | uniq -c | sort -rn | head -5 | \
awk '{printf "%15s: %d requests\n", $2, $1}'
echo ""
# HTTP status codes
echo "5. HTTP Status Codes:"
awk '{print $9}' "$logfile" | sort | uniq -c | sort -rn | \
awk '{printf " %s: %d\n", $2, $1}'
Alternative one-liner approach:
# All in one pipeline
cat access.log | tee \
>(wc -l | xargs echo "Total requests:") \
>(awk '{print $1}' | sort -u | wc -l | xargs echo "Unique visitors:") \
>(awk '{print $9}' | sort | uniq -c | sort -rn) \
> /dev/null
Summary
You now have a working command of the three essential text-processing tools:

grep — Search:
- Basic: grep pattern file
- Recursive: grep -r pattern dir/
- Extended regex: grep -E 'pattern1|pattern2'
- Context: grep -C 2 pattern file
- Invert: grep -v pattern file
- Use for: Finding patterns, filtering pipelines

sed — Transform:
- Substitute: sed 's/old/new/g' file
- Delete: sed '/pattern/d' file
- In-place: sed -i.bak 's/old/new/g' file
- Address range: sed '1,10s/old/new/g' file
- Capture groups: sed -E 's/(.*)-(.*)/\2-\1/' file
- Use for: Find-and-replace, text transformation, file editing

awk — Analyze:
- Print fields: awk '{print $1, $3}' file
- Field separator: awk -F, '{print $1}' file.csv
- Pattern-action: awk '$3 > 50 {print $1}' file
- Sum/average: awk '{sum+=$1} END {print sum/NR}'
- Arrays: awk '{count[$1]++} END {for (i in count) print i, count[i]}'
- Use for: Column processing, calculations, reports
Combining them:
grep pattern file | sed 's/old/new/g' | awk '{print $1, $3}' | sort -u
Next steps:
- Practice on real log files
- Build text processing pipelines
- Automate report generation
- Learn more regex patterns
In the next tutorial, you'll learn process management—controlling running programs, background jobs, and system services.
Written by the ShellRAG Team
The ShellRAG editorial team writes practical, beginner-friendly Bash Shell tutorials with tested code examples and real-world use cases. Every article is technically reviewed for accuracy and updated regularly.