Advanced Real-World Scripts

Summary: in this tutorial, you will learn how to build a backup system with intelligent rotation and a web server log analyzer, two complete production scripts built around real-world patterns.


Building on the simpler utility scripts from the previous tutorial, these advanced projects tackle more complex real-world challenges: automated backups with intelligent rotation and a complete web server log analyzer. Each script demonstrates professional patterns you will use in production environments.

4. Backup with Intelligent Rotation

The Problem: Daily backups fill up your disk. Manual cleanup is tedious and error-prone. You need retention policies: keep daily backups for 7 days, weekly for 4 weeks, monthly for 12 months.

The Solution: Automated backup with grandfather-father-son (GFS) rotation.

#!/usr/bin/env bash
set -euo pipefail
 
# backup_rotate.sh — Backup with intelligent GFS rotation
#
# RETENTION POLICY:
#   Daily:   7 backups (1 week)
#   Weekly:  4 backups (1 month)  — Saved on Sundays
#   Monthly: 12 backups (1 year)  — Saved on 1st of month
#
# USAGE:
#   ./backup_rotate.sh -s SOURCE -n NAME [-b BASE_DIR]
#
# OPTIONS:
#   -s SOURCE    Directory to backup
#   -n NAME      Backup name (identifier)
#   -b BASE      Base backup directory (default: /backup)
#   -c           Compress with gzip
#   -h           Show help
#
# EXAMPLES:
#   ./backup_rotate.sh -s /var/www -n website -c
#   ./backup_rotate.sh -s /home/alice -n home -b /mnt/backups
#
# CRON:
#   # Daily backup at 2 AM
#   0 2 * * * /usr/local/bin/backup_rotate.sh -s /var/www -n website -c
 
readonly SCRIPT_NAME="$(basename "$0")"
BACKUP_BASE="/backup"
LOGFILE=""
COMPRESS=false
 
# Exit codes
readonly E_SUCCESS=0
readonly E_MISSING_ARGS=2
readonly E_SOURCE_NOT_FOUND=3
readonly E_BACKUP_FAILED=4
 
log() {
    local msg="[$(date '+%Y-%m-%d %H:%M:%S')] $*"
    echo "$msg"
    # Inverted test so the function returns 0 (and set -e stays quiet)
    # when no log file has been configured yet
    [[ -z "$LOGFILE" ]] || echo "$msg" >> "$LOGFILE"
}

error() {
    echo "[ERROR] $*" >&2
    [[ -z "$LOGFILE" ]] || echo "[ERROR] $*" >> "$LOGFILE"
}

die() {
    error "$1"
    exit "${2:-1}"
}
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME -s SOURCE -n NAME [OPTIONS]
 
Create backups with intelligent GFS rotation policy.
 
REQUIRED:
    -s SOURCE    Directory to backup
    -n NAME      Backup identifier (e.g., 'website', 'database')
 
OPTIONS:
    -b BASE      Base backup directory (default: /backup)
    -c           Compress backups with gzip
    -h           Show this help
 
RETENTION POLICY:
    Daily:   7 backups (last 7 days)
    Weekly:  4 backups (saved on Sundays)
    Monthly: 12 backups (saved on 1st of month)
 
EXAMPLES:
    $SCRIPT_NAME -s /var/www/html -n website -c
    $SCRIPT_NAME -s /home/user -n home_backup -b /mnt/external
 
CRON EXAMPLE:
    # Daily backup at 2 AM with compression
    0 2 * * * $SCRIPT_NAME -s /var/www -n website -c 2>&1 | logger -t backup
 
DIRECTORY STRUCTURE:
    /backup/
      └── NAME/
          ├── daily/      # Last 7 days
          ├── weekly/     # Last 4 weeks (Sundays)
          └── monthly/    # Last 12 months (1st of month)
EOF
}
 
# Parse arguments
SOURCE=""
NAME=""
 
while getopts ":s:n:b:ch" opt; do
    case $opt in
        s) SOURCE="$OPTARG" ;;
        n) NAME="$OPTARG" ;;
        b) BACKUP_BASE="$OPTARG" ;;
        c) COMPRESS=true ;;
        h) usage; exit $E_SUCCESS ;;
        :) die "-$OPTARG requires an argument" $E_MISSING_ARGS ;;
        \?) die "Unknown option: -$OPTARG (use -h for help)" $E_MISSING_ARGS ;;
    esac
done
 
# Validate required arguments
[[ -z "$SOURCE" ]] && die "Source directory is required (-s)" $E_MISSING_ARGS
[[ -z "$NAME" ]] && die "Backup name is required (-n)" $E_MISSING_ARGS
[[ -d "$SOURCE" ]] || die "Source directory not found: $SOURCE" $E_SOURCE_NOT_FOUND
 
# Setup logging
LOGFILE="$BACKUP_BASE/$NAME/backup.log"
mkdir -p "$(dirname "$LOGFILE")" 2>/dev/null || true
 
log "=========================================="
log "Backup started"
log "Source: $SOURCE"
log "Name: $NAME"
log "Base: $BACKUP_BASE"
log "Compress: $COMPRESS"
log "=========================================="
 
# Create directory structure
DAILY="$BACKUP_BASE/$NAME/daily"
WEEKLY="$BACKUP_BASE/$NAME/weekly"
MONTHLY="$BACKUP_BASE/$NAME/monthly"
 
for dir in "$DAILY" "$WEEKLY" "$MONTHLY"; do
    if ! mkdir -p "$dir"; then
        die "Cannot create directory: $dir" $E_BACKUP_FAILED
    fi
done
 
# Date information
DATE=$(date +%Y%m%d_%H%M%S)
DAY_OF_WEEK=$(date +%u)    # 1=Monday, 7=Sunday
DAY_OF_MONTH=$(date +%d)
 
# Determine file extension and tar flags
if $COMPRESS; then
    EXT=".tar.gz"
    TAR_FLAGS="czf"
else
    EXT=".tar"
    TAR_FLAGS="cf"
fi

# Create daily backup
BACKUP_FILE="$DAILY/${NAME}_${DATE}${EXT}"
log "Creating daily backup: $BACKUP_FILE"

# Perform backup. Do not pipe tar into a grep filter here: a filter that
# selects no lines exits non-zero and would report a failure even when
# tar created the archive successfully.
if ! tar "$TAR_FLAGS" "$BACKUP_FILE" -C "$(dirname "$SOURCE")" "$(basename "$SOURCE")"; then
    die "Backup failed!" $E_BACKUP_FAILED
fi
 
# Verify backup was created
if [[ ! -f "$BACKUP_FILE" ]]; then
    die "Backup file was not created: $BACKUP_FILE" $E_BACKUP_FAILED
fi
 
# Report size
SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
log "Backup created successfully: $SIZE"
 
# Copy to weekly on Sundays
if [[ "$DAY_OF_WEEK" -eq 7 ]]; then
    WEEKLY_FILE="$WEEKLY/${NAME}_week_${DATE}${EXT}"
    cp "$BACKUP_FILE" "$WEEKLY_FILE"
    log "Weekly backup saved: $(basename "$WEEKLY_FILE")"
fi
 
# Copy to monthly on the 1st
if [[ "$DAY_OF_MONTH" == "01" ]]; then
    MONTHLY_FILE="$MONTHLY/${NAME}_month_${DATE}${EXT}"
    cp "$BACKUP_FILE" "$MONTHLY_FILE"
    log "Monthly backup saved: $(basename "$MONTHLY_FILE")"
fi
 
# Rotation: remove old backups
log "Cleaning up old backups..."
 
# Daily: keep 7 days
daily_removed=$(find "$DAILY" -name "*${EXT}" -mtime +7 -delete -print | wc -l)
[[ $daily_removed -gt 0 ]] && log "  Removed $daily_removed old daily backup(s)"
 
# Weekly: keep 4 weeks (28 days)
weekly_removed=$(find "$WEEKLY" -name "*${EXT}" -mtime +28 -delete -print | wc -l)
[[ $weekly_removed -gt 0 ]] && log "  Removed $weekly_removed old weekly backup(s)"
 
# Monthly: keep 12 months (365 days)
monthly_removed=$(find "$MONTHLY" -name "*${EXT}" -mtime +365 -delete -print | wc -l)
[[ $monthly_removed -gt 0 ]] && log "  Removed $monthly_removed old monthly backup(s)"
 
# Summary
log "=========================================="
log "Backup completed successfully!"
log "Current backup counts:"
log "  Daily:   $(find "$DAILY" -name "*${EXT}" | wc -l)"
log "  Weekly:  $(find "$WEEKLY" -name "*${EXT}" | wc -l)"
log "  Monthly: $(find "$MONTHLY" -name "*${EXT}" | wc -l)"
log "=========================================="
 
exit $E_SUCCESS
 

Why this rotation policy works:

  • Granularity: Recent changes (last 7 days) are all recoverable
  • Long-term: Can recover state from weeks/months ago
  • Space-efficient: Old backups automatically cleaned up
  • Predictable: Know exactly how many backups you have
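
Restoring is just the reverse of the backup step: pick the newest archive from whichever tier you need and extract it. A minimal sketch, assuming the compressed website backup from the examples above (the archive contains the source directory's top-level folder, so extract into its parent):

# Pick the newest daily archive for the "website" backup (assumed paths)
latest=$(ls -1t /backup/website/daily/website_*.tar.gz | head -1)

# The archive was created relative to dirname(SOURCE), so for -s /var/www it
# unpacks a top-level "www" folder; extracting into /var restores /var/www
tar xzf "$latest" -C /var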

Real-world deployment:

# Daily website backup at 2 AM
0 2 * * * /usr/local/bin/backup_rotate.sh -s /var/www/html -n website -c
 
# Daily database dump + backup at 3 AM (the script expects a directory, so dump into one)
0 3 * * * mkdir -p /tmp/db_dump && pg_dump mydb > /tmp/db_dump/db.sql && /usr/local/bin/backup_rotate.sh -s /tmp/db_dump -n database -c
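
It also pays to spot-check that archives are actually readable. A quick test, again assuming the website paths above, is to list the newest archive and discard the output:

# Verify the newest daily archive can be read end to end (assumed paths)
latest=$(ls -1t /backup/website/daily/website_*.tar.gz | head -1)
tar tzf "$latest" > /dev/null && echo "OK: $latest"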
 

5. Web Server Log Analyzer

The Problem: You have gigabytes of Apache/Nginx access logs. You need to quickly answer: Which IPs are hitting us most? What pages are popular? Are there errors? When is peak traffic?

The Solution: Parse and analyze access logs with awk, generating a comprehensive report.

#!/usr/bin/env bash
set -euo pipefail
 
# log_analyzer.sh — Web server access log analyzer
#
# USAGE:
#   ./log_analyzer.sh ACCESS_LOG [ACCESS_LOG...]
#
# SUPPORTED FORMATS:
#   - Apache Combined Log Format
#   - Nginx default access log format
#
# OUTPUT:
#   - Total requests
#   - Requests by HTTP status code
#   - Top IP addresses
#   - Top requested URLs
#   - Traffic by hour
#   - Top user agents
#   - Bandwidth usage (if log includes bytes)
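#
# REQUIRES:
#   GNU awk (gawk): the report sections use gawk extensions such as
#   match() with an array argument and asorti()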
 
readonly SCRIPT_NAME="$(basename "$0")"
 
# Color output
if [[ -t 1 ]]; then
    readonly BOLD='\033[1m'
    readonly BLUE='\033[0;34m'
    readonly GREEN='\033[0;32m'
    readonly YELLOW='\033[0;33m'
    readonly RED='\033[0;31m'
    readonly NC='\033[0m'
else
    readonly BOLD='' BLUE='' GREEN='' YELLOW='' RED='' NC=''
fi
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME LOG_FILE [LOG_FILE...]
 
Analyze web server access logs and generate comprehensive report.
 
Supports:
  - Apache Combined Log Format
  - Nginx default access log format
 
Examples:
  $SCRIPT_NAME /var/log/apache2/access.log
  $SCRIPT_NAME /var/log/nginx/access.log*
  $SCRIPT_NAME access.log access.log.1
 
Report includes:
  - Total requests
  - Status code distribution (2xx, 3xx, 4xx, 5xx)
  - Top IP addresses
  - Top requested URLs
  - Traffic patterns by hour
  - Top user agents
  - Bandwidth usage
EOF
}
 
# Validate arguments
if [[ $# -eq 0 ]]; then
    usage
    exit 1
fi
 
for file in "$@"; do
    # Accept regular files and named pipes so /dev/stdin works (see examples)
    if [[ ! -f "$file" && ! -p "$file" ]]; then
        echo "Error: File not found or not readable: $file" >&2
        exit 1
    fi
done
 
# Combine all log files
TEMP_LOG=$(mktemp)
trap "rm -f '$TEMP_LOG'" EXIT
 
cat "$@" > "$TEMP_LOG"
 
# Count total requests
total_requests=$(wc -l < "$TEMP_LOG")
 
# Print header
echo ""
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e "${BOLD}  Web Server Log Analysis${NC}"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e "  ${BLUE}Files analyzed:${NC} $#"
echo -e "  ${BLUE}Total requests:${NC} $(printf "%'d" "$total_requests")"
echo -e "  ${BLUE}Date range:${NC} $(head -1 "$TEMP_LOG" | awk -F'[\\[\\]]' '{print $2}' | cut -d: -f1) to $(tail -1 "$TEMP_LOG" | awk -F'[\\[\\]]' '{print $2}' | cut -d: -f1)"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo ""
 
# ==================== Status Code Distribution ====================
echo -e "${BOLD}${BLUE}📊 HTTP Status Code Distribution${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk '{
    # Extract status code (field 9 in combined log format)
    match($0, /" [0-9]{3} /, arr)
    if (RSTART > 0) {
        code = substr($0, RSTART+2, 3)
        if (code >= 200 && code < 300) status["2xx"]++
        else if (code >= 300 && code < 400) status["3xx"]++
        else if (code >= 400 && code < 500) status["4xx"]++
        else if (code >= 500 && code < 600) status["5xx"]++
        codes[code]++
    }
}
END {
    # Print summary by category
    printf "  %-10s %10d  (%5.1f%%)  %s\n", "2xx (OK)", status["2xx"], (status["2xx"]/NR)*100, "'"${GREEN}■■■■■${NC}"'"
    printf "  %-10s %10d  (%5.1f%%)  %s\n", "3xx (Redir)", status["3xx"], (status["3xx"]/NR)*100, "'"${YELLOW}■■■${NC}"'"
    printf "  %-10s %10d  (%5.1f%%)  %s\n", "4xx (Client)", status["4xx"], (status["4xx"]/NR)*100, "'"${RED}■■■■${NC}"'"
    printf "  %-10s %10d  (%5.1f%%)  %s\n", "5xx (Server)", status["5xx"], (status["5xx"]/NR)*100, "'"${RED}■■■■■■${NC}"'"
    print ""
 
    # Top 5 specific codes
    print "  Top status codes:"
    n = asorti(codes, sorted_codes, "@val_num_desc")
    for (i = 1; i <= (n < 5 ? n : 5); i++) {
        code = sorted_codes[i]
        printf "    %s: %d requests (%.1f%%)\n", code, codes[code], (codes[code]/NR)*100
    }
}' "$TEMP_LOG"
 
echo ""
 
# ==================== Top IP Addresses ====================
echo -e "${BOLD}${BLUE}🌐 Top 10 IP Addresses${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk '{print $1}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
    awk -v total="$total_requests" '{
        pct = ($1/total)*100
        printf "  %8d requests  (%5.1f%%)  %s\n", $1, pct, $2
    }'
 
echo ""
 
# ==================== Top Requested URLs ====================
echo -e "${BOLD}${BLUE}📄 Top 10 Requested URLs${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk '{
    match($0, /"[A-Z]+ [^ ]+ HTTP/, arr)
    if (RSTART > 0) {
        url_part = substr($0, RSTART+1)
        match(url_part, /[A-Z]+ ([^ ]+)/, url_match)
        if (url_match[1] != "") {
            print url_match[1]
        }
    }
}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
    awk -v total="$total_requests" '{
        count = $1
        $1 = ""
        url = $0
        sub(/^ */, "", url)
        pct = (count/total)*100
        printf "  %8d  (%5.1f%%)  %s\n", count, pct, url
    }'
 
echo ""
 
# ==================== Traffic by Hour ====================
echo -e "${BOLD}${BLUE}🕐 Traffic Distribution by Hour${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk -F'[\\[:]' '{
    # $3 is the two-digit hour from the [dd/Mon/yyyy:HH:MM:SS ...] timestamp.
    # Force it numeric so that "02" and "2" index the same bucket in END.
    if ($3 ~ /^[0-9][0-9]$/) {
        hours[$3 + 0]++
    }
}
END {
    max = 1    # floor of 1 avoids division by zero on an empty log
    for (h in hours) {
        if (hours[h] > max) max = hours[h]
    }
 
    for (h = 0; h < 24; h++) {
        count = hours[h]
        if (count == "") count = 0
        bar_len = int((count / max) * 30)
        bar = ""
        for (i = 0; i < bar_len; i++) bar = bar "'"${GREEN}■${NC}"'"
        printf "  %02d:00  %8d  %s\n", h, count, bar
    }
}' "$TEMP_LOG"
 
echo ""
 
# ==================== Top User Agents ====================
echo -e "${BOLD}${BLUE}🤖 Top 10 User Agents${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk -F'"' '{
    # User agent is typically the 6th quoted field
    if (NF >= 6) print $6
}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
    awk -v total="$total_requests" '{
        count = $1
        $1 = ""
        agent = $0
        sub(/^ */, "", agent)
        pct = (count/total)*100
 
        # Truncate long user agents
        if (length(agent) > 60) {
            agent = substr(agent, 1, 57) "..."
        }
 
        printf "  %8d  (%5.1f%%)  %s\n", count, pct, agent
    }'
 
echo ""
 
# ==================== Bandwidth Usage ====================
echo -e "${BOLD}${BLUE}📦 Bandwidth Usage${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
 
awk '{
    # Combined log format: ... "REQUEST" STATUS BYTES "REFERER" "AGENT"
    # Match the STATUS/BYTES pair after the request and take the second
    # number (bytes sent); lines that log "-" for bytes simply do not match.
    if (match($0, /" [0-9]{3} [0-9]+/)) {
        split(substr($0, RSTART + 2, RLENGTH - 2), pair, " ")
        total_bytes += pair[2]
    }
}
END {
    gb = total_bytes / (1024*1024*1024)
    mb = total_bytes / (1024*1024)
    printf "  Total:     %8.2f GB (%10.2f MB)\n", gb, mb
    if (NR > 0) {
        avg_bytes = total_bytes / NR
        avg_kb = avg_bytes / 1024
        printf "  Per request: %7.2f KB\n", avg_kb
    }
}' "$TEMP_LOG"
 
echo ""
 
# ==================== Footer ====================
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e "  ${GREEN}✓${NC} Analysis complete"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo ""
 

Advanced analysis features:

  1. Status code distribution: Shows 2xx/3xx/4xx/5xx breakdown
  2. Traffic patterns: Hourly distribution with bar chart
  3. Security insights: Top IPs (potential DoS), 4xx/5xx errors
  4. Performance data: Bandwidth usage, average request size
  5. Handles multiple files: Analyze rotated logs together
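
When the report flags something suspicious, you can drill down with a one-off query. For example, listing which IPs generate the most 4xx/5xx responses; a sketch assuming combined log format, where the status code is the ninth whitespace-separated field:

# Top 10 IPs ranked by 4xx/5xx error count (combined log format assumed)
awk '$9 ~ /^[45]/ { errors[$1]++ } END { for (ip in errors) print errors[ip], ip }' access.log \
    | sort -rn | head -10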

Real-world usage:

# Analyze current log
./log_analyzer.sh /var/log/nginx/access.log
 
# Analyze all rotated logs (last week)
./log_analyzer.sh /var/log/nginx/access.log*
 
# Analyze specific date range (using grep)
grep "11/Feb/2026" /var/log/apache2/access.log | ./log_analyzer.sh /dev/stdin
 

Exercises

🏋️ Exercise 1: Build a Duplicate File Finder

Task: Write duplicate_finder.sh that:

  • Finds duplicate files by content (using md5sum)
  • Groups duplicates together
  • Reports total wasted space
  • Optionally deletes duplicates (keeping one copy)

Bonus: Add size filter (only check files > 1MB) for speed.

Solution:
#!/usr/bin/env bash
set -euo pipefail
 
readonly SCRIPT_NAME="$(basename "$0")"
MIN_SIZE="${MIN_SIZE:-0}"  # Minimum file size (bytes)
DELETE=false
 
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] DIRECTORY
 
Find duplicate files by content (MD5 hash).
 
OPTIONS:
    -d          Delete duplicates (keep first, remove others)
    -m SIZE     Minimum file size in MB (default: 0)
    -h          Show help
 
ENVIRONMENT:
    MIN_SIZE=N  Minimum file size in bytes
 
EXAMPLES:
    $SCRIPT_NAME ~/Downloads
    $SCRIPT_NAME -m 1 ~/Documents          # Only files > 1MB
    $SCRIPT_NAME -d ~/Downloads            # Delete duplicates
 
OUTPUT:
    Lists duplicate groups with file paths and wasted space.
EOF
}
 
while getopts ":dm:h" opt; do
    case $opt in
        d) DELETE=true ;;
        m) MIN_SIZE=$((OPTARG * 1024 * 1024)) ;;
        h) usage; exit 0 ;;
        \?) echo "Unknown option: -$OPTARG" >&2; usage; exit 1 ;;
        :) echo "-$OPTARG requires an argument" >&2; exit 1 ;;
    esac
done
 
shift $((OPTIND - 1))
 
dir="${1:-.}"
[[ -d "$dir" ]] || { echo "Error: Directory not found: $dir" >&2; exit 1; }
 
echo "Scanning for duplicate files in: $dir"
[[ $MIN_SIZE -gt 0 ]] && echo "Minimum file size: $((MIN_SIZE / 1024 / 1024)) MB"
echo ""
 
temp_hashes=$(mktemp)
trap "rm -f '$temp_hashes'" EXIT
 
# Find files, compute hashes
find "$dir" -type f -size +"${MIN_SIZE}c" -exec md5sum {} \; 2>/dev/null | \
    sort > "$temp_hashes"
 
# Find duplicates: the sorted hash list puts identical hashes on adjacent
# lines, so the awk script below only compares each line with the previous one
 
awk '{
    hash = $1
    file = substr($0, 35)  # MD5 is 32 chars + 2 spaces
 
    if (hash == prev_hash) {
        if (dup_count == 0) {
            # First duplicate found - print header
            print "\nDuplicate group (hash: " hash "):"
            print "  [1] " prev_file
            group_files[1] = prev_file
        }
        dup_count++
        total_dupes++
        print "  [" (dup_count + 1) "] " file
        group_files[dup_count + 1] = file
 
        # Calculate wasted space (all but first copy)
        cmd = "stat -f%z \"" file "\" 2>/dev/null || stat -c%s \"" file "\""
        cmd | getline size
        close(cmd)
        wasted += size
    } else {
        # Process previous group if delete mode
        if (dup_count > 0 && delete_mode == "true") {
            print "  Deleting duplicates, keeping: " group_files[1]
            for (i = 2; i <= dup_count + 1; i++) {
                system("rm -f \"" group_files[i] "\"")
                print "    Deleted: " group_files[i]
            }
        }
 
        # Reset for new hash
        dup_count = 0
        delete group_files
    }
 
    prev_hash = hash
    prev_file = file
}
END {
    # Handle last group
    if (dup_count > 0 && delete_mode == "true") {
        print "  Deleting duplicates, keeping: " group_files[1]
        for (i = 2; i <= dup_count + 1; i++) {
            system("rm -f \"" group_files[i] "\"")
            print "    Deleted: " group_files[i]
        }
    }
 
    print "\n=========================================="
    if (total_dupes > 0) {
        print "Found " total_dupes " duplicate file(s)"
        printf "Wasted space: %.2f MB\n", wasted/1024/1024
    } else {
        print "No duplicates found"
    }
    print "=========================================="
}' delete_mode="$DELETE" "$temp_hashes"
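
For very large trees, hashing every file is the slow part. A common speed-up, in the spirit of the bonus task, is to group files by size first and hash only the sizes that collide; a rough sketch assuming GNU find, xargs, and uniq, and filenames without embedded newlines:

# Group by size, hash only files that share a size, then print duplicate groups
find . -type f -printf '%s\t%p\n' \
    | awk -F'\t' '{ n[$1]++; f[$1] = f[$1] $2 "\n" }
                  END { for (s in n) if (n[s] > 1) printf "%s", f[s] }' \
    | xargs -r -d '\n' md5sum \
    | sort | uniq -w32 --all-repeated=separate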
 
🏋️ Exercise 2: Build a Website Uptime Monitor

Task: Write uptime_monitor.sh that:

  • Checks a website every 30 seconds
  • Logs downtime with timestamp
  • When site recovers, logs total downtime duration
  • Sends email alert on downtime (optional)
  • Tracks uptime percentage
Solution:
#!/usr/bin/env bash
set -euo pipefail
 
readonly SCRIPT_NAME="$(basename "$0")"
URL="${1:-}"
INTERVAL="${2:-30}"
EMAIL="${EMAIL:-}"
LOGFILE="/var/log/uptime_$(echo "$URL" | tr '/:' '_').log"
 
[[ -z "$URL" ]] && {
    echo "Usage: $SCRIPT_NAME URL [INTERVAL_SECONDS]" >&2
    echo "Example: $SCRIPT_NAME https://example.com 60" >&2
    exit 1
}
 
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOGFILE"
}
 
send_alert() {
    local subject="$1"
    local message="$2"
 
    [[ -z "$EMAIL" ]] && return
 
    if command -v mail &>/dev/null; then
        echo "$message" | mail -s "$subject" "$EMAIL"
        log "Alert sent to $EMAIL"
    fi
}
 
is_down=false
down_since=""
total_checks=0
failed_checks=0
 
log "=========================================="
log "Monitoring: $URL (interval: ${INTERVAL}s)"
log "=========================================="
 
while true; do
    ((++total_checks))    # pre-increment: a post-increment from 0 evaluates to 0 and trips set -e
 
    if curl -sf --max-time 10 "$URL" >/dev/null 2>&1; then
        # Site is UP
        if $is_down; then
            # RECOVERY
            now=$(date +%s)
            down_start=$(date -d "$down_since" +%s 2>/dev/null || echo "$now")
            duration=$((now - down_start))
 
            log "✓ RECOVERY: $URL is back up (downtime: ${duration}s)"
            send_alert "[RECOVERY] $URL is back up" \
                "Site: $URL\nDowntime: ${duration}s\nRecovered: $(date)"
 
            is_down=false
        fi
    else
        # Site is DOWN
        ((++failed_checks))
 
        if ! $is_down; then
            # INITIAL FAILURE
            down_since=$(date '+%Y-%m-%d %H:%M:%S')
            log "✗ DOWN: $URL is not responding!"
            send_alert "[ALERT] $URL is down!" \
                "Site: $URL\nStatus: Not responding\nTime: $down_since"
 
            is_down=true
        else
            # STILL DOWN
            now=$(date +%s)
            down_start=$(date -d "$down_since" +%s 2>/dev/null || echo "$now")
            duration=$((now - down_start))
            log "✗ STILL DOWN: $URL (${duration}s)"
        fi
    fi
 
    # Calculate uptime percentage
    uptime_pct=$(echo "$total_checks $failed_checks" | \
        awk '{printf "%.2f", (1 - $2/$1) * 100}')
 
    log "Stats: $total_checks checks, uptime: ${uptime_pct}%"
 
    sleep "$INTERVAL"
done
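
To keep the monitor running across reboots, one option is a small systemd service. A minimal sketch; the unit name, install path, and URL are assumptions, not part of the exercise:

# /etc/systemd/system/uptime-monitor.service  (hypothetical unit)
[Unit]
Description=Website uptime monitor
After=network-online.target

[Service]
ExecStart=/usr/local/bin/uptime_monitor.sh https://example.com 60
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Load and start it
sudo systemctl daemon-reload
sudo systemctl enable --now uptime-monitor.service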
 
🏋️ Exercise 3: Build a CSV Data Validator

Task: Write validate_csv.sh that processes a CSV file:

  • Input: name,email,age,score format
  • Validates:
    • No empty fields
    • Valid email format (regex)
    • Age: 1-150
    • Score: 0-100
  • Outputs:
    • _clean.csv with valid records
    • _errors.csv with invalid records and error descriptions
  • Prints summary statistics
Solution:
#!/usr/bin/env bash
set -euo pipefail
 
INPUT="${1:?Usage: $0 INPUT.csv}"
CLEAN="${INPUT%.csv}_clean.csv"
ERRORS="${INPUT%.csv}_errors.csv"
 
[[ -f "$INPUT" ]] || { echo "File not found: $INPUT" >&2; exit 1; }
 
echo "Validating: $INPUT"
echo "Output: $CLEAN (valid), $ERRORS (invalid)"
echo ""
 
# Write headers
head -1 "$INPUT" > "$CLEAN"
echo "line,field,error,original_value" > "$ERRORS"
 
line_num=0
clean_count=0
error_count=0
 
declare -A error_types=()
 
while IFS=, read -r name email age score; do
    ((++line_num))    # pre-increment: a post-increment from 0 evaluates to 0 and trips set -e
    [[ $line_num -eq 1 ]] && continue  # Skip header
 
    valid=true
    errors_this_line=()
 
    # Validate name (not empty)
    if [[ -z "$name" ]]; then
        errors_this_line+=("name:empty")
        echo "$line_num,name,empty,\"\"" >> "$ERRORS"
        ((++error_types[empty_name]))
        valid=false
    fi
 
    # Validate email (basic regex)
    email_regex='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if [[ ! "$email" =~ $email_regex ]]; then
        errors_this_line+=("email:invalid")
        echo "$line_num,email,invalid,\"$email\"" >> "$ERRORS"
        ((++error_types[invalid_email]))
        valid=false
    fi
 
    # Validate age (1-150)
    if ! [[ "$age" =~ ^[0-9]+$ ]] || (( age < 1 || age > 150 )); then
        errors_this_line+=("age:out_of_range")
        echo "$line_num,age,out_of_range,\"$age\"" >> "$ERRORS"
        ((++error_types[invalid_age]))
        valid=false
    fi
 
    # Validate score (0-100)
    if ! [[ "$score" =~ ^[0-9]+$ ]] || (( score < 0 || score > 100 )); then
        errors_this_line+=("score:out_of_range")
        echo "$line_num,score,out_of_range,\"$score\"" >> "$ERRORS"
        ((++error_types[invalid_score]))
        valid=false
    fi
 
    # Write to appropriate file
    if $valid; then
        echo "$name,$email,$age,$score" >> "$CLEAN"
        ((++clean_count))
    else
        ((++error_count))
    fi
done < "$INPUT"
 
# Summary
echo "=========================================="
echo "Validation Complete"
echo "=========================================="
echo "Total records:  $((line_num - 1))"
echo "Valid:          $clean_count (→ $CLEAN)"
echo "Invalid:        $error_count (→ $ERRORS)"
echo ""
 
if [[ $error_count -gt 0 ]]; then
    echo "Error breakdown:"
    for error_type in "${!error_types[@]}"; do
        printf "  %-20s %d\n" "$error_type:" "${error_types[$error_type]}"
    done
fi
 
echo "=========================================="
 

Summary

Across this tutorial and the previous one, you've seen five production-quality scripts demonstrating:

Design Patterns:

  • Strict mode + error handling in every script
  • Clear usage messages and help text
  • Logging with timestamps
  • Exit codes for automation
  • Cleanup with trap

Common Tasks:

  • File management (normalize, organize)
  • System monitoring (health checks, uptime)
  • Data processing (log analysis, CSV validation)
  • Backup automation (with rotation policies)

Key Techniques:

  • getopts for argument parsing
  • Associative arrays for categorization
  • awk for complex text processing
  • Date arithmetic for rotation policies
  • Colorized output for terminal vs log files

What's Next:

  1. Customize these scripts for your environment:
    • Adjust thresholds in health monitor
    • Add file categories to organizer
    • Extend log analyzer for your log format
  2. Deploy to production:
    • Install in /usr/local/bin/
    • Set up cron jobs
    • Configure email alerts
  3. Build your own:
    • Start with one of these templates
    • Add your specific logic
    • Follow the same patterns (strict mode, logging, error handling)

You now have a complete toolkit for shell automation. Every problem—file management, monitoring, data processing, backups—follows these same patterns. Master them and you can automate anything.

