Advanced Real-World Scripts
Summary: in this tutorial, you will learn how to build a backup system with intelligent rotation and a web server log analyzer — complete production scripts with real-world patterns.
Building on the simpler utility scripts from the previous tutorial, these advanced projects tackle more complex real-world challenges: automated backups with intelligent rotation and a complete web server log analyzer. Each script demonstrates professional patterns you will use in production environments.
4. Backup with Intelligent Rotation
The Problem: Daily backups fill up your disk. Manual cleanup is tedious and error-prone. You need retention policies: keep daily backups for 7 days, weekly for 4 weeks, monthly for 12 months.
The Solution: Automated backup with grandfather-father-son (GFS) rotation.
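The rotation decision itself is small: which tier(s) today's backup lands in depends only on the day of week and the day of month. A minimal standalone sketch of that logic, before the full script:

```shell
#!/usr/bin/env bash
# Sketch: which GFS tiers today's backup belongs to.
# date +%u gives the ISO day of week (1=Monday..7=Sunday); date +%d the day of month.
tiers="daily"
if [[ "$(date +%u)" == "7" ]]; then tiers="$tiers weekly"; fi
if [[ "$(date +%d)" == "01" ]]; then tiers="$tiers monthly"; fi
echo "Today's backup goes to: $tiers"
```

Every run produces a daily backup; Sundays and the 1st of the month additionally copy it into the longer-lived tiers.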
#!/usr/bin/env bash
set -euo pipefail
# backup_rotate.sh — Backup with intelligent GFS rotation
#
# RETENTION POLICY:
# Daily: 7 backups (1 week)
# Weekly: 4 backups (1 month) — Saved on Sundays
# Monthly: 12 backups (1 year) — Saved on 1st of month
#
# USAGE:
# ./backup_rotate.sh -s SOURCE -n NAME [-b BASE_DIR]
#
# OPTIONS:
# -s SOURCE Directory to backup
# -n NAME Backup name (identifier)
# -b BASE Base backup directory (default: /backup)
# -c Compress with gzip
# -h Show help
#
# EXAMPLES:
# ./backup_rotate.sh -s /var/www -n website -c
# ./backup_rotate.sh -s /home/alice -n home -b /mnt/backups
#
# CRON:
# # Daily backup at 2 AM
# 0 2 * * * /usr/local/bin/backup_rotate.sh -s /var/www -n website -c
readonly SCRIPT_NAME="$(basename "$0")"
BACKUP_BASE="/backup"
LOGFILE=""
COMPRESS=false
# Exit codes
readonly E_SUCCESS=0
readonly E_MISSING_ARGS=2
readonly E_SOURCE_NOT_FOUND=3
readonly E_BACKUP_FAILED=4
log() {
local msg="[$(date '+%Y-%m-%d %H:%M:%S')] $*"
echo "$msg"
[[ -z "$LOGFILE" ]] || echo "$msg" >> "$LOGFILE"
}
error() {
echo "[ERROR] $*" >&2
[[ -z "$LOGFILE" ]] || echo "[ERROR] $*" >> "$LOGFILE"
}
die() {
local msg="$1"
error "$msg"
exit "${2:-1}"
}
usage() {
cat << EOF
Usage: $SCRIPT_NAME -s SOURCE -n NAME [OPTIONS]
Create backups with intelligent GFS rotation policy.
REQUIRED:
-s SOURCE Directory to backup
-n NAME Backup identifier (e.g., 'website', 'database')
OPTIONS:
-b BASE Base backup directory (default: /backup)
-c Compress backups with gzip
-h Show this help
RETENTION POLICY:
Daily: 7 backups (last 7 days)
Weekly: 4 backups (saved on Sundays)
Monthly: 12 backups (saved on 1st of month)
EXAMPLES:
$SCRIPT_NAME -s /var/www/html -n website -c
$SCRIPT_NAME -s /home/user -n home_backup -b /mnt/external
CRON EXAMPLE:
# Daily backup at 2 AM with compression
0 2 * * * $SCRIPT_NAME -s /var/www -n website -c 2>&1 | logger -t backup
DIRECTORY STRUCTURE:
/backup/
└── NAME/
├── daily/ # Last 7 days
├── weekly/ # Last 4 weeks (Sundays)
└── monthly/ # Last 12 months (1st of month)
EOF
}
# Parse arguments
SOURCE=""
NAME=""
while getopts ":s:n:b:ch" opt; do
case $opt in
s) SOURCE="$OPTARG" ;;
n) NAME="$OPTARG" ;;
b) BACKUP_BASE="$OPTARG" ;;
c) COMPRESS=true ;;
h) usage; exit $E_SUCCESS ;;
:) die "-$OPTARG requires an argument" $E_MISSING_ARGS ;;
\?) die "Unknown option: -$OPTARG (use -h for help)" $E_MISSING_ARGS ;;
esac
done
# Validate required arguments
[[ -n "$SOURCE" ]] || die "Source directory is required (-s)" $E_MISSING_ARGS
[[ -n "$NAME" ]] || die "Backup name is required (-n)" $E_MISSING_ARGS
[[ -d "$SOURCE" ]] || die "Source directory not found: $SOURCE" $E_SOURCE_NOT_FOUND
# Setup logging
LOGFILE="$BACKUP_BASE/$NAME/backup.log"
mkdir -p "$(dirname "$LOGFILE")" 2>/dev/null || true
log "=========================================="
log "Backup started"
log "Source: $SOURCE"
log "Name: $NAME"
log "Base: $BACKUP_BASE"
log "Compress: $COMPRESS"
log "=========================================="
# Create directory structure
DAILY="$BACKUP_BASE/$NAME/daily"
WEEKLY="$BACKUP_BASE/$NAME/weekly"
MONTHLY="$BACKUP_BASE/$NAME/monthly"
for dir in "$DAILY" "$WEEKLY" "$MONTHLY"; do
if ! mkdir -p "$dir"; then
die "Cannot create directory: $dir" $E_BACKUP_FAILED
fi
done
# Date information
DATE=$(date +%Y%m%d_%H%M%S)
DAY_OF_WEEK=$(date +%u) # 1=Monday, 7=Sunday
DAY_OF_MONTH=$(date +%d)
# Determine file extension
if $COMPRESS; then
EXT=".tar.gz"
else
EXT=".tar"
fi
# Create daily backup
BACKUP_FILE="$DAILY/${NAME}_${DATE}${EXT}"
log "Creating daily backup: $BACKUP_FILE"
# Perform backup, checking tar's own exit status.
# (Do not pipe tar through grep here: grep -v exits 1 when tar produces
# no output, which is the normal success case, so success looks like failure.)
if $COMPRESS; then
TAR_OPTS="czf"
else
TAR_OPTS="cf"
fi
if ! tar "$TAR_OPTS" "$BACKUP_FILE" -C "$(dirname "$SOURCE")" "$(basename "$SOURCE")" 2>> "$LOGFILE"; then
die "Backup failed!" $E_BACKUP_FAILED
fi
# Verify backup was created
if [[ ! -f "$BACKUP_FILE" ]]; then
die "Backup file was not created: $BACKUP_FILE" $E_BACKUP_FAILED
fi
# Report size
SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
log "Backup created successfully: $SIZE"
# Copy to weekly on Sundays
if [[ "$DAY_OF_WEEK" -eq 7 ]]; then
WEEKLY_FILE="$WEEKLY/${NAME}_week_${DATE}${EXT}"
cp "$BACKUP_FILE" "$WEEKLY_FILE"
log "Weekly backup saved: $(basename "$WEEKLY_FILE")"
fi
# Copy to monthly on the 1st
if [[ "$DAY_OF_MONTH" == "01" ]]; then
MONTHLY_FILE="$MONTHLY/${NAME}_month_${DATE}${EXT}"
cp "$BACKUP_FILE" "$MONTHLY_FILE"
log "Monthly backup saved: $(basename "$MONTHLY_FILE")"
fi
# Rotation: remove old backups
log "Cleaning up old backups..."
# Daily: keep 7 days
daily_removed=$(find "$DAILY" -name "*${EXT}" -mtime +7 -delete -print | wc -l)
[[ $daily_removed -eq 0 ]] || log " Removed $daily_removed old daily backup(s)"
# Weekly: keep 4 weeks (28 days)
weekly_removed=$(find "$WEEKLY" -name "*${EXT}" -mtime +28 -delete -print | wc -l)
[[ $weekly_removed -eq 0 ]] || log " Removed $weekly_removed old weekly backup(s)"
# Monthly: keep 12 months (365 days)
monthly_removed=$(find "$MONTHLY" -name "*${EXT}" -mtime +365 -delete -print | wc -l)
[[ $monthly_removed -eq 0 ]] || log " Removed $monthly_removed old monthly backup(s)"
# Summary
log "=========================================="
log "Backup completed successfully!"
log "Current backup counts:"
log " Daily: $(find "$DAILY" -name "*${EXT}" | wc -l)"
log " Weekly: $(find "$WEEKLY" -name "*${EXT}" | wc -l)"
log " Monthly: $(find "$MONTHLY" -name "*${EXT}" | wc -l)"
log "=========================================="
exit $E_SUCCESS
Why this rotation policy works:
- Granularity: Recent changes (last 7 days) are all recoverable
- Long-term: Can recover state from weeks/months ago
- Space-efficient: Old backups automatically cleaned up
- Predictable: Know exactly how many backups you have
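At steady state the policy caps the number of archives, so disk use is predictable. A quick back-of-envelope sketch (the 500 MB per-archive size is a made-up example figure):

```shell
# Steady-state archive count under the 7/4/12 GFS policy.
daily=7; weekly=4; monthly=12
total=$((daily + weekly + monthly))
per_backup_mb=500   # hypothetical average archive size
echo "Archives retained: $total"                       # 23
echo "Approx. disk use: $((total * per_backup_mb)) MB" # 11500 MB
```

Whatever the per-archive size, you never hold more than 23 archives of a given backup name at once.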
Real-world deployment:
# Daily website backup at 2 AM
0 2 * * * /usr/local/bin/backup_rotate.sh -s /var/www/html -n website -c
# Daily database dump + backup at 3 AM (the script validates -s as a directory, so dump into one)
0 3 * * * mkdir -p /tmp/dbdump && pg_dump mydb > /tmp/dbdump/db.sql && /usr/local/bin/backup_rotate.sh -s /tmp/dbdump -n database -c
5. Web Server Log Analyzer
The Problem: You have gigabytes of Apache/Nginx access logs. You need to quickly answer: Which IPs are hitting us most? What pages are popular? Are there errors? When is peak traffic?
The Solution: Parse and analyze access logs with awk, generating a comprehensive report.
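The analyzer leans on the combined log format's field positions. Under awk's default whitespace splitting, a sample line (fabricated here for illustration) breaks down as $1=IP, $6..$8=request, $9=status, $10=bytes:

```shell
# Sketch: where the analyzer's awk snippets find each value
# in a combined-format line (the sample line is made up).
line='203.0.113.9 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"'
echo "$line" | awk '{print "ip=" $1, "url=" $7, "status=" $9, "bytes=" $10}'
```

This positional layout is why the script below can pull IPs with `$1` and user agents from the sixth quoted field.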
#!/usr/bin/env bash
set -euo pipefail
# log_analyzer.sh — Web server access log analyzer
#
# USAGE:
# ./log_analyzer.sh ACCESS_LOG [ACCESS_LOG...]
#
# SUPPORTED FORMATS:
# - Apache Combined Log Format
# - Nginx default access log format
#
# OUTPUT:
# - Total requests
# - Requests by HTTP status code
# - Top IP addresses
# - Top requested URLs
# - Traffic by hour
# - Top user agents
# - Bandwidth usage (if log includes bytes)
#
# REQUIRES:
# GNU awk (gawk), for asorti() and match() with an array argument
readonly SCRIPT_NAME="$(basename "$0")"
# Color output
if [[ -t 1 ]]; then
readonly BOLD='\033[1m'
readonly BLUE='\033[0;34m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[0;33m'
readonly RED='\033[0;31m'
readonly NC='\033[0m'
else
readonly BOLD='' BLUE='' GREEN='' YELLOW='' RED='' NC=''
fi
usage() {
cat << EOF
Usage: $SCRIPT_NAME LOG_FILE [LOG_FILE...]
Analyze web server access logs and generate comprehensive report.
Supports:
- Apache Combined Log Format
- Nginx default access log format
Examples:
$SCRIPT_NAME /var/log/apache2/access.log
$SCRIPT_NAME /var/log/nginx/access.log*
$SCRIPT_NAME access.log access.log.1
Report includes:
- Total requests
- Status code distribution (2xx, 3xx, 4xx, 5xx)
- Top IP addresses
- Top requested URLs
- Traffic patterns by hour
- Top user agents
- Bandwidth usage
EOF
}
# Validate arguments
if [[ $# -eq 0 ]]; then
usage
exit 1
fi
for file in "$@"; do
if [[ ! -f "$file" ]]; then
echo "Error: File not found: $file" >&2
exit 1
fi
done
# Combine all log files
TEMP_LOG=$(mktemp)
trap "rm -f '$TEMP_LOG'" EXIT
cat "$@" > "$TEMP_LOG"
# Count total requests
total_requests=$(wc -l < "$TEMP_LOG")
# Print header
echo ""
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e "${BOLD} Web Server Log Analysis${NC}"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e " ${BLUE}Files analyzed:${NC} $#"
echo -e " ${BLUE}Total requests:${NC} $(printf "%'d" "$total_requests")"
echo -e " ${BLUE}Date range:${NC} $(head -1 "$TEMP_LOG" | awk -F'[\\[\\]]' '{print $2}' | cut -d: -f1) to $(tail -1 "$TEMP_LOG" | awk -F'[\\[\\]]' '{print $2}' | cut -d: -f1)"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo ""
# ==================== Status Code Distribution ====================
echo -e "${BOLD}${BLUE}📊 HTTP Status Code Distribution${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
awk '{
# Extract status code (field 9 in combined log format)
match($0, /" [0-9]{3} /, arr)
if (RSTART > 0) {
code = substr($0, RSTART+2, 3)
if (code >= 200 && code < 300) status["2xx"]++
else if (code >= 300 && code < 400) status["3xx"]++
else if (code >= 400 && code < 500) status["4xx"]++
else if (code >= 500 && code < 600) status["5xx"]++
codes[code]++
}
}
END {
# Print summary by category
printf " %-10s %10d (%5.1f%%) %s\n", "2xx (OK)", status["2xx"], (status["2xx"]/NR)*100, "'"${GREEN}■■■■■${NC}"'"
printf " %-10s %10d (%5.1f%%) %s\n", "3xx (Redir)", status["3xx"], (status["3xx"]/NR)*100, "'"${YELLOW}■■■${NC}"'"
printf " %-10s %10d (%5.1f%%) %s\n", "4xx (Client)", status["4xx"], (status["4xx"]/NR)*100, "'"${RED}■■■■${NC}"'"
printf " %-10s %10d (%5.1f%%) %s\n", "5xx (Server)", status["5xx"], (status["5xx"]/NR)*100, "'"${RED}■■■■■■${NC}"'"
print ""
# Top 5 specific codes
print " Top status codes:"
n = asorti(codes, sorted_codes, "@val_num_desc")
for (i = 1; i <= (n < 5 ? n : 5); i++) {
code = sorted_codes[i]
printf " %s: %d requests (%.1f%%)\n", code, codes[code], (codes[code]/NR)*100
}
}' "$TEMP_LOG"
echo ""
# ==================== Top IP Addresses ====================
echo -e "${BOLD}${BLUE}🌐 Top 10 IP Addresses${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
awk '{print $1}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
awk -v total="$total_requests" '{
pct = ($1/total)*100
printf " %8d requests (%5.1f%%) %s\n", $1, pct, $2
}'
echo ""
# ==================== Top Requested URLs ====================
echo -e "${BOLD}${BLUE}📄 Top 10 Requested URLs${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
awk '{
match($0, /"[A-Z]+ [^ ]+ HTTP/, arr)
if (RSTART > 0) {
url_part = substr($0, RSTART+1)
match(url_part, /[A-Z]+ ([^ ]+)/, url_match)
if (url_match[1] != "") {
print url_match[1]
}
}
}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
awk -v total="$total_requests" '{
count = $1
$1 = ""
url = $0
sub(/^ */, "", url)
pct = (count/total)*100
printf " %8d (%5.1f%%) %s\n", count, pct, url
}'
echo ""
# ==================== Traffic by Hour ====================
echo -e "${BOLD}${BLUE}🕐 Traffic Distribution by Hour${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
awk -F'[\\[:]' '{
hour = $3 + 0   # force numeric so keys match the 0-23 loop below ("00" != 0 as a string key)
if (hour >= 0 && hour <= 23) {
hours[hour]++
}
}
END {
max = 1   # start at 1 to avoid division by zero on an empty log
for (h in hours) {
if (hours[h] > max) max = hours[h]
}
for (h = 0; h < 24; h++) {
count = hours[h]
if (count == "") count = 0
bar_len = int((count / max) * 30)
bar = ""
for (i = 0; i < bar_len; i++) bar = bar "'"${GREEN}■${NC}"'"
printf " %02d:00 %8d %s\n", h, count, bar
}
}' "$TEMP_LOG"
echo ""
# ==================== Top User Agents ====================
echo -e "${BOLD}${BLUE}🤖 Top 10 User Agents${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
awk -F'"' '{
# User agent is typically the 6th quoted field
if (NF >= 6) print $6
}' "$TEMP_LOG" | sort | uniq -c | sort -rn | head -10 | \
awk -v total="$total_requests" '{
count = $1
$1 = ""
agent = $0
sub(/^ */, "", agent)
pct = (count/total)*100
# Truncate long user agents
if (length(agent) > 60) {
agent = substr(agent, 1, 57) "..."
}
printf " %8d (%5.1f%%) %s\n", count, pct, agent
}'
echo ""
# ==================== Bandwidth Usage ====================
echo -e "${BOLD}${BLUE}📦 Bandwidth Usage${NC}"
echo -e "${BOLD}────────────────────────────────────────────────────${NC}"
awk '{
# Bytes sent follow the status code: ... "REQUEST" STATUS BYTES ...
# Capture the second number so we sum bytes, not status codes.
if (match($0, /" [0-9]{3} ([0-9]+)/, m)) {
total_bytes += m[1]
}
}
END {
gb = total_bytes / (1024*1024*1024)
mb = total_bytes / (1024*1024)
printf " Total: %8.2f GB (%10.2f MB)\n", gb, mb
if (NR > 0) {
avg_bytes = total_bytes / NR
avg_kb = avg_bytes / 1024
printf " Per request: %7.2f KB\n", avg_kb
}
}' "$TEMP_LOG"
echo ""
# ==================== Footer ====================
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo -e " ${GREEN}✓${NC} Analysis complete"
echo -e "${BOLD}════════════════════════════════════════════════════${NC}"
echo ""
Advanced analysis features:
- Status code distribution: Shows 2xx/3xx/4xx/5xx breakdown
- Traffic patterns: Hourly distribution with bar chart
- Security insights: Top IPs (potential DoS), 4xx/5xx errors
- Performance data: Bandwidth usage, average request size
- Handles multiple files: Analyze rotated logs together
Real-world usage:
# Analyze current log
./log_analyzer.sh /var/log/nginx/access.log
# Analyze all rotated logs (last week)
./log_analyzer.sh /var/log/nginx/access.log*
# Analyze a specific date range (filter into a regular file first; the script rejects /dev/stdin)
grep "11/Feb/2026" /var/log/apache2/access.log > /tmp/feb11.log
./log_analyzer.sh /tmp/feb11.log
Exercises
Task: Write duplicate_finder.sh that:
- Finds duplicate files by content (using md5sum)
- Groups duplicates together
- Reports total wasted space
- Optionally deletes duplicates (keeping one copy)
Bonus: Add size filter (only check files > 1MB) for speed.
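For the bonus, find's `-size` test does the filtering before any hashing happens. A small sketch of the flag (GNU find's `-printf` is used here to show bare names):

```shell
# Sketch: -size +Nc keeps only files strictly larger than N bytes.
dir=$(mktemp -d)
head -c 2000000 /dev/zero > "$dir/big.bin"   # ~2 MB
head -c 100     /dev/zero > "$dir/small.bin" # 100 bytes
found=$(find "$dir" -type f -size +1048576c -printf '%f\n')
echo "Over 1 MiB: $found"
rm -rf "$dir"
```

Skipping small files this way avoids reading and hashing the long tail of tiny files that rarely waste meaningful space.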
Show Solution
#!/usr/bin/env bash
set -euo pipefail
readonly SCRIPT_NAME="$(basename "$0")"
MIN_SIZE="${MIN_SIZE:-0}" # Minimum file size (bytes)
DELETE=false
usage() {
cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] DIRECTORY
Find duplicate files by content (MD5 hash).
OPTIONS:
-d Delete duplicates (keep first, remove others)
-m SIZE Minimum file size in MB (default: 0)
-h Show help
ENVIRONMENT:
MIN_SIZE=N Minimum file size in bytes
EXAMPLES:
$SCRIPT_NAME ~/Downloads
$SCRIPT_NAME -m 1 ~/Documents # Only files > 1MB
$SCRIPT_NAME -d ~/Downloads # Delete duplicates
OUTPUT:
Lists duplicate groups with file paths and wasted space.
EOF
}
while getopts ":dm:h" opt; do
case $opt in
d) DELETE=true ;;
m) MIN_SIZE=$((OPTARG * 1024 * 1024)) ;;
h) usage; exit 0 ;;
\?) echo "Unknown option: -$OPTARG" >&2; usage; exit 1 ;;
:) echo "-$OPTARG requires an argument" >&2; exit 1 ;;
esac
done
shift $((OPTIND - 1))
dir="${1:-.}"
[[ -d "$dir" ]] || { echo "Error: Directory not found: $dir" >&2; exit 1; }
echo "Scanning for duplicate files in: $dir"
[[ $MIN_SIZE -eq 0 ]] || echo "Minimum file size: $((MIN_SIZE / 1024 / 1024)) MB"
echo ""
temp_hashes=$(mktemp)
trap "rm -f '$temp_hashes'" EXIT
# Find files, compute hashes
find "$dir" -type f -size +"${MIN_SIZE}c" -exec md5sum {} \; 2>/dev/null | \
sort > "$temp_hashes"
# Find duplicates and report (counters are tracked inside awk)
awk '{
hash = $1
file = substr($0, 35) # MD5 is 32 chars + 2 spaces
if (hash == prev_hash) {
if (dup_count == 0) {
# First duplicate found - print header
print "\nDuplicate group (hash: " hash "):"
print " [1] " prev_file
group_files[1] = prev_file
}
dup_count++
total_dupes++
print " [" (dup_count + 1) "] " file
group_files[dup_count + 1] = file
# Calculate wasted space (all but first copy)
cmd = "stat -f%z \"" file "\" 2>/dev/null || stat -c%s \"" file "\""
cmd | getline size
close(cmd)
wasted += size
} else {
# Process previous group if delete mode
if (dup_count > 0 && delete_mode == "true") {
print " Deleting duplicates, keeping: " group_files[1]
for (i = 2; i <= dup_count + 1; i++) {
system("rm -f \"" group_files[i] "\"")
print " Deleted: " group_files[i]
}
}
# Reset for new hash
dup_count = 0
delete group_files
}
prev_hash = hash
prev_file = file
}
END {
# Handle last group
if (dup_count > 0 && delete_mode == "true") {
print " Deleting duplicates, keeping: " group_files[1]
for (i = 2; i <= dup_count + 1; i++) {
system("rm -f \"" group_files[i] "\"")
print " Deleted: " group_files[i]
}
}
print "\n=========================================="
if (total_dupes > 0) {
print "Found " total_dupes " duplicate file(s)"
printf "Wasted space: %.2f MB\n", wasted/1024/1024
} else {
print "No duplicates found"
}
print "=========================================="
}' delete_mode="$DELETE" "$temp_hashes"
Task: Write uptime_monitor.sh that:
- Checks a website every 30 seconds
- Logs downtime with timestamp
- When site recovers, logs total downtime duration
- Sends email alert on downtime (optional)
- Tracks uptime percentage
Show Solution
#!/usr/bin/env bash
set -euo pipefail
readonly SCRIPT_NAME="$(basename "$0")"
URL="${1:-}"
INTERVAL="${2:-30}"
EMAIL="${EMAIL:-}"
LOGFILE="/var/log/uptime_$(echo "$URL" | tr '/:' '_').log"
if [[ -z "$URL" ]]; then
echo "Usage: $SCRIPT_NAME URL [INTERVAL_SECONDS]" >&2
echo "Example: $SCRIPT_NAME https://example.com 60" >&2
exit 1
fi
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOGFILE"
}
send_alert() {
local subject="$1"
local message="$2"
[[ -n "$EMAIL" ]] || return 0
if command -v mail &>/dev/null; then
printf '%b\n' "$message" | mail -s "$subject" "$EMAIL"
log "Alert sent to $EMAIL"
fi
}
is_down=false
down_since=""
total_checks=0
failed_checks=0
log "=========================================="
log "Monitoring: $URL (interval: ${INTERVAL}s)"
log "=========================================="
while true; do
((++total_checks))  # pre-increment: ((x++)) returns status 1 when x is 0, tripping set -e
if curl -sf --max-time 10 "$URL" >/dev/null 2>&1; then
# Site is UP
if $is_down; then
# RECOVERY
now=$(date +%s)
down_start=$(date -d "$down_since" +%s 2>/dev/null || echo "$now")
duration=$((now - down_start))
log "✓ RECOVERY: $URL is back up (downtime: ${duration}s)"
send_alert "[RECOVERY] $URL is back up" \
"Site: $URL\nDowntime: ${duration}s\nRecovered: $(date)"
is_down=false
fi
else
# Site is DOWN
((++failed_checks))  # pre-increment to stay safe under set -e
if ! $is_down; then
# INITIAL FAILURE
down_since=$(date '+%Y-%m-%d %H:%M:%S')
log "✗ DOWN: $URL is not responding!"
send_alert "[ALERT] $URL is down!" \
"Site: $URL\nStatus: Not responding\nTime: $down_since"
is_down=true
else
# STILL DOWN
now=$(date +%s)
down_start=$(date -d "$down_since" +%s 2>/dev/null || echo "$now")
duration=$((now - down_start))
log "✗ STILL DOWN: $URL (${duration}s)"
fi
fi
# Calculate uptime percentage
uptime_pct=$(echo "$total_checks $failed_checks" | \
awk '{printf "%.2f", (1 - $2/$1) * 100}')
log "Stats: $total_checks checks, uptime: ${uptime_pct}%"
sleep "$INTERVAL"
done
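The percentage line above shells out to awk because bash arithmetic is integer-only. The same calculation in isolation, with made-up counts:

```shell
# Sketch: floating-point uptime percentage via awk (bash has no floats).
total_checks=200
failed_checks=3
uptime_pct=$(awk -v t="$total_checks" -v f="$failed_checks" \
    'BEGIN { printf "%.2f", (1 - f / t) * 100 }')
echo "uptime: ${uptime_pct}%"   # uptime: 98.50%
```

A BEGIN-only awk program like this never reads input, so it doubles as a general-purpose calculator in shell scripts.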
Task: Write validate_csv.sh that processes a CSV file:
- Input: name,email,age,score format
- Validates:
  - No empty fields
  - Valid email format (regex)
  - Age: 1-150
  - Score: 0-100
- Outputs:
  - _clean.csv with valid records
  - _errors.csv with invalid records and error descriptions
- Prints summary statistics
Show Solution
#!/usr/bin/env bash
set -euo pipefail
INPUT="${1:?Usage: $0 INPUT.csv}"
CLEAN="${INPUT%.csv}_clean.csv"
ERRORS="${INPUT%.csv}_errors.csv"
[[ -f "$INPUT" ]] || { echo "File not found: $INPUT" >&2; exit 1; }
echo "Validating: $INPUT"
echo "Output: $CLEAN (valid), $ERRORS (invalid)"
echo ""
# Write headers
head -1 "$INPUT" > "$CLEAN"
echo "line,field,error,original_value" > "$ERRORS"
line_num=0
clean_count=0
error_count=0
declare -A error_types=()
while IFS=, read -r name email age score; do
((++line_num))
[[ $line_num -gt 1 ]] || continue # Skip header
valid=true
errors_this_line=()
# Validate name (not empty)
if [[ -z "$name" ]]; then
errors_this_line+=("name:empty")
echo "$line_num,name,empty,\"\"" >> "$ERRORS"
((++error_types[empty_name]))
valid=false
fi
# Validate email (basic regex)
email_regex='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
if [[ ! "$email" =~ $email_regex ]]; then
errors_this_line+=("email:invalid")
echo "$line_num,email,invalid,\"$email\"" >> "$ERRORS"
((++error_types[invalid_email]))
valid=false
fi
# Validate age (1-150)
if ! [[ "$age" =~ ^[0-9]+$ ]] || (( age < 1 || age > 150 )); then
errors_this_line+=("age:out_of_range")
echo "$line_num,age,out_of_range,\"$age\"" >> "$ERRORS"
((++error_types[invalid_age]))
valid=false
fi
# Validate score (0-100)
if ! [[ "$score" =~ ^[0-9]+$ ]] || (( score < 0 || score > 100 )); then
errors_this_line+=("score:out_of_range")
echo "$line_num,score,out_of_range,\"$score\"" >> "$ERRORS"
((++error_types[invalid_score]))
valid=false
fi
# Write to appropriate file
if $valid; then
echo "$name,$email,$age,$score" >> "$CLEAN"
((++clean_count))
else
((++error_count))
fi
done < "$INPUT"
# Summary
echo "=========================================="
echo "Validation Complete"
echo "=========================================="
echo "Total records: $((line_num - 1))"
echo "Valid: $clean_count (→ $CLEAN)"
echo "Invalid: $error_count (→ $ERRORS)"
echo ""
if [[ $error_count -gt 0 ]]; then
echo "Error breakdown:"
for error_type in "${!error_types[@]}"; do
printf " %-20s %d\n" "$error_type:" "${error_types[$error_type]}"
done
fi
echo "=========================================="
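The email check is the one regex in the validator worth probing on its own. A quick sketch against a few fabricated addresses:

```shell
# Sketch: the validator's email regex against fabricated inputs.
email_regex='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
result=""
for e in alice@example.com bad@no-tld @missing.local; do
    if [[ "$e" =~ $email_regex ]]; then
        result+="valid: $e"$'\n'
    else
        result+="invalid: $e"$'\n'
    fi
done
printf '%s' "$result"
```

The second address fails because the domain has no dot-plus-TLD, the third because the local part before `@` is empty. This pattern is a pragmatic filter, not a full RFC 5322 validator.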
Summary
You've seen 5 production-quality scripts demonstrating:
Design Patterns:
- Strict mode + error handling in every script
- Clear usage messages and help text
- Logging with timestamps
- Exit codes for automation
- Cleanup with trap
Common Tasks:
- File management (normalize, organize)
- System monitoring (health checks, uptime)
- Data processing (log analysis, CSV validation)
- Backup automation (with rotation policies)
Key Techniques:
- getopts for argument parsing
- Associative arrays for categorization
- awk for complex text processing
- Date arithmetic for rotation policies
- Colorized output for terminal vs log files
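Associative arrays deserve one extra note: use the pre-increment form when counting, because `((var++))` on a zero or unset value returns status 1 and aborts a `set -e` script. A tiny sketch (the extension list is made up):

```shell
# Sketch: counting categories with a bash associative array.
declare -A count=()
for ext in log txt log conf log txt; do
    ((++count[$ext]))   # pre-increment: status 0 even on the first hit, safe under set -e
done
for k in "${!count[@]}"; do
    echo "$k=${count[$k]}"
done | sort
```

The same idiom drives error-type tallies in the CSV validator and category counts in file organizers.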
What's Next:
- Customize these scripts for your environment:
  - Adjust thresholds in health monitor
  - Add file categories to organizer
  - Extend log analyzer for your log format
- Deploy to production:
  - Install in /usr/local/bin/
  - Set up cron jobs
  - Configure email alerts
- Build your own:
  - Start with one of these templates
  - Add your specific logic
  - Follow the same patterns (strict mode, logging, error handling)
You now have a complete toolkit for shell automation. Every problem—file management, monitoring, data processing, backups—follows these same patterns. Master them and you can automate anything.
Written by the ShellRAG Team
The ShellRAG editorial team writes practical, beginner-friendly Bash Shell tutorials with tested code examples and real-world use cases. Every article is technically reviewed for accuracy and updated regularly.