Pipes, tee, xargs, and Command Substitution
Summary: in this tutorial, you will learn to master pipes for chaining commands, tee for splitting output, xargs for building commands from input, and command and process substitution.
Redirection controls where data goes, but pipes are what make the Unix philosophy come alive. By chaining simple commands together, you can build sophisticated data-processing workflows without writing a full program. This tutorial covers pipes, tee, xargs, and the powerful command and process substitution features.
Pipes: The Power of Composition
Pipes (|) connect the stdout of one command to the stdin of the next, enabling command chaining without temporary files. This is the essence of Unix philosophy.
Basic Pipes
# Count files in a directory
ls | wc -l
# Find a running process
ps aux | grep "nginx"
# Sort and deduplicate
cat file.txt | sort | uniq
# Show the 10 largest files
ls -lS | head -10
# Count error messages in a log
grep "error" application.log | wc -l
# List directories only
ls -l | grep "^d"
# Show users currently logged in
who | awk '{print $1}' | sort -u
Chaining Multiple Commands
The real power comes from chaining many commands:
# Find the 10 most common words in a text file
cat book.txt | \
tr '[:upper:]' '[:lower:]' | \
tr -s '[:space:]' '\n' | \
sort | \
uniq -c | \
sort -rn | \
head -10
# Breakdown:
# 1. cat book.txt → Read the file
# 2. tr '[:upper:]' '[:lower:]' → Convert to lowercase
# 3. tr -s '[:space:]' '\n' → Convert runs of whitespace to newlines (one word per line)
# 4. sort → Sort alphabetically (required for uniq)
# 5. uniq -c → Count consecutive duplicates
# 6. sort -rn → Sort numerically in reverse (highest first)
# 7. head -10 → Show top 10
# Find top 10 largest directories
du -h --max-depth=1 2>/dev/null | \
grep -v '^[0-9.]*K' | \
sort -hr | \
head -10
# Monitor active network connections
ss -tunap | \
grep ESTAB | \
awk '{print $5}' | \
cut -d: -f1 | \
sort | \
uniq -c | \
sort -rn
# Extract unique email addresses from text
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt | \
tr '[:upper:]' '[:lower:]' | \
sort -u
ℹ️ Why pipes are powerful
Advantages of pipes over temporary files:
Traditional approach (creating temporary files):
# Inefficient: creates intermediate files on disk
command1 > temp1.txt
command2 < temp1.txt > temp2.txt
command3 < temp2.txt > final.txt
rm temp1.txt temp2.txt
Pipe approach (streaming through memory):
# Efficient: data flows through memory, no disk I/O
command1 | command2 | command3 > final.txt
Benefits:
- Speed: No disk I/O for intermediate data
- Simplicity: No cleanup of temporary files
- Safety: No leftover files if script crashes
- Concurrency: All commands run simultaneously (parallelism!)
- Clarity: Data flow is obvious from left to right
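The concurrency point is easy to verify: every stage of a pipeline starts at once, and data flows as soon as it is produced. A minimal sketch (the one-second sleeps are only there to make the streaming visible):

```shell
# Producer emits one line per second; the consumer timestamps each line as
# it arrives, showing that data streams through the pipe immediately rather
# than waiting for the producer to finish.
for i in 1 2 3; do
    echo "line $i"
    sleep 1
done | while read -r line; do
    echo "$(date +%T) got: $line"   # timestamps arrive roughly 1 second apart
done
```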
Pipes Don't Pass stderr by Default
By default, pipes only pass stdout. Errors still go to the terminal:
# stderr is NOT piped
find / -name "*.txt" | grep "important"
# "Permission denied" errors appear on screen (not piped to grep)
# To pipe BOTH stdout and stderr:
find / -name "*.txt" 2>&1 | grep "important"
# Now grep searches both results AND error messages
# Shorthand (Bash 4+)
find / -name "*.txt" |& grep "important"
# Common pattern: Pipe stdout, discard stderr
find / -name "*.txt" 2>/dev/null | grep "important"
# Only successful results are piped to grep
Example: Clean vs messy output:
# Messy (errors mixed with output):
$ ls /etc /nonexistent | wc -l
ls: cannot access '/nonexistent': No such file or directory
42
# Clean (errors discarded):
$ ls /etc /nonexistent 2>/dev/null | wc -l
42
# Errors logged separately:
$ ls /etc /nonexistent 2> errors.log | wc -l
42
$ cat errors.log
ls: cannot access '/nonexistent': No such file or directory
Pipeline Exit Status
A pipeline returns the exit status of the last command:
# Pipeline succeeds if the last command succeeds
false | false | true
echo $? # 0 (success, because true succeeded)
false | false | false
echo $? # 1 (failure, because last command failed)
# Check if ANY command in pipeline failed (Bash 3+)
set -o pipefail # Enable pipefail option
false | true | true
echo $? # 1 (now pipeline fails because first command failed)
# Useful in scripts
set -e # Exit on any error
set -o pipefail # Exit if any pipeline command fails
command1 | command2 | command3
# Script exits if ANY command in the pipeline fails
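Besides pipefail, Bash records the status of every stage of the most recent pipeline in the PIPESTATUS array. Copy it immediately, because any subsequent command overwrites it:

```shell
# PIPESTATUS holds one exit status per pipeline stage (Bash only).
false | true | false
statuses=("${PIPESTATUS[@]}")   # copy right away; the next command clobbers it
echo "${statuses[0]}"   # 1  (first false)
echo "${statuses[1]}"   # 0  (true)
echo "${statuses[2]}"   # 1  (last false)
```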
tee: Split Output to Multiple Destinations
The tee command is like a T-junction in plumbing—it sends output to both a file and stdout (screen or next command):
# Basic usage: Save to file AND display
echo "Hello" | tee output.txt
# Shows "Hello" on screen AND saves to output.txt
# Append instead of overwrite
echo "Line 1" | tee output.txt
echo "Line 2" | tee -a output.txt # -a appends
echo "Line 3" | tee -a output.txt
# Save intermediate results in a pipeline
cat data.txt | \
sort | \
tee sorted.txt | \
uniq | \
tee unique.txt | \
wc -l
# Creates sorted.txt and unique.txt while showing final count
# Write to multiple files simultaneously
echo "Broadcast message" | tee file1.txt file2.txt file3.txt
# Log command output while still seeing it
./deploy.sh | tee deployment.log
# Watch output in real-time, and it's saved to deployment.log
# Capture stderr as well
./script.sh 2>&1 | tee full_log.txt
./script.sh |& tee full_log.txt # Shorthand (Bash 4+)
# Append to log while displaying
./long_running_process.sh | tee -a process.log
# Silent tee (save to file, don't display)
command | tee output.txt > /dev/null
Practical tee patterns:
# Build log with real-time monitoring
{
echo "=== Deployment started at $(date) ==="
./pre_deploy.sh
./deploy.sh
./post_deploy.sh
echo "=== Deployment completed at $(date) ==="
} | tee deployment_$(date +%Y%m%d_%H%M%S).log
# Debug a pipeline (inspect intermediate results)
cat data.csv | \
grep "active" | \
tee debug_filtered.txt | \
cut -d, -f1,3 | \
tee debug_columns.txt | \
sort | \
tee debug_sorted.txt | \
uniq
# Send notification and log
echo "Backup completed" | tee -a backup.log | mail -s "Backup Status" admin@example.com
# Parallel processing with tee and process substitution
cat data.txt | tee >(process_a > output_a.txt) >(process_b > output_b.txt) > /dev/null
xargs: Build Commands from Input
xargs reads items from stdin and executes a command with those items as arguments. It bridges commands that produce output to commands that need arguments.
# Basic usage: delete all .tmp files
find . -name "*.tmp" | xargs rm
# find produces: ./file1.tmp ./file2.tmp ./file3.tmp
# xargs runs: rm ./file1.tmp ./file2.tmp ./file3.tmp
# Count lines in all .txt files
find . -name "*.txt" | xargs wc -l
# Search for pattern in multiple files
find . -name "*.log" | xargs grep "error"
# Handle filenames with spaces (CRITICAL!)
find . -name "*.txt" -print0 | xargs -0 cat
# -print0: find separates results with null character (\0)
# -0: xargs expects null-separated input
# Limit arguments per command execution
echo "a b c d e f g h" | xargs -n 2 echo
# Output:
# a b
# c d
# e f
# g h
# Custom placeholder (use {} for filename)
find . -name "*.jpg" | xargs -I{} cp {} /backup/photos/
# Runs: cp ./photo1.jpg /backup/photos/
# cp ./photo2.jpg /backup/photos/
# etc.
# Parallel execution (4 commands at a time)
find . -name "*.png" | xargs -P 4 -I{} convert {} {}.webp
# Converts 4 images simultaneously
# Confirm before each execution
echo "file1.txt file2.txt file3.txt" | xargs -p rm
# Prompts: rm file1.txt file2.txt file3.txt?... (y/n)
# Pass arguments to multiple commands
find . -name "*.txt" | xargs -I% sh -c 'wc -l %; grep "error" %'
# Replace specific part of command
ls *.txt | xargs -I% mv % %.backup
# Maximum characters per command
find . -name "*.log" | xargs -s 1000 rm
# Ensures command line doesn't exceed 1000 characters
Practical xargs examples:
# Batch file operations
find . -name "*.bak" -print0 | xargs -0 rm
find . -name "*.log" -mtime +30 -print0 | xargs -0 gzip
# Parallel processing
cat urls.txt | xargs -P 10 -n 1 curl -O
# Download 10 URLs simultaneously
# Complex transformations (pass the filename as "$1" instead of splicing
# {} into the script, which is safer for unusual filenames)
find . -name "*.jpg" -print0 | xargs -0 -P 4 -I{} sh -c '
mkdir -p thumbnails
convert "$1" -resize 200x200 "thumbnails/$(basename "$1")"
' _ {}
# Check multiple servers
cat servers.txt | xargs -I{} sh -c 'echo "Checking {}"; ping -c 1 {}'
# Kill processes by pattern (the [z] trick stops grep from matching itself)
ps aux | grep "[z]ombie_process" | awk '{print $2}' | xargs -r kill -9
# Archive multiple directories
find . -maxdepth 1 -type d | grep "project_" | xargs -I{} tar -czf {}.tar.gz {}
⚠️ xargs and filenames with spaces
WRONG (breaks with spaces):
find . -name "*.txt" | xargs cat
# Fails if any filename contains spaces
RIGHT (handles any filename):
find . -name "*.txt" -print0 | xargs -0 cat
# -print0: null-separated output from find
# -0: null-separated input to xargs
The null character (\0) never appears in filenames, making it the only safe delimiter. Always use -print0 with find and -0 with xargs when filenames might contain spaces, quotes, or other special characters.
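To see the failure mode concretely, here is a sketch against a scratch directory (created with mktemp; the filename is illustrative):

```shell
# A filename containing a space, in a throwaway directory
demo=$(mktemp -d)
echo "hello" > "$demo/my notes.txt"

# WRONG: xargs splits on the space, so wc looks for ".../my" and "notes.txt"
find "$demo" -name "*.txt" | xargs wc -l
# wc: .../my: No such file or directory  (and similar)

# RIGHT: with null separators the filename arrives intact
find "$demo" -name "*.txt" -print0 | xargs -0 wc -l
# 1 .../my notes.txt

rm -rf "$demo"
```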
Command Substitution
Command substitution captures a command's output and uses it as part of another command or variable assignment:
# Modern syntax: $(command)
current_date=$(date)
echo "Today is $current_date"
# Old syntax: `command` (avoid—harder to read and nest)
current_date=`date`
# Inline usage
echo "There are $(ls | wc -l) files here"
echo "You are $(whoami) on $(hostname)"
echo "Current directory: $(pwd)"
# Save command output to variables
files_count=$(find . -name "*.py" | wc -l)
disk_usage=$(df -h / | awk 'NR==2{print $5}')
memory_free=$(free -m | awk 'NR==2{print $4}')
load_average=$(uptime | awk -F'load average:' '{print $2}')
echo "Python files: $files_count"
echo "Disk usage: $disk_usage"
echo "Free memory: ${memory_free}MB"
echo "Load: $load_average"
# Nested command substitution
echo "nginx is in $(dirname "$(which nginx)")"
backup_dir="backup_$(date +%Y%m%d)_$(hostname)"
# Use in conditionals
if [ "$(whoami)" = "root" ]; then
echo "Running as root"
fi
if [ "$(df / | awk 'NR==2{print $5}' | tr -d '%')" -gt 90 ]; then
echo "Disk usage critical!"
fi
# Arithmetic with command substitution
total_size=$(du -sb /var/log | awk '{print $1}')
size_gb=$((total_size / 1024 / 1024 / 1024))
echo "Logs use ${size_gb}GB"
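Two pitfalls deserve a quick sketch: the result of $(...) undergoes word splitting when left unquoted, and trailing newlines are always stripped:

```shell
# Unquoted expansions undergo word splitting; inner whitespace collapses.
message=$(printf 'two  spaces')
echo $message      # two spaces   (double space collapsed)
echo "$message"    # two  spaces  (preserved)

# Command substitution also strips trailing newlines:
lines=$(printf 'a\nb\n\n\n')
echo "$lines" | wc -l   # 2 -- the trailing blank lines are gone
```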
Command substitution in loops:
# Process each line of command output
for user in $(cut -d: -f1 /etc/passwd); do
echo "User: $user"
done
# Better: Use while read loop (handles spaces correctly)
while read -r user; do
echo "User: $user"
done < <(cut -d: -f1 /etc/passwd)
# Create dated backups
tar -czf "backup_$(date +%Y%m%d_%H%M%S).tar.gz" /data
# Generate filenames
mv file.txt "file_$(date +%Y%m%d).txt"
cp config.yml "config.yml.backup_$(date +%Y%m%d)"
Process Substitution
Process substitution makes a command's output (or input) appear as a file. The syntax <(command) expands to a path such as /dev/fd/63 that other commands can open and read from.
# Compare outputs of two commands
diff <(ls dir1/) <(ls dir2/)
# Creates temporary "files" containing ls output for diff to compare
# Compare sorted versions of two files
diff <(sort file1.txt) <(sort file2.txt)
# Sort happens on-the-fly, no temporary files created
# Use command output where filename is expected
wc -l <(ps aux)
# ps aux output appears as a file to wc
# Multiple process substitutions
paste <(cut -f1 data.csv) <(cut -f3 data.csv)
# Combines columns 1 and 3 from data.csv
# Use with while loop (avoids subshell)
count=0
while read -r line; do
echo "Processing: $line"
((count++))
done < <(find . -name "*.txt")
echo "Processed $count files"
# count is available after loop (no subshell created)
# Output process substitution: >(command)
# Send data to multiple commands
echo "data" | tee >(grep "pattern" > matches.txt) >(wc -l > count.txt)
# Compare command outputs with Beyond Compare (bcomp)
bcomp <(cat file1.txt) <(cat file2.txt)
# Join data from different sources
join <(sort users.txt) <(sort groups.txt)
Process substitution vs pipes:
# Pipes: Linear flow (A → B → C)
command1 | command2 | command3
# Process substitution: Multiple inputs/outputs
diff <(command1) <(command2) # Two inputs to diff
tee >(command1) >(command2) > output # Send to two commands + file
# Pipes create subshells (variables don't propagate)
count=0
echo "test" | while read -r line; do
((count++))
done
echo $count # 0 (variable lost!)
# Process substitution avoids subshell
count=0
while read -r line; do
((count++))
done < <(echo "test")
echo $count # 1 (variable preserved!)
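If you are on Bash 4.2 or newer, shopt -s lastpipe is another way around the subshell; note it only takes effect when job control is off, which is the default in scripts:

```shell
# lastpipe runs the LAST stage of a pipeline in the current shell,
# so variables it sets survive the loop (Bash 4.2+, job control off).
shopt -s lastpipe
count=0
echo "test" | while read -r line; do
    ((count++))
done
echo $count   # 1 (the while loop ran in the current shell)
```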
Practical Pipeline Examples
Log Analysis
# Top 10 IP addresses hitting your server
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
# HTTP status code distribution
awk '{print $9}' access.log | sort | uniq -c | sort -rn
# Find failed SSH login attempts with counts
grep "Failed password" /var/log/auth.log | \
awk '{print $(NF-3)}' | \
sort | uniq -c | sort -rn | head -10
# Extract and count error messages
grep "ERROR" application.log | \
awk -F'ERROR:' '{print $2}' | \
sort | uniq -c | sort -rn | head -20
# Hourly error distribution
grep "ERROR" app.log | \
awk '{print $1, $2}' | \
cut -d: -f1,2 | \
sort | uniq -c | \
sort -rn
# Find slowest requests (parse request time from logs)
awk '$NF > 1 {print $NF, $7}' access.log | \
sort -rn | head -20
System Monitoring
# Top 10 memory-consuming processes (head -11 = header line + 10 rows)
ps aux --sort=-%mem | head -11
# Top 10 CPU-consuming processes
ps aux --sort=-%cpu | head -11
# Disk usage by directory (top 10)
sudo du -h --max-depth=1 / 2>/dev/null | sort -hr | head -10
# Network connections by state
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
# Real-time error monitoring
tail -f /var/log/syslog | grep --line-buffered "error" --color=auto
# List open files by a process
lsof -p $(pgrep firefox) | wc -l
# Find largest 20 files on system
sudo find / -type f -exec du -h {} + 2>/dev/null | sort -rh | head -20
Data Processing
# Calculate column average
awk '{sum+=$1; count++} END {print sum/count}' numbers.txt
# Sum a column
awk -F, '{sum+=$3} END {print "Total:", sum}' sales.csv
# Extract email addresses and deduplicate
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt | \
tr '[:upper:]' '[:lower:]' | \
sort -u
# Convert CSV to formatted table
column -t -s, data.csv
# Remove duplicate lines (preserve order, unlike sort | uniq)
awk '!seen[$0]++' file.txt
# Calculate percentile
sort -n data.txt | \
awk '{a[NR]=$1} END {print "Median:", a[int(NR/2)]; print "90th:", a[int(NR*0.9)]}'
# Join two CSV files on common column
join -t, <(sort -t, -k1 file1.csv) <(sort -t, -k1 file2.csv)
Exercises
Task 1: Create a file called system_info.txt containing:
- Current date
- Your username
- Your hostname
- System uptime
Each on its own line, all in one command.
Solution:
{ date; whoami; hostname; uptime; } > system_info.txt
# Or separately:
date > system_info.txt
whoami >> system_info.txt
hostname >> system_info.txt
uptime >> system_info.txt
# Verify:
cat system_info.txt
Task 2: Run ls /etc /nonexistent_directory and:
- Redirect normal output to files.txt
- Redirect errors to errors.txt
- Verify both files contain the expected content
Solution:
ls /etc /nonexistent_directory > files.txt 2> errors.txt
# Check normal output:
head files.txt
# Should show /etc contents
# Check errors:
cat errors.txt
# Should show: ls: cannot access '/nonexistent_directory': No such file or directory
Task 3: Use a heredoc to create a configuration file app.conf with your actual hostname, username, and current date substituted:
hostname: <actual hostname>
user: <actual user>
date: <actual date>
environment: production
Solution:
cat > app.conf << EOF
hostname: $(hostname)
user: $(whoami)
date: $(date)
environment: production
EOF
# Verify:
cat app.conf
Q1: Write a pipeline to count the number of unique login shells listed in /etc/shells (ignore comments and blank lines).
Solution:
grep -v '^#' /etc/shells | grep -v '^$' | sort -u | wc -l
# Or more concisely:
grep '^/' /etc/shells | sort -u | wc -l
Q2: Find the 5 largest files in your home directory and display them with human-readable sizes, sorted largest first.
Solution:
find ~ -type f -exec ls -lh {} \; 2>/dev/null | sort -k5 -hr | head -5
# Alternative using du:
find ~ -type f -exec du -h {} + 2>/dev/null | sort -hr | head -5
Q3: Extract all usernames from /etc/passwd, sort them alphabetically, and save to users.txt while also displaying on screen.
Solution:
cut -d: -f1 /etc/passwd | sort | tee users.txt
# Verify:
wc -l users.txt
head users.txt
Q4: Find the 10 most frequent words in a text file, showing count and word.
Solution:
cat book.txt | \
tr '[:upper:]' '[:lower:]' | \
tr -cs '[:alpha:]' '\n' | \
sort | \
uniq -c | \
sort -rn | \
head -10
# Breakdown:
# tr '[:upper:]' '[:lower:]' → lowercase
# tr -cs '[:alpha:]' '\n' → non-letters to newlines, squeeze repeats
# sort → alphabetical sort
# uniq -c → count occurrences
# sort -rn → numeric sort, reverse (highest first)
# head -10 → top 10
Task 1: Find all .log files modified in the last 7 days, then count how many contain the word "ERROR".
Solution:
find . -name "*.log" -mtime -7 -exec grep -l "ERROR" {} \; | wc -l
# Or with xargs:
find . -name "*.log" -mtime -7 -print0 | \
xargs -0 grep -l "ERROR" | \
wc -l
Task 2: Create a pipeline that shows the top 5 largest directories under /var, suppressing permission errors.
Solution:
sudo du -h --max-depth=1 /var 2>/dev/null | sort -hr | head -6
# Note: head -6 because first line is total for /var
# Or exclude total:
sudo du -h --max-depth=1 /var 2>/dev/null | grep -v '^.*\s/var$' | sort -hr | head -5
Task 3: Monitor the last 20 lines of /var/log/syslog in real-time, but only show lines containing "error" (case-insensitive).
Solution:
tail -n 20 -f /var/log/syslog | grep -i --line-buffered "error"
# --line-buffered ensures grep outputs immediately (not buffered)
# -i makes search case-insensitive
# Press Ctrl+C to stop
Summary
Pipes and redirection are the foundation of Unix command-line power:
Standard Streams:
- stdin (0): Input to command (keyboard by default)
- stdout (1): Normal output (screen by default)
- stderr (2): Error messages (screen by default)
Redirection Operators:
- >: Write stdout to file (overwrite)
- >>: Append stdout to file
- 2>: Write stderr to file
- 2>&1: Redirect stderr to wherever stdout goes
- &>: Redirect both stdout and stderr (Bash 4+)
- <: Read stdin from file
- <<EOF: Here document (multi-line input)
- <<<: Here string (single string as input)
Pipes and Tools:
- |: Connect stdout of one command to stdin of next
- |&: Pipe both stdout and stderr (Bash 4+)
- tee: Split output to file and stdout
- xargs: Build commands from stdin input
- <(command): Process substitution (command output as file)
- /dev/null: Discard data (the bit bucket)
Best Practices:
- Quote variables when redirecting: cat "$file" > output.txt
- Suppress errors with 2>/dev/null for cleaner output
- Use tee to log while monitoring real-time output
- Use -print0 and -0 with find/xargs for filenames with spaces
- Use set -o pipefail to catch errors in pipelines
- Chain commands with pipes instead of temporary files
Golden Rule: Build complex solutions from simple, composable commands. Each command does one thing well, and pipes connect them into powerful data processing pipelines.
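As a closing sketch, the pieces above compose naturally. The log format, filenames (app.log, hourly_errors.txt), and pattern below are all illustrative assumptions:

```shell
# Sample log (illustrative format: "YYYY-MM-DD HH:MM:SS LEVEL message")
printf '%s\n' \
    '2024-01-01 12:30:00 ERROR db timeout' \
    '2024-01-01 12:45:00 ERROR db timeout' \
    '2024-01-01 13:05:00 INFO started' > app.log

set -o pipefail                 # fail if any stage fails
grep "ERROR" app.log |          # keep error lines
    cut -d: -f1 |               # keep "YYYY-MM-DD HH" (text before the first colon)
    sort | uniq -c | sort -rn | # count per hour, busiest hour first
    tee hourly_errors.txt |     # save the full breakdown to a file
    head -5                     # ...while showing only the top 5 on screen
```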
In the next tutorial, you'll learn about scripting basics—how to combine these concepts into reusable, automated scripts with variables, conditionals, loops, and functions.
Written by the ShellRAG Team
The ShellRAG editorial team writes practical, beginner-friendly Bash Shell tutorials with tested code examples and real-world use cases. Every article is technically reviewed for accuracy and updated regularly.