Working with Data

Summary: in this tutorial, you will learn import, transform, and export data in powershell. master csv, json, xml, excel, databases, rest apis, and build data processing pipelines.

Working with Data

PowerShell excels at data manipulation. Unlike traditional shells that deal with text, PowerShell works with structured objects, making it natural to transform data between formats — CSV, JSON, XML, databases, and web APIs — without brittle text parsing.

Whether you're importing user data from CSV, consuming REST APIs, querying databases, or generating reports, PowerShell provides cmdlets specifically designed for each format. This chapter teaches you to build data processing pipelines that read, transform, and export data efficiently.

Why PowerShell for Data Processing

Traditional data processing involves:

  • Parsing text with awk, sed, regex
  • Dealing with inconsistent formats, edge cases
  • Writing fragile scripts that break when data changes

PowerShell's approach:

  • Format-aware cmdlets: Import-Csv, ConvertFrom-Json, Import-Clixml understand structure
  • Object pipeline: Transform data as structured objects, not text
  • Type-safe operations: Properties have types, preventing common errors
  • Seamless format conversion: JSON → CSV → Objects → XML with simple cmdlets

CSV Data: The Universal Format

CSV (Comma-Separated Values) is ubiquitous for data exchange. PowerShell makes CSV processing trivial.

Importing CSV Files

# Import CSV (auto-detects headers from first line)
$employees = Import-Csv "employees.csv"
 
# Access data
$employees | Format-Table
$employees[0].Name           # Access properties by header name
$employees[0].Department
$employees[0].Salary
 
# CSV with different delimiter
$data = Import-Csv "data.tsv" -Delimiter "`t"     # Tab-separated
$data = Import-Csv "euro.csv" -Delimiter ";"       # Semicolon-separated
 
# CSV without headers (provide your own)
$data = Import-Csv "data.csv" -Header "ID", "Name", "Email", "Department"
 
# Import from string (useful for testing)
$csvText = @"
Name,Age,City
Alice,30,Seattle
Bob,25,Portland
Charlie,35,Austin
"@
$data = $csvText | ConvertFrom-Csv
 

How it works: Import-Csv reads the first line as headers, converts each subsequent line to a PSCustomObject with properties matching the headers. You get an array of objects, not raw text.

Processing CSV Data

$employees = Import-Csv "employees.csv"
 
# Filter
$engineers = $employees | Where-Object { $_.Department -eq "Engineering" }
$highEarners = $employees | Where-Object { [double]$_.Salary -gt 100000 }
 
# Sort
$byName = $employees | Sort-Object Name
$bySalary = $employees | Sort-Object { [double]$_.Salary } -Descending
 
# Select specific columns
$employees | Select-Object Name, Department, Email | Format-Table
 
# Add calculated columns
$employees | Select-Object Name, Salary,
    @{Name='Annual'; Expression={[double]$_.Salary * 12}},
    @{Name='Bonus'; Expression={[double]$_.Salary * 0.10}} |
    Format-Table -AutoSize
 
# Group and summarize
$summary = $employees | Group-Object Department | ForEach-Object {
    $salaries = $_.Group.Salary | ForEach-Object { [double]$_ }
    [PSCustomObject]@{
        Department    = $_.Name
        EmployeeCount = $_.Count
        AvgSalary     = [math]::Round(($salaries | Measure-Object -Average).Average, 2)
        TotalSalary   = [math]::Round(($salaries | Measure-Object -Sum).Sum, 2)
    }
}
$summary | Format-Table -AutoSize
 
# Transform data
$processed = $employees | ForEach-Object {
    [PSCustomObject]@{
        FullName   = $_.Name
        Email      = $_.Email.ToLower()
        Dept       = $_.Department
        YearlySal  = [double]$_.Salary * 12
        HireYear   = [datetime]::Parse($_.HireDate).Year
    }
}
 

Exporting CSV

# Export to CSV
$employees | Export-Csv "output.csv" -NoTypeInformation
 
# Export selected columns only
$employees | Select-Object Name, Department, Email |
    Export-Csv "contact_list.csv" -NoTypeInformation
 
# Append to existing CSV
$newEmployees | Export-Csv "employees.csv" -Append -NoTypeInformation
 
# Export with different delimiter
$data | Export-Csv "output.tsv" -Delimiter "`t" -NoTypeInformation
 
# Convert to CSV string (not a file)
$csvString = $employees | ConvertTo-Csv -NoTypeInformation
$csvString | Out-File "data.csv"
 

Always use -NoTypeInformation

Without it, PowerShell adds a #TYPE line at the top of CSV files, which breaks compatibility with Excel and other tools. In PowerShell 7+, -NoTypeInformation is the default.

JSON Data: Web APIs and Configuration

JSON is the standard format for web APIs and modern configuration files. PowerShell provides seamless JSON handling.

Reading JSON

# From file
$config = Get-Content "config.json" -Raw | ConvertFrom-Json
 
# Access properties
$config.database.host
$config.database.port
$config.features.enabled
 
# From string
$jsonText = @"
{
    "name": "MyApp",
    "version": "1.0.0",
    "database": {
        "host": "localhost",
        "port": 5432
    }
}
"@
$config = $jsonText | ConvertFrom-Json
 
# From web API
$repos = Invoke-RestMethod "https://api.github.com/users/powershell/repos"
$repos | Select-Object name, stargazers_count, language | Format-Table
 

Why -Raw? ConvertFrom-Json expects a single string, not an array of lines. -Raw reads the file as one string.

Modifying JSON Data

# Read JSON
$config = Get-Content "config.json" -Raw | ConvertFrom-Json
 
# Modify properties
$config.database.host = "prod-server.example.com"
$config.database.port = 5433
$config.lastModified = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
 
# Add new properties
$config | Add-Member -NotePropertyName "environment" -NotePropertyValue "production"
 
# Save back to file
$config | ConvertTo-Json -Depth 10 | Out-File "config.json"
 

-Depth parameter: JSON can be nested deeply. -Depth controls how many levels to serialize (default is 2, which truncates deep objects). Use higher values for complex data.

Working with JSON Arrays

# Array of objects
$users = @"
[
    {"name": "Alice", "age": 30, "active": true},
    {"name": "Bob", "age": 25, "active": false},
    {"name": "Charlie", "age": 35, "active": true}
]
"@ | ConvertFrom-Json
 
# Filter
$activeUsers = $users | Where-Object { $_.active }
 
# Transform
$userSummary = $users | ForEach-Object {
    [PSCustomObject]@{
        Name = $_.name.ToUpper()
        Age = $_.age
        Status = if ($_.active) { "Active" } else { "Inactive" }
    }
}
 
# Export as JSON
$userSummary | ConvertTo-Json | Out-File "summary.json"
 

Pretty-Printing JSON

# Format JSON readably
$data | ConvertTo-Json -Depth 10 | Out-File "output.json"
 
# Compress JSON (no whitespace)
$data | ConvertTo-Json -Compress -Depth 10 | Out-File "compact.json"
 

XML Data: Configuration and Legacy Systems

XML is common in enterprise systems and configuration files. PowerShell provides multiple ways to work with XML.

Reading XML

# Read XML file
[xml]$config = Get-Content "config.xml"
 
# Access elements
$config.configuration.appSettings.add | ForEach-Object {
    "$($_.key) = $($_.value)"
}
 
# Navigate with XPath
$nodes = $config.SelectNodes("//add[@key='DatabaseConnection']")
$nodes[0].value
 
# From string
$xmlText = @"
<users>
    <user id="1">
        <name>Alice</name>
        <email>alice@example.com</email>
    </user>
    <user id="2">
        <name>Bob</name>
        <email>bob@example.com</email>
    </user>
</users>
"@
[xml]$users = $xmlText
 
# Access nested elements
$users.users.user | ForEach-Object {
    [PSCustomObject]@{
        ID = $_.id
        Name = $_.name
        Email = $_.email
    }
} | Format-Table
 

Modifying XML

[xml]$config = Get-Content "config.xml"
 
# Modify element
$node = $config.SelectSingleNode("//add[@key='Port']")
$node.value = "8080"
 
# Add new element
$newNode = $config.CreateElement("add")
$newNode.SetAttribute("key", "LogLevel")
$newNode.SetAttribute("value", "Debug")
$config.configuration.appSettings.AppendChild($newNode)
 
# Save
$config.Save("config.xml")
 

Using Select-Xml

# Search XML with XPath
$results = Select-Xml -Path "data.xml" -XPath "//user[@active='true']"
 
$results | ForEach-Object {
    $node = $_.Node
    [PSCustomObject]@{
        Name = $node.name
        Email = $node.email
        Department = $node.department
    }
} | Format-Table
 

PowerShell's Native Format: CliXml

Export-Clixml and Import-Clixml preserve PowerShell objects perfectly, including types, properties, and complex structures.

# Export objects
$processes = Get-Process | Select-Object Name, CPU, WorkingSet64
$processes | Export-Clixml "processes.xml"
 
# Import later (objects are restored exactly)
$loaded = Import-Clixml "processes.xml"
$loaded | Format-Table
 
# Works with complex objects
$data = @{
    Timestamp = Get-Date
    Servers = @("Server1", "Server2", "Server3")
    Config = @{
        Port = 8080
        Enabled = $true
    }
}
$data | Export-Clixml "state.xml"
$restored = Import-Clixml "state.xml"
 

When to use CliXml:

  • Saving PowerShell state between sessions
  • Caching complex query results
  • Preserving object types and methods
  • Not for interoperability (use JSON/CSV for that)

Working with REST APIs

Modern data often comes from web APIs. PowerShell makes API consumption simple.

GET Requests

# Simple GET
$repos = Invoke-RestMethod "https://api.github.com/users/powershell/repos"
$repos | Select-Object name, description, stargazers_count | Format-Table
 
# With headers
$headers = @{
    "Authorization" = "Bearer YOUR_TOKEN"
    "Accept" = "application/json"
}
$data = Invoke-RestMethod "https://api.example.com/data" -Headers $headers
 
# With query parameters
$params = @{
    Uri = "https://api.example.com/search"
    Method = "GET"
    Body = @{
        q = "powershell"
        per_page = 10
        sort = "stars"
    }
}
$results = Invoke-RestMethod @params
 

POST Requests

# POST JSON data
$body = @{
    name = "John Doe"
    email = "john@example.com"
    role = "Admin"
} | ConvertTo-Json
 
$response = Invoke-RestMethod `
    -Uri "https://api.example.com/users" `
    -Method POST `
    -Headers @{"Content-Type" = "application/json"} `
    -Body $body
 
# POST form data
$formData = @{
    username = "alice"
    password = "secret123"
}
$response = Invoke-RestMethod `
    -Uri "https://api.example.com/login" `
    -Method POST `
    -Body $formData
 

Handling API Responses

try {
    $data = Invoke-RestMethod "https://api.example.com/data" -ErrorAction Stop
 
    # Process response
    $data.items | ForEach-Object {
        [PSCustomObject]@{
            ID = $_.id
            Name = $_.name
            Status = $_.status
        }
    }
}
catch {
    $statusCode = $_.Exception.Response.StatusCode.value__
    Write-Error "API request failed with status $statusCode: $($_.Exception.Message)"
}
 

Pagination

function Get-PaginatedData {
    param(
        [string]$BaseUri,
        [int]$PageSize = 100
    )
 
    $page = 1
    $allData = @()
 
    do {
        $uri = "${BaseUri}?page=$page&per_page=$PageSize"
        $response = Invoke-RestMethod $uri
 
        $allData += $response
        $page++
 
        Write-Progress -Activity "Fetching data" -Status "Page $page"
    } while ($response.Count -eq $PageSize)
 
    Write-Progress -Activity "Fetching data" -Completed
    return $allData
}
 
$allRepos = Get-PaginatedData -BaseUri "https://api.github.com/orgs/microsoft/repos"
 

Database Queries

PowerShell can query databases directly using .NET SQL classes.

SQL Server

# Connection
$connectionString = "Server=localhost;Database=MyDB;Integrated Security=True;"
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
 
try {
    $connection.Open()
 
    # Query
    $command = $connection.CreateCommand()
    $command.CommandText = "SELECT * FROM Users WHERE Active = 1"
 
    $adapter = New-Object System.Data.SqlClient.SqlDataAdapter($command)
    $dataset = New-Object System.Data.DataSet
    $adapter.Fill($dataset)
 
    # Process results
    $dataset.Tables[0] | ForEach-Object {
        [PSCustomObject]@{
            UserID = $_.UserID
            UserName = $_.UserName
            Email = $_.Email
        }
    } | Format-Table
}
finally {
    $connection.Close()
}
 

Parameterized Queries (Prevent SQL Injection)

$command.CommandText = "SELECT * FROM Users WHERE Department = @dept"
$command.Parameters.AddWithValue("@dept", "Engineering")
 

Using Invoke-Sqlcmd (SQL Server Module)

# Install module
Install-Module -Name SqlServer
 
# Query
$results = Invoke-Sqlcmd -ServerInstance "localhost" -Database "MyDB" -Query "SELECT * FROM Users"
$results | Format-Table
 
# From file
Invoke-Sqlcmd -ServerInstance "localhost" -Database "MyDB" -InputFile "query.sql"
 

Data Transformation Patterns

Merging Data Sources

# Load from multiple sources
$csvUsers = Import-Csv "users.csv"
$jsonConfig = Get-Content "config.json" -Raw | ConvertFrom-Json
 
# Merge
$merged = $csvUsers | ForEach-Object {
    $user = $_
    $config = $jsonConfig.users | Where-Object { $_.id -eq $user.ID }
 
    [PSCustomObject]@{
        Name = $user.Name
        Email = $user.Email
        Department = $user.Department
        Preferences = $config.preferences
        Theme = $config.theme
    }
}
 

Data Enrichment

# Enrich CSV with API data
$users = Import-Csv "users.csv"
 
$enriched = $users | ForEach-Object {
    $githubData = Invoke-RestMethod "https://api.github.com/users/$($_.GithubUsername)"
 
    [PSCustomObject]@{
        Name = $_.Name
        Email = $_.Email
        GithubRepos = $githubData.public_repos
        GithubFollowers = $githubData.followers
        GithubBio = $githubData.bio
    }
}
 
$enriched | Export-Csv "enriched_users.csv" -NoTypeInformation
 

Aggregation and Reporting

# Load sales data
$sales = Import-Csv "sales.csv"
 
# Aggregate by region
$report = $sales | Group-Object Region | ForEach-Object {
    $total = ($_.Group.Amount | ForEach-Object { [double]$_ } | Measure-Object -Sum).Sum
    $avg = ($_.Group.Amount | ForEach-Object { [double]$_ } | Measure-Object -Average).Average
 
    [PSCustomObject]@{
        Region = $_.Name
        TotalSales = [math]::Round($total, 2)
        AvgSale = [math]::Round($avg, 2)
        Orders = $_.Count
    }
} | Sort-Object TotalSales -Descending
 
$report | Format-Table -AutoSize
$report | Export-Csv "sales_report.csv" -NoTypeInformation
 

Exercises

🏋️ Exercise 1: CSV Data Pipeline

Create a script that:

  1. Imports employee CSV with columns: Name, Email, Department, Salary, HireDate
  2. Filters employees hired in last year
  3. Groups by department
  4. Calculates average salary per department
  5. Exports summary to JSON
Show Solution
# Import data
$employees = Import-Csv "employees.csv"
 
# Filter recent hires
$oneYearAgo = (Get-Date).AddYears(-1)
$recentHires = $employees | Where-Object {
    [datetime]::Parse($_.HireDate) -gt $oneYearAgo
}
 
Write-Host "Found $($recentHires.Count) employees hired in the last year"
 
# Group by department and calculate stats
$summary = $recentHires | Group-Object Department | ForEach-Object {
    $salaries = $_.Group.Salary | ForEach-Object { [double]$_ }
 
    [PSCustomObject]@{
        Department = $_.Name
        EmployeeCount = $_.Count
        AverageSalary = [math]::Round(($salaries | Measure-Object -Average).Average, 2)
        MinSalary = [math]::Round(($salaries | Measure-Object -Minimum).Minimum, 2)
        MaxSalary = [math]::Round(($salaries | Measure-Object -Maximum).Maximum, 2)
        TotalSalaryExpense = [math]::Round(($salaries | Measure-Object -Sum).Sum, 2)
    }
} | Sort-Object AverageSalary -Descending
 
# Display
$summary | Format-Table -AutoSize
 
# Export to JSON
$summary | ConvertTo-Json -Depth 10 | Out-File "department_summary.json"
Write-Host "Summary exported to department_summary.json" -ForegroundColor Green
 
🏋️ Exercise 2: API Data Collector

Create a function that:

  1. Fetches GitHub user data via API
  2. Gets their repositories
  3. Calculates total stars across all repos
  4. Exports to CSV
Show Solution
function Get-GitHubUserStats {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]]$UserName
    )
 
    begin {
        $results = @()
    }
 
    process {
        foreach ($user in $UserName) {
            Write-Verbose "Fetching data for $user..."
 
            try {
                # Get user info
                $userInfo = Invoke-RestMethod "https://api.github.com/users/$user" -ErrorAction Stop
 
                # Get repositories
                $repos = Invoke-RestMethod "https://api.github.com/users/$user/repos?per_page=100" -ErrorAction Stop
 
                # Calculate stats
                $totalStars = ($repos.stargazers_count | Measure-Object -Sum).Sum
                $totalRepos = $repos.Count
                $languages = $repos.language | Where-Object { $_ } | Group-Object | Sort-Object Count -Descending
 
                $results += [PSCustomObject]@{
                    Username = $user
                    Name = $userInfo.name
                    PublicRepos = $totalRepos
                    TotalStars = $totalStars
                    Followers = $userInfo.followers
                    Following = $userInfo.following
                    TopLanguage = if ($languages) { $languages[0].Name } else { "N/A" }
                    ProfileUrl = $userInfo.html_url
                }
 
                Write-Host "✓ Processed $user" -ForegroundColor Green
            }
            catch {
                Write-Warning "Failed to fetch data for $user: $($_.Exception.Message)"
            }
        }
    }
 
    end {
        return $results
    }
}
 
# Usage
$users = @("powershell", "microsoft", "github")
$stats = Get-GitHubUserStats -UserName $users -Verbose
 
$stats | Format-Table -AutoSize
$stats | Export-Csv "github_stats.csv" -NoTypeInformation
 
Write-Host "Stats exported to github_stats.csv" -ForegroundColor Green
 
🏋️ Exercise 3: Multi-Format Data Converter

Create a tool that converts between CSV, JSON, and XML:

  • Reads one format
  • Converts to another
  • Handles nested data
Show Solution
function Convert-DataFormat {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [ValidateScript({ Test-Path $_ })]
        [string]$InputFile,
 
        [Parameter(Mandatory)]
        [ValidateSet("CSV", "JSON", "XML")]
        [string]$OutputFormat,
 
        [Parameter(Mandatory)]
        [string]$OutputFile
    )
 
    # Detect input format
    $extension = [System.IO.Path]::GetExtension($InputFile).ToLower()
 
    Write-Host "Reading $extension file: $InputFile" -ForegroundColor Cyan
 
    # Import based on format
    $data = switch ($extension) {
        ".csv" {
            Import-Csv $InputFile
        }
        ".json" {
            Get-Content $InputFile -Raw | ConvertFrom-Json
        }
        ".xml" {
            [xml]$xml = Get-Content $InputFile
            # Convert XML to objects (simplified)
            $xml.DocumentElement.ChildNodes | ForEach-Object {
                $obj = [PSCustomObject]@{}
                $_.Attributes | ForEach-Object {
                    $obj | Add-Member -NotePropertyName $_.Name -NotePropertyValue $_.Value
                }
                $obj
            }
        }
        default {
            throw "Unsupported input format: $extension"
        }
    }
 
    Write-Host "Converting to $OutputFormat..." -ForegroundColor Yellow
 
    # Export to target format
    switch ($OutputFormat) {
        "CSV" {
            $data | Export-Csv $OutputFile -NoTypeInformation
        }
        "JSON" {
            $data | ConvertTo-Json -Depth 10 | Out-File $OutputFile
        }
        "XML" {
            $data | Export-Clixml $OutputFile
        }
    }
 
    Write-Host "✓ Converted successfully to: $OutputFile" -ForegroundColor Green
 
    # Show sample
    Write-Host "`nSample output:" -ForegroundColor Cyan
    $data | Select-Object -First 3 | Format-List
}
 
# Test
Convert-DataFormat -InputFile "employees.csv" -OutputFormat "JSON" -OutputFile "employees.json"
Convert-DataFormat -InputFile "config.json" -OutputFormat "CSV" -OutputFile "config.csv"
 

Summary

PowerShell's data handling capabilities make it ideal for ETL (Extract, Transform, Load) workflows:

  • CSV: Import-Csv, Export-Csv for tabular data
  • JSON: ConvertFrom-Json, ConvertTo-Json for web APIs and config
  • XML: Native [xml] type and Select-Xml for legacy systems
  • CliXml: Export-Clixml, Import-Clixml for PowerShell-native serialization
  • REST APIs: Invoke-RestMethod for web service integration
  • Databases: .NET SQL classes or Invoke-Sqlcmd for direct queries

Master these formats to build powerful data pipelines that integrate diverse systems and automate complex workflows.

Was this page helpful?
SR

Written by the ShellRAG Team

The ShellRAG editorial team writes practical, beginner-friendly PowerShell tutorials with tested code examples and real-world use cases. Every article is technically reviewed for accuracy and updated regularly.

Learn more about us →