Working with Data
Summary: in this tutorial, you will learn to import, transform, and export data in PowerShell. You will work with CSV, JSON, XML, databases, and REST APIs, and build data processing pipelines.
PowerShell excels at data manipulation. Unlike traditional shells that deal with text, PowerShell works with structured objects, making it natural to transform data between formats — CSV, JSON, XML, databases, and web APIs — without brittle text parsing.
Whether you're importing user data from CSV, consuming REST APIs, querying databases, or generating reports, PowerShell provides cmdlets specifically designed for each format. This chapter teaches you to build data processing pipelines that read, transform, and export data efficiently.
Why PowerShell for Data Processing
Traditional data processing involves:
- Parsing text with awk, sed, and regular expressions
- Dealing with inconsistent formats and edge cases
- Writing fragile scripts that break when data changes
PowerShell's approach:
- Format-aware cmdlets: Import-Csv, ConvertFrom-Json, and Import-Clixml understand structure
- Object pipeline: Transform data as structured objects, not text
- Type-safe operations: Properties have types, preventing common errors
- Seamless format conversion: JSON → CSV → Objects → XML with simple cmdlets
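As a small illustration of that last point, a complete JSON-to-CSV conversion is a single pipeline; no text parsing is involved (the sample data here is inlined for demonstration):

```powershell
# JSON string -> objects -> filter -> CSV, all format-aware
'[{"name":"Alice","age":30},{"name":"Bob","age":25}]' |
    ConvertFrom-Json |
    Where-Object { $_.age -ge 30 } |
    ConvertTo-Csv -NoTypeInformation
```

Each stage receives structured objects from the previous one, so the filter works on a real numeric property rather than a column position in a line of text.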
CSV Data: The Universal Format
CSV (Comma-Separated Values) is ubiquitous for data exchange. PowerShell makes CSV processing trivial.
Importing CSV Files
# Import CSV (auto-detects headers from first line)
$employees = Import-Csv "employees.csv"
# Access data
$employees | Format-Table
$employees[0].Name # Access properties by header name
$employees[0].Department
$employees[0].Salary
# CSV with different delimiter
$data = Import-Csv "data.tsv" -Delimiter "`t" # Tab-separated
$data = Import-Csv "euro.csv" -Delimiter ";" # Semicolon-separated
# CSV without headers (provide your own)
$data = Import-Csv "data.csv" -Header "ID", "Name", "Email", "Department"
# Import from string (useful for testing)
$csvText = @"
Name,Age,City
Alice,30,Seattle
Bob,25,Portland
Charlie,35,Austin
"@
$data = $csvText | ConvertFrom-Csv
How it works: Import-Csv reads the first line as headers, converts each subsequent line to a PSCustomObject with properties matching the headers. You get an array of objects, not raw text.
Processing CSV Data
$employees = Import-Csv "employees.csv"
# Filter
$engineers = $employees | Where-Object { $_.Department -eq "Engineering" }
$highEarners = $employees | Where-Object { [double]$_.Salary -gt 100000 }
# Sort
$byName = $employees | Sort-Object Name
$bySalary = $employees | Sort-Object { [double]$_.Salary } -Descending
# Select specific columns
$employees | Select-Object Name, Department, Email | Format-Table
# Add calculated columns
$employees | Select-Object Name, Salary,
@{Name='Annual'; Expression={[double]$_.Salary * 12}},
@{Name='Bonus'; Expression={[double]$_.Salary * 0.10}} |
Format-Table -AutoSize
# Group and summarize
$summary = $employees | Group-Object Department | ForEach-Object {
$salaries = $_.Group.Salary | ForEach-Object { [double]$_ }
[PSCustomObject]@{
Department = $_.Name
EmployeeCount = $_.Count
AvgSalary = [math]::Round(($salaries | Measure-Object -Average).Average, 2)
TotalSalary = [math]::Round(($salaries | Measure-Object -Sum).Sum, 2)
}
}
$summary | Format-Table -AutoSize
# Transform data
$processed = $employees | ForEach-Object {
[PSCustomObject]@{
FullName = $_.Name
Email = $_.Email.ToLower()
Dept = $_.Department
YearlySal = [double]$_.Salary * 12
HireYear = [datetime]::Parse($_.HireDate).Year
}
}
Exporting CSV
# Export to CSV
$employees | Export-Csv "output.csv" -NoTypeInformation
# Export selected columns only
$employees | Select-Object Name, Department, Email |
Export-Csv "contact_list.csv" -NoTypeInformation
# Append to existing CSV
$newEmployees | Export-Csv "employees.csv" -Append -NoTypeInformation
# Export with different delimiter
$data | Export-Csv "output.tsv" -Delimiter "`t" -NoTypeInformation
# Convert to CSV string (not a file)
$csvString = $employees | ConvertTo-Csv -NoTypeInformation
$csvString | Out-File "data.csv"
Always use -NoTypeInformation
Without it, Windows PowerShell adds a #TYPE line at the top of CSV files, which breaks compatibility with Excel and other tools. In PowerShell 6 and later, -NoTypeInformation is the default behavior.
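To see exactly what the switch suppresses, compare ConvertTo-Csv output with and without it in Windows PowerShell 5.1 (in PowerShell 7 both forms omit the annotation):

```powershell
# Windows PowerShell 5.1 emits a type annotation as the first line:
Get-Date | Select-Object DayOfWeek | ConvertTo-Csv
# #TYPE Selected.System.DateTime
# "DayOfWeek"
# "Tuesday"    (the value varies by day, of course)

# With the switch, output starts at the header row:
Get-Date | Select-Object DayOfWeek | ConvertTo-Csv -NoTypeInformation
```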
JSON Data: Web APIs and Configuration
JSON is the standard format for web APIs and modern configuration files. PowerShell provides seamless JSON handling.
Reading JSON
# From file
$config = Get-Content "config.json" -Raw | ConvertFrom-Json
# Access properties
$config.database.host
$config.database.port
$config.features.enabled
# From string
$jsonText = @"
{
"name": "MyApp",
"version": "1.0.0",
"database": {
"host": "localhost",
"port": 5432
}
}
"@
$config = $jsonText | ConvertFrom-Json
# From web API
$repos = Invoke-RestMethod "https://api.github.com/users/powershell/repos"
$repos | Select-Object name, stargazers_count, language | Format-Table
Why -Raw? ConvertFrom-Json expects a single string, not an array of lines. -Raw reads the file as one string.
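You can see the difference directly. Assuming config.json spans multiple lines, Get-Content without -Raw returns an array with one string per line, while -Raw returns a single string with the newlines preserved:

```powershell
$lines = Get-Content "config.json"        # array of strings, one per line
$text  = Get-Content "config.json" -Raw   # one string, newlines included

$lines.GetType().Name   # Object[] for a multi-line file
$text.GetType().Name    # String
```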
Modifying JSON Data
# Read JSON
$config = Get-Content "config.json" -Raw | ConvertFrom-Json
# Modify properties
$config.database.host = "prod-server.example.com"
$config.database.port = 5433
$config.lastModified = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
# Add new properties
$config | Add-Member -NotePropertyName "environment" -NotePropertyValue "production"
# Save back to file
$config | ConvertTo-Json -Depth 10 | Out-File "config.json"
-Depth parameter: JSON can be nested deeply. -Depth controls how many levels to serialize (default is 2, which truncates deep objects). Use higher values for complex data.
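A quick sketch of the truncation in action, using a deliberately deep hashtable:

```powershell
$nested = @{ level1 = @{ level2 = @{ level3 = @{ value = 42 } } } }

# With the default depth, the deepest levels collapse into strings
# like "System.Collections.Hashtable" instead of nested JSON
$nested | ConvertTo-Json

# An explicit depth preserves the full structure
$nested | ConvertTo-Json -Depth 10
```

When in doubt, pass a generous -Depth; the cost of a slightly larger number is negligible compared to silently losing nested data.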
Working with JSON Arrays
# Array of objects
$users = @"
[
{"name": "Alice", "age": 30, "active": true},
{"name": "Bob", "age": 25, "active": false},
{"name": "Charlie", "age": 35, "active": true}
]
"@ | ConvertFrom-Json
# Filter
$activeUsers = $users | Where-Object { $_.active }
# Transform
$userSummary = $users | ForEach-Object {
[PSCustomObject]@{
Name = $_.name.ToUpper()
Age = $_.age
Status = if ($_.active) { "Active" } else { "Inactive" }
}
}
# Export as JSON
$userSummary | ConvertTo-Json | Out-File "summary.json"
Pretty-Printing JSON
# Format JSON readably
$data | ConvertTo-Json -Depth 10 | Out-File "output.json"
# Compress JSON (no whitespace)
$data | ConvertTo-Json -Compress -Depth 10 | Out-File "compact.json"
XML Data: Configuration and Legacy Systems
XML is common in enterprise systems and configuration files. PowerShell provides multiple ways to work with XML.
Reading XML
# Read XML file
[xml]$config = Get-Content "config.xml"
# Access elements
$config.configuration.appSettings.add | ForEach-Object {
"$($_.key) = $($_.value)"
}
# Navigate with XPath
$nodes = $config.SelectNodes("//add[@key='DatabaseConnection']")
$nodes[0].value
# From string
$xmlText = @"
<users>
<user id="1">
<name>Alice</name>
<email>alice@example.com</email>
</user>
<user id="2">
<name>Bob</name>
<email>bob@example.com</email>
</user>
</users>
"@
[xml]$users = $xmlText
# Access nested elements
$users.users.user | ForEach-Object {
[PSCustomObject]@{
ID = $_.id
Name = $_.name
Email = $_.email
}
} | Format-Table
Modifying XML
[xml]$config = Get-Content "config.xml"
# Modify element
$node = $config.SelectSingleNode("//add[@key='Port']")
$node.value = "8080"
# Add new element
$newNode = $config.CreateElement("add")
$newNode.SetAttribute("key", "LogLevel")
$newNode.SetAttribute("value", "Debug")
$config.configuration.appSettings.AppendChild($newNode)
# Save
$config.Save("config.xml")
Using Select-Xml
# Search XML with XPath
$results = Select-Xml -Path "data.xml" -XPath "//user[@active='true']"
$results | ForEach-Object {
$node = $_.Node
[PSCustomObject]@{
Name = $node.name
Email = $node.email
Department = $node.department
}
} | Format-Table
PowerShell's Native Format: CliXml
Export-Clixml and Import-Clixml preserve PowerShell objects perfectly, including types, properties, and complex structures.
# Export objects
$processes = Get-Process | Select-Object Name, CPU, WorkingSet64
$processes | Export-Clixml "processes.xml"
# Import later (objects are restored exactly)
$loaded = Import-Clixml "processes.xml"
$loaded | Format-Table
# Works with complex objects
$data = @{
Timestamp = Get-Date
Servers = @("Server1", "Server2", "Server3")
Config = @{
Port = 8080
Enabled = $true
}
}
$data | Export-Clixml "state.xml"
$restored = Import-Clixml "state.xml"
When to use CliXml:
- Saving PowerShell state between sessions
- Caching complex query results
- Preserving object types and methods
- Not for interoperability (use JSON/CSV for that)
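A quick way to see the type preservation: round-trip a hashtable containing a [datetime] value and confirm the type survives (the file name here is arbitrary):

```powershell
$state = @{
    When    = Get-Date
    Servers = @("Server1", "Server2")
}
$state | Export-Clixml "state-demo.xml"
$restored = Import-Clixml "state-demo.xml"

# The value comes back as a real DateTime, not a string,
# so date arithmetic still works after the round trip
$restored.When.GetType().Name
$restored.When.AddDays(7)
```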
Working with REST APIs
Modern data often comes from web APIs. PowerShell makes API consumption simple.
GET Requests
# Simple GET
$repos = Invoke-RestMethod "https://api.github.com/users/powershell/repos"
$repos | Select-Object name, description, stargazers_count | Format-Table
# With headers
$headers = @{
"Authorization" = "Bearer YOUR_TOKEN"
"Accept" = "application/json"
}
$data = Invoke-RestMethod "https://api.example.com/data" -Headers $headers
# With query parameters
$params = @{
Uri = "https://api.example.com/search"
Method = "GET"
Body = @{
q = "powershell"
per_page = 10
sort = "stars"
}
}
$results = Invoke-RestMethod @params
POST Requests
# POST JSON data
$body = @{
name = "John Doe"
email = "john@example.com"
role = "Admin"
} | ConvertTo-Json
$response = Invoke-RestMethod `
-Uri "https://api.example.com/users" `
-Method POST `
-ContentType "application/json" `
-Body $body
# POST form data
$formData = @{
username = "alice"
password = "secret123"
}
$response = Invoke-RestMethod `
-Uri "https://api.example.com/login" `
-Method POST `
-Body $formData
Handling API Responses
try {
$data = Invoke-RestMethod "https://api.example.com/data" -ErrorAction Stop
# Process response
$data.items | ForEach-Object {
[PSCustomObject]@{
ID = $_.id
Name = $_.name
Status = $_.status
}
}
}
catch {
$statusCode = $_.Exception.Response.StatusCode.value__
Write-Error "API request failed with status $($statusCode): $($_.Exception.Message)"
}
Pagination
function Get-PaginatedData {
param(
[string]$BaseUri,
[int]$PageSize = 100
)
$page = 1
$allData = @()
do {
$uri = "${BaseUri}?page=$page&per_page=$PageSize"
$response = Invoke-RestMethod $uri
$allData += $response
Write-Progress -Activity "Fetching data" -Status "Page $page"
$page++
} while ($response.Count -eq $PageSize)
Write-Progress -Activity "Fetching data" -Completed
return $allData
}
$allRepos = Get-PaginatedData -BaseUri "https://api.github.com/orgs/microsoft/repos"
Database Queries
PowerShell can query databases directly using .NET SQL classes.
SQL Server
# Connection
$connectionString = "Server=localhost;Database=MyDB;Integrated Security=True;"
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
try {
$connection.Open()
# Query
$command = $connection.CreateCommand()
$command.CommandText = "SELECT * FROM Users WHERE Active = 1"
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($command)
$dataset = New-Object System.Data.DataSet
$adapter.Fill($dataset)
# Process results
$dataset.Tables[0] | ForEach-Object {
[PSCustomObject]@{
UserID = $_.UserID
UserName = $_.UserName
Email = $_.Email
}
} | Format-Table
}
finally {
$connection.Close()
}
Parameterized Queries (Prevent SQL Injection)
$command.CommandText = "SELECT * FROM Users WHERE Department = @dept"
$command.Parameters.AddWithValue("@dept", "Engineering")
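Putting the pieces together, a minimal parameterized query might look like this (the server, database, table, and column names are placeholders for your own schema):

```powershell
$connectionString = "Server=localhost;Database=MyDB;Integrated Security=True;"
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
try {
    $connection.Open()
    $command = $connection.CreateCommand()
    # The @dept placeholder is bound server-side, so user input
    # never becomes part of the SQL text itself
    $command.CommandText = "SELECT UserName, Email FROM Users WHERE Department = @dept"
    $command.Parameters.AddWithValue("@dept", "Engineering") | Out-Null

    $reader = $command.ExecuteReader()
    while ($reader.Read()) {
        [PSCustomObject]@{
            UserName = $reader["UserName"]
            Email    = $reader["Email"]
        }
    }
    $reader.Close()
}
finally {
    $connection.Close()
}
```

Concatenating a variable into CommandText instead would reintroduce the injection risk that parameters exist to prevent.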
Using Invoke-Sqlcmd (SQL Server Module)
# Install module
Install-Module -Name SqlServer
# Query
$results = Invoke-Sqlcmd -ServerInstance "localhost" -Database "MyDB" -Query "SELECT * FROM Users"
$results | Format-Table
# From file
Invoke-Sqlcmd -ServerInstance "localhost" -Database "MyDB" -InputFile "query.sql"
Data Transformation Patterns
Merging Data Sources
# Load from multiple sources
$csvUsers = Import-Csv "users.csv"
$jsonConfig = Get-Content "config.json" -Raw | ConvertFrom-Json
# Merge
$merged = $csvUsers | ForEach-Object {
$user = $_
$config = $jsonConfig.users | Where-Object { $_.id -eq $user.ID }
[PSCustomObject]@{
Name = $user.Name
Email = $user.Email
Department = $user.Department
Preferences = $config.preferences
Theme = $config.theme
}
}
Data Enrichment
# Enrich CSV with API data
$users = Import-Csv "users.csv"
$enriched = $users | ForEach-Object {
$githubData = Invoke-RestMethod "https://api.github.com/users/$($_.GithubUsername)"
[PSCustomObject]@{
Name = $_.Name
Email = $_.Email
GithubRepos = $githubData.public_repos
GithubFollowers = $githubData.followers
GithubBio = $githubData.bio
}
}
$enriched | Export-Csv "enriched_users.csv" -NoTypeInformation
Aggregation and Reporting
# Load sales data
$sales = Import-Csv "sales.csv"
# Aggregate by region
$report = $sales | Group-Object Region | ForEach-Object {
$total = ($_.Group.Amount | ForEach-Object { [double]$_ } | Measure-Object -Sum).Sum
$avg = ($_.Group.Amount | ForEach-Object { [double]$_ } | Measure-Object -Average).Average
[PSCustomObject]@{
Region = $_.Name
TotalSales = [math]::Round($total, 2)
AvgSale = [math]::Round($avg, 2)
Orders = $_.Count
}
} | Sort-Object TotalSales -Descending
$report | Format-Table -AutoSize
$report | Export-Csv "sales_report.csv" -NoTypeInformation
Exercises
Create a script that:
- Imports employee CSV with columns: Name, Email, Department, Salary, HireDate
- Filters employees hired in last year
- Groups by department
- Calculates average salary per department
- Exports summary to JSON
Solution
# Import data
$employees = Import-Csv "employees.csv"
# Filter recent hires
$oneYearAgo = (Get-Date).AddYears(-1)
$recentHires = $employees | Where-Object {
[datetime]::Parse($_.HireDate) -gt $oneYearAgo
}
Write-Host "Found $($recentHires.Count) employees hired in the last year"
# Group by department and calculate stats
$summary = $recentHires | Group-Object Department | ForEach-Object {
$salaries = $_.Group.Salary | ForEach-Object { [double]$_ }
[PSCustomObject]@{
Department = $_.Name
EmployeeCount = $_.Count
AverageSalary = [math]::Round(($salaries | Measure-Object -Average).Average, 2)
MinSalary = [math]::Round(($salaries | Measure-Object -Minimum).Minimum, 2)
MaxSalary = [math]::Round(($salaries | Measure-Object -Maximum).Maximum, 2)
TotalSalaryExpense = [math]::Round(($salaries | Measure-Object -Sum).Sum, 2)
}
} | Sort-Object AverageSalary -Descending
# Display
$summary | Format-Table -AutoSize
# Export to JSON
$summary | ConvertTo-Json -Depth 10 | Out-File "department_summary.json"
Write-Host "Summary exported to department_summary.json" -ForegroundColor Green
Create a function that:
- Fetches GitHub user data via API
- Gets their repositories
- Calculates total stars across all repos
- Exports to CSV
Solution
function Get-GitHubUserStats {
[CmdletBinding()]
param(
[Parameter(Mandatory, ValueFromPipeline)]
[string[]]$UserName
)
begin {
$results = @()
}
process {
foreach ($user in $UserName) {
Write-Verbose "Fetching data for $user..."
try {
# Get user info
$userInfo = Invoke-RestMethod "https://api.github.com/users/$user" -ErrorAction Stop
# Get repositories
$repos = Invoke-RestMethod "https://api.github.com/users/$user/repos?per_page=100" -ErrorAction Stop
# Calculate stats
$totalStars = ($repos.stargazers_count | Measure-Object -Sum).Sum
$totalRepos = $repos.Count
$languages = $repos.language | Where-Object { $_ } | Group-Object | Sort-Object Count -Descending
$results += [PSCustomObject]@{
Username = $user
Name = $userInfo.name
PublicRepos = $totalRepos
TotalStars = $totalStars
Followers = $userInfo.followers
Following = $userInfo.following
TopLanguage = if ($languages) { $languages[0].Name } else { "N/A" }
ProfileUrl = $userInfo.html_url
}
Write-Host "✓ Processed $user" -ForegroundColor Green
}
catch {
Write-Warning "Failed to fetch data for $($user): $($_.Exception.Message)"
}
}
}
end {
return $results
}
}
# Usage
$users = @("powershell", "microsoft", "github")
$stats = Get-GitHubUserStats -UserName $users -Verbose
$stats | Format-Table -AutoSize
$stats | Export-Csv "github_stats.csv" -NoTypeInformation
Write-Host "Stats exported to github_stats.csv" -ForegroundColor Green
Create a tool that converts between CSV, JSON, and XML:
- Reads one format
- Converts to another
- Handles nested data
Solution
function Convert-DataFormat {
[CmdletBinding()]
param(
[Parameter(Mandatory)]
[ValidateScript({ Test-Path $_ })]
[string]$InputFile,
[Parameter(Mandatory)]
[ValidateSet("CSV", "JSON", "XML")]
[string]$OutputFormat,
[Parameter(Mandatory)]
[string]$OutputFile
)
# Detect input format
$extension = [System.IO.Path]::GetExtension($InputFile).ToLower()
Write-Host "Reading $extension file: $InputFile" -ForegroundColor Cyan
# Import based on format
$data = switch ($extension) {
".csv" {
Import-Csv $InputFile
}
".json" {
Get-Content $InputFile -Raw | ConvertFrom-Json
}
".xml" {
[xml]$xml = Get-Content $InputFile
# Convert XML to objects (simplified)
$xml.DocumentElement.ChildNodes | ForEach-Object {
$obj = [PSCustomObject]@{}
$_.Attributes | ForEach-Object {
$obj | Add-Member -NotePropertyName $_.Name -NotePropertyValue $_.Value
}
$obj
}
}
default {
throw "Unsupported input format: $extension"
}
}
Write-Host "Converting to $OutputFormat..." -ForegroundColor Yellow
# Export to target format
switch ($OutputFormat) {
"CSV" {
$data | Export-Csv $OutputFile -NoTypeInformation
}
"JSON" {
$data | ConvertTo-Json -Depth 10 | Out-File $OutputFile
}
"XML" {
$data | Export-Clixml $OutputFile
}
}
Write-Host "✓ Converted successfully to: $OutputFile" -ForegroundColor Green
# Show sample
Write-Host "`nSample output:" -ForegroundColor Cyan
$data | Select-Object -First 3 | Format-List
}
# Test
Convert-DataFormat -InputFile "employees.csv" -OutputFormat "JSON" -OutputFile "employees.json"
Convert-DataFormat -InputFile "config.json" -OutputFormat "CSV" -OutputFile "config.csv"
Summary
PowerShell's data handling capabilities make it ideal for ETL (Extract, Transform, Load) workflows:
- CSV: Import-Csv and Export-Csv for tabular data
- JSON: ConvertFrom-Json and ConvertTo-Json for web APIs and config
- XML: the native [xml] type and Select-Xml for legacy systems
- CliXml: Export-Clixml and Import-Clixml for PowerShell-native serialization
- REST APIs: Invoke-RestMethod for web service integration
- Databases: .NET SQL classes or Invoke-Sqlcmd for direct queries
Master these formats to build powerful data pipelines that integrate diverse systems and automate complex workflows.
Written by the ShellRAG Team
The ShellRAG editorial team writes practical, beginner-friendly PowerShell tutorials with tested code examples and real-world use cases. Every article is technically reviewed for accuracy and updated regularly.