Content migration projects are notoriously painful: Excel files flying around email threads, editors losing track of versions, and developers drowning in manual imports. I recently implemented a workflow that keeps Excel in the Sitecore Media Library, transforms it to JSON with Next.js, and imports directly into Sitecore via PowerShell. No temp files, no duplication, just clean automation.
The Challenge
- Content teams live in Excel.
- Developers prefer structured formats like JSON.
- Sitecore needs clean, validated data to create content items at scale.
Traditional approaches meant exporting JSON files, juggling temp folders, and scripting one-off transformations. I wanted a system that was repeatable, auditable, and fully API-driven.
The Architecture
- Excel in Sitecore Media Library → Editors upload workbooks as media items.
- GraphQL Query → Resolve the media item URL via Sitecore GraphQL.
- Next.js API (with xlsx) → Download, parse, normalize headers, return JSON.
- PowerShell + SPE → Call API, validate, and upsert items into Sitecore.
This creates a seamless Excel → JSON → Sitecore pipeline.
Step 1 — GraphQL: Resolve the Media Item
We start by resolving the Excel file stored in the Media Library:
query ExcelMedia($id: String!) {
  item(path: $id, language: "en") {
    ... on MediaItem {
      id
      name
      extension
      url: mediaUrl
    }
  }
}
This gives us the download URL for the Excel file.
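Step 2's API route needs to execute that query. Here is a minimal sketch of the resolver, assuming hypothetical SITECORE_GRAPHQL_ENDPOINT and SITECORE_API_KEY environment variables and the common sc_apikey header; swap in whatever authentication your endpoint actually uses:
// Sketch: resolve a media item's download URL via GraphQL.
// The endpoint/key variable names below are assumptions; adjust to your setup.
const GRAPHQL_ENDPOINT = process.env.SITECORE_GRAPHQL_ENDPOINT!;
const API_KEY = process.env.SITECORE_API_KEY!;

const MEDIA_QUERY = /* GraphQL */ `
  query ExcelMedia($id: String!) {
    item(path: $id, language: "en") {
      ... on MediaItem { id name extension url: mediaUrl }
    }
  }
`;

export async function resolveMediaUrl(mediaId: string): Promise<string> {
  const res = await fetch(GRAPHQL_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json", sc_apikey: API_KEY },
    body: JSON.stringify({ query: MEDIA_QUERY, variables: { id: mediaId } }),
  });
  if (!res.ok) throw new Error(`GraphQL request failed: ${res.status}`);
  const { data } = await res.json();
  const url: string | undefined = data?.item?.url;
  if (!url) throw new Error(`Media item not found: ${mediaId}`);
  return url;
}
The route handler in Step 2 calls this helper before downloading the workbook.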
Step 2 — Next.js API: Parse Excel into JSON
The API fetches the media binary, parses it with xlsx, and maps it to clean JSON. Features include:
- Column mapping dictionary (Page Title → Title)
- Validation (required fields)
- Duplicate detection (ExternalId or Title)
- Dry-run mode
- Sheet selection (?sheet=LandingPages)
import * as XLSX from "xlsx";

type MappedRow = { Title: string; Summary?: string; Body?: string; Category?: string; ExternalId?: string };

// Map the header spellings editors actually use onto canonical field names
const COLUMN_MAP: Record<string, keyof MappedRow> = {
  "title": "Title",
  "page title": "Title",
  "summary": "Summary",
  "body": "Body",
  "content": "Body",
  "category": "Category",
  "external id": "ExternalId",
  "id": "ExternalId"
};

// Normalize + validate rows
function mapRow(raw: Record<string, unknown>): MappedRow {
  const mapped: Partial<MappedRow> = {};
  for (const [k, v] of Object.entries(raw)) {
    const key = String(k).toLowerCase().trim();
    const field = COLUMN_MAP[key];
    if (!field) continue;
    mapped[field] = String(v ?? "");
  }
  mapped.Title = (mapped.Title ?? "Untitled").trim();
  mapped.Category = mapped.Category ?? "General";
  return mapped as MappedRow;
}
export async function GET(req: Request) {
  const url = new URL(req.url);
  const mediaId = url.searchParams.get("mediaId");
  const dryRun = url.searchParams.get("dryRun") === "true";
  if (!mediaId) {
    return Response.json({ error: "mediaId is required" }, { status: 400 });
  }

  // 1. Resolve the media URL via GraphQL (the resolveMediaUrl sketch from Step 1)
  const mediaUrl = await resolveMediaUrl(mediaId);

  // 2. Fetch the binary
  const res = await fetch(mediaUrl);
  if (!res.ok) {
    return Response.json({ error: `Media download failed: ${res.status}` }, { status: 502 });
  }
  const buffer = await res.arrayBuffer();

  // 3. Parse the workbook, honoring the optional ?sheet= parameter
  const wb = XLSX.read(buffer, { type: "array" });
  const sheetName = url.searchParams.get("sheet") ?? wb.SheetNames[0];
  const ws = wb.Sheets[sheetName];
  if (!ws) {
    return Response.json({ error: `Sheet not found: ${sheetName}` }, { status: 400 });
  }
  const raw = XLSX.utils.sheet_to_json<Record<string, unknown>>(ws, { defval: "" });
  const mapped = raw.map(mapRow);

  return Response.json({
    dryRun,
    count: mapped.length,
    data: mapped
  });
}
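To make the validation, duplicate-detection, and row-level error features concrete, here is one possible shape for the check that backs dry-run mode. validateRows and RowError are hypothetical names of my own, not part of the xlsx API; rowNumber is offset by the header row so it matches the row editors see in Excel:
// Sketch: row-level validation producing { rowNumber, messages } entries.
type RowError = { rowNumber: number; messages: string[] };

function validateRows(rows: MappedRow[]): RowError[] {
  const errors: RowError[] = [];
  const seen = new Set<string>();
  rows.forEach((row, i) => {
    const messages: string[] = [];
    if (!row.Title || row.Title === "Untitled") messages.push("Title is required");
    const key = row.ExternalId || row.Title; // duplicate-detection key
    if (seen.has(key)) messages.push(`Duplicate row (key: ${key})`);
    seen.add(key);
    if (messages.length) errors.push({ rowNumber: i + 2, messages }); // +2 = header row + 1-based index
  });
  return errors;
}
A dry-run response can then return errors alongside count and data, so editors can fix the spreadsheet before anything touches Sitecore.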
Step 3 — PowerShell Importer
The PowerShell script calls the API, validates rows, and upserts items into Sitecore. It respects dry-run mode and reports created, updated, skipped, and failed counts.
param(
    [string]$MediaId = "media:/sitecore/media library/Your/Path/content.xlsx",
    [string]$ApiBase = "https://yourapp.com/api/import/parse-excel",
    [switch]$DryRun
)

$endpoint = "$ApiBase?mediaId=$([uri]::EscapeDataString($MediaId))" + $(if ($DryRun) { "&dryRun=true" } else { "" })
$response = Invoke-RestMethod -Uri $endpoint -Method Get

$parentPath = "master:/sitecore/content/YourSite/Migrated"
$templateId = "{TEMPLATE-GUID-HERE}"
$stats = @{ Created = 0; Updated = 0; Skipped = 0; Failed = 0 }

foreach ($row in $response.data) {
    try {
        $title = $row.Title
        $name = [Sitecore.Data.Items.ItemUtil]::ProposeValidItemName($title)

        # Resolve an existing item by ExternalId first, then fall back to Title
        $children = Get-ChildItem -Path $parentPath
        $item = $null
        if ($row.ExternalId) { $item = $children | Where-Object { $_["ExternalId"] -eq $row.ExternalId } | Select-Object -First 1 }
        if (-not $item) { $item = $children | Where-Object { $_["Title"] -eq $title } | Select-Object -First 1 }

        # Dry-run: report what would happen, write nothing
        if ($DryRun) {
            Write-Host "[DryRun] Would $(if ($item) { 'update' } else { 'create' }) '$title'"
            $stats.Skipped++
            continue
        }

        if (-not $item) {
            $item = New-Item -Path $parentPath -Name $name -ItemType $templateId
            $stats.Created++
        } else {
            $stats.Updated++
        }

        $item.Editing.BeginEdit()
        try {
            $item["Title"] = $row.Title
            $item["Summary"] = $row.Summary
            $item["Body"] = $row.Body
            $item["Category"] = $row.Category
            $item["ExternalId"] = $row.ExternalId
        } finally {
            $item.Editing.EndEdit()
        }
    } catch {
        $stats.Failed++
        Write-Warning "Row '$($row.Title)' failed: $($_.Exception.Message)"
    }
}

Write-Host "Created: $($stats.Created) | Updated: $($stats.Updated) | Skipped: $($stats.Skipped) | Failed: $($stats.Failed)"
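A sensible workflow is to run the script with -DryRun first, review the reported counts and warnings, then re-run without the switch to commit the changes. Because lookups go through ExternalId before Title, renaming a page in Excel updates the existing item rather than creating a duplicate.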
Hardening & Enhancements
- Dry-run mode → editors see validation errors without creating items.
- Row-level error reporting → API returns { rowNumber, messages } for invalid rows.
- Duplicate detection → prevents double imports by checking ExternalId/Title.
- Observability → counts, logs, and metrics pushed to dashboards.
- Security → protect API with signed tokens or IP allowlisting.
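For the security bullet, one lightweight option is signing requests with an HMAC over the mediaId. A minimal sketch, assuming a hypothetical IMPORT_API_SECRET environment variable and x-import-signature header; JWTs or IP allowlisting at the edge work just as well:
// Sketch: shared-secret HMAC check for the import API.
// IMPORT_API_SECRET and the x-import-signature header are assumed names.
import { createHmac, timingSafeEqual } from "crypto";

const SECRET = process.env.IMPORT_API_SECRET ?? "";

export function isAuthorized(req: Request, mediaId: string): boolean {
  const given = Buffer.from(req.headers.get("x-import-signature") ?? "");
  const expected = Buffer.from(
    createHmac("sha256", SECRET).update(mediaId).digest("hex")
  );
  // Length check first: timingSafeEqual throws on buffers of unequal length
  return given.length === expected.length && timingSafeEqual(given, expected);
}
The PowerShell caller would compute the same HMAC over the media ID and attach the header before Invoke-RestMethod.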
Why This Works
✅ Editors stay in their comfort zone (Excel in Media Library).
✅ Developers get a stable JSON contract.
✅ PowerShell focuses on item creation, not messy parsing.
✅ Re-runs are safe thanks to upserts + duplicate detection.
This isn’t just about moving content—it’s about building a reliable migration engine that’s reusable for future projects.
Final Thought
When migrations are API-driven, versioned, and validated at the edge, you get repeatability, safety, and speed. That’s exactly what this Next.js + PowerShell pipeline delivered.