From Excel to Sitecore - My Streamlined Migration Workflow via PowerShell

Content migration projects are notoriously painful Excel files flying around email threads, editors losing track of versions, and developers drowning in manual imports. I recently implemented a workflow that keeps Excel in Sitecore Media Library, transforms it to JSON with Next.js, and imports directly into Sitecore via PowerShell. No temp files, no duplication, just clean automation.

The Challenge

  • Content teams live in Excel.
  • Developers prefer structured formats like JSON.
  • Sitecore needs clean, validated data to create content items at scale.

Traditional approaches meant exporting JSON files, juggling temp folders, and scripting one-off transformations. I wanted a system that was repeatable, auditable, and fully API-driven.


The Architecture

  1. Excel in Sitecore Media Library → Editors upload workbooks as media items.
  2. GraphQL Query → Resolve the media item URL via Sitecore GraphQL.
  3. Next.js API (with xlsx) → Download, parse, normalize headers, return JSON.
  4. PowerShell + SPE → Call API, validate, and upsert items into Sitecore.

This creates a seamless Excel → JSON → Sitecore pipeline.


Step 1 — GraphQL to Resolve the Media Item

We start by resolving the Excel file stored in the Media Library:

query ExcelMedia($id: String!) {
  item(path: $id, language: "en") {
    ... on MediaItem {
      id
      name
      extension
      url: mediaUrl
    }
  }
}

This gives us the download URL for the Excel file.


Step 2 — Next.js API: Parse Excel into JSON

The API fetches the media binary, parses it with xlsx, and maps it to clean JSON. Features include:

  • Column mapping dictionary (Page TitleTitle)
  • Validation (required fields)
  • Duplicate detection (ExternalId or Title)
  • Dry-run mode
  • Sheet selection (?sheet=LandingPages)
import * as XLSX from "xlsx";

const COLUMN_MAP = {
  "title": "Title",
  "page title": "Title",
  "summary": "Summary",
  "body": "Body",
  "content": "Body",
  "category": "Category",
  "external id": "ExternalId",
  "id": "ExternalId"
};

// Normalize + validate rows
type MappedRow = { Title: string; Summary?: string; Body?: string; Category?: string; ExternalId?: string };

function mapRow(raw: Record<string, unknown>): MappedRow {
  const mapped: Partial<MappedRow> = {};
  for (const [k, v] of Object.entries(raw)) {
    const key = String(k).toLowerCase().trim();
    const field = COLUMN_MAP[key];
    if (!field) continue;
    mapped[field] = String(v ?? "");
  }
  mapped.Title = (mapped.Title ?? "Untitled").trim();
  mapped.Category = mapped.Category ?? "General";
  return mapped as MappedRow;
}

export async function GET(req: Request) {
  const url = new URL(req.url);
  const mediaId = url.searchParams.get("mediaId");
  const dryRun = url.searchParams.get("dryRun") === "true";

  // 1. Resolve Media URL via GraphQL (omitted here for brevity)
  // 2. Fetch binary
  // 3. Parse workbook

  const wb = XLSX.read(/* arrayBuffer */, { type: "array" });
  const ws = wb.Sheets[wb.SheetNames[0]];
  const raw = XLSX.utils.sheet_to_json(ws, { defval: "" });

  const mapped = raw.map(mapRow);

  return Response.json({
    dryRun,
    count: mapped.length,
    data: mapped
  });
}

Step 3 — PowerShell Importer

The PowerShell script calls the API, validates rows, and upserts into Sitecore. It respects dry-run mode and reports created, updated, skipped, failed.

param(
  [string]$MediaId = "media:/sitecore/media library/Your/Path/content.xlsx",
  [string]$ApiBase = "https://yourapp.com/api/import/parse-excel",
  [switch]$DryRun
)

$endpoint = "$ApiBase?mediaId=$([uri]::EscapeDataString($MediaId))" + ($(if ($DryRun) { "&dryRun=true" } else { "" }))
$response = Invoke-RestMethod -Uri $endpoint -Method Get

$parentPath = "master:/sitecore/content/YourSite/Migrated"
$templateId = "{TEMPLATE-GUID-HERE}"

foreach ($row in $response.data) {
  $title = $row.Title
  $name  = [Sitecore.Data.Items.ItemUtil]::ProposeValidItemName($title)

  # Resolve existing by ExternalId or Title
  $item = $null
  if ($row.ExternalId) { $item = Get-Item "$parentPath/*" | Where-Object { $_["ExternalId"] -eq $row.ExternalId } | Select-Object -First 1 }
  if (-not $item) { $item = Get-Item "$parentPath/*" | Where-Object { $_["Title"] -eq $title } | Select-Object -First 1 }

  if (-not $item) {
    $item = New-Item -Path $parentPath -Name $name -ItemType $templateId
  }

  $item.Editing.BeginEdit()
  try {
    $item["Title"]      = $row.Title
    $item["Summary"]    = $row.Summary
    $item["Body"]       = $row.Body
    $item["Category"]   = $row.Category
    $item["ExternalId"] = $row.ExternalId
  } finally {
    $item.Editing.EndEdit()
  }
}

Hardening & Enhancements

  • Dry-run mode → editors see validation errors without creating items.
  • Row-level error reporting → API returns { rowNumber, messages } for invalid rows.
  • Duplicate detection → prevents double imports by checking ExternalId/Title.
  • Observability → counts, logs, and metrics pushed to dashboards.
  • Security → protect API with signed tokens or IP allowlisting.

Why This Works

✅ Editors stay in their comfort zone (Excel in Media Library).

✅ Developers get a stable JSON contract.

✅ PowerShell focuses on item creation, not messy parsing.

✅ Re-runs are safe thanks to upserts + duplicate detection.

This isn’t just about moving content—it’s about building a reliable migration engine that’s reusable for future projects.


Final Thought

When migrations are API-driven, versioned, and validated at the edge, you get repeatability, safety, and speed. That’s exactly what this Next.js + PowerShell pipeline delivered.


Related Posts

From Excel to Sitecore - My Streamlined Migration Workflow via PowerShell

Content migration projects are notoriously painful Excel files flying around email threads, editors losing track of versions, and developers drowning in manual imports. I recently implemented a workf

Read More

Exposing Site Settings from the Settings item in a Headless JSS Next.js Application

In this blog post, we will explore a common requirement in Sitecore Headless SXA, exposing site-specific settings and global data to the front end when working with a JSS headless site. ## Problem S

Read More

Handling Sitecore CM and IdentityServer Behind a Proxy or CDN in an Azure Environment

Recently, while working with a Sitecore 10.4 deployment in an Azure environment, we encountered an interesting challenge: handling Sitecore Content Management (CM) and IdentityServer (SI) behind a pr

Read More