Indexing Sitecore Content Hub Assets in Coveo: A Practical Guide

In the composable world of modern Sitecore solutions, integrating search platforms like Coveo with digital asset management tools such as Sitecore Content Hub is common practice. One typical use case is building Coveo search sources that index content stored in Sitecore Content Hub. These indexes can then power rich front-end search experiences.

Use Case: Indexing Approved PDFs with Public URLs

In this post, we explore a scenario where we needed to index only approved PDF documents from Content Hub that have public URLs. We leveraged the Scroll API to retrieve and process a large number of results efficiently.

Using the Scroll API

The Scroll API retrieves large result sets from a single query without the limits of traditional pagination. Unlike standard queries, it does not support the skip parameter. Instead, each response carries a link to the next page, and you keep following that link until no more results are available.
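The paging pattern can be sketched roughly as follows. This is a minimal illustration, not the post's actual code: ScrollPage, ScrollPager, and the fetchPage delegate are hypothetical names, and the delegate stands in for whatever performs the HTTP GET and parses the response.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one scroll page: its items plus the link to the next page.
public sealed class ScrollPage
{
    public List<string> Items { get; set; } = new List<string>();
    public string NextHref { get; set; } // null or empty on the last page
}

public static class ScrollPager
{
    // Follows the next link page by page until no link is returned.
    // fetchPage is an assumed delegate that calls the API and parses one page.
    public static IEnumerable<string> ReadAll(string firstUrl, Func<string, ScrollPage> fetchPage)
    {
        var url = firstUrl;
        while (!string.IsNullOrEmpty(url))
        {
            var page = fetchPage(url);
            foreach (var item in page.Items)
            {
                yield return item;
            }
            url = page.NextHref;
        }
    }
}
```

A loop like this avoids deep recursion when the result set spans many pages.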

Here’s the Scroll API query we used:

GET /api/entities/scroll?query=Definition.Name=='M.Asset' AND Parent('AssetMediaToAsset').id==1057 AND Parent('FinalLifeCycleStatusToAsset').id==544&sort=createdOn&order=Asc

This query retrieves entities whose definition is 'M.Asset', filters them by media type and final lifecycle status using the IDs shown, and sorts them by creation date in ascending order.

Filtering for Approved PDFs

To further refine our results to only include approved PDF assets, we used the following filters:

  • PDF Assets: Parent('AssetMediaToAsset').id==1057
  • Approved Assets: Parent('FinalLifeCycleStatusToAsset').id==544

Adding these filters to our query ensured that only approved PDF assets were returned. Although these IDs are typically standard across all Content Hub instances, it's important to verify them in your own Content Hub instance before using them in queries. You can locate the asset media type IDs under Taxonomy Management within Content Hub.
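Because the IDs may differ per instance, one option is to pass them in as parameters when building the query rather than hard-coding them. The sketch below assumes a hypothetical ScrollQuery helper; only the query text, sort, and order values come from the request shown earlier.

```csharp
using System;

public static class ScrollQuery
{
    // Builds the relative Scroll API URL for assets with the given media type
    // and final lifecycle status IDs. Both IDs must be verified per instance.
    public static string Build(long mediaTypeId, long lifecycleStatusId)
    {
        var query =
            "Definition.Name=='M.Asset'" +
            $" AND Parent('AssetMediaToAsset').id=={mediaTypeId}" +
            $" AND Parent('FinalLifeCycleStatusToAsset').id=={lifecycleStatusId}";

        // Escape the query so spaces, quotes, and parentheses are URL-safe.
        return "/api/entities/scroll?query=" + Uri.EscapeDataString(query)
            + "&sort=createdOn&order=Asc";
    }
}
```

For the IDs used in this post, the call would be ScrollQuery.Build(1057, 544).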

Retrieving Public URLs of an Asset

To include the public URLs of the PDFs in the index, we had to call additional APIs. For each asset returned by the Scroll API, we used the following endpoint to get all API URLs of the public links:

GET /api/entities/{assetId}/relations/AssetToPublicLink

The response provides the actual API links needed to obtain the public URL. Here’s an example response:

{
    "children": [
        {"href": "/api/entities/303872"},
        {"href": "/api/entities/235003"}
    ],
    "inherits_security": true,
    "self": {"href": "/api/entities/31999/relations/AssetToPublicLink"}
}

We then called each API link to retrieve the public URL.
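As an illustration, the child links can be pulled out of a relation response like the one above with System.Text.Json. PublicLinkRelation and GetChildHrefs are hypothetical names used only for this sketch.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class PublicLinkRelation
{
    // Returns the href of every entry in the "children" array,
    // or an empty list when the asset has no public links.
    public static List<string> GetChildHrefs(string relationJson)
    {
        var hrefs = new List<string>();
        using (var doc = JsonDocument.Parse(relationJson))
        {
            if (doc.RootElement.TryGetProperty("children", out var children)
                && children.ValueKind == JsonValueKind.Array)
            {
                foreach (var child in children.EnumerateArray())
                {
                    if (child.ValueKind == JsonValueKind.Object
                        && child.TryGetProperty("href", out var href)
                        && href.ValueKind == JsonValueKind.String)
                    {
                        hrefs.Add(href.GetString());
                    }
                }
            }
        }
        return hrefs;
    }
}
```

Each returned href is then requested in turn to read the public URL of that link.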

Indexing Data into Coveo Using Push APIs

Step 1: Create the File Container

  1. Obtain Access Token: Ensure it has Privileges / Organization / Organization = View.

  2. Gather Information:

    • OrganizationId: Found in your Coveo Platform.
    • useVirtualHostedStyleUrl: Set to true.

  3. Create File Container:

    • POST Request:
      POST https://api.cloud.coveo.com/push/v1/organizations/<OrganizationId>/files?useVirtualHostedStyleUrl=true
      Content-Type: application/json
      Accept: application/json
      Authorization: Bearer <AccessToken>
      
    • Response: Contains uploadUri, fileId, requiredHeaders.
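The response from Step 1 could be mapped to a small model along these lines. FileContainer and FileContainerParser are hypothetical names for this sketch; only the three property names (uploadUri, fileId, requiredHeaders) come from the response described above, and treating requiredHeaders as a string-to-string map is an assumption.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model for the file container response.
public sealed class FileContainer
{
    public string UploadUri { get; set; }
    public string FileId { get; set; }
    // Assumed to be a flat map of header name to header value.
    public Dictionary<string, string> RequiredHeaders { get; set; } = new Dictionary<string, string>();
}

public static class FileContainerParser
{
    // Case-insensitive matching lets uploadUri map onto UploadUri, and so on.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    public static FileContainer Parse(string responseJson)
    {
        return JsonSerializer.Deserialize<FileContainer>(responseJson, Options);
    }
}
```

Keeping requiredHeaders as a map means the upload step can replay whatever headers the response lists without naming them in code.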

Step 2: Upload Data to File Container

  1. Prepare JSON Data: Structure your data using the addOrUpdate array.

    {
        "addOrUpdate": [
            {
                "m_title": "Example Title",
                "m_overview": "Description",
                "documentId": "http://example.com",
                "data": "Content",
                "fileExtension": ".txt"
            }
        ]
    }
    
  2. Upload Data:

    • PUT Request:
      PUT <uploadUri - From Step 1>
      Content-Type: application/octet-stream
      x-amz-server-side-encryption: AES256
      
    • Body: Your JSON data.
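A rough sketch of building the Step 2 request with HttpClient types is shown below. ContainerUpload is a hypothetical helper, and representing each document as a dictionary of string fields is a simplification for illustration; the header values mirror the request above.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

public static class ContainerUpload
{
    // Wraps the documents in an addOrUpdate array and builds the PUT request
    // for the uploadUri returned in Step 1. The request is not sent here.
    public static HttpRequestMessage BuildRequest(string uploadUri, List<Dictionary<string, string>> documents)
    {
        var payload = new Dictionary<string, object> { ["addOrUpdate"] = documents };
        var json = JsonSerializer.Serialize(payload);

        var content = new ByteArrayContent(Encoding.UTF8.GetBytes(json));
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        var request = new HttpRequestMessage(HttpMethod.Put, uploadUri) { Content = content };
        request.Headers.Add("x-amz-server-side-encryption", "AES256");
        return request;
    }
}
```

The resulting message can be passed to HttpClient.SendAsync. No Authorization header is added, matching the request shown above.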

Step 3: Push File Container to Push Source

  1. Push Data:

    • PUT Request:
      PUT https://api.cloud.coveo.com/push/v1/organizations/<OrganizationId>/sources/<SourceId>/documents/batch?fileId=<fileId>
      Content-Type: application/json
      Authorization: Bearer <AccessToken>
      
  2. Response: On success, the status is 202 Accepted and the response body is null.
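The Step 3 request might be assembled as in the sketch below. BatchPush is a hypothetical helper, and sending an empty JSON body to carry the Content-Type header is an assumption of this sketch. The URL and headers follow the request above.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class BatchPush
{
    // Builds the PUT request that asks Coveo to process an uploaded file container.
    // The request is only constructed here, not sent.
    public static HttpRequestMessage BuildRequest(string organizationId, string sourceId, string fileId, string accessToken)
    {
        var url = "https://api.cloud.coveo.com/push/v1/organizations/" + Uri.EscapeDataString(organizationId)
            + "/sources/" + Uri.EscapeDataString(sourceId)
            + "/documents/batch?fileId=" + Uri.EscapeDataString(fileId);

        var request = new HttpRequestMessage(HttpMethod.Put, url)
        {
            // Empty body; it exists only so the Content-Type header can be set.
            Content = new StringContent(string.Empty, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        return request;
    }
}
```

After sending the request, checking for HttpStatusCode.Accepted on the response confirms that the batch was queued.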

Pseudo Implementation

With the necessary data gathered, we could proceed to index the content into Coveo. Below is a sample implementation in C#:

public void Process(string apiUrl, bool recursive = true)
{
    // Call the Scroll API and parse the response into the relevant model
    var contentHubItems = GetContentHubAssets(apiUrl);

    // Collect the Coveo items for this page. Declared outside the loop
    // so that the push step below can see everything that was added.
    var coveoItemsModel = new List<CoveoItem>();

    // Process each item in contentHubItems
    foreach (var item in contentHubItems)
    {
        try
        {
            // Get the public link of the Content Hub item
            var publicLink = GetAssetPublicLink(item.assetToPublicLink);

            // Map the required Content Hub item information to the Coveo model
            CoveoItem coveoObject = GetCoveoObject(publicLink, item);
            coveoItemsModel.Add(coveoObject);
        }
        catch (Exception ex)
        {
            // Log the error and continue with the next item
        }
    }

    // Push the collected items to Coveo
    if (coveoItemsModel.Count > 0)
    {
        var coveoPushModel = new CoveoItemsModel
        {
            addOrUpdate = coveoItemsModel
        };

        var containerInfo = CoveoHelper.CreateFileContainer();
        CoveoHelper.UploadContentsToContainer(containerInfo.uploadUri, containerInfo.requiredHeaders.XAmzServerSideEncryption, coveoPushModel);
        CoveoHelper.PushContainerToSource(containerInfo.fileId, containerInfo.requiredHeaders.XAmzServerSideEncryption);
    }

    // Recursively process the next page if the recursive flag is true
    if (recursive && contentHubItems.next != null && !string.IsNullOrEmpty(contentHubItems.next.href))
    {
        Process(contentHubItems.next.href, recursive);
    }
}

Conclusion

By leveraging the Scroll API and additional API calls for public URLs, we were able to effectively index approved PDF documents from Sitecore Content Hub into Coveo. This approach ensures that only the necessary and approved assets are indexed, optimizing the search experience.

Feel free to use and adapt this approach for your own Sitecore Content Hub and Coveo integrations.
