In the composable world of modern Sitecore solutions, integrating search platforms such as Coveo with digital asset management tools such as Sitecore Content Hub is common practice. One typical use case involves building Coveo search sources to index content stored in Sitecore Content Hub. These indexes can then be used to create rich front-end search experiences.
Use Case: Indexing Approved PDFs with Public URLs
In this post, we explore a scenario where we needed to index only approved PDF documents from Content Hub that have public URLs. We leveraged the Scroll API to retrieve and process a large number of results efficiently.
The Scroll API retrieves large result sets from a single query without the limitations of traditional pagination. Unlike standard queries, it does not support the skip parameter. Instead, each response includes a next link that points to the following page, and you keep requesting that link until no more results are returned.
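A minimal C# sketch of this follow-the-next-link pattern. It assumes a hypothetical ScrollPage<T> model (the items on one page plus the next link's href) and a caller-supplied fetchPage delegate that performs the HTTP GET and maps the JSON. Neither is part of a Content Hub SDK; they only illustrate the paging loop.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one Scroll API page: the entities it returned and the
// href of the next page (null or empty when there are no more results).
public sealed record ScrollPage<T>(IReadOnlyList<T> Items, string NextHref);

public static class ScrollPager
{
    // Requests firstUrl, yields its items, then follows each page's next link
    // until a page arrives without one. No skip parameter is involved.
    public static IEnumerable<T> ReadAll<T>(string firstUrl, Func<string, ScrollPage<T>> fetchPage)
    {
        string url = firstUrl;
        while (!string.IsNullOrEmpty(url))
        {
            ScrollPage<T> page = fetchPage(url);
            foreach (T item in page.Items)
            {
                yield return item;
            }
            url = page.NextHref;
        }
    }
}
```

Because ReadAll uses yield return, the next page is requested only after the caller has consumed the items of the current one.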
Here’s the Scroll API query we used:
GET /api/entities/scroll?query=Definition.Name=='M.Asset' AND Parent('AssetMediaToAsset').id==1057 AND Parent('FinalLifeCycleStatusToAsset').id==544&sort=createdOn&order=Asc
This query retrieves entities whose definition is 'M.Asset', filters them to those linked to a specific media type and final lifecycle status, and sorts them by creation date in ascending order.
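The same query can be assembled in C# with the two IDs passed in as parameters, which is convenient because they should be verified per instance. This sketch assumes the query string needs to be URL-encoded before it is sent; ScrollQueryBuilder and BuildApprovedPdfScrollUrl are hypothetical names.

```csharp
using System;

public static class ScrollQueryBuilder
{
    // Builds the relative Scroll API URL for assets with the given media and
    // final lifecycle status IDs, sorted by creation date ascending.
    public static string BuildApprovedPdfScrollUrl(long pdfMediaId, long approvedStatusId)
    {
        string query =
            "Definition.Name=='M.Asset'" +
            $" AND Parent('AssetMediaToAsset').id=={pdfMediaId}" +
            $" AND Parent('FinalLifeCycleStatusToAsset').id=={approvedStatusId}";

        // Escape the query so spaces, quotes, parentheses and '=' survive as one query-string value.
        return "/api/entities/scroll?query=" + Uri.EscapeDataString(query) + "&sort=createdOn&order=Asc";
    }
}
```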
Filtering for Approved PDFs
To further refine our results to only include approved PDF assets, we used the following filters:
- PDF assets: Parent('AssetMediaToAsset').id==1057
- Approved assets: Parent('FinalLifeCycleStatusToAsset').id==544
Adding these filters to our query ensured that only approved PDF assets were returned. These IDs are often the same across Content Hub instances, but that is not guaranteed, so verify them in your own instance before using them in queries. You can locate the asset media type IDs under Taxonomy Management in Content Hub.
Retrieving Public URLs of an Asset
To include the public URLs of the PDFs in the index, we had to call additional APIs. For each asset returned by the Scroll API, we called the following endpoint to get the API URLs of all of the asset's public links:
GET /api/entities/{assetId}/relations/AssetToPublicLink
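The response does not contain the public URLs themselves. Each entry in children is the API link of a public link entity, which must be requested to obtain the actual public URL. Here's an example response: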
{
"children": [
{"href": "/api/entities/303872"},
{"href": "/api/entities/235003"}
],
"inherits_security": true,
"self": {"href": "/api/entities/31999/relations/AssetToPublicLink"}
}
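A small sketch of reading the children hrefs from this response with System.Text.Json. GetChildHrefs is a hypothetical helper and assumes the response shape shown above.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class PublicLinkRelation
{
    // Returns the "href" of every entry in the relation's "children" array.
    // Each href is the API URL of one public link entity, not the public URL itself.
    public static List<string> GetChildHrefs(string relationJson)
    {
        var hrefs = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(relationJson);

        if (doc.RootElement.TryGetProperty("children", out JsonElement children) &&
            children.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement child in children.EnumerateArray())
            {
                if (child.TryGetProperty("href", out JsonElement href) &&
                    href.ValueKind == JsonValueKind.String)
                {
                    hrefs.Add(href.GetString());
                }
            }
        }
        return hrefs;
    }
}
```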
We then called each API link to retrieve the public URL.
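A sketch of the two pieces this step needs: resolving each relative href against your Content Hub base URL, and reading the URL out of the public link entity's response. It assumes the entity JSON exposes the URL in a top-level public_link property; check a real response from your own instance before relying on that field name. PublicLinkReader and its methods are hypothetical.

```csharp
using System;
using System.Text.Json;

public static class PublicLinkReader
{
    // Turns a relative href such as "/api/entities/303872" into an absolute API URL.
    public static string ResolveApiUrl(string contentHubBaseUrl, string href)
    {
        return new Uri(new Uri(contentHubBaseUrl), href).AbsoluteUri;
    }

    // Returns the public URL from a public link entity response, or null if the
    // assumed "public_link" property is missing or is not a string.
    public static string TryGetPublicUrl(string publicLinkEntityJson)
    {
        using JsonDocument doc = JsonDocument.Parse(publicLinkEntityJson);
        if (doc.RootElement.TryGetProperty("public_link", out JsonElement link) &&
            link.ValueKind == JsonValueKind.String)
        {
            return link.GetString();
        }
        return null;
    }
}
```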
Indexing Data into Coveo Using Push APIs
Step 1: Create the File Container
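- Obtain Access Token: Ensure it has the Privileges / Organization / Organization = View privilege.
- Gather Information:
  - OrganizationId: Found in your Coveo Platform.
  - useVirtualHostedStyleUrl: Set to true.
- Create File Container: Call the Push API to create a file container. The response returns the uploadUri, fileId, and requiredHeaders used in the next steps.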
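A sketch of this step in C#, building the request for POST /push/v1/organizations/{organizationId}/files?useVirtualHostedStyleUrl=true and parsing the response. It assumes the api.cloud.coveo.com host (organizations in other regions use a different host) and a bearer access token; FileContainerInfo and CoveoFileContainer are hypothetical names, and sending the request with an HttpClient is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Text.Json.Serialization;

// Fields returned by the create-file-container call.
public sealed class FileContainerInfo
{
    [JsonPropertyName("uploadUri")] public string UploadUri { get; set; } = "";
    [JsonPropertyName("fileId")] public string FileId { get; set; } = "";
    [JsonPropertyName("requiredHeaders")] public Dictionary<string, string> RequiredHeaders { get; set; } = new();
}

public static class CoveoFileContainer
{
    // Assumed platform host; adjust for organizations hosted in other regions.
    public const string PlatformHost = "https://api.cloud.coveo.com";

    // Builds the POST that creates a file container for the organization.
    public static HttpRequestMessage BuildCreateRequest(string organizationId, string accessToken)
    {
        string url = $"{PlatformHost}/push/v1/organizations/{Uri.EscapeDataString(organizationId)}/files?useVirtualHostedStyleUrl=true";
        var request = new HttpRequestMessage(HttpMethod.Post, url);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        return request;
    }

    // Parses the JSON body of a successful create call.
    public static FileContainerInfo ParseResponse(string json) =>
        JsonSerializer.Deserialize<FileContainerInfo>(json);
}
```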
Step 2: Upload Data to File Container
- Prepare JSON Data: Structure your data using the addOrUpdate array:
{
"addOrUpdate": [
{
"m_title": "Example Title",
"m_overview": "Description",
"documentId": "http://example.com",
"data": "Content",
"fileExtension": ".txt"
}
]
}
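- Upload Data: Upload the JSON to the uploadUri returned in Step 1, sending the required headers from the same response.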
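A C# sketch of this step: serializing an addOrUpdate batch that mirrors the JSON above and building the PUT to the uploadUri with the required headers attached. PushDocument, PushBatch, and CoveoUpload are hypothetical names, and the m_title and m_overview mappings simply follow the example fields.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical document model matching the example JSON fields.
public sealed class PushDocument
{
    [JsonPropertyName("documentId")] public string DocumentId { get; set; } = "";
    [JsonPropertyName("m_title")] public string Title { get; set; } = "";
    [JsonPropertyName("m_overview")] public string Overview { get; set; } = "";
    [JsonPropertyName("data")] public string Data { get; set; } = "";
    [JsonPropertyName("fileExtension")] public string FileExtension { get; set; } = "";
}

public sealed class PushBatch
{
    [JsonPropertyName("addOrUpdate")] public List<PushDocument> AddOrUpdate { get; set; } = new();
}

public static class CoveoUpload
{
    // Builds a PUT to the container's uploadUri with the batch JSON as the body
    // and every header from requiredHeaders attached.
    public static HttpRequestMessage BuildUploadRequest(
        string uploadUri, IReadOnlyDictionary<string, string> requiredHeaders, PushBatch batch)
    {
        byte[] body = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(batch));
        var request = new HttpRequestMessage(HttpMethod.Put, uploadUri)
        {
            Content = new ByteArrayContent(body)
        };

        foreach (KeyValuePair<string, string> header in requiredHeaders)
        {
            // Content-Type belongs on the content headers; the others go on the request.
            if (string.Equals(header.Key, "Content-Type", StringComparison.OrdinalIgnoreCase))
                request.Content.Headers.ContentType = MediaTypeHeaderValue.Parse(header.Value);
            else
                request.Headers.TryAddWithoutValidation(header.Key, header.Value);
        }
        return request;
    }
}
```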
Step 3: Push File Container to Push Source
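- Push Data: Push the uploaded file container, identified by its fileId, to your Push source.
- Response: The response body should be null, with a 202 Accepted status, if the request succeeds.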
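A C# sketch of the push request, PUT /push/v1/organizations/{organizationId}/sources/{sourceId}/documents/batch?fileId={fileId}, plus a check for the 202 status. It assumes the api.cloud.coveo.com host and a bearer token, and it needs the Push source ID, which is not listed under Gather Information but is part of this endpoint. CoveoPush is a hypothetical name.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

public static class CoveoPush
{
    // Assumed platform host; adjust for organizations hosted in other regions.
    public const string PlatformHost = "https://api.cloud.coveo.com";

    // Builds the PUT that asks Coveo to apply the uploaded file container to the source.
    public static HttpRequestMessage BuildPushRequest(
        string organizationId, string sourceId, string fileId, string accessToken)
    {
        string url = $"{PlatformHost}/push/v1/organizations/{Uri.EscapeDataString(organizationId)}" +
                     $"/sources/{Uri.EscapeDataString(sourceId)}/documents/batch?fileId={Uri.EscapeDataString(fileId)}";
        var request = new HttpRequestMessage(HttpMethod.Put, url);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        return request;
    }

    // The push is accepted asynchronously, so success is a 202 rather than a 200.
    public static bool IsAccepted(HttpResponseMessage response) =>
        response.StatusCode == HttpStatusCode.Accepted;
}
```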
Pseudo Implementation
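With the necessary data gathered, we could proceed to index the content into Coveo. Below is a simplified C# implementation; GetContentHubAssets, GetAssetPublicLink, GetCoveoObject, and the CoveoHelper methods are helpers (not shown) that wrap the API calls described above: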
using System;
using System.Collections.Generic;

public void Process(string apiUrl, bool recursive = true)
{
    // Call the Scroll API and parse the response into the page model
    // (assumed to expose the page's items and the next link)
    var contentHubItems = GetContentHubAssets(apiUrl);

    // Declared outside the loop so every item on this page is collected and pushed together
    var coveoItemsModel = new List<CoveoItem>();

    // Process each item on the current page
    foreach (var item in contentHubItems.items)
    {
        try
        {
            // Get the public link of the Content Hub item
            var publicLink = GetAssetPublicLink(item.assetToPublicLink);

            // Map the required Content Hub fields onto the Coveo model
            CoveoItem coveoObject = GetCoveoObject(publicLink, item);
            coveoItemsModel.Add(coveoObject);
        }
        catch (Exception ex)
        {
            // Log the error and continue with the next item
        }
    }

    // Push this page's items to Coveo
    if (coveoItemsModel.Count > 0)
    {
        var pushModel = new CoveoItemsModel
        {
            addOrUpdate = coveoItemsModel
        };

        var containerInfo = CoveoHelper.CreateFileContainer();
        CoveoHelper.UploadContentsToContainer(containerInfo.uploadUri, containerInfo.requiredHeaders.XAmzServerSideEncryption, pushModel);
        CoveoHelper.PushContainerToSource(containerInfo.fileId, containerInfo.requiredHeaders.XAmzServerSideEncryption);
    }

    // Process the next page if the recursive flag is set and the Scroll API returned a next link
    if (recursive && contentHubItems.next != null && !string.IsNullOrEmpty(contentHubItems.next.href))
    {
        Process(contentHubItems.next.href, recursive);
    }
}
Conclusion
By combining the Scroll API with additional API calls for public URLs, we were able to index approved PDF documents from Sitecore Content Hub into Coveo. Because the query filters on media type and lifecycle status, only approved assets reach the index, which keeps search results relevant.
Feel free to use and adapt this approach for your own Sitecore Content Hub and Coveo integrations.