I've been working on improving the page performance of an Umbraco site that has fairly complex data structures. For one section of the site, there are two levels of categories for content items which can belong to multiple categories. The top level category page displays a pageable, sortable list of items associated with any of its subcategories. These items are sortable by date created and title.

The content structure was accomplished with a custom ContentFinder, similar to what I described in the accompanying post, Using a custom ContentFinder in Umbraco. The top level category pages were performing poorly because there was no efficient way to get the sorted list of content items using Umbraco's content cache. The solution was to use a custom Examine index, which can be queried efficiently.

Creating a custom Examine index is fairly well documented. First, you need to define the index in \config\ExamineIndex.config. Using the example from my previous post on content finders, the new index set configuration looks like this:

<IndexSet SetName="BooksIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/Books/">
   <IndexAttributeFields> 
     <add Name="id" /> 
     <add Name="nodeName"/> 
   </IndexAttributeFields> 
   <IndexUserFields> 
     <add Name="title" EnableSorting="true" /> 
     <add Name="maincategories" /> 
     <add Name="url" /> 
     <add Name="imageId" /> 
     <add Name="dateCreated" EnableSorting="true" /> 
   </IndexUserFields> 
   <IncludeNodeTypes> 
     <add Name="Book" /> 
   </IncludeNodeTypes> 
   <ExcludeNodeTypes />
</IndexSet>

You will also need to add a corresponding Examine index provider and search provider. 
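
These providers are registered in \config\ExamineSettings.config. As a rough sketch, the entries might look like the following. The searcher name BooksSearcher is the one queried later in this post; the indexer name BooksIndexer and the attribute values are assumptions for illustration and should be adjusted to your own setup:

```xml
<!-- Goes inside <ExamineIndexProviders><providers> -->
<!-- "BooksIndexer" is a hypothetical name; indexSet must match the SetName defined above -->
<add name="BooksIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     indexSet="BooksIndexSet"
     supportUnpublished="false"
     supportProtected="false"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />

<!-- Goes inside <ExamineSearchProviders><providers> -->
<add name="BooksSearcher"
     type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
     indexSet="BooksIndexSet"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
```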

In order to query this index, we need to index the fields that we want to query by. For this example, we want to query using the id of the main category. The main category is the parent of the subcategory that the book is related to. This, of course, is not a property of the book node and as such cannot be configured to be automatically included in the index.

One can use the GatheringNodeData event of the Indexer to manipulate the data that gets indexed by Examine. To do this, we bind the event handler to the event during ApplicationEventHandler's ApplicationStarted method.

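protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
{
  // "ExternalIndexer" is Umbraco's default external index provider;
  // substitute the name of the provider registered for your custom index set.
  ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"].GatheringNodeData 
        += Indexer_GatheringNodeData;
}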

And then in the event handler, we can update the Examine fields before they get indexed.

private void Indexer_GatheringNodeData(object sender, IndexingNodeDataEventArgs e)
{
  // e.NodeId is the id of the node currently being indexed
  e.Fields["maincategories"] = GetMainCategoriesForItem(e.NodeId);
}

Note that it can help to pass an UmbracoHelper object to the event handler as I've described in Using UmbracoHelper from event handlers.

Using the GatheringNodeData event handler, we can set the value of an Examine field to the Id of the parent of the subcategory related to our item. However, in the particular case I am trying to optimize, an item can relate to multiple subcategories, which may in turn belong to multiple main categories. Unfortunately, Examine does not support multivalued fields. The creator of Examine (and Umbraco core team member) Shannon Deminick has stated that in the future Examine 2 will support multivalued fields but "in the meantime, you could use the DocumentWriting event on your indexer which gives you direct access to the Lucene Document, then you can index however you like."

So instead of binding an event handler to the GatheringNodeData event, we can bind one to DocumentWriting and add the main category IDs as follows:

// Field and its Store/Index options come from the Lucene.Net.Documents namespace
private void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e, UmbracoHelper helper)
{
  var item = helper.TypedContent(e.NodeId);
  var relatedSubcategories = GetRelatedSubcategories(item);
  foreach (var relatedSubcategory in relatedSubcategories)
  {
    // One "maincategories" field per related subcategory, holding the parent (main category) id
    e.Document.Add(new Field("maincategories", relatedSubcategory.ParentId.ToString(), Field.Store.YES, Field.Index.ANALYZED));
  }
}
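
Because this handler takes an extra UmbracoHelper argument, it cannot be attached to the event directly. A minimal sketch of the binding, assuming the indexer is registered under the hypothetical name BooksIndexer and that an UmbracoContext is available when the helper is created, could look like this (the class name BooksIndexEvents is also just a placeholder):

```csharp
using Examine;
using Examine.LuceneEngine.Providers;
using Umbraco.Core;
using Umbraco.Web;

public partial class BooksIndexEvents : ApplicationEventHandler
{
    protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
        // DocumentWriting is exposed by the Lucene-based indexer, not by the base
        // provider type, so the provider is cast first.
        // "BooksIndexer" is an assumed provider name; use the one you registered.
        var indexer = ExamineManager.Instance.IndexProviderCollection["BooksIndexer"] as LuceneIndexer;
        if (indexer == null)
        {
            return;
        }

        // Assumption: UmbracoContext.Current is available at this point.
        var helper = new UmbracoHelper(UmbracoContext.Current);

        // The lambda adapts the three-argument handler shown above (defined in
        // this same class) to the two-argument event signature.
        indexer.DocumentWriting += (sender, e) => indexer_DocumentWriting(sender, e, helper);
    }
}
```

The cast to LuceneIndexer returns null if the provider is missing or of a different type, in which case the sketch simply skips the binding rather than failing at startup.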

While I don't show the implementation of GetRelatedSubcategories(), that is where you would implement the logic that relates items to subcategories. With Lucene.NET, you can add multiple fields with the same name but different values to achieve a multivalued field. With this approach, the field does not need to be defined in the Examine index configuration, because it is added directly to the Lucene document. With the values indexed, it is now simple to construct a Lucene query that returns all items that have a specified main category ID:

var searcher = ExamineManager.Instance.SearchProviderCollection["BooksSearcher"];
var searchCriteria = searcher.CreateSearchCriteria(
  UmbracoExamine.IndexTypes.Content
);
var query = searchCriteria.RawQuery("maincategories:" + categoryId);
var results = searcher.Search(query);
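
Since the category page needs a sorted, pageable list, the same lookup can also be expressed with Examine's fluent API, which allows ordering by the fields marked EnableSorting="true" in the index set. The following is only a sketch under the assumption that the fluent API of the Examine version bundled with Umbraco 7 is in use; the class name, method name, and parameters are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;

public static class BookSearch
{
    // Returns one page of books for a main category, sorted by title.
    // pageIndex is zero-based.
    public static IEnumerable<SearchResult> GetPage(int categoryId, int pageIndex, int pageSize)
    {
        var searcher = ExamineManager.Instance.SearchProviderCollection["BooksSearcher"];
        var criteria = searcher.CreateSearchCriteria(UmbracoExamine.IndexTypes.Content);

        // Match on the multivalued field, then sort by a field that was
        // configured with EnableSorting="true".
        var query = criteria
            .Field("maincategories", categoryId.ToString())
            .And()
            .OrderBy("title")
            .Compile();

        var results = searcher.Search(query);

        // Skip() on the result set avoids materializing the skipped items;
        // Take() then limits the page size.
        return results.Skip(pageIndex * pageSize).Take(pageSize);
    }
}
```

To sort by newest first instead, OrderByDescending("dateCreated") could replace the OrderBy call.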

The results of this optimization for my site have been significant. Before the optimization, it was taking 625 ms on my laptop to get the list. With Examine, it is only taking 52 ms.