Indexing

NEST has a number of ways to index documents.

Single documents

A single document can be indexed at a time, either synchronously or asynchronously, using the IndexDocument and IndexDocumentAsync methods. These are the simplest way to index a single document.

var person = new Person
{
    Id = 1,
    FirstName = "Martijn",
    LastName = "Laarman"
};

var indexResponse = client.IndexDocument(person); 
if (!indexResponse.IsValid)
{
    // If the request isn't valid, we can take action here
}

var indexResponseAsync = await client.IndexDocumentAsync(person); 

IndexDocument is a synchronous method that returns an IIndexResponse.

IndexDocumentAsync is an asynchronous method that returns a Task<IIndexResponse> that can be awaited.

Single documents with parameters

If you need to set additional parameters when indexing, you can use the Index method with either the fluent or the object initializer syntax. This gives you finer control over the indexing of single documents.

var person = new Person
{
    Id = 1,
    FirstName = "Martijn",
    LastName = "Laarman"
};

client.Index(person, i => i.Index("people")); 

client.Index(new IndexRequest<Person>(person, "people")); 

The first Index call uses the fluent syntax.

The second Index call uses the object initializer syntax.

Multiple documents with IndexMany

Multiple documents can be indexed using the IndexMany and IndexManyAsync methods, synchronously or asynchronously respectively. These methods are specific to the NEST client and wrap calls to the _bulk endpoint, providing a convenient shortcut for indexing multiple documents.

Please note that these methods index all documents in a single HTTP request, so they are not a recommended approach for very large document collections; consider using the BulkAllObservable helper instead.

var people = new []
{
    new Person
    {
        Id = 1,
        FirstName = "Martijn",
        LastName = "Laarman"
    },
    new Person
    {
        Id = 2,
        FirstName = "Stuart",
        LastName = "Cam"
    },
    new Person
    {
        Id = 3,
        FirstName = "Russ",
        LastName = "Cam"
    }
};

var indexManyResponse = client.IndexMany(people); 

if (indexManyResponse.Errors) 
{
    foreach (var itemWithError in indexManyResponse.ItemsWithErrors) 
    {
        Console.WriteLine("Failed to index document {0}: {1}", itemWithError.Id, itemWithError.Error);
    }
}

// Alternatively, documents can be indexed asynchronously
var indexManyAsyncResponse = await client.IndexManyAsync(people); 

IndexMany is a synchronous method that returns an IBulkResponse.

The Errors property of the response can be inspected to see if any of the bulk operations resulted in an error.

If there are errors, they can be enumerated through ItemsWithErrors and inspected.

IndexManyAsync is an asynchronous method that returns a Task<IBulkResponse> that can be awaited.

Multiple documents with bulk

If you require finer-grained control over indexing many documents, you can use the Bulk and BulkAsync methods and use the descriptors to customise the bulk calls.

As with the IndexMany methods above, documents are sent to the _bulk endpoint in a single HTTP request, so the overall size of the HTTP request needs to be considered. For indexing large numbers of documents, it may be sensible to perform multiple separate Bulk calls.

var bulkIndexResponse = client.Bulk(b => b
    .Index("people")
    .IndexMany(people)); 

// Alternatively, documents can be indexed asynchronously similar to IndexManyAsync
var asyncBulkIndexResponse = await client.BulkAsync(b => b
    .Index("people")
    .IndexMany(people)); 

Bulk is a synchronous method that returns an IBulkResponse, the same as IndexMany, and the response can be inspected for errors in the same way.

BulkAsync is an asynchronous method that returns a Task<IBulkResponse> that can be awaited.
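
The suggestion above to perform multiple separate Bulk calls could be sketched along the following lines. This is only an illustration: BulkBatching and InBatches are hypothetical names and are not part of NEST, and the batch size of 500 in the usage comment is an arbitrary assumption that would need tuning to the size of your documents.

```csharp
using System;
using System.Collections.Generic;

public static class BulkBatching
{
    // Hypothetical helper (not part of NEST): splits a sequence into
    // consecutive batches containing at most batchSize items each.
    // Because this is an iterator, the argument checks run when
    // enumeration begins rather than when the method is called.
    public static IEnumerable<IReadOnlyList<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0)
            yield return batch;
    }
}

// Usage sketch, assuming the client and people variables from the examples above:
//
// foreach (var batch in BulkBatching.InBatches(people, 500))
// {
//     var response = client.Bulk(b => b
//         .Index("people")
//         .IndexMany(batch));
//
//     if (response.Errors)
//     {
//         // inspect response.ItemsWithErrors, as with IndexMany
//     }
// }
```

Each batch becomes its own HTTP request, which keeps individual request sizes bounded, but retry and backoff handling is left to the caller.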

Multiple documents with BulkAllObservable helper

Using the BulkAllObservable helper allows you to focus on the overall objective of indexing, without having to concern yourself with retry, backoff or chunking mechanics. Multiple documents can be indexed using the BulkAll method and Wait() extension method.

This helper exposes functionality to automatically retry and back off in the event of an indexing failure, and to control the number of documents indexed in a single HTTP request. In the example below, each request will contain 1000 documents, chunked from the original input. For a large number of documents this could result in many HTTP requests, each containing 1000 documents (the last request may contain fewer, depending on the total number).

The helper will also lazily enumerate an IEnumerable<T> collection, allowing you to index a large number of documents easily.

var bulkAllObservable = client.BulkAll(people, b => b
    .Index("people")
    .BackOffTime("30s") 
    .BackOffRetries(2) 
    .RefreshOnCompleted()
    .MaxDegreeOfParallelism(Environment.ProcessorCount)
    .Size(1000) 
)
.Wait(TimeSpan.FromMinutes(15), next => 
{
    // do something e.g. write number of pages to console
});

BackOffTime sets how long to wait between retries.

BackOffRetries sets how many retries are attempted if a failure occurs.

Size sets the number of items per bulk request.

Wait performs the indexing and waits up to 15 minutes. Although the BulkAll calls are asynchronous, this is a blocking operation.
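
To illustrate the lazy enumeration mentioned above, the input could be produced by a C# iterator instead of an in-memory array. The following is a minimal sketch: the Person class is an assumed minimal definition matching the properties used in the earlier examples, and PeopleSource.Generate is a hypothetical generator that stands in for reading from a file or database.

```csharp
using System.Collections.Generic;

// Assumed minimal document type, matching the properties used in the examples above.
public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class PeopleSource
{
    // Hypothetical generator (not part of NEST): yields documents one at a
    // time, so the full collection is never materialised in memory.
    public static IEnumerable<Person> Generate(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            yield return new Person
            {
                Id = i,
                FirstName = "First" + i,
                LastName = "Last" + i
            };
        }
    }
}

// Usage sketch, assuming the client variable from the examples above:
//
// client.BulkAll(PeopleSource.Generate(1000000), b => b
//     .Index("people")
//     .Size(1000)
// )
// .Wait(TimeSpan.FromMinutes(15), next => { });
```

Because the generator only produces a document when the consumer asks for the next one, the helper can pull documents as it builds each request.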

Advanced bulk indexing

The BulkAllObservable helper exposes a number of advanced features.

  1. BufferToBulk allows individual operations within the bulk request to be customised before it is dispatched to the server.
  2. RetryDocumentPredicate gives fine control over whether a document that failed to be indexed should be retried.
  3. DroppedDocumentCallback is a delegate that is called when a document is not indexed, even after retrying.

client.BulkAll(people, b => b
      .BufferToBulk((descriptor, list) => 
      {
          foreach (var item in list)
          {
              descriptor.Index<Person>(bi => bi
                  .Index(item.Id % 2 == 0 ? "even-index" : "odd-index") 
                  .Document(item)
              );
          }
      })
      .RetryDocumentPredicate((item, person) => 
      {
          return item.Error.Index == "even-index" && person.FirstName == "Martijn";
      })
      .DroppedDocumentCallback((item, person) => 
      {
          Console.WriteLine($"Unable to index: {item} {person}");
      }));

BufferToBulk customises the individual operations in the bulk request before it is dispatched.

In this example, each document is indexed into either even-index or odd-index, depending on its Id.

RetryDocumentPredicate decides if a document should be retried in the event of a failure.

DroppedDocumentCallback is called if a document cannot be indexed.