Index some documents

Once you have a cluster up and running, you’re ready to index some data. There are a variety of ingest options for Elasticsearch, but in the end they all do the same thing: put JSON documents into an Elasticsearch index.

You can do this directly with a simple PUT request that specifies the index you want to add the document to, a unique document ID, and one or more "field": "value" pairs in the request body:

PUT /customer/_doc/1
{
  "name": "John Doe"
}

This request automatically creates the customer index if it doesn’t already exist, adds a new document that has an ID of 1, and stores and indexes the name field.

Since this is a new document, the response shows that the result of the operation was that version 1 of the document was created:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 26,
  "_primary_term" : 4
}
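
If you index to the same document ID again, Elasticsearch overwrites the existing document and increments its version. For example, repeating the request with a different name should return "result" : "updated" and "_version" : 2 (the exact _seq_no and _primary_term values depend on your cluster's history):

PUT /customer/_doc/1
{
  "name": "Jane Doe"
}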

The new document is available immediately from any node in the cluster. You can retrieve it with a GET request that specifies its document ID:

GET /customer/_doc/1

The response indicates that a document with the specified ID was found and shows the original source fields that were indexed.

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 26,
  "_primary_term" : 4,
  "found" : true,
  "_source" : {
    "name": "John Doe"
  }
}
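
If you only need the document body without the metadata fields, you can also use the _source endpoint, which returns just the stored JSON:

GET /customer/_source/1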

Indexing documents in bulk

If you have a lot of documents to index, you can submit them in batches with the bulk API. Using bulk to batch document operations is significantly faster than submitting requests individually as it minimizes network roundtrips.
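
The bulk request body is newline-delimited JSON: each operation is described by an action line (such as index), followed by the document source on the next line where applicable, and the body must end with a newline. As a minimal sketch (the index name, IDs, and field values here are illustrative):

POST /bank/_bulk
{ "index": { "_id": "1" } }
{ "account_number": 1, "balance": 1000, "firstname": "Amber", "lastname": "Duke" }
{ "index": { "_id": "2" } }
{ "account_number": 2, "balance": 2000, "firstname": "Hattie", "lastname": "Bond" }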

The optimal batch size depends on a number of factors: the document size and complexity, the indexing and search load, and the resources available to your cluster. A good place to start is with batches of 1,000 to 5,000 documents and a total payload between 5MB and 15MB. From there, you can experiment to find the sweet spot.

To get some data into Elasticsearch that you can start searching and analyzing:

  1. Download the accounts.json sample data set. The documents in this randomly generated data set represent user accounts with the following information:

    {
        "account_number": 0,
        "balance": 16623,
        "firstname": "Bradshaw",
        "lastname": "Mckenzie",
        "age": 29,
        "gender": "F",
        "address": "244 Columbus Place",
        "employer": "Euron",
        "email": "bradshawmckenzie@euron.com",
        "city": "Hobucken",
        "state": "CO"
    }
  2. Index the account data into the bank index with the following _bulk request:

    curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
    curl "localhost:9200/_cat/indices?v"

    The response indicates that 1,000 documents were indexed successfully.

    health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green open   bank  l7sSYV2cQXmu6_4rJWVIww   5   1       1000            0    128.6kb        128.6kb
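
    You can also confirm the count directly with the _count API, which should report the 1,000 indexed documents:

    curl "localhost:9200/bank/_count?pretty"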