Talking to Elasticsearch

edit

How you talk to Elasticsearch depends on whether you are using Java.

Java API

edit

If you are using Java, Elasticsearch comes with two built-in clients that you can use in your code:

Node client
The node client joins a local cluster as a non data node. In other words, it doesn’t hold any data itself, but it knows what data lives on which node in the cluster, and can forward requests directly to the correct node.
Transport client
The lighter-weight transport client can be used to send requests to a remote cluster. It doesn’t join the cluster itself, but simply forwards requests to a node in the cluster.

Both Java clients talk to the cluster over port 9300, using the native Elasticsearch transport protocol. The nodes in the cluster also communicate with each other over port 9300. If this port is not open, your nodes will not be able to form a cluster.

The Java client must be from the same major version of Elasticsearch as the nodes; otherwise, they may not be able to understand each other.

More information about the Java clients can be found in Elasticsearch Clients.

RESTful API with JSON over HTTP

edit

All other languages can communicate with Elasticsearch over port 9200 using a RESTful API, accessible with your favorite web client. In fact, as you have seen, you can even talk to Elasticsearch from the command line by using the curl command.

Elasticsearch provides official clients for several languages—​Groovy, JavaScript, .NET, PHP, Perl, Python, and Ruby—​and there are numerous community-provided clients and integrations, all of which can be found in Elasticsearch Clients.

A request to Elasticsearch consists of the same parts as any HTTP request:

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

The parts marked with < > above are:

VERB

The appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.

PROTOCOL

Either http or https (if you have an https proxy in front of Elasticsearch.)

HOST

The hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.

PORT

The port running the Elasticsearch HTTP service, which defaults to 9200.

PATH

API Endpoint (for example _count will return the number of documents in the cluster). Path may contain multiple components, such as _cluster/stats or _nodes/stats/jvm

QUERY_STRING

Any optional query-string parameters (for example ?pretty will pretty-print the JSON response to make it easier to read.)

BODY

A JSON-encoded request body (if the request needs one.)

For instance, to count the number of documents in the cluster, we could use this:

curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
    "query": {
        "match_all": {}
    }
}
'

Elasticsearch returns an HTTP status code like 200 OK and (except for HEAD requests) a JSON-encoded response body. The preceding curl request would respond with a JSON body like the following:

{
    "count" : 0,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    }
}

We don’t see the HTTP headers in the response because we didn’t ask curl to display them. To see the headers, use the curl command with the -i switch:

curl -i -XGET 'localhost:9200/'

For the rest of the book, we will show these curl examples using a shorthand format that leaves out all the bits that are the same in every request, like the hostname and port, and the curl command itself. Instead of showing a full request like

curl -XGET 'localhost:9200/_count?pretty' -d '
{
    "query": {
        "match_all": {}
    }
}'

we will show it in this shorthand format:

GET /_count
{
    "query": {
        "match_all": {}
    }
}

In fact, this is the same format that is used by the Sense console that we installed with Marvel. If in the online version of this book, you can open and run this code example in Sense by clicking the View in Sense link above.