Cross-cluster search

edit

The cross-cluster search feature allows any node to act as a federated client across multiple clusters. In contrast to the tribe node feature, a cross-cluster search node won’t join the remote cluster, instead it connects to a remote cluster in a light fashion in order to execute federated search requests.

Using cross-cluster search

edit

Cross-cluster search requires configuring remote clusters.

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}

To search the twitter index on remote cluster cluster_one the index name must be prefixed with the cluster alias separated by a : character:

GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 1,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}

In contrast to the tribe feature cross cluster search can also search indices with the same name on different clusters:

GET /cluster_one:twitter,twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}

Search results are disambiguated the same way as the indices are disambiguated in the request. Even if index names are identical these indices will be treated as different indices when results are merged. All results retrieved from a remote index will be prefixed with their remote cluster name:

{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 2,
    "successful": 2,
    "skipped": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}

Skipping disconnected clusters

edit

By default all remote clusters that are searched via cross-cluster search need to be available when the search request is executed, otherwise the whole request fails and no search results are returned despite some of the clusters are available. Remote clusters can be made optional through the boolean skip_unavailable setting, set to false by default.

PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true 
  }
}

cluster_two is made optional

GET /cluster_one:twitter,cluster_two:twitter,twitter/_search 
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}

Search against the twitter index in cluster_one, cluster_two and also locally

{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": { 
    "total": 3,
    "successful": 2,
    "skipped": 1
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}

The clusters section indicates that one cluster was unavailable and got skipped