Percolate Query
The percolate query can be used to match queries stored in an index. The percolate query itself contains the document that will be used as the query to match against the stored queries.
Sample Usage
Create an index with two fields:
PUT /my-index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": { "type": "text" },
        "query": { "type": "percolator" }
      }
    }
  }
}
The message field is the field used to preprocess the document defined in the percolator query before it gets indexed into a temporary index.
The query field is used for indexing the query documents. It will hold a JSON object that represents an actual Elasticsearch query. The query field has been configured to use the percolator field type. This field type understands the query DSL and stores the query in such a way that it can be used later on to match documents defined in the percolate query.
Register a query in the percolator:
PUT /my-index/_doc/1?refresh
{
  "query": {
    "match": { "message": "bonsai tree" }
  }
}
Match a document to the registered percolator queries:
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "message": "A new bonsai tree in the office" }
    }
  }
}
The above request will yield the following response:
{
  "took": 13,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": { "query": { "match": { "message": "bonsai tree" } } },
        "fields": { "_percolator_document_slot": [0] }
      }
    ]
  }
}
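The query with id 1 matches our document.
The _percolator_document_slot field indicates which document has matched with this query.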
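As a rough sketch in Python (standard library only), the search body above and the handling of its response could be put together as follows. The helper names percolate_body and matched_query_ids are illustrative and not part of any Elasticsearch client, and the sample response is a trimmed copy of the one shown above.

```python
import json


def percolate_body(field, document):
    """Build a search body that percolates a single document (hypothetical helper)."""
    return {"query": {"percolate": {"field": field, "document": document}}}


def matched_query_ids(response):
    """Return the ids of the stored queries that matched, in response order."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


body = percolate_body("query", {"message": "A new bonsai tree in the office"})
print(json.dumps(body, indent=2))  # JSON to send to GET /my-index/_search

# Trimmed copy of the response shown above, used here instead of a live cluster.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "my-index",
                "_id": "1",
                "fields": {"_percolator_document_slot": [0]},
            }
        ],
    }
}
print(matched_query_ids(sample_response))
```

Sending the body to a cluster is left out on purpose, since the choice of HTTP client is outside the scope of this page.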
Parameters
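The following parameters are used when percolating a document:
- field: The field of type percolator that holds the indexed queries.
- name: The suffix to be used for the _percolator_document_slot field when multiple percolate queries are specified.
- document: The source of the document being percolated.
- documents: Like the document parameter, but accepts multiple documents via a JSON array.
- document_type: The type / mapping of the document being percolated. This setting is deprecated and only required for indices created before 6.0.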
Instead of specifying the source of the document being percolated, the source can also be retrieved from an already stored document. The percolate query will then internally execute a get request to fetch that document.
In that case the document parameter can be substituted with the following parameters:
- index: The index the document resides in. This is a required parameter.
- type: The type of the document to fetch. This is a required parameter.
- id: The id of the document to fetch. This is a required parameter.
- routing: Optionally, the routing to be used to fetch the document to percolate.
- preference: Optionally, the preference to be used to fetch the document to percolate.
- version: Optionally, the expected version of the document to be fetched.
Percolating in a filter context
If you are not interested in the score, better performance can be expected by wrapping the percolate query in a bool query's filter clause or in a constant_score query:
GET /my-index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "percolate": {
          "field": "query",
          "document": { "message": "A new bonsai tree in the office" }
        }
      }
    }
  }
}
At index time, terms are extracted from the percolator query, and the percolator can often determine whether a query matches just by looking at those extracted terms. However, computing scores requires deserializing each matching query and running it against the percolated document, which is a much more expensive operation. Hence, if computing scores is not required, the percolate query should be wrapped in a constant_score query or a bool query's filter clause.
Note that the percolate query never gets cached by the query cache.
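The two wrapping options mentioned above can be sketched as a small Python helper. The function name in_filter_context and its use_bool flag are assumptions made for illustration; only the resulting JSON structure comes from the query DSL.

```python
def in_filter_context(percolate_query, use_bool=False):
    """Wrap a percolate query so that no scores are computed (hypothetical helper).

    use_bool=False -> constant_score query with a filter
    use_bool=True  -> bool query with a filter clause
    """
    if use_bool:
        return {"bool": {"filter": percolate_query}}
    return {"constant_score": {"filter": percolate_query}}


percolate_query = {
    "percolate": {
        "field": "query",
        "document": {"message": "A new bonsai tree in the office"},
    }
}

# Both variants are complete search bodies for GET /my-index/_search.
constant_score_body = {"query": in_filter_context(percolate_query)}
bool_filter_body = {"query": in_filter_context(percolate_query, use_bool=True)}
```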
Percolating multiple documents
The percolate query can match multiple documents simultaneously with the indexed percolator queries. Percolating multiple documents in a single request can improve performance, as queries only need to be parsed and matched once instead of multiple times.
The _percolator_document_slot field that is returned with each matched percolator query is important when percolating multiple documents simultaneously. It indicates which documents matched a particular percolator query. The numbers correspond to the slots in the documents array specified in the percolate query.
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "documents": [
        { "message": "bonsai tree" },
        { "message": "new tree" },
        { "message": "the office" },
        { "message": "office tree" }
      ]
    }
  }
}
{
  "took": 13,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1.5606477,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.5606477,
        "_source": { "query": { "match": { "message": "bonsai tree" } } },
        "fields": { "_percolator_document_slot": [0, 1, 3] }
      }
    ]
  }
}
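The _percolator_document_slot field indicates that the first, second and last documents specified in the percolate query match this query.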
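To turn the slot numbers back into "which queries matched which document", a client could invert the mapping along these lines. This is a minimal Python sketch; matches_by_document is a hypothetical helper, and the response is a trimmed copy of the one above.

```python
def matches_by_document(documents, response, slot_field="_percolator_document_slot"):
    """Map each slot in the documents array to the ids of the queries that matched it."""
    matches = {slot: [] for slot in range(len(documents))}
    for hit in response["hits"]["hits"]:
        for slot in hit.get("fields", {}).get(slot_field, []):
            matches[slot].append(hit["_id"])
    return matches


documents = [
    {"message": "bonsai tree"},
    {"message": "new tree"},
    {"message": "the office"},
    {"message": "office tree"},
]

# Trimmed copy of the multi-document response shown above.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"_percolator_document_slot": [0, 1, 3]}}
        ]
    }
}

result = matches_by_document(documents, response)
# Slot 2 ("the office") stays empty because the stored query did not match it.
print(result)
```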
Percolating an Existing Document
The percolate query can also be used to percolate a newly indexed document. Based on the response from an index request, the _id and other meta information can be used to immediately percolate the newly added document.
Example
This example is based on the previous example.
Index the document we want to percolate:
PUT /my-index/_doc/2
{
  "message": "A new bonsai tree in the office"
}
Index response:
{
  "_index": "my-index",
  "_type": "_doc",
  "_id": "2",
  "_version": 1,
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "result": "created",
  "_seq_no": 0,
  "_primary_term": 1
}
Percolate the existing document, using the index response as the basis for a new search request:
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "index": "my-index",
      "type": "_doc",
      "id": "2",
      "version": 1
    }
  }
}
The version is optional, but useful in certain cases. It ensures that we are percolating the document we have just indexed. If the document has been changed since it was indexed, the search request fails with a version conflict error.
The search response returned is identical to the one in the previous example.
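The step from index response to percolate request is mechanical, so it can be sketched in a few lines of Python. The helper name percolate_indexed_document and its check_version flag are assumptions for illustration; the index response is the one shown above.

```python
def percolate_indexed_document(index_response, field, check_version=True):
    """Build a percolate search body from the response of an index request."""
    percolate = {
        "field": field,
        "index": index_response["_index"],
        "type": index_response["_type"],
        "id": index_response["_id"],
    }
    if check_version:
        # Makes the search fail with a version conflict if the document changed.
        percolate["version"] = index_response["_version"]
    return {"query": {"percolate": percolate}}


index_response = {
    "_index": "my-index",
    "_type": "_doc",
    "_id": "2",
    "_version": 1,
    "result": "created",
}

search_body = percolate_indexed_document(index_response, field="query")
```

Passing check_version=False drops the version check, which trades the conflict detection described above for tolerance of concurrent updates.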
Percolate query and highlighting
The percolate query is handled in a special way when it comes to highlighting. The hits of the stored queries are used to highlight the document that is provided in the percolate query, whereas with regular highlighting the query in the search request is used to highlight the hits.
Example
This example is based on the mapping of the first example.
Save a query:
PUT /my-index/_doc/3?refresh
{
  "query": {
    "match": { "message": "brown fox" }
  }
}
Save another query:
PUT /my-index/_doc/4?refresh
{
  "query": {
    "match": { "message": "lazy dog" }
  }
}
Execute a search request with the percolate query and highlighting enabled:
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "message": "The quick brown fox jumps over the lazy dog" }
    }
  },
  "highlight": {
    "fields": { "message": {} }
  }
}
This will yield the following response.
{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.5753642,
        "_source": { "query": { "match": { "message": "lazy dog" } } },
        "highlight": {
          "message": [ "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" ]
        },
        "fields": { "_percolator_document_slot": [0] }
      },
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.5753642,
        "_source": { "query": { "match": { "message": "brown fox" } } },
        "highlight": {
          "message": [ "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" ]
        },
        "fields": { "_percolator_document_slot": [0] }
      }
    ]
  }
}
Instead of the query in the search request highlighting the percolator hits, the percolator queries highlight the document defined in the percolate query.
When percolating multiple documents at the same time, as in the request below, the highlight response is different:
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "documents": [
        { "message": "bonsai tree" },
        { "message": "new tree" },
        { "message": "the office" },
        { "message": "office tree" }
      ]
    }
  },
  "highlight": {
    "fields": { "message": {} }
  }
}
The slightly different response:
{
  "took": 13,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1.5606477,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.5606477,
        "_source": { "query": { "match": { "message": "bonsai tree" } } },
        "fields": { "_percolator_document_slot": [0, 1, 3] },
        "highlight": {
          "0_message": [ "<em>bonsai</em> <em>tree</em>" ],
          "3_message": [ "office <em>tree</em>" ],
          "1_message": [ "new <em>tree</em>" ]
        }
      }
    ]
  }
}
The highlight fields have been prefixed with the document slot they belong to, in order to know which highlight field belongs to which document.
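A client that needs the highlights grouped per document can split the prefixed keys again. The Python sketch below assumes the key layout seen in the response above (slot number, underscore, field name); highlights_by_slot is a hypothetical helper name.

```python
def highlights_by_slot(highlight):
    """Group '<slot>_<field>' highlight entries by document slot."""
    by_slot = {}
    for key, fragments in highlight.items():
        # Split on the first underscore only, so field names may contain underscores.
        slot, _, field = key.partition("_")
        by_slot.setdefault(int(slot), {})[field] = fragments
    return by_slot


# The highlight section of the hit shown above.
highlight = {
    "0_message": ["<em>bonsai</em> <em>tree</em>"],
    "3_message": ["office <em>tree</em>"],
    "1_message": ["new <em>tree</em>"],
}

grouped = highlights_by_slot(highlight)
```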
Specifying multiple percolate queries
It is possible to specify multiple percolate queries in a single search request:
GET /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "percolate": {
            "field": "query",
            "document": { "message": "bonsai tree" },
            "name": "query1"
          }
        },
        {
          "percolate": {
            "field": "query",
            "document": { "message": "tulip flower" },
            "name": "query2"
          }
        }
      ]
    }
  }
}
The name parameter is used to identify which percolator document slots belong to which percolate query.
The _percolator_document_slot field name will be suffixed with what is specified in the name parameter. If that isn't specified, then the field parameter will be used, which in this case will result in ambiguity.
The above search request returns a response similar to this:
{
  "took": 13,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": { "query": { "match": { "message": "bonsai tree" } } },
        "fields": { "_percolator_document_slot_query1": [0] }
      }
    ]
  }
}
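The _percolator_document_slot_query1 field indicates that these matched slots are from the percolate query whose name parameter is set to query1.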
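When several named percolate queries are combined, a client can recover the per-query slots from the suffixed field names. This Python sketch relies only on the field naming visible in the response above; slots_by_query_name is a hypothetical helper.

```python
SLOT_PREFIX = "_percolator_document_slot_"


def slots_by_query_name(hit):
    """Return {percolate query name: matched slots} for one search hit."""
    fields = hit.get("fields", {})
    return {
        key[len(SLOT_PREFIX):]: slots
        for key, slots in fields.items()
        if key.startswith(SLOT_PREFIX)
    }


# The hit from the response above, trimmed to the relevant parts.
hit = {
    "_id": "1",
    "fields": {"_percolator_document_slot_query1": [0]},
}

named_slots = slots_by_query_name(hit)
# Only query1 appears: the stored query did not match the "tulip flower" document.
```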
How it Works Under the Hood
When indexing a document into an index that has the percolator field type mapping configured, the query part of the document gets parsed into a Lucene query and is stored in the Lucene index. A binary representation of the query gets stored, and the query's terms are also analyzed and stored in an indexed field.
At search time, the document specified in the request gets parsed into a Lucene document and is stored in an in-memory temporary Lucene index. This in-memory index can hold just this one document and is optimized for that. After this, a special query is built based on the terms in the in-memory index; it selects candidate percolator queries based on their indexed query terms. These candidate queries are then evaluated against the in-memory index to check whether they actually match.
Selecting candidate percolator queries is an important performance optimization during the execution of the percolate query, as it can significantly reduce the number of candidate matches the in-memory index needs to evaluate. The percolate query can do this because the query terms are extracted and indexed with each percolator query at index time. Unfortunately, the percolator cannot extract terms from all queries (for example, the wildcard or geo_shape query), and as a result the percolator cannot apply this optimization in certain cases (for example, if an unsupported query is defined in a required clause of a boolean query, or the unsupported query is the only query in the percolator document). These queries are marked by the percolator and can be found by running the following search:
GET /_search
{
  "query": {
    "term": { "query.extraction_result": "failed" }
  }
}
The above example assumes that there is a query field of type percolator in the mappings.
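The candidate-selection idea described in this section can be illustrated with a toy model in Python. This is not how Lucene or Elasticsearch implement it: the ToyPercolator class, its tuple-based query format and the whitespace tokenization are all assumptions made to keep the sketch short. It only shows the principle that queries with extracted terms are pre-selected through a term index, while queries whose terms cannot be extracted (here a wildcard) must always be treated as candidates.

```python
from collections import defaultdict
from fnmatch import fnmatch


class ToyPercolator:
    """Toy model: queries are ("match", "some words") or ("wildcard", "pattern*")."""

    def __init__(self):
        self.queries = {}
        self.term_index = defaultdict(set)  # extracted term -> query ids
        self.extraction_failed = set()      # queries without extractable terms

    def register(self, query_id, query):
        self.queries[query_id] = query
        kind, text = query
        if kind == "match":
            for term in text.lower().split():
                self.term_index[term].add(query_id)
        else:
            # No terms can be extracted, so the query is marked instead.
            self.extraction_failed.add(query_id)

    def candidates(self, doc_terms):
        """Select queries sharing a term with the document, plus all marked queries."""
        selected = set(self.extraction_failed)
        for term in doc_terms:
            selected |= self.term_index.get(term, set())
        return selected

    def percolate(self, text):
        doc_terms = set(text.lower().split())
        return sorted(
            qid for qid in self.candidates(doc_terms)
            if self._matches(self.queries[qid], doc_terms)
        )

    @staticmethod
    def _matches(query, doc_terms):
        kind, text = query
        if kind == "match":
            return any(term in doc_terms for term in text.lower().split())
        return any(fnmatch(term, text) for term in doc_terms)


percolator = ToyPercolator()
percolator.register(1, ("match", "bonsai tree"))
percolator.register(2, ("match", "lazy dog"))
percolator.register(3, ("wildcard", "off*"))
print(percolator.percolate("A new bonsai tree in the office"))
```

In this model query 2 is never evaluated for the bonsai document, because none of its extracted terms occur in it, whereas query 3 is evaluated on every request.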