Percolate query
The percolate query can be used to match queries stored in an index. The percolate query itself contains the document that will be used as a query to match against the stored queries.
Sample Usage
Create an index with two fields:

PUT /my-index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      },
      "query": {
        "type": "percolator"
      }
    }
  }
}
The message field is the field used to preprocess the document defined in the percolate query before it gets indexed into a temporary index.

The query field is used for indexing the query documents. It holds a JSON object that represents an actual Elasticsearch query. The query field has been configured to use the percolator field type. This field type understands the query DSL and stores the query in such a way that it can be used later on to match documents defined in the percolate query.
Register a query in the percolator:
PUT /my-index/_doc/1?refresh
{
  "query": {
    "match": {
      "message": "bonsai tree"
    }
  }
}
Match a document to the registered percolator queries:
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "A new bonsai tree in the office"
      }
    }
  }
}
The above request will yield the following response:
{ "took": 13, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 0.26152915, "hits": [ { "_index": "my-index", "_type": "_doc", "_id": "1", "_score": 0.26152915, "_source": { "query": { "match": { "message": "bonsai tree" } } }, "fields" : { "_percolator_document_slot" : [0] } } ] } }
The query with id 1 matches our document.

The _percolator_document_slot field indicates which document matched with this query. This is useful when percolating multiple documents.
To provide a simple example, this documentation uses a single index, my-index, for both the percolate queries and the documents. This setup can work well when there are just a few percolate queries registered. However, with heavier usage it is recommended to store queries and documents in separate indices. Please see How it Works Under the Hood for more details.
Parameters
The following parameters are required when percolating a document:

- field: The field of type percolator that holds the indexed queries. This is a required parameter.
- name: The suffix to be used for the _percolator_document_slot field in case multiple percolate queries have been specified. This is an optional parameter.
- document: The source of the document being percolated.
- documents: Like the document parameter, but accepts multiple documents via a JSON array.
- document_type: The type / mapping of the document being percolated. This parameter is deprecated and will be removed in Elasticsearch 8.0.
Instead of specifying the source of the document being percolated, the source can also be retrieved from an already stored document. The percolate query will then internally execute a get request to fetch that document. In that case the document parameter can be substituted with the following parameters:

- index: The index the document resides in. This is a required parameter.
- type: The type of the document to fetch. This parameter is deprecated and will be removed in Elasticsearch 8.0.
- id: The id of the document to fetch. This is a required parameter.
- routing: Optionally, routing to be used to fetch the document to percolate.
- preference: Optionally, preference to be used to fetch the document to percolate.
- version: Optionally, the expected version of the document to be fetched.
Percolating in a filter context
In case you are not interested in the score, better performance can be expected by wrapping the percolator query in a bool query's filter clause or in a constant_score query:
GET /my-index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "percolate": {
          "field": "query",
          "document": {
            "message": "A new bonsai tree in the office"
          }
        }
      }
    }
  }
}
At index time, terms are extracted from the percolator query, and the percolator can often determine whether a query matches just by looking at those extracted terms. However, computing scores requires deserializing each matching query and running it against the percolated document, which is a much more expensive operation. Hence, if computing scores is not required, the percolate query should be wrapped in a constant_score query or a bool query's filter clause.
Note that the percolate query never gets cached by the query cache.
Percolating multiple documents
The percolate query can match multiple documents simultaneously with the indexed percolator queries. Percolating multiple documents in a single request can improve performance, as queries only need to be parsed and matched once instead of multiple times.

The _percolator_document_slot field that is returned with each matched percolator query is important when percolating multiple documents simultaneously. It indicates which documents matched a particular percolator query. The numbers correlate with the slots in the documents array specified in the percolate query.
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "documents": [
        {
          "message": "bonsai tree"
        },
        {
          "message": "new tree"
        },
        {
          "message": "the office"
        },
        {
          "message": "office tree"
        }
      ]
    }
  }
}
{ "took": 13, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 0.7093853, "hits": [ { "_index": "my-index", "_type": "_doc", "_id": "1", "_score": 0.7093853, "_source": { "query": { "match": { "message": "bonsai tree" } } }, "fields" : { "_percolator_document_slot" : [0, 1, 3] } } ] } }
The _percolator_document_slot field indicates that the first, second, and last documents specified in the percolate query match this query. The third document ("the office") did not match.
Percolating an Existing Document
In order to percolate a newly indexed document, the percolate query can be used. Based on the response from an index request, the _id and other meta information can be used to immediately percolate the newly added document.
Example
Based on the previous example.
Index the document we want to percolate:
PUT /my-index/_doc/2
{
  "message": "A new bonsai tree in the office"
}
Index response:
{ "_index": "my-index", "_type": "_doc", "_id": "2", "_version": 1, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "result": "created", "_seq_no" : 1, "_primary_term" : 1 }
Percolating an existing document, using the index response as the basis to build a new search request:
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "index": "my-index",
      "id": "2",
      "version": 1
    }
  }
}
The version is optional, but useful in certain cases: it ensures that we are percolating the document we just indexed. If the document was changed after indexing, the search request fails with a version conflict error.
The search response returned is identical to the one in the previous example.
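To make the version check concrete, here is a minimal sketch of the failure case, assuming the document above is updated before the percolate request runs (the new message text is hypothetical):

PUT /my-index/_doc/2
{
  "message": "A new bonsai tree in the office, freshly pruned"
}

GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "index": "my-index",
      "id": "2",
      "version": 1
    }
  }
}

The PUT bumps the document to version 2, so the percolate request that still expects version 1 would return a version conflict error rather than search hits.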
Percolate query and highlighting
The percolate query is handled in a special way when it comes to highlighting. The query hits are used to highlight the document that is provided in the percolate query, whereas with regular highlighting the query in the search request is used to highlight the hits.
Example
This example is based on the mapping of the first example.
Save a query:
PUT /my-index/_doc/3?refresh
{
  "query": {
    "match": {
      "message": "brown fox"
    }
  }
}
Save another query:
PUT /my-index/_doc/4?refresh
{
  "query": {
    "match": {
      "message": "lazy dog"
    }
  }
}
Execute a search request with the percolate query and highlighting enabled:

GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "The quick brown fox jumps over the lazy dog"
      }
    }
  },
  "highlight": {
    "fields": {
      "message": {}
    }
  }
}
This will yield the following response.
{ "took": 7, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 2, "relation": "eq" }, "max_score": 0.26152915, "hits": [ { "_index": "my-index", "_type": "_doc", "_id": "3", "_score": 0.26152915, "_source": { "query": { "match": { "message": "brown fox" } } }, "highlight": { "message": [ "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" ] }, "fields" : { "_percolator_document_slot" : [0] } }, { "_index": "my-index", "_type": "_doc", "_id": "4", "_score": 0.26152915, "_source": { "query": { "match": { "message": "lazy dog" } } }, "highlight": { "message": [ "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" ] }, "fields" : { "_percolator_document_slot" : [0] } } ] } }
Instead of the query in the search request highlighting the percolator hits, the percolator queries are highlighting the document defined in the percolate query.
When percolating multiple documents at the same time, as in the request below, the highlight response is different:
GET /my-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "documents": [
        {
          "message": "bonsai tree"
        },
        {
          "message": "new tree"
        },
        {
          "message": "the office"
        },
        {
          "message": "office tree"
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "message": {}
    }
  }
}
The slightly different response:
{ "took": 13, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 0.7093853, "hits": [ { "_index": "my-index", "_type": "_doc", "_id": "1", "_score": 0.7093853, "_source": { "query": { "match": { "message": "bonsai tree" } } }, "fields" : { "_percolator_document_slot" : [0, 1, 3] }, "highlight" : { "0_message" : [ "<em>bonsai</em> <em>tree</em>" ], "3_message" : [ "office <em>tree</em>" ], "1_message" : [ "new <em>tree</em>" ] } } ] } }
The highlight fields are prefixed with the document slot they belong to, so it is clear which highlight field belongs to which document.
Specifying multiple percolate queries
It is possible to specify multiple percolate queries in a single search request:
GET /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "percolate": {
            "field": "query",
            "document": {
              "message": "bonsai tree"
            },
            "name": "query1"
          }
        },
        {
          "percolate": {
            "field": "query",
            "document": {
              "message": "tulip flower"
            },
            "name": "query2"
          }
        }
      ]
    }
  }
}
The name parameter is used to identify which percolator document slots belong to which percolate query.
The _percolator_document_slot field name will be suffixed with what is specified in the _name parameter. If that isn't specified, then the field parameter will be used, which in this case will result in ambiguity (see the sketch after the response below).
The above search request returns a response similar to this:
{ "took": 13, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 0.26152915, "hits": [ { "_index": "my-index", "_type": "_doc", "_id": "1", "_score": 0.26152915, "_source": { "query": { "match": { "message": "bonsai tree" } } }, "fields" : { "_percolator_document_slot_query1" : [0] } } ] } }
The _percolator_document_slot_query1 field indicates that these matched slots came from the percolate query with the name parameter set to query1.
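To illustrate the ambiguity mentioned above, here is a sketch of the same request with the name parameters omitted. Both percolate clauses would then report their matching slots under the single _percolator_document_slot field name derived from the field parameter, making it impossible to tell which clause produced a given match:

GET /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "percolate": {
            "field": "query",
            "document": {
              "message": "bonsai tree"
            }
          }
        },
        {
          "percolate": {
            "field": "query",
            "document": {
              "message": "tulip flower"
            }
          }
        }
      ]
    }
  }
}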
How it Works Under the Hood
When indexing a document into an index that has the percolator field type mapping configured, the query part of the document gets parsed into a Lucene query and is stored in the Lucene index. A binary representation of the query gets stored, but also the query's terms are analyzed and stored into an indexed field.

At search time, the document specified in the request gets parsed into a Lucene document and is stored in an in-memory temporary Lucene index. This in-memory index holds just this one document and is optimized for that. After this, a special query is built based on the terms in the in-memory index that selects candidate percolator queries based on their indexed query terms. These candidate queries are then evaluated against the in-memory index to verify whether they actually match.
The selection of candidate percolator queries is an important performance optimization during the execution of the percolate query, as it can significantly reduce the number of candidate matches the in-memory index needs to evaluate. The percolate query can do this because during indexing of the percolator queries, the query terms are extracted and indexed alongside the percolator query. Unfortunately, the percolator cannot extract terms from all queries (for example the wildcard or geo_shape query), and as a result, in certain cases the percolator can't perform this selection optimization (for example if an unsupported query is defined in a required clause of a boolean query, or if the unsupported query is the only query in the percolator document). These queries are marked by the percolator and can be found by running the following search:
GET /_search
{
  "query": {
    "term": {
      "query.extraction_result": "failed"
    }
  }
}
The above example assumes that there is a query field of type percolator in the mappings.
Given the design of percolation, it often makes sense to use separate indices for the percolate queries and the documents being percolated, as opposed to the single index used in the examples above. There are a few benefits to this approach (a setup sketch follows the list):
- Because percolate queries contain a different set of fields from the percolated documents, using two separate indices allows for fields to be stored in a denser, more efficient way.
- Percolate queries do not scale in the same way as other queries, so percolation performance may benefit from using a different index configuration, like the number of primary shards.
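As a sketch of such a setup, the percolator queries could live in a dedicated index (the index name queries-only below is hypothetical). Note that the mapping of the query index must still define every field the stored queries reference, here message, so that the queries can be parsed when they are indexed:

PUT /queries-only
{
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "message": {
        "type": "text"
      }
    }
  }
}

PUT /queries-only/_doc/1?refresh
{
  "query": {
    "match": {
      "message": "bonsai tree"
    }
  }
}

GET /queries-only/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "A new bonsai tree in the office"
      }
    }
  }
}

The documents being percolated never need to be indexed into queries-only; they are supplied in the percolate request itself, or fetched from whatever index they live in via the index and id parameters.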