Suggesters

Suggests similar looking terms based on a provided text by using a suggester.
resp = client.search( index="my-index-000001", query={ "match": { "message": "tring out Elasticsearch" } }, suggest={ "my-suggestion": { "text": "tring out Elasticsearch", "term": { "field": "message" } } }, ) print(resp)
response = client.search( index: 'my-index-000001', body: { query: { match: { message: 'tring out Elasticsearch' } }, suggest: { "my-suggestion": { text: 'tring out Elasticsearch', term: { field: 'message' } } } } ) puts response
const response = await client.search({ index: "my-index-000001", query: { match: { message: "tring out Elasticsearch", }, }, suggest: { "my-suggestion": { text: "tring out Elasticsearch", term: { field: "message", }, }, }, }); console.log(response);
POST my-index-000001/_search { "query" : { "match": { "message": "tring out Elasticsearch" } }, "suggest" : { "my-suggestion" : { "text" : "tring out Elasticsearch", "term" : { "field" : "message" } } } }
Request

The suggest feature suggests similar looking terms based on a provided text by using a suggester. The suggest request part is defined alongside the query part in a _search request. If the query part is left out, only suggestions are returned.
Examples

Several suggestions can be specified per request. Each suggestion is identified with an arbitrary name. In the example below two suggestions are requested. Both the my-suggest-1 and my-suggest-2 suggestions use the term suggester, but have a different text.
resp = client.search( suggest={ "my-suggest-1": { "text": "tring out Elasticsearch", "term": { "field": "message" } }, "my-suggest-2": { "text": "kmichy", "term": { "field": "user.id" } } }, ) print(resp)
response = client.search( body: { suggest: { "my-suggest-1": { text: 'tring out Elasticsearch', term: { field: 'message' } }, "my-suggest-2": { text: 'kmichy', term: { field: 'user.id' } } } } ) puts response
const response = await client.search({ suggest: { "my-suggest-1": { text: "tring out Elasticsearch", term: { field: "message", }, }, "my-suggest-2": { text: "kmichy", term: { field: "user.id", }, }, }, }); console.log(response);
POST _search { "suggest": { "my-suggest-1" : { "text" : "tring out Elasticsearch", "term" : { "field" : "message" } }, "my-suggest-2" : { "text" : "kmichy", "term" : { "field" : "user.id" } } } }
The below suggest response example includes the suggestion responses for my-suggest-1 and my-suggest-2. Each suggestion part contains entries. Each entry is effectively a token from the suggest text and contains the suggestion entry text, the original start offset and length in the suggest text, and, if found, an arbitrary number of options.
{ "_shards": ... "hits": ... "took": 2, "timed_out": false, "suggest": { "my-suggest-1": [ { "text": "tring", "offset": 0, "length": 5, "options": [ {"text": "trying", "score": 0.8, "freq": 1 } ] }, { "text": "out", "offset": 6, "length": 3, "options": [] }, { "text": "elasticsearch", "offset": 10, "length": 13, "options": [] } ], "my-suggest-2": ... } }
Each options array contains an option object that includes the suggested text, its document frequency, and its score compared to the suggest entry text. The meaning of the score depends on the suggester used. The term suggester's score is based on the edit distance.
Global suggest text

To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is defined globally and applies to the my-suggest-1 and my-suggest-2 suggestions.
$params = [ 'body' => [ 'suggest' => [ 'text' => 'tring out Elasticsearch', 'my-suggest-1' => [ 'term' => [ 'field' => 'message', ], ], 'my-suggest-2' => [ 'term' => [ 'field' => 'user', ], ], ], ], ]; $response = $client->search($params);
resp = client.search( suggest={ "text": "tring out Elasticsearch", "my-suggest-1": { "term": { "field": "message" } }, "my-suggest-2": { "term": { "field": "user" } } }, ) print(resp)
response = client.search( body: { suggest: { text: 'tring out Elasticsearch', "my-suggest-1": { term: { field: 'message' } }, "my-suggest-2": { term: { field: 'user' } } } } ) puts response
res, err := es.Search( es.Search.WithBody(strings.NewReader(`{ "suggest": { "text": "tring out Elasticsearch", "my-suggest-1": { "term": { "field": "message" } }, "my-suggest-2": { "term": { "field": "user" } } } }`)), es.Search.WithPretty(), ) fmt.Println(res, err)
const response = await client.search({ suggest: { text: "tring out Elasticsearch", "my-suggest-1": { term: { field: "message", }, }, "my-suggest-2": { term: { field: "user", }, }, }, }); console.log(response);
POST _search { "suggest": { "text" : "tring out Elasticsearch", "my-suggest-1" : { "term" : { "field" : "message" } }, "my-suggest-2" : { "term" : { "field" : "user" } } } }
In the above example, the suggest text can also be specified as a suggestion-specific option. The suggest text specified at the suggestion level overrides the suggest text at the global level.
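For example, the following sketch reuses the fields from the earlier examples: a global text is set, the first suggestion inherits it, and the second suggestion overrides it with its own text.

resp = client.search(
    suggest={
        "text": "tring out Elasticsearch",  # global suggest text
        "my-suggest-1": {
            "term": {"field": "message"}    # inherits the global text
        },
        "my-suggest-2": {
            "text": "kmichy",               # overrides the global text
            "term": {"field": "user.id"}
        }
    },
)
print(resp)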
Term suggester

The term suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The term suggester doesn't take into account the query that is part of the request.
Common suggest options:

text
    The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

field
    The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.

analyzer
    The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.

size
    The maximum corrections to be returned per suggest text token.

sort
    Defines how suggestions should be sorted per suggest text term. Two possible values:

    score: Sort by score first, then document frequency, and then the term itself.
    frequency: Sort by document frequency first, then similarity score, and then the term itself.

suggest_mode
    The suggest mode controls what suggestions are included or for what suggest text terms suggestions should be suggested. Three possible values can be specified:

    missing: Only provide suggestions for suggest text terms that are not in the index. This is the default.
    popular: Only suggest suggestions that occur in more docs than the original suggest text term.
    always: Suggest any matching suggestions based on terms in the suggest text.
Other term suggest options:

max_edits
    The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.

prefix_length
    The number of minimal prefix characters that must match in order to be a candidate for suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms.

min_word_length
    The minimum length a suggest text term must have in order to be included. Defaults to 4.

shard_size
    Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the size option. Setting this to a value higher than size can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance.

max_inspections
    A factor that is used to multiply with the shard_size in order to inspect more candidate spelling corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.

min_doc_freq
    The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified, then the number cannot be fractional. The shard level document frequencies are used for this option.

max_term_freq
    The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g., 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms, which are usually spelled correctly, from being spellchecked. This also improves the spellcheck performance. The shard level document frequencies are used for this option.

string_distance
    Which string distance implementation to use for comparing how similar suggested terms are. Five possible values can be specified:

    internal: The default, based on damerau_levenshtein but highly optimized for comparing string distance for terms inside the index.
    damerau_levenshtein: String distance algorithm based on the Damerau-Levenshtein algorithm.
    levenshtein: String distance algorithm based on the Levenshtein edit distance algorithm.
    jaro_winkler: String distance algorithm based on the Jaro-Winkler algorithm.
    ngram: String distance algorithm based on character n-grams.
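To see several of these options together, here is a minimal sketch against the my-index-000001 example index from above; the specific option values are illustrative, not recommendations.

resp = client.search(
    index="my-index-000001",
    suggest={
        "my-suggestion": {
            "text": "tring out Elasticsearch",
            "term": {
                "field": "message",
                "suggest_mode": "popular",  # only suggest terms more frequent than the input term
                "sort": "frequency",        # order candidates by document frequency
                "max_edits": 1,             # tighter edit-distance budget than the default of 2
                "min_word_length": 3        # also consider shorter terms (default is 4)
            }
        }
    },
)
print(resp)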
Phrase Suggester

The term suggester provides a very convenient API to access word alternatives on a per token basis within a certain string distance. The API allows accessing each token in the stream individually while suggest-selection is left to the API consumer. Yet, often pre-selected suggestions are required in order to present to the end-user. The phrase suggester adds additional logic on top of the term suggester to select entire corrected phrases instead of individual tokens weighted based on ngram-language models. In practice this suggester will be able to make better decisions about which tokens to pick based on co-occurrence and frequencies.
API Example

In general the phrase suggester requires special mapping up front to work. The phrase suggester examples on this page need the following mapping to work. The reverse analyzer is used only in the last example.
resp = client.indices.create( index="test", settings={ "index": { "number_of_shards": 1, "analysis": { "analyzer": { "trigram": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "shingle" ] }, "reverse": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "reverse" ] } }, "filter": { "shingle": { "type": "shingle", "min_shingle_size": 2, "max_shingle_size": 3 } } } } }, mappings={ "properties": { "title": { "type": "text", "fields": { "trigram": { "type": "text", "analyzer": "trigram" }, "reverse": { "type": "text", "analyzer": "reverse" } } } } }, ) print(resp) resp1 = client.index( index="test", refresh=True, document={ "title": "noble warriors" }, ) print(resp1) resp2 = client.index( index="test", refresh=True, document={ "title": "nobel prize" }, ) print(resp2)
response = client.indices.create( index: 'test', body: { settings: { index: { number_of_shards: 1, analysis: { analyzer: { trigram: { type: 'custom', tokenizer: 'standard', filter: [ 'lowercase', 'shingle' ] }, reverse: { type: 'custom', tokenizer: 'standard', filter: [ 'lowercase', 'reverse' ] } }, filter: { shingle: { type: 'shingle', min_shingle_size: 2, max_shingle_size: 3 } } } } }, mappings: { properties: { title: { type: 'text', fields: { trigram: { type: 'text', analyzer: 'trigram' }, reverse: { type: 'text', analyzer: 'reverse' } } } } } } ) puts response response = client.index( index: 'test', refresh: true, body: { title: 'noble warriors' } ) puts response response = client.index( index: 'test', refresh: true, body: { title: 'nobel prize' } ) puts response
const response = await client.indices.create({ index: "test", settings: { index: { number_of_shards: 1, analysis: { analyzer: { trigram: { type: "custom", tokenizer: "standard", filter: ["lowercase", "shingle"], }, reverse: { type: "custom", tokenizer: "standard", filter: ["lowercase", "reverse"], }, }, filter: { shingle: { type: "shingle", min_shingle_size: 2, max_shingle_size: 3, }, }, }, }, }, mappings: { properties: { title: { type: "text", fields: { trigram: { type: "text", analyzer: "trigram", }, reverse: { type: "text", analyzer: "reverse", }, }, }, }, }, }); console.log(response); const response1 = await client.index({ index: "test", refresh: "true", document: { title: "noble warriors", }, }); console.log(response1); const response2 = await client.index({ index: "test", refresh: "true", document: { title: "nobel prize", }, }); console.log(response2);
PUT test { "settings": { "index": { "number_of_shards": 1, "analysis": { "analyzer": { "trigram": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase","shingle"] }, "reverse": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase","reverse"] } }, "filter": { "shingle": { "type": "shingle", "min_shingle_size": 2, "max_shingle_size": 3 } } } } }, "mappings": { "properties": { "title": { "type": "text", "fields": { "trigram": { "type": "text", "analyzer": "trigram" }, "reverse": { "type": "text", "analyzer": "reverse" } } } } } } POST test/_doc?refresh=true {"title": "noble warriors"} POST test/_doc?refresh=true {"title": "nobel prize"}
Once you have the analyzers and mappings set up you can use the phrase suggester in the same spot you'd use the term suggester:
resp = client.search( index="test", suggest={ "text": "noble prize", "simple_phrase": { "phrase": { "field": "title.trigram", "size": 1, "gram_size": 3, "direct_generator": [ { "field": "title.trigram", "suggest_mode": "always" } ], "highlight": { "pre_tag": "<em>", "post_tag": "</em>" } } } }, ) print(resp)
const response = await client.search({ index: "test", suggest: { text: "noble prize", simple_phrase: { phrase: { field: "title.trigram", size: 1, gram_size: 3, direct_generator: [ { field: "title.trigram", suggest_mode: "always", }, ], highlight: { pre_tag: "<em>", post_tag: "</em>", }, }, }, }, }); console.log(response);
POST test/_search { "suggest": { "text": "noble prize", "simple_phrase": { "phrase": { "field": "title.trigram", "size": 1, "gram_size": 3, "direct_generator": [ { "field": "title.trigram", "suggest_mode": "always" } ], "highlight": { "pre_tag": "<em>", "post_tag": "</em>" } } } } }
The response contains suggestions scored by the most likely spelling correction first. In this case we received the expected correction "nobel prize".
{ "_shards": ... "hits": ... "timed_out": false, "took": 3, "suggest": { "simple_phrase" : [ { "text" : "noble prize", "offset" : 0, "length" : 11, "options" : [ { "text" : "nobel prize", "highlighted": "<em>nobel</em> prize", "score" : 0.48614594 }] } ] } }
Basic Phrase suggest API parameters

field
    The name of the field used to do n-gram lookups for the language model; the suggester will use this field to gain statistics to score corrections. This field is mandatory.

gram_size
    Sets the max size of the n-grams (shingles) in the field. If the field doesn't contain n-grams (shingles), this should be omitted or set to 1. Note that Elasticsearch tries to detect the gram size based on the specified field. If the field uses a shingle filter, the gram_size is set to the max_shingle_size if not explicitly set.

real_word_error_likelihood
    The likelihood of a term being misspelled even if the term exists in the dictionary. The default is 0.95, meaning 5% of the real words are misspelled.

confidence
    The confidence level defines a factor applied to the input phrases score which is used as a threshold for other suggest candidates. Only candidates that score higher than the threshold will be included in the result. For instance a confidence level of 1.0 will only return suggestions that score higher than the input phrase. If set to 0.0 the top N candidates are returned. The default is 1.0.

max_errors
    The maximum percentage of the terms considered to be misspellings in order to form a correction. This method accepts a float value in the range [0..1) as a fraction of the actual query terms or a number >= 1 as an absolute number of query terms. The default is set to 1.0, meaning only corrections with at most one misspelled term are returned. Note that setting this too high can negatively impact performance; low values like 1 or 2 are recommended, otherwise the time spent in suggest calls might exceed the time spent in query execution.

separator
    The separator that is used to separate terms in the bigram field. If not set the whitespace character is used as a separator.

size
    The number of candidates that are generated for each individual query term. Low numbers like 3 or 5 typically produce good results. Raising this can bring up terms with higher edit distances.

analyzer
    Sets the analyzer to analyze the suggest text with. Defaults to the search analyzer of the suggest field passed via field.

shard_size
    Sets the maximum number of suggested terms to be retrieved from each individual shard. During the reduce phase, only the top N suggestions are returned based on the size option. Defaults to 5.

text
    Sets the text / query to provide suggestions for.

highlight
    Sets up suggestion highlighting. If not provided then no highlighted field is returned. If provided must contain exactly pre_tag and post_tag, which are wrapped around the changed tokens. If multiple tokens in a row are changed the entire phrase of changed tokens is wrapped rather than each token.

collate
    Checks each suggestion against the specified query to prune suggestions for which no matching docs exist in the index. The collate query for a suggestion is run only on the local shard from which the suggestion has been generated. The query must be specified and it can be templated. The current suggestion is automatically made available as the {{suggestion}} variable, which should be used in your query. You can still specify your own template params; the suggestion value will be added to the variables you specify. Additionally, you can specify a prune to control if all phrase suggestions will be returned: when set to true the suggestions will have an additional option collate_match, which will be true if matching documents for the phrase was found, false otherwise. The default value for prune is false.
resp = client.search( index="test", suggest={ "text": "noble prize", "simple_phrase": { "phrase": { "field": "title.trigram", "size": 1, "direct_generator": [ { "field": "title.trigram", "suggest_mode": "always", "min_word_length": 1 } ], "collate": { "query": { "source": { "match": { "{{field_name}}": "{{suggestion}}" } } }, "params": { "field_name": "title" }, "prune": True } } } }, ) print(resp)
const response = await client.search({ index: "test", suggest: { text: "noble prize", simple_phrase: { phrase: { field: "title.trigram", size: 1, direct_generator: [ { field: "title.trigram", suggest_mode: "always", min_word_length: 1, }, ], collate: { query: { source: { match: { "{{field_name}}": "{{suggestion}}", }, }, }, params: { field_name: "title", }, prune: true, }, }, }, }, }); console.log(response);
POST test/_search { "suggest": { "text" : "noble prize", "simple_phrase" : { "phrase" : { "field" : "title.trigram", "size" : 1, "direct_generator" : [ { "field" : "title.trigram", "suggest_mode" : "always", "min_word_length" : 1 } ], "collate": { "query": { "source" : { "match": { "{{field_name}}" : "{{suggestion}}" } } }, "params": {"field_name" : "title"}, "prune": true } } } } }
This query will be run once for every suggestion.
The {{suggestion}} variable will be replaced by the text of each suggestion.
An additional field_name parameter is specified in params and used by the match query.
All suggestions will be returned with an extra collate_match option indicating whether the generated phrase matched any document.
Smoothing Models

The phrase suggester supports multiple smoothing models to balance weight between infrequent grams (grams (shingles) that do not exist in the index) and frequent grams (that appear at least once in the index). The smoothing model can be selected by setting the smoothing parameter to one of the following options. Each smoothing model supports specific properties that can be configured.
stupid_backoff
    A simple backoff model that backs off to lower order n-gram models if the higher order count is 0 and discounts the lower order n-gram model by a constant factor. The default discount is 0.4. Stupid Backoff is the default model.

laplace
    A smoothing model that uses an additive smoothing where a constant (typically 1.0 or smaller) is added to all counts to balance weights. The default alpha is 0.5.

linear_interpolation
    A smoothing model that takes the weighted mean of the unigrams, bigrams, and trigrams based on user supplied weights (lambdas). Linear Interpolation doesn't have any default values. All parameters (trigram_lambda, bigram_lambda, unigram_lambda) must be supplied.
resp = client.search( index="test", suggest={ "text": "obel prize", "simple_phrase": { "phrase": { "field": "title.trigram", "size": 1, "smoothing": { "laplace": { "alpha": 0.7 } } } } }, ) print(resp)
const response = await client.search({ index: "test", suggest: { text: "obel prize", simple_phrase: { phrase: { field: "title.trigram", size: 1, smoothing: { laplace: { alpha: 0.7, }, }, }, }, }, }); console.log(response);
POST test/_search { "suggest": { "text" : "obel prize", "simple_phrase" : { "phrase" : { "field" : "title.trigram", "size" : 1, "smoothing" : { "laplace" : { "alpha" : 0.7 } } } } } }
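For comparison, a sketch of the same request with a linear_interpolation model instead of laplace; the lambda weights are illustrative and must all be supplied since this model has no defaults.

resp = client.search(
    index="test",
    suggest={
        "text": "obel prize",
        "simple_phrase": {
            "phrase": {
                "field": "title.trigram",
                "size": 1,
                "smoothing": {
                    "linear_interpolation": {
                        "trigram_lambda": 0.65,  # weight of the trigram model
                        "bigram_lambda": 0.25,   # weight of the bigram model
                        "unigram_lambda": 0.1    # weight of the unigram model
                    }
                }
            }
        }
    },
)
print(resp)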
Candidate Generators

The phrase suggester uses candidate generators to produce a list of possible terms per term in the given text. A single candidate generator is similar to a term suggester called for each individual term in the text. The output of the generators is subsequently scored in combination with the candidates from the other terms for suggestion candidates.

Currently only one type of candidate generator is supported, the direct_generator. The Phrase suggest API accepts a list of generators under the key direct_generator; each of the generators in the list is called per term in the original text.
Direct Generators

The direct generators support the following parameters:

field
    The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.

size
    The maximum corrections to be returned per suggest text token.

suggest_mode
    The suggest mode controls what suggestions are included on the suggestions generated on each shard. All values other than always can be thought of as an optimization to generate fewer suggestions to test on each shard and are not rechecked when combining the suggestions generated on each shard. Three possible values can be specified:

    missing: Only generate suggestions for terms that are not in the shard.
    popular: Only suggest terms that occur in more docs on the shard than the original term.
    always: Suggest any matching suggestions based on terms in the suggest text.

max_edits
    The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.

prefix_length
    The number of minimal prefix characters that must match in order to be a candidate for suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms.

min_word_length
    The minimum length a suggest text term must have in order to be included. Defaults to 4.

max_inspections
    A factor that is used to multiply with the shard_size in order to inspect more candidate spelling corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.

min_doc_freq
    The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified, then the number cannot be fractional. The shard level document frequencies are used for this option.

max_term_freq
    The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g., 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms, which are usually spelled correctly, from being spellchecked. This also improves the spellcheck performance. The shard level document frequencies are used for this option.

pre_filter
    A filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated.

post_filter
    A filter (analyzer) that is applied to each of the generated tokens before they are passed to the actual phrase scorer.
The following example shows a phrase suggest call with two generators: the first one is using a field containing ordinary indexed terms, and the second one uses a field that uses terms indexed with a reverse filter (tokens are indexed in reverse order). This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions. The pre_filter and post_filter options accept ordinary analyzer names.
resp = client.search( index="test", suggest={ "text": "obel prize", "simple_phrase": { "phrase": { "field": "title.trigram", "size": 1, "direct_generator": [ { "field": "title.trigram", "suggest_mode": "always" }, { "field": "title.reverse", "suggest_mode": "always", "pre_filter": "reverse", "post_filter": "reverse" } ] } } }, ) print(resp)
const response = await client.search({ index: "test", suggest: { text: "obel prize", simple_phrase: { phrase: { field: "title.trigram", size: 1, direct_generator: [ { field: "title.trigram", suggest_mode: "always", }, { field: "title.reverse", suggest_mode: "always", pre_filter: "reverse", post_filter: "reverse", }, ], }, }, }, }); console.log(response);
POST test/_search { "suggest": { "text" : "obel prize", "simple_phrase" : { "phrase" : { "field" : "title.trigram", "size" : 1, "direct_generator" : [ { "field" : "title.trigram", "suggest_mode" : "always" }, { "field" : "title.reverse", "suggest_mode" : "always", "pre_filter" : "reverse", "post_filter" : "reverse" } ] } } } }
pre_filter and post_filter can also be used to inject synonyms after candidates are generated. For instance for the query captain usq we might generate a candidate usa for the term usq, which is a synonym for america. This allows us to present captain america to the user if this phrase scores high enough.
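A sketch of what that might look like, assuming a hypothetical analyzer named my_synonyms (one that would expand usa to america) has been defined in the index settings:

resp = client.search(
    index="test",
    suggest={
        "text": "captain usq",
        "simple_phrase": {
            "phrase": {
                "field": "title.trigram",
                "size": 1,
                "direct_generator": [
                    {
                        "field": "title.trigram",
                        "suggest_mode": "always",
                        "post_filter": "my_synonyms"  # hypothetical synonym analyzer applied to generated candidates
                    }
                ]
            }
        }
    },
)
print(resp)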
Completion Suggester

The completion suggester provides auto-complete/search-as-you-type functionality. This is a navigational feature to guide users to relevant results as they are typing, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters.

Ideally, auto-complete functionality should be as fast as a user types to provide instant feedback relevant to what a user has already typed in. Hence, the completion suggester is optimized for speed. The suggester uses data structures that enable fast lookups, but are costly to build and are stored in-memory.
Mapping

To use the completion suggester, map the field from which you want to generate suggestions as type completion. This indexes the field values for fast completions.
resp = client.indices.create( index="music", mappings={ "properties": { "suggest": { "type": "completion" } } }, ) print(resp)
response = client.indices.create( index: 'music', body: { mappings: { properties: { suggest: { type: 'completion' } } } } ) puts response
const response = await client.indices.create({ index: "music", mappings: { properties: { suggest: { type: "completion", }, }, }, }); console.log(response);
PUT music { "mappings": { "properties": { "suggest": { "type": "completion" } } } }
Parameters for completion fields

The following parameters are accepted by completion fields:

analyzer
    The index analyzer to use, defaults to simple.

search_analyzer
    The search analyzer to use, defaults to the value of analyzer.

preserve_separators
    Preserves the separators, defaults to true.

preserve_position_increments
    Enables position increments, defaults to true.

max_input_length
    Limits the length of a single input, defaults to 50 UTF-16 code points. This limit is only used at index time to reduce the total number of characters per input string in order to prevent massive inputs from bloating the underlying datastructure. Most use cases won't be influenced by the default value since prefix completions seldom grow beyond prefixes longer than a handful of characters.
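As an illustration, a sketch of a completion field that tunes two of these parameters; the index name is hypothetical and the values are examples only.

resp = client.indices.create(
    index="music_custom",  # hypothetical index name
    mappings={
        "properties": {
            "suggest": {
                "type": "completion",
                "preserve_separators": False,  # a prefix like "foof" can now match "Foo Fighters"
                "max_input_length": 25         # cap each input at 25 UTF-16 code points
            }
        }
    },
)
print(resp)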
Indexing

You index suggestions like any other field. A suggestion is made of an input and an optional weight attribute. An input is the expected text to be matched by a suggestion query and the weight determines how the suggestions will be scored. Indexing a suggestion is as follows:
resp = client.index( index="music", id="1", refresh=True, document={ "suggest": { "input": [ "Nevermind", "Nirvana" ], "weight": 34 } }, ) print(resp)
response = client.index( index: 'music', id: 1, refresh: true, body: { suggest: { input: [ 'Nevermind', 'Nirvana' ], weight: 34 } } ) puts response
const response = await client.index({ index: "music", id: 1, refresh: "true", document: { suggest: { input: ["Nevermind", "Nirvana"], weight: 34, }, }, }); console.log(response);
PUT music/_doc/1?refresh { "suggest" : { "input": [ "Nevermind", "Nirvana" ], "weight" : 34 } }
The following parameters are supported:

input
    The input to store, this can be an array of strings or just a string. This field is mandatory. This value cannot contain the following UTF-16 control characters: \u0000 (null), \u001f (information separator one), \u001e (information separator two).

weight
    A positive integer or a string containing a positive integer, which defines a weight and allows you to rank your suggestions. This field is optional.
You can index multiple suggestions for a document as follows:
resp = client.index( index="music", id="1", refresh=True, document={ "suggest": [ { "input": "Nevermind", "weight": 10 }, { "input": "Nirvana", "weight": 3 } ] }, ) print(resp)
response = client.index( index: 'music', id: 1, refresh: true, body: { suggest: [ { input: 'Nevermind', weight: 10 }, { input: 'Nirvana', weight: 3 } ] } ) puts response
const response = await client.index({ index: "music", id: 1, refresh: "true", document: { suggest: [ { input: "Nevermind", weight: 10, }, { input: "Nirvana", weight: 3, }, ], }, }); console.log(response);
PUT music/_doc/1?refresh { "suggest": [ { "input": "Nevermind", "weight": 10 }, { "input": "Nirvana", "weight": 3 } ] }
You can use the following shorthand form. Note that you cannot specify a weight with suggestion(s) in the shorthand form.
resp = client.index( index="music", id="1", refresh=True, document={ "suggest": [ "Nevermind", "Nirvana" ] }, ) print(resp)
response = client.index( index: 'music', id: 1, refresh: true, body: { suggest: [ 'Nevermind', 'Nirvana' ] } ) puts response
const response = await client.index({ index: "music", id: 1, refresh: "true", document: { suggest: ["Nevermind", "Nirvana"], }, }); console.log(response);
PUT music/_doc/1?refresh { "suggest" : [ "Nevermind", "Nirvana" ] }
Querying

Suggesting works as usual, except that you have to specify the suggest type as completion. Suggestions are near real-time, which means new suggestions can be made visible by refresh and documents once deleted are never shown. This request:
resp = client.search( index="music", pretty=True, suggest={ "song-suggest": { "prefix": "nir", "completion": { "field": "suggest" } } }, ) print(resp)
response = client.search( index: 'music', pretty: true, body: { suggest: { "song-suggest": { prefix: 'nir', completion: { field: 'suggest' } } } } ) puts response
const response = await client.search({ index: "music", pretty: "true", suggest: { "song-suggest": { prefix: "nir", completion: { field: "suggest", }, }, }, }); console.log(response);
POST music/_search?pretty { "suggest": { "song-suggest": { "prefix": "nir", "completion": { "field": "suggest" } } } }
Prefix used to search for suggestions.
Type of suggestions.
Name of the field to search for suggestions in.
returns this response:
{ "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits": ... "took": 2, "timed_out": false, "suggest": { "song-suggest" : [ { "text" : "nir", "offset" : 0, "length" : 3, "options" : [ { "text" : "Nirvana", "_index": "music", "_id": "1", "_score": 1.0, "_source": { "suggest": ["Nevermind", "Nirvana"] } } ] } ] } }
The _source metadata field must be enabled, which is the default behavior, to enable returning _source with suggestions.

The configured weight for a suggestion is returned as _score. The text field uses the input of your indexed suggestion. Suggestions return the full document _source by default. The size of the _source can impact performance due to disk fetch and network transport overhead. To save some network overhead, filter out unnecessary fields from the _source using source filtering to minimize _source size. Note that the _suggest endpoint doesn't support source filtering but using suggest on the _search endpoint does:
resp = client.search( index="music", source="suggest", suggest={ "song-suggest": { "prefix": "nir", "completion": { "field": "suggest", "size": 5 } } }, ) print(resp)
response = client.search( index: 'music', body: { _source: 'suggest', suggest: { "song-suggest": { prefix: 'nir', completion: { field: 'suggest', size: 5 } } } } ) puts response
const response = await client.search({ index: "music", _source: "suggest", suggest: { "song-suggest": { prefix: "nir", completion: { field: "suggest", size: 5, }, }, }, }); console.log(response);
POST music/_search { "_source": "suggest", "suggest": { "song-suggest": { "prefix": "nir", "completion": { "field": "suggest", "size": 5 } } } }
Filter the source to return only the suggest field.
Name of the field to search for suggestions in.
Number of suggestions to return.
Which should look like:
{ "took": 6, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 0, "relation": "eq" }, "max_score": null, "hits": [] }, "suggest": { "song-suggest": [ { "text": "nir", "offset": 0, "length": 3, "options": [ { "text": "Nirvana", "_index": "music", "_id": "1", "_score": 1.0, "_source": { "suggest": [ "Nevermind", "Nirvana" ] } } ] } ] } }
The basic completion suggester query supports the following parameters:

field
    The name of the field on which to run the query (required).

size
    The number of suggestions to return (defaults to 5).

skip_duplicates
    Whether duplicate suggestions should be filtered out (defaults to false).
The completion suggester considers all documents in the index. See Context Suggester for an explanation of how to query a subset of documents instead.
When a completion query spans more than one shard, the suggest is executed in two phases, where the last phase fetches the relevant documents from the shards. Executing completion requests against a single shard is therefore more performant, because it avoids the document fetch overhead incurred when the suggest spans multiple shards. To get the best performance for completions, it is recommended to index completions into a single-shard index. If heap usage becomes too high due to shard size, it is still recommended to break the index into multiple shards instead of optimizing for completion performance.
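For example, a completion-oriented index can be pinned to a single primary shard at creation time; a sketch with a hypothetical index name:

resp = client.indices.create(
    index="music_single_shard",  # hypothetical index name
    settings={"index": {"number_of_shards": 1}},  # keep all completions on one shard
    mappings={
        "properties": {
            "suggest": {"type": "completion"}
        }
    },
)
print(resp)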
Skip duplicate suggestions

Queries can return duplicate suggestions coming from different documents. It is possible to modify this behavior by setting skip_duplicates to true. When set, this option filters out documents with duplicate suggestions from the result.
resp = client.search( index="music", pretty=True, suggest={ "song-suggest": { "prefix": "nor", "completion": { "field": "suggest", "skip_duplicates": True } } }, ) print(resp)
response = client.search( index: 'music', pretty: true, body: { suggest: { "song-suggest": { prefix: 'nor', completion: { field: 'suggest', skip_duplicates: true } } } } ) puts response
const response = await client.search({ index: "music", pretty: "true", suggest: { "song-suggest": { prefix: "nor", completion: { field: "suggest", skip_duplicates: true, }, }, }, }); console.log(response);
POST music/_search?pretty { "suggest": { "song-suggest": { "prefix": "nor", "completion": { "field": "suggest", "skip_duplicates": true } } } }
When set to true, this option can slow down search because more suggestions need to be visited to find the top N.
Fuzzy queries

The completion suggester also supports fuzzy queries; this means you can have a typo in your search and still get results back.
resp = client.search( index="music", pretty=True, suggest={ "song-suggest": { "prefix": "nor", "completion": { "field": "suggest", "fuzzy": { "fuzziness": 2 } } } }, ) print(resp)
response = client.search( index: 'music', pretty: true, body: { suggest: { "song-suggest": { prefix: 'nor', completion: { field: 'suggest', fuzzy: { fuzziness: 2 } } } } } ) puts response
const response = await client.search({ index: "music", pretty: "true", suggest: { "song-suggest": { prefix: "nor", completion: { field: "suggest", fuzzy: { fuzziness: 2, }, }, }, }, }); console.log(response);
POST music/_search?pretty { "suggest": { "song-suggest": { "prefix": "nor", "completion": { "field": "suggest", "fuzzy": { "fuzziness": 2 } } } } }
Suggestions that share the longest prefix with the query prefix will be scored higher.
The fuzzy query can take specific fuzzy parameters. The following parameters are supported:

fuzziness
    The fuzziness factor, defaults to AUTO. See Fuzziness for allowed settings.

transpositions
    If set to true, transpositions are counted as one change instead of two, defaults to true.

min_length
    Minimum length of the input before fuzzy suggestions are returned, defaults to 3.

prefix_length
    Minimum length of the input, which is not checked for fuzzy alternatives, defaults to 1.

unicode_aware
    If true, all measurements (like fuzzy edit distance, transpositions, and lengths) are measured in Unicode code points instead of in bytes. This is slightly slower than raw bytes, so it is set to false by default.
If you want to stick with the default values, but still use fuzzy, you can either use fuzzy: {} or fuzzy: true.
Regex queries

The completion suggester also supports regex queries, meaning you can express a prefix as a regular expression.
resp = client.search( index="music", pretty=True, suggest={ "song-suggest": { "regex": "n[ever|i]r", "completion": { "field": "suggest" } } }, ) print(resp)
response = client.search( index: 'music', pretty: true, body: { suggest: { "song-suggest": { regex: 'n[ever|i]r', completion: { field: 'suggest' } } } } ) puts response
const response = await client.search({ index: "music", pretty: "true", suggest: { "song-suggest": { regex: "n[ever|i]r", completion: { field: "suggest", }, }, }, }); console.log(response);
POST music/_search?pretty { "suggest": { "song-suggest": { "regex": "n[ever|i]r", "completion": { "field": "suggest" } } } }
The regex query can take specific regex parameters. The following parameters are supported:

flags
    Possible flags are ALL (default), ANYSTRING, COMPLEMENT, EMPTY, INTERSECTION, INTERVAL, or NONE. See the regexp query documentation for their meaning.

max_determinized_states
    Regular expressions are dangerous because it's easy to accidentally create an innocuous looking one that requires an exponential number of internal determinized automaton states (and corresponding RAM and CPU) for Lucene to execute. Lucene prevents these using the max_determinized_states setting (defaults to 10000). You can raise this limit to allow more complex regular expressions to execute.
Context Suggester

The completion suggester considers all documents in the index, but it is often desirable to serve suggestions filtered and/or boosted by some criteria. For example, you want to suggest song titles filtered by certain artists or you want to boost song titles based on their genre.

To achieve suggestion filtering and/or boosting, you can add context mappings while configuring a completion field. You can define multiple context mappings for a completion field. Every context mapping has a unique name and a type. There are two types: category and geo. Context mappings are configured under the contexts parameter in the field mapping.

It is mandatory to provide a context when indexing and querying a context enabled completion field.

The maximum allowed number of completion field context mappings is 10.
The following defines two indices, each with two context mappings for a completion field:
resp = client.indices.create( index="place", mappings={ "properties": { "suggest": { "type": "completion", "contexts": [ { "name": "place_type", "type": "category" }, { "name": "location", "type": "geo", "precision": 4 } ] } } }, ) print(resp) resp1 = client.indices.create( index="place_path_category", mappings={ "properties": { "suggest": { "type": "completion", "contexts": [ { "name": "place_type", "type": "category", "path": "cat" }, { "name": "location", "type": "geo", "precision": 4, "path": "loc" } ] }, "loc": { "type": "geo_point" } } }, ) print(resp1)
response = client.indices.create( index: 'place', body: { mappings: { properties: { suggest: { type: 'completion', contexts: [ { name: 'place_type', type: 'category' }, { name: 'location', type: 'geo', precision: 4 } ] } } } } ) puts response response = client.indices.create( index: 'place_path_category', body: { mappings: { properties: { suggest: { type: 'completion', contexts: [ { name: 'place_type', type: 'category', path: 'cat' }, { name: 'location', type: 'geo', precision: 4, path: 'loc' } ] }, loc: { type: 'geo_point' } } } } ) puts response
const response = await client.indices.create({ index: "place", mappings: { properties: { suggest: { type: "completion", contexts: [ { name: "place_type", type: "category", }, { name: "location", type: "geo", precision: 4, }, ], }, }, }, }); console.log(response); const response1 = await client.indices.create({ index: "place_path_category", mappings: { properties: { suggest: { type: "completion", contexts: [ { name: "place_type", type: "category", path: "cat", }, { name: "location", type: "geo", precision: 4, path: "loc", }, ], }, loc: { type: "geo_point", }, }, }, }); console.log(response1);
PUT place { "mappings": { "properties": { "suggest": { "type": "completion", "contexts": [ { "name": "place_type", "type": "category" }, { "name": "location", "type": "geo", "precision": 4 } ] } } } } PUT place_path_category { "mappings": { "properties": { "suggest": { "type": "completion", "contexts": [ { "name": "place_type", "type": "category", "path": "cat" }, { "name": "location", "type": "geo", "precision": 4, "path": "loc" } ] }, "loc": { "type": "geo_point" } } } }
In the place index, the first context mapping defines a category context named place_type, where the categories must be sent with the suggestions, and the second defines a geo context named location. In the place_path_category index, the category context place_type reads its categories from the cat field, and the geo context location reads its geo points from the loc field.
Adding context mappings increases the index size for the completion field. The completion index is entirely heap resident; you can monitor the completion field index size using the index stats API.
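For example, a minimal sketch of checking that size with the Python client (the place index and suggest field are the ones defined above):

# Assumes "client" is an elasticsearch.Elasticsearch instance,
# as in the surrounding examples.
# Fetch completion statistics for the "suggest" field of the "place" index.
resp = client.indices.stats(
    index="place",
    metric="completion",
    completion_fields="suggest",
)
# Per-field sizes are reported under the completion stats,
# e.g. resp["indices"]["place"]["total"]["completion"].
print(resp["indices"]["place"]["total"]["completion"])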
Category Context
The category context allows you to associate one or more categories with suggestions at index time. At query time, suggestions can be filtered and boosted by their associated categories.
The mappings are set up like the place_type fields above. If path is defined then the categories are read from that path in the document, otherwise they must be sent in the suggest field like this:
resp = client.index(
    index="place",
    id="1",
    document={
        "suggest": {
            "input": ["timmy's", "starbucks", "dunkin donuts"],
            "contexts": {"place_type": ["cafe", "food"]},
        }
    },
)
print(resp)
response = client.index( index: 'place', id: 1, body: { suggest: { input: [ "timmy's", 'starbucks', 'dunkin donuts' ], contexts: { place_type: [ 'cafe', 'food' ] } } } )
puts response
const response = await client.index({ index: "place", id: 1, document: { suggest: { input: ["timmy's", "starbucks", "dunkin donuts"], contexts: { place_type: ["cafe", "food"], }, }, }, }); console.log(response);
PUT place/_doc/1
{
  "suggest": {
    "input": [ "timmy's", "starbucks", "dunkin donuts" ],
    "contexts": {
      "place_type": [ "cafe", "food" ]
    }
  }
}
If the mapping had a path then the following index request would be enough to add the categories:
resp = client.index(
    index="place_path_category",
    id="1",
    document={
        "suggest": ["timmy's", "starbucks", "dunkin donuts"],
        "cat": ["cafe", "food"],
    },
)
print(resp)
response = client.index( index: 'place_path_category', id: 1, body: { suggest: [ "timmy's", 'starbucks', 'dunkin donuts' ], cat: [ 'cafe', 'food' ] } )
puts response
const response = await client.index({ index: "place_path_category", id: 1, document: { suggest: ["timmy's", "starbucks", "dunkin donuts"], cat: ["cafe", "food"], }, }); console.log(response);
PUT place_path_category/_doc/1
{
  "suggest": ["timmy's", "starbucks", "dunkin donuts"],
  "cat": ["cafe", "food"]
}
If the context mapping references another field and the categories are explicitly indexed, the suggestions are indexed with both sets of categories.
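For instance, a hedged sketch against the place_path_category index defined above (the document values are illustrative); here the suggestion would carry both the explicitly supplied category and the one read from the cat field:

# Assumes "client" is an elasticsearch.Elasticsearch instance,
# as in the surrounding examples.
resp = client.index(
    index="place_path_category",
    id="2",
    document={
        "suggest": {
            "input": ["timmy's"],
            # Explicit categories for the place_type context...
            "contexts": {"place_type": ["bakery"]},
        },
        # ...plus categories read from the mapped path "cat".
        "cat": ["cafe"],
    },
)
print(resp)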
Category Query
Suggestions can be filtered by one or more categories. The following request filters suggestions by multiple categories:
resp = client.search(
    index="place",
    pretty=True,
    suggest={
        "place_suggestion": {
            "prefix": "tim",
            "completion": {
                "field": "suggest",
                "size": 10,
                "contexts": {"place_type": ["cafe", "restaurants"]},
            },
        }
    },
)
print(resp)
response = client.search( index: 'place', pretty: true, body: { suggest: { place_suggestion: { prefix: 'tim', completion: { field: 'suggest', size: 10, contexts: { place_type: [ 'cafe', 'restaurants' ] } } } } } )
puts response
const response = await client.search({ index: "place", pretty: "true", suggest: { place_suggestion: { prefix: "tim", completion: { field: "suggest", size: 10, contexts: { place_type: ["cafe", "restaurants"], }, }, }, }, }); console.log(response);
POST place/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "place_type": [ "cafe", "restaurants" ]
        }
      }
    }
  }
}
If multiple categories or category contexts are set on the query they are merged as a disjunction. This means that suggestions match if they contain at least one of the provided context values.
Suggestions with certain categories can be boosted higher than others. The following filters suggestions by categories and additionally boosts suggestions associated with some categories:
resp = client.search(
    index="place",
    pretty=True,
    suggest={
        "place_suggestion": {
            "prefix": "tim",
            "completion": {
                "field": "suggest",
                "size": 10,
                "contexts": {
                    "place_type": [
                        {"context": "cafe"},
                        {"context": "restaurants", "boost": 2},
                    ]
                },
            },
        }
    },
)
print(resp)
response = client.search( index: 'place', pretty: true, body: { suggest: { place_suggestion: { prefix: 'tim', completion: { field: 'suggest', size: 10, contexts: { place_type: [ { context: 'cafe' }, { context: 'restaurants', boost: 2 } ] } } } } } )
puts response
const response = await client.search({ index: "place", pretty: "true", suggest: { place_suggestion: { prefix: "tim", completion: { field: "suggest", size: 10, contexts: { place_type: [ { context: "cafe", }, { context: "restaurants", boost: 2, }, ], }, }, }, }, }); console.log(response);
POST place/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "place_type": [
            { "context": "cafe" },
            { "context": "restaurants", "boost": 2 }
          ]
        }
      }
    }
  }
}
The context query filters suggestions associated with the categories cafe and restaurants, and boosts the suggestions associated with restaurants by a factor of 2.
In addition to accepting category values, a context query can be composed of multiple category context clauses. The following parameters are supported for a category context clause (an illustrative prefix query follows this list):

context: The value of the category to filter/boost on. This is mandatory.

boost: The factor by which the score of the suggestion should be boosted. The score is computed by multiplying the boost with the suggestion weight. Defaults to 1.

prefix: Whether the category value should be treated as a prefix or not. For example, if set to true, you can filter the categories type1, type2, and so on by specifying a category prefix of type. Defaults to false.
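As an illustration of the prefix parameter, here is a sketch (the caf value is illustrative) that matches any category starting with caf, such as cafe:

# Assumes "client" is an elasticsearch.Elasticsearch instance,
# as in the surrounding examples.
resp = client.search(
    index="place",
    suggest={
        "place_suggestion": {
            "prefix": "tim",
            "completion": {
                "field": "suggest",
                "contexts": {
                    # Treat "caf" as a category prefix, so "cafe",
                    # "cafeteria", etc. would all match.
                    "place_type": [{"context": "caf", "prefix": True}]
                },
            },
        }
    },
)
print(resp)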
If a suggestion entry matches multiple contexts the final score is computed as the maximum score produced by any matching contexts.
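For example, if a suggestion with weight 10 matches both the cafe context (boost 1) and the restaurants context (boost 2), its final score is max(10 × 1, 10 × 2) = 20.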
Geo location Context
A geo context allows you to associate one or more geo points or geohashes with suggestions at index time. At query time, suggestions can be filtered and boosted if they are within a certain distance of a specified geo location.
Internally, geo points are encoded as geohashes with the specified precision.
Geo Mapping
In addition to the path setting, geo context mapping accepts the following settings:

precision: This defines the precision of the geohash to be indexed and can be specified as a distance value (5m, 10km etc.), or as a raw geohash precision (1..12). Defaults to a raw geohash precision value of 6.

The index time precision setting sets the maximum geohash precision that can be used at query time.
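For instance, a sketch of a mapping that uses a distance value for precision (the index name place_coarse is hypothetical):

# Assumes "client" is an elasticsearch.Elasticsearch instance,
# as in the surrounding examples.
resp = client.indices.create(
    index="place_coarse",
    mappings={
        "properties": {
            "suggest": {
                "type": "completion",
                "contexts": [
                    {
                        "name": "location",
                        "type": "geo",
                        # A distance value is translated into the geohash
                        # precision level whose tiles are roughly that size.
                        "precision": "10km",
                    }
                ],
            }
        }
    },
)
print(resp)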
Indexing geo contexts
geo contexts can be explicitly set with suggestions or be indexed from a geo point field in the document via the path parameter, similar to category contexts. Associating multiple geo location contexts with a suggestion will index the suggestion for every geo location. The following indexes a suggestion with two geo location contexts:
resp = client.index(
    index="place",
    id="1",
    document={
        "suggest": {
            "input": "timmy's",
            "contexts": {
                "location": [
                    {"lat": 43.6624803, "lon": -79.3863353},
                    {"lat": 43.6624718, "lon": -79.3873227},
                ]
            },
        }
    },
)
print(resp)
response = client.index( index: 'place', id: 1, body: { suggest: { input: "timmy's", contexts: { location: [ { lat: 43.6624803, lon: -79.3863353 }, { lat: 43.6624718, lon: -79.3873227 } ] } } } )
puts response
const response = await client.index({ index: "place", id: 1, document: { suggest: { input: "timmy's", contexts: { location: [ { lat: 43.6624803, lon: -79.3863353, }, { lat: 43.6624718, lon: -79.3873227, }, ], }, }, }, }); console.log(response);
PUT place/_doc/1
{
  "suggest": {
    "input": "timmy's",
    "contexts": {
      "location": [
        { "lat": 43.6624803, "lon": -79.3863353 },
        { "lat": 43.6624718, "lon": -79.3873227 }
      ]
    }
  }
}
Geo location Query
Suggestions can be filtered and boosted with respect to how close they are to one or more geo points. The following filters suggestions that fall within the area represented by the encoded geohash of a geo point:
resp = client.search(
    index="place",
    suggest={
        "place_suggestion": {
            "prefix": "tim",
            "completion": {
                "field": "suggest",
                "size": 10,
                "contexts": {"location": {"lat": 43.662, "lon": -79.38}},
            },
        }
    },
)
print(resp)
response = client.search( index: 'place', body: { suggest: { place_suggestion: { prefix: 'tim', completion: { field: 'suggest', size: 10, contexts: { location: { lat: 43.662, lon: -79.38 } } } } } } )
puts response
const response = await client.search({ index: "place", suggest: { place_suggestion: { prefix: "tim", completion: { field: "suggest", size: 10, contexts: { location: { lat: 43.662, lon: -79.38, }, }, }, }, }, }); console.log(response);
POST place/_search
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "location": {
            "lat": 43.662,
            "lon": -79.380
          }
        }
      }
    }
  }
}
When a location with a lower precision at query time is specified, all suggestions that fall within the area will be considered.
If multiple geo location contexts are set on the query they are merged as a disjunction. This means that suggestions match if they contain at least one of the provided context values.
Suggestions that are within an area represented by a geohash can also be boosted higher than others, as shown by the following:
resp = client.search(
    index="place",
    pretty=True,
    suggest={
        "place_suggestion": {
            "prefix": "tim",
            "completion": {
                "field": "suggest",
                "size": 10,
                "contexts": {
                    "location": [
                        {"lat": 43.6624803, "lon": -79.3863353, "precision": 2},
                        {
                            "context": {"lat": 43.6624803, "lon": -79.3863353},
                            "boost": 2,
                        },
                    ]
                },
            },
        }
    },
)
print(resp)
response = client.search( index: 'place', pretty: true, body: { suggest: { place_suggestion: { prefix: 'tim', completion: { field: 'suggest', size: 10, contexts: { location: [ { lat: 43.6624803, lon: -79.3863353, precision: 2 }, { context: { lat: 43.6624803, lon: -79.3863353 }, boost: 2 } ] } } } } } )
puts response
const response = await client.search({ index: "place", pretty: "true", suggest: { place_suggestion: { prefix: "tim", completion: { field: "suggest", size: 10, contexts: { location: [ { lat: 43.6624803, lon: -79.3863353, precision: 2, }, { context: { lat: 43.6624803, lon: -79.3863353, }, boost: 2, }, ], }, }, }, }, }); console.log(response);
POST place/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "location": [
            {
              "lat": 43.6624803,
              "lon": -79.3863353,
              "precision": 2
            },
            {
              "context": { "lat": 43.6624803, "lon": -79.3863353 },
              "boost": 2
            }
          ]
        }
      }
    }
  }
}
The context query filters for suggestions that fall under the geo location represented by a geohash of (43.662, -79.380) with a precision of 2, and boosts suggestions that fall under the geohash representation of (43.6624803, -79.3863353) with a default precision of 6 by a factor of 2.
If a suggestion entry matches multiple contexts the final score is computed as the maximum score produced by any matching contexts.
In addition to accepting context values, a context query can be composed of multiple context clauses. The following parameters are supported for a geo context clause:

context: A geo point object or a geohash string to filter or boost the suggestion by. This is mandatory.

boost: The factor by which the score of the suggestion should be boosted. The score is computed by multiplying the boost with the suggestion weight. Defaults to 1.

precision: The precision of the geohash to encode the query geo point. This can be specified as a distance value (5m, 10km etc.), or as a raw geohash precision (1..12). Defaults to the index time precision level.

neighbours: Accepts an array of precision values at which neighbouring geohashes should be taken into account. A precision value can be a distance value (5m, 10km etc.) or a raw geohash precision (1..12). Defaults to generating neighbours for the index time precision level.
The precision field does not result in a distance match. Specifying a distance value like 10km only results in a geohash precision value that represents tiles of that size. The precision will be used to encode the search geo point into a geohash tile for completion matching. A consequence of this is that points outside that tile, even if very close to the search point, will not be matched. Reducing the precision, or increasing the distance, can reduce the risk of this happening, but not entirely remove it.
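One way to soften that tile-edge effect is the neighbours parameter described above, which also considers adjacent geohash tiles. A sketch follows; the coordinates and precision values are illustrative:

# Assumes "client" is an elasticsearch.Elasticsearch instance,
# as in the surrounding examples.
resp = client.search(
    index="place",
    suggest={
        "place_suggestion": {
            "prefix": "tim",
            "completion": {
                "field": "suggest",
                "contexts": {
                    "location": [
                        {
                            "context": {"lat": 43.662, "lon": -79.38},
                            "precision": 4,
                            # Also match suggestions indexed in the geohash
                            # tiles adjacent to the query tile at precision 4.
                            "neighbours": [4],
                        }
                    ]
                },
            },
        }
    },
)
print(resp)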
Returning the type of the suggester
Sometimes you need to know the exact type of a suggester in order to parse its results. The typed_keys parameter can be used to change the suggester's name in the response so that it will be prefixed by its type.
Consider the following example with two suggesters, term and phrase:
resp = client.search(
    typed_keys=True,
    suggest={
        "text": "some test mssage",
        "my-first-suggester": {"term": {"field": "message"}},
        "my-second-suggester": {"phrase": {"field": "message"}},
    },
)
print(resp)
response = client.search( typed_keys: true, body: { suggest: { text: 'some test mssage', "my-first-suggester": { term: { field: 'message' } }, "my-second-suggester": { phrase: { field: 'message' } } } } )
puts response
const response = await client.search({ typed_keys: "true", suggest: { text: "some test mssage", "my-first-suggester": { term: { field: "message", }, }, "my-second-suggester": { phrase: { field: "message", }, }, }, }); console.log(response);
POST _search?typed_keys
{
  "suggest": {
    "text": "some test mssage",
    "my-first-suggester": {
      "term": {
        "field": "message"
      }
    },
    "my-second-suggester": {
      "phrase": {
        "field": "message"
      }
    }
  }
}
In the response, the suggester names will be changed to term#my-first-suggester and phrase#my-second-suggester respectively, reflecting the types of each suggestion:
{ "suggest": { "term#my-first-suggester": [ { "text": "some", "offset": 0, "length": 4, "options": [] }, { "text": "test", "offset": 5, "length": 4, "options": [] }, { "text": "mssage", "offset": 10, "length": 6, "options": [ { "text": "message", "score": 0.8333333, "freq": 4 } ] } ], "phrase#my-second-suggester": [ { "text": "some test mssage", "offset": 0, "length": 16, "options": [ { "text": "some test message", "score": 0.030227963 } ] } ] }, ... }