- Elasticsearch Guide: other versions:
- Elasticsearch introduction
- Getting started with Elasticsearch
- Set up Elasticsearch
- Installing Elasticsearch
- Configuring Elasticsearch
- Important Elasticsearch configuration
- Important System Configuration
- Bootstrap Checks
- Heap size check
- File descriptor check
- Memory lock check
- Maximum number of threads check
- Max file size check
- Maximum size virtual memory check
- Maximum map count check
- Client JVM check
- Use serial collector check
- System call filter check
- OnError and OnOutOfMemoryError checks
- Early-access check
- G1GC check
- All permission check
- Discovery configuration check
- Starting Elasticsearch
- Stopping Elasticsearch
- Adding nodes to your cluster
- Set up X-Pack
- Configuring X-Pack Java Clients
- Bootstrap Checks for X-Pack
- Upgrade Elasticsearch
- API conventions
- Document APIs
- Search APIs
- Aggregations
- Metrics Aggregations
- Avg Aggregation
- Weighted Avg Aggregation
- Cardinality Aggregation
- Extended Stats Aggregation
- Geo Bounds Aggregation
- Geo Centroid Aggregation
- Max Aggregation
- Min Aggregation
- Percentiles Aggregation
- Percentile Ranks Aggregation
- Scripted Metric Aggregation
- Stats Aggregation
- Sum Aggregation
- Top Hits Aggregation
- Value Count Aggregation
- Median Absolute Deviation Aggregation
- Bucket Aggregations
- Adjacency Matrix Aggregation
- Auto-interval Date Histogram Aggregation
- Children Aggregation
- Composite Aggregation
- Date Histogram Aggregation
- Date Range Aggregation
- Diversified Sampler Aggregation
- Filter Aggregation
- Filters Aggregation
- Geo Distance Aggregation
- GeoHash grid Aggregation
- GeoTile Grid Aggregation
- Global Aggregation
- Histogram Aggregation
- IP Range Aggregation
- Missing Aggregation
- Nested Aggregation
- Parent Aggregation
- Range Aggregation
- Reverse nested Aggregation
- Sampler Aggregation
- Significant Terms Aggregation
- Significant Text Aggregation
- Terms Aggregation
- Pipeline Aggregations
- Avg Bucket Aggregation
- Derivative Aggregation
- Max Bucket Aggregation
- Min Bucket Aggregation
- Sum Bucket Aggregation
- Stats Bucket Aggregation
- Extended Stats Bucket Aggregation
- Percentiles Bucket Aggregation
- Moving Average Aggregation
- Moving Function Aggregation
- Cumulative Sum Aggregation
- Bucket Script Aggregation
- Bucket Selector Aggregation
- Bucket Sort Aggregation
- Serial Differencing Aggregation
- Matrix Aggregations
- Caching heavy aggregations
- Returning only aggregation results
- Aggregation Metadata
- Returning the type of the aggregation
- Metrics Aggregations
- Indices APIs
- Create Index
- Delete Index
- Get Index
- Indices Exists
- Open / Close Index API
- Shrink Index
- Split Index
- Rollover Index
- Put Mapping
- Get Mapping
- Get Field Mapping
- Types Exists
- Index Aliases
- Update Indices Settings
- Get Settings
- Analyze
- Index Templates
- Indices Stats
- Indices Segments
- Indices Recovery
- Indices Shard Stores
- Clear Cache
- Flush
- Refresh
- Force Merge
- cat APIs
- Cluster APIs
- Query DSL
- Scripting
- Mapping
- Analysis
- Anatomy of an analyzer
- Testing analyzers
- Analyzers
- Normalizers
- Tokenizers
- Standard Tokenizer
- Letter Tokenizer
- Lowercase Tokenizer
- Whitespace Tokenizer
- UAX URL Email Tokenizer
- Classic Tokenizer
- Thai Tokenizer
- NGram Tokenizer
- Edge NGram Tokenizer
- Keyword Tokenizer
- Pattern Tokenizer
- Char Group Tokenizer
- Simple Pattern Tokenizer
- Simple Pattern Split Tokenizer
- Path Hierarchy Tokenizer
- Path Hierarchy Tokenizer Examples
- Token Filters
- ASCII Folding Token Filter
- Flatten Graph Token Filter
- Length Token Filter
- Lowercase Token Filter
- Uppercase Token Filter
- NGram Token Filter
- Edge NGram Token Filter
- Porter Stem Token Filter
- Shingle Token Filter
- Stop Token Filter
- Word Delimiter Token Filter
- Word Delimiter Graph Token Filter
- Multiplexer Token Filter
- Conditional Token Filter
- Predicate Token Filter Script
- Stemmer Token Filter
- Stemmer Override Token Filter
- Keyword Marker Token Filter
- Keyword Repeat Token Filter
- KStem Token Filter
- Snowball Token Filter
- Phonetic Token Filter
- Synonym Token Filter
- Parsing synonym files
- Synonym Graph Token Filter
- Compound Word Token Filters
- Reverse Token Filter
- Elision Token Filter
- Truncate Token Filter
- Unique Token Filter
- Pattern Capture Token Filter
- Pattern Replace Token Filter
- Trim Token Filter
- Limit Token Count Token Filter
- Hunspell Token Filter
- Common Grams Token Filter
- Normalization Token Filter
- CJK Width Token Filter
- CJK Bigram Token Filter
- Delimited Payload Token Filter
- Keep Words Token Filter
- Keep Types Token Filter
- Exclude mode settings example
- Classic Token Filter
- Apostrophe Token Filter
- Decimal Digit Token Filter
- Fingerprint Token Filter
- Minhash Token Filter
- Remove Duplicates Token Filter
- Character Filters
- Modules
- Index modules
- Ingest node
- Pipeline Definition
- Ingest APIs
- Accessing Data in Pipelines
- Conditional Execution in Pipelines
- Handling Failures in Pipelines
- Processors
- Append Processor
- Bytes Processor
- Convert Processor
- Date Processor
- Date Index Name Processor
- Dissect Processor
- Dot Expander Processor
- Drop Processor
- Fail Processor
- Foreach Processor
- GeoIP Processor
- Grok Processor
- Gsub Processor
- Join Processor
- JSON Processor
- KV Processor
- Lowercase Processor
- Pipeline Processor
- Remove Processor
- Rename Processor
- Script Processor
- Set Processor
- Set Security User Processor
- Split Processor
- Sort Processor
- Trim Processor
- Uppercase Processor
- URL Decode Processor
- User Agent processor
- Managing the index lifecycle
- Getting started with index lifecycle management
- Policy phases and actions
- Set up index lifecycle management policy
- Using policies to manage index rollover
- Update policy
- Index lifecycle error handling
- Restoring snapshots of managed indices
- Start and stop index lifecycle management
- Using ILM with existing indices
- SQL access
- Monitor a cluster
- Rolling up historical data
- Frozen indices
- Set up a cluster for high availability
- X-Pack APIs
- Info API
- Cross-cluster replication APIs
- Explore API
- Freeze index
- Index lifecycle management API
- Licensing APIs
- Migration APIs
- Machine learning APIs
- Add events to calendar
- Add jobs to calendar
- Close jobs
- Create calendar
- Create datafeeds
- Create filter
- Create jobs
- Delete calendar
- Delete datafeeds
- Delete events from calendar
- Delete filter
- Delete forecast
- Delete jobs
- Delete jobs from calendar
- Delete model snapshots
- Delete expired data
- Find file structure
- Flush jobs
- Forecast jobs
- Get calendars
- Get buckets
- Get overall buckets
- Get categories
- Get datafeeds
- Get datafeed statistics
- Get influencers
- Get jobs
- Get job statistics
- Get machine learning info
- Get model snapshots
- Get scheduled events
- Get filters
- Get records
- Open jobs
- Post data to jobs
- Preview datafeeds
- Revert model snapshots
- Set upgrade mode
- Start datafeeds
- Stop datafeeds
- Update datafeeds
- Update filter
- Update jobs
- Update model snapshots
- Rollup APIs
- Security APIs
- Authenticate
- Change passwords
- Clear cache
- Clear roles cache
- Create API keys
- Create or update application privileges
- Create or update role mappings
- Create or update roles
- Create or update users
- Delete application privileges
- Delete role mappings
- Delete roles
- Delete users
- Disable users
- Enable users
- Get API key information
- Get application privileges
- Get role mappings
- Get roles
- Get token
- Get users
- Has privileges
- Invalidate API key
- Invalidate token
- SSL certificate
- Unfreeze index
- Watcher APIs
- Definitions
- Secure a cluster
- Overview
- Configuring security
- Encrypting communications in Elasticsearch
- Encrypting communications in an Elasticsearch Docker Container
- Enabling cipher suites for stronger encryption
- Separating node-to-node and client traffic
- Configuring an Active Directory realm
- Configuring a file realm
- Configuring an LDAP realm
- Configuring a native realm
- Configuring a PKI realm
- Configuring a SAML realm
- Configuring a Kerberos realm
- FIPS 140-2
- Security files
- How security works
- User authentication
- Built-in users
- Internal users
- Token-based authentication services
- Realms
- Realm chains
- Active Directory user authentication
- File-based user authentication
- LDAP user authentication
- Native user authentication
- PKI user authentication
- SAML authentication
- Kerberos authentication
- Integrating with other authentication systems
- Enabling anonymous access
- Controlling the user cache
- Configuring SAML single-sign-on on the Elastic Stack
- User authorization
- Auditing security events
- Encrypting communications
- Restricting connections with IP filtering
- Cross cluster search, clients, and integrations
- Tutorial: Getting started with security
- Tutorial: Encrypting communications
- Troubleshooting
- Some settings are not returned via the nodes settings API
- Authorization exceptions
- Users command fails due to extra arguments
- Users are frequently locked out of Active Directory
- Certificate verification fails for curl on Mac
- SSLHandshakeException causes connections to fail
- Common SSL/TLS exceptions
- Common Kerberos exceptions
- Common SAML issues
- Internal Server Error in Kibana
- Setup-passwords command fails due to connection failure
- Failures due to relocation of the configuration files
- Limitations
- Alerting on cluster and index events
- Command line tools
- How To
- Testing
- Glossary of terms
- Release highlights
- Breaking changes
- Release notes
Rank feature query
editRank feature query
editBoosts the relevance score of documents based on the
numeric value of a rank_feature
or
rank_features
field.
The rank_feature
query is typically used in the should
clause of a
bool
query so its relevance scores are added to other
scores from the bool
query.
Unlike the function_score
query or other
ways to change relevance scores, the
rank_feature
query efficiently skips non-competitive hits when the
track_total_hits
parameter is not true
. This can
dramatically improve query speed.
Rank feature functions
editTo calculate relevance scores based on rank feature fields, the rank_feature
query supports the following mathematical functions:
If you don’t know where to start, we recommend using the saturation
function.
If no function is provided, the rank_feature
query uses the saturation
function by default.
Example request
editIndex setup
editTo use the rank_feature
query, your index must include a
rank_feature
or rank_features
field
mapping. To see how you can set up an index for the rank_feature
query, try
the following example.
Create a test
index with the following field mappings:
-
pagerank
, arank_feature
field which measures the importance of a website -
url_length
, arank_feature
field which contains the length of the website’s URL. For this example, a long URL correlates negatively to relevance, indicated by apositive_score_impact
value offalse
. -
topics
, arank_features
field which contains a list of topics and a measure of how well each document is connected to this topic
PUT /test { "mappings": { "properties": { "pagerank": { "type": "rank_feature" }, "url_length": { "type": "rank_feature", "positive_score_impact": false }, "topics": { "type": "rank_features" } } } }
Index several documents to the test
index.
PUT /test/_doc/1?refresh { "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics", "content": "Rio 2016", "pagerank": 50.3, "url_length": 42, "topics": { "sports": 50, "brazil": 30 } } PUT /test/_doc/2?refresh { "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix", "content": "Formula One motor race held on 13 November 2016", "pagerank": 50.3, "url_length": 47, "topics": { "sports": 35, "formula one": 65, "brazil": 20 } } PUT /test/_doc/3?refresh { "url": "http://en.wikipedia.org/wiki/Deadpool_(film)", "content": "Deadpool is a 2016 American superhero film", "pagerank": 50.3, "url_length": 37, "topics": { "movies": 60, "super hero": 65 } }
Example query
editThe following query searches for 2016
and boosts relevance scores based or
pagerank
, url_length
, and the sports
topic.
GET /test/_search { "query": { "bool": { "must": [ { "match": { "content": "2016" } } ], "should": [ { "rank_feature": { "field": "pagerank" } }, { "rank_feature": { "field": "url_length", "boost": 0.1 } }, { "rank_feature": { "field": "topics.sports", "boost": 0.4 } } ] } } }
Top-level parameters for rank_feature
edit-
field
-
(Required, string)
rank_feature
orrank_features
field used to boost relevance scores. -
boost
-
(Optional, float) Floating point number used to decrease or increase relevance scores. Defaults to
1.0
.Boost values are relative to the default value of
1.0
. A boost value between0
and1.0
decreases the relevance score. A value greater than1.0
increases the relevance score. -
saturation
-
(Optional, function object) Saturation function used to boost relevance scores based on the value of the rank feature
field
. If no function is provided, therank_feature
query defaults to thesaturation
function. See Saturation for more information.Only one function
saturation
,log
, orsigmoid
can be provided. -
log
-
(Optional, function object) Logarithmic function used to boost relevance scores based on the value of the rank feature
field
. See Logarithm for more information.Only one function
saturation
,log
, orsigmoid
can be provided. -
sigmoid
-
(Optional, function object) Sigmoid function used to boost relevance scores based on the value of the rank feature
field
. See Sigmoid for more information.Only one function
saturation
,log
, orsigmoid
can be provided.
Notes
editSaturation
editThe saturation
function gives a score equal to S / (S + pivot)
, where S
is
the value of the rank feature field and pivot
is a configurable pivot value so
that the result will be less than 0.5
if S
is less than pivot and greater
than 0.5
otherwise. Scores are always (0,1)
.
If the rank feature has a negative score impact then the function will be
computed as pivot / (S + pivot)
, which decreases when S
increases.
GET /test/_search { "query": { "rank_feature": { "field": "pagerank", "saturation": { "pivot": 8 } } } }
If a pivot
value is not provided, Elasticsearch computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven’t had the opportunity to train a good
pivot value.
GET /test/_search { "query": { "rank_feature": { "field": "pagerank", "saturation": {} } } }
Logarithm
editThe log
function gives a score equal to log(scaling_factor + S)
, where S
is the value of the rank feature field and scaling_factor
is a configurable
scaling factor. Scores are unbounded.
This function only supports rank features that have a positive score impact.
GET /test/_search { "query": { "rank_feature": { "field": "pagerank", "log": { "scaling_factor": 4 } } } }
Sigmoid
editThe sigmoid
function is an extension of saturation
which adds a configurable
exponent. Scores are computed as S^exp^ / (S^exp^ + pivot^exp^)
. Like for the
saturation
function, pivot
is the value of S
that gives a score of 0.5
and scores are (0,1)
.
The exponent
must be positive and is typically in [0.5, 1]
. A
good value should be computed via training. If you don’t have the opportunity to
do so, we recommend you use the saturation
function instead.
GET /test/_search { "query": { "rank_feature": { "field": "pagerank", "sigmoid": { "pivot": 7, "exponent": 0.6 } } } }
On this page