Relevance Tuning Guide, Weights and Boosts
Out of the box, App Search provides quality search relevance.
Built on top of Elasticsearch, App Search is a managed, expertly crafted distillation of its finest points.
It provides tools to help you further tune the search experience to optimize for your own needs.
Relevance Tuning has two core components: Weights and Boosts.
Before we get into them, we shall take a quick dive into the basics of fields.
Quick Recap: Fields
Once documents appear within an Engine, they appear alongside a schema.
The schema takes all of the fields of a document and assigns each one a type.
The National Parks demo Engine has a schema like this:
{
  "description": "text",
  "nps_link": "text",
  "states": "text",
  "title": "text",
  "visitors": "number",
  "world_heritage_site": "text",
  "location": "geolocation",
  "acres": "number",
  "square_km": "number",
  "date_established": "date"
}
Your fields will be different, but their type will be one of: text, number, geolocation, and date.
Relevance Tuning means changing how fields are weighted against one another, or boosting relevance based on a value within a field.
Note: You must have at least two schema fields to tune relevance.
Add documents with multiple fields, or add more schema fields through the dashboard or the API to address this.
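The two rules above (four possible field types, at least two fields) can be checked with a few lines of Python before you start tuning. This is a minimal sketch: the check_schema helper is hypothetical and not part of App Search, and the schema dictionary is copied from the National Parks example.

```python
# Field types named in this guide.
ALLOWED_TYPES = {"text", "number", "geolocation", "date"}


def check_schema(schema: dict[str, str]) -> list[str]:
    """Return a list of problems that would get in the way of relevance tuning."""
    problems = []
    if len(schema) < 2:
        problems.append("at least two schema fields are required to tune relevance")
    for field, field_type in schema.items():
        if field_type not in ALLOWED_TYPES:
            problems.append(f"field {field!r} has unknown type {field_type!r}")
    return problems


national_parks_schema = {
    "description": "text",
    "nps_link": "text",
    "states": "text",
    "title": "text",
    "visitors": "number",
    "world_heritage_site": "text",
    "location": "geolocation",
    "acres": "number",
    "square_km": "number",
    "date_established": "date",
}

print(check_schema(national_parks_schema))  # no problems expected
print(check_schema({"title": "text"}))      # too few fields to tune
```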
Weights
Each field can have a weight from 0 to 10, with 10 being the most substantial.
Without the ability to tune field weight, you would run into uncomfortable situations like this:
You have one application called Magicapp...
{
  "title": "Magicapp",
  "subtitle": "The most magical app of all",
  "description": "The original, magical app that started it all."
}
... And another called Mysticapp:
{
  "title": "Mysticapp",
  "subtitle": "An app that is similar to Magicapp",
  "description": "Similar to Magicapp, but improved in many areas."
}
If someone were to use our search engine to look for "magicapp", which would they find?
If all fields were equal, Mysticapp would be the first result: the name Magicapp appears twice in that document, once in the subtitle and once in the description. This is not ideal.
We want people to find the app they are looking for, so we need to prioritize the title field.
We can increase its weight so that it is more impactful than the subtitle and description fields.
If title had a higher weight, people would find Magicapp before Mysticapp, as they should: it is the best and most original!
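To see why the title weight changes the order, here is a toy scoring model in Python. It is only an illustration under a loud assumption: each field contributes its weight times the number of times the query term appears in it. App Search's real scoring is more sophisticated; the toy_score and rank helpers are hypothetical.

```python
import re


def term_count(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of term in text."""
    tokens = re.findall(r"\w+", text.lower())
    return tokens.count(term.lower())


def toy_score(doc: dict, term: str, weights: dict) -> float:
    """Toy model (not App Search's algorithm): sum of weight x occurrences per field."""
    return sum(
        weights.get(field, 1) * term_count(value, term)
        for field, value in doc.items()
    )


def rank(docs: list, term: str, weights: dict) -> list:
    """Return document titles ordered from highest to lowest toy score."""
    ordered = sorted(docs, key=lambda d: toy_score(d, term, weights), reverse=True)
    return [d["title"] for d in ordered]


docs = [
    {
        "title": "Magicapp",
        "subtitle": "The most magical app of all",
        "description": "The original, magical app that started it all.",
    },
    {
        "title": "Mysticapp",
        "subtitle": "An app that is similar to Magicapp",
        "description": "Similar to Magicapp, but improved in many areas.",
    },
]

equal_weights = {"title": 1, "subtitle": 1, "description": 1}
title_heavy = {"title": 5, "subtitle": 1, "description": 1}

print(rank(docs, "magicapp", equal_weights))  # ['Mysticapp', 'Magicapp']
print(rank(docs, "magicapp", title_heavy))    # ['Magicapp', 'Mysticapp']
```

With equal weights, Mysticapp scores 2 against Magicapp's 1. Raising the title weight to 5 lifts Magicapp to 5 while Mysticapp stays at 2.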
There are two different ways to adjust weight: via the dashboard or via the Search API.
Weights via the Dashboard
Within your Engine, click on Relevance Tuning.
The initial view will show all of your schema fields with their default weight:
Relevance Tuning, Weights - All of your schema fields.
Next to our schema fields, there is a query tester.
We want people who are looking for mountains to find a park that is known for the splendor of its mountains.
So, we will use "mountains" as our reference query.
Relevance Tuning, Weights - Within the search bar on the right, we can test queries. A query for "mountains" returns fairly relevant results.
The results that we see include parks that contain mountains.
The term is present within one of the result’s text fields: title or description, most likely.
Our results could be more accurate...
Naturally, National Parks that include the term "mountains" in the title have beautiful mountain ranges.
We should adjust the weight of the title field.
As we do so, the results for our reference query will shift in real time.
Relevance Tuning, Weights - Tuning the weight of the "title" field. The weight is now "3" and we see more relevant results within our sample "mountains" query.
The field we have adjusted, title, is now bolded and marked with the Weights icon.
Two of our parks fell from our top three results, and the overall result set now reflects more relevant parks for those looking for "mountains".
Once we click Save, the change will be live.
Weights via API
Weights are applied at query time.
Within the /search API endpoint, a weight value can be passed within the search_fields object on each search.
The search_fields object defines which fields are searched.
Only the defined fields will be searched when producing your result set:
curl -X GET '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer search-soaewu2ye6uc45dr8mcd54v8' \
  -d '{
    "search_fields": {
      "title": { "weight": 10 },
      "description": { "weight": 1 },
      "states": { "weight": 2 }
    },
    "query": "mountains"
  }'
In our example, we restrict the search to three fields: title, description, and states.
We are weighting each field: 10, 1, and 2, respectively.
Through the API, you can build fluid search, leveraging dynamic weights when you need them.
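As a sketch of what dynamic weights might look like in application code, the Python below builds the same request body as the curl example from a plain dictionary of weights. The weighted_search_body helper is hypothetical, the 0 to 10 range check follows the Weights section of this guide, and sending the request is left to whichever HTTP client you use.

```python
import json


def weighted_search_body(query: str, weights: dict[str, int]) -> dict:
    """Build a /search request body with one weight per search field."""
    for field, weight in weights.items():
        # Range taken from this guide's Weights section.
        if not 0 <= weight <= 10:
            raise ValueError(f"weight for {field!r} must be between 0 and 10, got {weight}")
    return {
        "query": query,
        "search_fields": {field: {"weight": w} for field, w in weights.items()},
    }


body = weighted_search_body("mountains", {"title": 10, "description": 1, "states": 2})
print(json.dumps(body, indent=2))
```

Because the weights are ordinary data, the application can pick a different dictionary per query, for example one that favors title for short queries.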
Read the Weights API Reference.
Boosts
Weights are applied to fields. Boosts are set up on top of fields, but they are applied to field values.
There are four kinds of boosts:
- Value Boost: Applies to text, number, and date fields. A value boost looks for a specific value within a given field. The value might be true, 1, or tomorrow. If the value is present within the boosted field, then the entire document is boosted.
- Functional Boost: Applies only to number fields. You can boost depending on a number value. If you have a rating field, for example, you can ensure that highly rated documents appear higher in the results.
- Proximity Boost: Applies to number and geolocation fields. Provide a mode, or a "center", then boost results given their proximity to that center. Take GPS coordinates from a user and display the nearest results, for example.
- Recency Boost: A type of Proximity Boost that applies to date fields, with a date as the center. If you were searching for concerts, you could boost concert documents given their proximity to now, or to any other date in the future.
Boosts are highly flexible, giving you multiple functions to apply depending on the type of boost: linear, exponential, gaussian, and logarithmic.
Optionally, a factor between 0 and 10 can be provided to dictate just how assertive your boost is.
Use a boost to increase relevance. A boost cannot decrease the relevance of a result.
You can calibrate boosts via the dashboard or using the Search API.
Consider learning the concepts of the different boost functions before getting into practice!
Conceptual: Linear, Exponential, Gaussian, Logarithmic
When boosting on number, date, or geolocation fields, you will need to define a function parameter and a factor.
There are four types of function, depending on the boost: linear, exponential, gaussian, and logarithmic.
The function and factor are used to compute half of the boosted relevance score, known as the boost value.
The other half is the original document score.
They combine to produce the overall document score, which governs the order of the result set.
You can provide an operation value, either add or multiply, to combine the two halves using either addition or multiplication:
- Add: (Original Document Score) + (Boost Value) = Overall Document Score
- Multiply: (Original Document Score) * (Boost Value) = Overall Document Score
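The two operations can be written out as a small Python function. This is a minimal sketch of the arithmetic only: overall_score is a hypothetical name, and the scores passed in are made-up numbers rather than values produced by App Search.

```python
def overall_score(original_score: float, boost_value: float, operation: str) -> float:
    """Combine the original document score with the boost value."""
    if operation == "add":
        return original_score + boost_value
    if operation == "multiply":
        return original_score * boost_value
    raise ValueError(f"operation must be 'add' or 'multiply', got {operation!r}")


print(overall_score(2.0, 3.0, "add"))       # 5.0
print(overall_score(2.0, 3.0, "multiply"))  # 6.0
```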
The four functions behave as follows:
- linear: Provides a multiplicative boost. Multiplies the [...]
- exponential: Provides an exponential boost. The value within the boosted fields becomes the exponent. The base of that exponent is Euler’s Number.
- gaussian: Gaussian distribution is more commonly known as a bell curve. Results are distributed across the curve and given weight relative to their position. Consider you are boosting the [...]
- logarithmic: A logarithmic curve is one that rises quickly along the y axis, but then rises slowly as you travel along the x axis. This function is ideal when you consider something like a ratings field. If you had a [...]
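To get a feel for the difference between the exponential and logarithmic shapes, the Python sketch below compares how much each extra rating point adds under the plain mathematical functions. It deliberately ignores the factor and is not App Search's exact formula; the ratings list and the step_gains helper are hypothetical.

```python
import math


def step_gains(values: list[float]) -> list[float]:
    """Differences between consecutive values: how much each extra step adds."""
    return [b - a for a, b in zip(values, values[1:])]


ratings = [1, 2, 3, 4, 5]

# Exponential: the field value becomes the exponent of Euler's number.
exponential = [math.exp(r) for r in ratings]
# Logarithmic: natural logarithm of the field value.
logarithmic = [math.log(r) for r in ratings]

print([round(g, 2) for g in step_gains(exponential)])  # each step adds more than the last
print([round(g, 2) for g in step_gains(logarithmic)])  # each step adds less than the last
```

Under the logarithmic shape, the gap between a rating of 1 and 2 matters far more than the gap between 4 and 5. The exponential shape does the opposite.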
Boosts via the Dashboard
Click into Relevance Tuning.
Relevance Tuning, Weights - Click Relevance Tuning and see your schema.
Place a reference query within the query tester.
This will give us a baseline that we can use to calibrate the boost.
We will use "old growth".
Relevance Tuning, Weights - The "old growth" query within the query tester.
Now, click or mouse over the Add Boosts button.
Depending on the type of field (number, text, geolocation, or date), different boosting options will appear.
We want to put a Value Boost on the world_heritage_site field, which is of type text.
Relevance Tuning, Weights - Mouse over or click the Add Boost button, then select the ideal boost.
A Value Boost will look at the value of the field and then apply the boost given what is present.
The world_heritage_site field can be either true or false.
We want to give sites that are world heritage sites precedence, so we provide true as our value, then calibrate the impact of the boost.
Relevance Tuning, Weights - The value true has been provided and the impact of the boost has been scaled to 4.6.
An icon will appear next to the boosted field that matches the type of boost.
In this case, the world_heritage_site field has the Value Boost icon and is bolded.
As we drag the impact slider higher or lower, we can see the results shift.
As a result of the boost, Yosemite National Park is now our top rated park for the "old growth" query because it is also a world heritage site.
Once we click Save, the boost will impact all queries.
Boosts via API
Like Weights, Boosts are applied at query time.
Using the /search endpoint, a boosts object can be passed in along with each search.
We can look at a Value Boost:
curl -X GET '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer search-soaewu2ye6uc45dr8mcd54v8' \
  -d '{
    "query": "old growth",
    "boosts": {
      "world_heritage_site": [
        {
          "type": "value",
          "value": "true",
          "operation": "multiply",
          "factor": 10
        }
      ]
    }
  }'
Within our example Engine, the National Parks demo, we have established a boost on the world_heritage_site field.
The field is a text field and each document has either true or false as its value.
Our query gives a 10x boost to relevance when documents contain world_heritage_site: true.
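The same Value Boost body can be assembled in code. The Python below is a minimal sketch: value_boost is a hypothetical helper, the keys mirror the curl example above, and the factor and operation checks follow the ranges described earlier in this guide.

```python
import json


def value_boost(value: str, factor: float, operation: str) -> dict:
    """Build one Value Boost entry shaped like the curl example in this guide."""
    if operation not in ("add", "multiply"):
        raise ValueError(f"operation must be 'add' or 'multiply', got {operation!r}")
    # Factor range taken from this guide's Boosts section.
    if not 0 <= factor <= 10:
        raise ValueError(f"factor must be between 0 and 10, got {factor}")
    return {"type": "value", "value": value, "operation": operation, "factor": factor}


body = {
    "query": "old growth",
    "boosts": {"world_heritage_site": [value_boost("true", 10, "multiply")]},
}
print(json.dumps(body, indent=2))
```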
A Proximity Boost utilizes deep geolocation capability:
curl -X GET '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer search-soaewu2ye6uc45dr8mcd54v8' \
  -d '{
    "query": "old growth",
    "boosts": {
      "location": {
        "type": "proximity",
        "function": "exponential",
        "center": "25.32, -80.93",
        "factor": 3
      }
    }
  }'
In this second example, we are looking for "old growth" parks.
But we are boosting the results based on their proximity to the center point, 25.32, -80.93, which lies in southern Florida.
For experimentation, try putting your own coordinates as the center value.
Which "old growth" US National Park is closest to you?
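To experiment with your own coordinates, a small Python helper can format the center value and build the body. This is a sketch under assumptions: proximity_boost is a hypothetical name, the latitude and longitude checks are ordinary geographic bounds rather than something this guide specifies, and the sample coordinates are an arbitrary point in San Francisco.

```python
import json


def proximity_boost(latitude: float, longitude: float, function: str, factor: float) -> dict:
    """Build a Proximity Boost whose center is a 'lat, lon' string, as in the curl example."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    return {
        "type": "proximity",
        "function": function,
        "center": f"{latitude}, {longitude}",
        "factor": factor,
    }


# Hypothetical user position; replace with coordinates from your own application.
user_latitude, user_longitude = 37.7749, -122.4194

body = {
    "query": "old growth",
    "boosts": {
        "location": proximity_boost(user_latitude, user_longitude, "exponential", 3)
    },
}
print(json.dumps(body, indent=2))
```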
Read the Boosts API Reference.
Search Settings API
The Search Settings API can also be used to adjust weights and boosts.
Unlike the other endpoints, Search Settings does not set them at query time.
Similar to how you define them in the dashboard, Search Settings apply to all subsequent queries.
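As a rough sketch, a settings payload could combine the weights and boosts shown earlier into one body. Everything here is an assumption to verify against the Search Settings API Reference: this guide does not give the endpoint or the accepted keys, so the Python below simply assumes the body reuses the search_fields and boosts objects from the query-time examples. The search_settings_body helper is hypothetical.

```python
import json


def search_settings_body(weights: dict[str, int], boosts: dict) -> dict:
    """Assumed shape: the same search_fields and boosts objects used at query time."""
    return {
        "search_fields": {field: {"weight": w} for field, w in weights.items()},
        "boosts": boosts,
    }


settings = search_settings_body(
    {"title": 3, "description": 1},
    {
        "world_heritage_site": [
            {"type": "value", "value": "true", "operation": "multiply", "factor": 10}
        ]
    },
)
print(json.dumps(settings, indent=2))
```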
Read the Search Settings API Reference.