Precision tuning (beta)

This functionality is in beta. Beta features are subject to change and are not covered by the support SLA of general release (GA) features. Elastic plans to promote this feature to GA in a future release.

Precision tuning has no effect in queries that contain synonyms.

To use precision tuning with an Elasticsearch-based engine, you must define text subfields that conform to the Elasticsearch engines text field conventions.

App Search defaults to high recall results: we cast a wide net on your searches. You can use precision tuning to search with a different level of precision and recall, tightening or loosening the term and phrase requirements needed for a document to be considered a match to a given query. Generally, more precision leads to less recall: getting more specific results usually comes at the cost of a lower tolerance of errors or variations in queries.

Tune precision

To tune precision, set precision tuning values in any of the following ways:

Tune precision per engine using the UI

Tune precision within Kibana: Navigate to Search > Enterprise Search > App Search > Engines > engine name > Relevance tuning. Locate Precision tuning, and set the default precision value for the engine. This value applies to all queries sent to that engine that don’t provide their own precision value.

Precision tuning is not available for Elasticsearch based engines that do not conform to Elasticsearch engines text field conventions.

Tune precision per engine using the API

Use the search settings API to set the default precision value for an engine. This value applies to all queries sent to that engine that don’t provide their own precision value.
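As an illustration, the following Python sketch builds (but does not send) a request that sets an engine's default precision. The endpoint path, the `precision` body key, the base URL, the engine name and the API key are all assumptions made for this example; check the search settings API reference for the exact request shape.

```python
import json
from urllib.request import Request


def build_search_settings_request(base_url, engine_name, api_key, precision):
    """Build a PUT request that sets the default precision for an engine.

    The URL layout and the "precision" key are assumptions for illustration.
    """
    if not isinstance(precision, int) or not 1 <= precision <= 11:
        raise ValueError("precision must be an integer from 1 to 11")
    url = f"{base_url}/api/as/v1/engines/{engine_name}/search_settings"
    body = json.dumps({"precision": precision}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return Request(url, data=body, headers=headers, method="PUT")


# Hypothetical deployment URL, engine name and key.
request = build_search_settings_request(
    "http://localhost:3002", "national-parks", "private-xxxx", 3
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To send the request, pass it to `urllib.request.urlopen` once the URL and key point at a real deployment.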

Tune precision per query using the API

Use the precision parameter of the search API to set the value for a particular query. When set per query, the precision value overrides the default value set for the engine.
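A minimal sketch of a per-query request body follows. The `precision` parameter is the one named above; the helper name `build_search_body` and the example query are hypothetical.

```python
import json


def build_search_body(query, precision=None):
    """Return a search API request body, optionally with a per-query precision."""
    body = {"query": query}
    if precision is not None:
        if not isinstance(precision, int) or not 1 <= precision <= 11:
            raise ValueError("precision must be an integer from 1 to 11")
        # Overrides the engine default for this query only.
        body["precision"] = precision
    return body


print(json.dumps(build_search_body("national park", precision=5)))
# Without a precision value, the engine default applies.
print(json.dumps(build_search_body("national park")))
```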

Precision tuning values

Each numeric precision tuning value selects a combination of analyzers, fuzzy query behavior, and term and phrase matching rules.

Precision tuning values are integers that range from 1 to 11. The range of values represents a sliding scale that manages the inherent tradeoff between precision and recall. Lower values favor recall, while higher values favor precision.

The precision tuning value for a query changes which documents match that query.

The following table describes each precision tuning value, including the effect on analyzers, fuzzy queries and term and phrase matching. You can change the precision tuning for the same query and observe the effects it has on the results. Experiment with different values to find the value that works best with each engine’s documents.

| Value | Description | Analyzers | Fuzzy queries | Phrase matching |
| --- | --- | --- | --- | --- |
| 1 | Lowest precision and highest recall setting. | All | Yes | At least one term in any field must match. |
| 2 | Default. High recall, low precision. | All | Yes | Less than half of the terms must match. |
| 3 | Increasing phrase matching: half the terms. | All | Yes | Queries with two or fewer terms require all terms to match. With more terms, half the terms must match (rounded down). |
| 4 | Increasing phrase matching: three-quarters of the terms. | All | Yes | Queries with three or fewer terms require all terms to match. With more terms, three-quarters of the terms must match (rounded down). |
| 5 | Increasing phrase matching requirements: all but one of the terms. | All | Yes | Queries with four or fewer terms require all terms to match. With more terms, all but one term must match. |
| 6 | All terms must match. | All | Yes | Every term must appear in the document, in any field. |
| 7 | The strictest phrase matching requirement: all terms must match, and in the same field. | All | Yes | Every term must appear in the same field in the document. |
| 8 | Decreasing typo tolerance: advanced typo tolerance is disabled. | All | No | Every term must appear in the same field in the document. |
| 9 | Decreasing term matching: prefixing is disabled. | Default, Stem, Joined, Delimiter | No | Every term must appear in the same field. |
| 10 | Decreasing typo tolerance: no compound-word correction. | Default, Stem | No | Every term must appear in the same field. |
| 11 | Exact spelling matches only. | Default | No | Every tokenized term must appear in the same field. NOTE: This is not an exact match against the field value (e.g. a search for "PART-123" can return documents that contain both tokenized "PART" and "123" terms such as "PART-123-456"). To exactly match a field value, use Search API filters. |

Precision tuning concepts

The following concepts describe how precision tuning works:

Analyzers

Precision tuning works by applying different analyzers to your documents' fields through multi-fields. Using different analyzers allows you to change how search queries look for results.

Enterprise Search provides the following analyzers for text fields:

Default analyzer

The default analyzer does not change the analysis for a text field. It merely ignores upper and lower casing, and removes stop words according to the language used for the engine.

For example, the query "a brand new super duper model" would match a document containing "Brand New Super Duper model".
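The effect can be approximated in a few lines of Python. This is a toy sketch rather than the analyzer itself: `STOP_WORDS` is a hypothetical, very short English list, and real tokenization handles punctuation and other languages.

```python
# Hypothetical, abbreviated stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "or", "of"}


def default_analyze(text):
    """Lowercase the text, split on whitespace and drop stop words."""
    return [term for term in text.lower().split() if term not in STOP_WORDS]


query_terms = default_analyze("a brand new super duper model")
document_terms = default_analyze("Brand New Super Duper model")
print(query_terms == document_terms)
```

Because casing and the stop word "a" are ignored, both inputs reduce to the same list of terms.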

Stem analyzer

The stem analyzer reduces each word to its root form. This ensures variants of a word match during a search.

This analyzer depends on the language chosen for the engine, as different strategies are used for obtaining the roots for a word depending on the language.

For example, a query for "the fox jumps" would match a document containing "the foxes jumping". Variations of the verb (jump, jumps, jumping) or the noun (fox, foxes) are not taken into account when retrieving the search results.

For more information, see stemming.

Prefix analyzer

The prefix analyzer uses a query as a prefix for matching words. Using the prefix analyzer, a "congrat" query would match documents containing either "congratulations" or "congrats", as both words start with the input query.

The prefix analyzer is useful for autocomplete and suggestion search types, where we are interested in results that match a specified prefix.
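Prefix matching can be pictured with the following simplified sketch. The function name `prefix_matches` is hypothetical, and this whitespace-based check only illustrates the idea; it is not how the analyzer is implemented.

```python
def prefix_matches(query, document_text):
    """Return the document words that start with the query, ignoring case."""
    prefix = query.lower()
    return [word for word in document_text.lower().split() if word.startswith(prefix)]


print(prefix_matches("congrat", "Congratulations and congrats to all"))
print(prefix_matches("congrat", "thanks to all"))
```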

Joined analyzer

The joined analyzer checks separate words as if they were a single word. For example, a query for "ecommerce" would match a document containing "e commerce".

It is useful to allow for words that can appear joined or separated in searches and document results.

Delimiter analyzer

The delimiter analyzer removes some delimiters that might not be meaningful to the search. For example, a query containing "super-duper-xl" would match documents containing "super duper xl" and "superduperxl".

It is conceptually similar to the joined analyzer, but instead of joining words or parts of words, it removes delimiters that may not be meaningful to the search, so that matching focuses on the text content.

For more information about the delimiter capabilities, see delimiter token filter.
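The joined and delimiter behaviors described above can be sketched as simple text normalizations. The helper names are hypothetical and the regular expression is an assumption about which characters count as delimiters; the actual token filters are more sophisticated.

```python
import re


def joined_form(text):
    """Remove whitespace so that separate words compare as a single word."""
    return "".join(text.lower().split())


def delimiter_forms(text):
    """Split on non-alphanumeric delimiters, then return the spaced and joined variants."""
    parts = [part for part in re.split(r"[\W_]+", text.lower()) if part]
    return {" ".join(parts), "".join(parts)}


# Joined: "ecommerce" lines up with "e commerce".
print(joined_form("e commerce") == "ecommerce")
# Delimiter: "super-duper-xl" lines up with both variants.
print(delimiter_forms("super-duper-xl"))
```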

Fuzzy queries

Fuzzy queries create small variations of the query terms, by changing one or more characters:

  • Changing a character (box → fox)
  • Removing a character (black → lack)
  • Inserting a character (sic → sick)
  • Transposing two adjacent characters (act → cat)

Depending on how long each query term is, more characters or fewer characters are allowed to change:

  • Words 1 or 2 letters long, or the first two letters of a longer word, must match exactly ("at" won’t match "ax", "click" won’t match "slick")
  • Words with length 3 to 5 can differ in 1 character ("click" will match "clack")
  • Words with more than 5 letters can differ in 2 characters ("fussiness" will match "fuzziness")

Only the default analyzer and the stem analyzer are used in fuzzy queries.

Fuzzy queries are helpful for allowing typo tolerance in searches.
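The length rules above can be sketched as follows. Measuring changes with the optimal string alignment variant of Damerau-Levenshtein distance is an assumption chosen because it counts the four edit types listed; the function names are hypothetical, and the sketch ignores analyzers entirely.

```python
def edit_distance(a, b):
    """Count substitutions, deletions, insertions and adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def allowed_edits(term):
    """Number of character changes allowed for a query term of this length."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def fuzzy_match(query_term, document_term):
    """Apply the rules above: exact first two letters, then a length-based edit budget."""
    if query_term[:2] != document_term[:2]:
        return False
    return edit_distance(query_term, document_term) <= allowed_edits(query_term)


print(fuzzy_match("click", "clack"))
print(fuzzy_match("click", "slick"))
print(fuzzy_match("fussiness", "fuzziness"))
```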

Matching terms and phrases

App Search matches documents to a query at the term and phrase levels.

  • Term matching refers to how App Search handles individual terms within queries and documents. Terms are usually words, but can be any arbitrary group of letters or numbers. App Search uses analyzers to process text into terms.
  • Phrase matching applies when a query contains multiple terms. When determining which documents match a query, App Search may consider the number of query terms that appear in the document, the ordering of the terms, or where in the document the terms appear (for example, within the same field). See Phrase matching examples for examples of phrase matching.

Troubleshooting precision tuning

Precision tuning is not an exact science. You may find that some results are not what you would expect.

These are some recommendations for understanding precision tuning results:

  • Review the descriptions for the precision tuning values. Review the analyzers and matching for the current precision tuning value.
  • Within the precision tuning UI, test different values using the precision slider, and experiment with different query terms.
  • Use the Search Explain API to understand the Elasticsearch query for different precision settings.
  • Use the Elasticsearch Search Explain API to understand why a particular result does (or does not) match an Elasticsearch query.

Examples

Phrase matching examples

Consider an engine with a single document with a title field "American Samoa National Park".

Let’s change precision tuning values and check what the results are:

| Precision value | Query | Results | Explanation |
| --- | --- | --- | --- |
| 1 | Joshua Tree Park | Yes | A single matching term in the query (Park) is enough for the result to be retrieved. |
| 2 | Joshua Tree Park | No | Fewer than half of the terms in the query match (only Park). |
| 2 | Joshua Tree National Park | Yes | Half of the terms in the query (National, Park) match the document. |
| 3 | Joshua Park | No | It’s a 2-term query, so all terms must match. |
| 3 | National Park | Yes | It’s a 2-term query, and all terms match. |
| 3 | Joshua Tree National Park | Yes | It’s a query with more than 2 terms, so it’s enough for half of them to match. |
| 4 | Joshua Tree Park | No | It’s a query with 3 terms, so every term must match. |
| 4 | Joshua Tree National Park | No | It’s a query with more than 3 terms, and only half of them match (National, Park). |
| 4 | Joshua Tree American National Park | Yes | It’s a query with more than 3 terms. Three-quarters of 5 terms rounds down to 3, and 3 terms match (American, National, Park). |
| 5 | American Tree National Park | No | It’s a query with 4 terms, so all terms must match. |
| 5 | American Samoa Tree National Park | Yes | It’s a query with more than 4 terms, so all terms but one must match (only Tree does not match). |
| 5 | American Samoa Joshua Tree National Park | No | It’s a query with more than 4 terms, so all terms but one must match (Joshua and Tree do not match). |
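The examples above can be reproduced with a short sketch of the term-count rules. This is a simplified model under stated assumptions: terms are split on whitespace and compared case-insensitively, the "half" rule for value 3 is assumed to round down like the three-quarters rule, value 2 is left out because its rule is not given here as a formula, and the function names are hypothetical.

```python
DOCUMENT_TITLE = "American Samoa National Park"


def required_matches(precision, term_count):
    """Minimum number of query terms that must match (value 2 is not modeled)."""
    if precision == 1:
        return 1
    if precision == 3:
        # Assumption: half the terms, rounded down.
        return term_count if term_count <= 2 else term_count // 2
    if precision == 4:
        return term_count if term_count <= 3 else (3 * term_count) // 4
    if precision == 5:
        return term_count if term_count <= 4 else term_count - 1
    if 6 <= precision <= 11:
        return term_count
    raise ValueError("precision value not modeled in this sketch")


def title_matches(precision, query, title=DOCUMENT_TITLE):
    """Check whether enough query terms appear in the title."""
    terms = query.lower().split()
    title_terms = set(title.lower().split())
    matched = sum(term in title_terms for term in terms)
    return matched >= required_matches(precision, len(terms))


print(title_matches(4, "Joshua Tree National Park"))
print(title_matches(4, "Joshua Tree American National Park"))
print(title_matches(5, "American Samoa Tree National Park"))
```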

Term matching examples

Consider an engine with a single document with the following fields:

  • title: e-commerce results
  • year: FY 2022

Let’s change precision tuning values and check what the results are:

| Precision values | Query | Results | Explanation |
| --- | --- | --- | --- |
| 1 to 9 | ecommerce | Yes | Delimiter analyzer allows matching ecommerce query to e-commerce. |
| 10 to 11 | ecommerce | No | Delimiter analyzer is not active for precision levels 10-11, so ecommerce does not match e-commerce. |
| 1 to 9 | FY2022 | Yes | Joined analyzer allows matching FY2022 query to FY 2022. |
| 10 to 11 | FY2022 | No | Joined analyzer is not active for precision levels 10-11, so FY2022 does not match FY 2022. |
| 1 to 10 | resulting | Yes | Stemming reduces resulting and results to the same root. |
| 11 | resulting | No | Stemming is disabled at value 11. |
| 1 to 8 | res | Yes | Prefix analyzer matches res as a prefix of results. |
| 9 to 11 | res | No | Prefixing is disabled from precision value 9. |
| 1 to 7 | comerc | Yes | Fuzzy matching allows up to two characters (as it’s a word longer than 5 characters) to be missing from commerce. |
| 8 to 11 | comerc | No | Fuzzy matching is disabled from precision value 8, so comerc cannot match e-commerce. |
| 1 to 6 | e-commerce FY 2022 | Yes | Query terms can be present in any field (title and year). |
| 7 to 11 | e-commerce FY 2022 | No | Query terms must be present in the same field from precision level 7. e-commerce is in title field and FY 2022 in year field, so there is no match. |
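The difference between matching in any field and matching in the same field, shown in the last two rows, can be sketched as follows. The tokenizer and function names are hypothetical simplifications, and only exact term matches are modeled (no stemming, prefixing or fuzzy matching).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_all_terms(query, document, same_field):
    """Require every query term in one field (same_field) or across all fields."""
    terms = tokenize(query)
    field_terms = [tokenize(value) for value in document.values()]
    if same_field:
        return any(terms <= one_field for one_field in field_terms)
    return terms <= set().union(*field_terms)


document = {"title": "e-commerce results", "year": "FY 2022"}

# Terms may come from any field, as for precision values up to 6.
print(matches_all_terms("e-commerce FY 2022", document, same_field=False))
# Terms must share a field, as for precision values 7 and above.
print(matches_all_terms("e-commerce FY 2022", document, same_field=True))
```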