Full-Text Search Functions

edit

Search functions should be used when performing full-text search, namely when the MATCH or QUERY predicates are being used. Outside a, so-called, search context, these functions will return default values such as 0 or NULL.

Elasticsearch SQL optimizes all queries executed against Elasticsearch depending on the scoring needs. Using track_scores on the search request or _doc sorting that disables scores calculation, Elasticsearch SQL instructs Elasticsearch not to compute scores when these are not needed. For example, every time a SCORE() function is encountered in the SQL query, the scores are computed.

MATCH

edit

Synopsis:

MATCH(
    field_exp,   
    constant_exp 
    [, options]) 

Input:

field(s) to match

matching text

additional parameters; optional

Description: A full-text search option, in the form of a predicate, available in Elasticsearch SQL that gives the user control over powerful match and multi_match Elasticsearch queries.

The first parameter is the field or fields to match against. In case it receives one value only, Elasticsearch SQL will use a match query to perform the search:

SELECT author, name FROM library WHERE MATCH(author, 'frank');

    author     |       name
---------------+-------------------
Frank Herbert  |Dune
Frank Herbert  |Dune Messiah
Frank Herbert  |Children of Dune
Frank Herbert  |God Emperor of Dune

However, it can also receive a list of fields and their corresponding optional boost value. In this case, Elasticsearch SQL will use a multi_match query to match the documents:

SELECT author, name, SCORE() FROM library WHERE MATCH('author^2,name^5', 'frank dune');

    author     |       name        |    SCORE()
---------------+-------------------+---------------
Frank Herbert  |Dune               |11.443176
Frank Herbert  |Dune Messiah       |9.446629
Frank Herbert  |Children of Dune   |8.043278
Frank Herbert  |God Emperor of Dune|7.0029488

The multi_match query in Elasticsearch has the option of per-field boosting that gives preferential weight (in terms of scoring) to fields being searched in, using the ^ character. In the example above, the name field has a greater weight in the final score than the author field when searching for frank dune text in both of them.

Both options above can be used in combination with the optional third parameter of the MATCH() predicate, where one can specify additional configuration parameters (separated by semicolon ;) for either match or multi_match queries. For example:

SELECT author, name, SCORE() FROM library WHERE MATCH(name, 'to the star', 'operator=or;cutoff_frequency=0.2');

     author      |                name                |    SCORE()
-----------------+------------------------------------+---------------
Peter F. Hamilton|Pandora's Star                      |3.0997515
Douglas Adams    |The Hitchhiker's Guide to the Galaxy|3.1756816

In the more advanced example above, the cutoff_frequency parameter allows specifying an absolute or relative document frequency where high frequency terms are moved into an optional subquery and are only scored if one of the low frequency (below the cutoff) terms in the case of an or operator or all of the low frequency terms in the case of an and operator match. More about this you can find in the Cutoff frequency page.

The allowed optional parameters for a single-field MATCH() variant (for the match Elasticsearch query) are: analyzer, auto_generate_synonyms_phrase_query, cutoff_frequency, lenient, fuzziness, fuzzy_transpositions, fuzzy_rewrite, minimum_should_match, operator, max_expansions, prefix_length.

The allowed optional parameters for a multi-field MATCH() variant (for the multi_match Elasticsearch query) are: analyzer, auto_generate_synonyms_phrase_query, cutoff_frequency, lenient, fuzziness, fuzzy_transpositions, fuzzy_rewrite, minimum_should_match, operator, max_expansions, prefix_length, slop, tie_breaker, type.

QUERY

edit

Synopsis:

QUERY(
    constant_exp 
    [, options]) 

Input:

query text

additional parameters; optional

Description: Just like MATCH, QUERY is a full-text search predicate that gives the user control over the query_string query in Elasticsearch.

The first parameter is basically the input that will be passed as is to the query_string query, which means that anything that query_string accepts in its query field can be used here as well:

SELECT author, name, SCORE() FROM library WHERE QUERY('name:dune');

    author     |       name        |    SCORE()
---------------+-------------------+---------------
Frank Herbert  |Dune               |2.2886353
Frank Herbert  |Dune Messiah       |1.8893257
Frank Herbert  |Children of Dune   |1.6086556
Frank Herbert  |God Emperor of Dune|1.4005898

A more advanced example, showing more of the features that query_string supports, of course possible with Elasticsearch SQL:

SELECT author, name, page_count, SCORE() FROM library WHERE QUERY('_exists_:"author" AND page_count:>200 AND (name:/star.*/ OR name:duna~)');

      author      |       name        |  page_count   |    SCORE()
------------------+-------------------+---------------+---------------
Frank Herbert     |Dune               |604            |3.7164764
Frank Herbert     |Dune Messiah       |331            |3.4169943
Frank Herbert     |Children of Dune   |408            |3.2064917
Frank Herbert     |God Emperor of Dune|454            |3.0504425
Peter F. Hamilton |Pandora's Star     |768            |3.0
Robert A. Heinlein|Starship Troopers  |335            |3.0

The query above uses the _exists_ query to select documents that have values in the author field, a range query for page_count and regex and fuzziness queries for the name field.

If one needs to customize various configuration options that query_string exposes, this can be done using the second optional parameter. Multiple settings can be specified separated by a semicolon ;:

SELECT author, name, SCORE() FROM library WHERE QUERY('dune god', 'default_operator=and;default_field=name');

    author     |       name        |    SCORE()
---------------+-------------------+---------------
Frank Herbert  |God Emperor of Dune|3.6984892

The allowed optional parameters for QUERY() are: allow_leading_wildcard, analyze_wildcard, analyzer, auto_generate_synonyms_phrase_query, default_field, default_operator, enable_position_increments, escape, fuzziness, fuzzy_max_expansions, fuzzy_prefix_length, fuzzy_rewrite, fuzzy_transpositions, lenient, max_determinized_states, minimum_should_match, phrase_slop, rewrite, quote_analyzer, quote_field_suffix, tie_breaker, time_zone, type.

SCORE

edit

Synopsis:

SCORE()

Input: none

Output: double numeric value

Description: Returns the relevance of a given input to the executed query. The higher score, the more relevant the data.

When doing multiple text queries in the WHERE clause then, their scores will be combined using the same rules as Elasticsearch’s bool query.

Typically SCORE is used for ordering the results of a query based on their relevance:

SELECT SCORE(), * FROM library WHERE MATCH(name, 'dune') ORDER BY SCORE() DESC;

    SCORE()    |    author     |       name        |  page_count   |    release_date
---------------+---------------+-------------------+---------------+--------------------
2.2886353      |Frank Herbert  |Dune               |604            |1965-06-01T00:00:00Z
1.8893257      |Frank Herbert  |Dune Messiah       |331            |1969-10-15T00:00:00Z
1.6086556      |Frank Herbert  |Children of Dune   |408            |1976-04-21T00:00:00Z
1.4005898      |Frank Herbert  |God Emperor of Dune|454            |1981-05-28T00:00:00Z

However, it is perfectly fine to return the score without sorting by it:

SELECT SCORE() AS score, name, release_date FROM library WHERE QUERY('dune') ORDER BY YEAR(release_date) DESC;

     score     |       name        |    release_date
---------------+-------------------+--------------------
1.4005898      |God Emperor of Dune|1981-05-28T00:00:00Z
1.6086556      |Children of Dune   |1976-04-21T00:00:00Z
1.8893257      |Dune Messiah       |1969-10-15T00:00:00Z
2.2886353      |Dune               |1965-06-01T00:00:00Z