As with vector search in the previous section, this section shows how to combine the best results from full-text and semantic queries using the Reciprocal Rank Fusion (RRF) algorithm.
Introduction to Sub-Searches
The solution to implementing a hybrid full-text and dense vector search was to send a search request that included the query and knn arguments to request the two searches, and the rrf argument to combine them into a single results list.
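To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is an illustration only, not Elasticsearch's implementation: the function name, the sample document IDs, and the tie-breaking rule are assumptions, and k=60 is used as a commonly cited constant. Each document receives 1 / (k + rank) from every list in which it appears, and the contributions are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranked list.

    Hypothetical helper for illustration. Ranks are 1-based, and a document
    that is missing from a list contributes nothing for that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic (an assumption made for this sketch).
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for the two searches
full_text = ['doc_a', 'doc_b', 'doc_c']
semantic = ['doc_c', 'doc_a', 'doc_d']

fused = reciprocal_rank_fusion([full_text, semantic])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one search can still appear further down the fused list.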
Combining full-text and sparse vector search requests in the same way is more complicated, because both use the query argument. To provide the two queries that the RRF algorithm needs to combine, the request must include two query arguments, and Sub-Searches make that possible.
Sub-searches is a feature that is currently in technical preview. For this reason the Python Elasticsearch client does not natively support it. To work around this limitation, the search() method of the Search class can be changed to send the search request using the body argument. Below you can see a new, yet similar implementation that uses the body argument of the client to send a search request:
    import json  # needed to serialize the request body

    class Search:
        # ...

        def search(self, **query_args):
            # sub_searches is not currently supported in the client, so we send
            # search requests using the body argument
            if 'from_' in query_args:
                # the JSON body uses 'from', while the client spells it 'from_'
                query_args['from'] = query_args['from_']
                del query_args['from_']
            return self.es.search(
                index='my_documents',
                body=json.dumps(query_args),
            )
This implementation does not require any changes to the application, as it is functionally equivalent. The only difference is in validation: the client's search() method validates all of its arguments before sending the request, with body being the only exception, so a request sent through body is not checked on the client side. The server always validates requests regardless of how the client sends them.
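The argument handling can be tried on its own, without an Elasticsearch connection. The sketch below uses a hypothetical standalone helper, build_search_body(), that mirrors the renaming and serialization steps of Search.search(); the helper name and the sample arguments are assumptions made for illustration.

```python
import json


def build_search_body(**query_args):
    """Hypothetical helper: turn keyword arguments into a JSON request body."""
    # The Python client spells the pagination offset `from_` because `from`
    # is a reserved word in Python; the JSON body expects the key `from`.
    if 'from_' in query_args:
        query_args['from'] = query_args.pop('from_')
    return json.dumps(query_args)


body = build_search_body(
    sub_searches=[{'query': {'match_all': {}}}],
    rank={'rrf': {}},
    size=5,
    from_=10,
)
print(body)
```

Since nothing in this path checks the argument names, a misspelled key would only be reported once the server rejects the request.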
With this version, the sub_searches argument can be used in Search.search() to send multiple search queries as follows:
    results = es.search(
        sub_searches=[
            {
                'query': { ... },  # full-text search
            },
            {
                'query': { ... },  # semantic search
            },
        ],
        rank={
            'rrf': {},  # combine sub-search results
        },
        aggs={ ... },
        size=5,
        from_=from_,
    )
Hybrid Search Implementation
To complete this section, let's bring back the full-text logic and combine it with the semantic search query presented earlier in this chapter.
Below you can see the updated handle_search() endpoint:
    @app.post('/')
    def handle_search():
        query = request.form.get('query', '')
        filters, parsed_query = extract_filters(query)
        from_ = request.form.get('from_', type=int, default=0)

        if parsed_query:
            search_query = {
                'sub_searches': [
                    {
                        'query': {
                            'bool': {
                                'must': {
                                    'multi_match': {
                                        'query': parsed_query,
                                        'fields': ['name', 'summary', 'content'],
                                    }
                                },
                                **filters
                            }
                        }
                    },
                    {
                        'query': {
                            'bool': {
                                'must': [
                                    {
                                        'text_expansion': {
                                            'elser_embedding': {
                                                'model_id': '.elser_model_2',
                                                'model_text': parsed_query,
                                            }
                                        },
                                    }
                                ],
                                **filters,
                            }
                        },
                    },
                ],
                'rank': {
                    'rrf': {}
                },
            }
        else:
            search_query = {
                'query': {
                    'bool': {
                        'must': {
                            'match_all': {}
                        },
                        **filters
                    }
                }
            }

        results = es.search(
            **search_query,
            aggs={
                'category-agg': {
                    'terms': {
                        'field': 'category.keyword',
                    }
                },
                'year-agg': {
                    'date_histogram': {
                        'field': 'updated_at',
                        'calendar_interval': 'year',
                        'format': 'yyyy',
                    },
                },
            },
            size=5,
            from_=from_,
        )
        aggs = {
            'Category': {
                bucket['key']: bucket['doc_count']
                for bucket in results['aggregations']['category-agg']['buckets']
            },
            'Year': {
                bucket['key_as_string']: bucket['doc_count']
                for bucket in results['aggregations']['year-agg']['buckets']
                if bucket['doc_count'] > 0
            },
        }
        return render_template('index.html', results=results['hits']['hits'],
                               query=query, from_=from_,
                               total=results['hits']['total']['value'],
                               aggs=aggs)
As you recall, the extract_filters() function looked for category filters entered by the user on the search prompt, and returned the leftover portion as parsed_query. If parsed_query is empty, it means that the user only entered a category filter, and in that case the query should be a simple match_all with the selected category as a filter. This is implemented in the else portion of the big conditional.
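As a reminder of how the filters reach the query, here is a rough sketch of a function with the same shape as extract_filters(). It is not the version from the earlier section: the category:value prompt syntax and the regular expression are assumptions made for illustration. The first return value is a dictionary with a filter key, which is what allows the endpoint to splat it into a bool clause with **filters.

```python
import re


def extract_filters(query):
    """Sketch: split a 'category:<value>' filter out of the query text.

    The 'category:<value>' syntax is an assumption made for this example.
    """
    filters = []
    filter_regex = r'category:(\S+)\s*'
    match = re.search(filter_regex, query)
    if match:
        filters.append(
            {'term': {'category.keyword': {'value': match.group(1)}}}
        )
        # Whatever is left after removing the filter is the text to search
        query = re.sub(filter_regex, '', query).strip()
    return {'filter': filters}, query


filters, parsed_query = extract_filters('category:sharepoint work from home')

# With an empty parsed_query, the endpoint falls back to match_all plus filter
only_filters, empty_query = extract_filters('category:sharepoint')
fallback_bool = {'must': {'match_all': {}}, **only_filters}
print(parsed_query, fallback_bool)
```

Because both sub-searches receive the same **filters, the category restriction applies to the full-text and the semantic results alike before they are fused.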
When there is a search query, the sub_searches option is used as shown in the previous section to include the multi_match and text_expansion queries, with the rank option requesting that the results from the two sub-searches be combined into a single list of ranked results. To complete the query, the size and from_ arguments are provided to maintain support for pagination.
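To illustrate how size and from_ work together, the small sketch below lists the from_ offset of each results page for a given total. The page_offsets() helper is hypothetical and not part of the application; it assumes the page size of 5 used in the endpoint above.

```python
def page_offsets(total, size=5):
    """Hypothetical helper: the `from_` offset at which each page starts."""
    return list(range(0, total, size))


# 12 matching documents with 5 results per page need three requests
offsets = page_offsets(total=12)
print(offsets)
```

Each offset is the value the page's form would submit as from_, while size stays fixed at 5.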