Search and compare text
The Elastic Stack machine learning features can generate embeddings, which you can use to search in unstructured text or compare different pieces of text.
Text embedding
Text embedding is a task that produces a mathematical representation of text called an embedding. The machine learning model turns the text into an array of numerical values (also known as a vector). Pieces of content with similar meaning have similar representations. This makes it possible to determine whether different pieces of text are semantically similar, different, or even opposite by using a mathematical similarity function.
This task is responsible for producing only the embedding. When the embedding is created, it can be stored in a dense_vector field and used at search time. For example, you can use these vectors in a k-nearest neighbor (kNN) search to achieve semantic search capabilities.
The following is an example of producing a text embedding:
{ "docs": [{"text_field": "The quick brown fox jumps over the lazy dog."}] } ...
The task returns the following result:
... { "predicted_value": [0.293478, -0.23845, ..., 1.34589e2, 0.119376] ... } ...
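To make the idea of a "mathematical similarity function" concrete, the following is a minimal sketch that compares embeddings with cosine similarity and then runs a brute-force k-nearest neighbor lookup. Cosine similarity is only one common choice, and the three-dimensional vectors, the document names, and the helper functions `cosine_similarity` and `nearest_neighbors` are made up for illustration. They are not part of the Elastic Stack API, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k):
    """Brute-force kNN: return the k (name, score) pairs most similar to query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, standing in for the predicted_value arrays
# that a text embedding model would return.
embeddings = {
    "doc_fox": [0.9, 0.1, 0.0],
    "doc_dog": [0.8, 0.3, 0.1],
    "doc_tax": [-0.7, 0.2, 0.6],
}
query_vector = [1.0, 0.2, 0.0]

top = nearest_neighbors(query_vector, embeddings, k=2)
for name, score in top:
    print(f"{name}: {score:.4f}")
```

A score near 1 indicates vectors that point in nearly the same direction, a score near 0 indicates unrelated vectors, and a score near -1 indicates opposite directions. A production kNN search would normally rely on an index rather than scanning every vector as this sketch does.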
Text similarity
The text similarity task estimates how similar two pieces of text are to each other and expresses the similarity as a numeric value. This is commonly referred to as cross-encoding. This task is useful for ranking document text when comparing it to another provided text input.
You can provide multiple strings of text to compare to another text input sequence. Each string is compared to the given text sequence at inference time and a prediction of similarity is calculated for every string of text.
{ "docs":[{ "text_field": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."}, {"text_field": "New York City is famous for the Metropolitan Museum of Art."}], "inference_config": { "text_similarity": { "text": "How many people live in Berlin?" } } }
In the example above, every string in the docs array is compared individually to the text provided in the text_similarity.text field and a predicted similarity is calculated for both, as the API response shows:
... { "predicted_value": 7.235751628875732 }, { "predicted_value": -11.562295913696289 } ...
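Because the task is useful for ranking, a natural next step is to sort the input documents by their predicted similarity. The following is a minimal sketch of that step. It assumes that the results arrive as a list of objects in the same order as the input docs, which the elided response above does not confirm, and `rank_by_similarity` is a hypothetical helper, not an Elastic API.

```python
def rank_by_similarity(docs, results, field="text_field"):
    """Pair each input document with its score and sort best-first.

    Assumes results[i] is the prediction for docs[i].
    """
    if len(docs) != len(results):
        raise ValueError("expected one result per document")
    paired = [(doc[field], res["predicted_value"]) for doc, res in zip(docs, results)]
    return sorted(paired, key=lambda pair: pair[1], reverse=True)


# Inputs and scores copied from the example above.
docs = [
    {"text_field": "Berlin has a population of 3,520,031 registered inhabitants "
                   "in an area of 891.82 square kilometers."},
    {"text_field": "New York City is famous for the Metropolitan Museum of Art."},
]
results = [
    {"predicted_value": 7.235751628875732},
    {"predicted_value": -11.562295913696289},
]

ranked = rank_by_similarity(docs, results)
for text, score in ranked:
    print(f"{score:8.3f}  {text}")
```

With these values the Berlin sentence ranks first, since its score is higher for the question "How many people live in Berlin?".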