Using Eland on Elasticsearch Serverless

This blog post shows you how to use Eland to import machine learning models into Elasticsearch Serverless, and then how to explore Elasticsearch data using a Pandas-like API.

NLP in Elasticsearch Serverless

Since Elasticsearch 8.0, it has been possible to use NLP machine learning models directly in Elasticsearch. While some models, such as ELSER (for English data) or E5 (for multilingual data), can be deployed directly from Kibana, all other compatible PyTorch models need to be uploaded using Eland.

Since Eland 8.14.0, eland_import_hub_model fully supports Serverless. To get the connection details, open your Serverless project in Kibana, select the "cURL" client, create an API key, and export the environment variables:

export ES_URL="https://[...].elastic.cloud:443"
export API_KEY="..."

You can then use those variables when running eland_import_hub_model:

$ docker run -it --rm --network host \
    docker.elastic.co/eland/eland \
    eland_import_hub_model \
      --url "$ES_URL" \
      --es-api-key "$API_KEY" \
      --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
      --task-type ner
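If you want to script imports of several models from Python instead of typing the command by hand, one option is to build the same docker invocation as an argument list. This is a minimal sketch: the helper names `import_hub_model_command` and `redacted` are hypothetical, and the flags are exactly the ones used above. Passing a list (rather than a single shell string) means the URL and API key need no shell quoting. The sketch drops `-it`, on the assumption that a script usually has no interactive terminal to attach.

```python
import shlex


def import_hub_model_command(
    es_url: str, api_key: str, hub_model_id: str, task_type: str
) -> list[str]:
    """Build the argv for the docker run shown above (hypothetical helper).

    The -it flags are omitted because a script usually has no TTY to attach.
    """
    return [
        "docker", "run", "--rm", "--network", "host",
        "docker.elastic.co/eland/eland",
        "eland_import_hub_model",
        "--url", es_url,
        "--es-api-key", api_key,
        "--hub-model-id", hub_model_id,
        "--task-type", task_type,
    ]


def redacted(cmd: list[str]) -> str:
    """Render the command for logging, hiding the value after --es-api-key."""
    shown = list(cmd)
    if "--es-api-key" in shown:
        i = shown.index("--es-api-key") + 1
        if i < len(shown):
            shown[i] = "***"
    return shlex.join(shown)
```

You could then run it with `subprocess.run(import_hub_model_command(os.environ["ES_URL"], os.environ["API_KEY"], "elastic/distilbert-base-cased-finetuned-conll03-english", "ner"), check=True)`, logging `redacted(...)` of the same list if you want a record that does not leak the key.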

Next, search for "Trained Models" in Kibana's search bar. Kibana will offer to synchronize your trained models.

Synchronize trained models UI

Once done, you will get the option to deploy your model:

Start deployment UI

Less than a minute later, your model should be deployed and you'll be able to test it directly from Kibana.

Test model UI

In this test sentence, the model correctly identified "Joe" as a person and "Reunion Island" as a location, both with high probability.
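Outside Kibana, the same kind of result is returned by Elasticsearch's infer trained model API. As a minimal sketch, assuming each NER result carries an `entities` list whose items have `entity`, `class_name` and `class_probability` keys (the shape documented for that API), you could keep only the high-confidence entities like this. The function name `confident_entities` and the 0.9 default threshold are illustrative choices, not part of Eland.

```python
from typing import Any


def confident_entities(
    result: dict[str, Any], min_probability: float = 0.9
) -> list[tuple[str, str]]:
    """Return (entity text, class name) pairs at or above min_probability.

    Assumes `result` is one NER inference result with an "entities" list,
    e.g. {"entity": ..., "class_name": "PER", "class_probability": 0.99}.
    """
    return [
        (item["entity"], item["class_name"])
        for item in result.get("entities", [])
        if item["class_probability"] >= min_probability
    ]
```

Results with no `entities` key simply yield an empty list, so the helper can be applied to every entry of a batch response without special-casing.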

For more details on using Eland with machine learning models (including scikit-learn, XGBoost and LightGBM models, which are not covered here), read the detailed "Accessing machine learning models in Elastic" blog post and refer to the Eland documentation.

Data frames in Eland

The other main functionality of Eland is exploring Elasticsearch data using a Pandas-like API.

Ingesting test data

Let's first index some test data into Elasticsearch. We'll use a fake flights dataset. While you could upload it with the Python Elasticsearch client, in this post we'll use Kibana's file upload functionality instead, which is enough for quick tests.

  1. First, download the dataset https://github.com/elastic/eland/blob/main/tests/flights.json.gz and decompress it (gunzip flights.json.gz).
  2. Next, type "File Upload" in Kibana's search bar and import the flights.json file.
  3. Kibana will show you the resulting fields, with "Cancelled" detected as a boolean, for example. Click on "Import".
  4. On the next screen, choose "flights" for the index name and click "Import" again.

As shown in the screenshot below, you should see that all 13059 documents were successfully ingested into the "flights" index.

File upload UI
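If you would rather take the Python client route mentioned above, a minimal sketch is a generator that turns the downloaded file into bulk-index actions. It assumes flights.json.gz contains one JSON document per line, and the function name `flight_actions` is hypothetical. Without an explicit mapping, Elasticsearch's dynamic mapping picks the field types, which may differ from what Kibana's file upload detects.

```python
import gzip
import json
from typing import Any, Iterator


def flight_actions(path: str, index: str = "flights") -> Iterator[dict[str, Any]]:
    """Yield one bulk-index action per line of a gzipped NDJSON file.

    Assumes the file holds one JSON document per line; blank lines are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield {"_index": index, "_source": json.loads(line)}
```

With the standard `elasticsearch` package you could pass this to `elasticsearch.helpers.bulk(client, flight_actions("flights.json.gz"))`; we assume the Serverless client's `helpers` module offers the same function. Reading the gzip file directly also makes the `gunzip` step unnecessary for this route.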

Connecting to Elasticsearch

Now that we have data to search, let's set up the Elasticsearch Serverless Python client. (While we could use the main Elasticsearch Python client, the Serverless client is usually easier to use, as it only supports Elasticsearch Serverless features and APIs.) From the Kibana home page, you can select Python, which explains how to install the Elasticsearch Serverless Python client, create an API key, and use it in your code. You should end up with this code:

from elasticsearch_serverless import Elasticsearch

client = Elasticsearch(
    "https://[...].es.eu-west-1.aws.elastic.cloud:443",
    api_key="your_api_key"
)

print(client.info())
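Hard-coding the API key is fine for a quick test, but you can also reuse the ES_URL and API_KEY variables exported earlier. In this minimal sketch, `client_settings` is a hypothetical helper, and we assume the client accepts `hosts` and `api_key` keyword arguments (the snippet above passes the URL positionally, which we assume maps to `hosts`).

```python
import os
from typing import Mapping


def client_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the connection details exported earlier (ES_URL, API_KEY)."""
    missing = [name for name in ("ES_URL", "API_KEY") if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a connection error later.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"hosts": env["ES_URL"], "api_key": env["API_KEY"]}
```

Usage would then be `client = Elasticsearch(**client_settings())`, keeping the key out of notebooks and source control.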

Searching data with Eland

Finally, assuming that the code above worked, we can start using Eland. After installing it with python -m pip install "eland>=8.14" (the quotes stop the shell from treating >= as an output redirection), we can start exploring our flights dataset.

import eland as ed
from elasticsearch_serverless import Elasticsearch

client = Elasticsearch("https://...", api_key="...")
df = ed.DataFrame(client, es_index_pattern="flights")
df.head()

If you run this code in a notebook, the result will be the following table:

| AvgTicketPrice | Cancelled | Carrier | Dest | DestAirportID | DestCityName | DestCountry | DestLocation.lat | DestLocation.lon | DestRegion | ... | Origin | OriginAirportID | OriginCityName | OriginCountry | OriginLocation.lat | OriginLocation.lon | OriginRegion | OriginWeather | dayOfWeek | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 882.982662 | False | Logstash Airways | Venice Marco Polo Airport | VE05 | Venice | IT | 45.505299 | 12.3519 | IT-34 | ... | Cape Town International Airport | CPT | Cape Town | ZA | -33.96480179 | 18.60169983 | SE-BD | Clear | 0 | 2018-01-01T18:27:00 |
| 730.041778 | False | Kibana Airlines | Xi'an Xianyang International Airport | XIY | Xi'an | CN | 34.447102 | 108.751999 | SE-BD | ... | Licenciado Benito Juarez International Airport | AICM | Mexico City | MX | 19.4363 | -99.072098 | MX-DIF | Damaging Wind | 0 | 2018-01-01T05:13:00 |
| 841.265642 | False | Kibana Airlines | Sydney Kingsford Smith International Airport | SYD | Sydney | AU | -33.94609833 | 151.177002 | SE-BD | ... | Frankfurt am Main Airport | FRA | Frankfurt am Main | DE | 50.033333 | 8.570556 | DE-HE | Sunny | 0 | 2018-01-01T00:00:00 |
| 181.694216 | True | Kibana Airlines | Treviso-Sant'Angelo Airport | TV01 | Treviso | IT | 45.648399 | 12.1944 | IT-34 | ... | Naples International Airport | NA01 | Naples | IT | 40.886002 | 14.2908 | IT-72 | Thunder & Lightning | 0 | 2018-01-01T10:33:28 |
| 552.917371 | False | Logstash Airways | Luis Munoz Marin International Airport | SJU | San Juan | PR | 18.43939972 | -66.00180054 | PR-U-A | ... | Ciampino___G. B. Pastine International Airport | RM12 | Rome | IT | 41.7994 | 12.5949 | IT-62 | Cloudy | 0 | 2018-01-01T17:42:53 |

You can also run more complex queries such as aggregations:

df[["DistanceKilometers", "AvgTicketPrice"]].aggregate(["sum", "min", "std"])

which outputs the following:

| | DistanceKilometers | AvgTicketPrice |
|---|---|---|
| sum | 9.261629e+07 | 8.204365e+06 |
| min | 0.000000e+00 | 1.000205e+02 |
| std | 4.578614e+03 | 2.664071e+02 |
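Eland does not download the documents to compute these figures; it pushes the work to Elasticsearch as aggregations. The exact request Eland builds is an internal detail, so the following is only a rough sketch of an equivalent search body. The helper `aggregation_body` and the `METRIC_AGGS` table are our own illustration. Because Elasticsearch has no single-value standard deviation aggregation, the sketch uses `extended_stats`, whose response includes standard deviation figures.

```python
from typing import Any

# Illustrative mapping from pandas-style names to Elasticsearch metric
# aggregation types (our assumption, not Eland's internal table).
METRIC_AGGS = {"sum": "sum", "min": "min", "max": "max", "mean": "avg", "std": "extended_stats"}


def aggregation_body(fields: list[str], funcs: list[str]) -> dict[str, Any]:
    """Build a search body computing each func over each field, returning no hits."""
    aggs = {
        f"{func}_{field}": {METRIC_AGGS[func]: {"field": field}}
        for field in fields
        for func in funcs
    }
    return {"size": 0, "aggs": aggs}
```

With a connected client, something like `client.search(index="flights", **aggregation_body(["DistanceKilometers", "AvgTicketPrice"], ["sum", "min", "std"]))` would return the metrics under `aggregations`, computed server-side over all documents.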

The demo notebook in the Eland documentation has many more examples that use the same dataset, and the reference documentation lists all supported operations.
