Trained models
editTrained models
editWhen you use a data frame analytics job to perform classification or regression analysis, it creates a machine learning model that is trained and tested against a labelled data set. When you are satisfied with your trained model, you can use it to make predictions against new data. For example, you can use it in the processor of an ingest pipeline or in a pipeline aggregation within a search query. For more information about this process, see Introduction to supervised learning and inference for classification and regression.
You can also supply trained models that are not created by data frame analytics job but adhere to the appropriate JSON schema. Likewise, you can use third-party models to perform natural language processing (NLP) tasks. If you want to use these trained models in the Elastic Stack, you must store them in Elasticsearch documents by using the create trained models API. For more information about NLP models, refer to Deploy trained models.
In Kibana, you can view and manage your trained models in Stack Management > Alerts and Insights > Machine Learning and Machine Learning > Model Management:
Alternatively, you can use APIs like get trained models and delete trained models.
Exporting and importing models
editModels trained in Elasticsearch are portable and can be transferred between
clusters. This is particularly useful when models are trained in isolation from
the cluster where they are used for inference. The following instructions show
how to use curl
and jq
to
export a model as JSON and import it to another cluster.
-
Given a model name, find the model ID. You can use
curl
to call the get trained model API to list all models with their IDs.curl -s -u username:password \ -X GET "http://localhost:9200/_ml/trained_models" \ | jq . -C \ | more
If you want to show just the model IDs available, use
jq
to select a subset.curl -s -u username:password \ -X GET "http://localhost:9200/_ml/trained_models" \ | jq -C -r '.trained_model_configs[].model_id'
flights1-1607953694065 flights0-1607953585123 lang_ident_model_1
In this example, you are exporting the model with ID
flights1-1607953694065
. -
Using
curl
from the command line, again use the get trained models API to export the entire model definition and save it to a JSON file.curl -u username:password \ -X GET "http://localhost:9200/_ml/trained_models/flights1-1607953694065?exclude_generated=true&include=definition&decompress_definition=false" \ | jq '.trained_model_configs[0] | del(.model_id)' \ > flights1.json
A few observations:
-
Exporting models requires using
curl
or a similar tool that can stream the model over HTTP into a file. If you use the Kibana Console, the browser might be unresponsive due to the size of exported models. - Note the query parameters that are used during export. These parameters are necessary to export the model in a way that it can later be imported again and used for inference.
-
You must unnest the JSON object by one level to extract just the model
definition. You must also remove the existing model ID in order to not have
ID collisions when you import again. You can do these steps using
jq
inline or alternatively it can be done to the resulting JSON file after downloading usingjq
or other tools.
-
Exporting models requires using
-
Import the saved model using
curl
to upload the JSON file to the created trained model API. When you specify the URL, you can also set the model ID to something new using the last path part of the URL.curl -u username:password \ -H 'Content-Type: application/json' \ -X PUT "http://localhost:9200/_ml/trained_models/flights1-imported" \ --data-binary @flights1.json
-
Models exported from the get trained models API
are limited in size by the
http.max_content_length
global configuration value in Elasticsearch. The default value is
100mb
and may need to be increased depending on the size of model being exported. -
Connection timeouts can occur, for example, when model sizes are very large or
your cluster is under load. If needed, you can increase
timeout configurations for
curl
(for example,curl --max-time 600
) or your client of choice.
If you also want to copy the data frame analytics job to the new cluster, you can export and import jobs in the Stack Management app in Kibana. Refer to Exporting and importing machine learning jobs.
Importing an external model to the Elastic Stack
editIt is possible to import a model to your Elasticsearch cluster even if the model is not trained by Elastic data frame analytics. Eland supports importing models directly through its APIs. Please refer to the latest Eland documentation for more information on supported model types and other details of using Eland to import models with.