Create trained model vocabulary API
Creates a trained model vocabulary. This is supported only for natural language processing (NLP) models.
Request
PUT _ml/trained_models/<model_id>/vocabulary/
Prerequisites
Requires the manage_ml cluster privilege. This privilege is included in the machine_learning_admin built-in role.
Description
The vocabulary is stored in the index described by the inference_config.*.vocabulary setting of the trained model definition.
Path parameters
<model_id>
    (Required, string) The unique identifier of the trained model.
Request body
vocabulary
    (array) The model vocabulary. Must not be empty.
merges
    (Optional, array) The model merges used in byte-pair encoding. The merges must be sub-token pairs, space delimited, and in order of preference. Example: ["f o", "fo o"]. Must be provided for RoBERTa- and BART-style models.
scores
    (Optional, array) Vocabulary value scores used by sentence-piece tokenization. Must have the same length as vocabulary. Required for unigram sentence-piece tokenized models such as XLM-RoBERTa and T5.
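The constraints on the request body can be checked on the client before calling the API. The following Python sketch uses a hypothetical helper, build_vocabulary_body, that assembles the JSON body and raises ValueError when one of the documented rules is broken. It assumes "space delimited" means exactly one space between two non-empty sub-tokens. Elasticsearch still performs its own validation, and the helper does not know which model type it is used for, so it cannot enforce that merges are present for RoBERTa- and BART-style models or that scores are present for unigram models.

```python
from typing import Optional, Sequence


def build_vocabulary_body(
    vocabulary: Sequence[str],
    merges: Optional[Sequence[str]] = None,
    scores: Optional[Sequence[float]] = None,
) -> dict:
    """Hypothetical client-side helper: build a vocabulary API request body.

    Checks only the rules listed for the request body; the cluster
    applies its own, possibly stricter, validation.
    """
    # vocabulary: must not be empty.
    if not vocabulary:
        raise ValueError("vocabulary must not be empty")
    body = {"vocabulary": list(vocabulary)}

    if merges is not None:
        # Assumption: each merge is two non-empty sub-tokens joined by one space, e.g. "fo o".
        for merge in merges:
            parts = merge.split(" ")
            if len(parts) != 2 or not all(parts):
                raise ValueError(f"merge {merge!r} is not a space-delimited sub-token pair")
        # Merges are listed in order of preference, so the original order is preserved.
        body["merges"] = list(merges)

    if scores is not None:
        # scores: must have the same length as vocabulary.
        if len(scores) != len(vocabulary):
            raise ValueError(
                f"scores has {len(scores)} entries but vocabulary has {len(vocabulary)}"
            )
        body["scores"] = list(scores)

    return body
```

For example, build_vocabulary_body(["f", "o", "fo", "foo"], merges=["f o", "fo o"]) returns a body with both fields, while passing an empty vocabulary or a scores list of the wrong length raises ValueError.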
Examples
The following example shows how to create a model vocabulary for a previously stored trained model configuration.
PUT _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/vocabulary
{
  "vocabulary": [
    "[PAD]",
    "[unused0]",
    ...
  ]
}
The API returns the following results:
{
  "acknowledged": true
}
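Outside the Kibana console, the same request can be built with the Python standard library. This sketch assumes a cluster reachable at http://localhost:9200 and uses a hypothetical helper, vocabulary_request. It only builds the request and does not send it, and it adds no authentication, so you must supply credentials for a user with the manage_ml privilege yourself. Any vocabulary passed to it here is illustrative, not the full model vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the cluster address; adjust for your deployment.
ES_URL = "http://localhost:9200"


def vocabulary_request(model_id: str, body: dict, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request for the create trained model vocabulary API.

    No authentication header is added; the calling user needs the manage_ml privilege.
    """
    # Percent-encode the model ID so it is safe to place in the URL path.
    path = f"/_ml/trained_models/{urllib.parse.quote(model_id, safe='')}/vocabulary"
    # Serialize the request body (vocabulary, plus optional merges or scores) as JSON.
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To send it, pass the object to urllib.request.urlopen and decode the JSON response; a successful call returns {"acknowledged": true}, as in the example response above.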