Feature encoding

Machine learning models can only work with numerical values. For this reason, it is necessary to transform the categorical values of the relevant features into numerical ones. This process is called feature encoding.
Data frame analytics automatically performs feature encoding. The input data is pre-processed with the following encoding techniques (a minimal sketch of all three follows this list):
- one-hot encoding: Represents each categorical value as a vector with one element per category; an element is 1 if the corresponding category is present and 0 otherwise.
- target-mean encoding: Replaces each categorical value with the mean value of the target variable for that category.
- frequency encoding: Replaces each categorical value with a measure of how frequently it occurs for the feature.
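The following Python sketch illustrates the general idea behind the three techniques on a toy data set with a hypothetical categorical feature `color` and numeric target `price`. It is only an illustration of the concepts, not the implementation used by data frame analytics.

```python
from collections import Counter, defaultdict

# Toy training data (hypothetical field names).
rows = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 20.0},
    {"color": "red", "price": 12.0},
    {"color": "green", "price": 30.0},
]

# Fixed category order so the one-hot vectors are reproducible.
categories = sorted({r["color"] for r in rows})   # ['blue', 'green', 'red']

# One-hot encoding: one 0/1 indicator per category.
def one_hot(value):
    return [1 if value == c else 0 for c in categories]

# Target-mean encoding: replace a category with the mean target value
# observed for that category in the training data.
sums, counts = defaultdict(float), Counter()
for r in rows:
    sums[r["color"]] += r["price"]
    counts[r["color"]] += 1
target_mean = {c: sums[c] / counts[c] for c in categories}

# Frequency encoding: replace a category with how often it occurs
# relative to the total number of documents.
frequency = {c: counts[c] / len(rows) for c in categories}

print(one_hot("red"))      # [0, 0, 1]
print(target_mean["red"])  # 11.0
print(frequency["red"])    # 0.5
```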
When the model makes predictions on new data, that data needs to be processed in the same way the training data was processed. Machine learning model inference in the Elastic Stack does this automatically, so the automatically applied encodings are used in each call for inference. Refer to inference for classification and regression.
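Continuing the sketch above, the mappings learned at training time would be reused to encode a new document before prediction. In the Elastic Stack the trained model stores these mappings and applies them for you when you call inference, so the helper below is purely illustrative.

```python
# Purely illustrative: reuse the encodings fitted at training time
# (one_hot, target_mean, frequency from the sketch above) on a new document.
def encode_for_inference(doc):
    value = doc["color"]
    return {
        "color_one_hot": one_hot(value),
        # Unseen categories fall back to None / 0.0 in this sketch;
        # the real implementation handles them internally.
        "color_target_mean": target_mean.get(value),
        "color_frequency": frequency.get(value, 0.0),
    }

print(encode_for_inference({"color": "blue"}))
# {'color_one_hot': [1, 0, 0], 'color_target_mean': 20.0, 'color_frequency': 0.25}
```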
Feature importance is calculated for the original categorical fields, not the automatically encoded features.