Hyperparameter optimization
When you create a data frame analytics job for classification or regression analysis, there are advanced configuration options known as hyperparameters. The ideal hyperparameter values vary from one data set to another. Therefore, by default the job calculates the best combination of values through a process of hyperparameter optimization.
Hyperparameter optimization involves multiple rounds of analysis. Each round involves a different combination of hyperparameter values, which are determined through a combination of random search and Bayesian optimization techniques. If you explicitly set a hyperparameter, that value is not optimized and remains the same in each round. To determine which round produces the best results, stratified K-fold cross-validation methods are used to split the data set, train a model, and calculate its performance on validation data.
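The round-based search and cross-validation described above can be illustrated with a minimal Python sketch. It assumes plain random search only (no Bayesian optimization step), a uniform range for each tunable hyperparameter, and a caller-supplied `loss_fn`; the names `stratified_k_folds`, `optimize`, and `loss_fn` are hypothetical and are not part of any Elastic API. Values passed in `fixed` play the role of explicitly set hyperparameters: they are copied into every round and never sampled.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_k_folds(labels, k, seed=0):
    """Split row indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's rows round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def optimize(labels, search_space, fixed, loss_fn, rounds=10, k=3, seed=0):
    """Hypothetical random-search loop; returns (mean validation loss, params)."""
    rng = random.Random(seed)
    folds = stratified_k_folds(labels, k, seed)
    best = None
    for _ in range(rounds):
        # Explicitly set hyperparameters stay the same in every round.
        params = dict(fixed)
        for name, (low, high) in search_space.items():
            if name not in fixed:
                params[name] = rng.uniform(low, high)
        # Train on k-1 folds and score on the held-out validation fold.
        fold_losses = []
        for held_out in range(k):
            validation = folds[held_out]
            training = [i for f in range(k) if f != held_out for i in folds[f]]
            fold_losses.append(loss_fn(params, training, validation))
        score = mean(fold_losses)
        if best is None or score < best[0]:
            best = (score, params)
    return best
```

For example, calling `optimize(labels, {"eta": (0.0, 1.0), "lambda": (0.0, 10.0)}, {"lambda": 1.0}, loss_fn)` samples only `eta` and keeps `lambda` at 1.0 in every round, mirroring how an explicitly set hyperparameter is excluded from optimization.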
You can view the hyperparameter values that were ultimately chosen by expanding the job details in Kibana or by using the get trained models API. You can also see the specific type of validation loss (such as mean squared error or binomial cross entropy) that was used to compare each round of optimization using the get data frame analytics job stats API.
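As a sketch of how you might call these APIs outside Kibana, the following Python helpers build the HTTP requests with the standard library. The paths `/_ml/trained_models/<model_id>` and `/_ml/data_frame/analytics/<job_id>/_stats` are the endpoints of the get trained models and get data frame analytics job stats APIs; the `include=hyperparameters` query parameter assumes an Elasticsearch version whose get trained models API supports it. The base URL, job ID, model ID, and helper names are hypothetical placeholders, and authentication and TLS are deliberately left out.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_get(base_url: str, path: str, query: Optional[dict] = None) -> urllib.request.Request:
    """Build an unauthenticated GET request against an Elasticsearch node."""
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, method="GET")


def job_stats_request(base_url: str, job_id: str) -> urllib.request.Request:
    """Get data frame analytics job stats (reports the validation loss type)."""
    path = "/_ml/data_frame/analytics/" + urllib.parse.quote(job_id, safe="") + "/_stats"
    return build_get(base_url, path)


def trained_model_request(base_url: str, model_id: str) -> urllib.request.Request:
    """Get trained models, asking for hyperparameter details (version-dependent)."""
    path = "/_ml/trained_models/" + urllib.parse.quote(model_id, safe="")
    return build_get(base_url, path, {"include": "hyperparameters"})
```

To send a request, pass it to `urllib.request.urlopen` and decode the body with `json.load`; against a secured cluster you would also need to add credentials, for example an `Authorization` header.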
Different hyperparameters may affect model performance to different degrees. To estimate the importance of the optimized hyperparameters, analysis of variance (ANOVA) decomposition is used. The resulting absolute importance shows how much the variation of a hyperparameter impacts the variation in the validation loss. Relative importance is also computed; it gives the importance of a hyperparameter compared to the rest of the tunable hyperparameters, and the relative importances of all hyperparameters sum to 1. You can check these results in the response of the get data frame analytics job stats API.
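The following Python sketch illustrates the idea behind these two measures with a simplified, main-effect-only variance decomposition over a full grid of trials. It assumes that a hyperparameter's absolute importance is the fraction of validation-loss variance explained by averaging the loss over each of its values, and that relative importance is that fraction normalized so all of them sum to 1. This is an illustration under those assumptions, not the exact computation the job performs; the function name and the sample trials are hypothetical, and `eta` and `lambda` are used only as labels.

```python
from collections import defaultdict
from statistics import mean, pvariance


def main_effect_importance(trials):
    """Estimate per-hyperparameter importance from (params, loss) pairs.

    Simplified one-way decomposition: interaction effects between
    hyperparameters are ignored.
    """
    losses = [loss for _, loss in trials]
    overall = mean(losses)
    total_variance = pvariance(losses)
    absolute = {}
    for name in trials[0][0]:
        # Group validation losses by the value this hyperparameter took.
        groups = defaultdict(list)
        for params, loss in trials:
            groups[params[name]].append(loss)
        # Variance of the group means around the overall mean (main effect).
        between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values()) / len(losses)
        absolute[name] = between / total_variance if total_variance > 0 else 0.0
    # Normalize so the relative importances sum to 1.
    total = sum(absolute.values())
    relative = {name: (value / total if total > 0 else 0.0) for name, value in absolute.items()}
    return absolute, relative


# Illustrative, made-up trials on a 2x2 grid of hyperparameter values.
trials = [
    ({"eta": 0.1, "lambda": 1.0}, 0.0),
    ({"eta": 0.1, "lambda": 10.0}, 1.0),
    ({"eta": 0.5, "lambda": 1.0}, 4.0),
    ({"eta": 0.5, "lambda": 10.0}, 5.0),
]
absolute, relative = main_effect_importance(trials)
```

In this made-up grid, `eta` explains about 94% of the loss variance and `lambda` about 6%, so `eta` would be reported as by far the more important hyperparameter.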
Unless you fully understand the purpose of a hyperparameter, it is highly recommended that you leave it unset and allow hyperparameter optimization to occur.