Feature importance

Feature importance values indicate which fields had the biggest impact on each prediction that is generated by classification or regression analysis. Each feature of a data point contributes to a particular prediction to a varying degree; feature importance quantifies that contribution. A feature importance value can be either positive or negative depending on its effect on the prediction: if the feature reduces the prediction value, its feature importance is negative; if it increases the prediction, its feature importance is positive. The magnitude of the value shows how strongly the feature affects the prediction for a given data point.

Feature importance in the Elastic Stack is calculated using the SHAP (SHapley Additive exPlanations) method as described in Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017.
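For intuition, the SHAP method assigns each feature an additive contribution: the per-document importance values sum to the difference between the prediction for that document and an average (baseline) prediction. The notation below is introduced here only for illustration and does not appear in the Elastic output.

```latex
% Sketch of the additive decomposition behind SHAP-based feature importance.
% \hat{y}(x)  : prediction for a single data point x
% \phi_0      : baseline (average) prediction
% \phi_i(x)   : feature importance value of feature i for data point x
% M           : number of features
\hat{y}(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x)
```

In these terms, a feature with a negative value pulls the prediction below the baseline, a feature with a positive value pushes it above, and the size of the value is the magnitude of feature importance described above.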

When you configure a job via the API, feature importance values are not calculated by default; to generate them, you must specify the num_top_feature_importance_values property when you create the data frame analytics job. When you configure the job in Kibana, feature importance values are calculated automatically. The values are stored in the machine learning results field for each document in the destination index.
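
As a sketch, a create data frame analytics job request that enables feature importance might look like the following. The job ID, index names, and field names are hypothetical examples; num_top_feature_importance_values is the property discussed above.

```console
PUT _ml/data_frame/analytics/model-flight-delays
{
  "source": { "index": "flight-data" },
  "dest":   { "index": "model-flight-delays-dest" },
  "analysis": {
    "regression": {
      "dependent_variable": "delay_minutes",
      "num_top_feature_importance_values": 5
    }
  }
}
```

With this setting, up to five feature importance values are calculated and stored for each document that the job writes to the destination index.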

The number of feature importance values for each document might be less than the num_top_feature_importance_values property value, because only features that had a positive or negative effect on the prediction are returned.
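
For illustration, a document written to the destination index of a regression job might then contain a machine learning results object similar to the sketch below. The field names (ml, delay_minutes_prediction, the entries inside feature_importance) follow the hypothetical job above, and the exact shape of the output varies by analysis type and stack version, so treat this as an approximation rather than the exact format.

```json
{
  "ml": {
    "delay_minutes_prediction": 42.7,
    "feature_importance": [
      { "feature_name": "distance_miles", "importance": 12.3 },
      { "feature_name": "carrier",        "importance": -4.1 }
    ]
  }
}
```

Here only two values appear even though five were requested, because the remaining features had no positive or negative effect on this particular prediction.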