Evaluate data frame analytics API

Evaluates the data frame analytics for an annotated index.

Request

POST _ml/data_frame/_evaluate

Prerequisites

Requires the following privileges:

  • cluster: monitor_ml (the machine_learning_user built-in role grants this privilege)
  • destination index: read
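
For reference, a role granting both privileges can be created with the security API. A minimal sketch using the Python client (the role and index names below are placeholders):

resp = client.security.put_role(
    name="df_analytics_evaluator",    # hypothetical role name
    cluster=["monitor_ml"],           # cluster privilege required by this API
    indices=[
        {
            "names": ["my_analytics_dest_index"],  # the destination index to evaluate
            "privileges": ["read"]
        }
    ],
)
print(resp)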

Description

The API packages together commonly used evaluation metrics for various types of machine learning features. It is designed for use on indexes created by data frame analytics. Evaluation requires both a ground truth field and an analytics result field to be present.

Request body

evaluation

(Required, object) Defines the type of evaluation you want to perform. See Data frame analytics evaluation resources.

Available evaluation types:

  • outlier_detection
  • regression
  • classification
index
(Required, object) Defines the index in which the evaluation will be performed.
query
(Optional, object) A query clause that retrieves a subset of data from the source index. See Query DSL.
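
Putting these together, a minimal sketch of a request (the index and field names below are placeholders) looks like:

resp = client.ml.evaluate_data_frame(
    index="my_analytics_dest_index",    # required: index holding the analytics results
    query={                             # optional: evaluate only a subset of documents
        "term": {"ml.is_training": {"value": False}}
    },
    evaluation={                        # required: exactly one evaluation type
        "outlier_detection": {
            "actual_field": "is_outlier",
            "predicted_probability_field": "ml.outlier_score"
        }
    },
)
print(resp)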

Data frame analytics evaluation resources

Outlier detection evaluation objects

Outlier detection evaluates the results of an outlier detection analysis which outputs the probability that each document is an outlier.

actual_field
(Required, string) The field of the index which contains the ground truth. The data type of this field can be boolean or integer. If the data type is integer, the value has to be either 0 (false) or 1 (true).
predicted_probability_field
(Required, string) The field of the index that contains the probability that the item belongs to the class in question. This is the field that contains the results of the analysis.
metrics

(Optional, object) Specifies the metrics that are used for the evaluation. If no metrics are specified, the following are returned by default:

  • auc_roc (include_curve: false),
  • precision (at: [0.25, 0.5, 0.75]),
  • recall (at: [0.25, 0.5, 0.75]),
  • confusion_matrix (at: [0.25, 0.5, 0.75]).

    auc_roc
    (Optional, object) The AUC ROC (area under the curve of the receiver operating characteristic) score and optionally the curve. Default value is {"include_curve": false}.
    confusion_matrix
    (Optional, object) Sets the thresholds of the outlier score at which the metrics (tp - true positive, fp - false positive, tn - true negative, fn - false negative) are calculated. Default value is {"at": [0.25, 0.50, 0.75]}.
    precision
    (Optional, object) Sets the thresholds of the outlier score at which the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.
    recall
    (Optional, object) Sets the thresholds of the outlier score at which the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.
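
For example, to override the defaults and request the ROC curve points plus precision at a single custom threshold (the index and field names below are placeholders), a sketch looks like:

resp = client.ml.evaluate_data_frame(
    index="my_analytics_dest_index",
    evaluation={
        "outlier_detection": {
            "actual_field": "is_outlier",
            "predicted_probability_field": "ml.outlier_score",
            "metrics": {
                "auc_roc": {"include_curve": True},  # return the curve points, not just the score
                "precision": {"at": [0.9]}           # precision at a single custom threshold
            }
        }
    },
)
print(resp)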

Regression evaluation objects

Regression evaluation evaluates the results of a regression analysis which outputs predicted values.

actual_field
(Required, string) The field of the index which contains the ground truth. The data type of this field must be numerical.
predicted_field
(Required, string) The field in the index that contains the predicted value, in other words the results of the regression analysis.
metrics

(Optional, object) Specifies the metrics that are used for the evaluation. For more information on mse, msle, and huber, consult the Jupyter notebook on regression loss functions. If no metrics are specified, the following are returned by default:

  • mse,
  • r_squared,
  • huber (delta: 1.0).

    mse
    (Optional, object) Average squared difference between the predicted values and the actual (ground truth) values. For more information, read this wiki article.
    msle

    (Optional, object) Average squared difference between the logarithm of the predicted values and the logarithm of the actual (ground truth) values.

    offset
    (Optional, double) Defines the transition point at which you switch from minimizing quadratic error to minimizing quadratic log error. Defaults to 1.
    huber

    (Optional, object) Pseudo-Huber loss function. For more information, read this wiki article.

    delta
    (Optional, double) Approximates 1/2 (prediction - actual)² for values much less than delta and approximates a straight line with slope delta for values much larger than delta. Defaults to 1. delta must be greater than 0.
    r_squared
    (Optional, object) Proportion of the variance in the dependent variable that is predictable from the independent variables. For more information, read this wiki article.
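
To make the roles of offset and delta concrete, here is a minimal local illustration of these loss formulas in Python (a sketch of the definitions above, not the server-side implementation):

import math

def mse(actual, predicted):
    # average squared difference between predictions and ground truth
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def msle(actual, predicted, offset=1.0):
    # squared difference of logarithms; offset shifts both values before the log
    return sum((math.log(p + offset) - math.log(a + offset)) ** 2
               for a, p in zip(actual, predicted)) / len(actual)

def pseudo_huber(actual, predicted, delta=1.0):
    # ~ (a - p)² / 2 for errors much smaller than delta,
    # ~ a straight line with slope delta for errors much larger than delta
    return sum(delta ** 2 * (math.sqrt(1 + ((a - p) / delta) ** 2) - 1)
               for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.5, 10.0]
predicted = [2.5, 6.0, 12.0]
print(mse(actual, predicted))
print(msle(actual, predicted, offset=10))
print(pseudo_huber(actual, predicted, delta=1.5))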

Classification evaluation objects

Classification evaluation evaluates the results of a classification analysis which outputs a prediction that identifies to which of the classes each document belongs.

actual_field
(Required, string) The field of the index which contains the ground truth. The data type of this field must be categorical.
predicted_field
(Optional, string) The field in the index which contains the predicted value, in other words the results of the classification analysis.
top_classes_field
(Optional, string) The field of the index which is an array of documents of the form { "class_name": XXX, "class_probability": YYY }. This field must be defined as nested in the mappings.
metrics

(Optional, object) Specifies the metrics that are used for the evaluation. If no metrics are specified, the following are returned by default:

  • accuracy,
  • multiclass_confusion_matrix,
  • precision,
  • recall.

    accuracy
    (Optional, object) Accuracy of predictions (per-class and overall).
    auc_roc

    (Optional, object) The AUC ROC (area under the curve of the receiver operating characteristic) score and optionally the curve. It is calculated for a specific class (provided as "class_name") treated as positive.

    class_name
    (Required, string) Name of the only class that is treated as positive during AUC ROC calculation. Other classes are treated as negative ("one-vs-all" strategy). All the evaluated documents must have class_name in the list of their top classes.
    include_curve
    (Optional, Boolean) Whether or not the curve should be returned in addition to the score. Default value is false.
    multiclass_confusion_matrix

    (Optional, object) Multiclass confusion matrix.

    size
    (Optional, double) Specifies the size of the multiclass confusion matrix. Defaults to 10, which results in a 10x10 matrix.
    precision
    (Optional, object) Precision of predictions (per-class and average).
    recall
    (Optional, object) Recall of predictions (per-class and average).
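
Combining these fields, a minimal sketch of a classification evaluation that relies on the default metrics (the index and field names below are placeholders, including the assumed top classes field):

resp = client.ml.evaluate_data_frame(
    index="my_analytics_dest_index",
    evaluation={
        "classification": {
            "actual_field": "label",
            "predicted_field": "ml.label_prediction",
            "top_classes_field": "ml.top_classes"  # nested array of class_name/class_probability docs
        }
    },
)
# with no metrics specified, accuracy, multiclass_confusion_matrix,
# precision, and recall are returned
print(resp)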

Examples

Outlier detection

Python:

resp = client.ml.evaluate_data_frame(
    index="my_analytics_dest_index",
    evaluation={
        "outlier_detection": {
            "actual_field": "is_outlier",
            "predicted_probability_field": "ml.outlier_score"
        }
    },
)
print(resp)

Ruby:

response = client.ml.evaluate_data_frame(
  body: {
    index: 'my_analytics_dest_index',
    evaluation: {
      outlier_detection: {
        actual_field: 'is_outlier',
        predicted_probability_field: 'ml.outlier_score'
      }
    }
  }
)
puts response

JavaScript:

const response = await client.ml.evaluateDataFrame({
  index: "my_analytics_dest_index",
  evaluation: {
    outlier_detection: {
      actual_field: "is_outlier",
      predicted_probability_field: "ml.outlier_score",
    },
  },
});
console.log(response);

Console:

POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}

The API returns the following results:

{
  "outlier_detection": {
    "auc_roc": {
      "value": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
          "tp": 5,
          "fp": 9,
          "tn": 204,
          "fn": 5
      },
      "0.5": {
          "tp": 1,
          "fp": 5,
          "tn": 208,
          "fn": 9
      },
      "0.75": {
          "tp": 0,
          "fp": 4,
          "tn": 209,
          "fn": 10
      }
    },
    "precision": {
        "0.25": 0.35714285714285715,
        "0.5": 0.16666666666666666,
        "0.75": 0
    },
    "recall": {
        "0.25": 0.5,
        "0.5": 0.1,
        "0.75": 0
    }
  }
}
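
As a sanity check, the precision and recall values follow directly from the confusion matrix counts; for example, at the 0.25 threshold:

# precision = tp / (tp + fp), recall = tp / (tp + fn)
tp, fp, fn = 5, 9, 5
print(tp / (tp + fp))  # 0.35714... matches "precision" at 0.25
print(tp / (tp + fn))  # 0.5 matches "recall" at 0.25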

Regression

Python:

resp = client.ml.evaluate_data_frame(
    index="house_price_predictions",
    query={
        "bool": {
            "filter": [
                {
                    "term": {
                        "ml.is_training": False
                    }
                }
            ]
        }
    },
    evaluation={
        "regression": {
            "actual_field": "price",
            "predicted_field": "ml.price_prediction",
            "metrics": {
                "r_squared": {},
                "mse": {},
                "msle": {
                    "offset": 10
                },
                "huber": {
                    "delta": 1.5
                }
            }
        }
    },
)
print(resp)

Ruby:

response = client.ml.evaluate_data_frame(
  body: {
    index: 'house_price_predictions',
    query: {
      bool: {
        filter: [
          {
            term: {
              'ml.is_training' => false
            }
          }
        ]
      }
    },
    evaluation: {
      regression: {
        actual_field: 'price',
        predicted_field: 'ml.price_prediction',
        metrics: {
          r_squared: {},
          mse: {},
          msle: {
            offset: 10
          },
          huber: {
            delta: 1.5
          }
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.ml.evaluateDataFrame({
  index: "house_price_predictions",
  query: {
    bool: {
      filter: [
        {
          term: {
            "ml.is_training": false,
          },
        },
      ],
    },
  },
  evaluation: {
    regression: {
      actual_field: "price",
      predicted_field: "ml.price_prediction",
      metrics: {
        r_squared: {},
        mse: {},
        msle: {
          offset: 10,
        },
        huber: {
          delta: 1.5,
        },
      },
    },
  },
});
console.log(response);

Console:

POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", 
  "query": {
      "bool": {
        "filter": [
          { "term":  { "ml.is_training": false } } 
        ]
      }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", 
      "predicted_field": "ml.price_prediction", 
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {"offset": 10},
        "huber": {"delta": 1.5}
      }
    }
  }
}

  • index: the output destination index from a data frame analytics regression analysis.
  • query: a test/train split (training_percent) was defined for the regression analysis; this query limits evaluation to the test split only.
  • actual_field: the ground truth value for the actual house price. This is required in order to evaluate results.
  • predicted_field: the predicted house price calculated by the regression analysis.

The following example calculates the training error:

Python:

resp = client.ml.evaluate_data_frame(
    index="student_performance_mathematics_reg",
    query={
        "term": {
            "ml.is_training": {
                "value": True
            }
        }
    },
    evaluation={
        "regression": {
            "actual_field": "G3",
            "predicted_field": "ml.G3_prediction",
            "metrics": {
                "r_squared": {},
                "mse": {},
                "msle": {},
                "huber": {}
            }
        }
    },
)
print(resp)

Ruby:

response = client.ml.evaluate_data_frame(
  body: {
    index: 'student_performance_mathematics_reg',
    query: {
      term: {
        'ml.is_training' => {
          value: true
        }
      }
    },
    evaluation: {
      regression: {
        actual_field: 'G3',
        predicted_field: 'ml.G3_prediction',
        metrics: {
          r_squared: {},
          mse: {},
          msle: {},
          huber: {}
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.ml.evaluateDataFrame({
  index: "student_performance_mathematics_reg",
  query: {
    term: {
      "ml.is_training": {
        value: true,
      },
    },
  },
  evaluation: {
    regression: {
      actual_field: "G3",
      predicted_field: "ml.G3_prediction",
      metrics: {
        r_squared: {},
        mse: {},
        msle: {},
        huber: {},
      },
    },
  },
});
console.log(response);

Console:

POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true 
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", 
      "predicted_field": "ml.G3_prediction", 
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}

  • query: a test/train split (training_percent) was defined for the regression analysis; this query limits evaluation to the train split only, so a training error is calculated.
  • actual_field: the field that contains the ground truth value for the actual student performance. This is required in order to evaluate results.
  • predicted_field: the field that contains the predicted student performance calculated by the regression analysis.

The next example calculates the testing error. The only difference compared with the previous example is that ml.is_training is set to false this time, so the query excludes the train split from the evaluation.

Python:

resp = client.ml.evaluate_data_frame(
    index="student_performance_mathematics_reg",
    query={
        "term": {
            "ml.is_training": {
                "value": False
            }
        }
    },
    evaluation={
        "regression": {
            "actual_field": "G3",
            "predicted_field": "ml.G3_prediction",
            "metrics": {
                "r_squared": {},
                "mse": {},
                "msle": {},
                "huber": {}
            }
        }
    },
)
print(resp)

Ruby:

response = client.ml.evaluate_data_frame(
  body: {
    index: 'student_performance_mathematics_reg',
    query: {
      term: {
        'ml.is_training' => {
          value: false
        }
      }
    },
    evaluation: {
      regression: {
        actual_field: 'G3',
        predicted_field: 'ml.G3_prediction',
        metrics: {
          r_squared: {},
          mse: {},
          msle: {},
          huber: {}
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.ml.evaluateDataFrame({
  index: "student_performance_mathematics_reg",
  query: {
    term: {
      "ml.is_training": {
        value: false,
      },
    },
  },
  evaluation: {
    regression: {
      actual_field: "G3",
      predicted_field: "ml.G3_prediction",
      metrics: {
        r_squared: {},
        mse: {},
        msle: {},
        huber: {},
      },
    },
  },
});
console.log(response);

Console:

POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false 
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", 
      "predicted_field": "ml.G3_prediction", 
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}

  • query: a test/train split (training_percent) was defined for the regression analysis; this query limits evaluation to the test split only, so a testing error is calculated.
  • actual_field: the field that contains the ground truth value for the actual student performance. This is required in order to evaluate results.
  • predicted_field: the field that contains the predicted student performance calculated by the regression analysis.

Classification

Python:

resp = client.ml.evaluate_data_frame(
    index="animal_classification",
    evaluation={
        "classification": {
            "actual_field": "animal_class",
            "predicted_field": "ml.animal_class_prediction",
            "metrics": {
                "multiclass_confusion_matrix": {}
            }
        }
    },
)
print(resp)

Ruby:

response = client.ml.evaluate_data_frame(
  body: {
    index: 'animal_classification',
    evaluation: {
      classification: {
        actual_field: 'animal_class',
        predicted_field: 'ml.animal_class_prediction',
        metrics: {
          multiclass_confusion_matrix: {}
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.ml.evaluateDataFrame({
  index: "animal_classification",
  evaluation: {
    classification: {
      actual_field: "animal_class",
      predicted_field: "ml.animal_class_prediction",
      metrics: {
        multiclass_confusion_matrix: {},
      },
    },
  },
});
console.log(response);

Console:

POST _ml/data_frame/_evaluate
{
   "index": "animal_classification",
   "evaluation": {
      "classification": { 
         "actual_field": "animal_class", 
         "predicted_field": "ml.animal_class_prediction", 
         "metrics": {
           "multiclass_confusion_matrix" : {} 
         }
      }
   }
}

  • classification: the evaluation type.
  • actual_field: the field that contains the ground truth value for the actual animal classification. This is required in order to evaluate results.
  • predicted_field: the field that contains the predicted animal classification calculated by the classification analysis.
  • multiclass_confusion_matrix: the metric used for the evaluation.

The API returns the following result:

{
   "classification" : {
      "multiclass_confusion_matrix" : {
         "confusion_matrix" : [
         {
            "actual_class" : "cat", 
            "actual_class_doc_count" : 12, 
            "predicted_classes" : [ 
              {
                "predicted_class" : "cat",
                "count" : 12 
              },
              {
                "predicted_class" : "dog",
                "count" : 0 
              }
            ],
            "other_predicted_class_doc_count" : 0 
          },
          {
            "actual_class" : "dog",
            "actual_class_doc_count" : 11,
            "predicted_classes" : [
              {
                "predicted_class" : "dog",
                "count" : 7
              },
              {
                "predicted_class" : "cat",
                "count" : 4
              }
            ],
            "other_predicted_class_doc_count" : 0
          }
        ],
        "other_actual_class_count" : 0
      }
    }
  }

  • actual_class: the name of the actual class that the analysis tried to predict.
  • actual_class_doc_count: the number of documents in the index that belong to the actual_class.
  • predicted_classes: the list of the predicted classes and the number of predictions associated with each class.
  • count (cat predicted as cat): the number of cats in the dataset that are correctly identified as cats.
  • count (cat predicted as dog): the number of cats in the dataset that are incorrectly classified as dogs.
  • other_predicted_class_doc_count: the number of documents that are classified as a class that is not listed as a predicted_class.
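
Overall accuracy, one of the default classification metrics, can be read straight off this matrix:

# 12 cats and 7 dogs predicted correctly out of 12 + 11 = 23 documents
correct = 12 + 7
total = 12 + 11
print(correct / total)  # 0.826...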

Python:

resp = client.ml.evaluate_data_frame(
    index="animal_classification",
    evaluation={
        "classification": {
            "actual_field": "animal_class",
            "metrics": {
                "auc_roc": {
                    "class_name": "dog"
                }
            }
        }
    },
)
print(resp)

Ruby:

response = client.ml.evaluate_data_frame(
  body: {
    index: 'animal_classification',
    evaluation: {
      classification: {
        actual_field: 'animal_class',
        metrics: {
          auc_roc: {
            class_name: 'dog'
          }
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.ml.evaluateDataFrame({
  index: "animal_classification",
  evaluation: {
    classification: {
      actual_field: "animal_class",
      metrics: {
        auc_roc: {
          class_name: "dog",
        },
      },
    },
  },
});
console.log(response);

Console:

POST _ml/data_frame/_evaluate
{
   "index": "animal_classification",
   "evaluation": {
      "classification": { 
         "actual_field": "animal_class", 
         "metrics": {
            "auc_roc" : { 
              "class_name": "dog" 
            }
         }
      }
   }
}

  • classification: the evaluation type.
  • actual_field: the field that contains the ground truth value for the actual animal classification. This is required in order to evaluate results.
  • auc_roc: the metric used for the evaluation.
  • class_name: the class that is treated as positive during the evaluation; all other classes are treated as negative.
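
The "one-vs-all" strategy simply means the multiclass labels are reduced to a binary problem before the AUC ROC is computed; a tiny illustration:

# for class_name "dog", documents of class dog are positives,
# every other class counts as a negative
labels = ["cat", "dog", "dog", "cat"]
binary = [1 if c == "dog" else 0 for c in labels]
print(binary)  # [0, 1, 1, 0]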

The API returns the following result:

{
  "classification" : {
    "auc_roc" : {
      "value" : 0.8941788639536681
    }
  }
}