Create data frame analytics jobs API

Creates a new data frame analytics job.
The API accepts a PutDataFrameAnalyticsRequest object as a request and returns a PutDataFrameAnalyticsResponse.
Request

A PutDataFrameAnalyticsRequest requires the following argument:
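The request snippet is not reproduced on this page; a minimal sketch, assuming the DataFrameAnalyticsConfig object built in the next section:

PutDataFrameAnalyticsRequest request = new PutDataFrameAnalyticsRequest(config);   // the configuration of the data frame analytics job to create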
Data frame analytics configuration

The DataFrameAnalyticsConfig object contains all the details about the data frame analytics job configuration and takes the following arguments:
DataFrameAnalyticsConfig config = DataFrameAnalyticsConfig.builder()
    .setId("my-analytics-config")
    .setSource(sourceConfig)
    .setDest(destConfig)
    .setAnalysis(outlierDetection)
    .setAnalyzedFields(analyzedFields)
    .setModelMemoryLimit(new ByteSizeValue(5, ByteSizeUnit.MB))
    .setDescription("this is an example description")
    .setMaxNumThreads(1)
    .build();
1. The data frame analytics job ID
2. The source index and query from which to gather data
3. The destination index
4. The analysis to be performed
5. The fields to be included in / excluded from the analysis
6. The memory limit for the model created as part of the analysis process
7. Optionally, a human-readable description
8. The maximum number of threads to be used by the analysis. Defaults to 1.
SourceConfig

The index and the query from which to collect data.
DataFrameAnalyticsSource sourceConfig = DataFrameAnalyticsSource.builder()
    .setIndex("put-test-source-index")
    .setQueryConfig(queryConfig)
    .setRuntimeMappings(runtimeMappings)
    .setSourceFiltering(new FetchSourceContext(true,
        new String[] { "included_field_1", "included_field_2" },
        new String[] { "excluded_field" }))
    .build();
1. Constructing a new DataFrameAnalyticsSource
2. The source index
3. The query from which to gather the data. If query is not set, a match_all query is used by default.
4. Runtime mappings that will be added to the destination index mapping.
5. Source filtering to select which fields will exist in the destination index.
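The runtimeMappings variable used above is not defined in the snippet; a minimal sketch of how such a map might be built, where the field name, type, and script are illustrative assumptions:

Map<String, Object> script = Collections.singletonMap(
    "source", "emit(doc['timestamp'].value.dayOfWeekEnum.toString())");   // hypothetical Painless script
Map<String, Object> dayOfWeek = new HashMap<>();
dayOfWeek.put("type", "keyword");                                         // runtime field type
dayOfWeek.put("script", script);
Map<String, Object> runtimeMappings =
    Collections.singletonMap("day_of_week", dayOfWeek);                   // hypothetical runtime field name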
QueryConfig

The query with which to select data from the source.
QueryConfig queryConfig = new QueryConfig(new MatchAllQueryBuilder());
DestinationConfig

The index to which data should be written by the data frame analytics job.
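The corresponding snippet is not included on this page; a minimal sketch, where the destination index name is illustrative:

DataFrameAnalyticsDest destConfig = DataFrameAnalyticsDest.builder()
    .setIndex("put-test-dest-index")    // the destination index
    .build();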
Analysis

The analysis to be performed. Currently, the supported analyses include OutlierDetection, Classification, and Regression.
Outlier detection

OutlierDetection analysis can be created in one of two ways:
DataFrameAnalysis outlierDetection = org.elasticsearch.client.ml.dataframe.OutlierDetection.createDefault();
or
DataFrameAnalysis outlierDetectionCustomized = org.elasticsearch.client.ml.dataframe.OutlierDetection.builder()
    .setMethod(org.elasticsearch.client.ml.dataframe.OutlierDetection.Method.DISTANCE_KNN)
    .setNNeighbors(5)
    .setFeatureInfluenceThreshold(0.1)
    .setComputeFeatureInfluence(true)
    .setOutlierFraction(0.05)
    .setStandardizationEnabled(true)
    .build();
1. Constructing a new OutlierDetection object
2. The method used to perform the analysis
3. Number of neighbors taken into account during analysis
4. The minimum outlier score required to compute feature influence
5. Whether to compute feature influence
6. The proportion of the data set that is assumed to be outlying prior to outlier detection
7. Whether to apply standardization to feature values
Classification

Classification analysis requires setting the dependent_variable and has a number of other optional parameters:
DataFrameAnalysis classification = Classification.builder("my_dependent_variable")
    .setLambda(1.0)
    .setGamma(5.5)
    .setEta(5.5)
    .setMaxTrees(50)
    .setFeatureBagFraction(0.4)
    .setNumTopFeatureImportanceValues(3)
    .setPredictionFieldName("my_prediction_field_name")
    .setTrainingPercent(50.0)
    .setRandomizeSeed(1234L)
    .setClassAssignmentObjective(Classification.ClassAssignmentObjective.MAXIMIZE_ACCURACY)
    .setNumTopClasses(1)
    .setFeatureProcessors(Arrays.asList(OneHotEncoding.builder("categorical_feature")
        .addOneHot("cat", "cat_column")
        .build()))
    .setAlpha(1.0)
    .setEtaGrowthRatePerTree(1.0)
    .setSoftTreeDepthLimit(1.0)
    .setSoftTreeDepthTolerance(1.0)
    .setDownsampleFactor(0.5)
    .setMaxOptimizationRoundsPerHyperparameter(3)
    .setEarlyStoppingEnabled(true)
    .build();
1. Constructing a new Classification builder object with the required dependent variable
2. The lambda regularization parameter. A non-negative double.
3. The gamma regularization parameter. A non-negative double.
4. The applied shrinkage. A double in [0.001, 1].
5. The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
6. The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
7. If set, feature importance for the top most important features will be computed.
8. The name of the prediction field in the results object.
9. The percentage of training-eligible rows to be used in training. Defaults to 100%.
10. The seed to be used by the random generator that picks which rows are used in training.
11. The optimization objective to target when assigning class labels. Defaults to maximize_minimum_recall.
12. The number of top classes (or -1, which denotes all classes) to be reported in the results. Defaults to 2.
13. Custom feature processors that will create new features for analysis from the included document fields. Note that automatic categorical feature encoding still occurs for all features.
14. The alpha regularization parameter. A non-negative double.
15. The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
16. The soft tree depth limit. A non-negative double.
17. The soft tree depth tolerance. Controls how much the soft tree depth limit is respected. A double greater than or equal to 0.01.
18. The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
19. The maximum number of optimization rounds used for hyperparameter optimization per parameter. An integer in [0, 20].
20. Whether to enable early stopping to finish the training process if it is not finding better models.
Regression

Regression analysis requires setting the dependent_variable and has a number of other optional parameters:
DataFrameAnalysis regression = org.elasticsearch.client.ml.dataframe.Regression.builder("my_dependent_variable")
    .setLambda(1.0)
    .setGamma(5.5)
    .setEta(5.5)
    .setMaxTrees(50)
    .setFeatureBagFraction(0.4)
    .setNumTopFeatureImportanceValues(3)
    .setPredictionFieldName("my_prediction_field_name")
    .setTrainingPercent(50.0)
    .setRandomizeSeed(1234L)
    .setLossFunction(Regression.LossFunction.MSE)
    .setLossFunctionParameter(1.0)
    .setFeatureProcessors(Arrays.asList(OneHotEncoding.builder("categorical_feature")
        .addOneHot("cat", "cat_column")
        .build()))
    .setAlpha(1.0)
    .setEtaGrowthRatePerTree(1.0)
    .setSoftTreeDepthLimit(1.0)
    .setSoftTreeDepthTolerance(1.0)
    .setDownsampleFactor(0.5)
    .setMaxOptimizationRoundsPerHyperparameter(3)
    .setEarlyStoppingEnabled(true)
    .build();
1. Constructing a new Regression builder object with the required dependent variable
2. The lambda regularization parameter. A non-negative double.
3. The gamma regularization parameter. A non-negative double.
4. The applied shrinkage. A double in [0.001, 1].
5. The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
6. The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
7. If set, feature importance for the top most important features will be computed.
8. The name of the prediction field in the results object.
9. The percentage of training-eligible rows to be used in training. Defaults to 100%.
10. The seed to be used by the random generator that picks which rows are used in training.
11. The loss function used for regression. Defaults to mse.
12. An optional parameter to the loss function.
13. Custom feature processors that will create new features for analysis from the included document fields. Note that automatic categorical feature encoding still occurs for all features.
14. The alpha regularization parameter. A non-negative double.
15. The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
16. The soft tree depth limit. A non-negative double.
17. The soft tree depth tolerance. Controls how much the soft tree depth limit is respected. A double greater than or equal to 0.01.
18. The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
19. The maximum number of optimization rounds used for hyperparameter optimization per parameter. An integer in [0, 20].
20. Whether to enable early stopping to finish the training process if it is not finding better models.
Analyzed fields

A FetchSourceContext object containing the fields to be included in / excluded from the analysis:
FetchSourceContext analyzedFields = new FetchSourceContext(
    true,
    new String[] { "included_field_1", "included_field_2" },
    new String[] { "excluded_field" });
Synchronous execution

When executing a PutDataFrameAnalyticsRequest in the following manner, the client waits for the PutDataFrameAnalyticsResponse to be returned before continuing with code execution:
PutDataFrameAnalyticsResponse response = client.machineLearning().putDataFrameAnalytics(request, RequestOptions.DEFAULT);
Synchronous calls may throw an IOException when the high-level REST client fails to parse the REST response, the request times out, or no response comes back from the server.
In cases where the server returns a 4xx or 5xx error code, the high-level client tries to parse the response body error details instead, then throws a generic ElasticsearchException and adds the original ResponseException to it as a suppressed exception.
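For example, a caller might handle these failure modes as follows (a sketch, not part of the original page):

try {
    PutDataFrameAnalyticsResponse response =
        client.machineLearning().putDataFrameAnalytics(request, RequestOptions.DEFAULT);
} catch (ElasticsearchException e) {
    // the server returned a 4xx or 5xx error; the original ResponseException is attached as a suppressed exception
} catch (IOException e) {
    // no parseable response came back, e.g. the request timed out
}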
Asynchronous execution

Executing a PutDataFrameAnalyticsRequest can also be done in an asynchronous fashion so that the client can return directly. Users need to specify how the response or potential failures will be handled by passing the request and a listener to the asynchronous put-data-frame-analytics method:
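The original snippet is not included on this page; a minimal sketch of the asynchronous call, assuming a listener like the one shown further below:

client.machineLearning().putDataFrameAnalyticsAsync(request, RequestOptions.DEFAULT, listener);   // the request and the listener to call back on completion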
The PutDataFrameAnalyticsRequest to execute and the ActionListener to use when the execution completes
The asynchronous method does not block and returns immediately. Once it is completed the ActionListener is called back using the onResponse method if the execution successfully completed or using the onFailure method if it failed. Failure scenarios and expected exceptions are the same as in the synchronous execution case.
A typical listener for put-data-frame-analytics looks like:
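The original listener snippet is missing from this page; a minimal sketch of what such a listener might look like:

ActionListener<PutDataFrameAnalyticsResponse> listener = new ActionListener<PutDataFrameAnalyticsResponse>() {
    @Override
    public void onResponse(PutDataFrameAnalyticsResponse response) {
        // called when the execution is successfully completed
    }

    @Override
    public void onFailure(Exception e) {
        // called when the whole PutDataFrameAnalyticsRequest fails
    }
};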
Response

The returned PutDataFrameAnalyticsResponse contains the newly created data frame analytics job.
DataFrameAnalyticsConfig createdConfig = response.getConfig();