Create a transform Added in 7.2.0

PUT /_transform/{transform_id}

Creates a transform.

A transform copies data from source indices, transforms it, and persists it into an entity-centric destination index. You can also think of the destination index as a two-dimensional tabular data structure (known as a data frame). The ID for each document in the data frame is generated from a hash of the entity, so there is a unique row per entity.

You must choose either the latest or pivot method for your transform; you cannot use both in a single transform. If you choose to use the pivot method for your transform, the entities are defined by the set of group_by fields in the pivot object. If you choose to use the latest method, the entities are defined by the unique_key field values in the latest object.
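
The either/or rule above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names `transform_method` and `entity_fields` are hypothetical, and it assumes the request body is held as a plain Python dict.

```python
def transform_method(body: dict) -> str:
    """Return "pivot" or "latest"; raise ValueError unless exactly one is present."""
    present = [method for method in ("pivot", "latest") if method in body]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of pivot/latest, found {present}")
    return present[0]


def entity_fields(body: dict) -> list:
    """Return the names that define an entity for the chosen method."""
    if transform_method(body) == "pivot":
        # For pivot, entities come from the entries of the group_by object.
        return list(body["pivot"]["group_by"])
    # For latest, entities come from the unique_key field values.
    return list(body["latest"]["unique_key"])
```

A body that contains both `pivot` and `latest`, or neither, raises `ValueError` in this sketch; the server performs its own validation regardless.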

You must have create_index, index, and read privileges on the destination index and read and view_index_metadata privileges on the source indices. When Elasticsearch security features are enabled, the transform remembers which roles the user that created it had at the time of creation and uses those same roles. If those roles do not have the required privileges on the source and destination indices, the transform fails when it attempts unauthorized operations.

NOTE: You must use Kibana or this API to create a transform. Do not add a transform directly into any .transform-internal* indices using the Elasticsearch index API. If Elasticsearch security features are enabled, do not give users any privileges on .transform-internal* indices. If you used transforms prior to 7.5, also do not give users any privileges on .data-frame-internal* indices.

Path parameters

  • transform_id string Required

    Identifier for the transform. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It has a 64 character limit and must start and end with alphanumeric characters.
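
The identifier rules above can be approximated with a regular expression. This is a sketch of a client-side pre-check only; the name `is_valid_transform_id` is hypothetical, and the server's own validation remains authoritative.

```python
import re

# First and last characters must be a-z or 0-9; characters in between may also
# be hyphens or underscores. One leading character, up to 62 middle characters
# and one trailing character give the 64 character limit.
TRANSFORM_ID_RE = re.compile(r"[a-z0-9](?:[a-z0-9_-]{0,62}[a-z0-9])?")


def is_valid_transform_id(transform_id: str) -> bool:
    """Return True if the identifier satisfies the documented constraints."""
    return TRANSFORM_ID_RE.fullmatch(transform_id) is not None
```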

Query parameters

  • defer_validation boolean

    When the transform is created, a series of validations occur to ensure its success. For example, there is a check for the existence of the source indices and a check that the destination index is not part of the source index pattern. You can use this parameter to skip the checks, for example when the source index does not exist until after the transform is created. With the exception of privilege checks, however, the validations always run when you start the transform.

  • timeout string

    Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
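
As an illustration of how the query parameters attach to the endpoint, the sketch below builds the request URL. It assumes the validation-skipping parameter is named `defer_validation`, as in the Elasticsearch API; the helper name `transform_url` and the base URL used in the usage note are hypothetical.

```python
from urllib.parse import quote, urlencode


def transform_url(base_url, transform_id, defer_validation=False, timeout=None):
    """Build the PUT /_transform/{transform_id} URL with optional query parameters."""
    params = {}
    if defer_validation:
        params["defer_validation"] = "true"
    if timeout is not None:
        params["timeout"] = timeout  # a duration string such as "30s"
    url = f"{base_url.rstrip('/')}/_transform/{quote(transform_id, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `transform_url("http://localhost:9200", "t1", defer_validation=True, timeout="30s")` yields a URL ending in `/_transform/t1?defer_validation=true&timeout=30s`.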

application/json

Body Required

  • dest object Required

    Additional properties are allowed.

  • description string

    Free text description of the transform.

  • frequency string

    A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

  • latest object

    Additional properties are allowed.

    • sort string Required

      Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

    • unique_key array[string] Required

      Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

  • _meta object
    • * object Additional properties

      Additional properties are allowed.

  • pivot object

    Additional properties are allowed.

    • aggregations object

      Defines how to aggregate the grouped data. The following aggregations are currently supported: average, bucket script, bucket selector, cardinality, filter, geo bounds, geo centroid, geo line, max, median absolute deviation, min, missing, percentiles, rare terms, scripted metric, stats, sum, terms, top metrics, value count, weighted average.

    • group_by object

      Defines how to group the data. More than one grouping can be defined per pivot. The following groupings are currently supported: date histogram, geotile grid, histogram, terms.

      • * object Additional properties

        Additional properties are allowed.

        • date_histogram object

          Additional properties are allowed.

          • calendar_interval string

            Values are second, 1s, minute, 1m, hour, 1h, day, 1d, week, 1w, month, 1M, quarter, 1q, year, or 1y.

          • extended_bounds object

            Additional properties are allowed.

          • hard_bounds object

            Additional properties are allowed.

          • field string

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • fixed_interval string

            A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

          • format string

            The date format used to format key_as_string in the response. If no format is specified, the first date format specified in the field mapping is used.

          • interval string

            A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

          • min_doc_count number

            Only returns buckets that have min_doc_count number of documents. By default, all buckets between the first bucket that matches documents and the last one are returned.

          • missing string
          • offset string

            A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

          • params object
            • * object Additional properties

              Additional properties are allowed.

          • script object

            Additional properties are allowed.

            • source string

              The script source.

            • id string
            • params object

              Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

            • options object
          • keyed boolean

            Set to true to associate a unique string key with each bucket and return the ranges as a hash rather than an array.

        • geotile_grid object

          Additional properties are allowed.

        • histogram object

          Additional properties are allowed.

          • extended_bounds object

            Additional properties are allowed.
            • max number

              Maximum value for the bound.

            • min number

              Minimum value for the bound.

          • hard_bounds object

            Additional properties are allowed.
            • max number

              Maximum value for the bound.

            • min number

              Minimum value for the bound.

          • field string

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • interval number

            The interval for the buckets. Must be a positive decimal.

          • min_doc_count number

            Only returns buckets that have min_doc_count number of documents. By default, the response will fill gaps in the histogram with empty buckets.

          • missing number

            The value to apply to documents that do not have a value. By default, documents without a value are ignored.

          • offset number

            By default, the bucket keys start with 0 and then continue in even spaced steps of interval. The bucket boundaries can be shifted by using the offset option.

          • script object

            Additional properties are allowed.

            • source string

              The script source.

            • id string
            • params object

              Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

            • options object
          • format string
          • keyed boolean

            If true, returns buckets as a hash instead of an array, keyed by the bucket keys.

  • retention_policy object

    Additional properties are allowed.
    • time object

      Additional properties are allowed.

      • field string Required

        Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

      • max_age string Required

        A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

  • settings object

    Additional properties are allowed.

    • align_checkpoints boolean

      Specifies whether the transform checkpoint ranges should be optimized for performance. Such optimization can align checkpoint ranges with the date histogram interval when a date histogram is specified as a group source in the transform config. As a result, fewer document updates are performed in the destination index, which improves overall performance.

    • dates_as_epoch_millis boolean

      Defines whether dates in the output are written as ISO-formatted strings or as milliseconds since the epoch. epoch_millis was the default for transforms created before version 7.11. For compatible output, set this value to true.

    • deduce_mappings boolean

      Specifies whether the transform should deduce the destination index mappings from the transform configuration.

    • docs_per_second number

      Specifies a limit on the number of input documents per second. This setting throttles the transform by adding a wait time between search requests. The default value is null, which disables throttling.

    • max_page_search_size number

      Defines the initial page size to use for the composite aggregation for each checkpoint. If circuit breaker exceptions occur, the page size is dynamically adjusted to a lower value. The minimum value is 10 and the maximum is 65,536.

    • unattended boolean

      If true, the transform runs in unattended mode. In unattended mode, the transform retries indefinitely in case of an error, which means the transform never fails. Setting the number of retries to anything other than infinite fails validation.

  • source object Required

    Additional properties are allowed.

    • index string | array[string] Required
    • runtime_mappings object
      • * object Additional properties

        Additional properties are allowed.

        • fields object

          For type composite

          • * object Additional properties

            Additional properties are allowed.

            • type string Required

              Values are boolean, composite, date, double, geo_point, ip, keyword, long, or lookup.

        • fetch_fields array[object]

          For type lookup

          • field string Required

            Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • format string
        • format string

          A custom format for date type runtime fields.

        • input_field string

          Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

        • target_field string

          Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

        • script object

          Additional properties are allowed.

          • source string

            The script source.

          • id string
          • params object

            Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

            • * object Additional properties

              Additional properties are allowed.

          • lang string

            Values are painless, expression, mustache, or java.

          • options object
            • * string Additional properties
        • type string Required

          Values are boolean, composite, date, double, geo_point, ip, keyword, long, or lookup.

    • query object

      A query clause that retrieves a subset of data from the source index.

      Additional properties are allowed.

  • sync object

    Additional properties are allowed.

    • time object

      Additional properties are allowed.

      • delay string

        A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

      • field string Required

        Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
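
Several body fields above (for example the sync delay and the retention policy max_age) take a duration string. The sketch below shows one way a client might interpret the documented format. The name `duration_seconds` is hypothetical, and the sketch assumes whole-number values only; it is not the server's parser.

```python
import re
from typing import Optional

# Seconds per documented unit.
_UNIT_SECONDS = {
    "nanos": 1e-9,
    "micros": 1e-6,
    "ms": 1e-3,
    "s": 1.0,
    "m": 60.0,
    "h": 3600.0,
    "d": 86400.0,
}
_DURATION_RE = re.compile(r"(\d+)(nanos|micros|ms|s|m|h|d)")


def duration_seconds(value: str) -> Optional[float]:
    """Convert a duration string to seconds; return None for "-1" (unspecified)."""
    if value == "-1":
        return None
    if value == "0":
        return 0.0  # "0" is accepted without a unit
    match = _DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a recognized duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```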

Responses

  • 200 application/json
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.
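
A client can confirm success by checking the acknowledged flag in the response body. This is a minimal sketch; the name `is_acknowledged` is hypothetical, and it assumes the raw response text is already in hand.

```python
import json


def is_acknowledged(response_text: str) -> bool:
    """Return True only if the JSON response is an object with "acknowledged": true."""
    payload = json.loads(response_text)
    return isinstance(payload, dict) and payload.get("acknowledged") is True
```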

PUT /_transform/{transform_id}
curl \
 -X PUT http://api.example.com/_transform/{transform_id} \
 -H "Content-Type: application/json" \
 -d '{"dest":{"index":"kibana_sample_data_ecommerce_transform1","pipeline":"add_timestamp_pipeline"},"sync":{"time":{"delay":"60s","field":"order_date"}},"pivot":{"group_by":{"customer_id":{"terms":{"field":"customer_id","missing_bucket":true}}},"aggregations":{"max_price":{"max":{"field":"taxful_total_price"}}}},"source":{"index":"kibana_sample_data_ecommerce","query":{"term":{"geoip.continent_name":{"value":"Asia"}}}},"frequency":"5m","description":"Maximum priced ecommerce data by customer_id in Asia","retention_policy":{"time":{"field":"order_date","max_age":"30d"}}}'
{
  "dest": {
    "index": "kibana_sample_data_ecommerce_transform1",
    "pipeline": "add_timestamp_pipeline"
  },
  "sync": {
    "time": {
      "delay": "60s",
      "field": "order_date"
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id",
          "missing_bucket": true
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "taxful_total_price"
        }
      }
    }
  },
  "source": {
    "index": "kibana_sample_data_ecommerce",
    "query": {
      "term": {
        "geoip.continent_name": {
          "value": "Asia"
        }
      }
    }
  },
  "frequency": "5m",
  "description": "Maximum priced ecommerce data by customer_id in Asia",
  "retention_policy": {
    "time": {
      "field": "order_date",
      "max_age": "30d"
    }
  }
}
{
  "dest": {
    "index": "kibana_sample_data_ecommerce_transform2"
  },
  "sync": {
    "time": {
      "delay": "60s",
      "field": "order_date"
    }
  },
  "latest": {
    "sort": "order_date",
    "unique_key": [
      "customer_id"
    ]
  },
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "frequency": "5m",
  "description": "Latest order for each customer"
}
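The latest-method request body above could be sent from Python's standard library along the following lines. This sketch only constructs the request object and does not send it. The helper name `build_put_transform_request`, the base URL `http://localhost:9200`, and the transform ID `ecommerce-latest` are assumptions for illustration; authentication is omitted.

```python
import json
import urllib.request


def build_put_transform_request(base_url, transform_id, body):
    """Build (but do not send) a PUT /_transform/{transform_id} request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_transform/{transform_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Same content as the latest-method example body.
latest_body = {
    "dest": {"index": "kibana_sample_data_ecommerce_transform2"},
    "sync": {"time": {"delay": "60s", "field": "order_date"}},
    "latest": {"sort": "order_date", "unique_key": ["customer_id"]},
    "source": {"index": "kibana_sample_data_ecommerce"},
    "frequency": "5m",
    "description": "Latest order for each customer",
}

request = build_put_transform_request("http://localhost:9200", "ecommerce-latest", latest_body)
```

Passing `request` to `urllib.request.urlopen` would perform the call against a reachable cluster.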
Response examples (200)
{
  "acknowledged": true
}