IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Using the Attachment Processor in a Pipeline

Table 1. Attachment options

Name	Required	Default	Description
`field`	yes	-	The field that contains the base64-encoded data to extract from
`target_field`	no	attachment	The field that will hold the attachment information
`indexed_chars`	no	100000	The maximum number of characters used for extraction, to prevent huge fields. Use `-1` for no limit.
`indexed_chars_field`	no	`null`	Name of a document field whose value overrides the number of characters used for extraction. See `indexed_chars`.
`properties`	no	all properties	Array of properties to store. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
`ignore_missing`	no	`false`	If `true` and `field` does not exist, the processor quietly exits without modifying the document.

For example, the following requests create a pipeline, index a document through it, and retrieve the document:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

GET my_index/_doc/my_id

The `GET` request returns:

{
  "found": true,
  "_index": "my_index",
  "_type": "_doc",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 22,
  "_primary_term": 1,
  "_source": {
    "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
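
The `data` value in this example is the base64 encoding of a short RTF document. As a minimal sketch, assuming Python and only its standard library, the following decodes the sample to show the original bytes and builds an equivalent document body from raw bytes. The helper name `to_attachment_doc` is hypothetical and not part of any Elasticsearch client.

```python
import base64
import json

# Sample value of the "data" field from the example request.
SAMPLE = "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="


def to_attachment_doc(raw: bytes, field: str = "data") -> dict:
    """Build a document body whose `field` holds the base64-encoded bytes.

    Hypothetical helper for illustration only.
    """
    return {field: base64.b64encode(raw).decode("ascii")}


# Decode the sample to recover the original RTF bytes.
raw = base64.b64decode(SAMPLE)
print(raw)

# Re-encoding the same bytes reproduces the sample string.
doc = to_attachment_doc(raw)
print(json.dumps(doc))
```

The same helper could wrap the bytes of any file read in binary mode before the document is indexed through the pipeline.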

To extract only some properties, list them in `properties`:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "content", "title" ]
      }
    }
  ]
}
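
The options in Table 1 can be combined in a single processor definition. The sketch below, in Python with only the standard library, builds such a pipeline body and rejects property names that the table does not list. `attachment_pipeline` and its parameters are hypothetical, not part of any Elasticsearch client, and the resulting JSON would still have to be sent as the body of a `PUT _ingest/pipeline/...` request.

```python
import json

# Property names listed in the options table.
ALLOWED_PROPERTIES = {
    "content", "title", "name", "author", "keywords",
    "date", "content_type", "content_length", "language",
}


def attachment_pipeline(field, properties=None, indexed_chars=None,
                        ignore_missing=None,
                        description="Extract attachment information"):
    """Return a pipeline body (dict) with one attachment processor.

    Hypothetical helper. Options left as None are omitted from the body,
    so the defaults from the options table apply.
    """
    processor = {"field": field}
    if properties is not None:
        unknown = sorted(set(properties) - ALLOWED_PROPERTIES)
        if unknown:
            raise ValueError(f"unknown properties: {unknown}")
        processor["properties"] = list(properties)
    if indexed_chars is not None:
        processor["indexed_chars"] = indexed_chars
    if ignore_missing is not None:
        processor["ignore_missing"] = ignore_missing
    return {"description": description,
            "processors": [{"attachment": processor}]}


# Store only content and title, remove the character limit,
# and skip documents that have no "data" field.
body = attachment_pipeline("data", properties=["content", "title"],
                           indexed_chars=-1, ignore_missing=True)
print(json.dumps(body, indent=2))
```

Validating property names on the client side is a design choice made for this sketch; it only catches typos earlier than the server would.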

Extracting contents from binary data is a resource-intensive operation. It is highly recommended to run pipelines that use this processor on a dedicated ingest node.
