IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.


Using the Attachment Processor in a Pipeline


Table 1. Attachment options

Name	Required	Default	Description
`field`	yes	-	The field containing the base64-encoded data to extract from.
`target_field`	no	`attachment`	The field that will hold the attachment information.
`indexed_chars`	no	`100000`	The maximum number of characters used for extraction, to prevent huge fields. Use `-1` for no limit.
`indexed_chars_field`	no	`null`	The name of a document field whose value overrides the number of characters used for extraction. See `indexed_chars`.
`properties`	no	all properties	Array of properties to store. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`.
`ignore_missing`	no	`false`	If `true` and `field` does not exist, the processor quietly exits without modifying the document.
`resource_name`	no	-	Field containing the name of the resource to decode. If specified, the processor passes this resource name to the underlying Tika library to enable Resource Name Based Detection.
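
The options in Table 1 become keys of the `attachment` processor object. As a minimal sketch, the Python helper below assembles a pipeline body from those options and rejects option or property names that are not listed in the table. The helper name `build_attachment_pipeline` and its validation rules are illustrative assumptions, not part of Elasticsearch or of any client library.

```python
import json

# Optional option names and property names copied from Table 1.
OPTIONAL_OPTIONS = {
    "target_field", "indexed_chars", "indexed_chars_field",
    "properties", "ignore_missing", "resource_name",
}
ALLOWED_PROPERTIES = {
    "content", "title", "name", "author", "keywords", "date",
    "content_type", "content_length", "language",
}


def build_attachment_pipeline(field, description="Extract attachment information", **options):
    """Return a pipeline body with a single attachment processor.

    Hypothetical helper for illustration only.
    """
    unknown = set(options) - OPTIONAL_OPTIONS
    if unknown:
        raise ValueError(f"unknown attachment options: {sorted(unknown)}")
    bad = set(options.get("properties", [])) - ALLOWED_PROPERTIES
    if bad:
        raise ValueError(f"unknown attachment properties: {sorted(bad)}")
    # `field` is the only required option; everything else is passed through.
    processor = {"field": field, **options}
    return {"description": description, "processors": [{"attachment": processor}]}


pipeline = build_attachment_pipeline(
    "data",
    indexed_chars=-1,          # no extraction limit
    ignore_missing=True,       # skip documents without the `data` field
    properties=["content", "title"],
)
print(json.dumps(pipeline, indent=2))
```

The printed JSON can be used as the body of a `PUT _ingest/pipeline/<name>` request.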

Example


To attach a file to a JSON document, you must first encode the file as a base64 string. On Unix-like systems, you can do this with the base64 command, reading the file from standard input:

base64 < myfile.rtf

The command returns the base64-encoded string for the file. The following base64 string is for an .rtf file containing the text Lorem ipsum dolor sit amet: e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=.
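
Where a shell is not available, the same encoding can be produced programmatically. The sketch below uses Python's standard `base64` module; the function name `encode_file` is a hypothetical helper, and the sample bytes are assumed to be the RTF file described above, written to a temporary file only so the example is self-contained.

```python
import base64
import tempfile
from pathlib import Path


def encode_file(path):
    """Read a file as bytes and return its base64 encoding as an ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Assumed contents of the sample .rtf file (CRLF line endings).
rtf_bytes = b"{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }"

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "myfile.rtf"
    sample.write_bytes(rtf_bytes)
    encoded = encode_file(sample)

print(encoded)
```

`base64.b64encode` does not insert line breaks, so the result can be placed directly in a JSON string value.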

Use an attachment processor to decode the string and extract the file’s properties:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my-index-000001/_doc/my_id
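
Outside the console, the three requests above are ordinary HTTP calls. The following sketch builds them with Python's standard `urllib`. It assumes an unsecured cluster reachable at `http://localhost:9200`; the helper names `json_request` and `send` are hypothetical, and a real deployment will usually also need authentication and TLS settings.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without security


def json_request(method, path, body=None, params=None):
    """Build (but do not send) a JSON request against the cluster."""
    url = f"{BASE_URL}/{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


requests_to_send = [
    # 1. Create the pipeline.
    json_request("PUT", "_ingest/pipeline/attachment", {
        "description": "Extract attachment information",
        "processors": [{"attachment": {"field": "data"}}],
    }),
    # 2. Index a document through the pipeline.
    json_request(
        "PUT", "my-index-000001/_doc/my_id",
        {"data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="},
        params={"pipeline": "attachment"},
    ),
    # 3. Fetch the document back.
    json_request("GET", "my-index-000001/_doc/my_id"),
]

for request in requests_to_send:
    print(request.get_method(), request.full_url)
```

Calling `send(request)` on each item in order would run the sequence against a live cluster; the loop above only prints the method and URL, so the sketch runs without one.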


The document’s attachment object contains extracted properties for the file:

{
  "found": true,
  "_index": "my-index-000001",
  "_type": "_doc",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 22,
  "_primary_term": 1,
  "_source": {
    "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
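
The extracted properties sit under `_source`, in the field named by `target_field` (`attachment` by default). As a small sketch, the hypothetical helper `get_attachment` below pulls that object out of a parsed response; the `RESPONSE` string is an abbreviated copy of the response above.

```python
import json

# Abbreviated copy of the GET response shown above.
RESPONSE = """
{
  "found": true,
  "_id": "my_id",
  "_source": {
    "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
"""


def get_attachment(response, target_field="attachment"):
    """Return the extracted attachment object, or None if it is absent."""
    if not response.get("found"):
        return None
    return response["_source"].get(target_field)


attachment = get_attachment(json.loads(RESPONSE))
print(attachment["content_type"], attachment["language"])
```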

To extract only certain attachment fields, specify the properties array:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "content", "title" ]
      }
    }
  ]
}


Extracting content from binary data is a resource-intensive operation. It is highly recommended to run pipelines that use this processor on a dedicated ingest node.
