Using the Attachment Processor in a Pipeline
Table 1. Attachment options

| Name | Required | Default | Description |
|---|---|---|---|
| `field` | yes | - | The field that contains the base64-encoded data to extract |
| | no | attachment | The field that will hold the attachment information |
| | no | 100000 | The maximum number of characters used for extraction, to prevent huge fields. Use |
| | no | | Field from which the number of characters used for extraction can be overridden per document. See |
| `properties` | no | all properties | Array of properties to store. Can be |
| | no | | If |
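The character limit and its per-document override can be pictured with the small Python model below. It is only an illustration of the behaviour described in Table 1, not the plugin's implementation: the helper names and the `max_chars` override field are hypothetical, and plain string slicing stands in for the limit applied during extraction.

```python
from typing import Any, Mapping, Optional

# Default extraction limit from Table 1 (100000 characters).
DEFAULT_CHAR_LIMIT = 100_000


def effective_char_limit(doc: Mapping[str, Any],
                         override_field: Optional[str] = None,
                         default: int = DEFAULT_CHAR_LIMIT) -> int:
    """Return the limit for one document: the value stored in the
    (hypothetical) override field if the document has it, else the default."""
    if override_field is not None and override_field in doc:
        return int(doc[override_field])
    return default


def truncate_extracted(text: str, limit: int) -> str:
    """Keep at most `limit` characters of extracted text.

    Simplified model: assumes a non-negative limit."""
    return text[:limit]


if __name__ == "__main__":
    doc = {"data": "...", "max_chars": 10}  # "max_chars" is a made-up field name
    limit = effective_char_limit(doc, override_field="max_chars")
    print(truncate_extracted("Lorem ipsum dolor sit amet", limit))
```

With the override set to 10, only the first ten characters (`Lorem ipsu`) would be kept; documents without the override field fall back to the 100000-character default.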
For example, this:
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my_index/_doc/my_id
Returns this:
{
  "found": true,
  "_index": "my_index",
  "_type": "_doc",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 22,
  "_primary_term": 1,
  "_source": {
    "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
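The `data` value in the example is a short RTF document encoded as base64. The Python sketch below shows one way a client could prepare such a request body and read the extracted result back out of the GET response. The helper names `make_attachment_doc` and `extracted_content` are hypothetical; only the standard-library `base64` and `json` modules are used, and no Elasticsearch client is assumed.

```python
import base64
import json
from typing import Any, Dict


def make_attachment_doc(raw: bytes, field: str = "data") -> Dict[str, str]:
    """Build an indexing request body: the raw file bytes, base64-encoded,
    stored under the field the attachment processor reads from."""
    return {field: base64.b64encode(raw).decode("ascii")}


def extracted_content(get_response: str, target: str = "attachment") -> Dict[str, Any]:
    """Return the object the processor stored in _source
    (under the default target field, "attachment", unless told otherwise)."""
    return json.loads(get_response)["_source"][target]


if __name__ == "__main__":
    # The RTF document behind the example's "data" value.
    rtf = b"{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }"
    print(json.dumps(make_attachment_doc(rtf)))
```

Encoding these RTF bytes reproduces the `data` string used in the `PUT my_index/_doc/my_id?pipeline=attachment` request, and passing the GET response to `extracted_content` yields the `attachment` object with `content`, `content_type`, `language` and `content_length`.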
To specify only some fields to be extracted:
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "content", "title" ]
      }
    }
  ]
}
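If pipeline definitions are generated from code, a helper along the lines of the sketch below can produce both pipeline bodies shown in this section. It is a hypothetical convenience function, not part of Elasticsearch: it adds `properties` only when a list is given, so that otherwise the processor falls back to storing all properties.

```python
import json
from typing import Any, Dict, Optional, Sequence


def attachment_pipeline(field: str,
                        properties: Optional[Sequence[str]] = None,
                        description: str = "Extract attachment information") -> Dict[str, Any]:
    """Build a pipeline body with a single attachment processor (hypothetical helper)."""
    processor: Dict[str, Any] = {"field": field}
    if properties is not None:
        # Restrict the stored properties; omit the key to keep the default.
        processor["properties"] = list(properties)
    return {"description": description, "processors": [{"attachment": processor}]}


if __name__ == "__main__":
    # Body for PUT _ingest/pipeline/attachment, limited to content and title.
    print(json.dumps(attachment_pipeline("data", ["content", "title"]), indent=2))
```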
Extracting contents from binary data is a resource-intensive operation. It is highly recommended to run pipelines that use this processor on a dedicated ingest node.