Usage
Using the attachment type is simple: in your mapping JSON, set the type of a JSON element to attachment. For example:
PUT /test
{
  "mappings": {
    "person": {
      "properties": {
        "my_attachment": { "type": "attachment" }
      }
    }
  }
}
In this case, the JSON to index can be:
PUT /test/person/1
{
  "my_attachment": "... base64 encoded attachment ..."
}
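The placeholder in the request above stands for the base64 encoding of the raw file bytes. A minimal Python sketch of building that body with only the standard library follows. The helper name `simple_attachment_doc` and the sample bytes are illustrative and not part of the plugin, and sending the request is left to whichever HTTP client you use.

```python
import base64
import json


def simple_attachment_doc(raw: bytes, field: str = "my_attachment") -> dict:
    """Return a document body whose attachment field holds base64 text.

    The helper and its default field name are hypothetical; only the
    resulting JSON shape comes from the documentation above.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return {field: encoded}


# Sample bytes stand in for a real file read with open(path, "rb").read().
body = simple_attachment_doc(b"Hello attachment")
print(json.dumps(body))
```

The printed JSON can be used as the body of the `PUT /test/person/1` request.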
Alternatively, use a more elaborate JSON object if the content type, resource name, or language needs to be set explicitly:
PUT /test/person/1
{
  "my_attachment": {
    "_content_type": "application/pdf",
    "_name": "resource/name/of/my.pdf",
    "_language": "en",
    "_content": "... base64 encoded attachment ..."
  }
}
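The explicit form can be assembled the same way. The sketch below is one possible approach, not the plugin's own tooling: the helper `detailed_attachment_doc` is hypothetical, the content type is guessed from the file name with Python's `mimetypes` module, and `_content_type` is simply left out when no guess is available.

```python
import base64
import json
import mimetypes


def detailed_attachment_doc(raw: bytes, name: str, language: str,
                            field: str = "my_attachment") -> dict:
    """Build the explicit attachment object (hypothetical helper)."""
    attachment = {
        "_name": name,
        "_language": language,
        "_content": base64.b64encode(raw).decode("ascii"),
    }
    # Guess the MIME type from the file extension; skip the key if unknown.
    content_type, _ = mimetypes.guess_type(name)
    if content_type is not None:
        attachment["_content_type"] = content_type
    return {field: attachment}


# Sample bytes stand in for the contents of a real PDF file.
doc = detailed_attachment_doc(b"%PDF-1.4 sample", "resource/name/of/my.pdf", "en")
print(json.dumps(doc, indent=2))
```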
The attachment type not only indexes the content of the document in the content sub-field, but also automatically adds metadata about the attachment (when available).
The supported metadata fields are:
- date
- title
- name (only available if you set _name, see above)
- author
- keywords
- content_type
- content_length (the original content length before text extraction, that is, the file size)
- language
They can be queried using "dot notation", for example: my_attachment.author.
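As an illustration of the dot notation, the following sketch builds a search body that runs a standard match query against one metadata sub-field. The helper `metadata_match_query` and the author value "Jane Doe" are made up for this example; the body would be sent to the index's `_search` endpoint.

```python
import json


def metadata_match_query(attachment_field: str, metadata: str, text: str) -> dict:
    """Build a match query on <attachment_field>.<metadata> (hypothetical helper)."""
    return {"query": {"match": {f"{attachment_field}.{metadata}": text}}}


# "Jane Doe" is a placeholder author name, not data from the documentation.
search_body = metadata_match_query("my_attachment", "author", "Jane Doe")
print(json.dumps(search_body, indent=2))
```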
Both the metadata and the actual content are simple core type mappers (text, date, …), so they can be controlled in the mappings. For example:
PUT /test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["standard"]
          }
        }
      }
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "file": {
          "type": "attachment",
          "fields": {
            "content": { "index": false },
            "title": { "store": true },
            "date": { "store": true },
            "author": { "analyzer": "my_analyzer" },
            "keywords": { "store": true },
            "content_type": { "store": true },
            "content_length": { "store": true },
            "language": { "store": true }
          }
        }
      }
    }
  }
}
In the above example, the actual content is mapped under fields with the name content, and we decide not to index it, so it will only be available in the _all field. The other fields map to their respective metadata names, but there is no need to specify the type (like text or date) since it is already known.
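Metadata sub-fields mapped with "store": true can be returned on their own in search hits. The sketch below builds such a search body. It assumes an Elasticsearch version whose search API accepts the `stored_fields` parameter, and the helper `stored_fields_search` is hypothetical.

```python
import json


def stored_fields_search(attachment_field: str, metadata_names: list) -> dict:
    """Build a search body that requests stored metadata sub-fields."""
    return {
        # Assumption: the target cluster supports the stored_fields parameter.
        "stored_fields": [f"{attachment_field}.{name}" for name in metadata_names],
        "query": {"match_all": {}},
    }


# "file" is the attachment field from the mapping above.
search_body = stored_fields_search("file", ["title", "date"])
print(json.dumps(search_body, indent=2))
```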