Ingest processor reference
An ingest pipeline is made up of a sequence of processors that are applied to documents as they are ingested into an index. Each processor performs a specific task, such as filtering, transforming, or enriching data.
Each successive processor depends on the output of the previous processor, so the order of processors is important. The modified documents are indexed into Elasticsearch after all processors are applied.
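Because each processor receives the output of the previous one, the same processors in a different order can produce a different result or fail outright. The Ruby sketch below illustrates this with a toy runner that understands only the rename and lowercase processors. It is not how Elasticsearch executes pipelines, and the msg and message field names are hypothetical.

```ruby
require 'json'

# A pipeline definition in the same shape as the request body of
# PUT _ingest/pipeline/<id>. Field names are hypothetical.
PIPELINE = {
  'description' => 'Normalize a log message',
  'processors' => [
    { 'rename' => { 'field' => 'msg', 'target_field' => 'message' } },
    { 'lowercase' => { 'field' => 'message' } }
  ]
}.freeze

# Toy runner, NOT Elasticsearch's implementation: applies each processor
# in order to a copy of the document. Only 'rename' and 'lowercase' are
# modeled, and a missing source field raises an error.
def run_pipeline(pipeline, doc)
  pipeline['processors'].reduce(doc.dup) do |current, processor|
    type, opts = processor.first
    field = opts['field']
    raise "toy runner: field #{field} is missing" unless current.key?(field)

    case type
    when 'rename'    then current[opts['target_field']] = current.delete(field)
    when 'lowercase' then current[field] = current[field].downcase
    else raise ArgumentError, "toy runner: unsupported processor #{type}"
    end
    current
  end
end

doc = { 'msg' => 'User LOGGED In' }
puts JSON.generate(run_pipeline(PIPELINE, doc)) # {"message":"user logged in"}

# Swapping the processors breaks the pipeline: lowercase now runs
# before the 'message' field exists.
reversed = PIPELINE.merge('processors' => PIPELINE['processors'].reverse)
begin
  run_pipeline(reversed, doc)
rescue RuntimeError => e
  puts "reversed pipeline failed: #{e.message}"
end
```

Only the runner is invented here; the pipeline Hash itself follows the processors-array layout that real pipeline definitions use.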
Elasticsearch includes over 40 configurable processors. The subpages in this section contain reference documentation for each processor. To get a list of available processors, use the nodes info API.
response = client.nodes.info(
  node_id: 'ingest',
  filter_path: 'nodes.*.ingest.processors'
)
puts response
GET _nodes/ingest?filter_path=nodes.*.ingest.processors
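The response lists processors per node. To get a single flat list of processor names, the body can be post-processed. This is a minimal Ruby sketch, assuming the response body has been parsed into a Hash shaped like nodes → <node id> → ingest → processors → [{ "type" => ... }]; the sample body is hypothetical and abbreviated.

```ruby
require 'json'

# Collects the distinct processor types reported by all nodes in a
# nodes info response filtered with filter_path=nodes.*.ingest.processors.
def processor_types(body)
  processors = body.fetch('nodes', {}).values.flat_map do |node|
    node.dig('ingest', 'processors') || []
  end
  processors.map { |p| p['type'] }.uniq.sort
end

# Hypothetical, abbreviated response body from a two-node cluster.
sample = JSON.parse(<<~JSON)
  {
    "nodes": {
      "node-a": { "ingest": { "processors": [{ "type": "append" }, { "type": "set" }] } },
      "node-b": { "ingest": { "processors": [{ "type": "set" }, { "type": "geoip" }] } }
    }
  }
JSON

p processor_types(sample) # ["append", "geoip", "set"]
```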
Ingest processors by category
We’ve categorized the available processors on this page and summarized their functions. This will help you find the right processor for your use case.
Data enrichment processors
General outcomes
- append processor - Appends a value to a field.
- date_index_name processor - Points documents to the right time-based index based on a date or timestamp field.
- enrich processor - Enriches documents with data from another index. Refer to Enrich your data for detailed examples of how to use the enrich processor to add data from your existing indices to incoming documents during ingest.
- inference processor - Uses machine learning to classify and tag text fields.
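As a rough illustration of the general enrichment processors, the following Ruby sketch builds a processors array and serializes it as a pipeline body. The field names, the logs- prefix, and the users-policy enrich policy are placeholders; the policy would need to exist and be executed before the pipeline runs. With the official Ruby client, a body like this could be passed to client.ingest.put_pipeline(id: 'my-pipeline', body: body), where my-pipeline is a hypothetical pipeline ID.

```ruby
require 'json'

# Hypothetical processors array using the general enrichment processors.
# Field names, the index prefix, and the 'users-policy' enrich policy are
# placeholders; the enrich policy must already exist and have been executed.
def enrichment_body
  processors = [
    # Add 'ingested' to the tags field (an array is created if the field is missing).
    { 'append' => { 'field' => 'tags', 'value' => ['ingested'] } },
    # Copy matching data from the enrich index into 'user', matching on 'email'.
    { 'enrich' => { 'policy_name' => 'users-policy', 'field' => 'email', 'target_field' => 'user' } },
    # Point the document at a month-based index whose name starts with 'logs-'.
    { 'date_index_name' => { 'field' => '@timestamp', 'index_name_prefix' => 'logs-', 'date_rounding' => 'M' } }
  ]
  { 'description' => 'Enrichment example', 'processors' => processors }
end

puts JSON.pretty_generate(enrichment_body)
```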
Specific outcomes
- attachment processor - Parses and indexes binary data, such as PDFs and Word documents.
- circle processor - Converts circle definitions of shapes to regular polygons that approximate them.
- community_id processor - Computes the Community ID for network flow data.
- fingerprint processor - Computes a hash of the document’s content.
- geo_grid processor - Converts geo-grid definitions of grid tiles or cells to regular bounding boxes or polygons which describe their shape.
- geoip processor - Adds information about the geographical location of an IPv4 or IPv6 address.
- network_direction processor - Calculates the network direction given a source IP address, destination IP address, and a list of internal networks.
- registered_domain processor - Extracts the registered domain (the effective top-level domain, or eTLD, plus one more label), sub-domain, and top-level domain from a fully qualified domain name (FQDN).
- set_security_user processor - Sets user-related details (such as username, roles, email, full_name, metadata, api_key, realm and authentication_type) of the currently authenticated user on the document being ingested.
- uri_parts processor - Parses a Uniform Resource Identifier (URI) string and extracts its components as an object.
- urldecode processor - URL-decodes a string.
- user_agent processor - Parses user-agent strings to extract information about web clients.
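To make the network_direction description concrete, here is a toy Ruby approximation of its classification: a flow is internal when both addresses fall inside the internal networks, outbound when only the source does, inbound when only the destination does, and external otherwise. The sketch accepts CIDR ranges only; the real processor also accepts named ranges such as private or loopback, which are not modeled here.

```ruby
require 'ipaddr'

# Toy approximation of the network_direction processor's classification,
# supporting CIDR ranges only (named ranges are not modeled).
def network_direction(source_ip, destination_ip, internal_networks)
  networks = internal_networks.map { |cidr| IPAddr.new(cidr) }
  internal = ->(ip) { networks.any? { |net| net.include?(IPAddr.new(ip)) } }

  source_internal = internal.call(source_ip)
  destination_internal = internal.call(destination_ip)

  if source_internal && destination_internal then 'internal'
  elsif source_internal then 'outbound'
  elsif destination_internal then 'inbound'
  else 'external'
  end
end

internal_nets = ['10.0.0.0/8', '192.168.0.0/16']
puts network_direction('10.1.2.3', '8.8.8.8', internal_nets)    # outbound
puts network_direction('8.8.8.8', '192.168.1.5', internal_nets) # inbound
```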
Data transformation processors
General outcomes
- convert processor - Converts a field in the currently ingested document to a different type, such as converting a string to an integer.
- dissect processor - Extracts structured fields out of a single text field within a document. Unlike the grok processor, dissect does not use regular expressions, which makes it a simpler and often faster alternative.
- grok processor - Extracts structured fields out of a single text field within a document, using the Grok regular expression dialect that supports reusable aliased expressions.
- gsub processor - Converts a string field by applying a regular expression and a replacement.
- redact processor - Uses the Grok rules engine to obscure text in the input document matching the given Grok patterns.
- rename processor - Renames an existing field.
- set processor - Sets a value on a field.
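The difference between dissect and grok is easiest to see in a toy version of dissect. This Ruby sketch handles only plain %{name} keys separated by literal text, with none of dissect's modifiers; it cuts the text at each literal delimiter from left to right instead of matching a regular expression against it. In a pipeline, the pattern would go in the dissect processor's pattern option and the source field name in its field option.

```ruby
# Toy dissect: supports only plain %{name} keys separated by literal text
# (no modifiers such as ->, +, or ?). The Regexp only splits the pattern;
# the input text is cut at literal delimiters, scanning left to right.
def toy_dissect(pattern, text)
  # '%{a} - %{b}' splits into ['', 'a', ' - ', 'b', '']:
  # literals at even indices, key names at odd indices.
  parts = pattern.split(/%\{([^}]+)\}/, -1)
  literals, keys = parts.partition.with_index { |_, i| i.even? }
  raise ArgumentError, 'leading literal not found' unless text.start_with?(literals.first)

  pos = literals.first.length
  result = {}
  keys.each_with_index do |key, i|
    delimiter = literals[i + 1]
    if delimiter.empty?
      # With no delimiter after it, only the last key can take the rest of the text.
      raise ArgumentError, 'adjacent keys are not supported' unless i == keys.size - 1

      result[key] = text[pos..-1]
    else
      stop = text.index(delimiter, pos)
      raise ArgumentError, "delimiter #{delimiter.inspect} not found" if stop.nil?

      result[key] = text[pos...stop]
      pos = stop + delimiter.length
    end
  end
  result
end

pattern = '%{clientip} - [%{timestamp}] "%{request}"'
line = '1.2.3.4 - [30/Apr/1998:22:00:52 +0000] "GET /index.html"'
p toy_dissect(pattern, line)
```

For the sample line, the sketch yields clientip 1.2.3.4, timestamp 30/Apr/1998:22:00:52 +0000, and request GET /index.html.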
Specific outcomes
- bytes processor - Converts a human-readable byte value to its value in bytes (for example, 1kb becomes 1024).
- csv processor - Extracts fields from a single line of CSV data in a text field.
- date processor - Extracts and converts date fields.
- dot_expand processor - Expands a field with dots into an object field.
- html_strip processor - Removes HTML tags from a field.
- join processor - Joins each element of an array into a single string using a separator character between each element.
- kv processor - Parses messages (or specific event fields) containing key-value pairs.
- lowercase processor and uppercase processor - Convert a string field to lowercase or uppercase.
- split processor - Splits a field into an array of values.
- trim processor - Trims whitespace from a field.
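The bytes conversion is plain arithmetic with binary multiples. The Ruby sketch below approximates it for whole-number inputs using the units the processor documents (b, kb, mb, gb, tb, pb, case-insensitive). It is an illustration rather than the processor's own parser, and it rejects fractional values instead of guessing how they are handled.

```ruby
# Binary multiples for the documented units, so 1kb = 1024 bytes.
UNIT_FACTORS = {
  'b' => 1, 'kb' => 1024, 'mb' => 1024**2,
  'gb' => 1024**3, 'tb' => 1024**4, 'pb' => 1024**5
}.freeze

# Toy conversion for whole-number values such as "1kb" or "2MB".
def to_bytes(value)
  match = /\A(\d+)([a-z]+)\z/i.match(value)
  raise ArgumentError, "cannot parse #{value.inspect}" if match.nil?

  factor = UNIT_FACTORS[match[2].downcase]
  raise ArgumentError, "unknown unit #{match[2].inspect}" if factor.nil?

  Integer(match[1], 10) * factor
end

puts to_bytes('1kb') # 1024
puts to_bytes('2MB') # 2097152
```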
Data filtering processors
- drop processor - Drops the document without raising any errors.
- remove processor - Removes fields from documents.
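A hedged example of the two filtering processors as they might appear in a pipeline's processors array, built in Ruby. The field names are hypothetical; the drop condition uses the if option that ingest processors accept, which takes a Painless expression evaluated against the document (ctx).

```ruby
require 'json'

# Hypothetical processors array for the filtering processors.
def filtering_processors
  [
    # Drop the whole document when the hypothetical network_name field is 'Guest'.
    { 'drop' => { 'if' => "ctx.network_name == 'Guest'" } },
    # Remove two hypothetical fields; ignore_missing avoids an error if they are absent.
    { 'remove' => { 'field' => %w[debug_info raw_headers], 'ignore_missing' => true } }
  ]
end

puts JSON.pretty_generate({ 'processors' => filtering_processors })
```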
Pipeline handling processors
- fail processor - Raises an exception. Useful when you expect a pipeline to fail and want to relay a specific message to the requester.
- pipeline processor - Executes another pipeline.
- reroute processor - Reroutes documents to another target index or data stream.
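The pipeline handling processors are often combined. This Ruby sketch shows one possible arrangement; the pipeline name shared-cleanup and the target logs-archive are placeholders. The reroute processor is placed last because, per its reference page, the remaining processors of the current pipeline are skipped once it runs.

```ruby
require 'json'

# Hypothetical processors array for the pipeline handling processors.
def routing_processors
  [
    # Stop with a custom error message if the document has no tags field.
    { 'fail' => { 'if' => 'ctx.tags == null', 'message' => 'document is missing tags' } },
    # Run another pipeline; 'shared-cleanup' is a placeholder pipeline name.
    { 'pipeline' => { 'name' => 'shared-cleanup' } },
    # Send the document to another target; 'logs-archive' is a placeholder.
    { 'reroute' => { 'destination' => 'logs-archive' } }
  ]
end

puts JSON.pretty_generate({ 'processors' => routing_processors })
```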
Array/JSON handling processors
- foreach processor - Runs an ingest processor on each element of an array or object.
- json processor - Converts a JSON string into a structured JSON object.
- script processor - Runs an inline or stored script on incoming documents. The script runs in the Painless ingest context.
- sort processor - Sorts the elements of an array in ascending or descending order.
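A sketch of the array and JSON processors in one processors array, again built in Ruby. The field names, the tag value, and the script source are hypothetical. The foreach processor (its type name in pipeline definitions is foreach) exposes the current element as _ingest._value, and the script reads its value from params instead of hard-coding it in the source.

```ruby
require 'json'

# Hypothetical processors array for the array/JSON handling processors.
def array_json_processors
  [
    # Parse the JSON string in 'payload' into an object at 'payload_obj'.
    { 'json' => { 'field' => 'payload', 'target_field' => 'payload_obj' } },
    # Uppercase each element of 'values'; _ingest._value is the current element.
    { 'foreach' => { 'field' => 'values',
                     'processor' => { 'uppercase' => { 'field' => '_ingest._value' } } } },
    # Sort the (now uppercased) array in descending order.
    { 'sort' => { 'field' => 'values', 'order' => 'desc' } },
    # Inline Painless script that copies a parameter into a new field.
    { 'script' => { 'lang' => 'painless', 'source' => 'ctx.pipeline_tag = params.tag',
                    'params' => { 'tag' => 'array-json-demo' } } }
  ]
end

puts JSON.pretty_generate({ 'processors' => array_json_processors })
```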
Add additional processors
You can install additional processors as plugins.
You must install any plugin processors on all nodes in your cluster. Otherwise, Elasticsearch will fail to create pipelines containing the processor.
Mark a plugin as mandatory by setting plugin.mandatory in elasticsearch.yml. A node will fail to start if a mandatory plugin is not installed.
plugin.mandatory: my-ingest-plugin