Data security
When setting up Elastic APM, it’s essential to review all captured data carefully to ensure it does not contain sensitive information. When it does, we offer several different ways to filter, manipulate, or obfuscate this data.
Built-in data filters
APM agents provide built-in support for filtering the following types of data:
Data type | Common sensitive data |
---|---|
HTTP headers | Passwords, credit card numbers, authorization, etc. |
HTTP bodies | Passwords, credit card numbers, etc. |
Real user monitoring data | URLs visited, click events, user browser errors, resources used, etc. |
Database statements | Sensitive user or business information |
Custom filters
There are two ways to filter other types of APM data:
Filter type | Characteristics |
---|---|
Ingest node pipeline | Applied at ingestion time. All agents and fields are supported. Data leaves the instrumented service. There are no performance overhead implications on the instrumented service. |
APM agent filters | Not supported by all agents. Data is sanitized before leaving the instrumented service. Potential overhead implications on the instrumented service. |
Built-in data filtering
APM agents provide built-in support for filtering or obfuscating the following types of data.
HTTP headers
By default, APM agents capture HTTP request and response headers (including cookies). Most Elastic APM agents provide the ability to sanitize HTTP header fields, including cookies and application/x-www-form-urlencoded data (POST form fields). Query strings and captured request bodies, like application/json data, are not sanitized.
The default list of sanitized fields attempts to target common field names for data relating to passwords, credit card numbers, authorization, etc., but can be customized to fit your data. This sensitive data never leaves the instrumented service.
This setting supports Central configuration, which means the list of sanitized fields can be updated without needing to redeploy your services:
- Go: ELASTIC_APM_SANITIZE_FIELD_NAMES
- Java: sanitize_field_names
- .NET: sanitizeFieldNames
- Node.js: sanitizeFieldNames
- Python: sanitize_field_names
- Ruby: sanitize_field_names
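To make the idea of field-name sanitization concrete, here is a minimal Python sketch of wildcard matching against field names. It is an illustration only and not any agent's actual implementation: the pattern list, the `sanitize` helper, and the `[REDACTED]` mask are all assumptions made for this example, so check your agent's documentation for its real defaults.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern list for illustration; not an agent's real default.
SANITIZE_FIELD_NAMES = ["password", "passwd", "*token*", "*auth*", "*card*", "set-cookie"]
MASK = "[REDACTED]"


def sanitize(fields, patterns=SANITIZE_FIELD_NAMES):
    """Return a copy of `fields` with values masked when the key matches a pattern."""
    sanitized = {}
    for name, value in fields.items():
        # Compare case-insensitively by lowercasing both the key and the pattern.
        matched = any(fnmatchcase(name.lower(), p.lower()) for p in patterns)
        sanitized[name] = MASK if matched else value
    return sanitized


headers = {
    "Authorization": "Bearer abc123",
    "Content-Type": "application/x-www-form-urlencoded",
    "X-Card-Number": "4111111111111111",
}
print(sanitize(headers))
```

In this sketch only the keys decide what is masked, which is why a field with an unexpected name can slip through and why the pattern list usually needs tailoring to your data.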
Alternatively, you can completely disable the capturing of HTTP headers. This setting also supports Central configuration:
- Go: ELASTIC_APM_CAPTURE_HEADERS
- Java: capture_headers
- .NET: CaptureHeaders
- Node.js: captureHeaders
- Python: capture_headers
- Ruby: capture_headers
HTTP bodies
By default, the body of HTTP requests is not recorded. Request bodies often contain sensitive data like passwords or credit card numbers, so use care when enabling this feature.
This setting supports Central configuration, which means it can be updated without needing to redeploy your services:
- Go: ELASTIC_APM_CAPTURE_BODY
- Java: capture_body
- .NET: CaptureBody
- Node.js: captureBody
- Python: capture_body
- Ruby: capture_body
Real user monitoring data
Protecting user data is important. For that reason, individual RUM instrumentations can be disabled in the RUM agent with the disableInstrumentations configuration variable. Disabled instrumentations produce no spans or transactions.
Disable | Configuration value |
---|---|
HTTP requests | fetch and xmlhttprequest |
Page load metrics including static resources | page-load |
JavaScript errors on the browser | error |
User click events including URLs visited, mouse clicks, and navigation events | eventtarget |
Single page application route changes | history |
Database statements
For SQL databases, APM agents do not capture the parameters of prepared statements. Note that Elastic APM currently does not make an effort to strip parameters of regular statements. Not using prepared statements makes your code vulnerable to SQL injection attacks, so be sure to use prepared statements.
For non-SQL data stores, such as Elasticsearch or MongoDB, Elastic APM captures the full statement for queries. For inserts or updates, the full document is not stored. To filter or obfuscate data in non-SQL database statements, or to remove the statement entirely, you can set up an ingest node pipeline.
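The difference between prepared and string-built SQL statements can be sketched with Python's standard `sqlite3` module. This example is not tied to any APM agent; it only shows, under that assumption, that the text of a parameterized statement contains placeholders rather than values, while a string-built statement embeds the sensitive value directly. The table and variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, password TEXT)")

email, password = "test@abc.com", "hunter2"

# Parameterized statement: the SQL text holds only placeholders,
# so the statement string itself carries no user-supplied values.
insert_sql = "INSERT INTO users (email, password) VALUES (?, ?)"
conn.execute(insert_sql, (email, password))

safe_sql = "SELECT email FROM users WHERE password = ?"
rows = conn.execute(safe_sql, (password,)).fetchall()

# String-built statement (avoid): the sensitive value ends up inside the
# SQL text, and the pattern is also open to SQL injection.
unsafe_sql = f"SELECT email FROM users WHERE password = '{password}'"

print("value in parameterized text:", password in safe_sql)
print("value in string-built text:", password in unsafe_sql)
```

Any tool that records the statement text would see the password only in the string-built variant.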
Agent-specific options
Certain agents offer additional filtering and obfuscating options:
Agent configuration options
- (Node.js) Remove errors raised by the server-side process: Disable with captureExceptions.
- (Java) Remove process arguments from transactions: Disabled by default with include_process_args.
Custom filters
There are two ways to filter or obfuscate other types of APM data:
- Create an ingest node pipeline filter
- APM agent filters
Create an ingest node pipeline filter
Ingest node pipelines specify a series of processors that transform data in a specific way. Transformation happens prior to indexing, so it adds no performance overhead to the monitored application. Pipelines are a flexible and easy way to filter or obfuscate Elastic APM data.
Example
Say you decide to enable the capturing of HTTP request bodies, but quickly notice that sensitive information is being collected in the http.request.body.original field:

    {
      "email": "test@abc.com",
      "password": "hunter2"
    }
To obfuscate the passwords stored in the request body, use a series of ingest processors. To start, create a pipeline with a simple description and an empty array of processors:

    {
      "description": "redact http.request.body.original.password",
      "processors": []
    }
Add the first processor to the processors array. Because the agent captures the request body as a string, use the JSON processor to convert the original field value into a structured JSON object. Save this JSON object in a new field:

    {
      "json": {
        "field": "http.request.body.original",
        "target_field": "http.request.body.original_json",
        "ignore_failure": true
      }
    }
If body.original_json is not null, redact the password with the set processor by setting the value of body.original_json.password to "redacted":

    {
      "set": {
        "field": "http.request.body.original_json.password",
        "value": "redacted",
        "if": "ctx?.http?.request?.body?.original_json != null"
      }
    }
Use the convert processor to convert the JSON value of body.original_json to a string and set it as the body.original value:

    {
      "convert": {
        "field": "http.request.body.original_json",
        "target_field": "http.request.body.original",
        "type": "string",
        "if": "ctx?.http?.request?.body?.original_json != null",
        "ignore_failure": true
      }
    }
Finally, use the remove processor to remove the body.original_json field:

    {
      "remove": {
        "field": "http.request.body.original_json",
        "if": "ctx?.http?.request?.body?.original_json != null",
        "ignore_failure": true
      }
    }
Now that the pipeline has been defined, use the create or update pipeline API to register the new pipeline in Elasticsearch. Name the pipeline apm_redacted_body_password:

    PUT _ingest/pipeline/apm_redacted_body_password
    {
      "description": "redact http.request.body.original.password",
      "processors": [
        {
          "json": {
            "field": "http.request.body.original",
            "target_field": "http.request.body.original_json",
            "ignore_failure": true
          }
        },
        {
          "set": {
            "field": "http.request.body.original_json.password",
            "value": "redacted",
            "if": "ctx?.http?.request?.body?.original_json != null"
          }
        },
        {
          "convert": {
            "field": "http.request.body.original_json",
            "target_field": "http.request.body.original",
            "type": "string",
            "if": "ctx?.http?.request?.body?.original_json != null",
            "ignore_failure": true
          }
        },
        {
          "remove": {
            "field": "http.request.body.original_json",
            "if": "ctx?.http?.request?.body?.original_json != null",
            "ignore_failure": true
          }
        }
      ]
    }
To make sure the apm_redacted_body_password pipeline works correctly, test it with the simulate pipeline API. This API allows you to run multiple documents through a pipeline to ensure it is working correctly. The request below simulates running three different documents through the pipeline:

    POST _ingest/pipeline/apm_redacted_body_password/_simulate
    {
      "docs": [
        {
          "_source": {
            "http": {
              "request": {
                "body": {
                  "original": """{"email": "test@abc.com", "password": "hunter2"}"""
                }
              }
            }
          }
        },
        {
          "_source": {
            "some-other-field": true
          }
        },
        {
          "_source": {
            "http": {
              "request": {
                "body": {
                  "original": """["invalid json" """
                }
              }
            }
          }
        }
      ]
    }
- The first document features the same sensitive data as the original example above.
- The second document only contains an unrelated field.
- The third document contains invalid JSON.
The API response should be similar to this:

    {
      "docs" : [
        {
          "doc" : {
            "_source" : {
              "http" : {
                "request" : {
                  "body" : {
                    "original" : {
                      "password" : "redacted",
                      "email" : "test@abc.com"
                    }
                  }
                }
              }
            }
          }
        },
        {
          "doc" : {
            "_source" : {
              "some-other-field" : true
            }
          }
        },
        {
          "doc" : {
            "_source" : {
              "http" : {
                "request" : {
                  "body" : {
                    "original" : """["invalid json" """
                  }
                }
              }
            }
          }
        }
      ]
    }
As you can see, only the first simulated document has a redacted password field. As expected, all other documents are unaffected.
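The logic of the pipeline can also be sketched in plain Python to make each step easier to follow. This is a rough emulation written for illustration, not how Elasticsearch executes processors: the `redact_body_password` function is hypothetical, it only handles JSON objects, and it writes the redacted body back as a serialized string, which is an assumption of this sketch.

```python
import copy
import json


def redact_body_password(source):
    """Roughly emulate the four processors on one document's _source."""
    doc = copy.deepcopy(source)
    body = doc.get("http", {}).get("request", {}).get("body")
    if not isinstance(body, dict) or not isinstance(body.get("original"), str):
        return doc  # nothing to parse, comparable to an ignored json failure

    try:
        parsed = json.loads(body["original"])  # json processor
    except ValueError:
        return doc  # invalid JSON: the temporary field is never created

    # Assumption of this sketch: only JSON objects are redacted.
    if not isinstance(parsed, dict):
        return doc

    parsed["password"] = "redacted"  # set processor
    # convert + remove: write the result back and keep no temporary field.
    body["original"] = json.dumps(parsed)
    return doc


docs = [
    {"http": {"request": {"body": {"original": '{"email": "test@abc.com", "password": "hunter2"}'}}}},
    {"some-other-field": True},
    {"http": {"request": {"body": {"original": '["invalid json" '}}}},
]
for d in docs:
    print(redact_body_password(d))
```

As with the simulated request, only the first document changes; the other two pass through untouched.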
The final step in this process is to add the newly created apm_redacted_body_password pipeline to the default apm pipeline. This ensures that all APM data ingested into Elasticsearch runs through the pipeline.
Get the current list of apm pipelines:

    GET _ingest/pipeline/apm
Append the newly created pipeline to the end of the processors array and register the apm pipeline. Your request will look similar to this:

    {
      "apm" : {
        "processors" : [
          { "pipeline" : { "name" : "apm_user_agent" } },
          { "pipeline" : { "name" : "apm_user_geo" } },
          { "pipeline" : { "name" : "apm_redacted_body_password" } }
        ],
        "description" : "Default enrichment for APM events"
      }
    }
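If you script this step, the append can be done as plain dictionary manipulation. The sketch below is a hedged illustration: `append_pipeline` is a hypothetical helper, the sample response is hand-written to mirror the example above, and it assumes the registration request takes the inner definition (description and processors) as its body. No request is sent here.

```python
import copy


def append_pipeline(get_response, pipeline_id, new_pipeline):
    """Build a request body for `pipeline_id` with `new_pipeline` appended once."""
    # The GET response is keyed by pipeline ID; copy the inner definition.
    body = copy.deepcopy(get_response[pipeline_id])
    processors = body.setdefault("processors", [])
    entry = {"pipeline": {"name": new_pipeline}}
    if entry not in processors:  # keep the operation idempotent
        processors.append(entry)
    return body


current = {
    "apm": {
        "description": "Default enrichment for APM events",
        "processors": [
            {"pipeline": {"name": "apm_user_agent"}},
            {"pipeline": {"name": "apm_user_geo"}},
        ],
    }
}
put_body = append_pipeline(current, "apm", "apm_redacted_body_password")
print(put_body)
```

Checking for an existing entry before appending means the script can be rerun without registering the redaction pipeline twice.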
That's it! Passwords have now been redacted from your APM HTTP body data.
See parse data using ingest node pipelines to learn more about the default apm pipeline.
APM agent filters
Some APM agents offer a way to manipulate or drop APM events before they are sent to the APM Server. Please see the relevant agent’s documentation for more information and examples:
- .NET: Filter API.
- Node.js: addFilter().
- Python: custom processors.
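As a rough idea of what an agent-side filter can look like, here is a minimal Python sketch of a function that masks a password in a captured request body before the event is sent. It is deliberately written without importing the agent so that it runs on its own. The `(client, event)` signature, the `context.request.body` layout, and the `[REDACTED]` mask are assumptions made for illustration; consult the Python agent's custom processor documentation for the exact contract and registration steps.

```python
import json

MASK = "[REDACTED]"


def redact_request_body_password(client, event):
    """Mask a top-level `password` key in a captured JSON request body."""
    request = event.get("context", {}).get("request", {})
    body = request.get("body")
    if isinstance(body, str):
        try:
            parsed = json.loads(body)
        except ValueError:
            return event  # not JSON: leave the event untouched
        if isinstance(parsed, dict) and "password" in parsed:
            parsed["password"] = MASK
            request["body"] = json.dumps(parsed)
    return event


# Hypothetical event used only to exercise the function.
sample_event = {
    "context": {
        "request": {"body": '{"email": "test@abc.com", "password": "hunter2"}'}
    }
}
print(redact_request_body_password(None, sample_event))
```

Unlike the ingest pipeline approach, a filter of this kind runs inside the instrumented service, so the sensitive value never leaves it, at the cost of some extra work per event.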