Data security

When setting up Elastic APM, review all captured data carefully to ensure it does not contain sensitive information. If it does, there are several ways to filter, manipulate, or obfuscate this data.

Built-in data filters

APM agents provide built-in support for filtering the following types of data:

Data type: common sensitive data

  • HTTP headers: passwords, credit card numbers, authorization, etc.
  • HTTP bodies: passwords, credit card numbers, etc.
  • Real user monitoring data: URLs visited, click events, user browser errors, resources used, etc.
  • Database statements: sensitive user or business information

Custom filters

There are two ways to filter other types of APM data:

  • Ingest node pipeline: Applied at ingestion time. All agents and fields are supported. Data leaves the instrumented service before it is filtered. There is no performance overhead on the instrumented service.
  • APM agent filters: Not supported by all agents. Data is sanitized before leaving the instrumented service. There is potential performance overhead on the instrumented service.

Built-in data filtering

APM agents provide built-in support for filtering or obfuscating the following types of data.

HTTP headers

By default, APM agents capture HTTP request and response headers (including cookies). Most Elastic APM agents provide the ability to sanitize HTTP header fields, including cookies and application/x-www-form-urlencoded data (POST form fields). Query strings and captured request bodies, such as application/json data, are not sanitized.

The default list of sanitized fields attempts to target common field names for data relating to passwords, credit card numbers, authorization, etc., but can be customized to fit your data. This sensitive data never leaves the instrumented service.
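To make the matching behavior concrete, here is a minimal Python sketch of wildcard-based header sanitization. It is not the agents' implementation: the pattern list, the sanitize_headers helper, and the "[REDACTED]" placeholder are assumptions chosen for illustration, so check your agent's configuration reference for its actual defaults.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern list for illustration; not any agent's real defaults.
SANITIZE_PATTERNS = ["password", "*token*", "*auth*", "set-cookie", "*card*"]

def sanitize_headers(headers, patterns=SANITIZE_PATTERNS, mask="[REDACTED]"):
    """Return a copy of headers with values masked when the name matches a pattern."""
    sanitized = {}
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        if any(fnmatchcase(lowered, pattern) for pattern in patterns):
            sanitized[name] = mask
        else:
            sanitized[name] = value
    return sanitized

example = sanitize_headers({
    "Authorization": "Bearer abc123",
    "Content-Type": "application/json",
    "X-Auth-Token": "s3cr3t",
})
print(example)
```

Because the masking happens inside the service in this sketch, the original values never appear in the data that would be sent onward, which is the property the built-in sanitization is described as providing.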

This setting supports Central configuration, which means the list of sanitized fields can be updated without needing to redeploy your services.

Alternatively, you can completely disable the capturing of HTTP headers. This setting also supports Central configuration.

HTTP bodies

By default, the body of HTTP requests is not recorded. Request bodies often contain sensitive data like passwords or credit card numbers, so use care when enabling this feature.

This setting supports Central configuration, which means body capturing can be changed without needing to redeploy your services.

Real user monitoring data

Protecting user data is important. For that reason, individual RUM instrumentations can be disabled in the RUM agent with the disableInstrumentations configuration variable. Disabled instrumentations produce no spans or transactions.

Instrumentation to disable: configuration value

  • HTTP requests: fetch and xmlhttprequest
  • Page load metrics including static resources: page-load
  • JavaScript errors on the browser: error
  • User click events including URLs visited, mouse clicks, and navigation events: eventtarget
  • Single page application route changes: history

Database statements

For SQL databases, APM agents do not capture the parameters of prepared statements. Note that Elastic APM currently does not make an effort to strip parameters of regular statements. Not using prepared statements makes your code vulnerable to SQL injection attacks, so be sure to use prepared statements.

For non-SQL data stores, such as Elasticsearch or MongoDB, Elastic APM captures the full statement for queries. For inserts or updates, the full document is not stored. To filter or obfuscate data in non-SQL database statements, or to remove the statement entirely, you can set up an ingest node pipeline.
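As a rough illustration of removing the statement entirely, the Python sketch below builds a pipeline definition containing a single remove processor. The helper name build_remove_statement_pipeline is hypothetical, and the field name span.db.statement is an assumption about where your data stores the statement; confirm the field against your own documents before using anything like this.

```python
import json

def build_remove_statement_pipeline(field="span.db.statement"):
    """Build a pipeline body that drops one field (assumed field name)."""
    return {
        "description": f"remove {field}",
        "processors": [
            # ignore_missing keeps documents without the field from failing
            {"remove": {"field": field, "ignore_missing": True}}
        ],
    }

pipeline_body = build_remove_statement_pipeline()
print(json.dumps(pipeline_body, indent=2))
```

The printed JSON is the kind of body you would register with the create or update pipeline API under a name of your choosing.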

Agent-specific options

Certain agents offer additional filtering and obfuscating options:

Agent configuration options

  • (Node.js) Remove errors raised by the server-side process: disable with captureExceptions.
  • (Java) Remove process arguments from transactions: disabled by default with include_process_args.

Custom filters

There are two ways to filter or obfuscate other types of APM data:

Create an ingest node pipeline filter

Ingest node pipelines specify a series of processors that transform data in a specific way. Transformation happens prior to indexing, so it adds no performance overhead to the monitored application. Pipelines are a flexible and easy way to filter or obfuscate Elastic APM data.

Example

Say you decide to enable the capturing of HTTP request bodies, but quickly notice that sensitive information is being collected in the http.request.body.original field:

{
  "email": "test@abc.com",
  "password": "hunter2"
}

To obfuscate the passwords stored in the request body, use a series of ingest processors. To start, create a pipeline with a simple description and an empty array of processors:

{
  "pipeline": {
    "description": "redact http.request.body.original.password",
    "processors": [] 
  }
}

The processors defined in the following steps go in the empty processors array.

Add the first processor to the processors array. Because the agent captures the request body as a string, use the JSON processor to convert the original field value into a structured JSON object. Save this JSON object in a new field:

{
  "json": {
    "field": "http.request.body.original",
    "target_field": "http.request.body.original_json",
    "ignore_failure": true
  }
}

If body.original_json is not null, redact the password with the set processor, by setting the value of body.original_json.password to "redacted":

{
  "set": {
    "field": "http.request.body.original_json.password",
    "value": "redacted",
    "if": "ctx?.http?.request?.body?.original_json != null"
  }
}

Use the convert processor to convert the JSON value of body.original_json to a string and set it as the body.original value:

{
  "convert": {
    "field": "http.request.body.original_json",
    "target_field": "http.request.body.original",
    "type": "string",
    "if": "ctx?.http?.request?.body?.original_json != null",
    "ignore_failure": true
  }
}

Finally, use the remove processor to remove the body.original_json field:

{
  "remove": {
    "field": "http.request.body.original_json",
    "if": "ctx?.http?.request?.body?.original_json != null",
    "ignore_failure": true
  }
}

Now that the pipeline has been defined, use the create or update pipeline API to register the new pipeline in Elasticsearch. Name the pipeline apm_redacted_body_password:

PUT _ingest/pipeline/apm_redacted_body_password
{
  "description": "redact http.request.body.original.password",
  "processors": [
    {
      "json": {
        "field": "http.request.body.original",
        "target_field": "http.request.body.original_json",
        "ignore_failure": true
      }
    },
    {
      "set": {
        "field": "http.request.body.original_json.password",
        "value": "redacted",
        "if": "ctx?.http?.request?.body?.original_json != null"
      }
    },
    {
      "convert": {
        "field": "http.request.body.original_json",
        "target_field": "http.request.body.original",
        "type": "string",
        "if": "ctx?.http?.request?.body?.original_json != null",
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": "http.request.body.original_json",
        "if": "ctx?.http?.request?.body?.original_json != null",
        "ignore_failure": true
      }
    }
  ]
}
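To see what the four processors do to a single document, the following Python sketch approximates them on a plain dictionary. It is only a local stand-in for reasoning about the steps, not how Elasticsearch executes a pipeline: the redact_body_password function is hypothetical, json.dumps stands in for the convert processor, and the choice to skip non-object JSON is an assumption of this sketch.

```python
import json

def redact_body_password(doc):
    """Approximate the pipeline's four processors on a plain dict, in place."""
    body = doc.get("http", {}).get("request", {}).get("body")
    if not isinstance(body, dict) or "original" not in body:
        return doc  # nothing captured, so every step is a no-op

    # json processor: parse the string; on failure the document is left as-is
    try:
        parsed = json.loads(body["original"])
    except (TypeError, ValueError):
        return doc

    # Assumption of this sketch: only JSON objects are redacted
    if not isinstance(parsed, dict):
        return doc

    # set processor: the password key is written even if it was absent
    parsed["password"] = "redacted"

    # convert + remove: write the result back as a string and keep no
    # temporary field around
    body["original"] = json.dumps(parsed)
    return doc

sample = {"http": {"request": {"body": {
    "original": '{"email": "test@abc.com", "password": "hunter2"}'}}}}
print(redact_body_password(sample))
```

Note that, like the set processor above, the sketch writes a password key into every body that parses as a JSON object, whether or not one was present originally.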

To make sure the apm_redacted_body_password pipeline works correctly, test it with the simulate pipeline API. This API runs one or more documents through a pipeline and returns the transformed results without indexing them.

The request below simulates running three different documents through the pipeline:

POST _ingest/pipeline/apm_redacted_body_password/_simulate
{
  "docs": [
    {
      "_source": { 
        "http": {
          "request": {
            "body": {
              "original": """{"email": "test@abc.com", "password": "hunter2"}"""
            }
          }
        }
      }
    },
    {
      "_source": { 
        "some-other-field": true
      }
    },
    {
      "_source": { 
        "http": {
          "request": {
            "body": {
              "original": """["invalid json" """
            }
          }
        }
      }
    }
  ]
}

The first document features the same sensitive data from the original example above.

The second document only contains an unrelated field.

The third document contains invalid JSON.

The API response should be similar to this:

{
  "docs" : [
    {
      "doc" : {
        "_source" : {
          "http" : {
            "request" : {
              "body" : {
                "original" : {
                  "password" : "redacted",
                  "email" : "test@abc.com"
                }
              }
            }
          }
        }
      }
    },
    {
      "doc" : {
        "_source" : {
          "some-other-field" : true
        }
      }
    },
    {
      "doc" : {
        "_source" : {
          "http" : {
            "request" : {
              "body" : {
                "original" : """["invalid json" """
              }
            }
          }
        }
      }
    }
  ]
}

As you can see, only the first simulated document has a redacted password field. As expected, all other documents are unaffected.

The final step in this process is to add the newly created apm_redacted_body_password pipeline to the default apm pipeline. This ensures that all APM data ingested into Elasticsearch runs through the pipeline.

Get the current definition of the apm pipeline:

GET _ingest/pipeline/apm

Append the newly created pipeline to the end of the processors array and register the apm pipeline. Your request will look similar to this:

{
  "apm" : {
    "processors" : [
      {
        "pipeline" : {
          "name" : "apm_user_agent"
        }
      },
      {
        "pipeline" : {
          "name" : "apm_user_geo"
        }
      },
      {
        "pipeline" : {
          "name" : "apm_redacted_body_password"
        }
      }
    ],
    "description" : "Default enrichment for APM events"
  }
}
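The append step can be sketched in Python as follows. The get pipeline API returns each definition keyed by its pipeline ID, while the create or update pipeline API takes only the inner object as its body, so the sketch unwraps the response before appending. The append_pipeline helper is hypothetical, and the sample response is abbreviated to the two default processors shown above.

```python
import copy

def append_pipeline(get_response, pipeline_id, new_pipeline_name):
    """Build a PUT body for pipeline_id with new_pipeline_name appended once."""
    # Copy so the caller's GET response is not modified
    body = copy.deepcopy(get_response[pipeline_id])
    existing = [
        processor["pipeline"]["name"]
        for processor in body["processors"]
        if "pipeline" in processor
    ]
    # Appending only when absent keeps repeated runs from adding duplicates
    if new_pipeline_name not in existing:
        body["processors"].append({"pipeline": {"name": new_pipeline_name}})
    return body

current = {
    "apm": {
        "description": "Default enrichment for APM events",
        "processors": [
            {"pipeline": {"name": "apm_user_agent"}},
            {"pipeline": {"name": "apm_user_geo"}},
        ],
    }
}
put_body = append_pipeline(current, "apm", "apm_redacted_body_password")
print(put_body)
```

The resulting put_body is what you would send with PUT _ingest/pipeline/apm.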

That’s it! Passwords are now redacted from your APM HTTP body data.

See parse data using ingest node pipelines to learn more about the default apm pipeline.

APM agent filters

Some APM agents offer a way to manipulate or drop APM events before they are sent to the APM Server. Please see the relevant agent’s documentation for more information and examples.