Monitor Amazon Web Services (AWS) with Amazon Data Firehose

Amazon Data Firehose is a popular service that allows you to send your service logs and monitoring metrics to Elastic in minutes without a single line of code and without building or managing your own data ingestion and delivery infrastructure.

What you’ll learn

In this tutorial, you’ll learn how to:

Install AWS integration in Kibana
Create a delivery stream in Amazon Data Firehose
Specify the destination settings for your Firehose stream
Send data to the Firehose delivery stream

Before you begin

Create an Elastic Cloud Hosted deployment in one of the AWS regions (including AWS GovCloud). The deployment includes an Elasticsearch cluster for storing and searching your data, and Kibana for visualizing and managing your data.

Step 1: Install AWS integration in Kibana

Install AWS integrations to load index templates, ingest pipelines, and dashboards into Kibana. Find Integrations in the main menu or use the global search field. Find the AWS Integration by browsing the catalog.
Navigate to the Settings tab and click Install AWS assets. Confirm by clicking Install AWS in the popup.
Install Amazon Data Firehose integration assets in Kibana.

Step 2: Create a delivery stream in Amazon Data Firehose

Go to the AWS console and navigate to Amazon Data Firehose.
Click Create Firehose stream and choose the source and destination of your Firehose stream. Unless you are streaming data from Kinesis Data Streams, set source to Direct PUT and destination to Elastic.
Provide a meaningful Firehose stream name that will allow you to identify this delivery stream later.

Note

For advanced use cases, source records can be transformed by invoking a custom Lambda function. When using Elastic integrations, this should not be required.
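
As an illustration of the advanced case in the note above, here is a minimal sketch of a Lambda transformation handler. It follows the Firehose data transformation contract, in which each incoming record carries a recordId and base64-encoded data, and each returned record carries a result of Ok, Dropped, or ProcessingFailed. The source_tag field and its value are hypothetical and only show where a custom change would go.

```python
import base64
import json


def lambda_handler(event, context):
    """Tag each JSON-object record; mark anything else as ProcessingFailed."""
    output = []
    for record in event["records"]:
        raw = base64.b64decode(record["data"])
        try:
            doc = json.loads(raw)
            if not isinstance(doc, dict):
                raise ValueError("record is not a JSON object")
        except ValueError:
            # Return the original payload untouched and flag the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        doc["source_tag"] = "example"  # hypothetical enrichment field
        encoded = base64.b64encode(json.dumps(doc).encode("utf-8")).decode("ascii")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded,
        })
    return {"records": output}
```

Every recordId received must appear in the response, which is why failed records are returned rather than skipped.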

Step 3: Specify the destination settings for your Firehose stream

From the Destination settings panel, specify the following settings:

Elastic endpoint URL: Enter the Elastic endpoint URL of your Elasticsearch cluster. To find the Elasticsearch endpoint, go to the Elastic Cloud Console and select Connection details. Make sure the endpoint is in the following format: https://<deployment_name>.es.<region>.<csp>.elastic-cloud.com.
API key: Enter the encoded Elastic API key. This can be created in Kibana by following the instructions under API Keys. If you are using an API key with Restrict privileges, review the Indices privileges and grant at least the "auto_configure" and "write" privileges on the indices you will be using with this delivery stream.
Content encoding: To reduce the data transfer costs, use GZIP encoding.
Retry duration: Determines how long Firehose continues retrying the request in the event of an error. A duration between 60 and 300 seconds should be suitable for most use cases.
Parameters:
- es_datastream_name: This parameter is optional and sets the data stream in which documents are stored. If not specified, logs are stored in the logs-awsfirehose-default data stream and metrics are stored in the metrics-aws.cloudwatch-default data stream.
- include_cw_extracted_fields: This parameter is optional and can be set when using a CloudWatch logs subscription filter as the Firehose data source. When set to true, extracted fields generated by the filter pattern in the subscription filter will be collected. Setting this parameter can add many fields to each record and may significantly increase data volume in Elasticsearch. Therefore, use this parameter carefully and only when the extracted fields are required for specific filtering and/or aggregation.
- set_es_document_id: This parameter is optional and controls whether Elasticsearch assigns each document a random ID or uses a calculated unique ID for each document. The default is true, which uses the calculated unique ID. When set to false, a random ID is used for each document, which helps indexing performance.
Backup settings: It is recommended to configure S3 backup for failed records. These backups can be used to restore data losses caused by unforeseen service outages.
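
If you prefer to script the stream instead of using the console, the settings above can be collected into a single structure. The sketch below assumes the Elastic destination corresponds to the HttpEndpointDestinationConfiguration argument of the boto3 create_delivery_stream call, and that the Parameters are passed as CommonAttributes. Both are assumptions to verify against the AWS documentation. The function name, the "Elastic" endpoint name, and the example ARNs are hypothetical.

```python
import re

# Endpoint shape given in the text:
# https://<deployment_name>.es.<region>.<csp>.elastic-cloud.com
ENDPOINT_RE = re.compile(r"^https://[^./]+\.es\.[^./]+\.[^./]+\.elastic-cloud\.com/?$")


def _as_text(value):
    """Render booleans as lowercase 'true'/'false'; everything else via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_destination_config(endpoint_url, api_key, role_arn, backup_bucket_arn,
                             retry_seconds=60, parameters=None):
    """Assemble destination settings: endpoint, API key, GZIP, retry, backup."""
    if not ENDPOINT_RE.match(endpoint_url):
        raise ValueError("endpoint is not in the expected elastic-cloud.com format")
    attributes = [
        {"AttributeName": name, "AttributeValue": _as_text(value)}
        for name, value in sorted((parameters or {}).items())
    ]
    return {
        "EndpointConfiguration": {
            "Url": endpoint_url,
            "Name": "Elastic",      # hypothetical display name
            "AccessKey": api_key,   # the encoded Elastic API key
        },
        "RequestConfiguration": {
            "ContentEncoding": "GZIP",
            "CommonAttributes": attributes,
        },
        "RetryOptions": {"DurationInSeconds": retry_seconds},
        "S3BackupMode": "FailedDataOnly",  # back up failed records only
        "RoleARN": role_arn,
        "S3Configuration": {"RoleARN": role_arn, "BucketARN": backup_bucket_arn},
    }


# Possible use with boto3 (not executed here):
#   boto3.client("firehose").create_delivery_stream(
#       DeliveryStreamName="my-stream",
#       DeliveryStreamType="DirectPut",
#       HttpEndpointDestinationConfiguration=build_destination_config(...),
#   )
```

The default of 60 seconds sits at the low end of the 60 to 300 second range suggested above; the function does not enforce that range.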

Step 4: Send data to the Firehose delivery stream

You can configure a variety of log sources, such as VPC flow logs, to send data directly to Firehose streams. Some services, such as CloudTrail and Lambda, don’t support publishing logs directly to Firehose but do support publishing logs to CloudWatch Logs. Refer to the AWS documentation for more information.
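
A stream whose source is Direct PUT can also receive records from your own code. The following sketch only prepares the payload: it serializes events as JSON and splits them into groups of at most 500 records, the per-call limit of the Firehose PutRecordBatch API. The function name and the event shape are hypothetical, and whether plain JSON objects suit your Elastic integration is an assumption to check.

```python
import json

MAX_RECORDS_PER_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_batches(events, batch_size=MAX_RECORDS_PER_BATCH):
    """Serialize events and split them into lists sized for PutRecordBatch."""
    records = [{"Data": json.dumps(event).encode("utf-8")} for event in events]
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Possible use with boto3 (not executed here):
#   client = boto3.client("firehose")
#   for batch in to_batches(events):
#       response = client.put_record_batch(DeliveryStreamName="my-stream", Records=batch)
#       # response["FailedPutCount"] reports how many records need to be resent.
```

PutRecordBatch also limits the total size of a call, so very large events may need smaller batches than the record count alone suggests.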

For example, a typical workflow for sending CloudTrail logs to Firehose would be the following:

Publish CloudTrail logs to a CloudWatch log group. Refer to the AWS documentation about publishing CloudTrail logs.
Create a subscription filter on the CloudWatch log group that targets the Firehose stream. Refer to the AWS documentation about using subscription filters.
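
The second step can be scripted as well. This sketch builds the arguments for the boto3 CloudWatch Logs call put_subscription_filter and checks that the destination looks like a Firehose stream ARN. The function name and the default filter name are hypothetical. An empty filter pattern matches every log event, and the role must allow CloudWatch Logs to write to the stream.

```python
def subscription_filter_request(log_group, firehose_arn, role_arn,
                                filter_name="cloudtrail-to-firehose",
                                filter_pattern=""):
    """Build keyword arguments for logs.put_subscription_filter."""
    # Expected shape: arn:<partition>:firehose:<region>:<account>:deliverystream/<name>
    parts = firehose_arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "firehose" or not parts[5].startswith("deliverystream/"):
        raise ValueError("destination is not a Firehose stream ARN")
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": filter_pattern,  # "" forwards all log events
        "destinationArn": firehose_arn,
        "roleArn": role_arn,
    }


# Possible use with boto3 (not executed here):
#   boto3.client("logs").put_subscription_filter(**subscription_filter_request(...))
```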

Sending CloudWatch monitoring metrics to Elastic using Firehose is also supported. For example, you can configure metrics ingestion by creating a metric stream through CloudWatch. You can select an existing Firehose stream by choosing the option Custom setup with Firehose. For more information, refer to the AWS documentation about the custom setup with Firehose.
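
A metric stream can be created from code too. The sketch below builds the arguments for the boto3 CloudWatch call put_metric_stream, pointing it at an existing Firehose stream. The function name is hypothetical, and the output format is left as a required argument because the format your Elastic integration expects is not stated here and should be confirmed first.

```python
def metric_stream_request(name, firehose_arn, role_arn, output_format, namespaces=()):
    """Build keyword arguments for cloudwatch.put_metric_stream."""
    request = {
        "Name": name,
        "FirehoseArn": firehose_arn,
        "RoleArn": role_arn,
        "OutputFormat": output_format,  # for example "json"; confirm before use
    }
    if namespaces:
        # Stream only the listed namespaces; omit the filter to stream all metrics.
        request["IncludeFilters"] = [{"Namespace": ns} for ns in namespaces]
    return request


# Possible use with boto3 (not executed here):
#   boto3.client("cloudwatch").put_metric_stream(**metric_stream_request(...))
```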

For more information on Amazon Data Firehose, you can also check the Amazon Data Firehose Integrations documentation.