Elastic SIEM for home and small business: GeoIP data and Beats config review

Note: The “SIEM for home and small business” blog series contains configurations relevant to the beta release of Elastic SIEM using Elastic Stack 7.4. We recommend using Elastic Stack 7.6 and newer, as Elastic SIEM was made generally available in 7.6. Please also note the Elastic SIEM solution mentioned in this post is now referred to as Elastic Security.

Hey, there. This is part three of the Elastic SIEM for home and small business blog series. In the Getting started blog, we created our Elasticsearch Service deployment and started collecting data from one of our computers using Winlogbeat. In the Securing cluster access blog, we secured access to our cluster by restricting privileges for users and Beats. If you haven’t read the first and second blogs, you may want to read them before going any further. In this blog, we will create an ingest pipeline for GeoIP data and review our Beats configurations.

Configuring our Elasticsearch Service deployment

Before we configure all of our systems to send data to our Elasticsearch Service deployment, we need to configure an ingest pipeline that will enrich our data with GeoIP information. We will also review the common configurations for Beats, so we have time to decide on and standardize our settings before we start collecting data from our systems. Since we will be making administrative changes to our cluster, we need to sign into the Kibana instance of our Elasticsearch Service deployment using our elastic superuser account.

Configuring an ingest pipeline for GeoIP data

We will enrich our data from Beats using the GeoIP processor in an ingest pipeline. This configuration adds GeoIP data to events as they are sent to Elasticsearch. For this section of the blog, I will reference Packetbeat, even though every Beat supports using the GeoIP processor in an ingest pipeline. We can accomplish this using either the Packetbeat guide or the SIEM guide. Read through the Packetbeat “Enrich events with GeoIP information” guide before getting started.

In Kibana, click Dev Tools. Then, in the console, issue the PUT _ingest/pipeline/geoip-info command (from the guide and shown below):

PUT geoip-info pipeline command

If you would prefer to copy and paste, this is the command to issue in Dev Tools:

PUT _ingest/pipeline/geoip-info
{
  "description": "Add geoip info",
  "processors": [
    {
      "geoip": {
        "field": "client.ip",
        "target_field": "client.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "source.ip",
        "target_field": "source.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "destination.ip",
        "target_field": "destination.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "server.ip",
        "target_field": "server.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "host.ip",
        "target_field": "host.geo",
        "ignore_missing": true
      }
    }
  ]
}

Once we issue the command, we should receive an "acknowledged" : true response in the Dev Tools console window:

Acknowledged true in DevTools
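If you would rather check text than a screenshot, the full response body is simply:

{
  "acknowledged" : true
}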

While still in the Dev Tools console, issue the GET _ingest/pipeline/geoip-info command to validate that our ingest pipeline settings match what is listed in the guide.
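As an optional check, we can also run the pipeline against a sample document using the simulate API before any Beats start using it (the 8.8.8.8 address below is just an arbitrary public IP for the test):

POST _ingest/pipeline/geoip-info/_simulate
{
  "docs": [
    {
      "_source": {
        "source": {
          "ip": "8.8.8.8"
        }
      }
    }
  ]
}

The response should show a source.geo object populated with GeoIP details for the test address. Now that we have our ingest pipeline created, we will review our common Beats configurations.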

Reviewing common Beats configurations

The reason we want to review our common Beats configurations is so we can standardize the data (and settings) across the different Beats we plan on deploying. While I encourage you to read through the documentation for each of the Beats we will configure (Auditbeat, Filebeat, Packetbeat, and Winlogbeat), we will highlight a few of the configuration settings and why we will (or won't) use them.

For this section, we'll use the Configuring Winlogbeat guide to review the general settings that are common across each Beat. Based on the configuration we used in part one and part two of this blog series, the common settings we want to review are in the General, Top Level Processor, Elastic Cloud, and Xpack Monitoring sections. We will also review a few other settings to implement in our configurations. I highly encourage you to read through the documentation for each of these configuration items, so you can decide how you want to enrich the data being shipped from your systems.

Beats general settings

In the General section of our configuration we have defined name, tags, and fields (fields.env). The name setting is very important because it populates agent.name and host.name, and host.name is the primary field SIEM uses to pivot on data. While we can leave these settings blank, let’s review how we can filter this data and why it is a good idea to define these settings.

#=== General === 
name: KidsPC 
tags: ["Home", "KidsPC"] 
fields: 
  env: home

With this configuration, we will be able to group all transactions using agent.name: KidsPC:

Kibana Discover for agent.name

Some other ways we can filter through this data in Kibana are using the following:

  • tags: Home
  • tags: KidsPC
  • host.name: KidsPC
  • fields.env: home
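We can also combine these in the KQL search bar. For example, to see only the home events from the kids’ PC:

tags: "Home" and host.name: "KidsPC"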

Beats top-level processor settings

While there are different ways to define processors, we are using the top-level configuration to enrich our data before it is shipped to our Elastic cluster. Even though there are many processors available, the example in this blog uses add_host_metadata, add_locale, add_cloud_metadata, and add_fields. When we define a processor with a tilde (~, YAML shorthand for null), we are enabling the processor with its default settings in the respective Beat:

#=== Top Level Processor === 
processors: 
  - add_host_metadata: 
      # netinfo.enabled should be set to `false` until GitHub issue
      # https://github.com/elastic/elasticsearch/issues/46193 is resolved
      netinfo.enabled: false 
      geo: # These geo configurations are optional 
        location: 40.7128, -74.0060 
        continent_name: North America 
        country_iso_code: US 
        region_name: New York 
        region_iso_code: US-NY 
        city_name: New York City 
        name: myLocation 
  - add_locale: ~ 
  - add_cloud_metadata: ~ 
  - add_fields: 
      #when.network.source.ip: 10.101.101.0/24 
      when.network.source.ip: private 
      fields: 
        source.geo.location: 
          lat: 40.7128 
          lon: -74.0060 
        source.geo.continent_name: North America 
        source.geo.country_iso_code: US 
        source.geo.region_name: New York 
        source.geo.region_iso_code: US-NY 
        source.geo.city_name: New York City 
        source.geo.name: myLocation 
      target: '' 
  - add_fields: 
      #when.network.destination.ip: 10.101.101.0/24 
      when.network.destination.ip: private 
      fields: 
        destination.geo.location: 
          lat: 40.7128 
          lon: -74.0060 
        destination.geo.continent_name: North America 
        destination.geo.country_iso_code: US 
        destination.geo.region_name: New York 
        destination.geo.region_iso_code: US-NY 
        destination.geo.city_name: New York City 
        destination.geo.name: myLocation 
      target: ''

I highly encourage you to read through each of these configuration items and define them to match your use case. Here is an overview of how these configurations enrich our data:

  • add_host_metadata allows us to collect information like IP and MAC addresses, as well as define geo information. Since we have defined geo information, we can visualize our data on a map in Kibana.
  • add_locale enriches each event with the machine’s time zone offset from UTC, adding it to each event as the event.timezone field.
  • add_cloud_metadata is used to enrich data from supported cloud providers. This configuration would be useful for a small business that has servers hosted by a cloud provider. In our example of running Beats on a physical workstation, no cloud data is added to events.
  • add_fields has two configurations in our example:
    • Check for a private source IP address (when.network.source.ip: private) and then add predefined geo data.
    • Check for a private destination IP address (when.network.destination.ip: private) and then add predefined geo data.

In the add_fields configuration, we can map our internal network by defining a known internal subnet instead of private (10.101.101.0/24, as shown in the commented example). This is useful if you don’t want private addresses to always be tagged with your location. I personally define my known subnets instead of private on my laptop, so geo location data isn’t added when I am not at home.
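As a sketch of that change, the source IP condition from the block above becomes a subnet match (10.101.101.0/24 stands in for your own network, and the remaining geo fields carry over the same way):

  - add_fields:
      # Match only my home subnet instead of all private (RFC 1918) addresses
      when.network.source.ip: 10.101.101.0/24
      fields:
        source.geo.location:
          lat: 40.7128
          lon: -74.0060
      target: ''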

Beats Elastic Cloud settings

Since we will be shipping data from Beats to our Elasticsearch Service deployment, we need to use the Elastic Cloud output along with our Cloud ID. The cloud.id is used by the respective Beat to resolve the Elasticsearch and Kibana URLs. The cloud.auth setting is used for authentication to our Elasticsearch Service deployment.

Instead of moving to the next configuration section, we’re going to add some more settings to our Elastic Cloud section before we finalize our output. We will tell each Beat to use our new geoip-info ingest pipeline by configuring the Elasticsearch output pipeline (output.elasticsearch.pipeline: geoip-info). Note that the output.elasticsearch.pipeline: geoip-info setting needs to be commented out in the configuration file until Elasticsearch GitHub issue 46193 is resolved.

We will also add the Elasticsearch output max_retries setting here (output.elasticsearch.max_retries: 5) for Beats that do not retry indefinitely. Note that in the Winlogbeat documentation, we see that Winlogbeat retries indefinitely; however, Packetbeat and Auditbeat will drop events after the specified number of retries. The default value is 3, so use caution when adjusting this setting.

For systems that will only ship data, we need to disable the required setup options so we do not have a conflict publishing events when using the beats_writer role we created in the previous blog. With that said, we are also going to add setup.template.enabled: false, setup.ilm.check_exists: false, and setup.ilm.overwrite: false to this section, but leave them commented out for now.

Now our Elastic Cloud configuration section looks like this:

#=== Elastic Cloud ===
cloud.id: "My_Elastic_Cloud_Deployment:abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# For production, we should NOT use the elastic superuser
cloud.auth: "home_beats:abcDEF0987654321" # example: "username:password"
# The geoip-info pipeline is used to enrich GeoIP information in Elasticsearch
# You must configure the pipeline in Elasticsearch before enabling the pipeline in Beats.
# The `output.elasticsearch.pipeline: geoip-info` setting should be commented out
# until GitHub issue https://github.com/elastic/elasticsearch/issues/46193 is resolved
#output.elasticsearch.pipeline: geoip-info
# The `max_retries` setting is the number of times to retry publishing an event after 
# a publishing failure. After the specified number of retries, the events are typically dropped.
output.elasticsearch.max_retries: 5
# When deploying Beats to multiple systems or locations, uncomment the following 
# setup.template.* and setup.ilm.* configurations after running the Beats setup command
# Otherwise, uncomment the following configurations for Beats that will only publish data
#setup.template.enabled: false
#setup.ilm.check_exists: false
#setup.ilm.overwrite: false

I know I’ve stated this a few times, but I highly encourage you to read through each of these configuration items and define them to match your use case.

Beats Xpack monitoring settings

Enabling monitoring gives you the ability to look at what is happening with each individual Beat in Stack Monitoring in Kibana. By setting monitoring.enabled: true, you are telling the Beat to ship monitoring data to your cluster.

#=== Xpack Monitoring === 
monitoring.enabled: true

You can look at the overall performance of the specific Beat, like this:

Winlogbeat monitoring overview 1

Winlogbeat monitoring overview - memory and CPU

Winlogbeat monitoring overview 3 - system load and open handles

To finish our configuration review, we will cover one section we haven’t used before: the internal queue.

Beats queue settings

If our systems are offline or unable to connect to our cluster, we need a way for Beats to retain events until they can reconnect, so we will configure the internal queue settings. We can choose either the memory queue or the file spool queue, but only one queue type can be configured. The memory queue caches events in memory, which is useful during an internet outage, while the file spool queue persists events to disk, which is useful if a system goes offline. Note that in version 7.4, the file spool queue is beta functionality.

At home, my computers are set to turn off after a period of inactivity, so I will configure the file spool queue. You should configure the queue type that is appropriate for your use case. With that said, we are going to add both the queue.mem and queue.spool settings to our configuration file and uncomment only the queue type we decide to use. With queue.mem commented out and queue.spool enabled, my configuration is:

#=== Queue ===
# See the 'Configure the internal queue' documentation for each Beat before 
# configuring the queue.  Note that only one queue type can be configured.
# You need to uncomment the specific queue type you decide to use.
# The `queue.mem` settings will cache events to memory in case access to the
# Elasticsearch cluster, via the internet, is unavailable (internet outage)
#queue.mem:
#  events: 4096
#  flush.min_events: 512
#  flush.timeout: 1s
# The `queue.spool` settings will cache events to disk in case the system is offline
# NOTE: The file spool queue is a beta functionality in 7.4
queue.spool:
  file:
    size: 512MiB
    page_size: 16KiB
    prealloc: ~
  write:
    buffer_size: 10MiB
    flush.timeout: 1s
    flush.events: 1024

Instead of posting our full configuration here, you can access the common Beats configuration in the SIEM-at-Home folder of the elastic/examples GitHub repo. Once again, I highly encourage you to read through each of the configuration items and define them to match your use case.

Coming up in Part 4: Data collection with Beats

Now that we have completed the prerequisites, our next steps are to collect data from our systems and network devices. We will accomplish this by installing and configuring Beats on our systems. Keep in mind that we need to retain a copy of our Beats common configuration settings from the General, Top Level Processor, Elastic Cloud, Xpack Monitoring, and Queue sections for use when we configure Beats on our other systems.

Follow along with this Elastic SIEM for home and small business blog series as we develop a powerful, yet simple, security solution at home (or for your small business). Remember that once we install and configure Beats on our systems, we can go to the SIEM app to see what data is available.

A few last things...

If you run into any issues, the first place we’d recommend turning is our documentation. It can help with many common issues. If you still have outstanding questions, check out our Elastic forums for additional help. Or, if you want to talk to the Elastic Support team directly, you have direct access to a team of experts if you’ve deployed on Elasticsearch Service. If you are self-hosting, you can start an Elastic subscription today and get that same direct access to a team of experts. Be safe out there!