GeoIP processor

The geoip processor adds information about the geographical location of an IPv4 or IPv6 address.

By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2 ASN GeoIP2 databases from MaxMind, shared under the CC BY-SA 4.0 license. It automatically downloads these databases if your nodes can connect to the storage.googleapis.com domain and either:

  • ingest.geoip.downloader.eager.download is set to true
  • your cluster has at least one pipeline with a geoip processor

Elasticsearch automatically downloads updates for these databases from the Elastic GeoIP endpoint: https://geoip.elastic.co/v1/database. To get download statistics for these updates, use the GeoIP stats API.
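
For example, you can retrieve these download statistics with:

GET _ingest/geoip/stats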

If your cluster can’t connect to the Elastic GeoIP endpoint or you want to manage your own updates, see Manage your own GeoIP2 database updates.

If Elasticsearch can’t connect to the endpoint for 30 days, all updated databases will become invalid. Elasticsearch will stop enriching documents with geoip data and will add a tags: ["_geoip_expired_database"] field instead.

Using the geoip Processor in a Pipeline

Table 22. geoip options

Name | Required | Default | Description
field | yes | - | The field to get the IP address from for the geographical lookup.
target_field | no | geoip | The field that will hold the geographical information looked up from the MaxMind database.
database_file | no | GeoLite2-City.mmdb | The database filename referring to a database the module ships with (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or a custom database in the ingest-geoip config directory.
properties | no | [continent_name, country_iso_code, country_name, region_iso_code, region_name, city_name, location] * | Controls what properties are added to the target_field based on the geoip lookup.
ignore_missing | no | false | If true and field does not exist, the processor quietly exits without modifying the document.
first_only | no | true | If true, only the first found geoip data will be returned, even if field contains an array.
download_database_on_pipeline_creation | no | true | If true (and if ingest.geoip.downloader.eager.download is false), the missing database is downloaded when the pipeline is created. Otherwise, the download is triggered when the pipeline is used as the default_pipeline or final_pipeline of an index.

*Depends on what is available in database_file:

  • If the GeoLite2 City database is used, then the following fields may be added under the target_field: ip, country_iso_code, country_name, continent_name, region_iso_code, region_name, city_name, timezone, latitude, longitude and location. The fields actually added depend on what has been found and which properties were configured in properties.
  • If the GeoLite2 Country database is used, then the following fields may be added under the target_field: ip, country_iso_code, country_name and continent_name. The fields actually added depend on what has been found and which properties were configured in properties.
  • If the GeoLite2 ASN database is used, then the following fields may be added under the target_field: ip, asn, organization_name and network. The fields actually added depend on what has been found and which properties were configured in properties.
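
For example, a geoip processor that should add only the country code and the coordinates from the default city database could be configured like this (the pipeline name is illustrative):

PUT _ingest/pipeline/geoip_country_and_location
{
  "description" : "Add only country code and location",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "properties" : ["country_iso_code", "location"]
      }
    }
  ]
}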

Here is an example that uses the default city database and adds the geographical information to the geoip field based on the ip field:

response = client.ingest.put_pipeline(
  id: 'geoip',
  body: {
    description: 'Add geoip info',
    processors: [
      {
        geoip: {
          field: 'ip'
        }
      }
    ]
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 'my_id',
  pipeline: 'geoip',
  body: {
    ip: '89.160.20.128'
  }
)
puts response

response = client.get(
  index: 'my-index-000001',
  id: 'my_id'
)
puts response
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "89.160.20.128"
}
GET my-index-000001/_doc/my_id

Which returns:

{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 55,
  "_primary_term": 1,
  "_source": {
    "ip": "89.160.20.128",
    "geoip": {
      "continent_name": "Europe",
      "country_name": "Sweden",
      "country_iso_code": "SE",
      "city_name" : "Linköping",
      "region_iso_code" : "SE-E",
      "region_name" : "Östergötland County",
      "location": { "lat": 58.4167, "lon": 15.6167 }
    }
  }
}

Here is an example that uses the default country database and adds the geographical information to the geo field based on the ip field. Note that this database is included in the module. So this:

response = client.ingest.put_pipeline(
  id: 'geoip',
  body: {
    description: 'Add geoip info',
    processors: [
      {
        geoip: {
          field: 'ip',
          target_field: 'geo',
          database_file: 'GeoLite2-Country.mmdb'
        }
      }
    ]
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 'my_id',
  pipeline: 'geoip',
  body: {
    ip: '89.160.20.128'
  }
)
puts response

response = client.get(
  index: 'my-index-000001',
  id: 'my_id'
)
puts response
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "target_field" : "geo",
        "database_file" : "GeoLite2-Country.mmdb"
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "89.160.20.128"
}
GET my-index-000001/_doc/my_id

returns this:

{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 65,
  "_primary_term": 1,
  "_source": {
    "ip": "89.160.20.128",
    "geo": {
      "continent_name": "Europe",
      "country_name": "Sweden",
      "country_iso_code": "SE"
    }
  }
}

Not all IP addresses find geo information in the database. When this occurs, no target_field is inserted into the document.

Here is an example of what a document will look like when information for "80.231.5.0" cannot be found:

response = client.ingest.put_pipeline(
  id: 'geoip',
  body: {
    description: 'Add geoip info',
    processors: [
      {
        geoip: {
          field: 'ip'
        }
      }
    ]
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 'my_id',
  pipeline: 'geoip',
  body: {
    ip: '80.231.5.0'
  }
)
puts response

response = client.get(
  index: 'my-index-000001',
  id: 'my_id'
)
puts response
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "80.231.5.0"
}

GET my-index-000001/_doc/my_id

Which returns:

{
  "_index" : "my-index-000001",
  "_id" : "my_id",
  "_version" : 1,
  "_seq_no" : 71,
  "_primary_term": 1,
  "found" : true,
  "_source" : {
    "ip" : "80.231.5.0"
  }
}

Recognizing Location as a Geopoint

Although this processor enriches your document with a location field containing the estimated latitude and longitude of the IP address, this field will not be indexed as a geo_point type in Elasticsearch without explicitly defining it as such in the mapping.

You can use the following mapping for the example index above:

response = client.indices.create(
  index: 'my_ip_locations',
  body: {
    mappings: {
      properties: {
        geoip: {
          properties: {
            location: {
              type: 'geo_point'
            }
          }
        }
      }
    }
  }
)
puts response
PUT my_ip_locations
{
  "mappings": {
    "properties": {
      "geoip": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}

Manage your own GeoIP2 database updates

If you can’t automatically update your GeoIP2 databases from the Elastic endpoint, you have a few other options:

Use a reverse proxy endpoint

If you can’t connect directly to the Elastic GeoIP endpoint, consider setting up a secure reverse proxy. You can then specify the reverse proxy endpoint URL in the ingest.geoip.downloader.endpoint setting of each node’s elasticsearch.yml file.
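
For example, if your proxy is reachable at https://geoip-proxy.example.org (a placeholder hostname), each node's elasticsearch.yml could contain:

ingest.geoip.downloader.endpoint: "https://geoip-proxy.example.org/v1/database"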

In a strict setup the following domains may need to be added to the allowed domains list:

  • geoip.elastic.co
  • storage.googleapis.com

Use a custom endpoint

You can create a service that mimics the Elastic GeoIP endpoint. You can then get automatic updates from this service.

  1. Download your .mmdb database files from the MaxMind site.
  2. Copy your database files to a single directory.
  3. From your Elasticsearch directory, run:

    ./bin/elasticsearch-geoip -s my/source/dir [-t target/directory]
  4. Serve the static database files from your directory. For example, you can use Docker to serve the files from an nginx server:

    docker run -v my/source/dir:/usr/share/nginx/html:ro nginx
  5. Specify the service’s endpoint URL in the ingest.geoip.downloader.endpoint setting of each node’s elasticsearch.yml file.

    By default, Elasticsearch checks the endpoint for updates every three days. To use another polling interval, use the cluster update settings API to set ingest.geoip.downloader.poll.interval.
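
    For example, to check your custom endpoint for updates every two days, you could submit a cluster update settings request like this:

    PUT _cluster/settings
    {
      "persistent": {
        "ingest.geoip.downloader.poll.interval": "2d"
      }
    }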

Manually update your GeoIP2 databases

  1. Use the cluster update settings API to set ingest.geoip.downloader.enabled to false (see the example request after this list). This disables automatic updates that may overwrite your database changes. It also deletes all downloaded databases.
  2. Download your .mmdb database files from the MaxMind site.

    You can also use custom city, country, and ASN .mmdb files. These files must be uncompressed. The type (city, country, or ASN) will be pulled from the file metadata, so the filename does not matter.

  3. On Elasticsearch Service deployments, upload the database files using a custom bundle.
  4. On self-managed deployments, copy the database files to $ES_CONFIG/ingest-geoip.
  5. In your geoip processors, configure the database_file parameter to use a custom database file.
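
The request for step 1 could look like this:

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": false
  }
}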

Node Settings

The geoip processor supports the following setting:

ingest.geoip.cache_size
The maximum number of results that should be cached. Defaults to 1000.

Note that this is a node setting and applies to all geoip processors, i.e. there is a single cache shared by all defined geoip processors.
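
For example, to allow more cached lookups, each node's elasticsearch.yml could include:

ingest.geoip.cache_size: 2000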

Cluster settings

ingest.geoip.downloader.enabled
(Dynamic, Boolean) If true, Elasticsearch automatically downloads and manages updates for GeoIP2 databases from the ingest.geoip.downloader.endpoint. If false, Elasticsearch does not download updates and deletes all downloaded databases. Defaults to true.
ingest.geoip.downloader.eager.download
(Dynamic, Boolean) If true, Elasticsearch downloads GeoIP2 databases immediately, regardless of whether a pipeline exists with a geoip processor. If false, Elasticsearch only begins downloading the databases if a pipeline with a geoip processor exists or is added. Defaults to false.
ingest.geoip.downloader.endpoint
(Static, string) Endpoint URL used to download updates for GeoIP2 databases. For example, https://myDomain.com/overview.json. Defaults to https://geoip.elastic.co/v1/database. Elasticsearch stores downloaded database files in each node’s temporary directory at $ES_TMPDIR/geoip-databases/<node_id>. Note that Elasticsearch will make a GET request to ${ingest.geoip.downloader.endpoint}?elastic_geoip_service_tos=agree, expecting the list of metadata about databases typically found in overview.json.

The GeoIP downloader uses the JDK’s builtin cacerts. If you’re using a custom endpoint, add the custom https endpoint cacert(s) to the JDK’s truststore.

ingest.geoip.downloader.poll.interval
(Dynamic, time value) How often Elasticsearch checks for GeoIP2 database updates at the ingest.geoip.downloader.endpoint. Must be greater than 1d (one day). Defaults to 3d (three days).
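
The dynamic settings above can be changed at runtime with the cluster update settings API. For example, to start downloading the databases before any pipeline with a geoip processor exists, you could submit:

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.eager.download": true
  }
}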