Geospatial search with ES|QL

Elasticsearch has had powerful geospatial search and analytics capabilities for many years, but the API was quite different from what typical GIS users were used to. In the past year we've added the ES|QL query language, a piped query language as easy, or even easier, than SQL. It's particularly suited to the search, security, and observability use cases Elastic excels at. We're also adding support for geospatial search and analytics within ES|QL, making it far easier to use, especially for users coming from SQL or GIS communities.

Elasticsearch 8.12 and 8.13 brought basic support for geospatial types to ES|QL. This was dramatically enhanced with the addition of geospatial search capabilities in 8.14. More importantly, this support was designed to conform closely to the Simple Feature Access standard from the Open Geospatial Consortium (OGC) used by other spatial databases like PostGIS, making it much easier to use for GIS experts familiar with these standards.

In this blog, we'll show you how to use ES|QL to perform geospatial searches, and how it compares to the SQL and Query DSL equivalents. We'll also show you how to use ES|QL to perform spatial joins, and how to visualize the results in Kibana Maps. Note that all the features described here are in "technical preview", and we'd love to hear your feedback on how we can improve them.

Searching for geospatial data

Let's start with an example query:

FROM airport_city_boundaries
| WHERE ST_INTERSECTS(
      city_boundary,
      "POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))"::geo_shape
  )
| KEEP abbrev, airport, region, city, city_location

This performs a search for any city boundary polygons that intersect with a rectangular search polygon around the Sanya Phoenix International Airport (SYX).

ESQL Geospatial Search

In a sample dataset of airports, cities and city boundaries, this search finds the intersecting polygon and returns the desired fields from the matching document:

abbrevairportregioncitycity_location
SYXSanya Phoenix Int'l天涯区SanyaPOINT(109.5036 18.2533)

That was easy! Now compare this to the classic Elasticsearch Query DSL for the same query:

GET /airport_city_boundaries/_search
{
  "_source": ["abbrev", "airport", "region", "city", "city_location"],
  "query": {
    "geo_shape": {
      "city_boundary": {
        "shape": {
          "type": "polygon",
          "coordinates" : [[
            [109.4, 18.1],
            [109.6, 18.1],
            [109.6, 18.3],
            [109.4, 18.3],
            [109.4, 18.1]
          ]]
        }
      }
    }
  }
}

Both queries are reasonably clear in their intent, but the ES|QL query closely resembles SQL. The same query in PostGIS looks like this:

SELECT abbrev, airport, region, city, city_location
FROM airport_city_boundaries
WHERE ST_INTERSECTS(
    city_boundary,
    'SRID=4326;POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))'::geometry
);

Look back at the ES|QL example. So similar, right?

FROM airport_city_boundaries
| WHERE ST_INTERSECTS(
      city_boundary,
      "POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))"::geo_shape
  )
| KEEP abbrev, airport, region, city, city_location

We've found that existing users of the Elasticsearch API find ES|QL much easier to use. We now expect that existing SQL users, particularly Spatial SQL users, will find that ES|QL feels very familiar to what they are used to seeing.

Why not SQL?

What about Elasticsearch SQL? It has been around for a while and has some geospatial features. However, Elasticsearch SQL was written as a wrapper on top of the original Query API, which meant only queries that could be transpiled down to the original API were supported. ES|QL does not have this limitation. Being a completely new stack allows for many optimizations that were not possible in SQL. Our benchmarks show ES|QL is very often faster than the Query API, particularly with aggregations!

polygon-intersection-benchmark

Differences to SQL

Clearly, from the previous example, ES|QL is somewhat similar to SQL, but there are some important differences. For example, ES|QL is a piped query language, starting with a source command like FROM and then chaining all subsequent commands together with the pipe | character. This makes it very easy to understand how each command receives a table of data and performs some action on that table, such as filtering with WHERE, adding columns with EVAL, or performing aggregations with STATS. Rather than starting with SELECT to define the final output columns, there can be one or more KEEP commands, with the last one specifying the final output results. This structure simplifies reasoning about the query.

Focusing in on the WHERE command in the above example, we can see it looks quite similar to the PostGIS example:

ES|QL

WHERE ST_INTERSECTS(
    city_boundary,
    "POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))"::geo_shape
)

PostGIS

WHERE ST_INTERSECTS(
    city_boundary,
    'SRID=4326;POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))'::geometry
)

Aside from the difference in string quotation characters, the biggest difference is in how we type-cast the string to a spatial type. In PostGIS, we use the ::geometry suffix, while in ES|QL, we use the ::geo_shape suffix. This is because ES|QL runs within Elasticsearch, and the type-casting operator :: can be used to convert a string to any of the supported ES|QL types, in this case, a geo_shape. Additionally, the geo_shape and geo_point types in Elasticsearch imply the spatial coordinate system known as WGS84, more commonly referred to using the SRID number 4326. In PostGIS, this needs to be explicit, hence the use of the SRID=4326; prefix to the WKT string. If that prefix is removed, the SRID will be set to 0, which is more like the Elasticsearch types cartesian_point and cartesian_shape, which are not tied to any specific coordinate system.

Both ES|QL and PostGIS provide type conversion function syntax as well:

ES|QL

WHERE ST_INTERSECTS(
    city_boundary,
    TO_GEOSHAPE("POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))")
)

PostGIS

WHERE ST_INTERSECTS(
    city_boundary,
    ST_SetSRID(
      ST_GeomFromText('POLYGON((109.4 18.1, 109.6 18.1, 109.6 18.3, 109.4 18.3, 109.4 18.1))'),
      4326
    )
)

OGC Functions

Elasticsearch 8.14 introduces the following four OGC spatial search functions:

ES|QLPostGISDescription
ST_INTERSECTSST_IntersectsReturns true if two geometries intersect, and false otherwise.
ST_DISJOINTST_DisjointReturns true if two geometries do not intersect, and false otherwise. The inverse of ST_INTERSECTS.
ST_CONTAINSST_ContainsReturns true if one geometry contains another, and false otherwise.
ST_WITHINST_WithinReturns true if one geometry is within another, and false otherwise. The inverse of ST_CONTAINS.

These function behave similarly to their PostGIS counterparts, and are used in the same way. For example, ST_INTERSECTS returns true if two geometries intersect and false otherwise. If you follow the documentation links in the above table, you might notice that all the ES|QL examples are within a WHERE clause after a FROM clause, while all the PostGIS examples are using literal geometries. In fact, both platforms support using the functions in any part of the query where they make sense.

The first example in the PostGIS documentation for ST_INTERSECTS is:

SELECT ST_Intersects(
    'POINT(0 0)'::geometry,
    'LINESTRING ( 2 0, 0 2 )'::geometry
);

The ES|QL equivalent of this would be:

ROW ST_INTERSECTS(
    "POINT(0 0)"::geo_point,
    "LINESTRING ( 2 0, 0 2 )"::geo_shape
)

Note how we did not specify the SRID in the PostGIS example. This is because in PostGIS when using the geometry type, all calculations are done on a planar coordinate system, and so if both geometries have the same SRID, it does not matter what the SRID is. In Elasticsearch, this is also true for most functions, however, there are exceptions where geo_shape and geo_point use spherical calculations, as we'll see in the next blog about spatial distance search.

ES|QL Versatility

So, we've seen examples above for using spatial functions in WHERE clauses, and in ROW commands. Where else would they make sense? One very useful place is in the EVAL command. This command allows you to evaluate an expression and return the result. For example, let's determine if the centroids of all airports grouped by their country names are within a boundary outlining the country:

FROM airports
| EVAL in_uk = ST_INTERSECTS(location, TO_GEOSHAPE("POLYGON((1.2305 60.8449, -1.582 61.6899, -10.7227 58.4017, -7.1191 55.3291, -7.9102 54.2139, -5.4492 54.0078, -5.2734 52.3756, -7.8223 49.6676, -5.0977 49.2678, 0.9668 50.5134, 2.5488 52.1065, 2.6367 54.0078, -0.9668 56.4625, 1.2305 60.8449))"))
| EVAL in_iceland = ST_INTERSECTS(location, TO_GEOSHAPE("POLYGON ((-25.4883 65.5312, -23.4668 66.7746, -18.4131 67.4749, -13.0957 66.2669, -12.3926 64.4159, -20.1270 62.7346, -24.7852 63.3718, -25.4883 65.5312))"))
| EVAL within_uk = ST_WITHIN(location, TO_GEOSHAPE("POLYGON((1.2305 60.8449, -1.582 61.6899, -10.7227 58.4017, -7.1191 55.3291, -7.9102 54.2139, -5.4492 54.0078, -5.2734 52.3756, -7.8223 49.6676, -5.0977 49.2678, 0.9668 50.5134, 2.5488 52.1065, 2.6367 54.0078, -0.9668 56.4625, 1.2305 60.8449))"))
| EVAL within_iceland = ST_WITHIN(location, TO_GEOSHAPE("POLYGON ((-25.4883 65.5312, -23.4668 66.7746, -18.4131 67.4749, -13.0957 66.2669, -12.3926 64.4159, -20.1270 62.7346, -24.7852 63.3718, -25.4883 65.5312))"))
| STATS centroid = ST_CENTROID_AGG(location), count=COUNT() BY in_uk, in_iceland, within_uk, within_iceland
| SORT count ASC

The results are expected, the centroid of UK airports are within the UK boundary, and not within the Iceland boundary, and vice versa:

centroidcountin_ukin_icelandwithin_ukwithin_iceland
POINT (-21.946634463965893 64.13187285885215)1falsetruefalsetrue
POINT (-2.597342072712148 54.33551226578214)17truefalsetruefalse
POINT (0.04453958108176276 23.74658354606057)873falsefalsefalsefalse

In fact, these functions can be used in any part of the query where their signature makes sense. They all take two arguments, which are either a literal spatial object or a field of a spatial type, and they all return a boolean value. One important consideration is that the coordinate reference system (CRS) of the geometries must match, or an error will be returned. This means you cannot mix geo_shape and cartesian_shape types in the same function call. You can, however, mix geo_point and geo_shape types, as the geo_point type is a special case of the geo_shape type, and both share the same coordinate reference system. The documentation for each of the functions defined above lists the supported type combinations.

Additionally, either argument can be a spatial literal or a field, in either order. You can even specify two fields, two literals, a field and a literal, or a literal and a field. The only requirement is that the types are compatible. For example, this query compares two fields in the same index:

FROM airport_city_boundaries
| EVAL in_city = ST_INTERSECTS(city_location, city_boundary)
| STATS count=COUNT(*) BY in_city
| SORT count ASC
| EVAL cardinality = CASE(count < 10, "very few", count < 100, "few", "many")
| KEEP cardinality, count, in_city

The query basically asks if the city location is within the city boundary, which should generally be true, but there are always exceptions:

cardinalitycountin_city
few29false
many740true

A far more interesting question would be whether the airport location is within the boundary of the city that the airport serves. However, the airport location resides in a different index than the one containing the city boundaries. This requires a method to effectively query and correlate data from these two separate indexes.

Spatial joins

ES|QL does not support JOIN commands, but you can achieve a special case of a join using the ENRICH command, which behaves similarly to a 'left join' in SQL. This command operates akin to a 'left join' in SQL, allowing you to enrich results from one index with data from another index based on a spatial relationship between the two datasets.

For example, let's enrich the results from a table of airports with additional information about the city they serve by finding the city boundary that contains the airport location, and then perform some statistics on the results:

FROM airports
| ENRICH city_boundaries ON city_location WITH airport, region, city_boundary
| MV_EXPAND city_boundary
| EVAL boundary_wkt_length = LENGTH(TO_STRING(city_boundary))
| STATS centroid = ST_CENTROID_AGG(location), count = COUNT(city_location), min_wkt = MIN(boundary_wkt_length), max_wkt = MAX(boundary_wkt_length) BY region
| SORT count DESC
| LIMIT 5

This returns the top 5 regions with the most airports, along with the centroid of all the airports that have matching regions, and the range in length of the WKT representation of the city boundaries within those regions:

centroidcountmin_wktmax_wktregion
POINT (-32.56093470960719 32.598117914802714)90207207null
POINT (-73.94515332765877 40.70366442203522)9438438City of New York
POINT (-83.10398317873478 42.300230911932886)9473473Detroit
POINT (-156.3020245861262 20.176383580081165)5307803Hawaii
POINT (-73.88902732171118 45.57078813901171)4837837Montréal

So, what really happened here? Where did the supposed JOIN occur? The crux of the query lies in the ENRICH command:

FROM airports
| ENRICH city_boundaries ON city_location WITH airport, region, city_boundary

This command instructs Elasticsearch to enrich the results retrieved from the airports index, and perform an intersects join between the city_location field of the original index, and the city_boundary field of the airport_city_boundaries index, which we used in a few examples earlier. But some of this information is not clearly visible in this query. What we do see is the name of an enrich policy city_boundaries, and the missing information is encapsulated within that policy definition.

{
  "geo_match": {
    "indices": "airport_city_boundaries",
    "match_field": "city_boundary",
    "enrich_fields": ["city", "airport", "region", "city_boundary"]
  }
}

Here we can see that it will perform a geo_match query (intersects is the default), the field to match against is city_boundary, and the enrich_fields are the fields we want to add to the original document. One of those fields, the region was actually used as the grouping key for the STATS command, something we could not have done without this 'left join' capability. For more information on enrich policies, see the enrich documentation. While reading those documents, you will notice that they describe using the enrich indexes for enriching data at index time, by configuring ingest pipelines. This is not required for ES|QL, as the ENRICH command works at query time. It is sufficient to prepare the enrich index with the necessary data and enrich policy, and then use the ENRICH command in your ES|QL queries.

You may also notice that the most commonly found region was null. What could this imply? Recall that I likened this command to a 'left join' in SQL, meaning if no matching city boundary is found for an airport, the airport is still returned but with null values for the fields from the airport_city_boundaries index. It turns out there were 89 airports that found no matching city_boundary, and one airport with a match where the region field was null. This lead to a count of 90 airports with no region in the results. Another interesting detail is the need for the MV_EXPAND command. This is necessary because the ENRICH command may return multiple results for each input row, and MV_EXPAND helps to separate these results into multiple rows, one for each outcome. This also clarifies why "Hawaii" shows different min_wkt and max_wkt results: there were multiple regions with the same name but different boundaries.

Kibana Maps

Kibana has added support for Spatial ES|QL in the Maps application. This means that you can now use ES|QL to search for geospatial data in Elasticsearch, and visualize the results on a map.

Kibana Layers ES|QL

There is a new layer option in the add layers menu, called "ES|QL". Like all of the geospatial features described so far, this is in "technical preview". Selecting this option allows you to add a layer to the map based on the results of an ES|QL query. For example, you could add a layer to the map that shows all the airports in the world.

Kibana ES|QL - Airports

Or you could add a layer that shows the polygons from the airport_city_boundaries index, or even better, how about that complex ENRICH query above that generates statistics for how many airports are in each region?

Kibana ES|QL - Region Statistics

What's next

You might have noticed in two of the examples above we squeezed in yet another spatial function ST_CENTROID_AGG. This is an aggregating function used in the STATS command, and the first of many spatial analytics features we plan to add to ES|QL. We'll blog about it when we've got more to show!

Before that, we want to tell you more about a particularly exciting feature we've worked on: the ability to perform spatial distance searches, one of the most used spatial search features of Elasticsearch. Can you imagine what the syntax for distance searches might look like? Perhaps similar to an OGC function? Stay tuned for the next blog in this series to find out!

Spoiler alert: Elasticsearch 8.15 has just been released, and spatial distance search with ES|QL is included!

Ready to try this out on your own? Start a free trial.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!

12 min read

Recommended Articles