Elastic Earth Science for Global Monitoring
Our living planet is continuously monitored by a growing number of Earth observation satellites that produce terabytes of data daily. Europe is taking a leading role in Earth observation and is stimulating the use of this large amount of public and private satellite data.
At Terradue we develop platforms and applications that help researchers and practitioners in Earth science extract information from these massive amounts of data. Together with the European Space Agency, we are also bringing the community together to serve all the major research institutes in Europe and to support international cooperation.
Here we take a look at one specific domain, geohazards, for which Terradue developed a query engine powered by Elasticsearch that selects the best satellite acquisitions for InSAR, a technique for mapping ground deformation using radar images of the Earth's surface.
In a few words, InSAR (Interferometric Synthetic Aperture Radar) consists of comparing the phase information of images of the same area taken at different times by the radar instrument on board a satellite. This technique can measure ground displacements as small as a centimeter!
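To give a sense of scale, the standard repeat-pass relation converts an interferometric phase difference Δφ into line-of-sight displacement d = λ·Δφ/(4π); the factor 4π rather than 2π comes from the radar signal travelling the satellite-to-ground path twice. The sketch below assumes Sentinel-1's C-band centre frequency of 5.405 GHz to derive the wavelength; sign conventions differ between processors, so treat it as illustrative only.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458
# Assumed Sentinel-1 C-band centre frequency (5.405 GHz) -> wavelength of about 5.55 cm.
SENTINEL1_WAVELENGTH_M = SPEED_OF_LIGHT_M_S / 5.405e9


def los_displacement(delta_phi_rad: float, wavelength_m: float = SENTINEL1_WAVELENGTH_M) -> float:
    """Convert an interferometric phase difference (radians) into
    line-of-sight displacement (metres).

    The 4*pi factor accounts for the two-way travel of the radar pulse.
    The sign convention (positive = motion towards or away from the sensor)
    is an assumption and varies between processing chains.
    """
    return wavelength_m * delta_phi_rad / (4 * math.pi)


# One full colour cycle (fringe) in an interferogram is a phase change of 2*pi.
fringe_m = los_displacement(2 * math.pi)
print(f"One fringe = {fringe_m * 100:.2f} cm of line-of-sight motion")
```

One fringe therefore corresponds to half a wavelength, roughly 2.8 cm of motion along the line of sight, which is why centimeter-level displacements become visible.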
Not clear enough?
© DLR/EOC
The image above is a 3D view of an area near the boundary between the Indian and Eurasian tectonic plates in Nepal. The colored zone shows the ground displacement caused by the magnitude 7.8 earthquake that struck Nepal on 25 April 2015.
To obtain this result, SAR experts compared two satellite images using a complex processing technique that depends on one crucial step: data selection. One image was taken just after the earthquake (post-event) and one before it (pre-event). The pre-event image is not picked at random from the large archive of satellite acquisitions; it must be chosen by analysing and minimizing the “baseline” between the two acquisitions. Because a satellite follows an orbital pattern, the sensor is at a slightly different position every time it images the same area, so SAR images of that area are collected from slightly different viewing angles. The baseline is the separation between these two sensor positions.
© U.S. Geological Survey
This is where Terradue's Elasticsearch-based query engine does a fantastic job. For any area, we have thousands of archived SAR images acquired at very specific times and positions in space. Based on this metadata, we compute all possible pair combinations with the post-event image. This is done in near real time, of course, because our users don't like to wait!
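The pairing logic can be sketched in Python under a few stated assumptions. The `Acquisition` record and its fields are hypothetical; only acquisitions from the same relative orbit (track) are paired, since repeat-pass interferometry needs matching viewing geometry; and the perpendicular baseline is supplied by a caller-provided function rather than computed here. Pairs are ranked by smallest absolute perpendicular baseline, then by shortest time separation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple


@dataclass
class Acquisition:
    # Hypothetical, simplified metadata record; field names are illustrative.
    identifier: str
    start_time: datetime
    relative_orbit: int  # same track => comparable viewing geometry


Pair = Tuple[Acquisition, Acquisition]


def candidate_pairs(post_event: Acquisition, archive: List[Acquisition]) -> List[Pair]:
    """Pair the post-event image with every earlier acquisition on the same track."""
    return [
        (post_event, pre)
        for pre in archive
        if pre.start_time < post_event.start_time
        and pre.relative_orbit == post_event.relative_orbit
    ]


def rank_pairs(pairs: List[Pair],
               bperp: Callable[[Acquisition, Acquisition], float]) -> List[Pair]:
    """Sort pairs: smallest |perpendicular baseline| first, then shortest time gap."""
    return sorted(
        pairs,
        key=lambda p: (abs(bperp(p[0], p[1])), p[0].start_time - p[1].start_time),
    )
```

Keeping the baseline as a pluggable function separates the cheap combinatorial step from the expensive geometric one, which is where the Elasticsearch lookups and interpolation come in.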
Metadata indexing
To deliver this data search at a global scale, we record the metadata of every acquisition of a given satellite in near real time. An ingestion process fetches information about newly acquired scenes from the official servers and, for every single image, indexes the following documents in Elasticsearch:
- One document with the properties of the image itself, e.g. acquisition date, the polygon defining the imaged area, and many more.
- Several documents representing the orbit state vectors, which define the position and velocity of the satellite in space. These vectors are sampled every 10 seconds over a 2-minute window per acquisition, so there are around 12 documents per image.
- Several documents representing the tie-point grid, which links subsampled pixels of the image to coordinates (geopoints) on Earth, together with the slant range time needed to compute the baseline. A typical image is gridded by about 250 tie points.
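To make this document layout concrete, here is a sketch that turns one acquisition's metadata into bulk-indexing actions. The index names (`s1-images`, `s1-orbits`, `s1-tiepoints`) and field names are hypothetical. The action dicts follow the `_index`/`_id`/`_source` shape accepted by the `elasticsearch.helpers.bulk` helper of the official Python client, but the mapping they rely on (for example `footprint` as `geo_shape`, `location` as `geo_point`) is an assumption that would have to be defined on the indices beforehand.

```python
from datetime import datetime
from typing import Iterable, Iterator, Sequence, Tuple

StateVector = Tuple[datetime, Sequence[float], Sequence[float]]  # time, position, velocity
TiePoint = Tuple[int, int, float, float, float]  # line, pixel, lon, lat, slant range time


def build_bulk_actions(image_id: str, start: datetime,
                       footprint: Sequence[Sequence[float]],
                       state_vectors: Iterable[StateVector],
                       tie_points: Iterable[TiePoint]) -> Iterator[dict]:
    """Yield the image document, its orbit state vectors and its tie-point grid."""
    # 1) One document describing the image itself. GeoJSON polygons are a list
    #    of rings; each ring is a closed list of [lon, lat] pairs.
    yield {"_index": "s1-images", "_id": image_id, "_source": {
        "start_time": start.isoformat(),
        "footprint": {"type": "Polygon", "coordinates": [[list(p) for p in footprint]]},
    }}
    # 2) One document per orbit state vector (position and velocity in an Earth-fixed frame).
    for t, pos, vel in state_vectors:
        yield {"_index": "s1-orbits", "_source": {
            "image_id": image_id, "time": t.isoformat(),
            "position": list(pos), "velocity": list(vel),
        }}
    # 3) One document per tie point, linking an image pixel to a ground location.
    for line, pixel, lon, lat, slant_range_time in tie_points:
        yield {"_index": "s1-tiepoints", "_source": {
            "image_id": image_id, "line": line, "pixel": pixel,
            "location": {"lat": lat, "lon": lon},  # geo_point object form
            "slant_range_time": slant_range_time,
        }}
```

With 12 state vectors and a 25 × 10 tie-point grid, one image yields 263 documents, which is why the orbit and tie-point indices grow much faster than the image index.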
As of today, for the Copernicus Sentinel-1A satellite, for which we have already recorded more than 12 months of data, the index holds:
- 250,000+ documents for the images
- 5,000,000+ documents for the orbit state vectors
- 70,000,000+ documents for the tie points
Given that the lifetime of such a satellite is seven years, with consumables for up to twelve years, and that it now operates with a twin satellite, Sentinel-1B, which will also deliver new data in the coming weeks, the math shows that this is a big challenge!
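The "math" can be made explicit with a back-of-the-envelope projection. This sketch assumes, purely for illustration, that the first-year Sentinel-1A counts quoted above are a constant yearly ingestion rate and that Sentinel-1B will produce at the same rate; changes in acquisition plans or indexing choices would alter the figures.

```python
# Document counts after ~12 months of Sentinel-1A (figures quoted above, lower bounds).
counts_first_year = {
    "images": 250_000,
    "orbit_state_vectors": 5_000_000,
    "tie_points": 70_000_000,
}


def project(counts: dict, years: int, satellites: int) -> dict:
    """Linear extrapolation, assuming a constant yearly rate per satellite."""
    return {kind: n * years * satellites for kind, n in counts.items()}


nominal = project(counts_first_year, years=7, satellites=2)    # nominal lifetime
extended = project(counts_first_year, years=12, satellites=2)  # consumables limit

for kind in counts_first_year:
    print(f"{kind:>20}: {nominal[kind]:>13,} (7 y)  {extended[kind]:>13,} (12 y)")
```

Under these assumptions the tie-point index alone approaches a billion documents over the nominal mission lifetime.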
Searching
Now that we have the material, let’s do the query.
As introduced previously, we have to calculate the perpendicular baseline (Bp in the figure above) between two satellite acquisitions for every possible pair combination of one post-event image with several pre-event images. The figure below illustrates the calculations the system must perform concurrently in order to extract the best pairs, those with the shortest baselines over time (in blue).
© Zhu, S.; Xu, C.; Wen, Y.; Liu, Y. Interseismic Deformation of the Altyn Tagh Fault Determined by Interferometric Synthetic Aperture Radar (InSAR) Measurements. Remote Sens. 2016, 8, 233
Each line represents the baseline between two acquisitions taken at different times, and obtaining its value requires interpolating dozens of orbit state vectors and tie points.
As a consequence, each pair calculation executes the following queries in Elasticsearch:
- 2 image documents matching the spatial and temporal filters, plus any additional filters that narrow down the satellite acquisitions to search for.
- ~100 orbit state vectors (50 for each scene) covering the time at which the satellite acquired the scenes found by the previous query.
- ~20 tie points (10 for each scene) corresponding to the footprint of the scene on the ground.
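The interpolation and baseline step can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not Terradue's implementation: it fits a per-axis cubic polynomial to the orbit state vectors (a swapped-in stand-in for the cubic spline mentioned later in this post), assumes the time at which each satellite images the target is already known (in practice it would be derived from the tie points' timing), and computes the perpendicular baseline as the component of the sensor-to-sensor vector orthogonal to the reference line of sight. The function names are hypothetical.

```python
import numpy as np


def interpolate_position(times_s, positions_m, t_s, degree=3):
    """Estimate the satellite position at time t_s from orbit state vectors.

    Fits one polynomial per axis (x, y, z). Times are shifted so that t_s
    becomes 0, which keeps the least-squares fit well conditioned.
    """
    times = np.asarray(times_s, dtype=float) - t_s
    pos = np.asarray(positions_m, dtype=float)
    return np.array([np.polyval(np.polyfit(times, pos[:, k], degree), 0.0)
                     for k in range(3)])


def perpendicular_baseline(p_ref, p_sec, target):
    """Magnitude of the baseline component perpendicular to the reference
    line of sight (sensor -> ground target). Sign conventions are omitted."""
    p_ref, p_sec, target = (np.asarray(v, dtype=float) for v in (p_ref, p_sec, target))
    los = target - p_ref
    los /= np.linalg.norm(los)           # unit line-of-sight vector
    baseline = p_sec - p_ref             # full baseline vector
    parallel = baseline @ los            # component along the line of sight
    return float(np.linalg.norm(baseline - parallel * los))
```

In a pair calculation, each acquisition's state vectors would be interpolated at its own imaging time, and the baseline evaluated at several tie points across the footprint.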
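The three lookups per pair can be sketched as Elasticsearch Query DSL bodies built in Python. The `geo_shape`, `range`, `term` and `geo_bounding_box` clauses and the `size`/`sort` parameters are standard Query DSL; the field names (`footprint`, `start_time`, `mission`, `time`, `image_id`, `location`) and the 60-second time margin are hypothetical choices for illustration, not Terradue's schema.

```python
from datetime import datetime, timedelta


def image_query(aoi_ring, start: datetime, end: datetime, size: int = 2) -> dict:
    """Image documents whose footprint intersects the area of interest
    (a closed ring of [lon, lat] pairs) and whose start time falls in [start, end]."""
    return {
        "size": size,
        "query": {"bool": {"filter": [
            {"geo_shape": {"footprint": {
                "shape": {"type": "polygon", "coordinates": [aoi_ring]},
                "relation": "intersects",
            }}},
            {"range": {"start_time": {"gte": start.isoformat(), "lte": end.isoformat()}}},
        ]}},
    }


def orbit_query(mission: str, acq_start: datetime, acq_stop: datetime,
                margin: timedelta = timedelta(seconds=60), size: int = 50) -> dict:
    """State vectors of one satellite covering the acquisition, padded by a
    margin so the interpolation has samples on both sides of the scene."""
    return {
        "size": size,
        "sort": [{"time": "asc"}],
        "query": {"bool": {"filter": [
            {"term": {"mission": mission}},
            {"range": {"time": {"gte": (acq_start - margin).isoformat(),
                                "lte": (acq_stop + margin).isoformat()}}},
        ]}},
    }


def tie_point_query(image_id: str, top_left, bottom_right, size: int = 10) -> dict:
    """Tie points of one image inside a bounding box given as (lat, lon) tuples."""
    return {
        "size": size,
        "query": {"bool": {"filter": [
            {"term": {"image_id": image_id}},
            {"geo_bounding_box": {"location": {
                "top_left": {"lat": top_left[0], "lon": top_left[1]},
                "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
            }}},
        ]}},
    }
```

Using `filter` rather than `must` skips relevance scoring, which is irrelevant here and lets Elasticsearch cache the clauses. With the official Python client, such a body would be passed to a search call on the relevant index; the exact call signature depends on the client version.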
We also apply post-query filtering to the pairs to discard unsuitable candidates, for example based on the percentage of land coverage or the percentage of overlapping area between the 2 scenes. This step leaves room for improvement, especially by using functions embedded in Elasticsearch.
Currently, on our Elasticsearch cluster, with an average of 50 pairs computed concurrently (queries ~200 ms + post-filtering ~700 ms + cubic spline interpolation ~600 ms), we compute more than 10 pairs per second!
This kind of search was applied after the recent major earthquake in Kumamoto, Japan, to identify the best pair for generating the interferogram of this dramatic event. You can watch the screencast of the processing of this event here.
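The quoted timings can be cross-checked with Little's law, which relates the average number of items in flight (L), throughput (λ) and latency per item (W) as L = λ·W. The sketch assumes the stage timings are per-pair latencies measured under load and that the stages run sequentially for each pair; under those assumptions the implied steady-state throughput is about 33 pairs per second, consistent with the "more than 10 pairs per second" reported above.

```python
# Per-pair stage timings quoted above, in seconds (assumed to be latencies under load).
stage_latency_s = {"queries": 0.2, "post_filtering": 0.7, "spline_interpolation": 0.6}
concurrency = 50  # average number of pairs in flight

# End-to-end time for one pair if its stages run one after another.
latency_s = sum(stage_latency_s.values())

# Little's law: in-flight work = throughput * latency  =>  throughput = concurrency / latency.
implied_throughput = concurrency / latency_s

print(f"latency per pair: {latency_s:.1f} s, implied throughput: {implied_throughput:.0f} pairs/s")
```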
It is a big challenge to ensure that massive data streams can be exploited to their best potential, making the information they embed fully accessible and usable. Elasticsearch helps us to address this challenge in a very flexible and powerful way.
Elasticsearch allows us to achieve important corporate goals, both in our contribution to the amazing work done by the geohazards scientific community and in our current efforts to serve other domains of Earth science.
We hope this story engaged your interest and we invite you to visit our website to discover more about our activities.
Terradue was founded as an ESA spin-off in 2006 and celebrated its 10th anniversary this year. We are proud to deliver web and cloud computing services for the massive processing of Earth science data to the Earth science communities through our Terradue Cloud Platform solutions. The Terradue Cloud Platform needs to scale up quickly and efficiently using proven technologies such as Elasticsearch. We have more applications for Elasticsearch in the pipeline and look forward to telling you another success story soon!
Emmanuel Mathot is technical leader at Terradue, in charge of developing the Cloud and Exploitation Platforms. After a master's degree in computer science at UCL in Belgium, he moved to Rome, Italy, as a long-term mission consultant for the European Space Agency, working on ground segment applications in GRID distributed systems. Having spent the last 10 years in the space industry facing the new challenges of the Earth science community, he enjoys building innovative science exploitation platforms using new big data and massive processing technologies. twitter.com/emmanuelmathot // github.com/emmanuelmathot