Vincent du Sordet

Kibana: How to create impactful visualizations with magic formulas? (part 1)

We will see how magic math formulas in the Kibana Lens editor can help to highlight high values.


Introduction

In the previous blog post, Designing Intuitive Kibana Dashboards as a non-designer, we highlighted the importance of creating intuitive dashboards. It demonstrated how simple changes (grouping themes, changing chart types, and more) can make a difference in understanding your data. When delivering courses like Data Analysis with Kibana or Elastic Observability Engineer, we emphasize this blog post and how these changes help bring essential information to the surface. I like a complementary approach to reach this goal: using two colors to separate the highest data values from the common ones.

To illustrate this idea, we will use the Sample flight data dataset. Now, let’s compare two visualizations ranking the top 10 destination countries per total number of flights. Which visualization has a higher impact?

If you chose the second one, you may be wondering how this was done with the Kibana Lens editor. While preparing for the certification last year, I found a way to achieve this result. The secret is using two different layers and some magic formulas. This post will explain how math in Lens formulas helps create two data-color visualizations.

We will start with the first example that emphasizes only the highest value of the dataset we are focusing on. The second example describes how to highlight other high values (as shown in the illustration above).

[Note: the tips explained in this blog post apply from version 7.15 onward.]

Only the highest value

To understand how math helps to separate high values from common ones, let’s start with this first example: emphasizing only the highest value.

We start with a bar horizontal chart:

We need to identify the highest value of the scope we are currently examining. For this, we will use one of the overall_* functions: overall_max(), a pipeline function (the equivalent of a pipeline aggregation in Query DSL).

In our example, we group the flights by destination country. This means we count the number of flights for each DestCountry (each country is one bucket). overall_max() returns the highest count found across all buckets.

The math trick here is to divide the number of flights in each bucket by the maximum value found among all buckets. Only one bucket will return 1: the bucket matching the max value found by overall_max(). All the other buckets will return a value between 0 and 1. We use floor() to ensure any 0.xxx value is rounded down to 0.

Now, we can multiply it by count() and we have our formula for the first layer!

Layer 1:

count()*floor(count()/overall_max(count()))

From here, in the Lens editor, we duplicate the layer and adjust the formula of the second layer, which contains the rest of the data. The other trick is to subtract the first formula from count(). In this layer, we just need to ensure the highest value is not represented, which happens only once: when count() = overall_max(), the division returns 1.

Layer 2:

count() - count()*floor(count()/overall_max(count()))
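
To see why the two formulas complement each other, here is a minimal Python sketch of the same arithmetic. The country codes and counts are made up for illustration, not taken from the sample dataset:

```python
# Hypothetical flight counts per destination country (illustrative values only)
counts = {"IT": 2371, "US": 1987, "CN": 1096, "CA": 944}

overall_max = max(counts.values())  # plays the role of overall_max(count())

# Layer 1: count()*floor(count()/overall_max(count()))
# Integer floor division (//) yields 1 only for the bucket holding the max
layer1 = {c: n * (n // overall_max) for c, n in counts.items()}

# Layer 2: count() - count()*floor(count()/overall_max(count()))
layer2 = {c: n - layer1[c] for c, n in counts.items()}

print(layer1)  # only "IT" keeps its count; every other bucket is 0
print(layer2)  # the mirror image: "IT" is 0, the rest keep their counts
```

Stacked together, the two layers always add up to the original count, so the bar lengths are unchanged; only the color split differs.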

To achieve a nice merge of these two layers, we need to make the following adjustments in both:

  • Chart type: select “Bar horizontal stacked”

  • Vertical axis: change “Rank by” to “Custom” and ensure the “Rank function” is “Count”

Here is the final setup of the two layers:

Layer 1:

count()*floor(count()/overall_max(count()))

Layer 2:

count() - count()*floor(count()/overall_max(count()))

This visualization also works well for time series data where you need to quickly highlight which time period (12h in the example below) had the highest number of flights:

Above the surface

Building on what we have done earlier, we can extend the approach to get other high values above the surface. Let’s see which formula we used to create the visualization in the introduction:

For this visualization, we used a property of the round() function: it rounds to the nearest integer, so round(count()/overall_max(count())) returns 1 for any bucket whose count is greater than 50% of the highest value, and 0 otherwise.

Let's duplicate our first visualization and swap out the floor() function with round().

Layer 1:

count()*round(count()/overall_max(count()))

Layer 2:

count() - count()*round(count()/overall_max(count()))
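
A quick Python check of the rounding behavior, using the same illustrative counts as before. One caveat of the sketch: Python's built-in round() rounds halves to even, so we emulate a conventional half-up round() explicitly:

```python
import math

# Illustrative bucket counts (not from the actual sample dataset)
counts = {"IT": 2371, "US": 1987, "CN": 1096, "CA": 944}
overall_max = max(counts.values())

def round_half_up(x):
    # Emulates a conventional round(); Python's built-in round() rounds
    # halves to even, which would behave differently at exactly 50%
    return math.floor(x + 0.5)

# Layer 1: count()*round(count()/overall_max(count()))
layer1 = {c: n * round_half_up(n / overall_max) for c, n in counts.items()}
layer2 = {c: n - layer1[c] for c, n in counts.items()}

# 1987/2371 ≈ 0.84 rounds to 1, so "US" now joins the highlighted layer;
# 1096/2371 ≈ 0.46 rounds to 0, so "CN" stays in the background layer
```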

It was an easy fix.
What if we want to extend the first layer further by adding more high values?
For instance, we would like all the values above the average.

To do this, we use overall_average() instead of overall_max() as the reference value that separates the eligible values in Layer 1.

As we are comparing against the average value among all the buckets, the division might return values greater than 1.

Here, the clamp() function nicely solves this issue. 

According to the formula reference, clamp() "limits the value from a minimum to maximum". Combining clamp() and floor() ensures that there are only two possible output values: either the minimum value ( 0 ) or the maximum value ( 1 ) given as parameters.

Applied to our flights dataset, it highlights the country destinations that have more flights than the average:

Layer 1:

count()*clamp(floor(count()/overall_average(count())),0,1)

Layer 2:

count() - count()*clamp(floor(count()/overall_average(count())),0,1)
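
Here is the same arithmetic in Python, with a hand-rolled clamp() standing in for the Lens function. The counts are illustrative; their average works out to 1599.5:

```python
import math

# Illustrative bucket counts; sum is 6398, so the average is 1599.5
counts = {"IT": 2371, "US": 1987, "CN": 1096, "CA": 944}
overall_average = sum(counts.values()) / len(counts)

def clamp(x, lo, hi):
    # Same idea as the Lens clamp(): limit x to the [lo, hi] range
    return max(lo, min(hi, x))

# Layer 1: count()*clamp(floor(count()/overall_average(count())),0,1)
# floor() of the ratio can be 0, 1, 2, ...; clamp() caps it at 1 so the
# multiplication never inflates a bucket's count
layer1 = {c: n * clamp(math.floor(n / overall_average), 0, 1)
          for c, n in counts.items()}
layer2 = {c: n - layer1[c] for c, n in counts.items()}
```

With these numbers, both "IT" and "US" sit above the average, so both land in the highlighted layer.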

It also opens up options for using other dynamic references. For instance, we could place all the values greater than 60% of the highest value (> 0.6*overall_max(count())) above the surface. We can tune our formula as follows:

count()*clamp(floor(count()/(0.6*overall_max(count()))),0,1)

Conclusion

In the first part, we saw the main tips that allow us to create a two-color histogram:

  • Two layers: one for the highest value and one for the remaining values

  • Visualization type: bar horizontal/vertical stacked

  • To separate the data, we use a formula that returns 1 for the highest value and 0 otherwise

 

Then, in the second part, we saw how to extend this principle to bring more high values above the surface. This approach can be summarized as follows:

  • Start with layer 1 focusing on the high value: count()*<formula returning 0 or 1>

  • Duplicate the layer and adjust the formula:
    ( count() - count()*<formula returning 0 or 1>)
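
The whole pattern can be condensed into one generic Python sketch, where the `indicator` callback stands in for the <formula returning 0 or 1> part (bucket names and counts are illustrative):

```python
import math

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def split_layers(counts, indicator):
    """Split each bucket into (highlighted, remaining) using a 0/1 indicator."""
    layer1 = {c: n * indicator(n, counts) for c, n in counts.items()}
    layer2 = {c: n - layer1[c] for c, n in counts.items()}
    return layer1, layer2

counts = {"IT": 2371, "US": 1987, "CN": 1096, "CA": 944}

# Only the highest value (formula 1)
only_max = lambda n, cs: n // max(cs.values())

# All values greater than 60% of the highest
above_60 = lambda n, cs: clamp(math.floor(n / (0.6 * max(cs.values()))), 0, 1)

hi_max, rest_max = split_layers(counts, only_max)
hi_60, rest_60 = split_layers(counts, above_60)
```

Swapping in a different indicator changes which buckets are highlighted without touching the layer-splitting logic, which mirrors how only the inner part of the Lens formula changes between examples.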

Finally, here are four generic formula pairs, ready to use to spice up your dashboards:

1. Only the highest
Layer 1
count()*floor(count()/overall_max(count()))
Layer 2
count() - count()*floor(count()/overall_max(count()))
2.1. Above the surface: high values (above 50% of the max value)
Layer 1
count()*round(count()/overall_max(count()))
Layer 2
count() - count()*round(count()/overall_max(count()))
2.2. Above the surface: all values above the overall average
Layer 1
count()*clamp(floor(count()/overall_average(count())),0,1)
Layer 2
count() - count()*clamp(floor(count()/overall_average(count())),0,1)
2.3. Above the surface: all the values greater than 60% of the highest
Layer 1
count()*clamp(floor(count()/(0.6*overall_max(count()))),0,1)
Layer 2
count() - count()*clamp(floor(count()/(0.6*overall_max(count()))),0,1)

Try these examples out for yourself by signing up for a free trial of Elastic Cloud or downloading the self-managed version of the Elastic Stack for free. If you have additional questions about getting started, head on over to the Kibana forum or check out the Kibana documentation guide.
In the next blog post, we will see how the new function ifelse() (introduced in version 8.6) will greatly simplify the creation of visualizations with more advanced formulas.
