In this post, I’d like to explore a sample data set using Kibana.
This requires some data to start with: let’s index some tweets. It’s quite straightforward to achieve that by following the explanations found in my good friend David’s blog post and waiting for some time to fill the index with data.
Basic metric
Let’s start with something basic, the number of tweets indexed so far.
In Kibana, go to the twitter index. For the Aggregation field, choose "Count"; then click on Save and name the visualization accordingly, e.g. "Number of tweets".
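Since a Kibana visualization is just an Elasticsearch query underneath, the same number can be fetched directly from the REST API. Here is a minimal sketch in Python, assuming Elasticsearch listens on localhost:9200 and the index is named twitter:

```python
import requests

# Same figure as the "Number of tweets" metric: count every document in the index
response = requests.get("http://localhost:9200/twitter/_count")
response.raise_for_status()
print(response.json()["count"])
```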
Geo-map
Another simple visualization is to display the tweets based on their location on a world map.
In Kibana, go to the twitter index. Select Geo Coordinates for the bucket type and keep the default values: Geohash for Aggregation and coordinates.coordinates for Field.
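Behind the scenes, this bucket type maps to a geohash_grid aggregation. A rough equivalent request, assuming a local Elasticsearch instance and coordinates.coordinates mapped as a geo_point (the aggregation name and precision below are arbitrary choices):

```python
import requests

# Bucket tweets into geohash cells, as the Geo Coordinates bucket does in Kibana
query = {
    "size": 0,
    "aggs": {
        "tweets_per_cell": {
            "geohash_grid": {
                "field": "coordinates.coordinates",
                "precision": 3,  # 1 (coarse) to 12 (fine)
            }
        }
    },
}

response = requests.post("http://localhost:9200/twitter/_search", json=query)
for bucket in response.json()["aggregations"]["tweets_per_cell"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```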
Bucket metric
For this kind of metric, suppose a business requirement is to display the top 5 users. Unfortunately, as some (most?) business requirements go, this is not deterministic enough: it misses both the time range and the aggregation period. Let’s agree on the time range being a sliding window over the last day, and the period being an hour.
In Kibana, go to the twitter index. Then:
- For the Y-Axis, keep Count for the Aggregation field
- Choose X-Axis for the buckets type
- Select Date histogram for the Aggregation field
- Keep the value @timestamp for the Field field
- Set the Interval field to Hourly
- Click on Add sub-buckets
- Choose Split bars for the buckets type
- Select Terms for the Sub Aggregation field
- Select user.screen.name for the Field field
- Keep the other fields’ default values
- Don’t forget to click on the Apply changes button
- Click on Save and name the visualization accordingly, e.g. "Top 5 users hourly".
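For reference, what Kibana builds here is essentially a date_histogram aggregation with a terms sub-aggregation, restricted to the last day. A hedged sketch of such a request, reusing the field names above (the interval syntax and field mapping may differ depending on the Elasticsearch version):

```python
import requests

# Hourly buckets over the last day, each holding its 5 most active users
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1d"}}},
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "@timestamp", "interval": "hour"},
            "aggs": {
                # field name taken from the visualization above; depending on the
                # mapping, a not_analyzed/keyword variant may be needed instead
                "top_users": {"terms": {"field": "user.screen.name", "size": 5}}
            },
        }
    },
}

response = requests.post("http://localhost:9200/twitter/_search", json=query)
for hour in response.json()["aggregations"]["per_hour"]["buckets"]:
    print(hour["key_as_string"], [b["key"] for b in hour["top_users"]["buckets"]])
```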
Equivalent visualizations
Other visualizations can be used with the exact same configuration: Area chart and Data table.
The output of the Area chart is not as readable for this data set, but the Data table offers interesting options.
From a visualization, click on the bottom right arrow icon to display a table view of the data instead of a graphic.
Visualizations make use of the Elasticsearch public API. From the tabular view, the JSON request can also be displayed by clicking on the Request button (oh, surprise…). This way, Kibana can be used as a playground to quickly prototype requests before using them in one’s own applications.
Changing requirements a bit
The above visualization picks out the top 5 users who tweeted the most during each hour and displays them over the last day. That’s the reason why more than 5 users are displayed. But the above requirement can be interpreted in another way: take the top 5 users over the course of the last day, and break down their number of tweets by hour.
To do that, just move the X-Axis bucket below the Split bars bucket. This will change the output accordingly.
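In query terms, only the nesting order of the aggregations changes: the terms bucket becomes the outer one and the date histogram the inner one. A sketch of the reinterpreted request, under the same assumptions as before:

```python
import requests

# Top 5 users over the whole day first, then their tweet counts broken down by hour
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1d"}}},
    "aggs": {
        "top_users": {
            "terms": {"field": "user.screen.name", "size": 5},
            "aggs": {
                "per_hour": {
                    "date_histogram": {"field": "@timestamp", "interval": "hour"}
                }
            },
        }
    },
}

response = requests.post("http://localhost:9200/twitter/_search", json=query)
for user in response.json()["aggregations"]["top_users"]["buckets"]:
    print(user["key"], [h["doc_count"] for h in user["per_hour"]["buckets"]])
```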
Filtering irrelevant data
As can be seen in the above histogram, the top users are mostly about recruiting and/or job offers. This is not really what was wanted in the first place. It’s possible to remove this noise by adding a filter: in the Split bars section, click on Advanced to display additional parameters and type the desired regex in the Exclude field.
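In the underlying terms aggregation, this corresponds to the exclude parameter, which accepts a regular expression. A small sketch of what the bucket definition might look like (the pattern below is a made-up example, not the one used for the screenshots):

```python
# Terms bucket with the Exclude option applied; the regex is purely illustrative
top_users_bucket = {
    "terms": {
        "field": "user.screen.name",
        "size": 5,
        "exclude": ".*(job|career|recrut).*",
    }
}
```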
The new visualization is quite different:
Putting it all together
With the above visualizations available and configured, it’s time to put them together on a dedicated dashboard. Go to the Dashboard page and click on Add to list all available visualizations.
It’s as simple as clicking on the desired one, laying it out on the board and resizing it. Rinse and repeat until happy with the result, then click on Save.
Icing on the cake: using the Rectangle tool on the map visualization automatically adds a filter that restricts all visualizations on the dashboard to the data bounded by the rectangle’s coordinates.
That trick is not limited to the map visualization (try playing with other ones) but filtering on location quickly gives insights when exploring data sets.
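The filter Kibana adds here is based on the rectangle’s corner coordinates; a comparable standalone request can be written with a geo_bounding_box query (the corner values below are made up):

```python
import requests

# Count only the tweets falling inside a bounding box, as the dashboard filter does
query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": {
                "geo_bounding_box": {
                    "coordinates.coordinates": {
                        "top_left": {"lat": 52.0, "lon": -5.0},
                        "bottom_right": {"lat": 41.0, "lon": 10.0},
                    }
                }
            }
        }
    },
}

response = requests.post("http://localhost:9200/twitter/_search", json=query)
print(response.json()["hits"]["total"])
```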
Conclusion
While this post only scratches the surface of what Kibana has to offer, there are more visualizations available, as well as Timelion, the new, powerful (but sadly under-documented) "time series expression interface". In any case, even the basic features shown above already provide plenty of different options to make sense of one’s data sets.