/ LOGGING, ANALYTICS, SLF4J

Logging additional metadata

In February, before the lockdown, I presented my Fast logs talk at ConFoo. At the end, I had an interesting question I had to cut short because of the timing. This blog post aims to describe the relevant points of the talk, the question, as well as some possible answer.

One improvement to have faster logs

In this talk, I highlight different ways to make logs faster. Nowadays, most logs are aggregated into a single place for later usage. For example, a widespread architecture is based on the Elastic stack:

Naive sample architecture

While this design works, it has a lot of components. In particular, Logstash is here to parse the original log lines in JSON format, so that Elasticsearch can index each field separately.

sample.log
2020-02-29 13:56:54.906  INFO 1 --- [  main] c.f.f.Main : Informative message
{
 "date": [[ "20-02-29" ]],
 "MONTHDAY": [[ "29" ]],
 "MONTHNUM": [[ "02" ]],
 "YEAR": [[ "20" ]],
 "time": [[ "13:56:54.906" ]],
 "HOUR": [[ "13" ]],
 "MINUTE": [[ "56" ]],
 "SECOND": [[ "54.906" ]],
 "level": [[ "INFO" ]],
 "threadName": [[ "main" ]],
 "class": [[ "c.f.f.Main" ]],
 "message":
   [[ "Informative message" ]]
}

One improvement is to configure the logging framework with a pattern so that it directly writes JSON-formatted one-liner logs. The following shows how to do that with Logback:

logback.xml
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>/var/log/sample.log</file>
    <encoder>
        <pattern>{ "date": "%d{dd-MM-yy}", "time": "%d{HH:mm:ss.SSS}", "thread": "%thread", "level": "%-5level", "className": "%logger{36}", "message": "%msg" }%n</pattern>
    </encoder>
  </appender>
  <root level="debug">
    <appender-ref ref="FILE" />
  </root>
</configuration>

The output is now the following:

/var/log/sample.log
{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "c.f.f.Main", "message": "Informative message" }

That allows to send logs directly to Elasticsearch, and thus to simplify the architecture by completely removing Logstash:

Simplified sample architecture

Adding additional meta-data

Now, suppose the business requires additional fields in addition to the log message for analytics purpose. With simple string messages, this is as easy as getting a reference to the value, and adding it to the field:

ch.frankel.fastlogs.Main
logger.info("Client-Id -> {} | Informative message", client.getId());

When executed under the correct log level, this will output the desired message:

sample.log
2020-02-29 13:56:54.906  INFO 1 --- [ main] ch.frankel.fastlogs.Main : Client-Id -> XYZ | Informative message

With Logstash, it’s possible to improve the groking pattern so that it successfully parses the Client-Id value and sends it as a dedicated field to Elasticsearch. Unfortunately, the JSON-formatted log option shown above doesn’t allow this:

sample.log
{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "ch.frankel.fastlogs.Main", "message": "Client-Id -> XYZ | Informative message" }

Notice the Client-Id is part of the message, along with separators. It’s possible to send this to Elasticsearch and parse the message with custom ingesters. Yet, this will take additional processing time on each message indexed. Is it back to square one and Logstash?

Mapped Diagnostics Context to the rescue

SLF4J - and similar frameworks - allow to add additional meta-data through the usage of MDC. In short, a MDC is a hashmap context holder bound to a specific thread.

MDC class diagram

MDC can be leveraged to solve the issue. Here’s a sample code snippet:

ch.frankel.fastlogs.Main
MDC.put("client-id", client.getId());
logger.info("Informative message");
MDC.remove("client-id");

Note that we explicitly set it before, and remove it after, the logging statement. Depending on the context, it would be a benefit to use servlet filters to do that automatically in order to avoid mistakes. For example, Logback provides an out-of-the-box MDC filter to set request-related data.

Let’s slightly update the pattern to use MDC:

logback.xml
<pattern>{ "date": "%d{dd-MM-yy}", "time": "%d{HH:mm:ss.SSS}", "thread": "%thread", "level": "%-5level", "className": "%logger{36}", "message": "%msg", "Client-Id": "%X{client-id}" }%n</pattern>

Notice the %X{key} syntax. With that, we can keep our simple Logstash-free architecture, and still get the desired JSON output:

{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "ch.frankel.fastlogs.Main", "message": "Informative message", "Client-Id": "XYZ" }

Conclusion

Logs can be made faster by removing the need to parse them. To achieve that, one can directly output single-line JSON messages. When additional meta-data is required, just use a MDC and the logging pattern appropriate to the logging implementation.

Nicolas Fränkel

Nicolas Fränkel

Nicolas Fränkel is a technologist focusing on cloud-native technologies, DevOps, CI/CD pipelines, and system observability. His focus revolves around creating technical content, delivering talks, and engaging with developer communities to promote the adoption of modern software practices. With a strong background in software, he has worked extensively with the JVM, applying his expertise across various industries. In addition to his technical work, he is the author of several books and regularly shares insights through his blog and open-source contributions.

Read More
Logging additional metadata
Share this