How to instrument a polyglot microservices application with Elastic APM

One of the challenges introduced by microservices architectures is the ability to understand how the application performs and where most time is spent. The Elastic Stack and Elastic APM can provide observability for modern, microservice-based solutions as well as monolithic applications.

Application Performance Monitoring (APM) combines different technologies to provide a deep, transparent and holistic view of what each service component is doing, where, when, and for how long. APM shows how services interact, follows transactions as they transverse the entire system, and lets you see bottlenecks. These rich analytics enable us to spot, debug, and fix issues quicker, resulting in improved user experience and business efficiency.

As a bonus, APM data stored in Elasticsearch is just another index — the combination and correlation of application metrics with other operational data becomes easy and accessible.

Observability and APM tools can be DevOps culture catalysts for organizations of different shapes and sizes. They bring together development, operations, security, product, and other teams trying to understand and improve their applications — each from their own point of view.

In this blog post we use an example application to demonstrate how to instrument a microservices application using Elastic APM, with a focus on distributed tracing.

Here’s a high-level view of the application architecture:

Application architecture

Although a contrived and basic example application, it goes beyond “hello world” examples built for “evaluating various instruments (monitoring, tracing, you name it)...” The original project was forked, and had a few modifications and updates applied. It was then instrumented using the Elastic APM.

Clearly, there’s more than one way to configure the agents and instrument an application with Elastic APM. What’s presented below is one option out of many — it does not claim to be the best or a unique way of setting things up, but rather demonstrates the flexibility a good observability solution provides.

Overview

The application is made up of five services:

  • Frontend is a VueJS application and provides the UI.
  • Auth API is written in Go and provides authorization functionality.
  • TODOs API is written with NodeJS using the express framework. It provides CRUD functionality for the todo records. Also, it logs "create" and "delete" operations to a Redis queue, so they can be later processed by Log Message Processor.
  • Users API is a Spring Boot project written in Java and provides user profiles.
  • Log Message Processor is a queue processor written in Python. It reads messages from Redis queue and prints them to stdout.

In total, the app comprises six docker containers:

docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS                    NAMES
6696f65b1759        frontend                "docker-entrypoint.s..."   6 hours ago         Up 58 seconds       0.0.0.0:8080->8080/tcp   microservice-app-example_frontend_1
6cc8a27f10f5        todos-api               "docker-entrypoint.s..."   6 hours ago         Up 58 seconds       8082/tcp                 microservice-app-example_todos-api_1
3e471f2163ca        log-message-processor   "python3 -u main.py"       6 hours ago         Up 58 seconds                                microservice-app-example_log-message-processor_1
bc1f09716288        auth-api                "/bin/sh -c /go/src/..."   6 hours ago         Up 58 seconds       8081/tcp                 microservice-app-example_auth-api_1
7144abac0d7c        users-api               "java -jar ./target/..."   6 hours ago         Up 59 seconds       8083/tcp                 microservice-app-example_users-api_1
84416563c37f        redis                   "docker-entrypoint.s..."   6 hours ago         Up 59 seconds       6379/tcp                 microservice-app-example_redis-queue_1

How to set up and run the lab

Prerequisites

The demo uses the Elastic Stack with APM on ESS. If you don’t have an Elasticsearch Service account you can set up a free trial. It comes with an APM server included as default.

This demo was run and tested on both 7.5.0 and 7.6.1. Here’s a view of the instances of the simple test cluster that was used.

View of the test cluster instances

The latest version of APM agents installed are being installed at build time.

The project is built and run using docker-compose, so you’d need that installed on your local machine.

Configuration

To get the example application running on our local machine and configures to send APM data to the Elasticsearch Service cluster, we need to:

Clone the project:

git clone https://github.com/nephel/microservice-app-example.git

Create a file called .env at the root directory of the project with the contents:

cd microservice-app-example
vi .env

Add the token and server URL values for the APM server, that correspond to our environment:

TOKEN=XXXX
SERVER=XXXX

The values can be obtained from this page on the Admin console:

Token and Server URL values

The values are passed as environment variables to each docker container from the docker-compose.yaml file, e.g.:

ELASTIC_APM_SECRET_TOKEN: "${TOKEN}"
ELASTIC_APM_SERVER_URL: "${SERVER}"

Running the project

To start the cloned project, run docker-compose up --build in the project’s root directory. This will create all the containers and their dependencies, and start sending application trace data to the APM server.

The first time the application with all its microservices and their dependencies is built with docker-compose, it will take a moment.

The configuration for APM and the microservices, has been left in verbose debug mode. This is intentional, as it was helpful when setting up and testing. If you find it too verbose, it can be changed in the docker-compose.yaml file.

On the APM server config side we need to enable RUM with

apm-server.rum.enabled: true

in the User setting overrides:

Enable RUM

To generate some APM data we need to access the application and generate some traffic and requests. To do that, browse to http://127.0.0.1:8080, log in, add some items to the to do list of each user and delete some items, then log out and re-log in with a different user and repeat.

Note that the username/passwords are:

  • admin/admin
  • johnd/foo
  • janed/ddd

Here is a short video showing how to bring up the app with docker-compose and the interaction with the app and Kibana APM UI:

The microservices and their instrumentation

Here is a view of the five services as they appear in the APM UI:

Services listed in the APM UI

Each service shows up in the list, with high-level transaction, environment, and language information. Now let’s take a look how each microservice was instrumented, and what we see at the Kibana APM UI for each.

Frontend Vue JS

The technology used for the frontend is https://vuejs.org/ which is supported by the Elastic Real User Monitoring (RUM) agent.

The @elastic/apm-rum-vue package is installed via npm by adding to the frontend/Dockerfile:

COPY package.json ./
RUN npm install
RUN npm install --save @elastic/apm-rum-vue
COPY . .
CMD ["sh", "-c", "npm start" ]

The package is imported and configured in the frontend/src/router/index.js, like so:

...
import { ApmVuePlugin } from '@elastic/apm-rum-vue'
Vue.use(Router)
const router = new Router({
  mode: 'history',
  routes: [
    {
      path: '/login',
      name: 'login',
      component: require('@/components/Login.vue')
    },
    {
      path: '/',
      alias: '/todos',
      name: 'todos',
      component: require('@/components/Todos.vue'),
      beforeEnter: requireLoggedIn
    }
  ]
})
const ELASTIC_APM_SERVER_URL = process.env.ELASTIC_APM_SERVER_URL
Vue.use(ApmVuePlugin, {
  router,
  config: {
    serviceName: 'frontend',
    serverUrl: ELASTIC_APM_SERVER_URL,
    serviceVersion: '',
    logLevel: 'debug'
  }
})
...
...
export default router

In addition to the auto instrumented spans, additional component specific custom spans are also added for the various components, e.g. in the frontend/src/components/Login.vue, the following was added:

...
      span: null
    }
  },
  created () {
    this.span = this.$apm.startSpan('component-login', 'custom')
  },
  mounted () {
    this.span && this.span.end()
  }
...

Both page load and route-change events are captured as transactions with the path of the route as its name.

This is what we see when inspecting the APM related indices and docs, filtering for page load transactions:

APM-related indices and docs

A page load event and transaction can be created at any time by reloading the page without cache (e.g. Command-Shift-R on Mac).

This is a view of the transactions we see for the frontend service — note the drop-down allowing us to filter on transaction type:

Transactions for frontend service

Auth API Golang Echo

This microservice is implemented in Golang using the echo framework version 3, which is supported by the Elastic APM Go agent.

The main changes and additions are done in the auth-api/main.go file, where the Elastic APM agent along with the two modules apmecho and apmhttp are imported:

...
        "go.elastic.co/apm"
        "go.elastic.co/apm/module/apmecho"
        "go.elastic.co/apm/module/apmhttp"
        "github.com/labstack/echo/middleware"
...

The instrumentation for the echo middleware is done in line 48, below:

...
 28 func main() {
 29         hostport := ":" + os.Getenv("AUTH_API_PORT")
 30         userAPIAddress := os.Getenv("USERS_API_ADDRESS")
 31
 32         envJwtSecret := os.Getenv("JWT_SECRET")
 33         if len(envJwtSecret) != 0 {
 34                 jwtSecret = envJwtSecret
 35         }
 36
 37         userService := UserService{
 38                 Client:         apmhttp.WrapClient(http.DefaultClient),
 39                 UserAPIAddress: userAPIAddress,
 40                 AllowedUserHashes: map[string]interface{}{
 41                         "admin_admin": nil,
 42                         "johnd_foo":   nil,
 43                         "janed_ddd":   nil,
 44                 },
 45         }
 46
 47         e := echo.New()
 48         e.Use(apmecho.Middleware())
 49
 50         e.Use(middleware.Logger())
 51         e.Use(middleware.Recover())
 52         e.Use(middleware.CORS())
 53
 54         // Route => handler
 55         e.GET("/version", func(c echo.Context) error {
 56                 return c.String(http.StatusOK, "Auth API, written in Go\n")
 57         })
 58
 59         e.POST("/login", getLoginHandler(userService)
...

This microservice makes outgoing HTTP requests to the users-api microservice, implemented in the auth-api/user.go file.

The wrapping of the HTTP client is done on line 38, above, tracing the outgoing request with distributed tracing as the request moves from one microservice to another.

More info can be found in this blog post and related documentation:

Tracing outgoing HTTP requests
To trace an outgoing HTTP request, you must instrument the HTTP client, and ensure the enclosing request context is propagated to the outgoing request. The instrumented client will use this to create a span, and also to inject headers into the outgoing HTTP request. Let’s see how that looks in practice
...
If this outgoing request is handled by another application instrumented with Elastic APM, you will end up with a “distributed trace” — a trace (collection of related transactions and spans) that crosses services. The instrumented client will inject a header identifying the span for the outgoing HTTP request, and the receiving service will extract that header and use it to correlate the client span with the transaction it records.

Additionally, spans for the echo transaction were added covering the login process:

...
func getLoginHandler(userService UserService) echo.HandlerFunc {
        f := func(c echo.Context) error {
                span, _ := apm.StartSpan(c.Request().Context(), "request-login", "app")
                requestData := LoginRequest{}
                decoder := json.NewDecoder(c.Request().Body)
                if err := decoder.Decode(&requestData); err != nil {
                        log.Printf("could not read credentials from POST body: %s", err.Error())
                        return ErrHttpGenericMessage
                }
                span.End()
                span, ctx := apm.StartSpan(c.Request().Context(), "login", "app")
                user, err := userService.Login(ctx, requestData.Username, requestData.Password)
                if err != nil {
                        if err != ErrWrongCredentials {
                                log.Printf("could not authorize user '%s': %s", requestData.Username, err.Error())
                                return ErrHttpGenericMessage
                        }
                        return ErrWrongCredentials
                }
                token := jwt.New(jwt.SigningMethodHS256)
                span.End()
                // Set claims
                span, _ = apm.StartSpan(c.Request().Context(), "generate-send-token", "custom")
                claims := token.Claims.(jwt.MapClaims)
                claims["username"] = user.Username
                claims["firstname"] = user.FirstName
                claims["lastname"] = user.LastName
                claims["role"] = user.Role
                claims["exp"] = time.Now().Add(time.Hour * 72).Unix()
                // Generate encoded token and send it as response.
                t, err := token.SignedString([]byte(jwtSecret))
                if err != nil {
                        log.Printf("could not generate a JWT token: %s", err.Error())
                        return ErrHttpGenericMessage
                }
                span.End()

Users-api SpringBoot

The service uses Spring Boot, which is supported by the Java agent. Spring Boot is very straightforward to instrument out of the box.

The agent is installed using the programmatic API setup for self attach, in users-api/src/main/java/com/elgris/usersapi/UsersApiApplication.java:

...
import co.elastic.apm.attach.ElasticApmAttacher;
@SpringBootApplication
public class UsersApiApplication {
        public static void main(String[] args) {
                ElasticApmAttacher.attach();
                SpringApplication.run(UsersApiApplication.class, args);
        }
}
...

In users-api pom.xml, the following was added:

                <dependency>
                        <groupId>co.elastic.apm</groupId>
                        <artifactId>apm-agent-attach</artifactId>
                        <version>[1.14.0,)</version>
               </dependency>

Below is an example of distributed tracing with the transactions and spans for the login process. We see the trace that comprises the transactions and spans as the flow though the microservices. From the frontend form, to the POST request to the auth-API for authentication, which in turn makes a GET request out to the users-API for the user profile retrieval:

Distributed tracing example

Todos-api NodeJS Express

The “todo” service uses the Express framework, which is supported by the Elastic APM Node.js agent.

The agent is installed by adding this in the Dockerfile:

RUN npm install elastic-apm-node --save

To initialize the agent, the following is added in the todos-api/server.js:

...
const apm = require('elastic-apm-node').start({
  // Override service name from package.json
  // Allowed characters: a-z, A-Z, 0-9, -, _, and space
  serviceName: 'todos-api',
  // Use if APM Server requires a token
  secretToken: process.env.ELASTIC_APM_SECRET_TOKEN,
  // Set custom APM Server URL (default: http://localhost:8200)
  serverUrl: process.env.ELASTIC_APM_SERVER_URL,
})
....

Custom spans are added in the todos-api/todoController.js for actions such as creating and deleting tasks in the todo app:

...
    create (req, res) {
        // TODO: must be transactional and protected for concurrent access, but
        // the purpose of the whole example app it's enough
        const span = apm.startSpan('creating-item')
        const data = this._getTodoData(req.user.username)
        const todo = {
            content: req.body.content,
            id: data.lastInsertedID
        }
        data.items[data.lastInsertedID] = todo
        data.lastInsertedID++
        this._setTodoData(req.user.username, data)
        if (span) span.end()
        this._logOperation(OPERATION_CREATE, req.user.username, todo.id)
        res.json(todo)
    }
    delete (req, res) {
        const data = this._getTodoData(req.user.username)
        const id = req.params.taskId
        const span = apm.startSpan('deleting-item')
        delete data.items[id]
        this._setTodoData(req.user.username, data)
        if (span) span.end()
        this._logOperation(OPERATION_DELETE, req.user.username, id)
        res.status(204)
        res.send()
    }
...

The logs are sent via the Redis queue to the Python microservice:

...
_logOperation(opName, username, todoId) {
 var span = apm.startSpan('logging-operation')
 this._redisClient.publish(
   this._logChannel,
   JSON.stringify({
     opName,
     username,
     todoId,
     spanTransaction: span.transaction
   }),
   function(err) {
     if (span) span.end()
     if (err) {
       apm.captureError(err)
     }
   }
 )
}
...

As can be seen above, the data sent to Redis includes the opName and username, along with the span.transaction information, which points to the parent transaction object.

Here’s a sample of the spanTransaction as captured in the verbose log output:

'spanTransaction': 
{
   "id":"4eb8801911b87fba",
   "trace_id":"5d9c555f6ef61f9d379e4a67270d2eb1",
   "parent_id":"a9f3ee75554369ab",
   "name":"GET unknown route (unnamed)",
   "type":"request",
   "subtype":"None",
   "action":"None",
   "duration":"None",
   "timestamp":1581287191572013,
   "result":"success",
   "sampled":True,
   "context":{
      "user":{
         "username":"johnd"
      },
      "tags":{
      },
      "custom":{
      },
      "request":{
         "http_version":"1.1",
         "method":"GET",
         "url":{
            "raw":"/todos",
            "protocol":"http:",
            "hostname":"127.0.0.1",
            "port":"8080",
            "pathname":"/todos",
            "full":"http://127.0.0.1:8080/todos"
         },
         "socket":{
            "remote_address":"::ffff:172.20.0.7",
            "encrypted":False
         },
         "headers":{
            "accept-language":"en-GB,en-US;q=0.9,en;q=0.8",
            "accept-encoding":"gzip, deflate, br",
            "referer":"http://127.0.0.1:8080/",
            "sec-fetch-mode":"cors",
            "sec-fetch-site":"same-origin",
            "elastic-apm-traceparent":"00-5d9c555f6ef61f9d379e4a67270d2eb1-a9f3ee75554369ab-01",
            "user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
            "authorization":"Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE1ODE1NDYzOTEsImZpcnN0bmFtZSI6IkpvaG4iLCJsYXN0bmFtZSI6IkRvZSIsInJvbGUiOiJVU0VSIiwidXNlcm5hbWUiOiJqb2huZCJ9.Kv2e7E70ysbVvP-hKlG-RJyfKSibmiy8kCO-xqm3P6g",
            "x-requested-with":"XMLHttpRequest",
            "accept":"application/json, text/plain, */*",
            "connection":"close",
            "host":"127.0.0.1:8080"
         }
      },
      "response":{
         "status_code":200,
         "headers":{
         }
      }
   },
   "sync":False,
   "span_count":{
      "started":5
   }
}

Among other things, it includes the 'elastic-apm-traceparent':'00-5d9c555f6ef61f9d379e4a67270d2eb1-a9f3ee75554369ab-01'.

This information will be used by the Python microservice to link this transaction to the parent trace. This will ensure that we have distributed tracing for the todo app requests as they traverse the frontend, todos-api, and are then logged via the Python processor.

Log-message-processor Python

The log-message-processor service is not using any framework e.g. django or flask, but rather is written in Python3 as a simple consumer that listens for new messages in Redis queue.

In the log-message-processor/requirements.txt this line is added:

elastic-apm

In the log-message-processor/main.py

The agent is imported along with TraceParent from the elasticapm.utils.disttracing

...
import elasticapm
from elasticapm.utils.disttracing import TraceParent
from elasticapm import Client
client = Client({'SERVICE_NAME': 'python'})
...

The decorator option was used in order to capture the spans of the log_message function:

...
@elasticapm.capture_span()
def log_message(message):
    time_delay = random.randrange(0, 2000)
    time.sleep(time_delay / 1000)
    print('message received after waiting for {}ms: {}'.format(time_delay, message))
...

The transactions start and end around the logging. The parent id of the trace is read by the elastic-apm-transparent header (sent via the Redis queue) using the TraceParent API:

...
if __name__ == '__main__':
    redis_host = os.environ['REDIS_HOST']
    redis_port = int(os.environ['REDIS_PORT'])
    redis_channel = os.environ['REDIS_CHANNEL']
    pubsub = redis.Redis(host=redis_host, port=redis_port, db=0).pubsub()
    pubsub.subscribe([redis_channel])
    for item in pubsub.listen():
        try:
            message = json.loads(str(item['data'].decode("utf-8")))
        except Exception as e:
            log_message(e)
            continue
        spanTransaction = message['spanTransaction']
        trace_parent1 = spanTransaction['context']['request']['headers']['elastic-apm-traceparent']
        print('trace_parent_log: {}'.format(trace_parent1))
        trace_parent = TraceParent.from_string(trace_parent1)
        client.begin_transaction("logger-transaction", trace_parent=trace_parent)
        log_message(message)
        client.end_transaction('logger-transaction')
...

Please note that while the TraceParent functionality is about to be formally documented and it is mentioned in this Discuss topic, and tested here.

The screenshots below are examples of distributed traces and the underlying transactions and spans. They show the creation and deletion process for the todo items, as it flows through the microservices. From the frontend to the POST request to the todos-api and its related spans for creation and logging, as well as subsequent log-message-processor transactions and spans.

Distributed tracing example


Distributed tracing example

Conclusion and next steps

We’ve witnessed first hand how to go about instrumenting a microservices application using different languages and frameworks.

There are many areas we did not explore, for example the Elastic APM app in Kibana also offers information of errors and metrics.

Furthermore, the Elastic APM allows you to correlate with the logs gathered from the applications. This means you can easily inject trace information into your logs, allowing you to explore logs in the Logs app, then jump straight into the corresponding APM traces — all while preserving the trace context.

Additionally, data collected by APM are also available in the SIEM app:

APM data in SIEM app

Another area of interest is using APM data with the machine learning capabilities of the Elastic stack in order to e.g. calculate anomaly scores on transaction response times.

Any questions on any of the above, feel free to get in touch with Elastic or submit them on the Discuss forum platform.