Bahubali Shetti

NGINX log analytics with GenAI in Elastic

Elastic has a set of embedded capabilities such as a GenAI RAG-based AI Assistant and a machine learning platform as part of the product baseline. These make analyzing the vast number of logs you get from NGINX easier.

Elastic Observability provides a full observability solution, supporting metrics, traces, and logs for applications and infrastructure. NGINX, which is widely used for web serving, load balancing, HTTP caching, and reverse proxying, is key to many applications and outputs a large volume of logs. NGINX’s access logs, which detail all requests made to the NGINX server, and error logs, which record server-related issues and problems, are key to managing and analyzing NGINX issues and to understanding what is happening to your application.
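To make the access log contents concrete, here is a minimal sketch of parsing one access log line in Python. It assumes the standard NGINX "combined" log format; the sample line, the regular expression, and the `parse_access_line` helper are illustrative assumptions and are not how the Elastic NGINX integration parses logs.

```python
import re

# Assumed layout of the NGINX "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<source_address>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    # The request field normally looks like: METHOD PATH PROTOCOL
    parts = record["request"].split()
    if len(parts) >= 2:
        record["method"], record["path"] = parts[0], parts[1]
    else:
        record["method"], record["path"] = None, None
    return record


# Hypothetical sample line, made up for illustration.
sample = (
    '72.57.0.53 - - [10/Oct/2017:13:55:36 +0000] '
    '"GET /missing-page HTTP/1.1" 404 153 "-" "Mozilla/5.0"'
)
record = parse_access_line(sample)
print(record["source_address"], record["status"], record["path"])
```

Each parsed line yields the kind of fields (source address, status code, requested path) that the rest of this analysis relies on.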

In managing NGINX, Elastic provides several capabilities:

  1. Easy ingest, parsing, and out-of-the-box dashboards. Check out the simple how-to in our docs. Based on logs, these dashboards show several items over time: response codes, errors, top pages, data volume, browsers used, active connections, drop rates, and much more.

  2. Out-of-the-box ML-based anomaly detection jobs for your NGINX logs. These jobs help pinpoint anomalies in request rates, per-IP request rates, URL access, status codes, and visitor rates.

  3. ES|QL, which helps you work through logs and build out charts during analysis.

  4. Elastic’s GenAI Assistant provides a simple natural language interface that helps analyze all the logs and can pull out issues from ML jobs and even create dashboards. The Elastic AI Assistant also automatically uses ES|QL.

  5. NGINX SLOs - Finally, Elastic provides the ability to define and monitor SLOs for your NGINX logs. While most SLOs are metrics-based, Elastic also allows you to create logs-based SLOs. We detailed this in a previous blog.
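As a rough illustration of the idea behind a logs-based SLO on status codes, the following Python sketch computes the share of successful requests and compares it to a target. The 99% target, the function names, and the sample data are assumptions made for illustration; this is not how Elastic evaluates SLOs internally.

```python
def availability(status_codes):
    """Fraction of requests whose status code is below 400."""
    codes = [int(code) for code in status_codes]  # accept strings or ints
    if not codes:
        return 1.0  # assumption: no traffic counts as meeting the objective
    good = sum(1 for code in codes if code < 400)
    return good / len(codes)


def slo_breached(status_codes, target=0.99):
    """True when availability falls below the (hypothetical) target."""
    return availability(status_codes) < target


# Hypothetical sample: 85 successful requests and 15 failures.
sample_codes = ["200"] * 85 + ["404"] * 15
print(availability(sample_codes), slo_breached(sample_codes))
```

With 15% of requests failing, as in the scenario analyzed later in this post, any realistic availability target would be breached.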

NGINX logs are another example of why logs are great. Logging is an important part of observability, even though we generally think first of metrics and tracing. However, the volume of logs that an application and its underlying infrastructure output can be daunting, and NGINX is usually the starting point for most analyses.

In today’s blog, we’ll cover how the out-of-the-box ML-based anomaly detection jobs can help with root cause analysis (RCA), and how Elastic’s GenAI Assistant helps you work through logs to pinpoint issues in minutes.

Prerequisites and config

If you plan on following this blog, here are some of the components and details we used to set up this demonstration:

  • Ensure you have an account on Elastic Cloud and a deployed stack (see instructions here).

  • Bring up an NGINX server on a host, or run an application with NGINX as a front end and drive traffic.

  • Install the NGINX integration and assets and review the dashboards as noted in the docs.

  • Ensure you have an ML node configured in your Elastic stack.

  • To use the AI Assistant, you will need a trial or an upgrade to Platinum.

In our scenario, we use three months of data from our Elastic environment to help highlight the features. Hence, you might need to run your application with traffic for a specific time frame to follow along.

Analyzing the issues with AI Assistant

As detailed in a previous blog, you can get alerted on issues via SLO monitoring against NGINX logs. Let’s assume you have an SLO based on status codes, as we outlined in the previous blog. You can immediately analyze the issue via the AI Assistant. Because it's a chat interface, we simply open the AI Assistant and work through some simple analysis (see the animated GIF for a demo):

AI Assistant analysis:

  • Using Lens, graph all HTTP response status codes < 400 and >= 400 from filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer - We simply wanted to understand the number of requests resulting in a status code >= 400 and graph the results. We see that 15% of the requests were not successful, hence the SLO alert being triggered.

  • Which IP address (field source.address) has the highest number of http.response.status.code >= 400 from filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer - We were curious if there was a specific IP address not having successful requests. 72.57.0.53, with a count of 25,227 occurrences, is fairly high, but it does not account for all of the failed requests.

  • What country (source.geo.country_iso_code) is source.address=72.57.0.53 coming from. Use filebeat-nginx-elasticco-anon-2017. - Again we were curious if this came from a specific country. And the IP address 72.57.0.53 is coming from the country with the ISO code IN, which corresponds to India. Nothing out of the ordinary.

  • Did source.address=72.57.0.53 have any (http.response.status.code < 400) from filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer - Oddly, the IP address in question had only 4000+ successful responses. This means it's not malicious, and it points to something else.

  • What are the different status codes (http.response.status.code>=400), from source.address=72.57.0.53. Use filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer. Provide counts for each status code - We were curious whether we would see any 502s. There were none, and most of the failures were 404s.

  • What are the different status codes (http.response.status.code>=400). Use filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer. Provide counts for each status code - Regardless of the specific address, which status code >= 400 occurs most often? This also points to 404.

  • What does a high 404 count from a specific IP address mean from NGINX logs? - With this question, we wanted to understand the potential causes in our application. From the answers, we can rule out security probing and web scraping, as we validated that the specific address 72.57.0.53 has a low count of non-successful requests. It also rules out user error. Hence this potentially points to broken links or missing resources.
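The aggregations behind these questions can be sketched in plain Python. The snippet below is only an illustration: the `events` list is hypothetical sample data (apart from the 72.57.0.53 address taken from the analysis above), and the helper names are assumptions. Status codes are stored as strings to mirror the "http.response.status.code is not an integer" hint given in each prompt.

```python
from collections import Counter

# Hypothetical sample records using the field names from the prompts above.
events = [
    {"source.address": "72.57.0.53", "http.response.status.code": "404"},
    {"source.address": "72.57.0.53", "http.response.status.code": "404"},
    {"source.address": "72.57.0.53", "http.response.status.code": "200"},
    {"source.address": "10.0.0.7", "http.response.status.code": "500"},
    {"source.address": "10.0.0.7", "http.response.status.code": "200"},
    {"source.address": "10.0.0.8", "http.response.status.code": "301"},
]


def status(event):
    """Convert the string status code to an int before comparing."""
    return int(event["http.response.status.code"])


def failures_by_ip(events):
    """Count requests with status >= 400 per source address."""
    return Counter(e["source.address"] for e in events if status(e) >= 400)


def failure_codes(events, address=None):
    """Count each status code >= 400, optionally for a single address."""
    return Counter(
        status(e)
        for e in events
        if status(e) >= 400 and (address is None or e["source.address"] == address)
    )


top_ip, top_count = failures_by_ip(events).most_common(1)[0]
print(top_ip, top_count)
print(failure_codes(events))
print(failure_codes(events, address=top_ip))
```

The string-to-integer conversion is the step the prompts hint at: comparing string status codes against 400 directly would not give a numeric comparison.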

Watch the flow:

[Video: walkthrough of the AI Assistant analysis]

Potential issue:

It seems that we potentially have an issue with the backend serving specific responses or with resources (a database, or broken links). This is causing the higher-than-normal number of non-successful status codes (>= 400).

Key highlights from AI Assistant:

As you watch the video, you will notice a few things:

  1. We analyzed millions of logs in a matter of minutes using a set of simple natural language queries. 

  2. We didn’t need to know any special query language. The AI Assistant used Elastic’s ES|QL but can similarly use KQL.

  3. The AI Assistant easily builds out graphs.

  4. The AI Assistant accesses and uses internal information stored in Elastic’s indices, unlike a simple “google foo”-based AI Assistant. This is enabled through RAG, and the AI Assistant can also bring up known issues in GitHub, runbooks, and other useful internal information.
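For readers curious what an ES|QL query for one of these questions might look like, here is a hedged sketch that assembles one as a string in Python. The exact queries the AI Assistant generated are not shown in this post, so the query text and the `failed_requests_by_ip_query` helper are assumptions; the query has not been run against a live cluster and may need adjusting for your field mappings.

```python
def failed_requests_by_ip_query(index, limit=10):
    """Build an ES|QL query counting status >= 400 per source address."""
    return "\n".join([
        f"FROM {index}",
        # The status code field is not an integer in this data set, so cast it first.
        "| EVAL status = TO_INTEGER(http.response.status.code)",
        "| WHERE status >= 400",
        "| STATS failures = COUNT(*) BY source.address",
        "| SORT failures DESC",
        f"| LIMIT {limit}",
    ])


query = failed_requests_by_ip_query("filebeat-nginx-elasticco-anon-2017")
print(query)
```

The printed text could be pasted into an ES|QL editor as a starting point and compared with what the AI Assistant produces for the same question.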

Check out the following blog on how the AI Assistant uses RAG to retrieve internal information, specifically using GitHub and runbooks.

Locating anomalies with ML

While the AI Assistant is great for analyzing information, another important aspect of NGINX log management is ensuring you can manage log spikes and anomalies. Elastic has a machine learning platform that allows you to develop jobs that analyze a specific metric or multiple metrics to look for anomalies. For NGINX, there are several out-of-the-box anomaly detection jobs. These work specifically on NGINX access logs.

  • Low_request_rate_nginx - Detect low request rates

  • Source_ip_request_rate_nginx - Detect unusual source IPs - high request rates

  • Source_ip_url_count_nginx - Detect unusual source IPs - high distinct count of URLs

  • Status_code_rate_nginx - Detect unusual status code rates

  • Visitor_rate_nginx - Detect unusual visitor rates

Since these jobs are available right out of the box, let’s look at the job Status_code_rate_nginx, which is related to our previous analysis.

With a few simple clicks, we immediately get an analysis showing a specific IP address, 72.57.0.53, with a higher-than-normal number of non-successful requests. Notably, this is the same address we found using the AI Assistant.

We can take this further with conversations with the AI Assistant, look at the logs, and/or even look at the other ML anomaly jobs.
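To build some intuition for what a rate-based anomaly job flags, here is a deliberately simple Python sketch that marks time buckets whose count deviates strongly from the preceding buckets. It uses a plain z-score, which is a stand-in chosen for illustration and not the modeling Elastic's machine learning jobs actually use; the threshold, the `unusual_buckets` helper, and the hourly counts are all hypothetical.

```python
from statistics import mean, stdev


def unusual_buckets(counts, threshold=3.0, min_history=5):
    """Return indexes of buckets that deviate strongly from all earlier buckets."""
    flagged = []
    for i in range(min_history, len(counts)):
        history = counts[:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # a perfectly flat history gives no scale to compare against
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly counts of 404 responses; the seventh bucket is a spike.
hourly_404s = [12, 15, 11, 14, 13, 12, 90, 14]
print(unusual_buckets(hourly_404s))
```

A real anomaly detection job also accounts for factors such as seasonality and per-entity baselines, which this sketch ignores.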

Conclusion:

You’ve now seen how easily Elastic’s RAG-based AI Assistant can help analyze NGINX logs without the need to know query syntax, where the data is, or even the fields. Additionally, you’ve seen how Elastic can alert you, through SLOs, when there is a potential issue or degradation in service.

Check out other resources on NGINX logs:

Out-of-the-box anomaly detection jobs for NGINX

Using the NGINX integration to ingest and analyze NGINX Logs

NGINX Logs based SLOs in Elastic

Using GitHub issues, runbooks, and other internal information for RCAs with Elastic’s RAG based AI Assistant

Try it out

Existing Elastic Cloud customers can access many of these features directly from the Elastic Cloud console. Not taking advantage of Elastic on the cloud? Start a free trial.

All of this is also possible in your environment. Learn how to get started today.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.

Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.
