Example: Parse logs in the Common Log Format

In this example tutorial, you’ll use an ingest pipeline to parse server logs in the Common Log Format before indexing. Before starting, check the prerequisites for ingest pipelines.

The logs you want to parse look similar to this:

212.87.37.154 - - [30/May/2099:16:21:15 +0000] "GET /favicon.ico HTTP/1.1"
200 3638 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"

These logs contain an IP address, timestamp, and user agent. You want to give these three items their own field in Elasticsearch for faster searches and visualizations. You also want to know where the request is coming from.
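
By the end of the tutorial, each log line is indexed as a structured document. As a rough sketch only, the pipeline you build below turns the sample line into something like the following. The field names come from the grok pattern in step 4, the date processor rewrites @timestamp as an ISO date, and the GeoIP and user agent processors add geoip and user_agent objects (not shown here) whose contents depend on the GeoIP database and user agent parser:

{
  "client.ip": "212.87.37.154",
  "@timestamp": "2099-05-30T16:21:15.000Z",
  "verb": "GET",
  "request": "/favicon.ico",
  "response": 200,
  "bytes": 3638
}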

  1. In Kibana, open the main menu and click Stack Management > Ingest Node Pipelines.

  2. Click Create a pipeline.
  3. Provide a name and description for the pipeline. This example uses the name my-pipeline.
  4. Add a grok processor to parse the log message:

    1. Click Add a processor and select the Grok processor type.
    2. Set the field input to message and enter the following grok pattern:

      %{IPORHOST:client.ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:@timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:user_agent}
    3. Click Add to save the processor.
  5. Add processors to map the date, IP, and user agent fields. Map the appropriate field to each processor type:

      Date processor: @timestamp
      GeoIP processor: client.ip
      User agent processor: user_agent

    In the Date processor, also specify the date format you want to use: dd/MMM/yyyy:HH:mm:ss Z.

    The four processors will run sequentially:
    Grok > Date > GeoIP > User agent
    You can reorder processors using the arrow icons.

    Alternatively, you can click the Import processors link and define the processors as JSON:

    {
      "processors": [
        {
          "grok": {
            "field": "message",
            "patterns": ["%{IPORHOST:client.ip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:@timestamp}\\] \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:user_agent}"]
          }
        },
        {
          "date": {
            "field": "@timestamp",
            "formats": [ "dd/MMM/yyyy:HH:mm:ss Z" ]
          }
        },
        {
          "geoip": {
            "field": "client.ip"
          }
        },
        {
          "user_agent": {
            "field": "user_agent"
          }
        }
      ]
    }
  6. To test the pipeline, click Add documents.
  7. In the Documents tab, provide a sample document for testing:

    [
      {
        "_source": {
          "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\""
        }
      }
    ]
  8. Click Run the pipeline and verify the pipeline worked as expected.
  9. If everything looks correct, close the panel, and then click Create pipeline.

    You’re now ready to load the log data using the index API. (If you prefer to work in the Console, an equivalent way to create and test the pipeline with the ingest APIs is sketched after these steps.)

  10. Index a document with the pipeline you created.

    PUT my-index/_doc/1?pipeline=my-pipeline
    {
      "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\""
    }
  11. To verify, run the request below (a search sketch that checks the GeoIP enrichment appears after these steps):

    GET my-index/_doc/1
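
If you prefer to work in the Console rather than the Kibana UI, you can create and test the same pipeline with the ingest APIs. The following is a minimal sketch that reuses the processors JSON and the sample document from the steps above and assumes the my-pipeline name used in this tutorial; the description text is only an example.

# Create the pipeline with the same four processors defined in the UI
PUT _ingest/pipeline/my-pipeline
{
  "description": "Parse Common Log Format entries",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:client.ip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:@timestamp}\\] \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:user_agent}"]
      }
    },
    {
      "date": {
        "field": "@timestamp",
        "formats": [ "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    },
    {
      "geoip": {
        "field": "client.ip"
      }
    },
    {
      "user_agent": {
        "field": "user_agent"
      }
    }
  ]
}

# Run the pipeline against the sample document without indexing anything,
# mirroring the Add documents test in step 6
POST _ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\""
      }
    }
  ]
}

The _simulate response lists the transformed documents, so you can confirm the parsed fields before indexing any data.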
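
Once documents are indexed through the pipeline, the enrichment is searchable. As a small sketch, assuming the GeoIP processor wrote to its default geoip target field with its default properties, a query like this returns documents that received a successful GeoIP lookup:

GET my-index/_search
{
  "query": {
    "exists": {
      "field": "geoip.country_iso_code"
    }
  }
}

The matching documents also contain the details extracted by the user agent processor, by default under the user_agent field.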