HTTP Poller, Opening up a New World for Logstash
I’m pleased to announce the release of a brand new Logstash input: HTTP Poller. With this new input you’ll be able to repeatedly poll one or more HTTP endpoints and turn the response into Logstash events. There are a number of practical uses for this plugin, like:
- Monitoring the HTTP stats endpoints of a daemon such as HAProxy or Apache for metrics such as total open connections or the number of busy workers
- Checking that your website is up and responding in a timely manner
- Hitting a custom metrics endpoint in a webapp to gain deep insight into some process not exposed through logs
The syntax for this plugin is dead simple to boot, as seen in the example below:
input {
  http_poller {
    # List of urls to hit
    # URLs can either have a simple format for a get request
    # Or use more complex HTTP features
    urls => {
      some_service => "http://localhost:8000"
      some_other_service => {
        method => "POST"
        url => "http://localhost:8000/foo"
      }
    }
    # Maximum amount of time to wait for a request to complete
    request_timeout => 30
    # How far apart requests should be
    interval => 60
    # Decode the results as JSON
    codec => "json"
    # Store metadata about the request in this key
    metadata_target => "http_poller_metadata"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
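This configuration turns the HTTP response of each polled endpoint into an event (the body lands in the message field unless the codec decodes it into separate fields, as the json codec does here) and stores metadata such as response timing and HTTP response headers in the http_poller_metadata field.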
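To make the shape of these events more concrete, here is a minimal Ruby sketch of a single poll. It is not the plugin's implementation: poll_once and DEFAULT_FETCHER are hypothetical names, and the response_headers key is an assumption. The name, code, and runtime_seconds keys are the ones the configs in this post read from the metadata field.

```ruby
require "net/http"
require "uri"
require "json"

# Hypothetical fetcher: performs one GET and returns [status_code, headers, body].
DEFAULT_FETCHER = lambda do |url, timeout|
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: timeout, read_timeout: timeout) do |http|
    res = http.get(uri.request_uri)
    [res.code.to_i, res.to_hash, res.body]
  end
end

# Poll one named URL once and build an event-like Hash.
# The fetcher is injectable so the function can be exercised without a network.
def poll_once(name, url, timeout: 30, fetcher: DEFAULT_FETCHER)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  code, headers, body = fetcher.call(url, timeout)
  runtime = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  # Decode JSON objects into fields; anything else stays in "message".
  event = begin
    parsed = JSON.parse(body)
    parsed.is_a?(Hash) ? parsed : { "message" => body }
  rescue JSON::ParserError
    { "message" => body }
  end

  event["http_poller_metadata"] = {
    "name" => name,                 # the key used in the urls hash
    "code" => code,                 # HTTP status code
    "response_headers" => headers,  # assumed key name
    "runtime_seconds" => runtime    # how long the request took
  }
  event
end
```

Calling poll_once("some_service", "http://localhost:8000") against a running server returns a Hash with the decoded response fields plus the metadata entry.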
Using HTTP Poller to monitor website status
We’ll start with a simple example: using HTTP Poller to monitor whether a given URL is up, down, or responding slowly. The following Logstash config will hit a web server on http://localhost:8000. If you don’t have one up, you can start one that takes a variable length of time to return a JSON response by running ruby -rsinatra -e 'set :port, 8000; get("/") { n= rand(10) / 10.0; sleep n; "{\"t\": #{n}}" }' in your console (assuming you have Ruby installed and have installed the sinatra gem with gem install sinatra). After that, try running Logstash with the sample config below. The sample config has been heavily annotated to make it easy to read, even for a complete Logstash novice.
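If you would rather not install Sinatra, the sketch below is a rough stand-in for the one-liner above that uses only Ruby's standard library. The function names and the "serve" command-line argument are illustrative choices, and the server handles one request at a time, which should be enough for a 10-second polling interval.

```ruby
require "socket"
require "json"

# JSON body reporting how long the server slept, in the same shape as the one-liner.
def response_body(delay)
  JSON.generate("t" => delay)
end

# Wrap a body in a minimal HTTP/1.1 200 response.
def http_response(body)
  "HTTP/1.1 200 OK\r\n" \
  "Content-Type: application/json\r\n" \
  "Content-Length: #{body.bytesize}\r\n" \
  "Connection: close\r\n\r\n" + body
end

# Accept connections forever, sleeping 0.0 to 0.9 seconds before each reply.
def serve(port = 8000)
  server = TCPServer.new("127.0.0.1", port)
  loop do
    client = server.accept
    # Read and discard the request line and headers up to the blank line.
    while (line = client.gets) && line != "\r\n"
    end
    delay = rand(10) / 10.0
    sleep delay
    client.write(http_response(response_body(delay)))
    client.close
  end
end

# Start the server only when asked to, e.g. `ruby slow_server.rb serve`.
serve(8000) if ARGV.first == "serve"
```

Save it under any file name (slow_server.rb is used here only as an example) and run it with the serve argument before starting Logstash.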
Using the config below, you can generate Kibana charts showing the ratio of slow to fast requests to your service over time. If you have Elastic's Watcher set up, you can also use it to send you alerts automatically when you receive slow requests.
input {
  http_poller {
    urls => {
      "localhost" => "http://localhost:8000"
    }
    automatic_retries => 0
    # Check the site every 10s
    interval => 10
    # Wait no longer than 8 seconds for the request to complete
    request_timeout => 8
    # Store metadata about the request in this field
    metadata_target => http_poller_metadata
    # Tag this request so that we can throttle it in a filter
    tags => website_healthcheck
  }
}

filter {
  # The poller doesn't set an '@host' field because it may or may not have meaning
  # In this case we can set it to the 'name' of the host which will be 'localhost'
  # The name is the key used in the poller's 'urls' config
  if [http_poller_metadata] {
    mutate {
      add_field => {
        "@host" => "%{http_poller_metadata[name]}"
      }
    }
  }

  # Classify slow requests
  if [http_poller_metadata][runtime_seconds] and [http_poller_metadata][runtime_seconds] > 0.5 {
    mutate {
      add_tag => "slow_request"
    }
  }

  # Classify requests that can't connect or have an unexpected response code
  if [http_request_failure] or [http_poller_metadata][code] != 200 {
    # Tag all these events as being bad
    mutate {
      add_tag => "bad_request"
    }
  }

  if "bad_request" in [tags] {
    # Tag all but the first message every 10m as "throttled_poller_alert"
    # We will later drop messages tagged as such.
    throttle {
      key => "%{@host}-RequestFailure"
      period => 600
      before_count => -1
      after_count => 1
      add_tag => "throttled_poller_alert"
    }

    # Drop all throttled events
    if "throttled_poller_alert" in [tags] {
      drop {}
    }

    # The SNS output plugin requires special fields to send its messages
    # This should be fixed soon, but for now we need to set them here
    # For a more robust and flexible solution (tolerant of logstash restarts)
    # Logging to elasticsearch and using the Watcher plugin is advised
    mutate {
      add_field => {
        sns_subject => "%{@host} is not so healthy! %{@tags}"
        sns_message => '%{http_request_failure}'
        codec => json
      }
    }
  }
}

output {
  # Catch unthrottled messages for failed requests
  # If we hit one of these, send the output to an AWS SNS Topic
  # UNCOMMENT THIS TO ENABLE SNS SUPPORT
  #if "bad_request" in [tags] {
  #  sns {
  #    codec => json
  #    access_key_id => "YOURKEY"
  #    secret_access_key => "YOURSECRET"
  #    arn => "arn:aws:sns:us-east-1:773216979769:logstash-test-topic"
  #  }
  #}
  elasticsearch {
    protocol => http
  }
  stdout {
    codec => rubydebug
  }
}
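The conditionals and the throttle settings in the filter section above can be easier to follow as ordinary code. The Ruby sketch below restates them under simplifying assumptions: classify and FirstPerPeriod are hypothetical names, events are plain Hashes, and the throttle model only covers the case used here (before_count => -1, after_count => 1), where the first event per key in each period passes and the rest are suppressed. The real throttle filter has more options and its own bookkeeping.

```ruby
# Return the tags the filter section would add to an event.
def classify(event, slow_threshold: 0.5)
  meta = event["http_poller_metadata"] || {}
  tags = []

  # Slow: a runtime was recorded and it exceeds the threshold.
  runtime = meta["runtime_seconds"]
  tags << "slow_request" if runtime && runtime > slow_threshold

  # Bad: the request failed outright or came back with a non-200 status.
  tags << "bad_request" if event["http_request_failure"] || meta["code"] != 200
  tags
end

# Simplified model of throttling to one event per key per period.
class FirstPerPeriod
  def initialize(period_seconds)
    @period = period_seconds
    @window_start = {} # key => time the current window opened
  end

  # True when the event should be kept, false when it would be tagged and dropped.
  def allow?(key, now)
    start = @window_start[key]
    if start.nil? || now - start >= @period
      @window_start[key] = now
      true
    else
      false
    end
  end
end
```

With a period of 600 seconds, a host that fails every 10 seconds produces one alert per ten minutes instead of sixty.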
Using HTTP Poller to monitor HAProxy stats and Apache server-status pages
Both HAProxy and Apache HTTPD expose stats endpoints that report information such as the number of open connections. In the following example I’ll show how to set up Logstash to record this information in Elasticsearch.
The key takeaway here is that you can use the HTTP Poller to monitor the health of HAProxy and Apache with greater insight than you’d get with logs alone. Additionally, you can use it to trigger alerts via AWS SNS topics when a polling request fails. Those SNS topics can be configured to send texts or emails to alert an operator.
Implementing this requires you to enable the stats port on HAProxy as well as enable mod_status on Apache. To make this easier to try out, I’ve prepared a script that will launch a set of Docker machines with all of this set up. To run it you’ll just need bash, docker, and docker-machine. Try checking out all the code in this directory. After you have the code, run buildit.sh, which will launch the Docker machines and write out a sample logstash.conf file. After you’ve done that, just run logstash -f logstash.conf to see it in action. If you hit the HAProxy server (whose address will be printed out by buildit.sh) with traffic and load the kibana.elasticdump file into .kibana with elasticdump, you should see something like the Kibana dashboard below.
Notice that we can graph such things as HAProxy sessions, the response times of polling requests (which rise as the server becomes more and more saturated), and which HAProxy services are active. None of these can be derived from plain log data, but all of them can be reached via HTTP polling.
If you’d rather not run the examples to see the config used, I’ve reproduced a well-commented version of it below:
input {
  # Setup one poller for httpd, we keep these separate to tag them differently
  http_poller {
    urls => {
      "custom_httpd_t1" => { url => "http://192.168.99.100:8001/server-status?auto" }
      "custom_httpd_t2" => { url => "http://192.168.99.100:8002/server-status?auto" }
      "custom_httpd_t3" => { url => "http://192.168.99.100:8003/server-status?auto" }
    }
    tags => apache_stats
    codec => plain
    metadata_target => http_poller_metadata
    interval => 1
  }

  # Another poller, this time for haproxy
  http_poller {
    urls => {
      ha_proxy_stats => "http://statsguy:statspass@192.168.99.100:1936/;csv"
    }
    tags => haproxy_stats
    codec => plain
    metadata_target => http_poller_metadata
    interval => 1
  }

  # Pull the regular Apache/HAProxy logs via docker commands
  # This is a hack for the purposes of this example
  pipe {
    command => "docker logs -f custom_httpd_t1"
    tags => [ "apache" ]
    add_field => { "@host" => "custom_httpd_t1" }
  }
  pipe {
    command => "docker logs -f custom_httpd_t2"
    tags => [ "apache" ]
    add_field => { "@host" => "custom_httpd_t2" }
  }
  pipe {
    command => "docker logs -f custom_httpd_t3"
    tags => [ "apache" ]
    add_field => { "@host" => "custom_httpd_t3" }
  }
  pipe {
    command => "docker logs -f custom_haproxy"
    tags => [ "haproxy" ]
    add_field => { "@host" => "custom_haproxy" }
  }
}

filter {
  if [http_poller_metadata] {
    # Properly set the '@host' field based on the poller's metadata
    mutate {
      add_field => {
        "@host" => "%{http_poller_metadata[name]}"
      }
    }
  }

  # Process polled apache data
  if "apache_stats" in [tags] {
    # Apache stats uses inconsistent key names. Make sure all fields are camel cased, no spaces
    mutate {
      gsub => ["message", "^Total ", "Total"]
    }

    # Parse the keys/values in the apache stats, they're separated by ": "
    kv {
      source => message
      target => apache_stats
      field_split => "\n"
      value_split => ":\ "
      trim => " "
    }

    # We can make educated guesses that strings with mixes of numbers and dots
    # are numbers, cast them for better behavior in Elasticsearch/Kibana
    ruby {
      code => "h=event['apache_stats']; h.each {|k,v| h[k] = v.to_f if v =~ /\A-?[0-9\.]+\Z/}"
    }
  }

  # Process polled HAProxy data
  if "haproxy_stats" in [tags] {
    split {}

    # We can't read the haproxy csv header, so we define it statically
    # This is because we're working line by line, and so have no header context
    csv {
      target => "haproxy_stats"
      columns => [ pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime ]
    }

    # Drop the haproxy CSV header, which always has this special value
    if [haproxy_stats][pxname] == "# pxname" {
      drop{}
    }

    # We no longer need the message field as the CSV filter has created separate
    # fields for data.
    mutate {
      remove_field => message
    }

    # Same as the cast we did for apache
    ruby {
      code => "h=event['haproxy_stats']; h.each {|k,v| h[k] = v.to_f if v =~ /\A-?[0-9\.]+\Z/}"
    }
  }

  # Process the regular apache logs we captured from the docker pipes
  if "apache" in [tags] {
    grok {
      match => [ "message", "%{COMMONAPACHELOG:apache}" ]
    }
  }

  # We're going to email ourselves on error, but we want to throttle the emails
  # so we don't get so many. This says only send one every 10 minutes
  if "_http_request_failure" in [tags] {
    throttle {
      key => "%{@host}-RequestFailure"
      period => 600
      before_count => -1
      after_count => 1
      add_tag => "_throttled_poller_alert"
    }

    # Drop all throttled events
    if "_throttled_poller_alert" in [tags] {
      drop {}
    }

    # The SNS output plugin requires special fields to send its messages
    # This should be fixed soon, but for now we need to set them here
    mutate {
      add_field => {
        sns_subject => "%{@host} unreachable via HTTP"
        sns_message => "%{http_request_failure}"
      }
    }
  }
}

output {
  # Store everything in the local elasticsearch
  elasticsearch {
    protocol => http
  }

  # Catch unthrottled messages for request failures
  # If we hit one of these, send the output to stdout
  # as well as an AWS SNS Topic
  if "_http_request_failure" in [tags] {
    # UNCOMMENT TO ENABLE SNS
    # sns {
    #   codec => json
    #   access_key_id => "YOURKEY"
    #   secret_access_key => "YOURSECRET"
    #   arn => "arn:aws:sns:us-east-1:773216979769:logstash-test-topic"
    # }
    stdout {
      codec => rubydebug
    }
  }
}
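The parsing steps in the filter section above (the Total key cleanup, the key/value split, the static CSV header, and the numeric cast) can be tried outside Logstash. The following Ruby sketch approximates them on plain strings. The function names are hypothetical, the regular expression is the one from the ruby filters, and the CSV handling is a naive split on commas that ignores quoting, so treat it as an illustration and not a replacement for the kv and csv filters.

```ruby
# Same pattern as the ruby filters: digits and dots with an optional leading minus.
NUMERIC_STRING = /\A-?[0-9\.]+\Z/

# Return a copy of the hash with numeric-looking string values cast to Float.
def cast_numeric_strings(hash)
  hash.each_with_object({}) do |(key, value), out|
    out[key] = (value.is_a?(String) && value =~ NUMERIC_STRING) ? value.to_f : value
  end
end

# Parse the "Key: value" lines of an Apache server-status?auto response.
def parse_apache_status(body)
  fields = body.each_line.each_with_object({}) do |line, out|
    key, value = line.chomp.split(": ", 2)
    next if value.nil? # skip lines that are not "Key: value"
    # "Total Accesses" becomes "TotalAccesses", like the gsub in the config.
    out[key.sub(/\ATotal /, "Total")] = value.strip
  end
  cast_numeric_strings(fields)
end

# Parse HAProxy CSV stats line by line against a static column list,
# dropping the header row that starts with "# pxname".
def parse_haproxy_csv(body, columns)
  body.each_line.each_with_object([]) do |line, rows|
    values = line.chomp.split(",", -1)
    next if values.empty? || values.first == "# pxname"
    rows << cast_numeric_strings(columns.zip(values).to_h)
  end
end
```

One caveat of the pattern is that it also accepts strings such as "1.2.3" or ".", which to_f turns into 1.2 and 0.0, so the cast really is an educated guess, as the config comment says.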
Using the HTTP Client Mixin in Your Own Plugin
HTTP Poller is the first plugin to use logstash-mixin-http_client. If you need to add an HTTP client to a plugin you’re writing, consider using the HttpClient mixin. This mixin adds a set of well-validated configuration options and sane defaults to your plugin for free. Using it is as simple as adding include LogStash::PluginMixins::HttpClient to the body of your plugin. This exposes a new client method in your plugin class, which returns an instance of the Manticore HTTP client. Manticore is a well-written, performant client based on Apache HttpClient. Of note is Manticore’s ability to execute requests asynchronously using thread pools with a simple API.
Wrapping Up
I hope these examples have been useful! If you find any other uses for the HTTP Poller input, let us know! If you think you’ve found a bug in it, please submit an issue.