IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Google_bigquery output plugin

Plugin version: v3.2.4
Released on: 2018-04-06
Changelog

For other versions, see the Versioned plugin docs.

Installation

For plugins not bundled by default, install the plugin by running `bin/logstash-plugin install logstash-output-google_bigquery`. See Working with plugins for more details.

Getting Help

For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.

Description

Author: Rodrigo De Castro <rdc@google.com>
Date: 2013-09-20

Copyright 2013 Google Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Summary: plugin to upload log events to Google BigQuery (BQ), rolling files based on the date pattern provided as a configuration setting. Events are written to files locally and, once a file is closed, this plugin uploads it to the configured BigQuery dataset.

VERY IMPORTANT:

To make good use of BigQuery, your log events should be parsed and structured. Consider using grok to parse your events into fields that can be uploaded to BQ.
You must configure your plugin so that it receives events with the same structure, so that the BigQuery schema suits them. To upload log events with different structures, use multiple configuration blocks and separate the different log events with Logstash conditionals. More details on Logstash conditionals can be found here: http://logstash.net/docs/1.2.1/configuration#conditionals
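As a rough sketch of the multiple-block approach, the configuration below routes two differently structured event streams to separate google_bigquery outputs. It assumes a hypothetical `type` field set earlier in the pipeline, with the illustrative values "apache_access" and "app_metrics". The second schema and both table prefixes are also invented for illustration, and the project, key, and account values are placeholders.

```
output {
  # Assumption: an upstream input or filter sets [type] to one of these values.
  if [type] == "apache_access" {
    google_bigquery {
      project_id => "folkloric-guru-278"
      dataset => "logs"
      csv_schema => "path:STRING,status:INTEGER,score:FLOAT"
      key_path => "/path/to/privatekey.p12"
      service_account => "1234@developer.gserviceaccount.com"
      table_prefix => "apache_access"   # hypothetical prefix
    }
  } else if [type] == "app_metrics" {
    google_bigquery {
      project_id => "folkloric-guru-278"
      dataset => "logs"
      csv_schema => "name:STRING,value:FLOAT"   # hypothetical schema
      key_path => "/path/to/privatekey.p12"
      service_account => "1234@developer.gserviceaccount.com"
      table_prefix => "app_metrics"     # hypothetical prefix
    }
  }
}
```

Giving each block its own table_prefix keeps events with different schemas out of the same table.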

For more info on Google BigQuery, please go to: https://developers.google.com/bigquery/

In order to use this plugin, a Google service account must be used. For more information, please refer to: https://developers.google.com/storage/docs/authentication#service_accounts

Recommendations:

Experiment with the settings depending on how much log data you generate, how "fresh" you need the data to be, and how much data you could lose in the event of a crash. For instance, if you want to see recent data in BQ quickly, you could configure the plugin to upload data every minute or so (provided you have enough log events to justify that). Note also that if uploads are too frequent, there is no guarantee that they will be imported in the same order, so later data may be available before earlier data.
BigQuery charges for storage and for queries, depending on how much data it reads to perform a query. Keep these costs in mind when choosing the date pattern used to create new tables, and when composing queries in BQ. For more info on BigQuery Pricing, please access: https://developers.google.com/bigquery/pricing

USAGE: This is an example of a Logstash config:

output {
   google_bigquery {
     project_id => "folkloric-guru-278"                        # required
     dataset => "logs"                                         # required
     csv_schema => "path:STRING,status:INTEGER,score:FLOAT"    # required unless json_schema is set
     key_path => "/path/to/privatekey.p12"                     # required
     key_password => "notasecret"                              # optional
     service_account => "1234@developer.gserviceaccount.com"   # required
     temp_directory => "/tmp/logstash-bq"                      # optional
     temp_file_prefix => "logstash_bq"                         # optional
     date_pattern => "%Y-%m-%dT%H:00"                          # optional
     flush_interval_secs => 2                                  # optional
     uploader_interval_secs => 60                              # optional
     deleter_interval_secs => 60                               # optional
   }
}

Specify either a csv_schema or a json_schema.
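For comparison, here is a sketch of the same kind of output using json_schema in place of csv_schema. The field list is borrowed from the json_schema option's own example. The project, key, and account values are placeholders, not working credentials.

```
output {
  google_bigquery {
    project_id => "folkloric-guru-278"
    dataset => "logs"
    # json_schema replaces csv_schema; do not set both.
    json_schema => {
      fields => [
        { name => "timestamp" type => "TIMESTAMP" },
        { name => "host" type => "STRING" },
        { name => "message" type => "STRING" }
      ]
    }
    key_path => "/path/to/privatekey.p12"
    service_account => "1234@developer.gserviceaccount.com"
  }
}
```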

Improvements TODO list:

Refactor common code between Google BQ and GCS plugins.
Turn Google API code into a Plugin Mixin (like AwsConfig).
There is no recovery method, so if Logstash or the plugin crashes, files may not be uploaded to BQ.

Google_bigquery Output Configuration Options

This plugin supports the following configuration options plus the Common Options described later.

Setting	Input type	Required
`csv_schema`	string	No
`dataset`	string	Yes
`date_pattern`	string	No
`deleter_interval_secs`	number	No
`flush_interval_secs`	number	No
`ignore_unknown_values`	boolean	No
`json_schema`	hash	No
`key_password`	string	No
`key_path`	string	Yes
`project_id`	string	Yes
`service_account`	string	Yes
`table_prefix`	string	No
`table_separator`	string	No
`temp_directory`	string	No
`temp_file_prefix`	string	No
`uploader_interval_secs`	number	No

Also see Common Options for a list of options supported by all output plugins.

`csv_schema`

Value type is string
Default value is nil

Schema for log data. It must follow this format: <field1-name>:<field1-type>,<field2-name>:<field2-type>,… Example: path:STRING,status:INTEGER,score:FLOAT

`dataset`

This is a required setting.
Value type is string
There is no default value for this setting.

BigQuery dataset to which these events will be added.

`date_pattern`

Value type is string
Default value is "%Y-%m-%dT%H:00"

Time pattern for the BigQuery table. Defaults to hourly tables. Must follow Time.strftime patterns: www.ruby-doc.org/core-2.0/Time.html#method-i-strftime

`deleter_interval_secs`

Value type is number
Default value is 60

Interval, in seconds, at which the deleter checks whether upload jobs are done so that the local files can be deleted. This only affects how long files stay on the hard disk after the job is done.

`flush_interval_secs`

Value type is number
Default value is 2

Flush interval in seconds for flushing writes to log files. 0 will flush on every message.

`ignore_unknown_values`

Value type is boolean
Default value is false

Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.

`json_schema`

Value type is hash
Default value is nil

Schema for log data, as a hash. Example:

json_schema => {
  fields => [
    { name => "timestamp" type => "TIMESTAMP" },
    { name => "host" type => "STRING" },
    { name => "message" type => "STRING" }
  ]
}

`key_password`

Value type is string
Default value is "notasecret"

Private key password for service account private key.

`key_path`

This is a required setting.
Value type is string
There is no default value for this setting.

Path to private key file for Google Service Account.

`project_id`

This is a required setting.
Value type is string
There is no default value for this setting.

Google Cloud Project ID (number, not Project Name!).

`service_account`

This is a required setting.
Value type is string
There is no default value for this setting.

Service account to access Google APIs.

`table_prefix`

Value type is string
Default value is "logstash"

BigQuery table ID prefix to be used when creating new tables for log data. Table name will be <table_prefix><table_separator><date>

`table_separator`

Value type is string
Default value is "_"

BigQuery table separator to be added between the table_prefix and the date suffix.

`temp_directory`

Value type is string
Default value is ""

Directory where temporary files are stored. Defaults to /tmp/logstash-bq-<random-suffix>

`temp_file_prefix`

Value type is string
Default value is "logstash_bq"

Temporary local file prefix. Log file will follow the format: <prefix>_hostname_date.part?.log

`uploader_interval_secs`

Value type is number
Default value is 60

Interval, in seconds, at which the uploader uploads new files to BigQuery. Adjust this value based on your time pattern (for example, for hourly files, this interval can be around one hour).
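The three interval settings can be combined as in the sketch below, which pairs the default hourly date pattern with an uploader interval of about one hour, as suggested above. The values are illustrative assumptions rather than recommendations, and the project, key, and account values are placeholders.

```
output {
  google_bigquery {
    project_id => "folkloric-guru-278"
    dataset => "logs"
    csv_schema => "path:STRING,status:INTEGER,score:FLOAT"
    key_path => "/path/to/privatekey.p12"
    service_account => "1234@developer.gserviceaccount.com"
    date_pattern => "%Y-%m-%dT%H:00"   # hourly tables (the default)
    flush_interval_secs => 0           # flush to the local file on every message
    uploader_interval_secs => 3600     # about one hour, to match hourly files
    deleter_interval_secs => 60        # the default
  }
}
```

A shorter uploader interval makes data visible in BigQuery sooner, but very frequent uploads are not guaranteed to be imported in order.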

Common Options

The following configuration options are supported by all output plugins:

Setting	Input type	Required
`codec`	codec	No
`enable_metric`	boolean	No
`id`	string	No

`codec`

Value type is codec
Default value is "plain"

The codec used for output data. Output codecs are a convenient method for encoding your data before it leaves the output without needing a separate filter in your Logstash pipeline.

`enable_metric`

Value type is boolean
Default value is true

Disable or enable metric logging for this specific plugin instance. By default we record all the metrics we can, but you can disable metrics collection for a specific plugin.

`id`

Value type is string
There is no default value for this setting.

Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, two google_bigquery outputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.

output {
  google_bigquery {
    id => "my_plugin_id"
  }
}
