IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Google Cloud Storage Input Plugin

Plugin version: v0.14.0
Released on: 2023-05-02
Changelog

For other versions, see the Versioned plugin docs.

Installation

If this plugin is not bundled with your Logstash installation, install it by running bin/logstash-plugin install logstash-input-google_cloud_storage. See Working with plugins for more details.

Getting Help

For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in Github. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.

Description

Extracts events from files in a Google Cloud Storage bucket.

Example use cases:

Read Stackdriver logs from a Cloud Storage bucket into Elastic.
Read gzipped logs from cold storage into Elastic.
Restore data from an Elastic dump.
Extract data from Cloud Storage, transform it with Logstash, and load it into BigQuery.

Note: While this project is partially maintained by Google, this is not an official Google product.

Installation Note

Attempting to install this plugin may result in an error:

Bundler::VersionConflict: Bundler could not find compatible versions for gem "mimemagic":
  In Gemfile:
    logstash-input-google_cloud_storage (= 0.11.0) was resolved to 0.11.0, which depends on
      mimemagic (>= 0.3.7)

Could not find gem 'mimemagic (>= 0.3.7)', which is required by gem 'logstash-input-google_cloud_storage (= 0.11.0)', in any of the sources or in gems cached in vendor/cache

If this error occurs, you can fix it by manually installing the "mimemagic" dependency into Logstash’s internal Ruby gems directory, vendor/bundle/jruby/<ruby_version>/gems/. Use the Ruby instance bundled in the bin/ folder of your Logstash installation to do this.

To manually install the "mimemagic" gem into Logstash use:

bin/ruby -S gem install mimemagic -v '>= 0.3.7'

The mimemagic gem also requires the shared-mime-info package. Install it with apt-get install shared-mime-info on Debian/Ubuntu or yum install shared-mime-info on Red Hat/Rocky Linux distributions.

Then install the plugin as usual with:

bin/logstash-plugin install logstash-input-google_cloud_storage

Metadata Attributes

The plugin exposes several metadata attributes about the object being read. You can access these later in the pipeline to augment the data or perform conditional logic.

Key	Type	Description
`[@metadata][gcs][bucket]`	`string`	The name of the bucket the file was read from.
`[@metadata][gcs][name]`	`string`	The name of the object.
`[@metadata][gcs][metadata]`	`object`	A map of metadata on the object.
`[@metadata][gcs][md5]`	`string`	MD5 hash of the data. Encoded using base64.
`[@metadata][gcs][crc32c]`	`string`	CRC32c checksum, as described in RFC 4960. Encoded using base64 in big-endian byte order.
`[@metadata][gcs][generation]`	`long`	The content generation of the object. Used for object versioning.
`[@metadata][gcs][line]`	`long`	The position of the event in the file, starting at 1.
`[@metadata][gcs][line_id]`	`string`	A deterministic, unique ID describing this line. This lets you do idempotent inserts into Elasticsearch.

More information about object metadata can be found in the official documentation.
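Fields under [@metadata] are not included in output events, so you may want to copy the ones you need into regular fields. The following is a minimal sketch rather than a recommended layout: it uses the standard mutate filter to copy the object name and generation into a hypothetical `gcs` field, tags events from objects under an assumed `archive/` prefix, and prints the metadata with the rubydebug codec's `metadata` option for debugging.

```
input {
  google_cloud_storage {
    bucket_id => "my-logs-bucket"
  }
}

filter {
  # Copy selected metadata into regular fields so they survive to the output.
  # The [gcs][...] target names are illustrative, not required by the plugin.
  mutate {
    add_field => {
      "[gcs][object]"     => "%{[@metadata][gcs][name]}"
      "[gcs][generation]" => "%{[@metadata][gcs][generation]}"
    }
  }

  # Conditional logic on the object name; "archive/" is a hypothetical prefix.
  if [@metadata][gcs][name] =~ /^archive\// {
    mutate { add_tag => ["archived"] }
  }
}

output {
  # metadata => true also prints the [@metadata] fields.
  stdout { codec => rubydebug { metadata => true } }
}
```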

Example Configurations

Basic

A basic configuration that reads JSON logs, such as Stackdriver logs, from my-logs-bucket every minute.

input {
  google_cloud_storage {
    interval => 60
    bucket_id => "my-logs-bucket"
    json_key_file => "/home/user/key.json"
    file_matches => ".*json"
    codec => "json_lines"
  }
}
output { stdout { codec => rubydebug } }

Idempotent Inserts into Elasticsearch

If your pipeline might insert the same file multiple times, you can use the line_id metadata key as a deterministic document ID.

The ID has the format gs://<bucket_id>/<object_id>:<line_num>@<generation>. line_num is the position of the event among those deserialized from the file, starting at 1. generation is a unique ID that Cloud Storage assigns to the object; when an object is overwritten, it gets a new generation.

input {
  google_cloud_storage {
    bucket_id => "batch-jobs-output"
  }
}

output {
  elasticsearch {
    document_id => "%{[@metadata][gcs][line_id]}"
  }
}

From Cloud Storage to BigQuery

Extract data from Cloud Storage, transform it with Logstash and load it into BigQuery.

input {
  google_cloud_storage {
    interval => 60
    bucket_id => "batch-jobs-output"
    file_matches => "purchases.*.csv"
    json_key_file => "/home/user/key.json"
    codec => "plain"
  }
}

filter {
  csv {
    columns => ["transaction", "sku", "price"]
    convert => {
      "transaction" => "integer"
      "price" => "float"
    }
  }
}

output {
  google_bigquery {
    project_id => "my-project"
    dataset => "logs"
    csv_schema => "transaction:INTEGER,sku:INTEGER,price:FLOAT"
    json_key_file => "/path/to/key.json"
    error_directory => "/tmp/bigquery-errors"
    ignore_unknown_values => true
  }
}

Additional Resources

Google Cloud Storage Input Configuration Options

This plugin supports the following configuration options plus the Common Options described later.

Setting	Input type	Required
`bucket_id`	string	Yes
`json_key_file`	path	No
`interval`	number	No
`file_matches`	string	No
`file_exclude`	string	No
`metadata_key`	string	No
`processed_db_path`	path	No
`delete`	boolean	No
`unpack_gzip`	boolean	No

Also see Common Options for a list of options supported by all input plugins.

`bucket_id`

Value type is string
There is no default value for this setting.

The bucket containing your log files.

`json_key_file`

Value type is path
There is no default value for this setting.

The path to the JSON key file used to authenticate to the bucket. The service account should have the storage.objects.update permission so it can set metadata on processed objects, which prevents them from being scanned multiple times.

If no key is provided, the plugin tries the default application credentials and, if they don’t exist, falls back to unauthenticated mode.
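A minimal sketch that omits json_key_file and relies on the default application credentials. It assumes those credentials are available to the Logstash process, for example through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable or a service account attached to the VM; the bucket name is illustrative.

```
input {
  google_cloud_storage {
    bucket_id => "my-logs-bucket"
    # No json_key_file: the plugin tries the default application credentials
    # and, if none exist, falls back to unauthenticated mode.
  }
}
```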

`interval`

Value type is number
Default is: 60

The number of seconds between looking for new files in your bucket.

`file_matches`

Value type is string
Default is: .*\.log(\.gz)?

A regex pattern to filter files. Only files whose names match this pattern are considered. The default pattern, .*\.log(\.gz)?, selects log files such as app.log and app.log.gz.

`file_exclude`

Value type is string
Default is: ^$

Any files matching this regex are excluded from processing. No files are excluded by default.
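The two patterns can be combined. The sketch below assumes both regexes are tested against the full object name, including any `/`-separated prefix; it reads .log and .log.gz objects but skips those under a hypothetical `tmp/` prefix.

```
input {
  google_cloud_storage {
    bucket_id    => "my-logs-bucket"
    # Consider only .log and .log.gz objects (same as the default pattern).
    file_matches => ".*\.log(\.gz)?"
    # Skip anything under the hypothetical tmp/ prefix.
    file_exclude => "^tmp/"
  }
}
```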

`metadata_key`

Value type is string
Default is: x-goog-meta-ls-gcs-input

This key is set on objects after the plugin has processed them. That way you can stop the plugin without it reading the same files again when it restarts, or prevent files from being read at all by setting the key manually.

The key is only a flag: if a file was partially processed before Logstash exited, some of its events will be sent again.
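A sketch of a custom key, for example to give each pipeline that reads the same bucket its own flag. This assumes the plugin checks only the key it is configured with; the key name is hypothetical and keeps the x-goog-meta- prefix used by the default. The service account still needs storage.objects.update to set it.

```
input {
  google_cloud_storage {
    bucket_id     => "my-logs-bucket"
    json_key_file => "/home/user/key.json"
    # Hypothetical per-pipeline flag, set on each object once it is processed.
    metadata_key  => "x-goog-meta-ls-gcs-input-pipeline-a"
  }
}
```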

`processed_db_path`

edit

Value type is path
Default is: LOGSTASH_DATA/plugins/inputs/google_cloud_storage/db.

If set, the plugin will store the list of processed files locally. This allows you to create a service account for the plugin that does not have write permissions. However, the data will not be shared across multiple running instances of Logstash.
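A minimal sketch for a service account without write access to the bucket. The key file and database paths are examples only, and the sketch assumes the Logstash user can write to the processed_db_path location.

```
input {
  google_cloud_storage {
    bucket_id         => "my-logs-bucket"
    # Key for a service account that cannot write to the bucket (example path).
    json_key_file     => "/home/user/read-only-key.json"
    # Keep the list of processed files in a local database on this host.
    processed_db_path => "/var/lib/logstash/gcs-processed-db"
  }
}
```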

`delete`

Value type is boolean
Default is: false

If set to true, each file is deleted after its contents have been processed.
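A sketch for draining a hypothetical staging bucket. It assumes the service account also has the storage.objects.delete permission, which Cloud Storage requires for deleting objects. Because the source objects are removed, consider testing such a pipeline on non-critical data first.

```
input {
  google_cloud_storage {
    bucket_id     => "staging-logs"
    json_key_file => "/home/user/key.json"
    # Remove each object from the bucket once its contents have been processed.
    delete        => true
  }
}
```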

`unpack_gzip`

Value type is boolean
Default is: true

If set to true, files ending in .gz are decompressed before the codec parses them. A file that has the .gz suffix but can’t be opened as gzip (for example, because it has a bad magic number) is skipped.
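A sketch for reading gzipped, newline-delimited JSON from a hypothetical cold-storage-logs bucket. With unpack_gzip at true (its default, shown explicitly here), .gz files are decompressed before the json_lines codec sees them. The file_matches pattern is an assumption about how the objects are named.

```
input {
  google_cloud_storage {
    bucket_id    => "cold-storage-logs"
    # Assumed naming scheme: newline-delimited JSON, gzip-compressed.
    file_matches => ".*\.json\.gz"
    # Default is true; decompress .gz objects before decoding.
    unpack_gzip  => true
    codec        => "json_lines"
  }
}
```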

Common Options

The following configuration options are supported by all input plugins:

Setting	Input type	Required
`add_field`	hash	No
`codec`	codec	No
`enable_metric`	boolean	No
`id`	string	No
`tags`	array	No
`type`	string	No

Details

`add_field`

Value type is hash
Default value is {}

Add a field to an event.
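A minimal sketch that adds a static field to every event read by this input; the field name and value are illustrative.

```
input {
  google_cloud_storage {
    bucket_id => "my-logs-bucket"
    # Every event from this input gets [environment] = "production".
    add_field => { "environment" => "production" }
  }
}
```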

`codec`

Value type is codec
Default value is "plain"

The codec used for input data. Input codecs are a convenient method for decoding your data before it enters the input, without needing a separate filter in your Logstash pipeline.
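For example, if the objects contain plain text in an encoding other than UTF-8, you could set the plain codec's charset option. The Latin-1 encoding below is only an assumption about the source data.

```
input {
  google_cloud_storage {
    bucket_id => "my-logs-bucket"
    # Decode input as ISO-8859-1 (Latin-1) instead of the default UTF-8.
    codec     => plain { charset => "ISO-8859-1" }
  }
}
```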

`enable_metric`

Value type is boolean
Default value is true

Disable or enable metric logging for this specific plugin instance. By default, Logstash records all the metrics it can, but you can disable metrics collection for a specific plugin.

`id`

Value type is string
There is no default value for this setting.

Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, if you have two google_cloud_storage inputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.

input {
  google_cloud_storage {
    id => "my_plugin_id"
  }
}

Variable substitution in the id field only supports environment variables and does not support the use of values from the secret store.

`tags`

Value type is array
There is no default value for this setting.

Add any number of arbitrary tags to your event.

This can help with processing later.
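A minimal sketch of tagging events at the input and using the tag in a filter conditional; the tag names and the added ingest_source field are illustrative.

```
input {
  google_cloud_storage {
    bucket_id => "my-logs-bucket"
    tags      => ["gcs", "my-logs-bucket"]
  }
}

filter {
  # Apply this filter only to events carrying the "gcs" tag.
  if "gcs" in [tags] {
    mutate { add_field => { "ingest_source" => "cloud_storage" } }
  }
}
```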

`type`

Value type is string
There is no default value for this setting.

Add a type field to all events handled by this input.

Types are used mainly for filter activation.

The type is stored as part of the event itself, so you can also use the type to search for it in Kibana.

If you try to set a type on an event that already has one (for example when you send an event from a shipper to an indexer) then a new input will not override the existing type. A type set at the shipper stays with that event for its life even when sent to another Logstash server.
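A minimal sketch of filter activation by type, assuming the objects hold JSON documents in the message field; the type value gcs-logs is illustrative.

```
input {
  google_cloud_storage {
    bucket_id => "my-logs-bucket"
    type      => "gcs-logs"
  }
}

filter {
  # Activate this filter only for events whose type is "gcs-logs".
  if [type] == "gcs-logs" {
    json { source => "message" }
  }
}
```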