Add data to Elasticsearch

There are multiple ways to ingest data into Elasticsearch. The option that you choose depends on whether you’re working with timestamped data or non-timestamped data, where the data is coming from, its complexity, and more.

To get started quickly, you can load sample data into your Elasticsearch cluster using Kibana.

General content

General content is data that does not have a timestamp. This could be data like vector embeddings, website content, product catalogs, and more. For general content, you have the following options for adding data to Elasticsearch indices:

  • API: Use the Elasticsearch Document APIs to index documents directly, either from the Dev Tools Console or with cURL.

    If you’re building a website or app, then you can call Elasticsearch APIs using an Elasticsearch client in the programming language of your choice. If you use the Python client, then check out the elasticsearch-labs repo for various example notebooks.

  • File upload: Use the Kibana file uploader to index single files for one-off testing and exploration. The GUI guides you through setting up your index and field mappings.
  • Web crawler: Extract and index web page content into Elasticsearch documents.
  • Connectors: Sync data from various third-party data sources to create searchable, read-only replicas in Elasticsearch.
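
As a rough illustration of the API option above, the following Python sketch builds a request body for the bulk API, which expects newline-delimited JSON: one action line followed by one document line, with a trailing newline. It uses only the standard library and does not contact a cluster. The index name `products`, the sample documents, and the helper name `build_bulk_index_body` are assumptions made for this example.

```python
import json


def build_bulk_index_body(index_name, docs):
    """Build an NDJSON body for the _bulk API.

    Each document becomes two lines: an "index" action line that names
    the target index and document ID, then the document source itself.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical product catalog entries (general content, no timestamp).
products = [
    ("1", {"name": "Trail running shoe", "price": 89.99}),
    ("2", {"name": "Insulated water bottle", "price": 24.5}),
]

body = build_bulk_index_body("products", products)
print(body, end="")
```

You could then send `body` in a POST request to the `/_bulk` endpoint with the `Content-Type: application/x-ndjson` header, for example with cURL. In an application, a language client would typically handle this serialization for you.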

Timestamped data

Timestamped data in Elasticsearch refers to datasets that include a timestamp field. If you use the Elastic Common Schema (ECS), this field is named @timestamp. This could be data like logs, metrics, and traces.

For timestamped data, you have the following options for adding data to Elasticsearch data streams:

  • Elastic Agent and Fleet: The preferred way to index timestamped data. Each Elastic Agent based integration includes default ingestion rules, dashboards, and visualizations to start analyzing your data right away. You can use the Fleet UI in Kibana to centrally manage Elastic Agents and their policies.
  • Beats: If your data source isn’t supported by Elastic Agent, use Beats to collect and ship data to Elasticsearch. You install a separate Beat for each type of data to collect.
  • Logstash: Logstash is an open source data collection engine with real-time pipelining capabilities that supports a wide variety of data sources. Use this option if neither Elastic Agent nor Beats supports your data source, if you need to persist incoming data, or if you need to send the data to multiple destinations.
  • Language clients: Use an Elasticsearch programming language client to ingest data from an application. The client tutorials run Elasticsearch on Elastic Cloud, but the same principles apply to any Elasticsearch deployment.
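
If you ingest from your own application, the following Python sketch shows roughly what a bulk request body for a data stream might look like. Data streams accept only `create` actions in bulk requests, and every document needs an `@timestamp` field. The sketch uses only the standard library and does not contact a cluster. The data stream name `logs-myapp-default`, the sample events, and the helper names are assumptions made for this example, and the sketch assumes that a matching data stream or index template already exists.

```python
import json
from datetime import datetime, timezone


def utc_timestamp(dt):
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    return dt.astimezone(timezone.utc).isoformat()


def build_data_stream_bulk_body(data_stream, events):
    """Build an NDJSON _bulk body that appends events to a data stream.

    Data streams are append-only, so each event uses a "create" action.
    """
    lines = []
    for event in events:
        if "@timestamp" not in event:
            raise ValueError("each event needs an @timestamp field")
        lines.append(json.dumps({"create": {"_index": data_stream}}))
        lines.append(json.dumps(event))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical log events from an application.
events = [
    {
        "@timestamp": utc_timestamp(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)),
        "message": "service started",
    },
    {
        "@timestamp": utc_timestamp(datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)),
        "message": "listening on port 8080",
    },
]

body = build_data_stream_bulk_body("logs-myapp-default", events)
print(body, end="")
```

As with any bulk request, you would send `body` to the `/_bulk` endpoint with the `Content-Type: application/x-ndjson` header. Language clients usually provide bulk helpers that build this format for you.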

If you’re interested in data ingestion pipelines for timestamped data, use the decision tree in the Elastic Cloud docs to understand your options.