Elastic MongoDB connector reference
The Elastic MongoDB connector is a connector for MongoDB data sources.
Availability and prerequisites
This connector is available as a native connector in Elastic versions 8.5.0 and later. To use this connector as a native connector, satisfy all native connector requirements.
This connector is also available as a connector client from the Python connectors framework. To use this connector as a connector client, satisfy all connector client requirements.
This connector has no additional prerequisites beyond the shared requirements, linked above.
Usage
To use this connector as a native connector, use the Connector workflow. See Native connectors.
To use this connector as a connector client, see Connector clients and frameworks.
For additional operations, see Usage.
Example
An example is available for this connector. See MongoDB connector tutorial.
Known issues
Expressions and variables in aggregation pipelines
It’s not possible to use expressions like new Date() inside an aggregation pipeline.
These expressions won’t be evaluated by the underlying MongoDB client, but will be passed as a string to the MongoDB instance.
A possible workaround is to use aggregation variables.
Incorrect (new Date() will be interpreted as a string):
{
  "aggregate": {
    "pipeline": [
      {
        "$match": {
          "expiresAt": { "$gte": "new Date()" }
        }
      }
    ]
  }
}
Correct (usage of $$NOW):
{
  "aggregate": {
    "pipeline": [
      {
        "$addFields": {
          "current_date": { "$toDate": "$$NOW" }
        }
      },
      {
        "$match": {
          "$expr": { "$gte": [ "$expiresAt", "$current_date" ] }
        }
      }
    ]
  }
}
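As a rough illustration, a rule of this shape can also be assembled programmatically before it is pasted into the advanced sync rules editor. The sketch below uses only the Python standard library. The function name build_not_expired_rule is hypothetical, and expiresAt is simply the field name from the example above.

```python
import json


def build_not_expired_rule(date_field: str = "expiresAt") -> dict:
    """Build an advanced sync rule that keeps documents whose date field is now or later.

    The comparison relies on the $$NOW aggregation variable, which the MongoDB
    server evaluates, instead of a client-side expression such as new Date().
    """
    return {
        "aggregate": {
            "pipeline": [
                # Materialize the server-side current time as a regular field.
                {"$addFields": {"current_date": {"$toDate": "$$NOW"}}},
                # Compare the document's date field against that field.
                {"$match": {"$expr": {"$gte": [f"${date_field}", "$current_date"]}}},
            ]
        }
    }


rule = build_not_expired_rule()
print(json.dumps(rule, indent=2))
```

Because the rule is plain JSON, everything in it is sent to MongoDB as data. Only aggregation variables such as $$NOW are resolved on the server.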
See Known issues for any issues affecting all connectors.
Troubleshooting
See Troubleshooting.
Security
See Security.
Compatibility
This connector is compatible with MongoDB Atlas and MongoDB 3.6 and later.
The data source and your Elastic deployment must be able to communicate with each other over a network.
Configuration
Each time you create an index to be managed by this connector, you will create a new connector configuration. You will need some or all of the following information about the data source.
- Host
- The URI of the MongoDB host. Examples:
  - mongodb+srv://my_username:my_password@cluster0.mongodb.net/mydb?w=majority
  - mongodb://127.0.0.1:27017
- Direct connection
- Whether to use the direct connection option for the MongoDB client. Examples:
  - true
  - false
- Username
- The MongoDB username the connector will use. The user must have access to the configured database and collection. You may want to create a dedicated, read-only user for each connector.
- Password
- The MongoDB password the connector will use.
Anonymous authentication is supported for testing purposes only and should not be used in production. Omit the username and password to use the default values.
- Database
- The MongoDB database to sync. The database must be accessible using the configured username and password.
- Collection
- The MongoDB collection to sync. The collection must exist within the configured database. The collection must be accessible using the configured username and password.
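Before entering these values, it can help to sanity-check them. The following is a minimal sketch using only the Python standard library. The dictionary keys (host, direct_connection, username, password, database, collection) and the function name check_connector_config are assumptions made for illustration and are not taken from the connector's own code.

```python
from urllib.parse import urlsplit

# URI schemes used by the two host examples above.
ALLOWED_SCHEMES = {"mongodb", "mongodb+srv"}


def check_connector_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    scheme = urlsplit(config.get("host", "")).scheme
    if scheme not in ALLOWED_SCHEMES:
        problems.append("host must start with mongodb:// or mongodb+srv://")
    if not isinstance(config.get("direct_connection"), bool):
        problems.append("direct_connection must be true or false")
    for key in ("database", "collection"):
        if not config.get(key):
            problems.append(f"{key} is required")
    # Anonymous access means omitting both credentials, not just one of them.
    if bool(config.get("username")) != bool(config.get("password")):
        problems.append("set both username and password, or omit both")
    return problems


example = {
    "host": "mongodb://127.0.0.1:27017",
    "direct_connection": False,
    "database": "mydb",
    "collection": "movies",
}
print(check_connector_config(example))
```

The check only inspects the shape of the values. It does not connect to MongoDB, so it cannot confirm that the user can actually read the database and collection.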
Deployment using Docker
Follow these instructions to deploy the MongoDB connector using Docker.
Step 1: Download sample configuration file
Download the sample configuration file. You can either download it manually or run the following command:
curl https://raw.githubusercontent.com/elastic/connectors-python/main/config.yml --output ~/connectors-python-config/config.yml
Remember to update the --output argument value if your directory name is different, or you want to use a different config file name.
Step 2: Update the configuration file for your self-managed connector
Update the configuration file with the following settings to match your environment:
- elasticsearch.host
- elasticsearch.password
- connector_id
- service_type
Use mongodb as the service_type value. Don’t forget to uncomment "mongodb" in the sources section of the yaml file.
If you’re running the connector service against a Dockerized version of Elasticsearch and Kibana, your config file will look like this:
elasticsearch:
  host: http://host.docker.internal:9200
  username: elastic
  password: <YOUR_PASSWORD>

connector_id: <CONNECTOR_ID_FROM_KIBANA>
service_type: mongodb

sources:
  # UNCOMMENT "mongodb" below to enable the MongoDB connector
  #mongodb: connectors.sources.mongo:MongoDataSource
  #s3: connectors.sources.s3:S3DataSource
  #dir: connectors.sources.directory:DirectoryDataSource
  #mysql: connectors.sources.mysql:MySqlDataSource
  #network_drive: connectors.sources.network_drive:NASDataSource
  #google_cloud_storage: connectors.sources.google_cloud_storage:GoogleCloudStorageDataSource
  #azure_blob_storage: connectors.sources.azure_blob_storage:AzureBlobStorageDataSource
  #postgresql: connectors.sources.postgresql:PostgreSQLDataSource
  #oracle: connectors.sources.oracle:OracleDataSource
  #mssql: connectors.sources.mssql:MSSQLDataSource
Note that the config file you downloaded might contain more entries, so you will need to manually copy/change the settings that apply to you.
Normally you’ll only need to update elasticsearch.host, elasticsearch.password, connector_id and service_type to run the connector service.
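Forgetting to uncomment the "mongodb" entry is easy to do, so a quick check before starting the service may be useful. The sketch below is a naive line scan using only the Python standard library, not a YAML parser, and the helper name mongodb_source_enabled is hypothetical.

```python
def mongodb_source_enabled(config_text: str) -> bool:
    """Return True if the text contains an uncommented "mongodb:" entry."""
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # still commented out, for example "#mongodb: ..."
        if stripped.startswith("mongodb:"):
            return True
    return False


commented = "sources:\n  #mongodb: connectors.sources.mongo:MongoDataSource\n"
enabled = "sources:\n  mongodb: connectors.sources.mongo:MongoDataSource\n"
print(mongodb_source_enabled(commented), mongodb_source_enabled(enabled))
```

To check a real file, read it first, for example with pathlib.Path(path).read_text(), and pass the resulting string to the function.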
Step 3: Run the Docker image
Run the Docker image with the Connector Service using the following command:
docker run \
  -v ~/connectors-python-config:/config \
  --network "elastic" \
  --tty \
  --rm \
  docker.elastic.co/enterprise-search/elastic-connectors:8.8.2.0-SNAPSHOT \
  /app/bin/elastic-ingest \
  -c /config/config.yml
Refer to this guide in the Python framework repository for more details.
Documents and syncs
The following describes the default syncing behavior for this connector. Use sync rules and ingest pipelines to customize syncing for specific indices.
All documents in the configured MongoDB database and collection are extracted and transformed into documents in your Elasticsearch index.
- The connector creates one Elasticsearch document for each MongoDB document in the configured database and collection.
- For each document, the connector transforms each MongoDB field into an Elasticsearch field.
- For each field, Elasticsearch dynamically determines the data type.
This results in Elasticsearch documents that closely match the original MongoDB documents.
The Elasticsearch mapping is created when the first document is created.
Each sync is a "full" sync. For each MongoDB document discovered:
- If it does not exist, the document is created in Elasticsearch.
- If it already exists in Elasticsearch, the Elasticsearch document is replaced and the version is incremented.
- If an existing Elasticsearch document no longer exists in the MongoDB collection, it is deleted from Elasticsearch.
- Embedded documents are stored as an object field in the parent document. This is recursive, because embedded documents can themselves contain embedded documents.
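The create, replace, and delete behavior of a full sync can be sketched with plain dictionaries. This is only a simplified in-memory model, not the connector's implementation: the index dictionary stands in for Elasticsearch, and the function name full_sync and the _version and _source keys are chosen here for illustration.

```python
def full_sync(source_docs: dict, index: dict) -> dict:
    """Apply one full sync to an in-memory index and return operation counts.

    source_docs maps a document id to a MongoDB document (a dict).
    index maps a document id to {"_version": int, "_source": dict}.
    """
    counts = {"created": 0, "replaced": 0, "deleted": 0}
    for doc_id, doc in source_docs.items():
        if doc_id in index:
            # Already indexed: replace the document and increment the version.
            index[doc_id] = {"_version": index[doc_id]["_version"] + 1, "_source": doc}
            counts["replaced"] += 1
        else:
            index[doc_id] = {"_version": 1, "_source": doc}
            counts["created"] += 1
    # Anything left in the index that the source no longer has is deleted.
    for doc_id in list(index):
        if doc_id not in source_docs:
            del index[doc_id]
            counts["deleted"] += 1
    return counts


index = {
    "a": {"_version": 1, "_source": {"title": "old"}},
    "b": {"_version": 3, "_source": {"title": "gone"}},
}
# The embedded "address" document is kept as a nested object field.
source = {"a": {"title": "new"}, "c": {"title": "fresh", "address": {"city": "Oslo"}}}
counts = full_sync(source, index)
print(counts)
```

In this run, document a is replaced, c is created with its embedded address object intact, and b is deleted because it is no longer in the source.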
Sync rules
The following sections describe Sync rules for this connector.
Advanced rules for MongoDB can be used to express either find queries or aggregation pipelines.
They can also be used to tune options available when issuing these queries/pipelines.
find queries
You must create a text index on the MongoDB collection in order to perform text searches.
For find queries, the structure of this JSON DSL should look like:
{
  "find": {
    "filter": {
      // find query goes here
    },
    "options": {
      // query options go here
    }
  }
}
For example:
{
  "find": {
    "filter": {
      "$text": {
        "$search": "garden",
        "$caseSensitive": false
      }
    },
    "skip": 10,
    "limit": 1000
  }
}
find queries also support additional options, for example the projection object:
{
  "find": {
    "filter": {
      "languages": [ "English" ],
      "runtime": { "$gt": 90 }
    },
    "projection": {
      "tomatoes": 1
    }
  }
}
Where the available options are:
- allow_disk_use (true, false): When set to true, the server can write temporary data to disk while executing the find operation. This option is only available on MongoDB server versions 4.4 and newer.
- allow_partial_results (true, false): Allows the query to get partial results if some shards are down.
- batch_size (Integer): The number of documents returned in each batch of results from MongoDB.
- filter (Object): The filter criteria for the query.
- limit (Integer): The max number of docs to return from the query.
- max_time_ms (Integer): The maximum amount of time to allow the query to run, in milliseconds.
- no_cursor_timeout (true, false): The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to prevent that.
- projection (Array, Object): The fields to include or exclude from each doc in the result set. If an array, it should have at least one item.
- return_key (true, false): Return index keys rather than the documents.
- show_record_id (true, false): Return the $recordId for each doc in the result set.
- skip (Integer): The number of docs to skip before returning results.
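A small helper can catch misspelled option names before a rule is saved. The sketch below follows the layout of the two examples above, where options such as skip and limit sit directly next to filter. That placement is an assumption drawn from those examples, and the function name build_find_rule is hypothetical.

```python
import json

# Option names from the list above, excluding "filter", which is passed separately.
FIND_OPTIONS = {
    "allow_disk_use", "allow_partial_results", "batch_size", "limit",
    "max_time_ms", "no_cursor_timeout", "projection", "return_key",
    "show_record_id", "skip",
}


def build_find_rule(filter_doc: dict, **options) -> dict:
    """Build a find rule, rejecting option names that are not in the documented list."""
    unknown = sorted(set(options) - FIND_OPTIONS)
    if unknown:
        raise ValueError(f"unknown find options: {unknown}")
    return {"find": {"filter": filter_doc, **options}}


rule = build_find_rule(
    {"$text": {"$search": "garden", "$caseSensitive": False}},
    skip=10,
    limit=1000,
)
print(json.dumps(rule, indent=2))
```

Passing a camelCase name such as maxTimeMs raises a ValueError here, because the find options in the list above use snake_case names.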
Aggregation pipelines
Similarly, for aggregation pipelines, the structure of the JSON DSL should look like:
{
  "aggregate": {
    "pipeline": [
      // pipeline elements go here
    ],
    "options": {
      // pipeline options go here
    }
  }
}
Where the available options are:
- allowDiskUse (true, false): Set to true if disk usage is allowed during the aggregation.
- batchSize (Integer): The number of documents to return per batch.
- bypassDocumentValidation (true, false): Whether or not to skip document level validation.
- collation (Object): The collation to use.
- comment (String): A user-provided comment to attach to this command.
- hint (String): The index to use for the aggregation.
- let (Object): Mapping of variables to use in the pipeline. See the server documentation for details.
- maxTimeMs (Integer): The maximum amount of time in milliseconds to allow the aggregation to run.
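Note that the aggregation options are spelled in camelCase, while the find options use snake_case, so names are easy to mix up. The following sketch checks an options object against the names listed above. The helper name unknown_aggregate_options is hypothetical, and the check deliberately says nothing about where the options are placed within the rule.

```python
# Option names from the list above.
AGGREGATE_OPTIONS = {
    "allowDiskUse", "batchSize", "bypassDocumentValidation", "collation",
    "comment", "hint", "let", "maxTimeMs",
}


def unknown_aggregate_options(options: dict) -> list:
    """Return the option names that do not appear in the documented list, sorted."""
    return sorted(set(options) - AGGREGATE_OPTIONS)


# "max_time_ms" is the find-style spelling, so it is reported here.
print(unknown_aggregate_options({"allowDiskUse": True, "max_time_ms": 5000}))
```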
Content extraction
See Content extraction.
Framework and source
This connector is included in the Python connectors framework.
View the source code for this connector (branch 8.8, compatible with Elastic 8.8).
Migrating from the Ruby connector framework
As part of the 8.8.0 release, the MongoDB connector was moved from the Ruby connectors framework to the Python connectors framework.
This change introduces minor formatting modifications to data ingested from MongoDB:
- Nested object id field name has changed from "_id" to "id". For example, if you had a field "customer._id", this will now be named "customer.id".
- Date format has changed from YYYY-MM-DD'T'HH:mm:ss.fff'Z' to YYYY-MM-DD'T'HH:mm:ss
If your MongoDB connector stopped working after migrating from 8.7.x to 8.8.x, read the workaround outlined in Known issues. If that does not work, we recommend deleting the search index attached to this connector and re-creating a MongoDB connector from scratch.
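To see what the two formatting changes listed above mean for existing data, the following sketch converts a value and a document from the old shape to the new one. It uses only the Python standard library, the function names are hypothetical, and it only walks nested dictionaries (lists of embedded documents are left untouched), so treat it as an illustration and not as a migration tool.

```python
from datetime import datetime


def to_new_date_format(old_value: str) -> str:
    """Convert YYYY-MM-DD'T'HH:mm:ss.fff'Z' to YYYY-MM-DD'T'HH:mm:ss."""
    parsed = datetime.strptime(old_value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


def rename_nested_ids(doc: dict, top_level: bool = True) -> dict:
    """Rename "_id" to "id" inside nested objects, leaving the top-level "_id" as is."""
    renamed = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            value = rename_nested_ids(value, top_level=False)
        if key == "_id" and not top_level:
            key = "id"
        renamed[key] = value
    return renamed


print(to_new_date_format("2023-05-01T12:34:56.789Z"))
print(rename_nested_ids({"_id": 1, "customer": {"_id": 7, "name": "Ada"}}))
```

Queries, dashboards, or ingest pipelines that reference a field such as customer._id, or that parse the millisecond date format, may need the same adjustments.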