Configuration

Enterprise Search requires little configuration to get started. However, for flexibility, the solution provides many configurable settings.

This document explains how to modify Enterprise Search configuration settings. It also provides a reference for each configuration setting and the configuration settings format.

Configure Enterprise Search

Configure Enterprise Search by setting the values of various configuration settings. All deployments use the same configuration settings format, but access to the settings varies by deployment type.

Refer to the section for your deployment type below.

Self-managed deployments can also set default values for some environment variables read by Enterprise Search.

Elastic Cloud

Configure Enterprise Search on Elastic Cloud using custom user settings.

See Add Enterprise Search user settings in the Elastic Cloud documentation.

Elastic Cloud Enterprise (ECE)

Configure Enterprise Search on Elastic Cloud Enterprise using custom user settings.

See Add Enterprise Search user settings in the Elastic Cloud Enterprise documentation.

Elastic Cloud on Kubernetes (ECK)

Configure Enterprise Search on Elastic Cloud on Kubernetes (ECK) by editing the YAML specification.

See Configuration in the Elastic Cloud on Kubernetes documentation.

Docker

When running with Docker or Docker Compose, configure Enterprise Search using environment variables.

Refer to the following examples:

Tar, deb, and rpm packages

When installed using a tar, deb, or rpm package, configure Enterprise Search using a configuration file.

The location of the configuration file varies by package type:

  • .tar archives: config/enterprise_search.yml
  • .deb and .rpm packages: /usr/share/enterprise_search/config/enterprise_search.yml

Alternatively, set the location of the configuration file with the ENT_SEARCH_CONFIG_PATH environment variable.

Configuration settings format

The Enterprise Search configuration follows the YAML format.

You can nest configuration settings:

elasticsearch:
  host: http://127.0.0.1:9200
  username: elastic
  password: changeme

Or you can flatten configuration settings:

elasticsearch.host: http://127.0.0.1:9200
elasticsearch.username: elastic
elasticsearch.password: changeme
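Both forms describe the same settings. As a minimal sketch of that equivalence, the hypothetical `flatten` helper below turns an already-parsed nested mapping into dotted keys. It is a Python illustration, not Enterprise Search's own loader, and it assumes a YAML parser has already produced the dictionaries.

```python
from typing import Any, Dict


def flatten(settings: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested setting dictionaries into dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested mappings and merge their dotted keys.
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


nested = {"elasticsearch": {"host": "http://127.0.0.1:9200", "username": "elastic", "password": "changeme"}}
flattened = {
    "elasticsearch.host": "http://127.0.0.1:9200",
    "elasticsearch.username": "elastic",
    "elasticsearch.password": "changeme",
}
print(flatten(nested) == flattened)  # True
```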

You can format non-scalar values as block sequences:

secret_management.encryption_keys:
  - O9noPkMWqBTmae3hnvscNZnxXjDEl
  - 3D0LNI0iibBbjXhJGpx0lncGpwy0z

Or you can format them as flow sequences (inline arrays):

secret_management.encryption_keys: ['O9noPkMWqBTmae3hnvscNZnxXjDEl', '3D0LNI0iibBbjXhJGpx0lncGpwy0z']

You can interpolate values from environment variables using the ${VARIABLE_NAME} syntax:

secret_management.encryption_keys: [${KEY_1}, ${KEY_2}]
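A minimal sketch of how such references can be resolved, assuming `${NAME}` is replaced by the variable's value and `${NAME:default}` (the form used in the password example in this document) falls back to `default` when the variable is unset. The `interpolate` function is hypothetical and only illustrates the idea; Enterprise Search's exact rules, for example for empty values, may differ.

```python
import os
import re
from typing import Mapping

# ${NAME} or ${NAME:default}; NAME follows the usual environment variable naming.
_REFERENCE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def interpolate(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each ${NAME} or ${NAME:default} reference in value."""

    def substitute(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"{name} is not set and has no default")

    return _REFERENCE.sub(substitute, value)


env = {"KEY_1": "O9noPkMWqBTmae3hnvscNZnxXjDEl", "KEY_2": "3D0LNI0iibBbjXhJGpx0lncGpwy0z"}
print(interpolate("[${KEY_1}, ${KEY_2}]", env))
print(interpolate("${ELASTICSEARCH_PASSWORD:changeme}", env))  # changeme
```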

Configuration settings reference

The following settings are available to configure Enterprise Search.

Elastic Enterprise Search comes with reasonable defaults. Before adjusting the configuration, make sure you understand what you are trying to accomplish and the consequences.

For passwords, use environment variables so that the values are not written to disk. For example: elasticsearch.password: ${ELASTICSEARCH_PASSWORD:changeme}

Secrets

secret_management.encryption_keys

Encryption keys to protect your application secrets. This field is required.

secret_management.encryption_keys: []
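A minimal sketch for producing random keys with Python's standard `secrets` module. The 256-bit length (64 hexadecimal characters) is an assumption here; check the key requirements for your Enterprise Search version.

```python
import secrets


def generate_encryption_key(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness as hex (32 bytes = 256 bits)."""
    return secrets.token_hex(num_bytes)


keys = [generate_encryption_key() for _ in range(2)]
# Print a flow-sequence line in the same style as the examples in this document.
print("secret_management.encryption_keys: [" + ", ".join(f"'{key}'" for key in keys) + "]")
```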

secret_management.enforce_valid_encryption_keys

Enterprise Search checks the encryption keys for validity at startup and logs a warning if they are not correct. This setting controls whether Enterprise Search starts when incorrect encryption keys are found. When true, Enterprise Search does not start if the encryption keys are not correctly configured. Defaults to false.

secret_management.enforce_valid_encryption_keys: false

Elasticsearch

allow_es_settings_modification

Enterprise Search needs one-time permission to alter Elasticsearch settings. Ensure the Elasticsearch settings are correct, then set allow_es_settings_modification to true. Alternatively, adjust the settings in Elasticsearch’s config/elasticsearch.yml.

allow_es_settings_modification: false

elasticsearch.host

The full URL of the Elasticsearch cluster.

elasticsearch.host: http://127.0.0.1:9200

elasticsearch.username

The username that the Enterprise Search server should use to make changes within Elasticsearch. For example, the Enterprise Search server uses this username to authenticate to Elasticsearch to create indices when needed.

The user must have adequate permission within Elasticsearch.

Alternatively, use a token for the Enterprise Search service account, which can be configured as elasticsearch.service_account_token.

elasticsearch.username: elastic

elasticsearch.password

The password for the username provided in elasticsearch.username.

elasticsearch.password: changeme

elasticsearch.service_account_token

Token for the Enterprise Search service account.

elasticsearch.service_account_token: XXXXXXXXXX

Enterprise Search uses this token to authenticate to Elasticsearch when managing its internal indices. For instructions on generating a service account token for Enterprise Search, see Service Accounts in the Elasticsearch documentation. If both elasticsearch.service_account_token and an Authorization header in elasticsearch.headers are set, elasticsearch.service_account_token takes precedence.

elasticsearch.headers

Custom HTTP headers to add to each request made to Elasticsearch.

elasticsearch.headers: 'X-My-Header: Contents of the header'
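A minimal sketch of the precedence rule described for elasticsearch.service_account_token, combined with parsing a header string in the `Name: value` form shown above. The helpers `parse_header` and `request_headers` are hypothetical. Two assumptions: the header string splits at the first colon, and the token is sent using the `Bearer` scheme that Elasticsearch uses for service account tokens.

```python
from typing import Dict, Optional, Tuple


def parse_header(line: str) -> Tuple[str, str]:
    """Split a 'Name: value' string at the first colon."""
    name, sep, value = line.partition(":")
    if not sep or not name.strip():
        raise ValueError(f"not a 'Name: value' header: {line!r}")
    return name.strip(), value.strip()


def request_headers(custom: Dict[str, str], service_account_token: Optional[str] = None) -> Dict[str, str]:
    """Merge custom headers; a service account token replaces any Authorization header."""
    headers = dict(custom)
    if service_account_token:
        # Header names are case-insensitive, so drop every spelling of Authorization.
        headers = {k: v for k, v in headers.items() if k.lower() != "authorization"}
        headers["Authorization"] = f"Bearer {service_account_token}"
    return headers


name, value = parse_header("X-My-Header: Contents of the header")
print(request_headers({name: value, "authorization": "Basic abc"}, "XXXXXXXXXX"))
```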

elasticsearch.ssl.enabled

Whether to use SSL for communication with Elasticsearch.

elasticsearch.ssl.enabled: false

elasticsearch.ssl.certificate

Path to the client certificate file, used when Elasticsearch validates the client (client-side validation).

elasticsearch.ssl.certificate_authority

Absolute path to the file containing the certificate authorities used to verify the Elasticsearch SSL certificate.

elasticsearch.ssl.certificate_authority: /path/elasticsearch/config/certs/http_ca.crt

elasticsearch.ssl.key

Path to the key file for the client certificate.

elasticsearch.ssl.key_passphrase

Passphrase for the key file set in elasticsearch.ssl.key.

elasticsearch.ssl.verify

Set to true to verify the SSL certificate presented by Elasticsearch, or false to skip verification.

elasticsearch.ssl.verify: true

elasticsearch.startup_retry.enabled

Whether Enterprise Search retries connecting to Elasticsearch if it is unavailable at startup.

elasticsearch.startup_retry.enabled: true

elasticsearch.startup_retry.interval

The number of seconds to wait between Elasticsearch connection retries at startup.

elasticsearch.startup_retry.interval: 5 # seconds

elasticsearch.startup_retry.fail_after

The number of seconds after which Enterprise Search stops retrying the Elasticsearch connection at startup.

elasticsearch.startup_retry.fail_after: 600 # seconds
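The exact retry algorithm is not spelled out here. A minimal sketch of one plausible reading of the three startup_retry settings: when retries are disabled there is a single attempt; otherwise attempts repeat every `interval` seconds until `fail_after` seconds have elapsed. `wait_for_service` is hypothetical, and `sleep` and `clock` are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_service(
    is_available: Callable[[], bool],
    enabled: bool = True,
    interval: float = 5,
    fail_after: float = 600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Return the number of attempts made once is_available() succeeds."""
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        if is_available():
            return attempts
        # Give up if retries are off or the next attempt would pass fail_after.
        if not enabled or clock() - start + interval > fail_after:
            raise TimeoutError(f"service unavailable after {attempts} attempt(s)")
        sleep(interval)


responses = iter([False, False, True])
print(wait_for_service(lambda: next(responses), interval=0.01, fail_after=1))  # 3
```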

Kibana

kibana.host

Define the URL at which Enterprise Search can reach Kibana. Defaults to http://localhost:5601 for testing purposes.

kibana.host: http://localhost:5601

kibana.external_url

Define the exposed URL at which users can reach Kibana. Defaults to the value of kibana.host.

kibana.headers

Custom HTTP headers to add to requests made to Kibana from Enterprise Search.

kibana.headers: 'X-My-Header: Contents of the header'

kibana.startup_retry.enabled

Whether Enterprise Search retries connecting to Kibana if it is unavailable at startup.

kibana.startup_retry.enabled: false

kibana.startup_retry.interval

The number of seconds to wait between Kibana connection retries at startup.

kibana.startup_retry.interval: 5 # seconds

kibana.startup_retry.fail_after

The number of seconds after which Enterprise Search stops retrying the Kibana connection at startup.

kibana.startup_retry.fail_after: 600 # seconds

Hosting and network

ent_search.external_url (supported on Elastic Cloud)

Define the exposed URL at which users will reach Enterprise Search. Defaults to localhost:3002 for testing purposes. Most cases will use one of:

  • An IP: http://255.255.255.255
  • A FQDN: http://example.com
  • A short name defined in /etc/hosts: http://ent-search.search

ent_search.external_url: http://localhost:3002

ent_search.listen_host

The host address the web application listens on. Must be a valid IPv4 or IPv6 address.

ent_search.listen_host: 127.0.0.1

ent_search.listen_port

The port the web application listens on. Must be a valid port number (1-65535).

ent_search.listen_port: 3002
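Both constraints can be checked before starting the server. A minimal sketch using Python's standard `ipaddress` module; `validate_listen_address` is a hypothetical helper. A host name such as `localhost` is rejected because the setting requires an IP address.

```python
import ipaddress


def validate_listen_address(host: str, port: int) -> str:
    """Check listen_host/listen_port against the documented rules; return host:port."""
    address = ipaddress.ip_address(host)  # raises ValueError if not IPv4 or IPv6
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside 1-65535")
    # IPv6 literals are conventionally bracketed when combined with a port.
    return f"[{address}]:{port}" if address.version == 6 else f"{address}:{port}"


print(validate_listen_address("127.0.0.1", 3002))  # 127.0.0.1:3002
```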

Limits

Configurable limits for Enterprise Search.

Overriding the default limits can degrade performance. Also, changing a limit here does not guarantee that Enterprise Search will work as expected, because related Elasticsearch limits can still be exceeded.

Workplace Search
workplace_search.content_source.document_size.limit (supported on Elastic Cloud)

Configure the maximum allowed document size for a content source.

workplace_search.content_source.document_size.limit: 100kb

workplace_search.content_source.total_fields.limit (supported on Elastic Cloud)

Configure how many fields a content source can have.

The Elasticsearch/Lucene setting indices.query.bool.max_clause_count might also need to be adjusted if "Max clause count exceeded" errors start occurring. See Search settings in the Elasticsearch documentation.

workplace_search.content_source.total_fields.limit: 64

workplace_search.content_source.sync.enabled (supported on Elastic Cloud)

Configure whether Workplace Search can run synchronization jobs. If this is set to false, no syncs run. Defaults to true.

workplace_search.content_source.sync.enabled: true

workplace_search.content_source.sync.max_errors (supported on Elastic Cloud)

Configure how many errors to tolerate in a sync job. If the job encounters more total errors than this value, the job will fail. This only applies to errors tied to individual documents.

workplace_search.content_source.sync.max_errors: 1000

workplace_search.content_source.sync.max_consecutive_errors (supported on Elastic Cloud)

Configure how many errors in a row to tolerate in a sync job. If the job encounters more errors in a row than this value, the job will fail. This only applies to errors tied to individual documents.

workplace_search.content_source.sync.max_consecutive_errors: 10

workplace_search.content_source.sync.max_error_ratio (supported on Elastic Cloud)

Configure the ratio of <errored documents> / <total documents> to tolerate in a sync job or in a rolling window (see workplace_search.content_source.sync.error_ratio_window_size). If the job encounters an error ratio greater than this value in a given window, or overall at the end of the job, the job will fail. This only applies to errors tied to individual documents.

workplace_search.content_source.sync.max_error_ratio: 0.15

workplace_search.content_source.sync.error_ratio_window_size (supported on Elastic Cloud)

Configure the size of the rolling window used when calculating an error ratio (see workplace_search.content_source.sync.max_error_ratio).

workplace_search.content_source.sync.error_ratio_window_size: 100
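The four settings above combine into several independent failure conditions. A minimal sketch of one way to model them; `SyncErrorTracker` is hypothetical, and evaluating the windowed ratio only once the window is full is an assumption, not documented behavior.

```python
from collections import deque


class SyncErrorTracker:
    """Hypothetical model of the per-document sync error thresholds."""

    def __init__(self, max_errors=1000, max_consecutive_errors=10,
                 max_error_ratio=0.15, error_ratio_window_size=100):
        self.max_errors = max_errors
        self.max_consecutive_errors = max_consecutive_errors
        self.max_error_ratio = max_error_ratio
        self.window = deque(maxlen=error_ratio_window_size)  # True marks an errored document
        self.total = 0
        self.errors = 0
        self.consecutive = 0

    def record(self, failed: bool) -> None:
        """Record one document outcome; raise RuntimeError if a limit is exceeded."""
        self.total += 1
        self.window.append(failed)
        if failed:
            self.errors += 1
            self.consecutive += 1
        else:
            self.consecutive = 0
        if self.errors > self.max_errors:
            raise RuntimeError("max_errors exceeded")
        if self.consecutive > self.max_consecutive_errors:
            raise RuntimeError("max_consecutive_errors exceeded")
        # Assumption: the windowed ratio is only evaluated once the window is full.
        if len(self.window) == self.window.maxlen:
            if sum(self.window) / len(self.window) > self.max_error_ratio:
                raise RuntimeError("max_error_ratio exceeded in window")

    def finish(self) -> None:
        """Apply the overall error-ratio check at the end of the job."""
        if self.total and self.errors / self.total > self.max_error_ratio:
            raise RuntimeError("max_error_ratio exceeded overall")
```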

workplace_search.content_source.sync.thumbnails.enabled (supported on Elastic Cloud)

Configure whether a content source generates thumbnails for the documents it syncs. Even when this is enabled, not all file types, file sizes, content, or content sources support thumbnail generation.

workplace_search.content_source.sync.thumbnails.enabled: true

workplace_search.content_source.indexing.rules.limit (supported on Elastic Cloud)

Configure how many indexing rules a content source can have.

workplace_search.content_source.indexing.rules.limit: 100

workplace_search.content_source.sync.refresh_interval.full (supported on Elastic Cloud)

Configure the refresh interval for full sync jobs (in ISO 8601 duration format).

workplace_search.content_source.sync.refresh_interval.full: P3D

workplace_search.content_source.sync.refresh_interval.incremental (supported on Elastic Cloud)

Configure the refresh interval for incremental sync jobs (in ISO 8601 duration format).

workplace_search.content_source.sync.refresh_interval.incremental: PT2H

workplace_search.content_source.sync.refresh_interval.delete (supported on Elastic Cloud)

Configure the refresh interval for delete sync jobs (in ISO 8601 duration format).

workplace_search.content_source.sync.refresh_interval.delete: PT6H

workplace_search.content_source.sync.refresh_interval.permissions (supported on Elastic Cloud)

Configure the refresh interval for permissions sync jobs (in ISO 8601 duration format).

workplace_search.content_source.sync.refresh_interval.permissions: PT5M
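The defaults above use ISO 8601 durations: `P3D` is three days, and the `T` separates date parts from time parts, so `PT2H` is two hours and `PT5M` is five minutes. A minimal Python sketch that parses this day/hour/minute/second subset into a `timedelta`; it does not handle years, months, weeks, or fractional values, and `parse_duration` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Day/hour/minute/second subset of ISO 8601 durations, e.g. P3D, PT2H, P1DT12H.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as PT2H into a timedelta."""
    match = _DURATION.fullmatch(text)
    # Reject a bare "P" and a dangling "T", which match the pattern but name no amount.
    if match is None or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    amounts = {unit: int(value) for unit, value in match.groupdict().items() if value}
    return timedelta(**amounts)


for job, value in [("full", "P3D"), ("incremental", "PT2H"), ("delete", "PT6H"), ("permissions", "PT5M")]:
    print(job, parse_duration(value))
```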

workplace_search.content_source.salesforce.enable_cases (supported on Elastic Cloud)

Configure whether or not Salesforce and Salesforce Sandbox connectors should sync Cases.

workplace_search.content_source.salesforce.enable_cases: true

workplace_search.synonyms.sets.limit (supported on Elastic Cloud)

Configure the total number of synonym sets a Workplace Search instance can have.

workplace_search.synonyms.sets.limit: 256

workplace_search.synonyms.terms_per_set.limit (supported on Elastic Cloud)

Configure the total number of terms an individual synonym set can have.

workplace_search.synonyms.terms_per_set.limit: 32

workplace_search.remote_sources.query_timeout (supported on Elastic Cloud)

Configure the query timeout (in milliseconds) for remote sources via the Search API.

workplace_search.remote_sources.query_timeout: 10000

workplace_search.content_source.localhost_base_urls.enabled

Configure whether to allow localhost URLs as base URLs in content sources (by default, they are not allowed).

workplace_search.content_source.localhost_base_urls.enabled: false

workplace_search.content_source.external.unsafe_backend_allowed

Configure whether to allow unsafe HTTP backends for connectors (typically for localhost development). Defaults to false (HTTPS is enforced).

workplace_search.content_source.external.unsafe_backend_allowed: true

App Search
app_search.engine.document_size.limit (supported on Elastic Cloud)

Configure the maximum allowed document size.

app_search.engine.document_size.limit: 100kb

app_search.engine.total_fields.limit (supported on Elastic Cloud)

Configure how many fields an engine can have. The Elasticsearch/Lucene setting indices.query.bool.max_clause_count might also need to be adjusted if "Max clause count exceeded" errors start occurring. See Search settings in the Elasticsearch documentation.

app_search.engine.total_fields.limit: 64

app_search.engine.source_engines_per_meta_engine.limit (supported on Elastic Cloud)

Configure how many source engines a meta engine can have.

app_search.engine.source_engines_per_meta_engine.limit: 15

app_search.engine.total_facet_values_returned.limit (supported on Elastic Cloud)

Configure how many facet values can be returned by a search.

app_search.engine.total_facet_values_returned.limit: 250

app_search.engine.query.limit (supported on Elastic Cloud)

Configure the maximum size of full-text queries. The Elasticsearch/Lucene setting indices.query.bool.max_clause_count might also need to be adjusted if "Max clause count exceeded" errors start occurring. See Search settings in the Elasticsearch documentation.

app_search.engine.query.limit: 128

app_search.engine.synonyms.sets.limit (supported on Elastic Cloud)

Configure the total number of synonym sets an engine can have.

app_search.engine.synonyms.sets.limit: 256

app_search.engine.synonyms.terms_per_set.limit (supported on Elastic Cloud)

Configure the total number of terms a synonym set can have.

app_search.engine.synonyms.terms_per_set.limit: 32

app_search.engine.analytics.total_tags.limit (supported on Elastic Cloud)

Configure how many analytics tags can be associated with a single query or clickthrough.

app_search.engine.analytics.total_tags.limit: 16

Workers

worker.threads

Configure the number of worker threads.

worker.threads: 1

APIs

hide_version_info

Set to true to hide product version information from API responses.

hide_version_info: false

Mailer

email.account.enabled

Set to true to connect Enterprise Search to a mail service. See Configuring a mail service.

email.account.enabled: false

email.account.smtp.auth

The authentication method used with the SMTP server.

email.account.smtp.auth: plain

email.account.smtp.starttls.enable

Whether to upgrade the SMTP connection with STARTTLS.

email.account.smtp.starttls.enable: false

email.account.smtp.host

The host name or IP address of the SMTP server.

email.account.smtp.host: 127.0.0.1

email.account.smtp.port

The port of the SMTP server.

email.account.smtp.port: 25

email.account.smtp.user

The username used to authenticate with the SMTP server.

email.account.smtp.password

The password used to authenticate with the SMTP server.

email.account.email_defaults.from

The default From address for emails sent by Enterprise Search.

Logging

log_directory

The directory where Enterprise Search writes its logs.

log_directory: log

log_level

Log level can be: debug, info, warn, error, fatal, or unknown.

log_level: info

In 7.x versions prior to 7.17.16 and 8.x versions prior to 8.11.2, the Documents API logs the raw content of indexed documents at the info log level. Starting with 7.17.16 (7.x) and 8.11.2 (8.x), it logs the raw content of indexed documents only at the debug log level.

log_format

Log format can be: default or json.

log_format: default

filebeat_log_directory

The directory to which Filebeat logs are exported.

filebeat_log_directory: log

ilm.enabled

This setting is deprecated and ILM can no longer be disabled. The index lifecycle policies that Enterprise Search creates can be managed in Kibana. See the ILM documentation.

ilm.enabled: true

enable_stdout_app_logging

Write application logs to stdout (enabled by default).

enable_stdout_app_logging: true

log_rotation.keep_files

The number of files to keep on disk when rotating logs. When set to 0, no rotation will take place.

log_rotation.keep_files: 7

log_rotation.rotate_every_bytes

The maximum file size in bytes before rotating the log file. If log_rotation.keep_files is set to 0, no rotation will take place and there will be no size limit for the singular log file.

log_rotation.rotate_every_bytes: 1048576 # 1 MiB
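Python's standard `logging.handlers.RotatingFileHandler` follows a similar model and is used here only as an analogy, not as Enterprise Search's implementation: `backupCount` plays the role of log_rotation.keep_files and `maxBytes` the role of log_rotation.rotate_every_bytes, and a `backupCount` of 0 likewise means no rotated files are kept. Whether keep_files counts the active file is not stated here, so the mapping is approximate; `make_rotating_handler` is a hypothetical helper.

```python
import logging
import logging.handlers


def make_rotating_handler(path: str, keep_files: int = 7,
                          rotate_every_bytes: int = 1048576) -> logging.Handler:
    """Rotate path at rotate_every_bytes, keeping keep_files older files."""
    # With backupCount=0 no rotated copies are ever created, so the file keeps growing.
    return logging.handlers.RotatingFileHandler(
        path, maxBytes=rotate_every_bytes, backupCount=keep_files, encoding="utf-8"
    )
```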

connector.crawler.logging.events.enabled

Enable or disable indexing of Elasticsearch Crawler Event logs. These are enabled by default. Disabling these will impact dashboards and analytics.

connector.crawler.logging.events.enabled: true

TLS/SSL

ent_search.ssl.enabled

Set to true to enable TLS/SSL encryption for Enterprise Search.

ent_search.ssl.enabled: false

ent_search.ssl.keystore.path

Path to the keystore that holds the TLS/SSL certificate and private key.

ent_search.ssl.keystore.password

Password for the keystore.

ent_search.ssl.keystore.key_password

Password for the private key stored in the keystore.

ent_search.ssl.redirect_http_from_port

An HTTP port whose requests are redirected to HTTPS.

Session

secret_session_key

Set a session key to persist user sessions through process restarts.

APM Instrumentation

apm.enabled (supported on Elastic Cloud)

Enable Elastic APM agent within Enterprise Search.

apm.enabled: true

apm.server_url (supported on Elastic Cloud)

Set the custom APM Server URL.

apm.server_url: 'http://localhost:8200'

apm.secret_token (supported on Elastic Cloud)

Set the APM authentication token (use if APM Server requires a secret token).

apm.secret_token: 'your-token-here'

apm.service_name

Override the APM service name. Allowed characters: a-z, A-Z, 0-9, -, _ and space.

apm.service_name: 'Enterprise Search'
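A minimal sketch that checks a candidate service name against the allowed characters listed above. `is_valid_service_name` is hypothetical, and rejecting an empty name is an assumption.

```python
import re

# Letters, digits, underscore, space, and hyphen (listed last so it is literal).
_SERVICE_NAME = re.compile(r"[A-Za-z0-9_ -]+")


def is_valid_service_name(name: str) -> bool:
    """True if name is non-empty and uses only the allowed characters."""
    return _SERVICE_NAME.fullmatch(name) is not None


print(is_valid_service_name("Enterprise Search"))  # True
```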

apm.environment

Override the APM service environment.

apm.environment: 'production'

Monitoring

monitoring.reporting_enabled (supported on Elastic Cloud)

Enable automatic reporting of monitoring metrics to Elasticsearch via Metricbeat.

monitoring.reporting_enabled: false

monitoring.reporting_period (supported on Elastic Cloud)

Configure the metrics reporting frequency. Align this setting with the monitoring.ui.min_interval_seconds setting in Kibana; otherwise, Stack Monitoring dashboards for Enterprise Search may show gaps in graphs at high metric resolutions.

monitoring.reporting_period: 10s

monitoring.metricsets (supported on Elastic Cloud)

Configure the metricsets to report to Elasticsearch.

monitoring.metricsets: ['health', 'stats']

monitoring.index_prefix (supported on Elastic Cloud)

Override the index name prefix used to index Enterprise Search metrics. The index will have ILM enabled and will be managed by Enterprise Search.

monitoring.index_prefix: metricbeat-ent-search

Diagnostics report

diagnostic_report_directory

Path where diagnostic reports will be generated.

diagnostic_report_directory: diagnostics

Elastic web crawler

If you are looking for the App Search web crawler configuration documentation, see the App Search web crawler configuration docs. To compare features with the App Search web crawler, see Web crawler.

connector.crawler.http.user_agent (supported on Elastic Cloud)

The User-Agent HTTP header used by the Elastic web crawler.

connector.crawler.http.user_agent: Elastic-Crawler (<crawler_version_number>)

When running Elastic Web Crawler on Elastic Cloud, the default user agent value is Elastic-Crawler Elastic Cloud (https://www.elastic.co/guide/en/cloud/current/ec-get-help.html; <unique identifier>).

connector.crawler.http.user_agent_platform

The user agent platform used for the Elastic web crawler with identifying information. See User-Agent - Syntax in the MDN web docs.

This value will be added as a suffix to connector.crawler.http.user_agent and used as the final User-Agent Header. This value is blank by default.

connector.crawler.workers.pool_size.limit (supported on Elastic Cloud)

The number of parallel crawls allowed per instance of Enterprise Search. By default, it is set to 2x the number of available logical CPU cores. On Intel CPUs with hyper-threading, this default equals 4x the number of physical CPU cores. See Hyper-threading on Wikipedia.

connector.crawler.workers.pool_size.limit: N

You cannot set connector.crawler.workers.pool_size.limit to more than 8x the number of physical CPU cores available for the Enterprise Search instance.
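The default and the upper bound reduce to simple arithmetic. A minimal sketch; the helper names are hypothetical, and note that Python's `os.cpu_count()` reports the machine's logical CPUs, which may exceed the CPUs actually available to a container.

```python
import os
from typing import Optional


def default_pool_size(logical_cores: Optional[int] = None) -> int:
    """Default for connector.crawler.workers.pool_size.limit: 2x the logical CPU cores."""
    cores = logical_cores or os.cpu_count() or 1  # os.cpu_count() can return None
    return 2 * cores


def max_pool_size(physical_cores: int) -> int:
    """Largest accepted value: 8x the physical CPU cores."""
    return 8 * physical_cores


# 4 physical cores with 2 hardware threads each expose 8 logical cores:
# the default is 2 * 8 = 16, which is 4x the physical core count.
print(default_pool_size(8), max_pool_size(4))  # 16 32
```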

Regardless of this setting, only one crawl request can run per engine at a time.

Per-crawl Resource Limits

These limits guard against infinite loops and other traps common to production web crawlers. If your crawler is hitting these limits, try changing your crawl rules or the content you’re crawling. Adjust these limits as a last resort.

connector.crawler.crawl.max_duration.limit (supported on Elastic Cloud)

The maximum duration of a crawl, in seconds. Beyond this limit, the Elastic web crawler will stop, abandoning all remaining URLs in the crawl queue.

connector.crawler.crawl.max_duration.limit: 86400 # seconds

connector.crawler.crawl.max_crawl_depth.limit (supported on Elastic Cloud)

The maximum number of sequential pages the Elastic web crawler will traverse starting from the given set of entry points. Beyond this limit, the web crawler will stop discovering new links.

connector.crawler.crawl.max_crawl_depth.limit: 10

connector.crawler.crawl.max_url_length.limit (supported on Elastic Cloud)

The maximum number of characters within each URL to crawl. The Elastic web crawler will skip URLs that exceed this length.

connector.crawler.crawl.max_url_length.limit: 2048

connector.crawler.crawl.max_url_segments.limit (supported on Elastic Cloud)

The maximum number of segments within the path of each URL to crawl. The Elastic web crawler will skip URLs whose paths exceed this length. Example: The path /a/b/c/d has 4 segments.

connector.crawler.crawl.max_url_segments.limit: 16

connector.crawler.crawl.max_url_params.limit (supported on Elastic Cloud)

The maximum number of query parameters within each URL to crawl. The Elastic web crawler will skip URLs that exceed this limit. Example: The query string in /a?b=c&d=e has 2 query parameters.

connector.crawler.crawl.max_url_params.limit: 32
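A minimal sketch of the three URL checks using Python's `urllib.parse`. The helper names are hypothetical, and ignoring empty path segments (such as the one produced by a trailing slash) is an assumption about how segments are counted.

```python
from urllib.parse import parse_qsl, urlsplit


def count_segments(url: str) -> int:
    """Count non-empty path segments: /a/b/c/d has 4."""
    return len([segment for segment in urlsplit(url).path.split("/") if segment])


def count_params(url: str) -> int:
    """Count query parameters: ?b=c&d=e has 2."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def within_url_limits(url: str, max_length: int = 2048,
                      max_segments: int = 16, max_params: int = 32) -> bool:
    """Return False for URLs the crawler would skip under these limits."""
    return (len(url) <= max_length
            and count_segments(url) <= max_segments
            and count_params(url) <= max_params)


print(count_segments("https://example.com/a/b/c/d"))  # 4
print(count_params("https://example.com/a?b=c&d=e"))  # 2
```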

connector.crawler.crawl.max_unique_url_count.limit (supported on Elastic Cloud)

The maximum number of unique URLs the Elastic web crawler will index during a single crawl. Beyond this limit, the web crawler will stop.

connector.crawler.crawl.max_unique_url_count.limit: 100000
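The depth and unique-URL limits bound a breadth-first traversal of the link graph. A minimal sketch over an in-memory graph; `crawl_order` is hypothetical, and counting the entry points themselves as depth 1 is an assumption.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def crawl_order(links: Dict[str, List[str]], entry_points: Iterable[str],
                max_depth: int = 10, max_unique_urls: int = 100000) -> List[str]:
    """Breadth-first traversal honoring a depth limit and a unique-URL limit."""
    seen: Set[str] = set()
    queue: "deque[Tuple[str, int]]" = deque()
    for url in entry_points:
        if url not in seen:
            seen.add(url)
            queue.append((url, 1))  # assumption: entry points are at depth 1
    visited: List[str] = []
    while queue and len(visited) < max_unique_urls:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: stop discovering links from this page
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


print(crawl_order({"a": ["b", "c"], "b": ["d"]}, ["a"], max_depth=2))  # ['a', 'b', 'c']
```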

Advanced Per-crawl Limits
connector.crawler.crawl.threads.limit (supported on Elastic Cloud)

The number of parallel threads to use for each crawl. The main effect from increasing this value will be an increased throughput of the Elastic web crawler at the expense of higher CPU load on Enterprise Search and Elasticsearch instances as well as higher load on the website being crawled.

connector.crawler.crawl.threads.limit: 10

connector.crawler.crawl.url_queue.url_count.limit (supported on Elastic Cloud)

The maximum size of the crawl frontier (the list of URLs the Elastic web crawler needs to visit). The list is stored in Elasticsearch, so the limit can be increased as long as the Elasticsearch cluster has enough resources (disk space) to hold the queue index.

connector.crawler.crawl.url_queue.url_count.limit: 100000

Per-Request Timeout Limits
connector.crawler.http.connection_timeout (supported on Elastic Cloud)

The maximum time to wait for a connection to be established before the request is aborted.

connector.crawler.http.connection_timeout: 10 # seconds

connector.crawler.http.read_timeout (supported on Elastic Cloud)

The maximum period of inactivity between two data packets, before the request is aborted.

connector.crawler.http.read_timeout: 10 # seconds

connector.crawler.http.request_timeout (supported on Elastic Cloud)

The maximum period of the entire request, before the request is aborted.

connector.crawler.http.request_timeout: 60 # seconds

Per-Request Resource Limits
connector.crawler.http.response_size.limit (supported on Elastic Cloud)

The maximum size of an HTTP response (in bytes) supported by the Elastic web crawler.

connector.crawler.http.response_size.limit: 10485760

connector.crawler.http.redirects.limit (supported on Elastic Cloud)

The maximum number of HTTP redirects to follow before the request fails.

connector.crawler.http.redirects.limit: 10

Content Extraction Resource Limits

connector.crawler.extraction.title_size.limit (supported on Elastic Cloud)

The maximum size (in bytes) of the title extracted from each crawled page.

connector.crawler.extraction.title_size.limit: 1024

connector.crawler.extraction.body_size.limit (supported on Elastic Cloud)

The maximum size (in bytes) of the body content extracted from each crawled page.

connector.crawler.extraction.body_size.limit: 5242880

connector.crawler.extraction.keywords_size.limit (supported on Elastic Cloud)

The maximum size (in bytes) of the meta keywords extracted from each crawled page.

connector.crawler.extraction.keywords_size.limit: 512

connector.crawler.extraction.description_size.limit (supported on Elastic Cloud)

The maximum size (in bytes) of the meta description extracted from each crawled page.

connector.crawler.extraction.description_size.limit: 1024
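Because these limits are in bytes rather than characters, truncating multi-byte text needs care. A minimal sketch, assuming UTF-8 and assuming a character cut in half at the limit is dropped rather than kept; `truncate_utf8` is hypothetical.

```python
def truncate_utf8(text: str, limit_bytes: int) -> str:
    """Shorten text to at most limit_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit_bytes:
        return text
    # errors="ignore" drops a multi-byte character that the byte cut split in half.
    return encoded[:limit_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8("héllo", 2))  # h  ('é' takes two bytes and does not fit)
```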

connector.crawler.extraction.extracted_links_count.limit (supported on Elastic Cloud)

The maximum number of links extracted from each page for further crawling.

connector.crawler.extraction.extracted_links_count.limit: 1000

connector.crawler.extraction.indexed_links_count.limit (supported on Elastic Cloud)

The maximum number of links extracted from each page and indexed in a document.

connector.crawler.extraction.indexed_links_count.limit: 25

connector.crawler.extraction.headings_count.limit (supported on Elastic Cloud)

The maximum number of HTML headings to extract from each page.

connector.crawler.extraction.headings_count.limit: 25

connector.crawler.extraction.default_deduplication_fields (supported on Elastic Cloud)

Default document fields used to compare documents during de-duplication.

connector.crawler.extraction.default_deduplication_fields: ['title', 'body_content', 'meta_keywords', 'meta_description', 'links', 'headings']
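One way to picture field-based de-duplication is to hash only the configured fields, so pages that agree on all of them produce the same digest. A minimal sketch; `content_hash` is hypothetical, and the crawler's actual comparison method is not described here.

```python
import hashlib
import json
from typing import Any, Dict, Sequence

DEFAULT_FIELDS = ["title", "body_content", "meta_keywords", "meta_description", "links", "headings"]


def content_hash(document: Dict[str, Any], fields: Sequence[str] = DEFAULT_FIELDS) -> str:
    """Digest of the selected fields only; other fields (such as the URL) are ignored."""
    selected = {field: document.get(field) for field in fields}
    # sort_keys gives a stable serialization regardless of dictionary order.
    payload = json.dumps(selected, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


page_a = {"url": "https://example.com/a", "title": "Home", "body_content": "Welcome"}
page_b = {"url": "https://example.com/b", "title": "Home", "body_content": "Welcome"}
print(content_hash(page_a) == content_hash(page_b))  # True: duplicates by content
```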

Elastic web crawler HTTP Security Controls
connector.crawler.security.ssl.certificate_authorities (supported on Elastic Cloud)

A list of custom SSL Certificate Authority certificates to be used for all connections made by the Elastic web crawler to your websites. These certificates are added to the standard list of CA certificates trusted by the JVM. Each item in this list can be either the file name of a certificate in PEM format or a PEM-formatted certificate as a string.

connector.crawler.security.ssl.certificate_authorities: []

connector.crawler.security.ssl.verification_mode (supported on Elastic Cloud)

Control SSL verification mode used by the Elastic web crawler:

  • full - validate both the SSL certificate and the hostname presented by the server (this is the default and the recommended value)
  • certificate - only validate the SSL certificate presented by the server
  • none - disable SSL validation completely (this is very dangerous and should never be used in production deployments).

    connector.crawler.security.ssl.verification_mode: full
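The crawler runs on the JVM, so this is only an analogy: the three modes correspond closely to settings of Python's standard `ssl.SSLContext`. A minimal sketch; `context_for_mode` is hypothetical.

```python
import ssl


def context_for_mode(mode: str = "full") -> ssl.SSLContext:
    """Map a verification mode name onto a Python SSL context (analogy only)."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    if mode == "full":
        return context
    if mode == "certificate":
        context.check_hostname = False  # the chain is still verified (CERT_REQUIRED)
        return context
    if mode == "none":
        context.check_hostname = False  # must be turned off before CERT_NONE is allowed
        context.verify_mode = ssl.CERT_NONE
        return context
    raise ValueError(f"unknown verification mode: {mode!r}")


print(context_for_mode("certificate").check_hostname)  # False
```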

connector.crawler.security.auth.allow_http (supported on Elastic Cloud)

Allow/Disallow authenticated crawling of non-HTTPS URLs:

  • false - Do not allow crawling non-HTTPS URLs (this is the default and the recommended value)
  • true - Allow crawling non-HTTPS URLs

Enabling this setting could expose your Authorization headers to a man-in-the-middle attack and should never be used in production deployments. See https://en.wikipedia.org/wiki/Man-in-the-middle_attack for more details.

Elastic web crawler DNS Security Controls
The settings in this section could make your deployment vulnerable to SSRF attacks (especially in cloud environments) from the owners of any domains you crawl. Do not enable any of these settings unless you fully control the DNS domains you access with the Elastic web crawler. See Server Side Request Forgery on OWASP for more details on the SSRF attack and the risks associated with it.

connector.crawler.security.dns.allow_loopback_access

Allow the Elastic web crawler to access localhost (the 127.0.0.0/8 IP range).

connector.crawler.security.dns.allow_loopback_access: false

connector.crawler.security.dns.allow_private_networks_access

Allow the Elastic web crawler to access the private IP space: link-local, network-local addresses, etc. See Reserved IP addresses - IPv4 on Wikipedia for more details.

connector.crawler.security.dns.allow_private_networks_access: false
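A minimal sketch of the address checks these two settings imply, using Python's `ipaddress` module on an already-resolved IP address. `is_address_allowed` is hypothetical; a real crawler would need to apply such a check to every address a host name resolves to.

```python
import ipaddress


def is_address_allowed(address: str, allow_loopback_access: bool = False,
                       allow_private_networks_access: bool = False) -> bool:
    """Decide whether a resolved IP address may be crawled."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:  # 127.0.0.0/8 and ::1
        return allow_loopback_access
    # Link-local covers 169.254.0.0/16, where many cloud metadata endpoints live.
    if ip.is_private or ip.is_link_local:
        return allow_private_networks_access
    return True


print(is_address_allowed("169.254.169.254"))  # False
```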

Elastic web crawler HTTP proxy settings

If you need the Elastic web crawler to send HTTP requests through an HTTP proxy, use the following settings to provide the proxy information to Enterprise Search.

Your proxy connections are subject to the DNS security controls described in Elastic web crawler DNS Security Controls. If your proxy server is running on a private address or a loopback address, you will need to explicitly allow the Elastic web crawler to connect to it.

connector.crawler.http.proxy.host (supported on Elastic Cloud)

The host of the proxy.

connector.crawler.http.proxy.host: example.com

connector.crawler.http.proxy.port (supported on Elastic Cloud)

The port of the proxy.

connector.crawler.http.proxy.port: 8080

connector.crawler.http.proxy.protocol (supported on Elastic Cloud)

The protocol to be used when connecting to the proxy: http (default) or https.

connector.crawler.http.proxy.protocol: http

connector.crawler.http.proxy.username (supported on Elastic Cloud)

The username portion of the Basic HTTP credentials to be used when connecting to the proxy.

connector.crawler.http.proxy.username: kimchy

connector.crawler.http.proxy.password (supported on Elastic Cloud)

The password portion of the Basic HTTP credentials to be used when connecting to the proxy.

connector.crawler.http.proxy.password: A3renEWhGVxgYFIqfPAV73ncUtPN1b
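These settings correspond to a conventional proxy address plus Basic credentials, which HTTP clients send in a `Proxy-Authorization` header using the RFC 7617 encoding. A minimal sketch of that conventional form; the helper names are hypothetical, and this is not a description of how Enterprise Search assembles its requests.

```python
import base64


def proxy_address(host: str, port: int, protocol: str = "http") -> str:
    """Build the proxy URL from the host, port, and protocol settings."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be http or https")
    return f"{protocol}://{host}:{port}"


def proxy_authorization(username: str, password: str) -> str:
    """Basic credentials: base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(proxy_address("example.com", 8080))  # http://example.com:8080
```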

Advanced Elastic web crawler tuning
connector.crawler.http.compression.enabled

Enable/disable HTTP content (gzip/deflate) compression in Elastic web crawler requests.

connector.crawler.http.compression.enabled: true

connector.crawler.http.default_encoding

Default encoding used for responses that do not specify a charset.

connector.crawler.http.default_encoding: UTF-8

connector.crawler.http.head_requests.enabled

Enable/disable performing HEAD requests before GET requests when crawling websites. Enabling HEAD requests lets the crawler decide whether to download a page based on its Content-Type header. This can speed up crawls of websites that contain many unindexable binary files. This setting is false by default.

connector.crawler.http.head_requests.enabled: true

Read-only mode

skip_read_only_check

If true, pending migrations can be executed without enabling read-only mode. Proceeding with migrations while indices allow writes can have unintended consequences. Use at your own risk: do not set this to true when upgrading a production instance with ongoing traffic.

skip_read_only_check: false

Environment variables reference

Self-managed deployments can set default values for the following environment variables read by Enterprise Search.

Set these values within config/env.sh.

JAVA_OPTS

Java options for JVM tuning (used for app-server and CLI commands).

export JAVA_OPTS=${JAVA_OPTS:-"-Xms2g -Xmx2g"}

APP_SERVER_JAVA_OPTS

Additional Java options for the application server.

export APP_SERVER_JAVA_OPTS="${APP_SERVER_JAVA_OPTS:-}"

JAVA_GC_LOGGING

Enable Java GC logging (see below for the default configuration).

export JAVA_GC_LOGGING=true

JAVA_GC_LOG_DIR

The directory where GC log files are written.

export JAVA_GC_LOG_DIR=log

JAVA_GC_LOG_KEEP_FILES

How many of the most recent files to keep.

export JAVA_GC_LOG_KEEP_FILES=10

JAVA_GC_LOG_MAX_FILE_SIZE

The maximum size a GC log file can reach before it is rotated.

export JAVA_GC_LOG_MAX_FILE_SIZE=10m

Additional configuration tasks

Refer to the following for further documentation on specific configuration tasks: