IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
Configuration
editConfiguration
editUse the following files to configure App Search, Workplace Search, and shared concerns within Enterprise Search:
-
config/enterprise-search.yml
-
Defines the configuration for the deployment.
-
config/env.sh
-
Defines default values for environment variables used by Enterprise Search.
Refer to the following docs for specific configuration tasks:
Enterprise Search configuration settings
editSet the configuration for Enterprise Search within config/enterprise-search.yml
:
## ================= Elastic Enterprise Search Configuration ================== # # NOTE: Elastic Enterprise Search comes with reasonable defaults. # Before adjusting the configuration, make sure you understand what you # are trying to accomplish and the consequences. # # NOTE: For passwords, the use of environment variables is encouraged # to keep values from being written to disk, e.g. # elasticsearch.password: ${ELASTICSEARCH_PASSWORD:changeme} # # ---------------------------------- Secrets ---------------------------------- # # Encryption keys to protect your application secrets. This field is required. # #secret_management.encryption_keys: [] # # ------------------------------- Elasticsearch ------------------------------- # # Enterprise Search needs one-time permission to alter Elasticsearch settings. # Ensure the Elasticsearch settings are correct, then set the following to # true. Or, adjust Elasticsearch's config/elasticsearch.yml instead. # See README.md for more details. # #allow_es_settings_modification: false # # Elasticsearch full cluster URL: # #elasticsearch.host: http://127.0.0.1:9200 # # Elasticsearch credentials: # #elasticsearch.username: elastic #elasticsearch.password: changeme # # Elasticsearch custom HTTP headers to add to each request: # #elasticsearch.headers: # X-My-Header: Contents of the header # # Elasticsearch SSL settings: # #elasticsearch.ssl.enabled: false #elasticsearch.ssl.certificate: #elasticsearch.ssl.certificate_authority: #elasticsearch.ssl.key: #elasticsearch.ssl.key_passphrase: #elasticsearch.ssl.verify: true # # Elasticsearch startup retry: # #elasticsearch.startup_retry.enabled: true #elasticsearch.startup_retry.interval: 5 # seconds #elasticsearch.startup_retry.fail_after: 600 # seconds # # ---------------------------------- Kibana ----------------------------------- # # The primary URL at which users interact with Kibana. This is used when # Enterprise Search links users to Kibana. # #kibana.external_url: # # Kibana startup retry: # #kibana.startup_retry.enabled: false #kibana.startup_retry.interval: 5 # seconds #kibana.startup_retry.fail_after: 600 # seconds # # ------------------------------- Hosting & Network --------------------------- # # Define the exposed URL at which users will reach Enterprise Search. # Defaults to localhost:3002 for testing purposes. # Most cases will use one of: # # * An IP: http://255.255.255.255 # * A FQDN: http://example.com # * Shortname defined via /etc/hosts: http://ent-search.search # #ent_search.external_url: http://localhost:3002 # # Web application listen_host and listen_port. # Your application will run on this host and port. # # * ent_search.listen_host: Must be a valid IPv4 or IPv6 address. # * ent_search.listen_port: Must be a valid port number (1-65535). # #ent_search.listen_host: 127.0.0.1 #ent_search.listen_port: 3002 # # ------------------------------ Authentication ------------------------------- # # Authentication settings are used for the standalone Enterprise Search interface. # More info: https://www.elastic.co/guide/en/enterprise-search/7.14/user-interfaces.html # # Auth name associated with the options being set up. If realm chains are # configured in elasticsearch.yml for the associated Elasticsearch instance, # then the names of the realms should also be used here. Multiple auth # providers may be configured. Each must have a unique name. # #ent_search.auth.<auth_name> # # The origin of authenticated Enterprise Search users. # Options are elasticsearch-native and elasticsearch-saml. # # Docs: https://www.elastic.co/guide/en/enterprise-search/7.14/users-access.html # # * elasticsearch-native: Users are managed via the Elasticsearch native realm. # * elasticsearch-saml: Users are managed via the Elasticsearch SAML realm. # #ent_search.auth.<auth_name>.source: # # Auth providers are consulted in ascending order (that is to say, the realm # with the lowest order value is consulted first). You should make sure each # configured realm has a distinct order setting. # #ent_search.auth.<auth_name>.order: # # The name to be displayed on the login screen associated with this provider. # #ent_search.auth.<auth_name>.description: # # The URL to an icon to be displayed on the login screen associated with this # provider. # #ent_search.auth.<auth_name>.icon: # # Boolean value to determine whether or not to display this login option on the # login screen. It is common to hide an option if you would like to create role # mappings before allowing the option to be used as a valid login mechanism. # #ent_search.auth.<auth_name>.hidden: false # # Adds a message to the login screen. Useful for displaying information about # maintenance windows, links to corporate sign up pages, etc. This field # supports Markdown. # #ent_search.login_assistance_message: # # ---------------------------------- Limits ----------------------------------- # # Configurable limits for Enterprise Search. # NOTE: Overriding the default limits can impact performance negatively. # Also, changing a limit here does not actually guarantee that # Enterprise Search will work as expected as related Elasticsearch limits # can be exceeded. # #### Workplace Search # # Configure the maximum allowed document size for a content source. # #workplace_search.content_source.document_size.limit: 100kb # # Configure how many fields a content source can have. # NOTE: The Elasticsearch/Lucene setting `indices.query.bool.max_clause_count` # might also need to be adjusted if "Max clause count exceeded" errors start # occurring. See more here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-settings.html # #workplace_search.content_source.total_fields.limit: 64 # # Configure how many errors to tolerate in a sync job. # If the job encounters more total errors than this value, the job will fail. # NOTE: this only applies to errors tied to individual documents. # #workplace_search.content_source.sync.max_errors: 1000 # # Configure how many errors in a row to tolerate in a sync job. # If the job encounters more errors in a row than this value, the job will fail. # NOTE: this only applies to errors tied to individual documents. # #workplace_search.content_source.sync.max_consecutive_errors: 10 # # Configure the ratio of <errored documents> / <total documents> to tolerate in a sync job # or in a rolling window (see `workplace_search.content_source.sync.error_ratio_window_size`). # If the job encounters an error ratio greater than this value in a given window, or overall # at the end of the job, the job will fail. # NOTE: this only applies to errors tied to individual documents. # #workplace_search.content_source.sync.max_error_ratio: 0.15 # # Configure how large of a window to consider when calculating an error ratio # (see `workplace_search.content_source.sync.max_error_ratio`). # #workplace_search.content_source.sync.error_ratio_window_size: 100 # # Configure whether or not a content source should generate thumbnails for the documents # it syncs. Not all file types/sizes/content or Content Sources support thumbnail generation, # even if this is enabled. # #workplace_search.content_source.sync.thumbnails.enabled: true # # Configure how many indexing rules a content source can have. # #workplace_search.content_source.indexing.rules.limit: 100 # #### App Search # # Configure the maximum allowed document size. # #app_search.engine.document_size.limit: 100kb # # Configure how many fields an engine can have. # NOTE: The Elasticsearch/Lucene setting `indices.query.bool.max_clause_count` # might also need to be adjusted if "Max clause count exceeded" errors start # occurring. See more here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-settings.html # #app_search.engine.total_fields.limit: 64 # # Configure how many source engines a meta engine can have. # #app_search.engine.source_engines_per_meta_engine.limit: 15 # # Configure how many facet values can be returned by a search. # #app_search.engine.total_facet_values_returned.limit: 250 # # Configure how big full-text queries are allowed. # NOTE: The Elasticsearch/Lucene setting `indices.query.bool.max_clause_count` # might also need to be adjusted if "Max clause count exceeded" errors start # occurring. See more here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-settings.html # #app_search.engine.query.limit: 128 # # Configure total number of synonym sets an engine can have. # #app_search.engine.synonyms.sets.limit: 256 # # Configure total number of terms a synonym set can have. # #app_search.engine.synonyms.terms_per_set.limit: 32 # # Configure how many analytics tags can be associated with a single query or clickthrough. # #app_search.engine.analytics.total_tags.limit: 16 # # ---------------------------------- Workers ---------------------------------- # # Configure the number of worker threads. # #worker.threads: 1 # # ----------------------------------- APIs ------------------------------------ # # Set to true hide product version information from API responses. # #hide_version_info: false # # ---------------------------------- Mailer ----------------------------------- # # Connect Enterprise Search to a mailer. # Docs: https://www.elastic.co/guide/en/workplace-search/current/workplace-search-smtp-mailer.html # #email.account.enabled: false #email.account.smtp.auth: plain #email.account.smtp.starttls.enable: false #email.account.smtp.host: 127.0.0.1 #email.account.smtp.port: 25 #email.account.smtp.user: #email.account.smtp.password: #email.account.email_defaults.from: # # ---------------------------------- Logging ---------------------------------- # # Choose your log export path. # #log_directory: log # # Log level can be: debug, info, warn, error, fatal, or unknown. # #log_level: info # # Log format can be: default, json # #log_format: default # # Choose your Filebeat logs export path. # #filebeat_log_directory: log # # Use Index Lifecycle Management (ILM) to manage analytics and API logs # retention. # # auto: Use ILM when supported by the underlying Elasticsearch cluster # true: Use ILM (requires ILM support in the underlying Elasticsearch cluster) # false: Don't use ILM (analytics and API logs will grow unconstrained) # # Docs: https://www.elastic.co/guide/en/app-search/current/logs.html # #ilm.enabled: auto # # Enable logging app logs to stdout (enabled by default). # #enable_stdout_app_logging: true # # The number of files to keep on disk when rotating logs. When set to 0, no # rotation will take place. # #log_rotation.keep_files: 7 # # The maximum file size in bytes before rotating the log file. If # log_rotation.keep_files is set to 0, no rotation will take place and there # will be no size limit for the singular log file. # #log_rotation.rotate_every_bytes: 1048576 # 1 MiB # # ---------------------------------- TLS/SSL ---------------------------------- # # Configure TLS/SSL encryption. # #ent_search.ssl.enabled: false #ent_search.ssl.keystore.path: #ent_search.ssl.keystore.password: #ent_search.ssl.keystore.key_password: #ent_search.ssl.redirect_http_from_port: # # ---------------------------------- Session ---------------------------------- # # Set a session key to persist user sessions through process restarts. # #secret_session_key: # # --------------------------------- Telemetry --------------------------------- # # Reporting your basic feature usage statistics helps us improve your user # experience. Your data is never shared with anyone. # # Set to false to disable telemetry capabilities entirely. You can alternatively # opt out through the Settings page. # #telemetry.enabled: true # # If false, collection of telemetry data is disabled; however, it can be # enabled via the Settings page if telemetry.allow_changing_opt_in_status is # true. # #telemetry.opt_in: true # # If true, users are able to change the telemetry setting at a later time # through the Settings page. If false, the value of telemetry.opt_in determines # whether to send telemetry data or not. # #telemetry.allow_changing_opt_in_status: true # # ----------------------------- Diagnostics report ---------------------------- # # Path where diagnostic reports will be generated. # #diagnostic_report_directory: diagnostics # # ------------------------------ Crawler Preview ------------------------------ # # The User-Agent HTTP Header used for the Crawler. # #crawler.http.user_agent: Elastic Crawler (<crawler_version_number>) # # The user agent platform used for the Crawler with identifying information: # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent#syntax # Note: this value will be added as a suffix to `crawler.http.user_agent` and # used as the final User-Agent Header. # #crawler.http.user_agent_platform: # # The number of parallel crawls allowed per instance of Enterprise Search. # By default, it is set to 2x the number of available CPU cores. # #crawler.workers.pool_size.limit: N # # ------------------------- # Per-crawl Resource Limits # ------------------------- # # These limits guard against infinite loops and other traps common to # production web crawlers. # # If your crawler is hitting these limits, try changing your crawl rules # or the content you're crawling. Adjust these limits as a last resort. # # # The maximum duration of a crawl, in seconds. Beyond this limit, the # crawler will stop, abandoning all remaining URLs in the crawl queue. # #crawler.crawl.max_duration.limit: 86400 # seconds # # The maximum number of sequential pages the crawler will traverse starting # from the given set of entry points. Beyond this limit, the crawler will stop # discovering new links. # #crawler.crawl.max_crawl_depth.limit: 10 # # The maximum number of characters within each URL to crawl. The crawler # will skip URLs that exceed this length. # #crawler.crawl.max_url_length.limit: 2048 # # The maximum number of segments within the path of each URL to crawl. The # crawler will skip URLs whose paths exceed this length. # Example: The path '/a/b/c/d' has 4 segments. # #crawler.crawl.max_url_segments.limit: 16 # # The maximum number of query parameters within each URL to crawl. The # crawler will skip URLs that exceed this length. # Example: The query string in '/a?b=c&d=e' has 2 query parameters. # #crawler.crawl.max_url_params.limit: 32 # # The maximum number of unique URLs the crawler will index during a single crawl. # Beyond this limit, the crawler will stop. # #crawler.crawl.max_unique_url_count.limit: 100000 # # ------------------------- # Advanced Per-crawl Limits # ------------------------- # # The number of parallel threads to use for each crawl. # The main effect from increasing this value will be an increased throughput # of the crawler at the expense of higher CPU load on Enterprise Search and # Elasticsearch instances as well as higher load on the website being crawled. # #crawler.crawl.threads.limit: 10 # # The maximum size of the crawl frontier - the list of URLs the crawler needs to visit. # The list is stored in Elasticsearch, so the limit could be increased as long # as the Elasticsearch cluster has enough resources (disk space) to hold the queue index. # #crawler.crawl.url_queue.url_count.limit: 100000 # # --------------------------- # Per-Request Timeout Limits # --------------------------- # # The maximum period to wait until abortion of the request, when a connection is being initiated. # #crawler.http.connection_timeout: 10 # seconds # # The maximum period of inactivity between two data packets, before the request is aborted. # #crawler.http.read_timeout: 10 # seconds # # The maximum period of the entire request, before the request is aborted. # #crawler.http.request_timeout: 60 # seconds # # --------------------------- # Per-Request Resource Limits # --------------------------- # # The maximum size of an HTTP response (in bytes) supported by the crawler. # #crawler.http.response_size.limit: 10485760 # # The maximum number of HTTP redirects before a request is failed. # #crawler.http.redirects.limit: 10 # # ---------------------------------- # Content Extraction Resource Limits # ---------------------------------- # # The maximum size (in bytes) of some fields extracted from crawled pages # #crawler.extraction.title_size.limit: 1024 #crawler.extraction.body_size.limit: 5242880 #crawler.extraction.keywords_size.limit: 512 #crawler.extraction.description_size.limit: 1024 # # The maximum number of links extracted from each page for further crawling # #crawler.extraction.extracted_links_count.limit: 1000 # # The maximum number of links extracted from each page and indexed in a document # #crawler.extraction.indexed_links_count.limit: 25 # # The maximum number of HTML headers to be extracted from each page # #crawler.extraction.headings_count.limit: 25 # # Document fields used to compare documents during de-duplication # #crawler.extraction.content_hash_include: ['document_title', 'document_body', 'meta_keywords', 'meta_description', 'links', 'headings'] # # ------------------------------ # Crawler HTTP Security Controls # ------------------------------ # # A list of custom SSL Certificate Authority certificates to be used for all # connections made by the crawler to your websites. These certificates are added # to the standard list of CA certificates trusted by the JVM. # # Please note: Each item in this list could be a file name of a certificate in # PEM format or a PEM-formatted certificate as a string. # #crawler.security.ssl.certificate_authorities: [] # # Control SSL verification mode used by the crawler: # * full - validate both the SSL certificate and the hostname presented by the # server (this is the default and the recommended value) # * certificate - only validate the SSL certificate presented by the server # * none - disable SSL validation completely (this is very dangerous and # should never be used in production deployments). # #crawler.security.ssl.verification_mode: full # # ----------------------------- # Crawler DNS Security Controls # ----------------------------- # # WARNING: The settings in this section could make your deployment vulnerable to # SSRF attacks (especially in cloud environments) from the owners of any domains # you crawl. Do not enable any of the settings here unless you fully control DNS # domains you access with the crawler. # # See https://owasp.org/www-community/attacks/Server_Side_Request_Forgery for # more details on the SSRF attack and the risks associated with it. # # Allow crawler to access the localhost (127.0.0.0/8 IP namespace) # #crawler.security.dns.allow_loopback_access: false # # Allow crawler to access the private IP space: link-local, network-local addresses, etc # (see https://en.wikipedia.org/wiki/Reserved_IP_addresses#IPv4 for more details) # #crawler.security.dns.allow_private_networks_access: false # # ------------------------------ Read-only mode ------------------------------- # # If true, pending migrations can be executed without enabling read-only mode. # Proceeding with migrations while indices are allowing writes can have # unintended consequences. Use at your own risk, should not be set to true when # upgrading a production instance with ongoing traffic. # #skip_read_only_check: false #
Enterprise Search environment variables
editSet default values for environment variables used by Enterprise Search within config/env.sh
:
##################################################### ## Elastic Enterprise Search Environment Variables ## ##################################################### # Java options for JVM tuning (used for app-server and CLI commands) export JAVA_OPTS=${JAVA_OPTS:-"-Xms2g -Xmx2g"} # Additional Java options for the application server export APP_SERVER_JAVA_OPTS="${APP_SERVER_JAVA_OPTS:-}" #------------------------------------------------------------------------------ # Enable Java GC logging (see below for the default configuration) export JAVA_GC_LOGGING=true # Example Environment variables for further logging configuration: # Where to put the files # export JAVA_GC_LOG_DIR=log # # How many of the most recent files to keep # export JAVA_GC_LOG_KEEP_FILES=10 # # How big GC logs should grow before triggering log rotation # export JAVA_GC_LOG_MAX_FILE_SIZE=10m