Elastic web crawler
Looking for the App Search web crawler? See the App Search documentation.
To compare the web crawler with the App Search web crawler, see the reference table on this page.
This feature is not available at all Elastic subscription levels. Refer to the Elastic subscriptions pages for Elastic Cloud and self-managed deployments.
Overview
Use the web crawler to programmatically discover, extract, and index searchable content from websites and knowledge bases. When you ingest data with the web crawler, a search-optimized Elasticsearch index is created to hold and sync webpage content.
The web crawler is a native Elasticsearch solution. It reads and writes directly to Elasticsearch indices in a format that enables developers to build intuitive, relevant search experiences using App Search engines and the Search UI library.
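Because crawled pages live in an ordinary Elasticsearch index, they can be queried with the standard Elasticsearch search API. The following is a minimal sketch, not an official client: the index name `search-my-website`, the base URL, and the field names `title`, `body_content`, and `url` are assumptions for illustration, so check your own index mapping before relying on them.

```python
import json
import urllib.request


def build_search_request(base_url, index_name, text, size=5):
    """Return (url, body) for a multi_match query against a crawler index."""
    url = f"{base_url.rstrip('/')}/{index_name}/_search"
    body = {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Assumed field names; verify them against your index mapping.
                "fields": ["title", "body_content"],
            }
        },
        # Return only the fields needed to render a result list.
        "_source": ["title", "url"],
    }
    return url, body


def search(base_url, index_name, text, api_key, size=5):
    """Send the query to Elasticsearch and return the matching documents."""
    url, body = build_search_request(base_url, index_name, text, size)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return [hit["_source"] for hit in result["hits"]["hits"]]
```

With a running deployment, a call such as `search("https://localhost:9200", "search-my-website", "pricing", api_key="...")` would return the stored `title` and `url` of each matching page, assuming those fields exist in the index.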
Web crawler documentation:
- Getting started with website search: Concrete guide to building a website search experience, using the crawler UI.
- Managing crawls: Detailed reference for managing crawls using the Kibana UI. Learn how to:
  - Manage duplicated documents
  - Extract binary content such as PDFs from webpages
  - Schedule automated crawls
- Optimizing web content: Optimize your web content source files for the web crawler, to manage webpage discovery and content extraction. Learn about:
- Custom fields using proxy: How to extract custom fields from webpages using a proxy server.
- Troubleshooting crawls: Detailed troubleshooting reference.
- Web crawler events logs reference: Detailed web crawler events logs reference.
- View web crawler events logs: How to view web crawler events logs in Kibana.
Appendix: Compare the web crawler and App Search web crawler
| Feature | App Search web crawler | Web crawler |
| --- | --- | --- |
| Interface | GUI / API | GUI-only |
| Binary content extraction | Yes | Yes |
| Search | App Search | Elasticsearch / App Search using Elasticsearch search API for App Search |
| Ingest pipelines | Yes | Yes |
| Monitoring | Yes | Yes |
| APM | Yes | Yes |
| Audit logging | Yes | No |
| Event logging | Yes | Yes |
| Public REST API | Yes | No |