Hadoop Metrics
The Hadoop system records a set of metric counters for each job that it runs. elasticsearch-hadoop builds on this and reports metrics about its own activity for each job run through the Hadoop Counters infrastructure. During each run, elasticsearch-hadoop sends statistics from each running task instance; the Map/Reduce infrastructure aggregates them and makes them available through the standard Hadoop APIs.
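To make the aggregation step concrete, here is a minimal sketch of per-task statistics being summed into job-level totals. It is only an illustration of the idea, not how Hadoop or elasticsearch-hadoop implement it: the class TaskCounterAggregation, the nested enum Stat (a hypothetical three-entry subset of the documented counter names) and the method aggregate are all assumptions introduced for this example.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of Hadoop or elasticsearch-hadoop.
final class TaskCounterAggregation {

    // Assumed subset of counter names, mirroring three documented entries.
    enum Stat { DOCS_SENT, DOCS_ACCEPTED, DOCS_RETRIED }

    // Sums the counters reported by each task into one job-level total per counter.
    static Map<Stat, Long> aggregate(List<Map<Stat, Long>> perTask) {
        Map<Stat, Long> totals = new EnumMap<>(Stat.class);
        for (Map<Stat, Long> task : perTask) {
            // merge() inserts the value on first sight and adds to it afterwards.
            task.forEach((stat, value) -> totals.merge(stat, value, Long::sum));
        }
        return totals;
    }
}
```

A counter that no task reports is simply absent from the result, so callers may want to treat a missing entry as zero.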
elasticsearch-hadoop provides the following counters, available under the org.elasticsearch.hadoop.mr.Counter enum:
Table 11. Available counters
Counter name | Purpose |
---|---|
Data focused | |
BYTES_SENT | Total number of data/communication bytes sent over the network to Elasticsearch |
BYTES_ACCEPTED | Data/Documents accepted by Elasticsearch in bytes |
BYTES_RETRIED | Data/Documents rejected by Elasticsearch in bytes |
BYTES_RECEIVED | Data/Documents received from Elasticsearch in bytes |
Document focused | |
DOCS_SENT | Number of documents sent over the network to Elasticsearch |
DOCS_ACCEPTED | Number of documents sent and accepted by Elasticsearch |
DOCS_RETRIED | Number of documents sent but rejected by Elasticsearch |
DOCS_RECEIVED | Number of documents received from Elasticsearch |
Network focused | |
BULK_TOTAL | Number of bulk requests made to Elasticsearch |
BULK_RETRIES | Number of bulk retries (caused by document rejections) |
SCROLL_TOTAL | Number of scrolls pulled from Elasticsearch |
NODE_RETRIES | Number of node fallbacks (caused by network errors) |
NET_RETRIES | Number of network retries (caused by network errors) |
Time focused | |
NET_TOTAL_TIME_MS | Overall time (in ms) spent over the network |
BULK_TOTAL_TIME_MS | Time (in ms) spent over the network by the bulk requests |
BULK_RETRIES_TOTAL_TIME_MS | Time (in ms) spent over the network retrying bulk requests |
SCROLL_TOTAL_TIME_MS | Time (in ms) spent over the network reading the scroll requests |
The counters can be used programmatically through either the mapred or the mapreduce API, depending on which one the job uses. Whatever the choice, elasticsearch-hadoop reports the counters automatically, without any user intervention. When using elasticsearch-hadoop, the stats are reported at the end of the job run, for example:
    13:55:08,100 INFO main mapreduce.Job - Job job_local127738678_0013 completed successfully
    13:55:08,101 INFO main mapreduce.Job - Counters: 35
    ...
    Elasticsearch Hadoop Counters
        Bulk Retries=0
        Bulk Retries Total Time(ms)=0
        Bulk Total=20
        Bulk Total Time(ms)=518
        Bytes Accepted=159129
        Bytes Sent=159129
        Bytes Received=79921
        Bytes Retried=0
        Documents Accepted=993
        Documents Sent=993
        Documents Received=0
        Documents Retried=0
        Network Retries=0
        Network Total Time(ms)=937
        Node Retries=0
        Scroll Total=0
        Scroll Total Time(ms)=0
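The reported values lend themselves to simple derived figures, such as the average time per bulk request or the average number of documents per bulk request. The following is a minimal sketch, assuming the report has been captured as plain text; it uses only the Java standard library. The class CounterReport and its methods parse and ratio are hypothetical names introduced for illustration and are not part of elasticsearch-hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of elasticsearch-hadoop: reads the
// "Name=value" lines of the counter report printed at the end of a job.
final class CounterReport {

    // Parses lines such as "Bulk Total=20" into a name -> value map.
    // Lines without '=' or with a non-numeric value are skipped.
    static Map<String, Long> parse(String report) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : report.split("\\R")) {
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                continue; // no '=' on this line, or nothing before it
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            try {
                counters.put(name, Long.parseLong(value));
            } catch (NumberFormatException e) {
                // Not a counter line; ignore it.
            }
        }
        return counters;
    }

    // Divides one counter by another; yields 0.0 when the denominator
    // is zero or missing, so an idle job does not cause a division error.
    static double ratio(Map<String, Long> counters, String numerator, String denominator) {
        long d = counters.getOrDefault(denominator, 0L);
        return d == 0 ? 0.0 : (double) counters.getOrDefault(numerator, 0L) / d;
    }

    public static void main(String[] args) {
        // Values copied from the sample report above.
        String report = String.join("\n",
                "Bulk Total=20",
                "Bulk Total Time(ms)=518",
                "Documents Sent=993",
                "Documents Retried=0");
        Map<String, Long> counters = parse(report);
        System.out.println("Average ms per bulk request: "
                + ratio(counters, "Bulk Total Time(ms)", "Bulk Total"));
        System.out.println("Average documents per bulk request: "
                + ratio(counters, "Documents Sent", "Bulk Total"));
    }
}
```

Parsing text is a convenience for ad hoc inspection; inside a job, reading the counters through the Hadoop APIs avoids depending on the log layout.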