WARNING: Version 5.5 has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Installation
elasticsearch-hadoop binaries can be obtained either by downloading them from the elastic.co site as a ZIP archive (containing the project jars, sources and documentation) or by using any Maven-compatible tool with the following dependency:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>5.5.3</version>
</dependency>
The jar above contains all the features of elasticsearch-hadoop and does not require any other dependencies at runtime; in other words, it can be used as is.
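The dependency element above belongs in the dependencies section of a project's pom.xml. The sketch below shows where it sits in a minimal Maven project; the com.example / es-hadoop-app coordinates are hypothetical placeholders for your own application, not part of elasticsearch-hadoop.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates of your own application -->
  <groupId>com.example</groupId>
  <artifactId>es-hadoop-app</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <!-- Uber jar: covers all elasticsearch-hadoop integrations -->
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop</artifactId>
      <version>5.5.3</version>
    </dependency>
  </dependencies>
</project>
```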
The elasticsearch-hadoop binary is suitable for Hadoop 2.x (also known as YARN) environments. Support for Hadoop 1.x environments is deprecated in 5.5, and Hadoop 1.x will no longer be tested against starting with 6.0.
Minimalistic binaries
In addition to the uber jar, elasticsearch-hadoop provides minimalistic jars for each integration, tailored for those who use just one module (in all other situations the uber jar is recommended); these jars are smaller in size and use a dedicated pom covering only the needed dependencies.
They are available under the same groupId, using an artifactId with the pattern elasticsearch-hadoop-{integration} (except for Spark and Storm, whose artifactIds are listed below):
Map/Reduce.
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-mr</artifactId>
  <version>5.5.3</version>
</dependency>
Apache Hive.
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-hive</artifactId>
  <version>5.5.3</version>
</dependency>
Apache Pig.
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-pig</artifactId>
  <version>5.5.3</version>
</dependency>
Apache Spark.
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-20_2.10</artifactId>
  <version>5.5.3</version>
</dependency>
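This is the Spark artifact. Notice the suffix of its artifactId: 20 identifies the Spark version line it is compatible with (2.0+), while 2.10 identifies the Scala version.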
The Spark connector framework is the most sensitive to version incompatibilities. For your convenience, a version compatibility matrix has been provided below:
Spark Version | Scala Version | ES-Hadoop Artifact ID |
---|---|---|
1.0 - 1.2 | 2.10 | unsupported |
1.0 - 1.2 | 2.11 | unsupported |
1.3 - 1.6 | 2.10 | elasticsearch-spark-13_2.10 |
1.3 - 1.6 | 2.11 | elasticsearch-spark-13_2.11 |
2.0+ | 2.10 | elasticsearch-spark-20_2.10 |
2.0+ | 2.11 | elasticsearch-spark-20_2.11 |
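As an illustration of the matrix, a project running Spark 2.0+ built with Scala 2.11 would use elasticsearch-spark-20_2.11. One way to keep the Scala choice in a single place is a Maven property, as in the pom fragment below (it goes inside the project element); the property name scala.binary.version is a hypothetical choice, not something elasticsearch-hadoop requires.

```xml
<!-- pom.xml fragment; the property name is arbitrary -->
<properties>
  <!-- Scala version your Spark distribution was built with: 2.10 or 2.11 -->
  <scala.binary.version>2.11</scala.binary.version>
</properties>

<dependencies>
  <!-- Spark 2.0+ connector; with the property above this resolves to elasticsearch-spark-20_2.11 -->
  <dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_${scala.binary.version}</artifactId>
    <version>5.5.3</version>
  </dependency>
</dependencies>
```

Switching to Scala 2.10 then only means changing the property value; for Spark 1.3 - 1.6 the prefix changes to elasticsearch-spark-13, as listed in the matrix.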
Cascading.
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-cascading</artifactId>
  <version>5.5.3</version>
</dependency>
Note that Cascading itself is not available in Maven Central but in its own repository, conjars.org. Make sure to add this repository to your build configuration so that the Cascading dependencies can be resolved:
<repositories>
  <repository>
    <id>conjars.org</id>
    <url>http://conjars.org/repo</url>
  </repository>
</repositories>
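Putting the Cascading dependency and the conjars.org repository together, both sections sit side by side directly under the project element. A minimal sketch, with the project's own coordinates omitted:

```xml
<project>
  <!-- modelVersion, groupId, artifactId and version of your project omitted -->

  <!-- conjars.org hosts Cascading itself; Maven Central is still searched by default -->
  <repositories>
    <repository>
      <id>conjars.org</id>
      <url>http://conjars.org/repo</url>
    </repository>
  </repositories>

  <dependencies>
    <!-- Cascading-only elasticsearch-hadoop jar -->
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop-cascading</artifactId>
      <version>5.5.3</version>
    </dependency>
  </dependencies>
</project>
```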
Storm.
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-storm</artifactId>
  <version>5.5.3</version>
</dependency>
Releases are available in the central Maven repository.
Development Builds
Development (also called nightly or snapshot) builds are published daily to the sonatype-oss repository (see below). Make sure to use snapshot versioning:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>5.5.4.BUILD-SNAPSHOT</version>
</dependency>
but also enable the dedicated snapshot repository: