Installation

elasticsearch-hadoop binaries can be obtained either by downloading them from the elastic.co site as a ZIP (containing project jars, sources and documentation) or by using any Maven-compatible tool with the following dependency:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>5.4.3</version>
</dependency>

The jar above contains all the features of elasticsearch-hadoop and does not require any other dependencies at runtime; in other words, it can be used as is.

The elasticsearch-hadoop binary is suitable for both Hadoop 1.x and Hadoop 2.x (also known as YARN) environments without any changes.

Minimalistic binaries

In addition to the uber jar, elasticsearch-hadoop provides minimalistic jars for each integration, tailored for those who use just one module (in all other situations the uber jar is recommended). These jars are smaller and use a dedicated pom that covers only the needed dependencies. They are available under the same groupId, using an artifactId with the pattern elasticsearch-hadoop-{integration} (the Spark and Storm artifacts listed below drop the hadoop segment):

Map/Reduce.

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-mr</artifactId> 
  <version>5.4.3</version>
</dependency>

mr artifact

Apache Hive.

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-hive</artifactId> 
  <version>5.4.3</version>
</dependency>

hive artifact

Apache Pig.

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-pig</artifactId> 
  <version>5.4.3</version>
</dependency>

pig artifact

Apache Spark.

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-20_2.10</artifactId> 
  <version>5.4.3</version>
</dependency>

spark artifact. The -20 part of the suffix indicates the Spark version the artifact is compatible with: use 20 for Spark 2.0+ and 13 for Spark 1.3-1.6. The _2.10 part of the suffix indicates the Scala version the artifact is compatible with; currently it is the same as the Scala version used by Spark itself.

The Spark connector framework is the most sensitive to version incompatibilities. For your convenience, a version compatibility matrix has been provided below:

Spark Version    Scala Version    ES-Hadoop Artifact ID
1.0 - 1.2        2.10             <unsupported>
1.0 - 1.2        2.11             <unsupported>
1.3 - 1.6        2.10             elasticsearch-spark-13_2.10
1.3 - 1.6        2.11             elasticsearch-spark-13_2.11
2.0+             2.10             elasticsearch-spark-20_2.10
2.0+             2.11             elasticsearch-spark-20_2.11
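
As a sketch of how the matrix maps onto a build file, a project running Spark 2.0+ built against Scala 2.11 could select the matching row as follows. The scala.binary.version property name is a hypothetical choice made for this illustration and is not required by elasticsearch-hadoop; writing the literal suffix in the artifactId works equally well.

```xml
<!-- Hypothetical property holding the Scala version Spark was built with -->
<properties>
  <scala.binary.version>2.11</scala.binary.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.elasticsearch</groupId>
    <!-- resolves to elasticsearch-spark-20_2.11 (Spark 2.0+, Scala 2.11) -->
    <artifactId>elasticsearch-spark-20_${scala.binary.version}</artifactId>
    <version>5.4.3</version>
  </dependency>
</dependencies>
```

With this layout, moving to Scala 2.10 means changing only the property value, while moving to Spark 1.3-1.6 means replacing 20 with 13 in the artifactId.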

Cascading.

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-cascading</artifactId> 
  <version>5.4.3</version>
</dependency>

cascading artifact

Note that Cascading itself is not available in Maven central but in its own repository, conjars.org. Add this repository to your build configuration so that the Cascading dependencies can be resolved:

<repositories>
  <repository>
    <id>conjars.org</id>
    <url>http://conjars.org/repo</url>
  </repository>
</repositories>

Storm.

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-storm</artifactId> 
  <version>5.4.3</version>
</dependency>

storm artifact

Releases are available in the central Maven repository.

Development Builds

Development builds (also called nightly or snapshot builds) are published daily to the sonatype-oss repository (see below). Make sure to use snapshot versioning:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>5.4.4.BUILD-SNAPSHOT</version> 
</dependency>

Note the BUILD-SNAPSHOT suffix, which indicates a development build.

You must also enable the dedicated snapshots repository:

<repositories>
  <repository>
    <id>sonatype-oss</id>
    <url>http://oss.sonatype.org/content/repositories/snapshots</url> 
    <snapshots><enabled>true</enabled></snapshots> 
  </repository>
</repositories>

The url element adds the snapshot repository. The snapshots element enables snapshot resolution on that repository; otherwise Maven will not find the development builds.
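
Putting the two fragments together, a minimal pom.xml for trying a development build might look like the sketch below. The project coordinates (com.example, es-hadoop-snapshot-demo) are placeholders invented for this example. The repository URL uses https rather than the http form shown above, on the assumption that the repository is reachable over https; recent Maven releases block plain http repositories by default.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Placeholder coordinates for the example project (hypothetical) -->
  <groupId>com.example</groupId>
  <artifactId>es-hadoop-snapshot-demo</artifactId>
  <version>0.1.0</version>

  <repositories>
    <repository>
      <id>sonatype-oss</id>
      <!-- https variant of the URL given above (assumption) -->
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <!-- explicitly enable snapshot resolution for this repository -->
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop</artifactId>
      <!-- the BUILD-SNAPSHOT suffix selects a development build -->
      <version>5.4.4.BUILD-SNAPSHOT</version>
    </dependency>
  </dependencies>
</project>
```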
