LangChain4j with Elasticsearch as the embedding store

In the previous post, we discovered what LangChain4j is and how to:

Have a discussion with LLMs by implementing a ChatLanguageModel and a ChatMemory
Retain chat history in memory to recall the context of a previous discussion with an LLM

This blog post covers how to:

Create vector embeddings from text examples
Store vector embeddings in the Elasticsearch embedding store
Search for similar vectors

Create embeddings

To create embeddings, we need to define the EmbeddingModel to use. For example, we can use the same mistral model we used in the previous post, which was running with Ollama:

EmbeddingModel model = OllamaEmbeddingModel.builder()
  .baseUrl(ollama.getEndpoint())
  .modelName(MODEL_NAME)
  .build();

A model is able to generate vectors from text. Here we can check the number of dimensions generated by the model:

Logger.info("Embedding model has {} dimensions.", model.dimension());
// This gives: Embedding model has 4096 dimensions.

To generate vectors from a text, we can use:

Response<Embedding> response = model.embed("A text here");

If we also want to attach metadata, so that we can later filter on things like text, price or release date, we can use Metadata.from(). For example, here we add the game name as a metadata field:

TextSegment game1 = TextSegment.from("""
    The game starts off with the main character Guybrush Threepwood stating "I want to be a pirate!"
    To do so, he must prove himself to three old pirate captains. During the perilous pirate trials, 
    he meets the beautiful governor Elaine Marley, with whom he falls in love, unaware that the ghost pirate 
    LeChuck also has his eyes on her. When Elaine is kidnapped, Guybrush procures crew and ship to track 
    LeChuck down, defeat him and rescue his love.
""", Metadata.from("gameName", "The Secret of Monkey Island"));
Response<Embedding> response1 = model.embed(game1);
TextSegment game2 = TextSegment.from("""
    Out Run is a pseudo-3D driving video game in which the player controls a Ferrari Testarossa 
    convertible from a third-person rear perspective. The camera is placed near the ground, simulating 
    a Ferrari driver's position and limiting the player's view into the distance. The road curves, 
    crests, and dips, which increases the challenge by obscuring upcoming obstacles such as traffic 
    that the player must avoid. The object of the game is to reach the finish line against a timer.
    The game world is divided into multiple stages that each end in a checkpoint, and reaching the end 
    of a stage provides more time. Near the end of each stage, the track forks to give the player a 
    choice of routes leading to five final destinations. The destinations represent different 
    difficulty levels and each conclude with their own ending scene, among them the Ferrari breaking 
    down or being presented a trophy.
""", Metadata.from("gameName", "Out Run"));
Response<Embedding> response2 = model.embed(game2);

If you'd like to run this code, please check out the Step5EmbedddingsTest.java class.

Add Elasticsearch to store our vectors

LangChain4j provides an in-memory embedding store. This is useful to run simple tests:

EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
embeddingStore.add(response1.content(), game1);
embeddingStore.add(response2.content(), game2);
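To get a feel for how much memory such a store needs, here is a rough back-of-the-envelope sketch that only counts the raw float values of the vectors. The class name and the one-million-vector corpus are assumptions made for illustration, and the estimate ignores object headers, the segment text and the metadata, so a real footprint would be higher.

```java
final class EmbeddingMemoryEstimate {

    /** Bytes needed for the raw float values of one vector (4 bytes per float). */
    static long bytesPerVector(int dimensions) {
        return (long) dimensions * Float.BYTES;
    }

    /** Raw bytes for a number of vectors, ignoring headers, text and metadata. */
    static long rawBytes(long vectorCount, int dimensions) {
        return vectorCount * bytesPerVector(dimensions);
    }

    public static void main(String[] args) {
        int dimensions = 4096;          // dimension reported by the model above
        long vectorCount = 1_000_000L;  // hypothetical corpus size, for illustration only
        double gib = rawBytes(vectorCount, dimensions) / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("One vector: %d bytes%n", bytesPerVector(dimensions));
        System.out.printf("%d vectors: %.2f GiB%n", vectorCount, gib);
    }
}
```

With 4096 dimensions, each vector takes 16384 bytes, so one million of them would need roughly 15 GiB for the float values alone. For two game descriptions this is negligible, which is why the in-memory store is fine for simple tests.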

But obviously, this cannot work with a much bigger dataset, because this datastore keeps everything in memory and we don't have infinite memory on our servers. Instead, we can store our embeddings in Elasticsearch, which is by definition "elastic" and can scale up and out with your data. For that, let's add Elasticsearch to our project:

<dependency>
  <groupId>dev.langchain4j</groupId>
  <artifactId>langchain4j-elasticsearch</artifactId>
  <version>${langchain4j.version}</version>
</dependency>

<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>1.20.1</version>
  <scope>test</scope>
</dependency>

As you may have noticed, we also added the Elasticsearch Testcontainers module to the project, so we can start an Elasticsearch instance from our tests:

// Create the elasticsearch container
ElasticsearchContainer container =
  new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:8.15.0")
    .withPassword("changeme");

// Start the container. This step might take some time...
container.start();

// As we don't want to make our Testcontainers code more complex than
// needed, we will use login / password for authentication.
// Note that you can also use API keys, which is the preferred option.
final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials("elastic", "changeme"));

// Create a low level Rest client which connects to the elasticsearch container.
client = RestClient.builder(HttpHost.create("https://" + container.getHttpHostAddress()))
  .setHttpClientConfigCallback(httpClientBuilder -> {
    httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
    httpClientBuilder.setSSLContext(container.createSslContextFromCa());
    return httpClientBuilder;
  })
  .build();

// Check the cluster is running
client.performRequest(new Request("GET", "/"));

To use Elasticsearch as an embedding store, you "just" have to switch from the LangChain4j in-memory datastore to the Elasticsearch datastore:

EmbeddingStore<TextSegment> embeddingStore =
  ElasticsearchEmbeddingStore.builder()
    .restClient(client)
    .build();
embeddingStore.add(response1.content(), game1);
embeddingStore.add(response2.content(), game2);

This will store your vectors in Elasticsearch in a default index. You can also change the index name to something more meaningful:

EmbeddingStore<TextSegment> embeddingStore =
  ElasticsearchEmbeddingStore.builder()
    .indexName("games")
    .restClient(client)
    .build();
embeddingStore.add(response1.content(), game1);
embeddingStore.add(response2.content(), game2);

If you'd like to run this code, please check out the Step6ElasticsearchEmbedddingsTest.java class.

Search for similar vectors

To search for similar vectors, we first need to transform our question into a vector representation using the same model we used previously. We already did that, so it's not hard to do this again. Note that we don't need the metadata in this case:

String question = "I want to pilot a car";
Embedding questionAsVector = model.embed(question).content();

We can build a search request with this representation of our question and ask the embedding store to return the closest vectors:

EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
  EmbeddingSearchRequest.builder()
    .queryEmbedding(questionAsVector)
    .build());

We can now iterate over the results and print some information, such as the game name, which comes from the metadata, and the score:

result.matches().forEach(m -> Logger.info("{} - score [{}]",
  m.embedded().metadata().getString("gameName"), m.score()));

As expected, this gives us "Out Run" as the first hit:

Out Run - score [0.86672974]
The Secret of Monkey Island - score [0.85569763]
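Conceptually, a vector search scores the question against each stored vector and sorts the results. The following sketch does this by brute force with cosine similarity on toy 3-dimensional vectors. It is not how LangChain4j or Elasticsearch implement the search; the class, record and label names, as well as the vector values, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BruteForceSearch {

    /** A stored vector with a label, standing in for a text segment and its metadata. */
    record Entry(String label, float[] vector) {}

    /** A search hit: the label of the stored entry and its cosine similarity to the query. */
    record Match(String label, double cosine) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Zero vectors are not handled here. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            // This is why the question must be embedded with the same model as the documents.
            throw new IllegalArgumentException("Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every entry against the query and returns the best maxResults, highest first. */
    static List<Match> search(List<Entry> entries, float[] query, int maxResults) {
        List<Match> matches = new ArrayList<>();
        for (Entry entry : entries) {
            matches.add(new Match(entry.label(), cosine(entry.vector(), query)));
        }
        matches.sort(Comparator.comparingDouble(Match::cosine).reversed());
        return List.copyOf(matches.subList(0, Math.min(maxResults, matches.size())));
    }

    public static void main(String[] args) {
        // Toy vectors; real embeddings from the model above have 4096 dimensions.
        List<Entry> entries = List.of(
            new Entry("adventure", new float[] {0.9f, 0.1f, 0.0f}),
            new Entry("driving", new float[] {0.1f, 0.9f, 0.2f}));
        float[] query = {0.2f, 0.8f, 0.1f}; // hypothetical embedding of a driving question
        search(entries, query, 2)
            .forEach(m -> System.out.println(m.label() + " - cosine " + m.cosine()));
    }
}
```

In this toy setup the "driving" entry ranks first because its vector points in nearly the same direction as the query. Scanning every vector like this is exact but grows linearly with the number of stored vectors.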

If you'd like to run this code, please check out the Step7SearchForVectorsTest.java class.

Behind the scenes

By default, the Elasticsearch embedding store uses the approximate kNN query behind the scenes:

POST games/_search
{
  "query" : {
    "knn": {
      "field": "vector",
      "query_vector": [-0.019137882, /* ... */, -0.0148779955]
    }
  }
}

But this can be changed by providing the embedding store with a configuration other than the default one (ElasticsearchConfigurationKnn), namely ElasticsearchConfigurationScript:

EmbeddingStore<TextSegment> embeddingStore =
  ElasticsearchEmbeddingStore.builder()
    .configuration(ElasticsearchConfigurationScript.builder().build())
    .indexName("games")
    .restClient(client)
    .build();

Behind the scenes, the ElasticsearchConfigurationScript implementation runs a script_score query using the cosineSimilarity function.

Basically, when calling:

EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
  EmbeddingSearchRequest.builder()
    .queryEmbedding(questionAsVector)
    .build());

This now calls:

POST games/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "(cosineSimilarity(params.query_vector, 'vector') + 1.0) / 2",
        "params": {
          "query_vector": [-0.019137882, /* ... */, -0.0148779955]
        }
      }
    }
  }
}

In this case, the order of the results does not change, but the scores are slightly different, because the cosineSimilarity call does not use any approximation and computes the exact cosine for each matching vector:

Out Run - score [0.871952]
The Secret of Monkey Island - score [0.86380446]
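The `(cosineSimilarity(...) + 1.0) / 2` expression in the script rescales the cosine, which lies between -1 and 1, to a score between 0 and 1 (Elasticsearch does not accept negative scores). The sketch below shows this mapping and its inverse in plain Java; the class and method names are hypothetical and only serve the illustration.

```java
final class CosineScore {

    /** Maps a cosine similarity in [-1, 1] to a score in [0, 1]. */
    static double toScore(double cosine) {
        return (cosine + 1.0) / 2;
    }

    /** Inverse mapping: recovers the cosine similarity from a score. */
    static double toCosine(double score) {
        return 2 * score - 1.0;
    }

    public static void main(String[] args) {
        System.out.println(toScore(1.0));        // vectors pointing in the same direction
        System.out.println(toScore(0.0));        // orthogonal vectors
        System.out.println(toScore(-1.0));       // vectors pointing in opposite directions
        System.out.println(toCosine(0.871952));  // cosine behind the "Out Run" score above
    }
}
```

Read this way, the "Out Run" score of 0.871952 corresponds to a cosine similarity of about 0.744 between the question and the game description.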

If you'd like to run this code, please check out the Step7SearchForVectorsTest.java class.

Conclusion

We have covered how easily you can generate embeddings from your text, and how you can store them in Elasticsearch and search for the nearest neighbours using two different approaches:

Using the approximate and fast knn query with the default ElasticsearchConfigurationKnn option
Using the exact but slower script_score query with the ElasticsearchConfigurationScript option

The next step will be about building a full RAG application, based on what we learned here.
