LangChain tutorial: A guide to building LLM-powered applications

Large language models (LLMs) like GPT-4 and LLaMA have created a whole world of possibilities over the past couple of years. Their rise has heralded a boom in AI tools and applications, and ChatGPT has become a household name seemingly overnight. But this boom wouldn’t be possible without the powerful tools and frameworks created to facilitate this new generation of apps.

One of these frameworks is LangChain, which makes it easy to build new apps on top of existing LLMs. It was developed by machine learning expert Harrison Chase and launched in 2022 as an open source project. The framework goes a long way toward bridging the technical gap between existing language models and the new and varied applications built on them.

LangChain explained

In simple terms, LangChain is a standardized interface that simplifies the process of building AI apps. It gives you a variety of tools you can use to connect different components and create complex workflows. This includes LLMs and various types of data sources. When a user interacts with the app, LangChain uses its connections to an LLM to process the request and generate appropriate responses. It can also use information and data from external sources like a document or database to provide more accurate and contextually relevant answers.

For instance, if a user asks a question, LangChain will use the LLM to comprehend the question and formulate a response. But it will also pull from one or more external data sources to enhance its reply. This makes your application much more intelligent and capable of handling complex and specialized queries.

Essentially, you’re augmenting the abilities of the LLM by providing it with data that is more relevant to the problems you want it to solve.

Available as both a Python and a TypeScript package, LangChain has several impressive features:

  • Model interaction: LangChain allows interaction with any language model, managing inputs and extracting information from outputs.

  • Efficient integration: It provides efficient integration with popular AI platforms like OpenAI and Hugging Face.

  • Flexibility and customization: LangChain offers flexibility, customization options, and powerful components to create a wide variety of applications across different industries.

  • Core components: The framework consists of LangChain libraries, LangChain templates, LangServe, and LangSmith, which simplify the entire application lifecycle. 

  • Standardized interfaces: It provides standardized interfaces, prompt management, and memory capabilities, enabling language models to interact with data sources.

This combination of features makes it flexible, quick, scalable, and easy to use, which is music to the ears of any developer tempted to get started with AI.

How does LangChain work?

LangChain is a modular framework that integrates with LLMs. It’s a standardized interface that abstracts away the complexities and difficulties of working with different LLM APIs — it’s the same process for integrating with GPT-4, LLaMA, or any other LLM you want to use. It also has dynamic LLM selection, which means developers can select the most appropriate LLM for the specific task they’re using LangChain to carry out.

The modular design also facilitates the processing and transformation of input data into actionable outputs. It handles various data types, including text, code, and multimedia formats, and it provides tools for preprocessing, cleaning, and normalizing data. This is to ensure the data is suitable for consumption by the LLMs and may involve tokenization, normalization, and language identification.
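
For instance, LangChain ships text splitters that chunk long documents into pieces an LLM can actually consume. Here’s a minimal sketch using the CharacterTextSplitter class; the chunk sizes are arbitrary values chosen for illustration:

from langchain.text_splitter import CharacterTextSplitter

# Split a long document into overlapping chunks that fit in an LLM's context window
splitter = CharacterTextSplitter(separator="\n", chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(long_document_text)  # long_document_text is your raw string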

LangChain also processes the LLM’s output, transforming it into formats appropriate for the app or task-specific requirements. This includes things like formatting text, generating code snippets, and providing summaries of complex data.
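
As a small illustration of output processing, LangChain’s output parsers turn raw model text into structured data. This sketch uses the built-in CommaSeparatedListOutputParser; the hardcoded string stands in for a real model response:

from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()
# Instructions you append to your prompt so the model replies in a parseable format
format_instructions = parser.get_format_instructions()
# Turn the model's raw text reply into a Python list
items = parser.parse("red, green, blue")  # ['red', 'green', 'blue']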

Core concepts of LangChain

LangChain's architecture is built on the concept of components and chains. Components represent reusable modules that perform specific tasks, such as processing input data, generating text formats, accessing external information, or managing workflows. Chains are sequences of components that work together to achieve a broader goal, such as summarizing a document, generating creative text formats, or providing personalized recommendations.
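
To make this concrete, here’s a minimal sketch of the pattern: a PromptTemplate and an LLM are the reusable components, and LLMChain wires them together into a chain. It assumes your OpenAI key is already set in the OPENAI_API_KEY environment variable:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI()  # picks up OPENAI_API_KEY from the environment
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write a one-sentence summary of {topic}.",
)

# The chain connects the prompt component to the LLM component
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="LangChain"))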

Components and modules

In LangChain, the terms "components" and "modules" are sometimes used interchangeably, but there is a subtle distinction between the two: 

  • Components are the core building blocks of LangChain, representing specific tasks or functionalities. These are typically small and focused and can be reused across different applications and workflows. 

  • Modules, on the other hand, combine multiple components to form more complex functionalities. LangChain even provides standard interfaces for a few of its main modules, including memory modules (reusable building blocks that store and manage data for use by large language models) and agents (dynamic control units that orchestrate chains based on real-time feedback and user interaction).

Like components, modules are reusable and can be combined to create even more complex workflows. Putting sequences of components or modules together to achieve a specific goal is called a chain. Chains are fundamental to workflow orchestration in LangChain and are essential for building effective applications that handle a wide range of tasks.
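
As an illustration of chaining, SimpleSequentialChain feeds the output of one chain into the next. This is a sketch with made-up prompts, assuming the same OpenAI setup as above:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI()
title_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["subject"],
    template="Suggest a blog post title about {subject}.",
))
outline_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["title"],
    template="Write a three-point outline for a post titled {title}.",
))

# The title produced by the first chain becomes the input to the second
workflow = SimpleSequentialChain(chains=[title_chain, outline_chain])
print(workflow.run("vector search"))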

Integration with LLMs

LangChain seamlessly integrates with LLMs by providing a standardized interface. But LangChain's integration with LLMs goes beyond simply providing a connection mechanism. It also offers several features that optimize the use of LLMs for building language-based applications:

  • Prompt management: LangChain enables you to craft effective prompts that help the LLMs understand the task and generate a useful response.

  • Dynamic LLM selection: This allows you to select the most appropriate LLM for different tasks based on factors like complexity, accuracy requirements, and computational resources.

  • Memory management integration: LangChain integrates with memory modules, which means LLMs can retain and reuse information across interactions (see the sketch after this list).

  • Agent-based management: This enables you to orchestrate complex LLM-based workflows that adapt to changing circumstances and user needs.
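
Here’s the memory sketch promised above: ConversationChain paired with ConversationBufferMemory, which keeps the running transcript and feeds it back into each new prompt. Again, this assumes OPENAI_API_KEY is set in your environment:

from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(llm=OpenAI(), memory=ConversationBufferMemory())
conversation.predict(input="Hi, my name is Sam.")

# The buffer memory supplies the earlier turn, so the model can answer correctly
print(conversation.predict(input="What's my name?"))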

Workflow management

In LangChain, workflow management is the process of orchestrating and controlling the execution of chains and agents to solve a specific problem. This involves managing the flow of data, coordinating the execution of components, and ensuring that applications respond effectively to user interactions and changing circumstances. Here are some of the key workflow management components:

  • Chain orchestration: LangChain coordinates the execution of chains to ensure tasks are performed in the correct order and data is correctly passed between components.

  • Agent-based management: The use of agents is simplified with predefined templates and a user-friendly interface (a minimal agent sketch follows this list).

  • State management: LangChain automatically tracks the state of the application, providing developers with a unified interface for accessing and modifying state information.

  • Concurrency management: LangChain handles the complexities of concurrent execution, enabling developers to focus on the tasks and interactions without worrying about threading or synchronization issues.
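
And here’s the minimal agent sketch mentioned in the list, using initialize_agent with the built-in llm-math tool. The agent decides at runtime whether to call the calculator rather than following a fixed sequence:

from langchain.llms import OpenAI
from langchain.agents import initialize_agent, load_tools

llm = OpenAI()
tools = load_tools(["llm-math"], llm=llm)  # a calculator tool backed by the LLM

# A ReAct-style agent that picks tools based on the question it receives
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("What is 12 raised to the power of 3?")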

Setting up LangChain

One of the advantages of LangChain is that there are very few requirements to get started. For this guide, we’ll be using Python, so you’ll need to have Python 3.8 or later installed on your computer. That’s it!

Installation and configuration

The first step is to install the core LangChain framework. The easiest way to do this is with this pip command:

pip install langchain

The next thing you need is an LLM for LangChain to interact with. We’ll use OpenAI in this example, but you can use any LLM you want for your app:

pip install openai

For OpenAI to work, you also need an API key to authenticate your requests. You can get one by creating an OpenAI account, going to the API keys section, and selecting “Create new secret key.” Once you have the key, keep it safe somewhere. You’ll need it shortly.
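
Rather than hardcoding the key in your source code, a safer pattern is to export it as an environment variable and read it at runtime. A minimal sketch:

import os
from langchain.llms import OpenAI

# Set the variable in your shell first, e.g.: export OPENAI_API_KEY='YOUR-OPENAI-KEY'
gpt3 = OpenAI(openai_api_key=os.environ["OPENAI_API_KEY"])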

Finally, create a text file called data.txt. This is going to be the external data source you use to give context to the LLM:

In West Philadelphia born and raised
On the playground was where I spent most of my days
Chillin' out, maxin', relaxin', all cool
And all shootin' some b-ball outside of the school

Now it’s time for the fun bit!

Developing applications with LangChain

For building this LangChain app, you’ll need to open your text editor or IDE of choice and create a new Python (.py) file in the same location as data.txt. You’re going to create a super basic app that sends a prompt to OpenAI’s GPT-3 LLM and prints the response.

Looking for something a bit more advanced? Check out our guide to using Amazon Bedrock with Elasticsearch and LangChain.

Step 1: Import the OpenAI class from LangChain

At the top of your Python script, add this import statement to pull in the OpenAI class from LangChain’s LLM module:

from langchain.llms import OpenAI

Step 2: Define a function to read data from a text file

Define a function that takes the file path as an argument. It will open the file for reading and return its contents:

def read_data_from_file(file_path):
    with open(file_path, 'r') as file:
        return file.read()

Step 3: Initialize the OpenAI model

Create an instance of the OpenAI class with your API key, replacing YOUR-OPENAI-KEY with the actual key you obtained from OpenAI:

gpt3 = OpenAI(openai_api_key='YOUR-OPENAI-KEY')

Step 4: Define a function to request a response from OpenAI

Write a function that takes a prompt as its argument and returns the response from the GPT-3 model:

def get_response(prompt):
    return gpt3(prompt)

Step 5: Read data from the text file

Specify the path to the text file and use the function you defined earlier to read its contents. You’ll then store the data in the external_data variable:

file_path = 'data.txt'
external_data = read_data_from_file(file_path)

Step 6: Create a test prompt

This is where you define the prompt you’re going to send to GPT-3. In this example, you’re going to ask it to read the text and tell you what TV show the text file is talking about:

prompt = f"Based on the following data: {external_data}, what TV show is this about?"

Step 7: Get the response from GPT-3 and print it

Call the get_response function with the prepared prompt, then print the response from GPT-3:

print("Response:", get_response(prompt))

Step 8: Run the app and check the response

Once you’ve done all of this, you have a Python app that looks like this:

from langchain.llms import OpenAI

# Function to read data from a file
def read_data_from_file(file_path):
    with open(file_path, 'r') as file:
        return file.read()

# Initialize the LLM
gpt3 = OpenAI(openai_api_key='YOUR-OPENAI-KEY')

def get_response(prompt):
    return gpt3(prompt)

# Read data from your text file
file_path = 'data.txt'
external_data = read_data_from_file(file_path)

# Prepare your prompt including the external data
prompt = f"Based on the following data: {external_data}, what TV show is this about?"

# Get the response from GPT-3
print("Response:", get_response(prompt))

So now all that’s left to do is run your Python app to make sure it works! Save the file, and run your app with this command in the terminal:

python YOUR-APP-NAME.py

If everything has gone to plan, you get a response that looks something like this:

Response: 
This is the opening theme song for the popular 1990s TV show "The Fresh Prince of Bel-Air".

Use cases

This example is an over-simplified demo, but the flexibility of LangChain means there are endless possibilities for building new AI apps. We couldn’t possibly list them all here, but here are a few examples to highlight the variety of things you could build:

  • Chatbot: Build your own chatbot where you can ask questions in natural language and maintain conversation history.  

  • Q&A app: Create an app where you can ask for the information you’re after, and it’ll find the answer from stored documents.

  • Text search (BM25): Create your own text search app to query large amounts of data.

  • Vector search: Build an app that searches for data similarities and filters metadata.

  • Hybrid search (text and vector): Develop an AI that matches similar documents using both text and vector filtering.

  • LangChain with your own LLM: Use LangChain to build an AI app that uses your own LLM with external data sources.

Build LLM-powered apps using LangChain

It should be clear by now that by combining the power of LLMs with the context and extra information in external data sources, LangChain gives you unlimited possibilities. It’s also remarkably easy to get started, as shown in this LangChain tutorial. This ease of use, combined with the flexibility and power of LangChain, makes it an ideal platform for developing a wide range of AI applications. Whether you are building a chatbot, a Q&A app, or a search engine, LangChain can help you create innovative and effective solutions.

What you should do next

Whenever you're ready, here are four ways we can help you harness insights from your business’ data:

  1. Start a free trial and see how Elastic can help your business.

  2. Tour our solutions to see how the Elasticsearch Platform works, and how our solutions will fit your needs.

  3. Discover how to incorporate generative AI in the enterprise.

  4. Share this article with someone you know who'd enjoy reading it, whether via email, LinkedIn, Twitter, or Facebook.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use. 

Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.