The rise of generative AI models, such as Gemini 1.5 Pro, has opened exciting possibilities for building intelligent agents capable of complex tasks. AI agents enable autonomous behavior by using generative models and external tools to perceive their environment, make decisions, and take actions to achieve goals. But the reality of generative AI applications and AI agents is that they demand significant time and upkeep to manage the underlying infrastructure and boilerplate code.
LangChain on Vertex AI (Reasoning Engine) is a managed service in Vertex AI that provides a runtime environment for deploying agents built with any orchestration framework, including LangChain. Reasoning Engine abstracts away complexities such as deployment, scaling, and monitoring, which allows developers to focus on the core logic and capabilities within their agents.
In this blog post, we’ll walk through how LangChain on Vertex AI helps developers simplify the complexities of deploying and managing their AI agents. With a single API call to reasoning_engines.ReasoningEngine.create(), you can deploy your application to a scalable and secure environment. From there, Reasoning Engine takes care of the deployment, infrastructure, autoscaling, monitoring, and observability, which lets you get back to innovation and problem solving.
In a previous blog post on Function Calling in Gemini, we discussed a native framework within the Gemini model that can be used to turn natural language prompts into structured data and back again. Developers can use the Function Calling framework to define functions as tools that the Gemini model can use to connect to external systems and APIs to fetch real-time data that supplements the generative model's trained knowledge about the world.
If you want to work with the model, tools, and function components for simple use cases such as entity extraction, structured data outputs, or custom workflows with external APIs, then you probably want to stick with Function Calling.
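If you haven’t used Function Calling before, here’s a minimal sketch of that flow using the Vertex AI SDK for Python. The function name, parameter schema, and prompt below are illustrative, and PROJECT_ID is assumed to be your Google Cloud project ID:

import vertexai
from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Tool

vertexai.init(project=PROJECT_ID, location="us-central1")

# Describe the function so the model knows when and how to call it.
get_exchange_rate_func = FunctionDeclaration(
    name="get_exchange_rate",
    description="Get the exchange rate between two currencies",
    parameters={
        "type": "object",
        "properties": {
            "currency_from": {"type": "string", "description": "Base currency code"},
            "currency_to": {"type": "string", "description": "Target currency code"},
        },
    },
)

model = GenerativeModel(
    "gemini-1.5-pro",
    tools=[Tool(function_declarations=[get_exchange_rate_func])],
)

# Rather than free text, the model returns a structured function call
# that your application code can execute against the real API.
response = model.generate_content("What's the exchange rate from USD to SEK?")
print(response.candidates[0].content.parts[0].function_call)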
As you continue to build on top of the model and tools framework by adding more complex workflows, reasoning logic, and error handling to your generative AI application, you might find yourself getting lost in the data connections, retrievers, and orchestration layers and their associated configuration. This is when you know that you've reached the limitations of existing approaches for building and deploying AI agents.
There are many different ways to add more functionality to your generative AI application that uses an LLM to generate content. You might have developed a series of prompts or chained generative model requests to perform a task or set of tasks. Or maybe you've implemented a ReAct agent in LangChain. Or you might be developing on the cutting edge as you implement reflection agents or deploy multi-agent routers.
But when does your application code become an AI agent? How can you build your AI agent code in a modular, composable, and maintainable way rather than a monolithic bundle of confusing code? And how can you deploy your agent in a scalable and reliable way? In the following section, we’ll dive into the technical details of working with agents using LangChain on Vertex AI, which offers developers a streamlined approach to building and deploying production-ready AI agents.
Building and deploying agents with LangChain on Vertex AI involves four distinct layers, each catering to specific development needs: the generative model, the tools it can call, the reasoning logic that orchestrates them, and the managed deployment runtime.
Building custom generative AI applications with agentic capabilities often involves adding tools and functions on top of powerful generative models, such as Gemini. While prototyping is exciting, moving to production raises concerns about deployment, scaling, and management of these complex systems. This is where Vertex AI's Reasoning Engine comes in!
In this section, we’ll walk through the key steps of building, testing, and deploying your AI agent with LangChain on Vertex AI based on the sample notebook for building and deploying an agent with LangChain on Vertex AI. You can also go hands-on with the links and resources at the end of this blog post to get started yourself!
To start, we’ll need to define functions that Gemini will use as tools to interact with external systems and APIs to retrieve real-time information. With Reasoning Engine and the provided LangChain template, there’s no need to write an OpenAPI specification or represent your API call as an abstract function signature. Just write Python functions!
You can define functions to perform retrieval augmented generation (RAG) and retrieve indexed documents from a vector database based on a user query, as in:
def search_documents(query):
    """Searches a vector database for snippets in relevant documents"""
    from langchain_google_community import VertexAISearchRetriever

    retriever = VertexAISearchRetriever(
        project_id=PROJECT_ID,
        data_store_id=DATA_STORE_ID,
        location_id=LOCATION_ID,
        max_documents=100,
    )
    result = str(retriever.invoke(query))
    return result
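Because tools are plain Python functions, you can sanity-check them locally before handing them to an agent. Here, PROJECT_ID, DATA_STORE_ID, and LOCATION_ID are assumed to point at an existing Vertex AI Search data store, and the query text is illustrative:

# Quick local test of the retrieval tool before deployment.
print(search_documents("What is Reasoning Engine in Vertex AI?"))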
You can also define functions that go beyond traditional RAG and query APIs to retrieve information from external data sources in real time, as in:
def get_exchange_rate(currency_from, currency_to):
    """Retrieves the exchange rate between two currencies"""
    import requests

    response = requests.get(
        "https://api.frankfurter.app/latest",
        params={"from": currency_from, "to": currency_to},
    )
    return response.json()
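You can call this function directly to verify its behavior. The Frankfurter API returns a JSON payload along these lines (the date and rate shown are illustrative):

>>> get_exchange_rate("USD", "SEK")
{'amount': 1.0, 'base': 'USD', 'date': '2024-04-12', 'rates': {'SEK': 10.949}}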
You can even go well beyond RAG implementations and REST API calls to define functions that use OSS or custom Python libraries to perform various types of operations. For example, you might want to create a function that generates and sends a SQL query to BigQuery, searches for businesses using the Maps Places API, or downloads a file from Google Drive, as in:
def download_file_from_google_drive(file_id):
    """Downloads a file from Google Drive"""
    import io

    import google.auth
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaIoBaseDownload

    creds, _ = google.auth.default()
    service = build("drive", "v3", credentials=creds)
    request = service.files().get_media(fileId=file_id)
    file = io.BytesIO()
    downloader = MediaIoBaseDownload(file, request)
    # Download the file in chunks until the request completes.
    done = False
    while not done:
        _, done = downloader.next_chunk()
    return file.getvalue()
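One pattern worth noting in these examples: each function imports its dependencies inside the function body. Reasoning Engine serializes your functions and resolves these imports in the deployed environment, which is also why you specify the corresponding packages in the requirements list during the deployment step below.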
If you can represent it in a Python function, then you can provide it as a tool for your agent!
Once you’ve defined all of the functions that you want to include as tools in your AI agent, you can define an agent using our LangChain template:
agent = reasoning_engines.LangchainAgent(
    model=model,
    tools=[search_documents, get_exchange_rate, download_file_from_google_drive],
)
Note that the tools kwarg includes references to the functions that you defined earlier. The LangChain template in Reasoning Engine introspects each function's name, arguments, default argument values, docstring, and type hints so that it can pass all of this information as part of the tool description to the agent and the Gemini model.
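To see why that introspection matters, here’s an illustrative sketch (not from the original sample) of the same exchange rate tool with type hints and a structured docstring; the richer the signature and docstring, the better the model can decide when and how to call the tool:

def get_exchange_rate(
    currency_from: str = "USD",
    currency_to: str = "EUR",
) -> dict:
    """Retrieves the current exchange rate between two currencies.

    Args:
        currency_from: The base currency, as a three-letter code such as "USD".
        currency_to: The target currency, as a three-letter code such as "SEK".

    Returns:
        The raw JSON response from the exchange rate API.
    """
    import requests

    response = requests.get(
        "https://api.frankfurter.app/latest",
        params={"from": currency_from, "to": currency_to},
    )
    return response.json()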
We designed this LangChain template so that you can quickly get started out-of-the-box using default values. We also built the template so that you have maximum flexibility when customizing the layers of your agent: you can modify reasoning behavior, tune generative model parameters, swap out the default agent logic for another type of LangChain agent, or even swap out LangChain for an entirely different orchestration framework!
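For example, here’s a hedged sketch of a customized agent; model_kwargs and agent_executor_kwargs follow the parameters documented for the LangchainAgent template, and the specific values are illustrative:

agent = reasoning_engines.LangchainAgent(
    model=model,
    tools=[search_documents, get_exchange_rate, download_file_from_google_drive],
    # Tune the generative model without changing the agent logic.
    model_kwargs={"temperature": 0.2, "max_output_tokens": 1024},
    # Return the intermediate tool calls alongside the final answer.
    agent_executor_kwargs={"return_intermediate_steps": True},
)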
Now you’re ready to move on to the deployment step of productionizing your AI agent! Here, you specify the instance of the agent that you defined previously along with the set of Python packages and dependencies required for your agent:
remote_agent = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=[
        "google-cloud-aiplatform[reasoningengine,langchain]",
    ],
)
When deploying your agent with Reasoning Engine, there’s no need to add API routes via a web framework, no need for Docker images or containers, and no need for complicated deployment steps. And after a couple of minutes, your AI agent is deployed and ready to accept queries.
Now that you’ve deployed your agent with LangChain on Vertex AI, you can send a prompt to the remotely deployed agent using the following query:
>>> remote_agent.query(
...     input="What's the exchange rate from US dollars to Swedish currency today?"
... )
{'input': "What's the exchange rate from US dollars to Swedish currency today?",
 'output': 'Today, 1 US dollar is equal to 10.949 Swedish krona.'}
In this case, the Gemini model didn’t know the exchange rate based on its training data. Rather, our agent used the function that we defined to fetch the current exchange rate, passed that information back to the Gemini model, and Gemini was able to use that real-time information to generate a natural language summary!
Let's take a deeper look behind the scenes of this example query and break down what actions the AI agent took at runtime to go from the user’s input prompt to the output that contains a natural language summary of the answer:
1. The user submits a query to the deployed agent.
2. The agent sends the prompt, along with the descriptions of the available tools, to the Gemini model.
3. The Gemini model determines which function to call (get_exchange_rate) and which parameters to send as inputs to the function (the currencies that the user wants to know about).
4. The agent calls the function (get_exchange_rate) with the provided parameters.
5. The agent passes the function's API response back to the Gemini model.
6. The Gemini model uses that real-time information to generate the natural language summary that's returned to the user.

Once your agent is deployed as a Reasoning Engine endpoint in Vertex AI, you can run the following command to get the resource identifier for your remotely deployed agent:
>>> remote_agent_path = remote_agent.resource_name
>>> remote_agent_path
'projects/954731410984/locations/us-central1/reasoningEngines/8658662864829022208'
And now you can import and query the remotely deployed agent in a separate Python application using the Vertex AI SDK for Python, as in:
remote_agent = reasoning_engines.ReasoningEngine(remote_agent_path)
response = remote_agent.query(input=query)
Or, you can send queries to your remotely deployed agent using REST API calls from Python, cURL, or your preferred programming language.
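As a rough sketch of what such a REST call looks like from Python (the v1beta1 :query endpoint pattern and request body shape are assumptions based on the Vertex AI documentation at the time of writing; remote_agent_path is the resource name retrieved above):

import google.auth
import google.auth.transport.requests
import requests

# Obtain a bearer token from Application Default Credentials.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
creds.refresh(google.auth.transport.requests.Request())

url = f"https://us-central1-aiplatform.googleapis.com/v1beta1/{remote_agent_path}:query"
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {creds.token}"},
    # The body mirrors the keyword arguments of query().
    json={"input": {"input": "What's the exchange rate from US dollars to Swedish currency today?"}},
)
print(response.json())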
To start building and deploying agents with LangChain on Vertex AI, you can go hands-on with the developer resources linked throughout this post, including the sample notebook for building and deploying an agent with LangChain on Vertex AI.
By combining the power of LangChain and Vertex AI, developers can use generative models to build intelligent agents that can tackle complex real-world tasks and autonomous workflows.
We’re excited to see what kinds of intelligent, agentic applications you build with Reasoning Engine and LangChain on Vertex AI. Happy coding!