
Building and Deploying AI Agents with LangChain on Vertex AI

annawhelan
Staff


Overview

The rise of generative AI models, such as the Gemini 1.5 Pro model, has opened exciting possibilities for building intelligent agents capable of complex tasks. AI agents enable autonomous behavior by using generative models and external tools to perceive their environment, make decisions, and take actions to achieve goals. But the reality of building generative AI applications and AI agents is that they require significant time and upkeep to manage the underlying infrastructure and boilerplate code.

LangChain on Vertex AI (Reasoning Engine) is a managed service in Vertex AI that provides a runtime environment for deploying agents built with any orchestration framework, including LangChain. Reasoning Engine abstracts away complexities such as deployment, scaling, and monitoring, which allows developers to focus on the core logic and capabilities within their agents.

In this blog post, we’ll walk through how LangChain on Vertex AI simplifies the complexities of deploying and managing your AI agents. With a single API call to reasoning_engines.ReasoningEngine.create(), you can deploy your application to a scalable and secure environment. Then, Reasoning Engine takes care of the deployment, infrastructure, autoscaling, monitoring, and observability, which lets you get back to innovation and problem solving.

Background on generative models and tools

In a previous blog post on Function Calling in Gemini, we discussed a native framework within the Gemini model that can be used to turn natural language prompts into structured data and back again. Developers can use the Function Calling framework to define functions as tools that the Gemini model can use to connect to external systems and APIs to fetch real-time data that supplements the generative model's trained knowledge about the world.
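
For a concrete sense of that framework, here's a minimal Function Calling sketch (not from the original post; it assumes the vertexai SDK is installed, vertexai.init() has been called, and that the "gemini-1.5-pro" model name is available in your project):

from vertexai.generative_models import (
    FunctionDeclaration,
    GenerativeModel,
    Tool,
)

# Describe a function so Gemini can decide when to call it and with what arguments.
get_weather_func = FunctionDeclaration(
    name="get_current_weather",
    description="Gets the current weather for a given location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
    },
)

model = GenerativeModel(
    "gemini-1.5-pro",  # assumed model name; use the version available to you
    tools=[Tool(function_declarations=[get_weather_func])],
)
response = model.generate_content("What's the weather like in Stockholm?")

# When the model decides the tool is needed, it returns a structured function
# call (name + arguments) instead of free-form text.
print(response.candidates[0].content.parts[0].function_call)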

If you want to work with the model, tools, and function components for simple use cases such as entity extraction, structured data outputs, or custom workflows with external APIs, then you probably want to stick with Function Calling.

As you continue to build on top of the model and tools framework by adding more complex workflows, reasoning logic, and error handling to your generative AI application, you might find yourself getting lost in the data connections, retrievers, and orchestration layers and their associated configuration. This is when you know that you've reached the limitations of existing approaches for building and deploying AI agents.

Challenges when going from model to agent

There are many different ways to add more functionality to your generative AI application that uses an LLM to generate content. You might have developed a series of prompts or chained generative model requests to perform a task or set of tasks. Or maybe you've implemented a ReAct agent in LangChain. Or you might be developing on the cutting edge as you implement reflection agents or deploy multi-agent routers.

But when does your application code become an AI agent? How can you build your AI agent code in a modular, composable, and maintainable way rather than a monolithic bundle of confusing code? And how can you deploy your agent in a scalable and reliable way? In the following section, we’ll dive into the technical details of working with agents using LangChain on Vertex AI, which offers developers a streamlined approach to building and deploying production-ready AI agents.

What’s in an agent? Key components in LangChain on Vertex AI

Building and deploying agents with LangChain on Vertex AI involves four distinct layers, each catering to specific development needs; a short code sketch after the list shows how these layers map onto the SDK.

  • Model (Gemini model): This layer handles content generation, understanding and responding to user queries in natural language, and summarizing information.
  • Tools (Gemini Function Calling): This layer allows your agent to interact with external systems and APIs, enabling it to perform actions beyond just generating text or images.
  • Reasoning (LangChain): This layer organizes your application code into functions, defining configuration parameters, initialization logic, and runtime behavior. LangChain simplifies LLM application development by providing the building blocks for generative AI applications, and developers maintain control over crucial aspects like custom functions, agent behavior, and model parameters.
  • Deployment (Reasoning Engine): This Vertex AI service hosts your AI agent and provides benefits such as security, observability, and scalability. Reasoning Engine is compatible with LangChain or any open-source framework to build customizable agentic workflows.
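
As a compact sketch (assuming the vertexai.preview namespace used by the SDK at the time of writing and an illustrative Gemini model name), here is roughly how the four layers map onto code; the rest of this post walks through each step in detail:

from vertexai.preview import reasoning_engines

# Tools layer: a plain Python function exposed to the model via Function Calling.
def get_exchange_rate(currency_from: str, currency_to: str):
    """Retrieves the exchange rate between two currencies"""
    ...

# Model + Reasoning layers: Gemini wrapped in the LangChain agent template.
agent = reasoning_engines.LangchainAgent(
    model="gemini-1.5-pro",    # Model layer
    tools=[get_exchange_rate], # Tools layer (Function Calling)
)

# Deployment layer: host the agent on Reasoning Engine.
remote_agent = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=["google-cloud-aiplatform[reasoningengine,langchain]"],
)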

Building custom generative AI applications with agentic capabilities often involves adding tools and functions on top of powerful generative models, such as Gemini. While prototyping is exciting, moving to production raises concerns about deployment, scaling, and management of these complex systems. This is where Vertex AI's Reasoning Engine comes in!

Building and deploying an AI agent with LangChain on Vertex AI

In this section, we’ll walk through the key steps of building, testing, and deploying your AI agent with LangChain on Vertex AI based on the sample notebook for building and deploying an agent with LangChain on Vertex AI. You can also go hands-on with the links and resources at the end of this blog post to get started yourself!

1. Define your functions

To start, we’ll need to define functions that Gemini will use as tools to interact with external systems and APIs to retrieve real-time information. With Reasoning Engine and the provided LangChain template, there’s no need to write up an OpenAPI specification or represent your API call as an abstract function signature: just write Python functions!

You can define functions to perform retrieval augmented generation (RAG) and retrieve indexed documents from a vector database based on a user query, as in: 

def search_documents(query):
    """Searches a vector database for snippets in relevant documents"""
    from langchain_google_community import VertexAISearchRetriever

    retriever = VertexAISearchRetriever(
        project_id=PROJECT_ID,
        data_store_id=DATA_STORE_ID,
        location_id=LOCATION_ID,
        max_documents=100,
    )

    result = str(retriever.invoke(query))
    return result

You can also define functions that go beyond traditional RAG and query APIs to retrieve information from external data sources in real time, as in:

def get_exchange_rate(currency_from, currency_to):
    """Retrieves the exchange rate between two currencies"""
    import requests

    # Query the Frankfurter API for the latest rate between the two currencies.
    response = requests.get(
        "https://api.frankfurter.app/latest",
        params={"from": currency_from, "to": currency_to},
    )
    return response.json()

You can even go well beyond RAG implementations and REST API calls to define functions that use OSS or custom Python libraries to perform various types of operations. For example, you might want to create a function that generates and sends a SQL query to BigQuery, searches for businesses using the Maps Places API, or downloads a file from Google Drive, as in: 

def download_file_from_google_drive(file_id):
    """Downloads a file from Google Drive"""
    import io

    import google.auth
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaIoBaseDownload

    # Authenticate with Application Default Credentials and build the Drive client.
    creds, _ = google.auth.default()
    service = build("drive", "v3", credentials=creds)

    # Stream the file contents into an in-memory buffer.
    request = service.files().get_media(fileId=file_id)
    file = io.BytesIO()
    downloader = MediaIoBaseDownload(file, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()

    return file.getvalue()

If you can represent it in a Python function, then you can provide it as a tool for your agent!

2. Define your agent

Once you’ve defined all of the functions that you want to include as tools in your AI agent, you can define an agent using our LangChain template: 

model = "gemini-1.5-pro"  # Gemini model name; adjust to the model version available in your project

agent = reasoning_engines.LangchainAgent(
    model=model,
    tools=[search_documents, get_exchange_rate, download_file_from_google_drive],
)

Note that the tools kwarg includes references to the functions that you defined earlier. The LangChain template in Reasoning Engine introspects each function's name, arguments, default argument values, docstring, and type hints, and passes all of this information as part of the tool description to the agent and the Gemini model.
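
For example, a richer version of the exchange-rate tool (an illustration, not from the original notebook) with type hints and a structured docstring gives the template more information to surface to the model:

def get_exchange_rate(
    currency_from: str = "USD",
    currency_to: str = "SEK",
) -> dict:
    """Retrieves the exchange rate between two currencies.

    Args:
        currency_from: The base currency code, e.g. "USD".
        currency_to: The target currency code, e.g. "SEK".

    Returns:
        A dictionary with the latest exchange rate information.
    """
    import requests

    response = requests.get(
        "https://api.frankfurter.app/latest",
        params={"from": currency_from, "to": currency_to},
    )
    return response.json()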

We designed this LangChain template so that you can get started quickly out-of-the-box using default values. We also built it for maximum flexibility when customizing the layers of your agent: you can modify reasoning behavior and generative model parameters, swap out the default agent logic for another type of LangChain agent, or even swap out LangChain for an entirely different orchestration framework!
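
For instance, here's a hedged sketch of tuning the generative model parameters when constructing the agent (the model_kwargs keyword argument is assumed from the Reasoning Engine customization docs):

agent = reasoning_engines.LangchainAgent(
    model="gemini-1.5-pro",
    tools=[get_exchange_rate],
    # Generation parameters passed through to the Gemini model.
    model_kwargs={"temperature": 0.2, "max_output_tokens": 1024},
)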

3. Deploy your agent

Now you’re ready to move on to the deployment step of productionizing your AI agent! Here, you specify the instance of the agent that you defined previously along with the set of Python packages and dependencies required for your agent: 

remote_agent = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=[
        "google-cloud-aiplatform[reasoningengine,langchain]",
    ],
)

When deploying your agent with Reasoning Engine, there’s no need to add API routes via a web framework, no need for Docker images or containers, and no need for complicated deployment steps. And after a couple of minutes, your AI agent is deployed and ready to accept queries.

Interacting with your deployed AI agent: From prompt to response

Now that you’ve deployed your agent with LangChain on Vertex AI, you can send a prompt to the remotely deployed agent using the following query: 

>>> remote_agent.query(
    input="What's the exchange rate from US dollars to Swedish currency today?"
)

{'input': "What's the exchange rate from US dollars to Swedish currency today?",
 'output': 'Today, 1 US dollar is equal to 10.949 Swedish krona.'}

In this case, the Gemini model didn’t know the exchange rate based on its training data. Rather, our agent used the function that we defined to fetch the current exchange rate, passed that information back to the Gemini model, and Gemini was able to use that real-time information to generate a natural language summary!

Let's take a deeper look behind the scenes of this example query and break down what actions the AI agent took at runtime to go from the user's input prompt to the output that contains a natural language summary of the answer (a sketch for inspecting these steps in your own agent follows the list):

  1. User submits a query: The user sends an input prompt asking about currency exchange rates between two different currencies.
  2. Send query and tools to model: The agent packages the query with tool descriptions and sends it to the Gemini model.
  3. Model decides on tool usage: Based on the query and tool descriptions, the Gemini model decides whether to utilize a specific function (get_exchange_rate) and which parameters to send as inputs to the function (the currencies that the user wants to know about).
  4. Application calls the tool: The application executes the model’s instructions by calling the appropriate function (get_exchange_rate) with the provided parameters.
  5. Tool results: The application receives a response from the tool (an API response payload).
  6. Return results to model: The application sends the API response payload to the model.
  7. Return results to agent: The agent interacts with the model to understand the observation based on the response.
  8. Agent determines next steps: This process repeats if the agent determines additional tool calls are necessary or if the agent should prepare a final response to send to the user.
  9. Model generates response: Based on the results from the external API and the agent iterations, the model then generates a natural language response for the user that contains the latest currency exchange rate information.
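
If you want to observe this loop in your own agent, one option (a sketch that assumes the template passes agent_executor_kwargs through to LangChain's AgentExecutor) is to ask the agent to return its intermediate tool calls along with the final answer:

agent = reasoning_engines.LangchainAgent(
    model="gemini-1.5-pro",
    tools=[get_exchange_rate],
    # Assumed pass-through to LangChain's AgentExecutor.
    agent_executor_kwargs={"return_intermediate_steps": True},
)

response = agent.query(input="What's the exchange rate from USD to SEK today?")
# In addition to the final answer, the response should now include the tool
# calls and tool outputs corresponding to steps 3-7 above.
print(response["intermediate_steps"])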

Once your agent is deployed as a Reasoning Engine endpoint in Vertex AI, you can run the following command to get the resource identifier for your remotely deployed agent: 

>>> remote_agent_path = remote_agent.resource_name
>>> print(remote_agent_path)
projects/954731410984/locations/us-central1/reasoningEngines/8658662864829022208

And now you can import and query the remotely deployed agent in a separate Python application using the Vertex AI SDK for Python, as in: 

remote_agent = reasoning_engines.ReasoningEngine(remote_agent_path)
response = remote_agent.query(input=query)

Or, you can send queries to your remotely deployed agent using REST API calls from Python, cURL, or your preferred programming language.
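
As a hedged sketch of the REST path from Python (the :query method name and request body shape here are assumptions based on the Vertex AI REST API naming; check the Reasoning Engine docs for the exact surface):

import google.auth
import google.auth.transport.requests
import requests

# Obtain an access token via Application Default Credentials.
creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(google.auth.transport.requests.Request())

# remote_agent_path is the resource name retrieved in the previous step.
url = f"https://us-central1-aiplatform.googleapis.com/v1beta1/{remote_agent_path}:query"
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {creds.token}"},
    json={"input": {"input": "What's the exchange rate from US dollars to Swedish currency today?"}},
)
print(response.json())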

Benefits of LangChain on Vertex AI and Reasoning Engine

  • Simplified development: LangChain on Vertex AI streamlines agent development with its modular components and intuitive API built from the ground up for creating and deploying AI agents.
  • Flexibility and control: Developers maintain control over critical aspects of agent behavior and functionality at all of the relevant layers underneath your AI agent.
  • Production-ready deployment: Vertex AI's Reasoning Engine handles the complexities of deployment, scaling, and management.
  • Security and scalability: Vertex AI provides a secure and scalable environment for running agents in production.

Start building AI agents with LangChain on Vertex AI

To start building and deploying agents with LangChain on Vertex AI, you can go hands-on with the following developer resources:

By combining the power of LangChain and Vertex AI, developers can use generative models to build intelligent agents that can tackle complex real-world tasks and autonomous workflows.

We’re excited to see what kinds of intelligent, agentic applications you build with Reasoning Engine and LangChain on Vertex AI. Happy coding!

30 Comments
hopeclip918
Bronze 1

This introduces a solution that simplifies deployment by combining LangChain and Vertex AI. Kudos for advancing AI.

Yravikrishna
Bronze 1

Excellent article! 

Sudhakar692
Bronze 1

This article is really great, especially for beginners like me. I feel motivated and eager to learn more about GenAI with LangChain and Vertex AI.

Please share a roadmap or cloud journey path.

abdullahi4-tech
Bronze 5

Excellent tooling from Vertex AI. Kudos for advancing AI.

emerworth
Bronze 3

Great article!

I've been working with the Reasoning Engine for a while now, and I came across an issue when the query requires making two tool calls. I've modified the example input to demonstrate how to execute two tool calls.

If you run:

agent.query(input="What's the exchange rate from US dollars to Swedish and from US dollars to AUD currency today?")

You will obtain:

This model can reply with multiple function calls in one response. Please don't rely on `additional_kwargs.function_call` as only the last one will be saved. Use `tool_calls` instead.

---------------------------------------------------------------------------
_MultiThreadedRendezvous                  Traceback (most recent call last)
File ~/Documents/GitHub/langchain-finance-bot/.venv/lib/python3.11/site-packages/google/api_core/grpc_helpers.py:170, in _wrap_stream_errors.<locals>.error_remapped_callable(*args, **kwargs)
    169     prefetch_first = getattr(callable_, "_prefetch_first_result_", True)
--> 170     return _StreamingResponseIterator(
    171         result, prefetch_first_result=prefetch_first
    172     )
    173 except grpc.RpcError as exc:

File ~/Documents/GitHub/langchain-finance-bot/.venv/lib/python3.11/site-packages/google/api_core/grpc_helpers.py:92, in _StreamingResponseIterator.__init__(self, wrapped, prefetch_first_result)
     91     if prefetch_first_result:
---> 92         self._stored_first_result = next(self._wrapped)
     93 except TypeError:
     94     # It is possible the wrapped method isn't an iterable (a grpc.Call
     95     # for instance). If this happens don't store the first result.

File ~/Documents/GitHub/langchain-finance-bot/.venv/lib/python3.11/site-packages/grpc/_channel.py:543, in _Rendezvous.__next__(self)
    542 def __next__(self):
--> 543     return self._next()

File ~/Documents/GitHub/langchain-finance-bot/.venv/lib/python3.11/site-packages/grpc/_channel.py:969, in _MultiThreadedRendezvous._next(self)
    968 elif self._state.code is not None:
--> 969     raise self

_MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
...
    172     )
    173 except grpc.RpcError as exc:
--> 174     raise exceptions.from_grpc_error(exc) from exc

InvalidArgument: 400 Request contains an invalid argument.

The issue arises because the agent executes two tool calls. In the notebook example, you'll only see one tool call following this flow:
query -> agent response -> tool call (get_curr...) -> tool execution -> agent response -> response.

However, in the prompt I modified, the agent behaves differently:
query -> agent response -> tool call (get_curr...) -> tool execution -> second tool call (get_curr...) -> tool execution -> agent response -> response

I encountered this issue while building my own agent and was able to replicate it in this notebook.

I tried to fix it myself, but unfortunately, I couldn't find a solution. Can you help?

koverholt
Staff

Hi  @emerworth!

Thanks for letting us know about this issue. It seems like that warning is coming up to help ensure that Reasoning Engine captures 2 or more function call responses in the event that Gemini Function Calling returns multiple, parallel function calls. I opened https://issuetracker.google.com/issues/344921847 based on the information you provided, and please feel free to add additional context to that issue. Thanks!

Hugo2
Bronze 1

Great post, thank you! This is a very useful service. Can it be extended to OpenAI models?

koverholt
Staff

Hi @Hugo2, good question! The Reasoning Engine service has two main developer paths. One uses the LangChain template and lets you work with the Gemini API in Vertex AI, among other models in the Vertex AI Model Garden. The other path lets you define custom classes to be deployed, from which you can call out to any LangChain functionality (or other frameworks) or any different models that you want.

Hugo2
Bronze 1

thank you! @koverholt 

fconrady
Bronze 1

@koverholt What's the difference between deploying a LangChain chain in Vertex AI Reasoning Engine vs. putting it into a Google Cloud function? What does the Vertex AI Reasoning Engine give me, that the Google Cloud function doesn't? 

trk
Bronze 1

Thank you for sharing this insightful piece! However, I wanted to bring up a concern that many in the community, including myself, have been experiencing with LangChain in production. Frequent upgrades and changes have led to some challenges in maintaining stability and performance in production environments.

Given that Vertex AI Reasoning Engine is integrated with LangChain, I'm curious 🤔 about how Google is addressing these issues. Will adopting Vertex AI Reasoning Engine help mitigate these challenges, or should we expect similar production concerns?

Your thoughts and any guidance on this would be greatly appreciated as we evaluate the best approach moving forward.

koverholt
Staff

@fconrady: Good question! There are many ways to deploy LangChain apps; the focus of Reasoning Engine is convenience: ease of deployment with the Vertex AI SDK, building an agent using the LangChain template (or a different framework), deploying an agent with one command, and querying agents using the Vertex AI SDKs in various languages, and we've recently added tracing and logging of deployed agents. You can handle all of those details yourself in any other deployment tool; think of Reasoning Engine as a convenient way to handle those build, deploy, and manage tasks. The best way to see if it's a good fit for you is to try it out and compare it to your existing process!

@trk: Thanks for sharing your concern! I can definitely relate to the churn and fast pace that you mentioned, not only in LangChain but in generative AI technologies in general. In Reasoning Engine, you can work directly with the LangChain template to customize your agent, or you can use any Python framework you'd like with a custom agent.

In either case there are no hidden dependencies to deal with: you control exactly the versions of LangChain, the Vertex AI SDK, and other dependencies that your agent needs to run. We track and pin versions of langchain, langchain-community, and langchain-google-vertexai and test them together when you install a given version of the google-cloud-aiplatform[reasoningengine,langchain] Python package with extras. And of course you can break out and pin versions of libraries in your agent as specifically as you need so they don't change behind the scenes.

This kind of stability for deploying and operating your agents is exactly the scenario we are trying to solve for, based on our experiences working with developers to deploy and manage ML apps, and it's a similar approach to any deployed app that depends on libraries that are always changing. Finally, we work very closely with the LangChain developer and user communities to make sure we're not only fixing things along the way, but that we're handling agents and integrations the right way when it comes to best practices for runnables, memory, agent settings, prompt templates, etc.

oloUser
Bronze 3

Can you build a RAG with both Vertex AI Search and Google Search within Reasoning Engine? 

koverholt
Staff

Hi @oloUser, and thanks for the great question!

You can definitely build a RAG agent that uses both Vertex AI Search and Google Search with Reasoning Engine. There are multiple ways to build this kind of agent pipeline: I would start by defining a tool that retrieves documents from Vertex AI Search as shown in this sample notebook, then within that same notebook you could define a second tool that uses Gemini's grounding in Google Search results, as in this other sample notebook.

From there you could determine if you want to combine them into a single tool (to fetch results from both sources at once), add a tool to combine the results, or continue from there depending on your use case. Hope that helps!

oloUser
Bronze 3

Hi @koverholt, I am trying to make Google Search work with Reasoning Engine. I was successful at building RAG and grounding it in Google Search with Reasoning Engine. When I test the agent "locally" in Colab, I get the desired results. However, when trying to deploy it, I get the following error:

TypeError: no default __reduce__ due to non-trivial __cinit__
Hypothesis: The error message "TypeError: no default reduce due to non-trivial cinit" usually arises when you try to pickle (serialize) an object that contains C extensions. In this case, the grpc.Channel object within the LangchainAgent is likely causing the issue. Cloudpickle, which is used to serialize the reasoning_engine object, cannot handle the C extensions within the gRPC channel. 
 
Unfortunately, I am unable to fix it. I would greatly appreciate your help trying to find a path forward. Thank you!
 
koverholt
Staff

Hi @oloUser! Take a look at the troubleshooting steps listed on this docs page (and adjacent pages): https://cloud.google.com/vertex-ai/generative-ai/docs/reasoning-engine/troubleshooting/deploy

This type of serialization error can usually be fixed via one of the following two solutions mentioned on that page:

  • Dirty state on LangchainAgent. Instantiate a fresh instance of the LangchainAgent or remove agent.set_up() from the code before deploying to Reasoning Engine.
  • Inconsistent package specs. See the section on troubleshooting serialization errors. [In particular, check that your versions of cloudpickle and pydantic match up with the recommended versions in those troubleshooting docs and/or the sample notebooks].

If you continue to run into errors, please file a bug on the Vertex AI issue tracker so we can get more information about your development environment and dig into the issue a bit deeper.

oloUser
Bronze 3

Hi @koverholt :

Thank you for your advice! I spent two days going through all the checks that you suggested with no success.

I forgot to mention that I was able to successfully deploy RAG to Reasoning Engine with a Vertex AI Vector Search as a retrieval tool. But when I try to deploy Google Search as a tool, I get the error. Here is the link to my code: https://github.com/aniebyl/langchain/blob/master/reasoning_engine_rag_langchain_agent_embedded_in_go... 

Since none of the sample notebooks have an example using Google Search as a tool in RAG with Reasoning Engine, I am starting to suspect that this may not be allowed. Any thoughts on that?

Also, I have already filed a bug in the tracker - thank you!

koverholt
Staff

@oloUser, got it, thanks for going through the basic troubleshooting steps and sharing your code! The use of Google Search as a tool in RAG with Reasoning Engine is definitely possible, but it might be introducing one of the following serialization scenarios that need to be handled.

In the screenshot that you posted, I see that a notebook cell was run between cell 20 and cell 22, which can lead to the "Dirty state on LangchainAgent" scenario in the troubleshooting docs linked above. In that case, you should initialize the agent and deploy a freshly initialized version of it.

Or, if the deployment is still failing with serialization errors, check that the way you are using the retrievers is not setting up a stateful client (which would also fail to serialize); if so, you'll need to move the client initialization to the set_up() method within your agent class.
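
For reference, a minimal sketch of that pattern (the class and configuration values here are illustrative, not from the sample notebooks): only pickle-friendly configuration lives in __init__, and the stateful retriever client is created in set_up(), which runs after deployment:

class SearchAgent:
    """Illustrative custom agent that defers stateful client creation to set_up()."""

    def __init__(self, project_id: str, data_store_id: str):
        # Store only serializable configuration at construction time.
        self.project_id = project_id
        self.data_store_id = data_store_id
        self.retriever = None

    def set_up(self):
        # Stateful clients (gRPC channels, retrievers) are created here, after
        # deployment, so cloudpickle never has to serialize them.
        from langchain_google_community import VertexAISearchRetriever

        self.retriever = VertexAISearchRetriever(
            project_id=self.project_id,
            data_store_id=self.data_store_id,
            location_id="global",
            max_documents=10,
        )

    def query(self, input: str):
        return str(self.retriever.invoke(input))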

Whereas in the notebook code you shared (thanks for that!), I see a different permissions error happening when the deployed agent is trying to access the data store in Vertex AI Search. In that case, you'll want to make sure that the Google-managed service account for Reasoning Engine has permissions to access that data store per the notebook cell on "Grant Discovery Engine Editor access to Reasoning Engine service account". Hopefully one or more of those tips helps, otherwise we can follow up in the bug that you filed for more information. Thanks!

oloUser
Bronze 3

@koverholt  It worked! I really appreciate you taking time to guide me through the troubleshooting. You're the best! 

FYI, it was a combination of two things:

1) Testing the agent locally and then reinitializing it to avoid a dirty state on LangchainAgent before deployment did not work. I had to restart the session and skip local testing altogether to make sure the agent was in its initial state.

2) The set_up() method took care of the serialization error.

Since I am developing in LangGraph, I used your other notebook (https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/reasoning-engine/tutorial_lang... ) and added a Google Search and Vertex AI Vector Search tools to it.

Also, I checked and it looks like I already gave service-PROJECT_ID@gcp-sa-aiplatform-re.iam.gserviceaccount.com service account access to Discovery Engine as Editor.

Have a great day!

Aleks

koverholt
Staff

@oloUser, that's great to hear, thanks for letting us know how things turned out! I wonder if restarting the session solved a different issue like dirty state or Python package versions of things that were installed but not active in the session yet.

And I'm happy to hear that the LangGraph notebook example for Reasoning Engine was helpful for you in figuring out a clean way to handle tool interoperability with Vertex AI Search RAG + Grounding in Google Search. I had a similar epiphany this week when implementing a ReAct agent with LangGraph as described in this tutorial, and I was able to make use of tools + general Gemini knowledge in a way that had not been as straightforward to figure out using only the Vertex AI SDK + LangChain, similar to your "two tools" problem. Stay tuned for more content from us about using LangGraph in Reasoning Engine and Vertex AI! 😀

kingychiu
Bronze 1

Hi @koverholt , Thank you so much for the post.

We have successfully deployed the local LangChain Vertex AI agent with a Firebase function. Now I am trying to use the deployed version of the agent. I have a few questions to ask.

Local Langchain Vertexai Agent: https://cloud.google.com/vertex-ai/generative-ai/docs/reasoning-engine/develop

Deployed: https://cloud.google.com/vertex-ai/generative-ai/docs/reasoning-engine/deploy

Q1. How are environment variables handled? For example, in my local Langchain Vertexai Agent (on firebase function), I am setting the GCP Project ID to the dev, staging, prod GCP project. How can we do this with the deployed Langchain Vertexai Agent?

Q2. Similar to Q1, what is the recommended way to access GCP Secret Manager with the deployed version? I am using the firebase function secret parameter in my current firebase function.

Q3. We have some custom tools that require a setup script, for example `playwright install`. How can we do this with the deployed version?

Q4. About 

  • extra_packages: A list of internal package dependencies. These package dependencies are local files or directories that correspond to the local Python packages required by the application.

Does it mean local pip packages? Or import statements like `from utils import libs`?

If it means import statements, do we put everything in 1 file for deployment?

Sorry for asking so many questions. To sum up, I think the confusion comes from the gap between the local and deployed versions, which seems very large/unclear. Will the scope/variables in memory get deployed as well (like the env variables question I was asking)? How will the code be packaged? The process currently seems too magical.

Also, what is the benefit of using the deployed version over deploying the local version to a Firebase function / Cloud Function / Cloud Run?

nadav_w
Bronze 3

Great article and a step up in the offering for building GenAI applications based on agents.

Regarding costs
In the notebook intro_reasoning_engine.ipynb it's mentioned:
This tutorial uses billable components of Google Cloud: Vertex AI

I assume this depends on the model selected and, as mentioned in the text, the pricing is based on characters in/out.

  1. What about the costs of the tools' execution?
  2. Are there other hidden costs of the LangChain agent?
koverholt
Staff

@kingychiu, all good questions! So that it's easier to keep track of questions and answers for other folks, could you post this as a new discussion topic on the AI/ML forums here (https://www.googlecloudcommunity.com/gc/AI-ML/bd-p/cloud-ai-ml)? You can tag me there so that we can dig into the answers and discussion from there. Thanks!

koverholt
Staff

@nadav_w, thanks for the question on billable components! Essentially there are usage-based costs on the hosted agent endpoint in Reasoning Engine, similar to Cloud Functions or other serverless compute options, and then there are usage-based costs for calls to the Gemini API for content generation & function calling (both of which are metered by token counts). Currently Reasoning Engine is in Public Preview stage, and when it goes to GA stage, full details on pricing will be posted in its documentation.

DanieleV
Bronze 1

Hi @koverholt , Thank you so much for the post.

I was wondering, since it is possible to assign multiple tools to the agent we created, is there any way to help the agent choose the correct tool to use?
Is it possible instead to use LangChain to orchestrate multiple agents working simultaneously? For example, one agent has the task of converting between currencies of different countries and another converts time zones between different countries, and we want to create a structure where, depending on the request, the appropriate agent responds. Is there any documentation on this?

Thank you very much for your availability

koverholt
Staff

@DanieleV: Great question! This is a quickly evolving field and I appreciate you asking about this. To handle (and improve) use cases with multiple tools and agents, I suggest the following (from simplest to more complex):

When defining your Python functions as tools that you use with Reasoning Engine, the more detail you give in your function names, parameter type hints, and docstrings (including few-shot or many-shot examples of tool invocations that you consider good or bad), the more heavily you will influence how and when your agent predicts the use of a given tool and its associated parameters.

Beyond Reasoning Engine, you can look at what's behind the LangChain template in the customization section of the Reasoning Engine docs. I also suggest looking at or trying the implementation of Tool Calling Agents in LangChain and/or LangGraph Tool Calling to understand the performance differences based on how much of the "function calling reasoning" you want to defer to the LLM layer (LangChain & LangGraph approach) vs. the function calling layer of the LLM (Function Calling / Reasoning Engine approach).

Finally, the only way to know quantitatively which approach is working better than others is through evaluation frameworks like the Gen AI Evaluation Service in Vertex AI, and you can find lots of sample notebooks of that here in the generative-ai repo on GitHub.

nadav_w
Bronze 3

Hi @koverholt ,

Can you explain the deployment of the Reasoning Engine and the difference between this approach and the LangServe + Cloud Run approach presented in this blog?

DanieleV
Bronze 1

Hi @koverholt

Thanks for the quick response. You probably already explained this, but could you clarify again the advantages of a custom approach with LangChain and the Reasoning Engine instead of Agent Builder? Is it just a question of which framework is better, or can we have cost and performance advantages?

Thanks again

emerworth
Bronze 3

Hi @koverholt,

I wanted to inquire about the possibility of implementing streaming with the Reasoning Engine in the future. Currently, I’m developing a chatbot using RE, but we need to stream the responses to enhance the user experience.

koverholt
Staff

@nadav_w, thanks for the question, this is a common one! In the end, solutions such as Reasoning Engine, LangServe + Cloud Run, and other deployment options are all just different ways of deploying and hosting your agent as Python code. Some developers prefer to work directly with Cloud Run, while other developers prefer to work at a higher-level abstraction such as LangServe on Cloud Run or Reasoning Engine. If I'm starting with the agent and building the app around it, I appreciate starting with Reasoning Engine or LangServe. If I'm starting with an app that does more than just interact with an agent, I'll typically start from Cloud Run. And as your agent and app grow in complexity, you can switch between approaches to make the app / agent more modular and maintainable!

@DanieleV, good question! The answer here is similar to the previous question on deployment, but this time it focuses on the developer's experience building the agent rather than deploying it. Reasoning Engine, LangChain, and Agent Builder are just different ways of constructing agents at different abstraction levels. If you spend most of your day at the LangChain layer, it might make sense to just use LangChain or LangGraph directly in your code. If you spend most of your day working with Google Cloud SDKs and APIs such as Vertex AI, then you might find Reasoning Engine the easiest to work with. Or if you want to quickly prototype an agent that matches up with the chatbot + RAG approach, then Agent Builder is a good starting point. I often prototype simple versions of agents in two or more of these tools to get a feel for which approach will work best for a given use case.

@emerworth, yes! This has been a common feature request in Reasoning Engine and is being worked on. Feel free to open a new feature request on the public issue tracker and point me to it. That way we can learn more about your use case and let you know when it's ready for testing / usage!