This blog was co-authored with Julia Wiesinger, ADK Product Manager at Google Cloud.
TL;DR: Build AI agents with long-term memory using Vertex AI Memory Bank and the Agent Development Kit (ADK). This guide shows you how to create stateful, personalized experiences.
When building AI agents for production, many use cases require you to solve the memory problem. Without it, agents that should be truly helpful become "digital goldfish," treating every interaction as the first. A common workaround is to use the LLM's context window as a makeshift memory, but this approach is unsustainable. It's expensive, inefficient, and leads to issues like "lost in the middle" and "context rot": a rapid decline in output quality as the growing context becomes diluted with irrelevant details.
Hitting the token window limit, combined with these issues, presents a core challenge when developing AI agents. True agent memory requires more than just storing facts; it needs the ability to intelligently forget. This is precisely where Vertex AI Agent Engine Memory Bank excels. As a managed service, it provides persistent memory for agents, enabling more natural, contextual, and continuous user engagements.
Vertex AI Memory Bank is a fully managed service that provides persistent, long-term memory for AI agents, moving beyond the limitations of an LLM's context window. If you are building AI agents, it offers several key advantages, from personalization to continuity across sessions.
To dive deeper into Memory Bank's capabilities and explore practical use cases, explore the official blog or get started with the SDK examples.
Memory Bank simplifies the implementation of persistent, long-term memory for your agents through a straightforward API and SDK. It integrates natively with the Agent Development Kit (ADK) and supports popular frameworks such as LangGraph and CrewAI.
Here’s the practical workflow, matching each step with its corresponding SDK method:
Your agent creates new memories by having Memory Bank automatically extract facts from conversation history at the user level. This asynchronous process ensures your agent experiences no delays, maintaining responsiveness. To explicitly create a single memory, you can use create_memory() to add a specific fact to the user's collection. Most commonly, you will use generate_memories() to analyze a conversation history and have Gemini extract and store salient facts, as shown below.
# pip install "google-cloud-aiplatform>=1.100.0"
import vertexai
# Initialize client
client = vertexai.Client(
project="your-project",
location="your-region",
)
# Create Agent Engine if not exists
agent_engine = client.agent_engines.create()
# Extract memories from a conversation event
operation = client.agent_engines.generate_memories(
name=agent_engine.api_resource.name,
direct_contents_source={"events": [{"content": {
"role": "user",
"parts": [{"text": "This is a user conversation"}],
}}]},
scope={"user_id": "123"},
)
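If you want to store a single fact explicitly rather than extracting it from a conversation, you can call create_memory() instead. The following is a minimal sketch that reuses the client and agent_engine from the snippet above; the exact parameter names (such as fact) may vary by SDK version, so check the reference documentation.

# Explicitly store one fact for this user (sketch; verify parameter names in your SDK version)
client.agent_engines.create_memory(
    name=agent_engine.api_resource.name,
    fact="The user works as an agent engineer",
    scope={"user_id": "123"},
)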
Vertex AI Memory Bank automatically keeps memories up to date. When generate_memories() is called with new information that relates to an existing memory, it uses Gemini to intelligently consolidate the facts, resolving contradictions. For instance, if an existing memory states "I always have fruity ice-cream" and a new conversation reveals "I had vanilla ice-cream", Memory Bank will merge this information, ensuring the memory reflects the user's most current preference.
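To see this in practice, here is a sketch of a follow-up call that reuses the client and agent_engine from the earlier snippet; Memory Bank consolidates the new fact with the existing ice-cream memory instead of storing a contradictory duplicate.

# A later conversation with updated information for the same user
operation = client.agent_engines.generate_memories(
    name=agent_engine.api_resource.name,
    direct_contents_source={"events": [{"content": {
        "role": "user",
        "parts": [{"text": "I had vanilla ice-cream today; I'm not into fruity flavours anymore"}],
    }}]},
    scope={"user_id": "123"},
)
# The existing "fruity ice-cream" memory is consolidated with the new preference.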
Your agent can retrieve memories to provide context for its responses. You can either retrieve all memories for a user or perform a similarity search to find the most relevant ones for the current query. To retrieve all memories (simple retrieval), you can use the retrieve_memories() method without any search parameters, as sketched after the snippet below. The snippet shows how to retrieve the most relevant memories (similarity search) using the Vertex AI Memory Bank SDK for Python.
import vertexai

# Initialize the client
client = vertexai.Client(
    project="your-project",
    location="your-region",
)

# Connect to your existing Agent Engine instance
agent_engine = client.agent_engines.get(name="your-agent-engine-resource-name")

# Find relevant memories based on the current query
client.agent_engines.retrieve_memories(
    name=agent_engine.api_resource.name,
    scope={"user_id": "123"},
    similarity_search_params={
        "search_query": "This is a user query",
        "top_k": 10,
    },
)
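For simple retrieval, omit the similarity search parameters. A minimal sketch using the same client and agent_engine as above:

# Retrieve all memories stored for this user (no relevance ranking)
client.agent_engines.retrieve_memories(
    name=agent_engine.api_resource.name,
    scope={"user_id": "123"},
)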
This workflow ensures your agent can maintain continuity across sessions and provide truly personalized responses by always having the right information at the right time.
You can integrate Memory Bank into your agent using the Agent Development Kit (ADK) for an out-of-the-box experience. Here’s a step-by-step guide to build AI agents with persistent memory using the ADK, based on our sample notebook.
After you have set up your Google Cloud project and authenticated your environment, you can get a memory-enabled agent running in just a few steps.
To access Agent Engine Sessions and Memory Bank, the first step is to create an Agent Engine instance. This provides the backend for your agent memory.
import vertexai

client = vertexai.Client(
    project="your-gcp-project-id",
    location="us-central1",
)

agent_engine = client.agent_engines.create()
With the engine created, you can define your local ADK agent. Your agent needs a Memory tool to control when and how it accesses stored information. This example uses PreloadMemoryTool, which retrieves memories at the start of each turn and places them in the system instruction.
from google import adk

agent = adk.Agent(
    model="gemini-2.5-flash",
    name="my_agent",
    instruction="You are an Agent...",
    tools=[adk.tools.preload_memory_tool.PreloadMemoryTool()],
)
Before interacting with your agent, instantiate VertexAiMemoryBankService for memory and VertexAiSessionService to manage conversations. You also need to instantiate the Runner, which orchestrates the memory and session services together with the agent and its tools.
from google.adk.memory import VertexAiMemoryBankService
from google.adk.sessions import VertexAiSessionService

# Get the ID from your created Agent Engine instance
agent_engine_id = agent_engine.api_resource.name.split("/")[-1]
app_name = "your-app-name"

memory_service = VertexAiMemoryBankService(
    project="your-gcp-project-id",
    location="us-central1",
    agent_engine_id=agent_engine_id,
)

session_service = VertexAiSessionService(
    project="your-gcp-project-id",
    location="us-central1",
    agent_engine_id=agent_engine_id,
)

runner = adk.Runner(
    agent=agent,
    app_name=app_name,
    session_service=session_service,
    memory_service=memory_service,
)
Now that the agent and its memory services are configured, you can interact with it to see how it remembers information across two different conversations.
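The snippets below use a small chat_loop helper to exchange messages with the agent. The sample notebook defines its own version; the following is a minimal, illustrative sketch of what such a helper could look like, using the Runner's synchronous run() method and assuming the runner created in the previous step is in scope.

from google.genai import types

def chat_loop(session_id: str, user_id: str):
    # Minimal interactive loop (illustrative; the sample notebook's helper may differ).
    while True:
        user_input = input("You: ")
        if user_input.lower() in ("exit", "quit"):
            break
        message = types.Content(role="user", parts=[types.Part(text=user_input)])
        # Stream events from the agent and print the final response text.
        for event in runner.run(user_id=user_id, session_id=session_id, new_message=message):
            if event.is_final_response() and event.content and event.content.parts:
                print(f"Agent: {event.content.parts[0].text}")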
First, you'll have an information-gathering session where you provide the agent with specific facts. You start by creating a new session for a user and then have a conversation. For example, you might tell the agent: "Hi, I work as an agent engineer" or "I love hiking and have a dog named Max."
At the end of this conversation, you pass the session history to the memory service. This triggers Memory Bank to asynchronously extract and store these key facts.
# Create the first session for a new user
session1 = await runner.session_service.create_session(
    app_name=app_name,
    user_id=USER_ID,
)

# Interact with the agent to provide information...
chat_loop(session1.id, USER_ID)
# Hi, can I help you today?

# Get the completed session and trigger memory generation
completed_session = await runner.session_service.get_session(
    app_name=app_name,
    user_id=USER_ID,
    session_id=session1.id,
)
await memory_service.add_session_to_memory(completed_session)
Next, in a separate memory-recall session, you can test the agent's ability to remember. You start a new session with the same user ID. Because the agent is configured with a memory tool, it will now retrieve the stored facts. When you ask questions that require context from the first conversation, the agent can answer intelligently, for example: "What do you remember about me?" or "What is my dog's name?" The agent, recalling the previous session, will be able to tell you about your profession, your hobbies, and that your dog's name is Max.
# Create a new session with the same user
session2 = await runner.session_service.create_session(
    app_name=app_name,
    user_id=USER_ID,
)

# Interact again to see the agent recall the stored information
chat_loop(session2.id, USER_ID)
# You are an agent engineer...
Once you are comfortable with the Vertex AI Memory Bank SDK for Python, you can build a simple web application like the one shown below to demonstrate how enabling persistent memory for agents fundamentally transforms the user experience.
By maintaining continuity and remembering user preferences across interactions, agents eliminate the frustration of repeating information, leading to a more engaging experience.
Ready to get started? Sign up via express mode registration with your Gmail account to receive an API key and test Memory Bank capabilities within the free-tier usage quotas. Then, when you are ready, scale your applications on Vertex AI. For more in-depth information, see the official documentation and the sample notebook referenced above.
Memory Bank is currently in public preview, and your feedback is invaluable as we continue to evolve the product. For questions or feedback, please reach out to us at vertex-ai-agents-preview-support@google.com.