This blog was co-authored with Julia Wiesinger, ADK Product Manager at Google Cloud.
TL;DR: Build AI agents with long-term memory using Vertex AI Memory Bank and the Agent Development Kit (ADK). This guide shows you how to create stateful, personalized experiences.
When building AI agents for production, many use cases require you to solve the memory problem. Without it, agents that should be truly helpful become "digital goldfish," treating every interaction as the first. A common workaround is to use the LLM's context window as a makeshift memory, but this approach is unsustainable. It's expensive, inefficient, and leads to issues like "lost in the middle" and "context rot": a rapid decline in output quality as the growing context becomes diluted with irrelevant details.
Hitting the token window limit, combined with these issues, presents a core challenge when developing AI agents. True agent memory requires more than just storing facts; it needs the ability to intelligently forget. This is precisely where Vertex AI Agent Engine Memory Bank excels. As a managed service, it provides persistent memory for agents, enabling more natural, contextual, and continuous user engagements.
Vertex AI Memory Bank is a fully managed service that provides persistent, long-term memory for AI agents, moving beyond the limitations of an LLM's context window. If you are building AI agents, it offers several key advantages, from personalization to continuity across sessions.
To dive deeper into Memory Bank's capabilities and explore practical use cases, explore the official blog or get started with the SDK examples.
Memory Bank simplifies the implementation of persistent, long-term memory for your agents through a straightforward API and SDK. It integrates natively with the Agent Development Kit (ADK) and supports popular frameworks such as LangGraph and CrewAI.
Here’s the practical workflow, matching each step with its corresponding SDK method:
Your agent creates new memories by having Memory Bank automatically extract facts from conversation history at the user level. This asynchronous process ensures your agent experiences no delays, maintaining responsiveness. To explicitly create a single memory, you can use create_memory() to add a specific fact to the user's collection. Most commonly, you will use generate_memories() to analyze a conversation history and have Gemini extract and store salient facts, as shown below.
# pip install "google-cloud-aiplatform>=1.100.0"
import vertexai
# Initialize client
client = vertexai.Client(
project="your-project",
location="your-region",
)
# Create Agent Engine if not exists
agent_engine = client.agent_engines.create()
# Extract memories from a conversation event
operation = client.agent_engines.generate_memories(
name=agent_engine.api_resource.name,
direct_contents_source={"events": [{"content": {
"role": "user",
"parts": [{"text": "This is a user conversation"}],
}}]},
scope={"user_id": "123"},
)
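If you want to store a single fact explicitly rather than extracting it from a conversation, you can call create_memory() instead. The following is a minimal sketch that reuses the client and agent_engine from the snippet above; the exact parameter names (such as fact) may vary by SDK version, so check the reference documentation.

# Explicitly store one fact for this user (sketch; verify parameter names in your SDK version)
client.agent_engines.create_memory(
    name=agent_engine.api_resource.name,
    fact="The user works as an agent engineer",
    scope={"user_id": "123"},
)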
Vertex AI Memory Bank automatically keeps memories up to date. When generate_memories() is called with new information that relates to an existing memory, it uses Gemini to intelligently consolidate the facts, resolving contradictions. For instance, if an existing memory states "I always have fruity ice-cream" and a new conversation reveals "I had vanilla ice-cream", Memory Bank will merge this information, ensuring the memory reflects the user's most current preference.
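To see this in practice, here is a sketch of a follow-up call that reuses the client and agent_engine from the earlier snippet; Memory Bank consolidates the new fact with the existing ice-cream memory instead of storing a contradictory duplicate.

# A later conversation with updated information for the same user
operation = client.agent_engines.generate_memories(
    name=agent_engine.api_resource.name,
    direct_contents_source={"events": [{"content": {
        "role": "user",
        "parts": [{"text": "I had vanilla ice-cream today; I'm not into fruity flavours anymore"}],
    }}]},
    scope={"user_id": "123"},
)
# The existing "fruity ice-cream" memory is consolidated with the new preference.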
Your agent can retrieve memories to provide context for its responses. You can either retrieve all memories for a user or perform a similarity search to find the most relevant ones for the current query. To retrieve all memories (simple retrieval), you can use the retrieve_memories() method without any search parameters, as sketched after the snippet below. The snippet shows how to retrieve the most relevant memories (similarity search) using the Vertex AI Memory Bank SDK for Python.
import vertexai

# Initialize the client
client = vertexai.Client(
    project="your-project",
    location="your-region",
)

# Connect to your existing Agent Engine instance
agent_engine = client.agent_engines.get(name="your-agent-engine-resource-name")

# Find relevant memories based on the current query
client.agent_engines.retrieve_memories(
    name=agent_engine.api_resource.name,
    scope={"user_id": "123"},
    similarity_search_params={
        "search_query": "This is a user query",
        "top_k": 10,
    },
)
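For simple retrieval, omit the similarity search parameters. A minimal sketch using the same client and agent_engine as above:

# Retrieve all memories stored for this user (no relevance ranking)
client.agent_engines.retrieve_memories(
    name=agent_engine.api_resource.name,
    scope={"user_id": "123"},
)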
This workflow ensures your agent can maintain continuity across sessions and provide truly personalized responses by always having the right information at the right time.
You can integrate Memory Bank into your agent using the Agent Development Kit (ADK) for an out-of-the-box experience. Here’s a step-by-step guide to build AI agents with persistent memory using the ADK, based on our sample notebook.
After you have set up your Google Cloud project and authenticated your environment, you can get a memory-enabled agent running in just a few steps.
To access Agent Engine Sessions and Memory Bank, the first step is to create an Agent Engine instance. This provides the backend for your agent memory.
import vertexai

client = vertexai.Client(
    project="your-gcp-project-id",
    location="us-central1",
)

agent_engine = client.agent_engines.create()
With the engine created, you can define your local ADK agent. Your agent needs a Memory tool to control when and how it accesses stored information. This example uses PreloadMemoryTool, which retrieves memories at the start of each turn and places them in the system instruction.
from google import adk

agent = adk.Agent(
    model="gemini-2.5-flash",
    name="my_agent",
    instruction="You are an Agent...",
    tools=[adk.tools.preload_memory_tool.PreloadMemoryTool()],
)
Before interacting with your agent, instantiate VertexAiMemoryBankService for memory and VertexAiSessionService to manage conversations. You also need to instantiate the Runner, which orchestrates the memory and session services together with the agent and its tools.
from google.adk.memory import VertexAiMemoryBankService
from google.adk.sessions import VertexAiSessionService

# Get the ID from your created Agent Engine instance
agent_engine_id = agent_engine.api_resource.name.split("/")[-1]
app_name = "your-app-name"

memory_service = VertexAiMemoryBankService(
    project="your-gcp-project-id",
    location="us-central1",
    agent_engine_id=agent_engine_id,
)

session_service = VertexAiSessionService(
    project="your-gcp-project-id",
    location="us-central1",
    agent_engine_id=agent_engine_id,
)

runner = adk.Runner(
    agent=agent,
    app_name=app_name,
    session_service=session_service,
    memory_service=memory_service,
)
Now that the agent and its memory services are configured, you can interact with it to see how it remembers information across two different conversations.
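The snippets below use a small chat_loop helper to exchange messages with the agent. The sample notebook defines its own version; the following is a minimal, illustrative sketch of what such a helper could look like, using the Runner's synchronous run() method and assuming the runner created in the previous step is in scope.

from google.genai import types

def chat_loop(session_id: str, user_id: str):
    # Minimal interactive loop (illustrative; the sample notebook's helper may differ).
    while True:
        user_input = input("You: ")
        if user_input.lower() in ("exit", "quit"):
            break
        message = types.Content(role="user", parts=[types.Part(text=user_input)])
        # Stream events from the agent and print the final response text.
        for event in runner.run(user_id=user_id, session_id=session_id, new_message=message):
            if event.is_final_response() and event.content and event.content.parts:
                print(f"Agent: {event.content.parts[0].text}")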
First, you'll have an information-gathering session where you provide the agent with specific facts. You start by creating a new session for a user and then have a conversation. For example, you might tell the agent: "Hi, I work as an agent engineer" or "I love hiking and have a dog named Max."
At the end of this conversation, you pass the session history to the memory service. This triggers Memory Bank to asynchronously extract and store these key facts.
# Create the first session for a new user
session1 = await runner.session_service.create_session(
    app_name=app_name,
    user_id=USER_ID,
)

# Interact with the agent to provide information...
chat_loop(session1.id, USER_ID)
# Hi, can I help you today?

# Get the completed session and trigger memory generation
completed_session = await runner.session_service.get_session(
    app_name=app_name,
    user_id=USER_ID,
    session_id=session1.id,
)
await memory_service.add_session_to_memory(completed_session)
Next, in a separate memory-recall session, you can test the agent's ability to remember. You start a new session with the same user ID. Because the agent is configured with a memory tool, it will now retrieve the stored facts. When you ask questions that require context from the first conversation, the agent can answer intelligently, for example: "What do you remember about me?" or "What is my dog's name?" The agent, recalling the previous session, will be able to tell you about your profession, your hobbies, and that your dog's name is Max.
# Create a new session with the same user
session2 = await runner.session_service.create_session(
    app_name=app_name,
    user_id=USER_ID,
)

# Interact again to see the agent recall the stored information
chat_loop(session2.id, USER_ID)
# You are an agent engineer...
Once you are comfortable with the Vertex AI Memory Bank SDK for Python, you can build a simple web application like the one shown below to demonstrate how enabling persistent memory for agents fundamentally transforms the user experience.
By maintaining continuity and remembering user preferences across interactions, agents eliminate the frustration of repeating information, leading to a more engaging experience.
Ready to get started? Sign up via express mode registration with your Gmail account to receive an API key and test Memory Bank capabilities within the free-tier usage quotas. Then, when you are ready, scale your applications on Vertex AI. For more in-depth information, see the official documentation and the sample notebook referenced above.
Memory Bank is currently in public preview, and your feedback is invaluable as we continue to evolve the product. For questions or feedback, please reach out to us at vertex-ai-agents-preview-support@google.com.