
Configuring Vertex AI for Use in Multiple Locations

Hi all,

I am currently setting up Vertex AI using the @google/genai SDK for Node.js. I have multiple users in different locations with individual credentials, and I am stuck on how to initialize the model location and endpoints.

The configuration options are documented here: https://googleapis.github.io/js-genai/release_docs/interfaces/client.GoogleGenAIOptions.html

import {GoogleGenAI} from '@google/genai';

const ai = new GoogleGenAI({
  vertexai: true,              // route requests through Vertex AI rather than the Gemini Developer API
  project: 'PROJECT_ID',       // your GCP project ID
  location: 'PROJECT_LOCATION' // a Vertex AI region, e.g. 'us-central1'
});

From what I have read, a GCP project is not tied to a specific location, but models seem to be available only in certain regions. If I hard-code a location like 'us-central1', would that allow users around the world to use Vertex AI? Would it cause issues if their project is in a different location, or create latency problems?

And if I do not hardcode the location, how can I guarantee which endpoint is used, so that users who want provisioned throughput can place an order for the correct region?

I have had a hard time understanding exactly how this works, especially since I am new to working with GCP. I would appreciate any insight people may have!

1 ACCEPTED SOLUTION

Hi @mbillawala,

Welcome to Google Cloud Community!

GCP projects are global, but Vertex AI resources, such as models and endpoints, are region-specific. This means that when you deploy a model, it is assigned to a particular region like us-central1, europe-west4, or asia-southeast1.
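To make that concrete: the SDK resolves calls to a regional API endpoint whose hostname embeds the location. Here is a minimal sketch of the resulting URL; PROJECT_ID and MODEL_ID are placeholders, and the model must actually be available in the chosen region.

const project = 'PROJECT_ID';   // placeholder
const location = 'us-central1'; // the region you configured in the SDK options
const model = 'MODEL_ID';       // placeholder; must be available in this region
// Regional Vertex AI endpoints follow the pattern {location}-aiplatform.googleapis.com
const url = `https://${location}-aiplatform.googleapis.com/v1/projects/${project}/locations/${location}/publishers/google/models/${model}:generateContent`;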

If you hardcode a region like us-central1, then:

  • Yes, users worldwide can still access the model. It's common practice to centralize models in one region for simplicity and to control costs.
  • No, it won’t block users from other regions, but depending on their distance from the region, they might experience some latency.

However, if your users are sensitive to latency (e.g., real-time applications like chat or streaming), consider multi-region deployments or region-aware routing to reduce delays.
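For example, a simple form of region-aware routing could look like the sketch below. The geography-to-region mapping and the clientForUser helper are illustrative assumptions on my part, not part of the SDK.

import {GoogleGenAI} from '@google/genai';

// Illustrative mapping from a user's rough geography to a Vertex AI region.
const REGION_BY_GEO: Record<string, string> = {
  us: 'us-central1',
  eu: 'europe-west4',
  apac: 'asia-southeast1',
};

function clientForUser(geo: string): GoogleGenAI {
  // Fall back to a single default region for users you cannot classify.
  const location = REGION_BY_GEO[geo] ?? 'us-central1';
  return new GoogleGenAI({vertexai: true, project: 'PROJECT_ID', location});
}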

It's important to note that while projects are global, resources like models and endpoints are tied to specific regions. As long as the user has the correct IAM permissions and access to the project, they can call the endpoint, even if they're located elsewhere geographically or organizationally.

Best Practices:

  • Centralize in one region (e.g., us-central1) if latency is not a critical concern.
  • Use environment variables or a configuration file to manage the region dynamically, rather than hardcoding it (see the sketch after this list).
  • Clearly document the region for users, so they know where to provision throughput if necessary.
  • Monitor latency and usage patterns—if you notice spikes from a specific region, consider deploying a replica there.
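As a sketch of the environment-variable approach mentioned above (GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are conventional variable names; adjust them to match your deployment):

import {GoogleGenAI} from '@google/genai';

const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GOOGLE_CLOUD_PROJECT,
  // Default to one region if the variable is unset.
  location: process.env.GOOGLE_CLOUD_LOCATION ?? 'us-central1',
});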

For further reading on how Vertex AI manages model locations, endpoints, and regional configurations, check out the following resources:

  • Deploy a Model to an Endpoint – Vertex AI: A guide on how to deploy models to regional endpoints, endpoint URL structures, and managing compute resources, including autoscaling, latency, and endpoint types.
  • Vertex AI Locations: A list of all supported regions for Vertex AI services to help you determine where to deploy your models and which services are available in each region.
  • Vertex AI API Reference: For developers working with REST or SDKs, this reference covers endpoint formats, request structures, and authentication.
  • Vertex AI SDK for Node.js – GoogleGenAIOptions: This documentation outlines the project and location fields and explains how they affect endpoint resolution.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

