
Set up API Gateway to call a Vertex AI endpoint

Hi

I am working on a POC to call a Vertex AI endpoint via API Gateway, and I am having trouble writing the API config as per the Vertex AI specification.

Is there any other way I can add a secure layer in front of the Vertex AI endpoint before calling it from my application?

3 REPLIES

You can consider implementing a proxy service or a serverless function that acts as an intermediary between your application and the Vertex AI endpoint. This intermediary can handle the API configuration, authentication, and any additional processing you require. Your proxy service should be designed to accept incoming HTTP requests from your application.

Authenticate the incoming requests to ensure they are authorized to access the proxy service. You might use API keys, OAuth tokens, or another authentication mechanism for this purpose, and you should verify that the requester has the necessary permissions to call the proxy; a sketch of a simple check follows below.
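As a rough illustration, the incoming check might look like this in a Python Cloud Function. The `x-api-key` header and `EXPECTED_API_KEY` environment variable are illustrative names I've chosen here, not Google-defined settings:

```python
# Minimal sketch of authenticating incoming requests to the proxy,
# assuming a static API key stored in the EXPECTED_API_KEY environment
# variable (an illustrative name, not a Google-defined setting).
import os

import functions_framework
from flask import abort


@functions_framework.http
def proxy(request):
    # Reject callers that don't present the expected key in a header.
    supplied_key = request.headers.get("x-api-key", "")
    if supplied_key != os.environ.get("EXPECTED_API_KEY"):
        abort(401, "Missing or invalid API key")
    # ... forward the request to Vertex AI here (see the next sketch) ...
    return {"status": "authorized"}
```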

Choose Google Cloud Functions as the serverless platform for your secure proxy service and implement the proxy logic there. This logic should make a secure, authenticated request to the Vertex AI endpoint on behalf of your application, as in the sketch that follows.
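Here is a minimal sketch of that forwarding logic, assuming a Python Cloud Function, the google-auth and requests libraries, and placeholder values (PROJECT_ID, the region, and the model path) that you would substitute for your own environment:

```python
# Hedged sketch of the proxy logic: obtain an OAuth access token with the
# function's runtime service account and forward the caller's JSON body to
# the Gemini generateContent endpoint. PROJECT_ID, the region, and the
# model name below are placeholders for your environment.
import google.auth
import google.auth.transport.requests
import requests

import functions_framework

ENDPOINT = (
    "https://us-central1-aiplatform.googleapis.com/v1/"
    "projects/PROJECT_ID/locations/us-central1/"
    "publishers/google/models/gemini-1.5-flash:generateContent"
)


@functions_framework.http
def proxy(request):
    # Acquire an access token for the runtime service account; the account
    # needs permission to call Vertex AI (e.g., the Vertex AI User role).
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())

    # Forward the caller's JSON body as the generateContent payload.
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {credentials.token}"},
        json=request.get_json(silent=True) or {},
        timeout=60,
    )
    return (resp.text, resp.status_code, {"Content-Type": "application/json"})
```

You would then deploy it with something like `gcloud functions deploy` and grant the function's service account the Vertex AI User role on the project.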

Here is a quickstart to get you familiarized with the service.

Is there anywhere to get additional detail around this? I'm specifically looking for clear guidance on how to create the required OpenAPI-compatible specification for Gemini 1.5 Flash chat completions (responses) on Vertex AI. I've found the Gemini Flash API reference, but I've not found a way to systematically create a reliable API spec. The rest of the setup is pretty straightforward, but without better support from Google on the Gemini Flash API spec, this is very much like throwing darts.
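For reference, this is the kind of minimal spec I've been hand-assembling. API Gateway only accepts OpenAPI 2.0, and the title, path, and backend address here are placeholders for my own proxy deployment rather than anything Google publishes:

```yaml
# Hedged sketch of an API Gateway config (OpenAPI 2.0 / Swagger) that
# routes POST /chat to a Cloud Function proxy in front of Vertex AI.
swagger: "2.0"
info:
  title: vertex-proxy-gateway
  version: "1.0.0"
schemes:
  - https
produces:
  - application/json
paths:
  /chat:
    post:
      summary: Forward a chat request to the Vertex AI proxy function
      operationId: chat
      x-google-backend:
        # Placeholder Cloud Function URL; API Gateway attaches an ID token
        # for its service account, so the gateway's service account must be
        # allowed to invoke the function.
        address: https://REGION-PROJECT_ID.cloudfunctions.net/vertex-proxy
      security:
        - api_key: []
      responses:
        "200":
          description: Response from the model
securityDefinitions:
  api_key:
    type: apiKey
    name: key
    in: query
```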

I think the general lack of published OpenAPI specs for these hyperscaler LLMs is inhibiting adoption of the platforms. In most large enterprise ecosystems, hybrid and multi-cloud computing is prevalent, and the corporate systems that rely on these LLMs can be located anywhere, including outside the hyperscaler environment. Clear API specs for all published LLMs would make it far simpler to integrate them into applications and GenAI use cases; until that happens, these vendor tools will remain in silos.
