Since late night on June 2nd, I've been getting 503 responses from my Apigee proxy endpoints that call the OpenAI API. The failure rate varies between 25% and 100% of requests. I haven't made any proxy or configuration changes recently, and I'm not hitting OpenAI API rate limits.
I have different endpoints for "chat completions" and "speech recognition" that call OpenAI, and they're both affected. APIs that don't call OpenAI's API (e.g. they call services on Azure instead) don't have this issue.
The error I get in the CORSResponseOrErrorFlowExecution step during PostFlow is:
Status: 503
Reason phrase: Service Unavailable
Body: {"fault":{"faultstring":"The Service is temporarily unavailable","detail":{"errorcode":"messaging.adaptors.http.flow.ServiceUnavailable","reason":"TARGET_CONNECT_TIMEOUT"}}}
I've tried:
Do you recommend any other debugging steps, or any kind of network configuration I can investigate? Might moving to NAT IPs help?
Also, is there any way for you to check that connections to "api.openai.com" work reliably? I'm not getting far with OpenAI's support bot but they suggested:
Hi @jamesw96 ,
It sounds like the issue may be with outbound connectivity from Apigee to OpenAI. A few quick tips:
Try using NAT IPs instead of ephemeral ones; this often helps with stability.
Make sure outbound traffic to api.openai.com (TCP 443) is fully allowed, with no TLS/SSL inspection or DNS issues.
If direct calls from your laptop work, the problem is likely in the Apigee network path.
A Cloud NAT gateway can help if you need consistent IPs and better control.
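To make the checklist above concrete, here's a small probe sketch (the helper name and result labels are mine, not from Apigee) that separates the failure modes behind a `TARGET_CONNECT_TIMEOUT`: DNS resolution failures, TCP connect timeouts, and refused connections. Running it from a VM on the same VPC/subnet that Apigee egresses through tells you whether the problem is in the network path rather than the proxy itself:

```python
import socket

def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Classify basic reachability of host:port roughly the way
    Apigee's target connection would experience it."""
    try:
        # Resolve DNS first, so DNS problems are reported separately
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return "dns-failure"
    family, socktype, proto, _, addr = infos[0]
    s = socket.socket(family, socktype, proto)
    s.settimeout(timeout)
    try:
        s.connect(addr)
        return "ok"                # TCP handshake completed
    except socket.timeout:
        return "connect-timeout"   # the symptom behind TARGET_CONNECT_TIMEOUT
    except ConnectionRefusedError:
        return "refused"
    finally:
        s.close()

if __name__ == "__main__":
    print("api.openai.com:443 ->", probe("api.openai.com"))
```

If this reports `connect-timeout` intermittently from inside the VPC but `ok` from your laptop, that points at the Apigee egress path (firewall rules, NAT port exhaustion) rather than OpenAI.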
Unfortunately I couldn't get this to work following the suggestions above. I also submitted a Google Cloud support ticket, but that was a dead end: the support engineer kept telling me things that were untrue and clearly didn't know his way around Apigee. I ended up switching from the OpenAI API to Azure OpenAI, which is working just fine.
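For anyone making the same switch: the two APIs differ mainly in URL shape and auth header, so the proxy target change is small. A minimal sketch of the difference (the resource name, deployment name, and `api-version` below are placeholders I chose for illustration, not values from this thread):

```python
def openai_chat_endpoint() -> tuple[str, dict]:
    # Direct OpenAI endpoint: single hostname, bearer-token auth
    url = "https://api.openai.com/v1/chat/completions"
    headers = {"Authorization": "Bearer YOUR_OPENAI_KEY"}
    return url, headers

def azure_openai_chat_endpoint(resource: str, deployment: str,
                               api_version: str = "2024-02-01") -> tuple[str, dict]:
    # Azure OpenAI: per-resource hostname, deployment name in the path,
    # api-version query parameter, and an "api-key" header instead of Bearer
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    headers = {"api-key": "YOUR_AZURE_KEY"}
    return url, headers
```

The request/response JSON bodies are largely compatible between the two, which is what makes this migration mostly a matter of updating the target endpoint and the auth policy in the proxy.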