
Persistent 503 Error at 120s Despite 300s Timeout and Probe Settings - Need Urgent Help

Hi everyone,

I’m at my wits’ end with a persistent issue on my Cloud Run service, `deepfake-news-detector-api`, in the europe-west1 region. The service worked perfectly until a few days ago, but it now fails with a 503 error after exactly 120 seconds, even though its cold start needs 3-4 minutes (180-240s). I need assistance to get it back online.

**Details:**
- **What Worked**: Days ago, the service ran fine with 16 GiB memory, 4 CPUs, and a 300-second timeout, handling the cold start without issues.
- **Current Issue**: Since a few days ago, it consistently returns 503 after 120 seconds. I’ve tried:
  - Memory: 16 GiB.
  - CPU: 4.
  - Request Timeout: 300s (also tried 600s).
  - Startup Probe: Initial Delay 60s, Timeout 600s, Failure Threshold 5.
  - Minimum Instances: 0.
- **Logs**: Show a SIGABRT (an abort signal, not a segmentation fault) in `gunicorn/arbiter.py` during worker initialization, followed by “Startup probe failed” or instance shutdown.
- **Latest Logs (via CLI)**: [Paste the last 5 lines from `gcloud logging read` here if you have them—run the command below first.]

**Command to Reproduce:**
```shell
gcloud logging read "resource.type=cloud_run_revision resource.labels.service_name=deepfake-news-detector-api resource.labels.region=europe-west1" --limit=5 --freshness=1h --format="value(textPayload)"
```

**What I’ve Tried:**

- Reverted to the original working setup (16 GiB, 4 CPUs, 300s timeout).
- Adjusted startup probe to wait up to 50 minutes (5 x 600s).
- Cleared environment variables (e.g., GUNICORN_TIMEOUT).
- Deployed via both CLI and Console; same 120s 503.

Can anyone from the community or Google explain why the 120s limit persists despite my settings? Is this a Cloud Run bug? How do I fix the SIGABRT crash? I need my service back online urgently—please help!

Thanks,
[Maxim]


Hi @Maximkaa,

Welcome to Google Cloud Community!

From what I can observe, your Cloud Run service seems to hit a fixed 120-second limit during cold start, even though your request timeout is set to 300s or 600s. The SIGABRT raised in `gunicorn/arbiter.py` points to a problem during worker initialization, most likely excessive memory allocation or a threading issue while your workers start up.

Possible Causes & Fixes

  1. Google’s 120s Cold Start Limit
  2. Startup Probe Adjustments
    • Set initialDelaySeconds=20, failureThreshold=10, and periodSeconds=10.
    • Log debug info in gunicorn.conf.py.
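For the debug logging in step 2, a minimal `gunicorn.conf.py` sketch might look like the following. The hook names (`on_starting`, `post_fork`, `worker_abort`) come from gunicorn’s server-hook API; the worker count and timeout values are assumptions you should adjust for your service:

```python
# gunicorn.conf.py — minimal debug-logging sketch (worker/timeout values are assumptions)
import os

loglevel = "debug"                                       # verbose arbiter/worker logs
workers = int(os.environ.get("GUNICORN_WORKERS", "1"))   # one worker to limit memory
timeout = 0                                              # 0 disables gunicorn's own worker timeout

def on_starting(server):
    # Runs in the arbiter just before workers are forked.
    server.log.info("arbiter starting, pid=%s", os.getpid())

def post_fork(server, worker):
    # Runs in each worker right after fork — a good place to time model loading.
    worker.log.info("worker forked, pid=%s", worker.pid)

def worker_abort(worker):
    # Called when a worker receives SIGABRT — matching the crash in your logs.
    worker.log.warning("worker aborted (SIGABRT), pid=%s", worker.pid)
```

With `loglevel = "debug"`, the arbiter logs each phase of worker startup, which should show exactly how far initialization gets before the abort.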

  3. Reduce Memory Usage
    • Limit GUNICORN_WORKERS to 1 and track memory usage (gcloud logging read with severity>=ERROR).
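To see how close worker initialization gets to the 16 GiB limit, you could log peak resident memory right after your model loads. A stdlib-only sketch (the helper name is mine; Linux reports `ru_maxrss` in KiB):

```python
# Report peak resident memory — on Linux, ru_maxrss is in KiB.
import resource

def peak_rss_gib() -> float:
    """Return this process's peak resident set size in GiB (Linux units assumed)."""
    kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return kib / (1024 * 1024)  # KiB -> GiB
```

If the value approaches the configured limit during startup, the crash may be an allocation failure inside a native library surfacing as SIGABRT rather than a Cloud Run bug.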

  4. Workaround: Keep Instances Warm
    • Set a lightweight warm-up service to ping Cloud Run every few minutes.
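For step 4, the warm-up ping can be as simple as a Cloud Scheduler job hitting the service every few minutes. A stdlib sketch of the ping itself (the URL and `/healthz` path below are placeholders, not your real endpoint):

```python
# Minimal warm-up ping sketch — the URL in the comment below is a placeholder.
import urllib.request

def ping_once(url: str, timeout: float = 10.0) -> int:
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status

# Example (placeholder URL), scheduled every few minutes:
# ping_once("https://deepfake-news-detector-api-xxxxx-ew.a.run.app/healthz")
```

Note that with minimum instances at 0, a ping only keeps an instance warm between requests; raising minimum instances to 1 is the more reliable way to avoid cold starts entirely.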

  5. Test in a GCE VM or "CPU Always Allocated" Mode
    • Running the container on a GCE VM isolates whether the crash is specific to Cloud Run; "CPU always allocated" combined with a minimum instance count of 1 avoids cold starts altogether.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.