Re: GCP Secret Manager empty reply/broken pipe on ...

BrianPez · 01-22-2024 11:16 AM

I originally posted this on Stack Overflow

And in the Cloud SDK for PHP on GitHub (https://github.com/googleapis/google-cloud-php/issues/6960)

I have an Apache web page (Ubuntu 22 VM) that is loaded hundreds to thousands of times a minute. On every page load, the page uses the Cloud SDK for PHP to fetch a single secret from Secret Manager. On the ten-minute boundaries (:10, :20, etc.), many of these calls fail with one of four PHP errors:

cURL error 35: error:0A000126:SSL routines::unexpected eof while reading (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://secretmanager.googleapis.com/v1/projects/20988994674/secrets/prod-frontend-db-app-password/v...

cURL error 55: OpenSSL SSL_write: Broken pipe, errno 32 (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://secretmanager.googleapis.com/v1/projects/20988994674/secrets/prod-frontend-db-app-password/v...

cURL error 56: OpenSSL SSL_read: error:0A000126:SSL routines::unexpected eof while reading, errno 0 (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://secretmanager.googleapis.com/v1/projects/20988994674/secrets/prod-frontend-db-app-password/v...

cURL error 56: OpenSSL SSL_read: error:0A000126:SSL routines::unexpected eof while reading, errno 32 (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://secretmanager.googleapis.com/v1/projects/20988994674/secrets/prod-frontend-db-app-password/v...

Volume doesn't seem to matter. I have examples away from a ten-minute boundary where 2k+ requests work fine, and examples on the ten-minute boundary where calls fail with only 650 page loads. The GitHub issue was closed with the explanation that this is a server-side or network error, since the server sent no data.

Here are some examples from Friday, showing requests and failed calls per minute:

13:18 :: requests 147 :: cURL errors 0
13:19 :: requests 1042 :: cURL errors 104
13:20 :: requests 1245 :: cURL errors 602
13:21 :: requests 1644 :: cURL errors 0
13:22 :: requests 707 :: cURL errors 0

13:38 :: requests 136 :: cURL errors 0
13:39 :: requests 2005 :: cURL errors 0
13:40 :: requests 835 :: cURL errors 671
13:41 :: requests 1404 :: cURL errors 104
13:42 :: requests 497 :: cURL errors 0

13:58 :: requests 116 :: cURL errors 0
13:59 :: requests 643 :: cURL errors 35
14:00 :: requests 900 :: cURL errors 681
14:01 :: requests 1516 :: cURL errors 261
14:02 :: requests 938 :: cURL errors 0
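
To make the boundary pattern easier to see, here is a rough Python sketch (not part of my PHP setup) that takes the per-minute figures above and reports how far each minute is from the nearest ten-minute mark, along with the share of calls that failed. The function names are made up for illustration; only the numbers come from my logs.

```python
# Per-minute samples copied from the list above: (time, requests, failed calls).
SAMPLES = [
    ("13:18", 147, 0), ("13:19", 1042, 104), ("13:20", 1245, 602),
    ("13:21", 1644, 0), ("13:22", 707, 0),
    ("13:38", 136, 0), ("13:39", 2005, 0), ("13:40", 835, 671),
    ("13:41", 1404, 104), ("13:42", 497, 0),
    ("13:58", 116, 0), ("13:59", 643, 35), ("14:00", 900, 681),
    ("14:01", 1516, 261), ("14:02", 938, 0),
]


def minutes_from_boundary(hhmm, period=10):
    """Distance in minutes to the nearest multiple of `period` past the hour."""
    offset = int(hhmm.split(":")[1]) % period
    return min(offset, period - offset)


def failure_rows(samples):
    """Return (time, distance_from_boundary, failure_rate) for each sample."""
    return [
        (time, minutes_from_boundary(time), failed / requests)
        for time, requests, failed in samples
    ]


if __name__ == "__main__":
    for time, distance, rate in failure_rows(SAMPLES):
        print(f"{time}  {distance} min from boundary  {rate:6.1%} failed")
```

In this sample, every minute with failures is within one minute of a ten-minute mark, and the minutes exactly on the mark have the highest failure rates, regardless of how many requests were made.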

I don't see anything related to quotas that should affect this (https://cloud.google.com/secret-manager/quotas#request-rate-quotas). I am only reading the secret value, so these calls should fall under the access request quota of 90k per minute. If I were hitting the read request quota of 600 per minute, I would see errors every time I went over it, and I do not.
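
The quota reasoning can be checked with a few lines of arithmetic. This is only a sketch in Python using the Friday figures; the two limits are the values I quoted above, and the helper names are hypothetical.

```python
# Limits as quoted in this post (requests per minute).
ACCESS_QUOTA_PER_MIN = 90_000
READ_QUOTA_PER_MIN = 600

# time -> (requests, failed calls), taken from the Friday examples.
SAMPLES = {
    "13:18": (147, 0), "13:19": (1042, 104), "13:20": (1245, 602),
    "13:21": (1644, 0), "13:22": (707, 0),
    "13:38": (136, 0), "13:39": (2005, 0), "13:40": (835, 671),
    "13:41": (1404, 104), "13:42": (497, 0),
    "13:58": (116, 0), "13:59": (643, 35), "14:00": (900, 681),
    "14:01": (1516, 261), "14:02": (938, 0),
}


def clean_minutes_above(samples, limit):
    """Minutes that exceeded `limit` requests and still had zero failures."""
    return sorted(
        time for time, (requests, failed) in samples.items()
        if requests > limit and failed == 0
    )


def peak_utilisation(samples, quota):
    """Busiest minute expressed as a fraction of `quota`."""
    return max(requests for requests, _ in samples.values()) / quota


if __name__ == "__main__":
    print("Over 600/min with no failures:",
          clean_minutes_above(SAMPLES, READ_QUOTA_PER_MIN))
    print(f"Peak share of access quota: "
          f"{peak_utilisation(SAMPLES, ACCESS_QUOTA_PER_MIN):.1%}")
```

Several minutes go well past 600 requests with no failures at all, and the busiest minute is only a small fraction of the 90k access quota, which is why I don't think either limit explains the errors.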

I have changed the OS-level keep-alive settings as described here (https://cloud.google.com/compute/docs/troubleshooting/general-tips#idle-connections), with no effect.

lawrencenelson

Hello @BrianPez,

Welcome to the Google Cloud Community!

The broken pipe error typically happens when your request either gets blocked or exceeds a certain time limit. After reaching the timeout on the requesting side, the connection is closed. Consequently, if the responding side (server) attempts to write to the socket after this closure, it will encounter a broken pipe error [1][2].
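
As an illustration of the mechanism only, the following Python sketch reproduces a broken pipe locally: one end of a connected socket pair is closed, and the other end then tries to write. It assumes a Linux or other Unix-like host and uses a local socket pair rather than a real TLS connection, so it is not a reproduction of the Secret Manager failure itself.

```python
import socket


def write_after_peer_close():
    """Write to a socket whose peer has already closed; describe the outcome."""
    writer, peer = socket.socketpair()  # two connected local stream sockets
    peer.close()                        # the other side hangs up first
    try:
        writer.sendall(b"late data")    # writing to the closed connection
        return "no error"
    except BrokenPipeError as exc:      # EPIPE, reported by cURL as errno 32
        return f"BrokenPipeError errno={exc.errno}"
    finally:
        writer.close()


if __name__ == "__main__":
    print(write_after_peer_close())
```

Whichever side writes after the other has closed is the one that sees the error. In the cURL error 55 quoted in the original post, it is the client's SSL_write that reports errno 32, which would be consistent with the remote end closing the connection first.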

You mentioned that your Apache webpage loads hundreds to thousands of times per minute. I suggest replicating your VM using another test VM and reducing your calls to the Secret Manager to about a quarter of the original requests. This approach will help us identify the root issue.

[1]. https://stackoverflow.com/questions/11866792/how-to-prevent-errno-32-broken-pipe

[2]. https://github.com/hashicorp/consul/issues/19622

BrianPez

@lawrencenelson Thank you for the reply.

What I don't understand is the ten-minute boundary. I don't see anything in the documentation besides a 90k limit per minute. If you look at the data in my original post, I have minutes away from the ten-minute boundary with higher request counts and no failures (for example 13:21: 1644 and 13:39: 2005) than minutes on the boundary with failures (13:20: 1245 w/602 failures, 14:00: 900 w/681 failures).

Based on the second link, is there any way to determine whether I am hitting some sort of rate limit? Since I am getting a broken pipe or an empty response, I get no error message from the server that might indicate throttling.
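
To spell out the distinction I mean, here is a hedged Python sketch (my application is PHP, so this is illustrative only). It assumes that a quota rejection would arrive as a normal HTTP response with status 429, while the failures I see never get as far as an HTTP response. The function name and labels are hypothetical.

```python
import urllib.error


def classify_failure(exc):
    """Label a failed request as 'throttled', 'http_error', or 'transport'."""
    # HTTPError means the server answered with a status code.
    if isinstance(exc, urllib.error.HTTPError):
        # Assumption: a rate-limit rejection is reported as HTTP 429.
        return "throttled" if exc.code == 429 else "http_error"
    # Anything else (broken pipe, unexpected EOF, reset) carried no
    # HTTP response, so the server never said why it failed.
    return "transport"


if __name__ == "__main__":
    throttled = urllib.error.HTTPError(
        "https://example.invalid", 429, "Too Many Requests", None, None
    )
    print(classify_failure(throttled))
    print(classify_failure(BrokenPipeError(32, "Broken pipe")))
```

All four errors in my original post would land in the "transport" bucket, which is exactly why I cannot tell from the client side whether throttling is involved.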

I have previously done what you suggested for load splitting. I took the 850 devices and split them between two VMs (400 and 450 devices). The Google Cloud SDK post has the raw data examples for twenty minute blocks for the 850 and the 450 devices. I did not see failures on the 450, and I saw the higher request counts without a failure. For example,

08:19 2242
08:20 2089

Since that post was written, I started seeing failures on both the 400 and 450 VMs.

GCP Secret Manager empty reply/broken pipe on Ubuntu 22 VM in PHP