
Cloud Storage deletes across multiple buckets seem to share throughput

We have 3 different Google Cloud Storage buckets from which we are continuously deleting objects, based on an object-specific retention period (our own business logic). One bucket ("bucket 1") is multi-region (us), and the other two ("buckets 2 and 3") are regional (us-central1). Right now, we are averaging around 18 deletes per second on bucket 1 (multi-region), 30 deletes per second on bucket 2, and 220 deletes per second on bucket 3. These deletes are performed by Node.js workers in GKE pods, which also run in us-central1.
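
For reference, each worker essentially issues individual delete calls through the Node.js client library, along the lines of this simplified sketch (bucket and object names are placeholders; our actual retention logic and queueing are omitted):

```javascript
// Simplified sketch of a per-object delete worker using @google-cloud/storage.
// Each object is removed with its own API call; names here are placeholders.
const {Storage} = require('@google-cloud/storage');

const storage = new Storage();

async function deleteExpired(bucketName, expiredObjectNames) {
  const bucket = storage.bucket(bucketName);
  // Issue the deletes with some concurrency; each one is still a separate
  // DELETE request against the bucket.
  await Promise.all(
    expiredObjectNames.map((name) => bucket.file(name).delete())
  );
}
```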

Our bucket 3 deletes are by far the highest-throughput; they run on their own separate set of 5 GKE nodes. However, at times, bucket 3 deletes will pause, and when that happens, the rate of deletes for buckets 1 and 2 jumps to about 90 deletes/second.

We have also seen a similar correlation between delete and write/upload throughput, although that is harder to compare since uploads are customer-driven.

Given that these are 3 entirely separate buckets, it doesn't seem like they should bottleneck each other's deletes. Also, none of these buckets exceeds the 1,000/s API limit for writes/uploads; the closest is bucket 3, which sometimes peaks around 850/s but doesn't reach 1,000.

Are there shared resources that could be causing these throughputs to be shared? Are there ways for us to decouple these delete processes so that they won't interfere with each other?

[Attached screenshot: Screenshot 2024-10-29 at 2.35.00 PM.png]


Hi @pfath,

It sounds like you are seeing unexpected coupling between delete throughput in your different buckets. While each bucket is distinct, there are a few potential causes and remedies to consider.

Even with separate buckets, there can be backend contention: your workers and two of the buckets sit in us-central1, and Google Cloud may apply rate limiting across regional resources under high load.

Recommendations

  • Stagger Delete Operations: Try staggering deletes across buckets so their peaks don't coincide, which could ease backend contention (see the sketch after this list).
  • Separate Regions for GKE Nodes: If feasible, run GKE nodes for bucket 1 deletes in a different region than the others. This might reduce regional bottlenecks.
  • Monitor Metrics: Tracking latency and errors per bucket could reveal if certain actions are causing delays.
  • Reach Out to Support: Google Cloud Support might confirm if there’s a shared backend bottleneck affecting your operations and suggest tweaks.
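
As a rough illustration of the staggering idea, something like the following could offset each bucket's delete loop so the peaks don't line up (bucket names, offsets, and intervals are purely illustrative):

```javascript
// Hypothetical sketch of staggering the three buckets' delete loops so their
// peaks don't coincide. Offsets, intervals, and bucket names are placeholders.
const schedules = [
  {bucket: 'bucket-1', offsetMs: 0},
  {bucket: 'bucket-2', offsetMs: 20_000},
  {bucket: 'bucket-3', offsetMs: 40_000},
];

// deleteBatchFn(bucketName) is assumed to run one round of retention deletes.
function runStaggered(deleteBatchFn, intervalMs = 60_000) {
  for (const {bucket, offsetMs} of schedules) {
    // Start each bucket's loop at a different offset within the interval.
    setTimeout(() => {
      setInterval(() => deleteBatchFn(bucket), intervalMs);
    }, offsetMs);
  }
}
```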

These options should help you better isolate the delete operations, reduce contention, and potentially improve the throughput consistency across your buckets.

I hope the above information is helpful.

Thanks. We had tried some of those already, and others are not feasible for us. That said, we seem to have found a lot of success with batch delete calls. Unfortunately, there is no supported Node.js client for batch deletes, but we were able to craft one against the batch API and delete up to around 500 objects per second in that bucket, with no impact on deletes in the other buckets or writes in any bucket. We'll keep pursuing that option and see if it helps us clean up all our buckets.
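
For anyone else who runs into this: the Node.js client library doesn't expose batching, so we went straight at the Cloud Storage JSON batch endpoint. A simplified sketch of the approach is below; the auth scope, the chunking into batches of at most 100 calls, and the omitted response parsing are simplifications rather than our exact production code.

```javascript
// Minimal sketch of batched object deletes against the Cloud Storage JSON
// batch endpoint (https://storage.googleapis.com/batch/storage/v1), which
// accepts up to 100 calls per request. Assumes Node 18+ for global fetch;
// error handling and response parsing are omitted.
const {GoogleAuth} = require('google-auth-library');

const BATCH_URL = 'https://storage.googleapis.com/batch/storage/v1';
const BATCH_LIMIT = 100;

async function batchDelete(bucketName, objectNames) {
  const auth = new GoogleAuth({
    scopes: ['https://www.googleapis.com/auth/devstorage.read_write'],
  });
  const token = await auth.getAccessToken();
  const boundary = 'batch_delete_boundary';

  // Each part of the multipart/mixed body is one raw HTTP DELETE call.
  const parts = objectNames.slice(0, BATCH_LIMIT).map((name, i) =>
    `--${boundary}\r\n` +
    'Content-Type: application/http\r\n' +
    `Content-ID: <item${i + 1}>\r\n\r\n` +
    `DELETE /storage/v1/b/${bucketName}/o/${encodeURIComponent(name)} HTTP/1.1\r\n\r\n`
  );
  const body = parts.join('') + `--${boundary}--`;

  const res = await fetch(BATCH_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': `multipart/mixed; boundary=${boundary}`,
    },
    body,
  });
  // The response is itself multipart/mixed, with one part (and status) per
  // delete; parsing those parts is left out of this sketch.
  return res.text();
}
```

In practice you would chunk the full object list into groups of at most 100 and issue several batches concurrently to reach higher delete rates.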