Code runs slower in Autopilot cluster compared to GCE instance

bennetthardwick · 01-10-2025 12:36 AM

I've been benchmarking different HTTP request libraries across Go and Rust, and I've noticed that the Rust code runs much slower in a Pod in an Autopilot cluster than on a GCE instance (such as a g1-small). The code downloads from a bucket in the same region at about 210MiB/s on the g1-small, but on the Autopilot pod (where the machine family is n1) it runs at about 100MiB/s. Weirdly, this only affects the Rust code: Go, for example, downloads at about 200MiB/s regardless of whether it runs on a GCE instance or in a GKE pod.

I've created a repo that I've been using for reproducing the issue: https://github.com/bennetthardwick/http-download-testing. If anyone has any ideas on investigating this I would appreciate it a lot.

mokit

Hi, @bennetthardwick.

That's quite an interesting project. Have you added any tracers to track latency along the full path of each request?

Regards,
Mokit

bennetthardwick

Hi @mokit, do you mean application-level tracing like OpenTelemetry (otel), or is there a GCP tool I can use to trace the requests?

mokit

Yes, you can instrument the application with OpenTelemetry, or use a Google service such as Cloud Trace.
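Before wiring up a full tracer, a rough first breakdown is possible with only the standard library. The sketch below is an illustration, not code from the linked repo: `timed` and `mib_per_sec` are hypothetical helper names, and the buffer allocation in `main` is just a stand-in for a real download stage.

```rust
use std::time::{Duration, Instant};

/// Run `f`, log how long it took to stderr, and return its result with the elapsed time.
/// (Hypothetical helper, not part of any tracing library.)
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    let elapsed = start.elapsed();
    eprintln!("{label}: {elapsed:?}");
    (out, elapsed)
}

/// Convert a byte count and a duration into MiB/s.
fn mib_per_sec(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / elapsed.as_secs_f64()
}

fn main() {
    // Stand-in for a download stage: allocate a 64 MiB buffer and report its size.
    let (bytes, elapsed) = timed("fill buffer", || {
        let buf = vec![0u8; 64 * 1024 * 1024];
        buf.len() as u64
    });
    println!("{:.1} MiB/s", mib_per_sec(bytes, elapsed));
}
```

Wrapping the connect, request, and body-read stages separately would show which one slows down on the Autopilot pod.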

bennetthardwick

Thanks @mokit. I've been able to reproduce my issue by running the test container with `docker run --cpus "0.1"`, so it seems to be an issue with Rust when its CPU time is limited by the scheduler.
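To confirm that a given container or pod really is running under a CPU quota, the limit can be read from inside it. This is a minimal sketch that assumes cgroup v2 mounted at `/sys/fs/cgroup`, where `cpu.max` holds `<quota> <period>` or `max <period>`; `parse_cpu_max` is a hypothetical helper name, and on cgroup v1 hosts the files are different.

```rust
use std::fs;

/// Parse the contents of a cgroup v2 `cpu.max` file into a fractional CPU limit.
/// Returns None when no quota is set ("max") or the input is malformed.
fn parse_cpu_max(contents: &str) -> Option<f64> {
    let mut parts = contents.split_whitespace();
    let quota = parts.next()?;
    let period: f64 = parts.next()?.parse().ok()?;
    if quota == "max" || period <= 0.0 {
        return None;
    }
    let quota: f64 = quota.parse().ok()?;
    Some(quota / period)
}

fn main() {
    // Assumption: cgroup v2 path as seen from inside the container.
    let limit = fs::read_to_string("/sys/fs/cgroup/cpu.max")
        .ok()
        .and_then(|s| parse_cpu_max(&s));
    // Parallelism the Rust standard library reports for this process.
    let cores = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    match limit {
        Some(cpus) => println!("CPU quota: {cpus:.2} CPUs; reported parallelism: {cores}"),
        None => println!("no CPU quota found; reported parallelism: {cores}"),
    }
}
```

With `docker run --cpus "0.1"` I'd expect `cpu.max` to read something like `10000 100000`, i.e. a quota of 0.1 CPUs; printing the same values on the Autopilot pod would show whether it is throttled in a comparable way.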

mokit

That's great to hear @bennetthardwick 🎉!
