How to deploy TPU workloads on GKE Autopilot?

sennyhan · 11-19-2024 04:30 AM

Hi community!
I am trying to deploy TPU workloads on GKE Autopilot, but it doesn't work. My Autopilot cluster is in region us-east5 and runs Kubernetes version 1.30.5-gke.1443001. I deployed a Job using tpu-v5-lite-device. According to the guide (https://cloud.google.com/kubernetes-engine/docs/how-to/tpus-autopilot), GKE should automatically create TPU nodes to run the Pods, but it doesn't create a TPU VM. The GKE Autopilot cluster's detail info also shows that the Cloud TPU feature is disabled.

How to enable tpu feature on GKE Autopilot?

Please advise.
Good Day

nmagcalengjr

Hi @sennyhan,

Welcome to Google Cloud Community!

Based on this documentation, in Autopilot you choose a TPU version and a supported topology for that version, then request them in your workload manifest. GKE manages provisioning nodes with TPUs and scheduling your workloads.
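To make the manifest request concrete, here is a minimal sketch in Python that builds a Job manifest as a dictionary and prints it as JSON, which kubectl accepts. The node selector keys and the `google.com/tpu` resource name follow the linked Autopilot TPU guide as I understand it. The Job name, the container image, the `2x2` topology, and the chip count of 4 are assumptions for illustration only, so please check them against the supported topologies for your TPU version.

```python
import json


def build_tpu_job(name, image, tpu_type, topology, chips):
    """Return a Kubernetes Job manifest (as a dict) that requests TPUs.

    All argument values are supplied by the caller; nothing here checks
    that the topology is actually supported for the given TPU type.
    """
    container = {
        "name": name,
        "image": image,  # hypothetical image, replace with your own
        "resources": {
            # TPU chips are requested through the google.com/tpu resource.
            "requests": {"google.com/tpu": chips},
            "limits": {"google.com/tpu": chips},
        },
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # These selectors tell GKE which TPU type and topology
                    # to provision for the Pod.
                    "nodeSelector": {
                        "cloud.google.com/gke-tpu-accelerator": tpu_type,
                        "cloud.google.com/gke-tpu-topology": topology,
                    },
                    "containers": [container],
                }
            },
        },
    }


if __name__ == "__main__":
    # Assumed example values: a 2x2 topology with 4 chips.
    job = build_tpu_job(
        name="tpu-job",
        image="example.com/my-tpu-image:latest",
        tpu_type="tpu-v5-lite-device",
        topology="2x2",
        chips=4,
    )
    print(json.dumps(job, indent=2))
```

You could save the printed JSON to a file and apply it with `kubectl apply -f`. If the Pod stays pending, `kubectl describe pod` on it usually shows the scheduling events that explain why no TPU node was created.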

For your reference, this page helps you to choose the Google Kubernetes Engine (GKE) mode of operation that's the best fit for your workloads.

Consider checking these prerequisite requirements for your project:

If you need further assistance, you can reach out to Google Cloud Support at any time.

I hope the above information is helpful.

sennyhan

Thank you very much! I have a question here.

Which requirement is not satisfied when the GKE Autopilot cluster's detail info shows that the Cloud TPU feature is disabled?