This post asked a similar question last year:
https://www.googlecloudcommunity.com/gc/AI-ML/Usage-of-spot-machines-while-training-in-Vertex-AI/m-p...
However, I just noticed several IAM quotas for custom model training with preemptible machines, e.g. "Custom model training preemptible Nvidia T4 GPUs per region".
Type "Quota:Custom model training preemptible" in the Filter on this page and you will see the quotas:
https://console.cloud.google.com/iam-admin/quotas
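In case it helps, the same quotas can also be pulled programmatically instead of through the console filter. This is only a sketch, assuming the Service Usage API (v1beta1 consumerQuotaMetrics) is enabled and Application Default Credentials are configured; "my-project" is a placeholder.

```python
# Sketch: list Vertex AI (aiplatform.googleapis.com) consumer quota metrics
# and print the preemptible-related ones. Assumes the Service Usage API and
# Application Default Credentials; "my-project" is a placeholder.
from googleapiclient import discovery

service = discovery.build("serviceusage", "v1beta1")
parent = "projects/my-project/services/aiplatform.googleapis.com"

request = service.services().consumerQuotaMetrics().list(parent=parent)
while request is not None:
    response = request.execute()
    for metric in response.get("metrics", []):
        display_name = metric.get("displayName", "")
        if "preemptible" in display_name.lower():
            print(display_name)
    # Follow pagination until all quota metrics have been listed.
    request = service.services().consumerQuotaMetrics().list_next(request, response)
```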
Has anything changed? I still don't see a way to specify preemptible training in the Python SDK or in the REST API.
At the time of the earlier discussion, Google Vertex AI did not have native support for specifying preemptible instances for custom model training through the Python SDK or the REST API. Users who wanted preemptible capacity had to manage it themselves, typically by launching preemptible Compute Engine instances separately and setting up the training environment on them by hand.
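To make that workaround concrete: it amounted to creating a preemptible Compute Engine VM yourself and running training there, outside of Vertex AI custom jobs. A minimal sketch, assuming the google-cloud-compute client library; project, zone, machine type, and image are placeholders, and GPU attachment plus the actual training setup (startup script, data access, etc.) would still be up to you.

```python
# Sketch of the manual workaround: create a preemptible Compute Engine VM and
# run training on it yourself, since the Vertex AI custom-training API did not
# expose a preemptible option. All names below are placeholders.
from google.cloud import compute_v1

PROJECT = "my-project"   # placeholder
ZONE = "us-central1-a"   # placeholder

def create_preemptible_trainer(name: str = "preemptible-trainer") -> None:
    instance = compute_v1.Instance()
    instance.name = name
    instance.machine_type = f"zones/{ZONE}/machineTypes/n1-standard-8"

    # The key part: preemptibility is set at the Compute Engine level,
    # because that is the only place the option is exposed.
    instance.scheduling = compute_v1.Scheduling(preemptible=True)

    # Boot disk; swap in a deep-learning image and add guest accelerators
    # (e.g. a T4) as needed for real training.
    instance.disks = [
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12",
                disk_size_gb=100,
            ),
        )
    ]
    instance.network_interfaces = [
        compute_v1.NetworkInterface(network="global/networks/default")
    ]

    operation = compute_v1.InstancesClient().insert(
        project=PROJECT, zone=ZONE, instance_resource=instance
    )
    operation.result()  # block until the VM has been created

create_preemptible_trainer()
```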
That said, things may have changed since then. The quotas you noticed could be a sign that Google has introduced, or is planning to introduce, more explicit support for preemptible instances in custom model training.
Any updates on this limitation for 2025? It seems like a strange choice not to let users rely on preemptible instances with gcloud ai custom-jobs create.
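For reference, this is roughly what the job submission looks like in the Python SDK today (the equivalent of gcloud ai custom-jobs create). Project, bucket, and image URI are placeholders, and the comment only marks where one would expect a preemptible/Spot option to surface if it were exposed; it is not a documented parameter in the calls shown here.

```python
# Sketch of a Vertex AI CustomJob via the Python SDK, equivalent to
# `gcloud ai custom-jobs create`. Project, bucket, and image values are
# placeholders.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",              # placeholder
    location="us-central1",
    staging_bucket="gs://my-bucket",   # placeholder
)

worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_T4",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            # Placeholder training container image.
            "image_uri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
        },
    }
]

job = aiplatform.CustomJob(
    display_name="t4-training-job",
    worker_pool_specs=worker_pool_specs,
    # Hypothetical: if preemptible/Spot capacity were exposed for custom
    # training, one would expect it to appear as a scheduling option on the
    # job or machine spec -- nothing like that is documented in the calls
    # shown here.
)

job.run()
```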