Usage of spot machines while training in Vertex AI

Hello GCP community, I have a question. I am training in Vertex AI using a custom container. I am porting pipelines from Kubeflow to Vertex AI, and I use the following code to train:

from google.cloud import aiplatform

# container_uri must already hold the URI of the custom training image
job = aiplatform.CustomContainerTrainingJob(
    display_name="training-job",
    container_uri=container_uri,
)

# Define training code arguments
training_args = ["--num-epochs", "2"]

model = job.run(
    replica_count=1,
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_V100",
    accelerator_count=1,
    args=training_args,
    sync=False,  # return immediately instead of blocking until the job finishes
)

It looks OK, but here is my question: is there any way to run the training on a Spot machine to reduce my training costs?

Thanks! 

Accepted Solution

Hi David

No, unfortunately there is no support for Spot / preemptible instances in Vertex AI.

Is there a way to attach persistent disks when submitting a custom training job using Vertex AI?