Time to train a new model in Vertex AI

Hello community, 

I'm trying to train my first image classification model with Vertex AI. I set a budget of 8 maximum node hours, but the process ran for 18 hours. I wasn't sure whether I would be billed for all 18 hours, so I canceled the training. The "Enable early stopping" option was turned on.
In addition, I used 13 images per label, but I read that the recommended quantity is 100. I guess the number of images might affect the training time.
Thank you!

(Attachment: training-18hours-vs-8budget.png)


Hi @MaxiChava

As I understand it, you are asking why the training ran past the budgeted 8 node hours even though "Enable early stopping" was enabled. You canceled the training after it had run for 18 hours because you were worried about being billed for that time.

Let me itemize these components:

  • Budget / node hours - a node hour is the unit of measure used to track Vertex AI resource usage. For example, if you run a prediction job that uses 1 machine for 1 hour, you are charged for 1 node hour. As another example, if an endpoint is deployed with 4 machines and is active for 24 hours, you are charged for 96 node hours. See the Prediction and explanation document for detailed pricing information.
    A node hour represents the time a virtual machine spends running your prediction job or waiting in an active state (an endpoint with one or more models deployed) to handle prediction or explanation requests.

  • Enable early stopping - if enabled, Vertex AI will monitor the metric that you specified and stop training the model when it reaches the threshold or the number of steps that you specified. This can help you save time and money because you will not be paying for the used model-training resources beyond the specified metric threshold or steps.

  • Billing - unfortunately, you will still be charged because the cancellation was user-initiated; see the published Pricing for AutoML models document for reference.
    You pay only for compute hours used; if training fails for any reason other than a user-initiated cancellation, you are not billed for the time. You are charged for training time if you cancel the operation. 
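
As a rough illustration of the node-hour arithmetic in the first bullet, here is a minimal Python sketch. The function name `node_hours` is hypothetical and is not part of any Vertex AI API; it only multiplies the number of machines by the hours each one is active.

```python
def node_hours(machines: int, hours_active: float) -> float:
    """Node hours consumed: machines multiplied by the hours each is active.

    Hypothetical helper for illustration only; not a Vertex AI API call.
    """
    if machines < 0 or hours_active < 0:
        raise ValueError("machines and hours_active must be non-negative")
    return machines * hours_active


# The two examples from the bullet above.
print(node_hours(1, 1))    # 1 machine for 1 hour
print(node_hours(4, 24))   # endpoint with 4 machines active for 24 hours
```

The second call corresponds to the 96 node hours mentioned for the 4-machine endpoint.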

To sum up, the node-hour budget does not trigger the early stopping feature because the two settings are unrelated, and you will still be billed for the hours incurred because you initiated the cancellation.

Hope this helps.

Thank you for your response @lsolatorio ! Very helpful! My question was more related to future uses of Vertex:
1. If I set the budget to 8 hours but see that training continues for 10, 18, or 50 hours, will I be charged only for the 8 hours, and should I let the process continue?
2. I had read the "Prediction and explanation" link you sent. As far as I understand, I was only charged for 8 hours x $3.465 for training in AutoML mode.
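
To make the arithmetic in point 2 concrete, here is a small Python sketch. It assumes that billing is capped at the budgeted node hours, which is exactly what point 1 asks about and not something this sketch confirms. The $3.465 rate is the figure quoted above and may not match current pricing; the function name `training_cost` is hypothetical.

```python
def training_cost(elapsed_hours: float, budget_hours: float,
                  rate_per_node_hour: float) -> float:
    """Estimated training charge, ASSUMING billing is capped at the budget.

    Hypothetical helper for illustration only; not a Vertex AI API call.
    """
    billable_hours = min(elapsed_hours, budget_hours)  # assumed cap
    return round(billable_hours * rate_per_node_hour, 2)


# 18 hours elapsed, 8 node hours budgeted, rate as quoted in this thread.
print(training_cost(18, 8, 3.465))
```

Under that assumption, the charge is the same whether the job runs for 10, 18, or 50 hours.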

Thanks again!

@lsolatorio Did you find an answer for why it didn't stop at the 8-hour mark?

Hi everyone,

I've faced the same problem and reached out to Google Customer Support. Here is what I learned from them: as the link below explains, you won't be charged for the extra time that Vertex AI used. The budget hours value set in the pre-training step is the maximum amount that will be billed for training.

https://cloud.google.com/vertex-ai/docs/tabular-data/forecasting/train-model#google-cloud-console:~:...