Custom Training Job at Vertex AI fails

Dear all,

I am trying to fine-tune a model in Vertex AI using the custom training option. During execution, I observed three issues.

1. The model is not saved to GCS even though I set both of the following:

  • output_dir in TrainingArguments of the Transformers library
  • train.save_model(output_dir)

2. During training, when the model reports its progress (as shown in the screenshot), each of those log lines is flagged as Error, even though no error message is shown. What is the error here, and how do I interpret or resolve it? There are hundreds of entries like this that do not appear to be real failures.

[Screenshot: sureshAZ_0-1733201702987.png]

3. The job is marked as failed at the end without any reason given. Is that because of the errors described in issue 2?
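
For issue 1, here is a minimal sketch of the path handling involved. It rests on assumptions, not on anything confirmed for this job: that Vertex AI sets the AIP_MODEL_DIR environment variable (it does so when a base output directory is configured) and that the bucket is reachable inside the container through the Cloud Storage FUSE mount under /gcs/. The helper name gcs_uri_to_fuse_path and the fallback directory are hypothetical. The idea is that save_model writes to a filesystem path, so a gs:// URI may need to be mapped to its /gcs/ equivalent first.

```python
import os


def gcs_uri_to_fuse_path(uri: str) -> str:
    """Map gs://bucket/path to /gcs/bucket/path; return other paths unchanged.

    Hypothetical helper: assumes the bucket is mounted via Cloud Storage FUSE
    under /gcs/ inside the training container.
    """
    prefix = "gs://"
    if uri.startswith(prefix):
        return "/gcs/" + uri[len(prefix):]
    return uri


# AIP_MODEL_DIR is expected to hold a gs:// URI when the job has a base output
# directory; "model_output" is a hypothetical local fallback for test runs.
output_dir = gcs_uri_to_fuse_path(os.environ.get("AIP_MODEL_DIR", "model_output"))
print(output_dir)
```

The resulting output_dir could then be passed both to TrainingArguments and to save_model, so that the files land on the mounted bucket instead of a local directory named "gs:".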

Please advise. Thanks for your help.

 

Suresh

 
