
Custom training job with custom container fails with error 'Can't find specification for module...'

Hi everyone,

For some reason, my custom training jobs with a custom container keep failing on Vertex AI, but the local run works fine. I verified this with a local run as indicated in the docs, and I also built the image and ran it manually without problems.

Error log when running the custom job:

<code>

{
  "insertId": "2s7rqvfjzoq4v",
  "jsonPayload": {
    "attrs": {
      "tag": "workerpool0-0"
    },
    "message": "/opt/conda/bin/python: Error while finding module specification for 'trainer.train' (ModuleNotFoundError: No module named 'trainer')\n",
    "levelname": "ERROR"
  },

</code>
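
One way to narrow this down is to log what Python can actually see inside the container at startup. The following is a minimal diagnostic sketch, not a fix. It assumes the job is launched with something like `python -m trainer.train`, which is what the error message suggests; `describe_module` is a hypothetical helper name, not part of any Vertex AI API.

```python
import importlib.util
import json
import os
import sys


def describe_module(name):
    """Report whether `name` can be located with the current sys.path.

    Hypothetical helper for debugging; it only inspects the interpreter
    state and does not run the target module.
    """
    top_level = name.split(".")[0]
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # find_spec imports parent packages first, so a missing
        # top-level package (e.g. 'trainer') raises here.
        spec = None
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "top_level_found": importlib.util.find_spec(top_level) is not None,
        "cwd": os.getcwd(),
        "sys_path": list(sys.path),
    }


if __name__ == "__main__":
    # 'trainer.train' is taken from the error message above.
    print(json.dumps(describe_module("trainer.train"), indent=2))
```

Running this both locally and as the first step of the Vertex AI job, then comparing `cwd` and `sys_path`, should show whether the working directory or the module search path differs between the two environments.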

 
