While tuning a model in Vertex AI, I received an internal error message during pipeline creation:
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/xxxxxxxxx/locations/xxxxx-xxxxx/pipelineJobs/tune-large-model-xxxxxxxxxxx current state:
PipelineState.PIPELINE_STATE_PENDING
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-591a9a561b04> in <cell line: 4>()
2 vertexai.init(project="xxxxxxxxx", location="us-central1")
3 model = TextGenerationModel.from_pretrained("text-bison@001")
----> 4 model.tune_model(
5 training_data=training_data,
6 train_steps=100,
3 frames
/usr/local/lib/python3.10/dist-packages/google/cloud/aiplatform/pipeline_jobs.py in _block_until_complete(self)
500 # JOB_STATE_FAILED or JOB_STATE_CANCELLED.
501 if self._gca_resource.state in _PIPELINE_ERROR_STATES:
--> 502 raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
503 else:
504 _LOGGER.log_action_completed_against_resource("run", "completed", self)
RuntimeError: Job failed with:
code: 13
message: "Internal error encountered. Please try again"
The error doesn't show any more details, so I can't figure out what's wrong. Any ideas about what the cause might be?
Hi @nicolasferr,
Error code 13 can occur for a number of reasons: a system error on the service side, an issue with the model training code, or a problem with the Vertex AI service account.
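For context, the numeric code in the message follows the standard gRPC status codes, where 13 is INTERNAL. Below is a minimal sketch of pulling that code out of the RuntimeError text shown in the traceback. The helper name `status_from_error` and the lookup table are my own illustration, not part of the Vertex AI SDK, and the table only lists a subset of codes.

```python
import re

# Subset of the canonical gRPC status codes; 13 is the one in this thread.
GRPC_STATUS_NAMES = {
    3: "INVALID_ARGUMENT",
    5: "NOT_FOUND",
    7: "PERMISSION_DENIED",
    8: "RESOURCE_EXHAUSTED",
    9: "FAILED_PRECONDITION",
    13: "INTERNAL",
    14: "UNAVAILABLE",
}


def status_from_error(message: str):
    """Hypothetical helper: return (code, name) parsed from an error string.

    Assumes the message contains a line such as 'code: 13', as in the
    traceback above. Returns None when no such line is found.
    """
    match = re.search(r"code:\s*(\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, GRPC_STATUS_NAMES.get(code, "NOT_IN_THIS_TABLE")
```

You could call it as `status_from_error(str(exc))` inside an `except RuntimeError as exc:` block around the `tune_model` call. INTERNAL says little on its own, which is why the logs are the next place to look.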
A good first step is to review your Cloud Console logs for any errors related to Vertex AI; you may find entries that clearly identify what is causing the issue. As a workaround, you might need to resubmit the tuning job, update the training code, or even reconfigure your Vertex AI account.
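If you want to try the resubmit route without rerunning the cell by hand, here is a rough sketch of a retry wrapper. It assumes the failure surfaces as a RuntimeError whose text contains "code: 13", as in your traceback. `run_with_retry`, its parameters, and the delay values are hypothetical choices of mine, not something the SDK provides.

```python
import time


def run_with_retry(submit, max_attempts=3, base_delay_s=30.0, sleep=time.sleep):
    """Hypothetical helper: call submit() and retry only on 'code: 13' errors.

    submit is any zero-argument callable, for example a lambda wrapping
    your own model.tune_model(...) call with your own arguments.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except RuntimeError as exc:
            is_internal = "code: 13" in str(exc)
            # Re-raise anything that is not an internal error, and give up
            # after the last allowed attempt.
            if not is_internal or attempt == max_attempts:
                raise
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Keep in mind that each retry submits a new tuning pipeline, so a small `max_attempts` is sensible. If the job keeps failing with the same code, retrying is unlikely to help and the logs or support are the better path.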
You can refer to this article for troubleshooting guidance, or contact Google Cloud Support for help.