Internal error (code 13) when tuning a model in Vertex AI

nicolasferr · 07-25-2023 09:51 AM

While running a model tuning job in Vertex AI, I received an internal error message during pipeline creation:

INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/xxxxxxxxx/locations/xxxxx-xxxxx/pipelineJobs/tune-large-model-xxxxxxxxxxx current state:
PipelineState.PIPELINE_STATE_PENDING
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-591a9a561b04> in <cell line: 4>()
      2 vertexai.init(project="xxxxxxxxx", location="us-central1")
      3 model = TextGenerationModel.from_pretrained("text-bison@001")
----> 4 model.tune_model(
      5     training_data=training_data,
      6     train_steps=100,

3 frames
/usr/local/lib/python3.10/dist-packages/google/cloud/aiplatform/pipeline_jobs.py in _block_until_complete(self)
    500         # JOB_STATE_FAILED or JOB_STATE_CANCELLED.
    501         if self._gca_resource.state in _PIPELINE_ERROR_STATES:
--> 502             raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
    503         else:
    504             _LOGGER.log_action_completed_against_resource("run", "completed", self)

RuntimeError: Job failed with:
code: 13
message: "Internal error encountered. Please try again"

The error doesn't show any more details, so I can't figure out what's wrong. Any ideas about the cause?

lsolatorio

Hi @nicolasferr,

Error code 13 can occur for a number of reasons. It is sometimes caused by a system error, an issue with the model training code, or a problem with the Vertex AI service account.

A good first step is to review your Cloud Console logs for any errors that relate to Vertex AI; you might find entries that clearly identify what is causing the issue. As a workaround, you might need to resubmit the tuning job, update the training code, or even reconfigure your Vertex AI account.
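
Since the message itself says "Please try again", resubmitting can be automated. Below is a minimal sketch of that idea, not an official Vertex AI feature: `run_with_retries` and `is_internal_error` are hypothetical helper names, and detecting the failure by matching the "code: 13" text in the `RuntimeError` message is an assumption based only on the traceback in the question. The retry count and delays are arbitrary placeholders.

```python
import re
import time


def is_internal_error(exc: Exception) -> bool:
    """Heuristic (assumption): look for the 'code: 13' text from the traceback."""
    return re.search(r"\bcode:\s*13\b", str(exc)) is not None


def run_with_retries(submit, max_attempts=3, base_delay_s=60.0, sleep=time.sleep):
    """Call submit() and retry only when it fails with an internal (code 13) error.

    Any other RuntimeError, or a code 13 failure on the last attempt, is re-raised
    unchanged so the original error details are not hidden.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except RuntimeError as exc:
            if not is_internal_error(exc) or attempt == max_attempts:
                raise
            # Exponential backoff: base, 2x base, 4x base, ...
            delay = base_delay_s * 2 ** (attempt - 1)
            print(f"Attempt {attempt} hit an internal error; retrying in {delay:.0f}s")
            sleep(delay)


# Hypothetical usage: wrap the same tune_model call shown in the question, e.g.
# run_with_retries(lambda: model.tune_model(training_data=training_data, train_steps=100))
```

If the job keeps failing after a few attempts, the error is probably not transient, and the logs or the service account configuration are the next things to check.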

You can refer to this article for a troubleshooting guide, or you can contact Google Cloud Support for help.
