Seeing internal error when starting Generative AI tuning job

Hi,

I'm seeing an Internal Error with no details when I try to start a tuning job in the Generative AI Studio console. The JSONL file I uploaded should be correct, since I've validated it in code. I'm using 1000 training steps and the default learning rate multiplier of 1. I've selected us-central1 as the location for tuning.

[Attached screenshot: Screenshot 2023-08-17 at 9.03.25 AM.png]

Any idea what's happening here?

1 ACCEPTED SOLUTION

It worked after I 'Enabled All Recommended APIs' in Vertex AI Dashboard per this answer: https://stackoverflow.com/questions/76297835/internal-error-encountered-data-fetching-exception-vert...

5 REPLIES

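The JSONL file should follow this format, with one example (record) per line. Each line is a JSON object with an "input_text" field and an "output_text" field (the format was shown in an attached image, jsonl.png).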

Here is sample code to generate it:

import json

from datasets import load_dataset

# Load the Alpaca instruction dataset from the Hugging Face Hub.
train_dataset = load_dataset("tatsu-lab/alpaca", split="train")

# Build the two columns used for tuning: input_text and output_text.
df = train_dataset.to_pandas()
df["input_text"] = df.text.astype(str) + ": " + df.instruction.astype(str)
df["output_text"] = df.output.astype(str)
df = df[["input_text", "output_text"]]

# Write one JSON object per line (JSONL).
data_list = df.to_dict(orient="records")
with open("output_alpaca.jsonl", "w") as file:
    for example in data_list:
        file.write(json.dumps(example) + "\n")

 

My data is already in the correct JSONL format. It's 11 MB in size.

Used https://jsonlines.org/validator/ for validation.
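
For anyone else debugging this: a generic JSON Lines validator confirms that each line parses, but it may not check the fields a tuning dataset needs. Below is a minimal sketch of a stricter local check. It assumes the "input_text" and "output_text" field names from the sample generator earlier in the thread. The function names are hypothetical helpers, not part of the Vertex AI SDK.

```python
import json

# Assumption: field names taken from the sample generator earlier in the thread.
REQUIRED_FIELDS = ("input_text", "output_text")


def validate_jsonl_lines(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            # Each required field must be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append((number, f"missing or empty field: {field}"))
    return problems


def validate_jsonl_file(path):
    """Validate a JSONL file on disk, reading it line by line."""
    with open(path, encoding="utf-8") as handle:
        return validate_jsonl_lines(handle)
```

Calling validate_jsonl_file("output_alpaca.jsonl") returns an empty list when every line passes. A clean result only rules out formatting problems in the file; it says nothing about project configuration.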

I see this error when I try the Vertex AI SDK for Python:

Creating PipelineJob

Traceback (most recent call last):
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/grpc/_channel.py", line 1161, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/grpc/_channel.py", line 1004, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "Internal error encountered."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.191.42:443 {created_time:"2023-08-17T21:48:08.818915-07:00", grpc_status:13, grpc_message:"Internal error encountered."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tune_google_model.py", line 45, in <module>
    tuning()
  File "tune_google_model.py", line 34, in tuning
    model.tune_model(
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/vertexai/language_models/_language_models.py", line 185, in tune_model
    pipeline_job = _launch_tuning_job(
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/vertexai/language_models/_language_models.py", line 1134, in _launch_tuning_job
    job = _launch_tuning_job_on_jsonl_data(
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/vertexai/language_models/_language_models.py", line 1198, in _launch_tuning_job_on_jsonl_data
    job.submit()
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 418, in submit
    self._gca_resource = self.api_client.create_pipeline_job(
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/google/cloud/aiplatform_v1/services/pipeline_service/client.py", line 1347, in create_pipeline_job
    response = rpc(
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 113, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Users/addarsh/virtualenvs/work-buddy/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 74, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InternalServerError: 500 Internal error encountered.

It worked after I 'Enabled All Recommended APIs' in Vertex AI Dashboard per this answer: https://stackoverflow.com/questions/76297835/internal-error-encountered-data-fetching-exception-vert...
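
Since the fix was about enabled APIs, it may help to check which services are enabled on the project before submitting a job. Here is a rough sketch, assuming the gcloud CLI is installed and authenticated. Only aiplatform.googleapis.com (the Vertex AI API) is checked by default, because the thread does not list which APIs the console button enables. Both function names are hypothetical.

```python
import subprocess


def missing_services(enabled_output, required=("aiplatform.googleapis.com",)):
    """Return the required service names that do not appear in the gcloud output.

    enabled_output is expected to contain one service name per line.
    """
    enabled = {line.strip() for line in enabled_output.splitlines() if line.strip()}
    return [name for name in required if name not in enabled]


def list_enabled_services(project_id):
    """Ask gcloud for the enabled services of a project, one name per line.

    Assumption: the gcloud CLI is on PATH and already authenticated.
    """
    result = subprocess.run(
        ["gcloud", "services", "list", "--enabled",
         "--project", project_id, "--format=value(config.name)"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, missing_services(list_enabled_services("my-project")) returns a list of services that still need enabling, where "my-project" is a placeholder project ID. An empty list means the checked APIs are already on.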

Hi @addarsh. Did you get a resource exhausted error after that?