
[Notebooks API] Error when getting the result of an execution with the notebooks_v1 API

Hi all, I need some support or suggestions from everyone.

I'm using the library google-cloud-notebooks==1.7.0. Below is my example code to create an execution and check its status.

 

import time
import uuid

from google.cloud.notebooks_v1 import CreateExecutionRequest, Execution, GetExecutionRequest
from google.cloud.notebooks_v1.services.notebook_service import NotebookServiceClient

# credential, PARENT and EXECUTION_TEMPLATE are defined elsewhere in the DAG file.

# Create the client
client = NotebookServiceClient(credentials=credential)
# Build the create request
request_create_execution = CreateExecutionRequest(
    parent=PARENT,
    execution_id=f"trigger_vertex_notebook_{uuid.uuid4().hex}",
    execution=EXECUTION_TEMPLATE,
)
# Create an execution and wait for the create operation to return
operation = client.create_execution(request=request_create_execution, timeout=120)
operation_result = operation.result()
# Build the get request used for polling
request_get_execution = GetExecutionRequest(name=operation_result.name)

# Poll once a minute until the execution succeeds or fails
while True:
    execution_status = client.get_execution(request=request_get_execution)
    if execution_status.state == Execution.State.SUCCEEDED:
        break
    elif execution_status.state == Execution.State.FAILED:
        raise RuntimeError("Execution failed")
    time.sleep(60)

 

I created a DAG on Airflow to schedule the job, and I get an error. The error is intermittent: about 1 in 5 runs fails. The log for this error is below.

 

Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/grpc/_channel.py", line 1030, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/python3.8/lib/python3.8/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "An internal error has occurred (72adca73-89c5-4634-9d3a-0644405e1e64)"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.75.10:443 {created_time:"2023-08-05T05:33:36.591466057+00:00", grpc_status:13, grpc_message:"An internal error has occurred (72adca73-89c5-4634-9d3a-0644405e1e64)"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/gcs/dags/trigger_vertex_notebook.py", line 88, in trigger
    execution_status = client.get_execution(request=request_get_execution, timeout=timeout)
  File "/opt/python3.8/lib/python3.8/site-packages/google/cloud/notebooks_v1/services/notebook_service/client.py", line 3970, in get_execution
    response = rpc(
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 113, in __call__
    return wrapped_func(*args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
  File "/opt/python3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 74, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InternalServerError: 500 An internal error has occurred (72adca73-89c5-4634-9d3a-0644405e1e64

 

Can you give me some suggestions for investigating this issue?


Hi @ltduong

The error messages you are receiving all point to an internal error, but with limited information about it. This kind of error usually indicates a problem on the Google Cloud server side; it may be caused by a hardware failure, a software bug, or a network issue.
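The little information the log does carry can be pulled out programmatically, for example to attach to a support request or to decide whether a retry is worthwhile. The sketch below uses only the standard library and parses the strings shown in the traceback. The helper names are hypothetical, treating the value in parentheses as an error ID is an assumption, and the set of status codes treated as transient is a common convention rather than an official policy.

```python
import re
from typing import Optional

# gRPC status codes often treated as transient (an assumption, not an official policy).
TRANSIENT_GRPC_CODES = {4: "DEADLINE_EXCEEDED", 13: "INTERNAL", 14: "UNAVAILABLE"}

# Matches a UUID-shaped ID that follows an opening parenthesis. The closing
# parenthesis is not required, because the last log line is cut off before it.
_ERROR_ID_RE = re.compile(r"\(([0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})")
_GRPC_STATUS_RE = re.compile(r"grpc_status:(\d+)")


def extract_error_id(message: str) -> Optional[str]:
    """Return the UUID-shaped ID from an error message, or None if absent."""
    match = _ERROR_ID_RE.search(message)
    return match.group(1) if match else None


def extract_grpc_status(debug_string: str) -> Optional[int]:
    """Return the numeric grpc_status from a debug_error_string, or None."""
    match = _GRPC_STATUS_RE.search(debug_string)
    return int(match.group(1)) if match else None


def is_probably_transient(grpc_status: Optional[int]) -> bool:
    """Heuristic: True if the status code is one that is usually worth retrying."""
    return grpc_status in TRANSIENT_GRPC_CODES
```

For the log above, `extract_grpc_status` yields 13 (INTERNAL), which this heuristic classifies as worth retrying.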

You can also wait a little and then try again, as the error may resolve on its own. Internal errors are sometimes temporary and go away eventually.
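"Wait a little and try again" is usually implemented as a retry with exponential backoff. The following is a minimal, standard-library sketch of that idea, not the client library's own retry mechanism. `TransientError` and `call_with_backoff` are hypothetical names; in real code the retryable exception type would be whatever the client raises for a 500 response.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as a 500 response."""


def call_with_backoff(
    func: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (TransientError,),
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    multiplier: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on retryable errors with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    attempt = 1
    while True:
        try:
            return func()
        except retryable:
            if attempt >= max_attempts:
                raise  # Out of attempts: surface the last error unchanged.
            # Sleep a random time up to the current (capped) delay, then grow it.
            sleep(random.uniform(0, min(delay, max_delay)))
            delay *= multiplier
            attempt += 1
```

A call such as `call_with_backoff(lambda: client.get_execution(request=request_get_execution))` would then retry only the single request, leaving the surrounding polling logic untouched; the `sleep` parameter exists so the waiting can be replaced in tests.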

Here are some useful resources that might help.

Hope this helps.

 

Hi @lsolatorio,

Thank you for your support.

I have some extra information:

- When we got this error in Airflow, the executor on Vertex AI kept running without any problems until it finished. So I guess it is a network issue.

- I tried increasing the request timeout, and retrying with sleeps in between (1, 2, 3, 4, 5 minutes), but I still have this problem.

 

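from google.api_core import exceptions as gapi_exceptions

MAX_NUM_RETRY = 5
retry = 0
timeout = 600
while True:
    try:
        execution_status = client.get_execution(request=request_get_execution, timeout=timeout)
    except gapi_exceptions.ServerError as e:
        # Retry only server-side (5xx) API errors, sleeping 1, 2, 3, ... minutes.
        retry += 1
        if retry >= MAX_NUM_RETRY:
            raise RuntimeError(f"Execution polling failed with too many retries: {retry}") from e
        time.sleep(60 * retry)
        continue

    if execution_status.state == Execution.State.SUCCEEDED:
        break
    elif execution_status.state == Execution.State.FAILED:
        # Raised outside the try block so a real failure is not caught and retried.
        raise RuntimeError("Execution failed")
    time.sleep(5)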
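One way to keep transient polling errors separate from a genuinely failed execution is sketched below. It is a standard-library illustration only: `State`, `TransientError`, and `wait_for_execution` are hypothetical stand-ins for `Execution.State`, the client's 500 exception, and the polling loop. Two design choices are worth noting. The terminal states are checked outside the `try` block, so a failed execution is never retried, and the error counter resets after every successful poll, so a long-running execution is not aborted by a handful of unrelated errors spread over hours.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    """Hypothetical stand-in for the execution states used in this thread."""
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


class TransientError(Exception):
    """Hypothetical stand-in for a retryable polling error such as a 500."""


def wait_for_execution(
    get_state: Callable[[], State],
    poll_interval: float = 60.0,
    max_consecutive_errors: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> State:
    """Poll get_state until a terminal state, tolerating isolated polling errors."""
    consecutive_errors = 0
    while True:
        try:
            state = get_state()
        except TransientError:
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                raise  # Polling itself is broken; give up.
            # Back off linearly while errors keep occurring.
            sleep(poll_interval * consecutive_errors)
            continue

        consecutive_errors = 0  # A successful poll clears the error streak.
        if state is State.SUCCEEDED:
            return state
        if state is State.FAILED:
            raise RuntimeError("Execution failed")
        sleep(poll_interval)
```

With the real client, `get_state` could be a small function that calls `client.get_execution(...)`, maps the result's `state` to the values above, and re-raises the client's server error as `TransientError`.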

 

Do you have any suggestions for me to try or investigate further?