Create TPU Node - Malformed Name

Hi! I'm trying to create a Google Cloud TPU node using the TPU client API, and I cannot figure out the parent resource name of a TPU node in Google Cloud. I tried all the possible combinations, for example:

And I always get the same error (google.api_core.exceptions.InvalidArgument: 400 Malformed name):

Traceback (most recent call last):
  File "C:\Users\Smarthank\anaconda3\lib\site-packages\google\api_core\grpc_helpers.py", line 67, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "C:\Users\Smarthank\anaconda3\lib\site-packages\grpc\_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "C:\Users\Smarthank\anaconda3\lib\site-packages\grpc\_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "Malformed name: 'projects/my-project-id/locations/europe-west4-a/nodes/'"
	debug_error_string = "{"created":"@1645878700.379000000","description":"Error received from peer ipv4:142.250.179.170:443","file":"src/core/lib/surface/call.cc","file_line":1068,"grpc_message":"Malformed name: 'projects/my-project-id/locations/europe-west4-a/nodes/'","grpc_status":3}"

Below you can find the full code I'm using to create the node. I'm using Python 3.8 and google-cloud-tpu v1.2.1 in a Conda virtualenv.

from google.cloud import tpu_v2alpha1

def sample_create_node():
    # Create a client
    client = tpu_v2alpha1.TpuClient()

    # Initialize request argument(s)
    node = tpu_v2alpha1.Node()
    node.accelerator_type = "accelerator_type_value"
    node.runtime_version = "runtime_version_value"

    request = tpu_v2alpha1.CreateNodeRequest(
        parent="parent_value",
        node=node,
    )

    # Make the request
    operation = client.create_node(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

Any help would be much appreciated!


It appears that you have created a Stack Overflow thread where a Google Cloud Platform engineer has already replied.

He suggested that you can find the expected format of parent in the documentation for the underlying API method, projects.locations.nodes.create: parent should be formatted as projects/*/locations/*. That is, change zones to locations and remove the /tpus suffix that you had included in the Stack Overflow thread.

He further suggested removing nodes from the path, i.e. changing projects/my-project-id/locations/europe-west4-a/nodes/ (as shown in the stack trace) to projects/my-project-id/locations/europe-west4-a/.

Hi!

As I answered in the same Stack Overflow thread, following the recommended parent=projects/*/locations/* (to be 100% clear: without /nodes/) does not work and gives the same error that the author shared.

We cannot remove a /nodes/ that we do not set in the first place.

Library versions:
google-api-core 2.6.0
google-auth 2.6.0
google-cloud-tpu 1.3.1
googleapis-common-protos 1.55.0

The Google Cloud engineer has updated the response, along with the code, here. Please let us know whether you can use the code and whether it works.
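
The updated code itself is not reproduced in this thread, so what follows is only a sketch of one plausible fix, not the engineer's actual answer. It assumes the trailing nodes/ in the error message means the server appended an empty node ID to the parent, and that setting node_id on CreateNodeRequest is the missing piece. build_parent and create_tpu_node are hypothetical helper names; the accelerator type and runtime version must be values that are valid for your zone.

```python
def build_parent(project_id: str, zone: str) -> str:
    """Return the parent path in the documented projects/*/locations/* form."""
    if not project_id or not zone:
        raise ValueError("project_id and zone must be non-empty")
    # No trailing slash and no /nodes suffix: the API adds the node part itself.
    return f"projects/{project_id}/locations/{zone}"


def create_tpu_node(project_id, zone, node_id, accelerator_type, runtime_version):
    """Create a TPU node and block until the operation finishes."""
    # Assumption: an empty node ID is what produced the ".../nodes/" name
    # in the error message, so refuse it up front.
    if not node_id:
        raise ValueError("node_id must be non-empty")

    # Imported lazily so build_parent can be used without the library installed.
    from google.cloud import tpu_v2alpha1

    client = tpu_v2alpha1.TpuClient()

    node = tpu_v2alpha1.Node()
    node.accelerator_type = accelerator_type
    node.runtime_version = runtime_version

    request = tpu_v2alpha1.CreateNodeRequest(
        parent=build_parent(project_id, zone),
        node_id=node_id,  # the final path segment of the new node's name
        node=node,
    )

    operation = client.create_node(request=request)
    return operation.result()
```

With the values from the original post, build_parent("my-project-id", "europe-west4-a") gives projects/my-project-id/locations/europe-west4-a, and the node ID is passed separately rather than being appended to the parent.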

Hi, it worked.

When cleaning up the resources, though, there seems to be an issue with the library:

import logging

from google.cloud import tpu_v2alpha1

# manifest is defined elsewhere in our script (not shown here).
NAME = f"projects/{manifest.tpu.gcpProject}/locations/{manifest.tpu.gcpZone}/nodes/{manifest.name}"

client = tpu_v2alpha1.TpuClient()

request = tpu_v2alpha1.DeleteNodeRequest(
    name=NAME,
)

# Make the request
operation = client.delete_node(request=request)

logging.info("Waiting for operation to complete...")
response = operation.result()

The TPU VM is successfully deleted, but the Python code eventually fails:

Cleaning TPU
Waiting for operation to complete...
Traceback (most recent call last):
  File "/argo/staging/script", line 29, in <module>
    response = operation.result()
  File "/root/.local/lib/python3.9/site-packages/google/api_core/future/polling.py", line 132, in result
    self._blocking_poll(timeout=timeout, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/google/api_core/future/polling.py", line 110, in _blocking_poll
    retry_(self._done_or_raise)(**kwargs)
  File "/root/.local/lib/python3.9/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
  File "/root/.local/lib/python3.9/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/root/.local/lib/python3.9/site-packages/google/api_core/future/polling.py", line 88, in _done_or_raise
    if not self.done(**kwargs):
  File "/root/.local/lib/python3.9/site-packages/google/api_core/operation.py", line 170, in done
    self._refresh_and_update(retry)
  File "/root/.local/lib/python3.9/site-packages/google/api_core/operation.py", line 159, in _refresh_and_update
    self._set_result_from_operation()
  File "/root/.local/lib/python3.9/site-packages/google/api_core/operation.py", line 130, in _set_result_from_operation
    response = protobuf_helpers.from_any_pb(
  File "/root/.local/lib/python3.9/site-packages/google/api_core/protobuf_helpers.py", line 65, in from_any_pb
    raise TypeError(
TypeError: Could not convert Any to Node

@Mohammad_I could you please have a look at my response? I also created a GitHub ticket: https://github.com/googleapis/python-tpu/issues/92
It is causing false failure warnings in our pipelines today.
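
Until the library issue is resolved, one possible stopgap is to treat this specific TypeError as a non-fatal outcome of the delete. This is a sketch of a workaround, not an official fix. wait_for_delete is a hypothetical helper, and it assumes, based on the traceback above, that the error is raised only while the client converts the finished operation's response, which matches the observation that the TPU VM is already deleted by then.

```python
import logging


def wait_for_delete(operation, timeout=None):
    """Wait for a delete operation, tolerating the 'Any to Node' TypeError.

    Returns True if result() returned normally and False if the known
    conversion error was swallowed. Any other exception propagates.
    """
    try:
        operation.result(timeout=timeout)
    except TypeError as exc:
        # Only swallow the specific conversion failure seen in the traceback;
        # unrelated TypeErrors should still fail the pipeline.
        if "Could not convert Any to" not in str(exc):
            raise
        logging.warning("Ignoring response conversion error after delete: %s", exc)
        return False
    return True
```

In the script above this would replace the last line, for example wait_for_delete(client.delete_node(request=request)). Because the error is swallowed rather than fixed, a pipeline that needs certainty may want to confirm separately that the node is gone.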