Apologies for the beginner questions. I just wanted to get started with Cloud TPUs, but nothing I try seems to work.
I've been trying to follow https://cloud.google.com/tpu/docs/run-calculation-jax, but the very first command, which creates a TPU VM, fails.
I bumped into several issues:
Attempt 1:
$ gcloud compute tpus tpu-vm create gomlx-tpu \
--zone=europe-west4-a \
--accelerator-type=v2-8 \
--version=tpu-ubuntu2204-base
Create request issued for: [gomlx-tpu]
Waiting for operation [projects/gomlx-392709/locations/europe-west4-a/operations/operation-1690529132849-60186fc722abe-8f3e7b49-d7def2be] to complete...failed.
ERROR: (gcloud.compute.tpus.tpu-vm.create) {
"code": 7,
"message": "User does not have permission to access the OS image used by this Cloud TPU runtime version. [EID: 0x2770e45e99305f9e]"
}
What is the issue with permission to use the Ubuntu image? Where do I get it?
Attempt 2, 3 and 4:
$ gcloud compute tpus tpu-vm create gomlx-tpu \
--zone=europe-west4-a \
--accelerator-type=v2-8 \
--version=tpu-ubuntu2204-base
Create request issued for: [gomlx-tpu]
Waiting for operation [projects/gomlx-392709/locations/europe-west4-a/operations/operation-1690529420564-601870d985980-063be324-16c82c83] to complete...failed.
ERROR: (gcloud.compute.tpus.tpu-vm.create) {
"code": 8,
"message": "There is no more capacity in the zone \"europe-west4-a\"; you can try in another zone where Cloud TPU Nodes are offered (see https://cloud.google.com/tpu/docs/regions) [EID: 0x7f281e57cca4ac51]"
}
Then I tried different regions, with similar results. Do I need to try every combination of TPU type and zone myself? Couldn't there be a page that lists what is available instead?
Note that the page linked in the error (https://cloud.google.com/tpu/docs/regions) doesn't list availability.
Attempt 5, 6, 7, ...:
So I manually created a TPU "something" in the console UI (a "TPU Node"? What is this? The term is not linked in the console). But the "SSH" link just says "This TPU's architecture is not TPU VM". So how do I SSH to it? Or how else do I interact with it to continue the tutorial?
So I tried the command line again:
$ gcloud compute tpus tpu-vm ssh gomlx-tpu --zone=us-central1-b
ERROR: (gcloud.compute.tpus.tpu-vm.ssh) Invalid value for [TPU]: this command is only available for Cloud TPU VM nodes. To access this node, please see https://cloud.google.com/tpu/docs/creating-deleting-tpus.
Following the documentation, I tried:
$ gcloud compute ssh gomlx-tpu --zone=us-central1-b
ERROR: (gcloud.compute.ssh) Could not fetch resource:
- The resource 'projects/gomlx-392709/zones/us-central1-b/instances/gomlx-tpu' was not found
But it does exist: I can see it in my console.
Thanks in advance for any pointers!
Check your quotas. In my case I had to request TPU quota by emailing Google Cloud at cloud-tpu-pm-team@google.com.
Regarding your question about SSH: Cloud TPU has two different VM architectures, TPU Nodes and TPU VMs. It sounds like you created a TPU with the Node architecture in the console. Newer TPU versions don't support the TPU Node architecture.
SSHing to a TPU with the Node architecture is not supported in the console. You can SSH using the command line, but the command is different for TPU Nodes: gcloud compute ssh <tpu-name>. See Connecting to a Cloud TPU for more info.
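Regarding the capacity errors: I'm not aware of a page that shows live availability, so one workaround is to script the retries instead of running them by hand. Below is a minimal sketch, not an official procedure. It only reuses the create command and flags from the original post. The function name create_tpu_in_first_free_zone is made up for illustration, and the zones you pass in are your own guesses at where the accelerator type is offered. Note that it moves on to the next zone on any failure, not only on "no more capacity", so check the error output if every zone fails.

```shell
# Hypothetical helper: try each zone in turn until a TPU VM is created.
# Usage: create_tpu_in_first_free_zone NAME ACCELERATOR_TYPE VERSION ZONE...
create_tpu_in_first_free_zone() {
  name="$1"
  accel="$2"
  version="$3"
  shift 3

  for zone in "$@"; do
    echo "Trying zone ${zone}..." >&2
    # Same create command as in the question; its output goes to stderr
    # so that stdout carries only the zone that finally worked.
    if gcloud compute tpus tpu-vm create "$name" \
        --zone="$zone" \
        --accelerator-type="$accel" \
        --version="$version" >&2; then
      echo "$zone"
      return 0
    fi
  done

  echo "Creation failed in every listed zone." >&2
  return 1
}
```

For example, create_tpu_in_first_free_zone gomlx-tpu v2-8 tpu-ubuntu2204-base europe-west4-a us-central1-b prints the first zone in which the create call succeeded, which you can then pass to gcloud compute tpus tpu-vm ssh.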
Hey @jan0000, I am facing the same issue while creating a TPU v5 VM in the us-west-4-a region. Did you solve it? Please let me know.