
Vertex AI data lost on VM stop

I am new to Vertex AI and wanted to try it out for a Kaggle competition. I was able to get a GPU machine up and running and to download the data to the machine. The download script was generated automatically when I uploaded my notebook to Vertex AI. I ran the script, and 5 hours later all of the data had downloaded successfully to the boot disk (a standard persistent disk with 1000 GB). I then ran a first iteration of my model and everything worked great. When I was done, I went back to GCP and stopped my VM, assuming all of my data would be saved. It was not!

I then started over, and once the data was on the machine I took a snapshot so I wouldn't have to download the data a third time. I then made some edits to my model and ran it again. After I was done, I again stopped my VM so it would not keep running. All of the data was lost again, though less surprisingly this time.

I thought a snapshot could be used as a backup for the original machine, but the documentation makes it seem like it is only for creating a new VM from the boot disk. I then created a new machine from the snapshot but cannot figure out how to use it. I also looked for a way to create a new notebook on Vertex AI from the disk snapshot, but it did not look possible.

Questions:

  1. Why was my data deleted when I stopped the VM (not reset or deleted it)?
  2. How can I back up data for Vertex AI, and how can I use that backup?
  3. How can I use the VM created from my snapshot?

 

2 REPLIES

  1. Create a snapshot of the data disk:

     gcloud compute snapshots create SNAPSHOT_NAME \
       --source-disk SOURCE_DISK \
       --source-disk-zone SOURCE_DISK_ZONE

  2. SSH into the instance and unmount the data disk:

     sudo umount /dev/disk/by-id/google-<INSTANCE NAME>

  3. Stop the instance.
  4. Detach the data disk:

     gcloud compute instances detach-disk $INSTANCE_NAME --disk $DATA_DISK_NAME --zone $ZONE

  5. Delete the data disk:

     gcloud compute disks delete $DATA_DISK_NAME --zone $ZONE

  6. Create a new disk from the snapshot:

     gcloud compute disks create $DATA_DISK_NAME --size=$DATA_DISK_SIZE --source-snapshot=$SNAPSHOT_NAME --type=$DATA_DISK_TYPE --zone $ZONE

  7. Attach the disk to the notebook instance:

     gcloud compute instances attach-disk $INSTANCE_NAME --disk $DATA_DISK_NAME --zone $ZONE

  8. Start the VM.
  9. On the instance, create the directory that serves as the mount point:

     sudo mkdir -p /mnt/disks/MOUNT_DIR

  10. Mount the disk:

     sudo mount -o discard,defaults /dev/DEVICE_NAME /mnt/disks/MOUNT_DIR
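
The gcloud steps above could be strung together roughly as follows. This is only a sketch: the default names (my-notebook, my-notebook-data, my-data-snapshot, us-central1-a) and the DRY_RUN switch are made up for illustration, and the unmount and mount steps, which run on the instance itself, are left out. The disk is created without an explicit size or type, so gcloud's defaults apply. By default the script only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: snapshot a notebook data disk and recreate it from that snapshot.
# All default values are hypothetical placeholders; replace them with your own.
set -eu

INSTANCE_NAME="${INSTANCE_NAME:-my-notebook}"
DATA_DISK_NAME="${DATA_DISK_NAME:-my-notebook-data}"
SNAPSHOT_NAME="${SNAPSHOT_NAME:-my-data-snapshot}"
ZONE="${ZONE:-us-central1-a}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print a command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

restore_data_disk() {
  # Take the snapshot first, so the data exists somewhere before the disk is deleted.
  run gcloud compute snapshots create "$SNAPSHOT_NAME" --source-disk "$DATA_DISK_NAME" --source-disk-zone "$ZONE"
  run gcloud compute instances stop "$INSTANCE_NAME" --zone "$ZONE"
  run gcloud compute instances detach-disk "$INSTANCE_NAME" --disk "$DATA_DISK_NAME" --zone "$ZONE"
  # gcloud asks for confirmation here; that prompt is left in place on purpose.
  run gcloud compute disks delete "$DATA_DISK_NAME" --zone "$ZONE"
  run gcloud compute disks create "$DATA_DISK_NAME" --source-snapshot "$SNAPSHOT_NAME" --zone "$ZONE"
  run gcloud compute instances attach-disk "$INSTANCE_NAME" --disk "$DATA_DISK_NAME" --zone "$ZONE"
  run gcloud compute instances start "$INSTANCE_NAME" --zone "$ZONE"
}

restore_data_disk
```

Running it as-is prints the seven commands with a leading "+". Setting DRY_RUN=0 executes them in order, and because of `set -e` the script stops at the first command that fails instead of going on to delete the disk.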

Thanks, that is helpful, but I still do not understand why the data was deleted in the first place. It says it is a persistent disk.