I created a job that mounts a new disk, and it worked well. But when I specify a snapshot as the source for that disk, the job fails with the following error:
mount: /mnt/disks/conda_disk: special device UUID= does not exist.
I'm not sure how to debug this. Does the snapshot feature work for mounting disks? Thanks!
Hello JimmyPinks,
A machine image can back up multiple disks at a time, whereas a persistent disk snapshot can only back up a single disk at a time.
You can check this documentation as a reference for restrictions.
Sorry, I don't see any restrictions around snapshots at that link. I'm able to manually create many PDs from a single snapshot, so I don't understand what restriction you mean here.
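For what it's worth, this is roughly how I've been creating them with the Compute Engine client. The project, disk, and snapshot names below are placeholders, and the zone is just an example:

from google.cloud import compute_v1

disks_client = compute_v1.DisksClient()
snapshot_path = "projects/my-project/global/snapshots/conda-disk-snapshot"

# Create several persistent disks from the same snapshot.
for i in range(3):
    disk = compute_v1.Disk()
    disk.name = f"conda-disk-{i}"
    disk.source_snapshot = snapshot_path
    disk.size_gb = 100
    operation = disks_client.insert(
        project="my-project", zone="us-central1-a", disk_resource=disk
    )
    operation.result()  # Wait for each disk to finish being created.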
Hi JimmyPinks,
Thanks for asking this question! It turns out to be a permission-setting issue on our side. I have filed an internal bug for it and will keep you posted.
In the meantime, you can try using an image as an alternative. For the purpose of creating a new disk with existing data, it works much like a snapshot.
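For example, when building the allocation policy you can point the new disk at an image instead of a snapshot. This is just a sketch; "conda-image" below is a placeholder for your own image name:

# Source the new disk from a custom image instead of a snapshot.
disk = batch_v1.AllocationPolicy.Disk()
disk.image = f"projects/{project_name}/global/images/conda-image"
disk.type_ = "pd-standard"
disk.size_gb = 100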
Best,
Wen
Here's a copy of the code I'm using to launch the job:
from google.cloud import batch_v1


def create_script_job(project_name: str, job_name: str, snapshot_name: str, use_snapshot: bool) -> batch_v1.Job:
    client = batch_v1.BatchServiceClient()

    # Each task just lists the contents of the mounted disk.
    task = batch_v1.TaskSpec()
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "ls /mnt/disks/conda"
    task.runnables = [runnable]

    # Mount the attached disk (matched by device_name) read-only.
    pd_volume = batch_v1.Volume()
    pd_volume.device_name = "conda_disk"
    pd_volume.mount_path = "/mnt/disks/conda"
    pd_volume.mount_options = ["ro", "noload"]
    task.volumes = [pd_volume]

    # Per-task resource requests.
    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 500
    resources.memory_mib = 16
    task.compute_resource = resources
    task.max_run_duration = "3600s"

    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    # Describe the VMs and the new disk attached to them.
    allocation_policy = batch_v1.AllocationPolicy()
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "c2-standard-4"

    disk = batch_v1.AllocationPolicy.Disk()
    if use_snapshot:
        # Create the new disk from an existing snapshot.
        disk.snapshot = f"projects/{project_name}/global/snapshots/{snapshot_name}"
    disk.type_ = "pd-standard"
    disk.size_gb = 100

    attached_disk = batch_v1.AllocationPolicy.AttachedDisk()
    attached_disk.new_disk = disk
    attached_disk.device_name = "conda_disk"
    policy.disks = [attached_disk]

    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    create_request.parent = f"projects/{project_name}/locations/us-central1"

    return client.create_job(create_request)
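For reference, I launch it with something like this (the project, job, and snapshot names here are placeholders):

create_script_job("my-project", "conda-snapshot-job", "conda-disk-snapshot", use_snapshot=True)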
Images could work as a workaround, but they're not as modular: you have to take everything or nothing from the machine the image was created from.
Thanks for finding the bug! Let me know when it's fixed.
Yeah, it is. Sorry for the inconvenience!
Thanks for the detailed code snippet! Will keep you posted.
Hi JimmyPinks,
This bug should be fixed now. Please let me know if it works for you.
Thanks!