
Batch - Creating new disk fails when using snapshot

I made a job that mounts a new disk and it worked well. But when I specify a snapshot for that disk, the job fails with the following error:

mount: /mnt/disks/conda_disk: special device UUID= does not exist.

Not sure how to debug this. Does the snapshot feature work for mounting disks? Thanks!


dionv
Former Googler

Hello JimmyPinks,

A machine image can be used to back up multiple disks at a time, whereas a persistent disk snapshot can only back up a single disk at a time.

You can check this documentation as a reference for restrictions.
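To illustrate the difference, here's a rough sketch using the google-cloud-compute client (the project, zone, and resource names are placeholders):

from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"

# A snapshot captures a single persistent disk.
disks_client = compute_v1.DisksClient()
snapshot = compute_v1.Snapshot(name="conda-snapshot")
disks_client.create_snapshot(
    project=project, zone=zone, disk="conda-disk", snapshot_resource=snapshot
).result()

# A machine image captures a whole instance, including all of its disks.
images_client = compute_v1.MachineImagesClient()
machine_image = compute_v1.MachineImage(
    name="my-machine-image",
    source_instance=f"projects/{project}/zones/{zone}/instances/my-instance",
)
images_client.insert(project=project, machine_image_resource=machine_image).result()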

Sorry, I don’t see any restrictions around snapshots at that link. I’m able to manually create many PDs from a single snapshot, so I don’t understand which restriction you mean here.
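For example, here's a minimal sketch of what I mean by creating a PD from the snapshot manually; all names here are placeholders:

from google.cloud import compute_v1

# Create a standalone PD from the snapshot; this works for me, so the
# snapshot itself restores fine outside of Batch.
disks_client = compute_v1.DisksClient()
disk = compute_v1.Disk()
disk.name = "conda-disk-from-snapshot"
disk.source_snapshot = "projects/my-project/global/snapshots/conda-snapshot"
disk.size_gb = 100
disks_client.insert(
    project="my-project", zone="us-central1-a", disk_resource=disk
).result()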

Hi JimmyPinks, 

Thanks for asking this question! It turns out to be a permission-setting issue on our side. I've filed an internal bug for it and will keep you posted.

In the meantime, you can try using an image as an alternative. It is similar to a snapshot for the purpose of creating a new disk with existing data.
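If you go that route, it should just be a different data source on the same Disk message; a minimal sketch (the project and image names are hypothetical):

from google.cloud import batch_v1

# Hypothetical project and image names for illustration.
project_name, image_name = "my-project", "conda-image"

# Point the new disk at an image instead of a snapshot; image and snapshot
# occupy the same data_source slot in the Disk message.
disk = batch_v1.AllocationPolicy.Disk()
disk.image = f"projects/{project_name}/global/images/{image_name}"
disk.type_ = "pd-standard"
disk.size_gb = 100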

Best,

Wen

Here's a copy of the code I'm using to launch the job:

from google.cloud import batch_v1


def create_script_job(project_name: str, job_name: str, snapshot_name: str, use_snapshot: bool) -> batch_v1.Job:
    client = batch_v1.BatchServiceClient()

    task = batch_v1.TaskSpec()

    # Each task just lists the mounted directory to verify the disk contents.
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "ls /mnt/disks/conda"

    task.runnables = [runnable]

    # Mount the attached disk (matched by device_name) read-only inside the task.
    pd_volume = batch_v1.Volume()
    pd_volume.device_name = "conda_disk"
    pd_volume.mount_path = "/mnt/disks/conda"
    pd_volume.mount_options = ["ro", "noload"]

    task.volumes = [pd_volume]

    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 500  # 0.5 vCPU per task
    resources.memory_mib = 16
    task.compute_resource = resources
    task.max_run_duration = "3600s"

    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    allocation_policy = batch_v1.AllocationPolicy()
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "c2-standard-4"

    # Create a new disk, optionally seeded from the snapshot.
    disk = batch_v1.AllocationPolicy.Disk()
    if use_snapshot:
        disk.snapshot = f"projects/{project_name}/global/snapshots/{snapshot_name}"
    disk.type_ = "pd-standard"
    disk.size_gb = 100

    # The device_name must match the Volume above so the disk gets mounted.
    attached_disk = batch_v1.AllocationPolicy.AttachedDisk()
    attached_disk.new_disk = disk
    attached_disk.device_name = "conda_disk"

    policy.disks = [attached_disk]
    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    create_request.parent = f"projects/{project_name}/locations/us-central1"

    return client.create_job(create_request)
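
And a minimal invocation of it, with placeholder names (the snapshot must already exist in the project):

if __name__ == "__main__":
    # Placeholder project/snapshot names for illustration.
    job = create_script_job(
        project_name="my-project",
        job_name="conda-disk-job",
        snapshot_name="conda-snapshot",
        use_snapshot=True,
    )
    print(f"Created job: {job.name}")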

 

Images could work as a workaround, but they're not as modular: you have to take everything or nothing from the machine that produced the image.

Thanks for finding the bug! Let me know when it's fixed.

Yeah, it is. Sorry for the inconvenience!

Thanks for the detailed code snippet! Will keep you posted.

Hi JimmyPinks,

This bug should be fixed now; please let me know if it works for you.

Thanks!