
Batch - Creating new disk fails when using snapshot

I made a job that mounts a new disk and it worked well. But when I specify a snapshot for that disk, the job fails with the following error:

mount: /mnt/disks/conda_disk: special device UUID= does not exist.

Not sure how to debug this. Does the snapshot feature work for mounting disks? Thanks!


dionv
Former Googler

Hello JimmyPinks,

A machine image can be used to back up multiple disks at a time, whereas a persistent disk snapshot can only back up a single disk at a time.

You can check this documentation as a reference for restrictions.
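To illustrate the difference, here's a rough sketch using the google-cloud-compute client (the project, zone, and resource names are placeholders):

from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"

# A snapshot captures a single persistent disk.
disks_client = compute_v1.DisksClient()
snapshot = compute_v1.Snapshot(name="conda-snapshot")
disks_client.create_snapshot(
    project=project, zone=zone, disk="conda-disk", snapshot_resource=snapshot
).result()

# A machine image captures a whole instance, including all of its disks.
images_client = compute_v1.MachineImagesClient()
machine_image = compute_v1.MachineImage(
    name="my-machine-image",
    source_instance=f"projects/{project}/zones/{zone}/instances/my-instance",
)
images_client.insert(project=project, machine_image_resource=machine_image).result()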

Sorry, I don’t see any restrictions around snapshots at that link. I’m able to manually create many PDs from a single snapshot, so I don’t understand which restriction you mean here.
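For example, here's a minimal sketch of what I mean by creating a PD from the snapshot manually; all names here are placeholders:

from google.cloud import compute_v1

# Create a standalone PD from the snapshot; this works for me, so the
# snapshot itself restores fine outside of Batch.
disks_client = compute_v1.DisksClient()
disk = compute_v1.Disk()
disk.name = "conda-disk-from-snapshot"
disk.source_snapshot = "projects/my-project/global/snapshots/conda-snapshot"
disk.size_gb = 100
disks_client.insert(
    project="my-project", zone="us-central1-a", disk_resource=disk
).result()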

Hi JimmyPinks, 

Thanks for asking this question! It turns out to be a permission-setting issue on our side. I've filed an internal bug for it and will keep you posted.

In the meantime, you can try using an image as an alternative. It is similar to a snapshot for the purpose of creating a new disk with existing data.
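If you go that route, it should just be a different data source on the same Disk message; a minimal sketch (the project and image names are hypothetical):

from google.cloud import batch_v1

# Hypothetical project and image names for illustration.
project_name, image_name = "my-project", "conda-image"

# Point the new disk at an image instead of a snapshot; image and snapshot
# occupy the same data_source slot in the Disk message.
disk = batch_v1.AllocationPolicy.Disk()
disk.image = f"projects/{project_name}/global/images/{image_name}"
disk.type_ = "pd-standard"
disk.size_gb = 100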

Best,

Wen

Here's a copy of the code I'm using to launch the job:

from google.cloud import batch_v1


def create_script_job(project_name: str, job_name: str, snapshot_name: str, use_snapshot: bool) -> batch_v1.Job:
    client = batch_v1.BatchServiceClient()

    task = batch_v1.TaskSpec()

    # Each task just lists the mounted directory to verify the disk contents.
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "ls /mnt/disks/conda"

    task.runnables = [runnable]

    # Mount the attached disk (matched by device_name) read-only inside the task.
    pd_volume = batch_v1.Volume()
    pd_volume.device_name = "conda_disk"
    pd_volume.mount_path = "/mnt/disks/conda"
    pd_volume.mount_options = ["ro", "noload"]

    task.volumes = [pd_volume]

    resources = batch_v1.ComputeResource()
    resources.cpu_milli = 500  # 0.5 vCPU per task
    resources.memory_mib = 16
    task.compute_resource = resources
    task.max_run_duration = "3600s"

    group = batch_v1.TaskGroup()
    group.task_count = 4
    group.task_spec = task

    allocation_policy = batch_v1.AllocationPolicy()
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "c2-standard-4"

    # Create a new disk, optionally seeded from the snapshot.
    disk = batch_v1.AllocationPolicy.Disk()
    if use_snapshot:
        disk.snapshot = f"projects/{project_name}/global/snapshots/{snapshot_name}"
    disk.type_ = "pd-standard"
    disk.size_gb = 100

    # The device_name must match the Volume above so the disk gets mounted.
    attached_disk = batch_v1.AllocationPolicy.AttachedDisk()
    attached_disk.new_disk = disk
    attached_disk.device_name = "conda_disk"

    policy.disks = [attached_disk]
    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.labels = {"env": "testing", "type": "script"}
    job.logs_policy = batch_v1.LogsPolicy()
    job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

    create_request = batch_v1.CreateJobRequest()
    create_request.job = job
    create_request.job_id = job_name
    create_request.parent = f"projects/{project_name}/locations/us-central1"

    return client.create_job(create_request)
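
And a minimal invocation of it, with placeholder names (the snapshot must already exist in the project):

if __name__ == "__main__":
    # Placeholder project/snapshot names for illustration.
    job = create_script_job(
        project_name="my-project",
        job_name="conda-disk-job",
        snapshot_name="conda-snapshot",
        use_snapshot=True,
    )
    print(f"Created job: {job.name}")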

 

Images could work as a workaround, but they're not as modular: you have to take everything or nothing from the machine that produced the image.

Thanks for finding the bug! Let me know when it's fixed.

Yeah, it is. Sorry for the inconvenience!

Thanks for the detailed code snippet! Will keep you posted.

Hi JimmyPinks,

This bug should be fixed now; please let me know if it works for you.

Thanks!