I've been using Google Cloud Batch for a couple of weeks and have had no real problems adopting it for a number of workloads. However, starting today, I have been getting errors like this when mounting a GCS bucket:
Command error:
mkdir: cannot create directory '/mnt/MY-BUCKET': Read-only file system
I looked at log entries from the same type of job yesterday and did not see this error, even though the same mount command appears in those logs. Has anyone else seen a change in behavior? Any suggestions?
Added detail: this error appears to affect only container jobs, not script jobs, at least based on limited testing.
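For context, the GCS volume in the task spec is along these lines (a sketch, not the full job; the bucket name is replaced with MY-BUCKET, and the mount path matches the one in the error above):

"volumes": [
  {
    "gcs": {
      "remotePath": "MY-BUCKET"
    },
    "mountPath": "/mnt/MY-BUCKET"
  }
]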
Hi @seandavi - can you send details on your job and the job UID to gcp-batch-preview@google.com? We can look into this to gather more details.
There was a recent change to Batch in which Container-Optimized OS (COS) is used for container-only jobs. We are in the process of updating the documentation to reflect this change. In the meantime, the workaround is to mount the GCS bucket under a writable path in COS, e.g. "/mnt/disks/share" instead of "/mnt/share". We'll reply to this thread once the documentation is updated.
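In job-spec terms, that means pointing the GCS volume's mountPath under /mnt/disks, roughly like this (a sketch; "share" and MY-BUCKET are placeholders):

"volumes": [
  {
    "gcs": {
      "remotePath": "MY-BUCKET"
    },
    "mountPath": "/mnt/disks/share"
  }
]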
Hey, Shamel. Is mounting under `/mnt/disks/share` still required when mounting a disk for a GCP Batch run?
Hi,
I can't access my storage bucket data using that writable path: "ls: cannot access '/mnt/disks': No such file or directory". I am running a task with a GPU; could this be a conflict with the volumes used by the container? Here is my job config:
{
  "taskGroups": [
    {
      "taskSpec": {
        "computeResource": {
          "cpuMilli": "500",
          "memoryMib": "500"
        },
        "runnables": [
          {
            "container": {
              "imageUri": "tensorflow/tensorflow:2.11.0",
              "commands": ["-c", "echo $(ls /mnt/disks)"],
              "entrypoint": "/bin/sh",
              "volumes": [
                "/var/lib/nvidia/lib64:/usr/local/nvidia/lib64",
                "/var/lib/nvidia/bin:/usr/local/nvidia/bin"
              ],
              "options": "--privileged"
            }
          }
        ],
        "volumes": [
          {
            "gcs": {
              "remotePath": "MYBUCKET"
            },
            "mountPath": "/mnt/disks/MYBUCKET"
          }
        ]
      },
      "taskCount": 1
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": true,
        "policy": {
          "machineType": "n1-standard-2",
          "accelerators": [
            {
              "type": "nvidia-tesla-t4",
              "count": 1
            }
          ]
        }
      }
    ]
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
I think I found the reason: the bucket is mounted at the VM level, so the container has no access to it. Adding it as a container volume solves the problem.
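Concretely, the container block ends up looking something like this (a sketch based on my config above; the only change is the extra HOST_PATH:CONTAINER_PATH bind for the bucket mount):

"container": {
  "imageUri": "tensorflow/tensorflow:2.11.0",
  "commands": ["-c", "echo $(ls /mnt/disks)"],
  "entrypoint": "/bin/sh",
  "volumes": [
    "/var/lib/nvidia/lib64:/usr/local/nvidia/lib64",
    "/var/lib/nvidia/bin:/usr/local/nvidia/bin",
    "/mnt/disks/MYBUCKET:/mnt/disks/MYBUCKET"
  ],
  "options": "--privileged"
}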
@pabjusae Mounting the path from the VM into the container will solve the issue. Note that Batch can automatically mount the bucket into the container at the same path as on the host VM, but it only does that if the container's "volumes" field is empty.
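For example, a task spec along these lines (a trimmed sketch of the job above; it drops the nvidia bind mounts, so it only illustrates the bucket auto-mount) should see /mnt/disks/MYBUCKET inside the container without any container-level "volumes" entries:

"taskSpec": {
  "runnables": [
    {
      "container": {
        "imageUri": "tensorflow/tensorflow:2.11.0",
        "commands": ["-c", "echo $(ls /mnt/disks/MYBUCKET)"],
        "entrypoint": "/bin/sh"
      }
    }
  ],
  "volumes": [
    {
      "gcs": {
        "remotePath": "MYBUCKET"
      },
      "mountPath": "/mnt/disks/MYBUCKET"
    }
  ]
}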