I've been using Google Cloud Batch for a couple of weeks and have had no real problems adopting it for a number of workloads. However, starting today, I have been getting errors like this when mounting a GCS bucket:
Command error:
mkdir: cannot create directory '/mnt/MY-BUCKET': Read-only file system
I looked at log entries from the same type of job yesterday and did not see this error, even though the same mount command appears in those logs. Has anyone else seen a change in behavior? Any suggestions?
Added detail: this error appears to affect only container jobs, not script jobs, at least based on limited testing.
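For context, the GCS volume in the task spec is along these lines (a sketch, not the full job; the bucket name is replaced with MY-BUCKET, and the mount path matches the one in the error above):

"volumes": [
  {
    "gcs": {
      "remotePath": "MY-BUCKET"
    },
    "mountPath": "/mnt/MY-BUCKET"
  }
]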
Hi @seandavi - can you send details on your job and the job UID to gcp-batch-preview@google.com? We can look into this to gather more details.
There was a recent change to Batch in which Container-Optimized OS (COS) is used for container-only jobs. We are in the process of updating the documentation to reflect this change. In the meantime, the workaround is to mount the GCS bucket under a writable path in COS, e.g. "/mnt/disks/share" instead of "/mnt/share". We'll reply to this thread once the documentation is updated.
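In job-spec terms, that means pointing the GCS volume's mountPath under /mnt/disks, roughly like this (a sketch; "share" and MY-BUCKET are placeholders):

"volumes": [
  {
    "gcs": {
      "remotePath": "MY-BUCKET"
    },
    "mountPath": "/mnt/disks/share"
  }
]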
Hey, Shamel. Is mounting under `/mnt/disks/share` still required when mounting a disk for a GCP Batch run?
Hi,
I can't access my storage bucket data using that writable path: "ls: cannot access '/mnt/disks': No such file or directory". I am running a task with a GPU; could this be a conflict with the volumes used by the container? Here is my job config:
{
  "taskGroups": [
    {
      "taskSpec": {
        "computeResource": {
          "cpuMilli": "500",
          "memoryMib": "500"
        },
        "runnables": [
          {
            "container": {
              "imageUri": "tensorflow/tensorflow:2.11.0",
              "commands": ["-c", "echo $(ls /mnt/disks)"],
              "entrypoint": "/bin/sh",
              "volumes": [
                "/var/lib/nvidia/lib64:/usr/local/nvidia/lib64",
                "/var/lib/nvidia/bin:/usr/local/nvidia/bin"
              ],
              "options": "--privileged"
            }
          }
        ],
        "volumes": [
          {
            "gcs": {
              "remotePath": "MYBUCKET"
            },
            "mountPath": "/mnt/disks/MYBUCKET"
          }
        ]
      },
      "taskCount": 1
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": true,
        "policy": {
          "machineType": "n1-standard-2",
          "accelerators": [
            {
              "type": "nvidia-tesla-t4",
              "count": 1
            }
          ]
        }
      }
    ]
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
I think I found the reason: the bucket is mounted at the VM level, so the container has no access to it. Adding it as a container volume solves the problem.
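Concretely, the container block ends up looking something like this (a sketch based on my config above; the only change is the extra HOST_PATH:CONTAINER_PATH bind for the bucket mount):

"container": {
  "imageUri": "tensorflow/tensorflow:2.11.0",
  "commands": ["-c", "echo $(ls /mnt/disks)"],
  "entrypoint": "/bin/sh",
  "volumes": [
    "/var/lib/nvidia/lib64:/usr/local/nvidia/lib64",
    "/var/lib/nvidia/bin:/usr/local/nvidia/bin",
    "/mnt/disks/MYBUCKET:/mnt/disks/MYBUCKET"
  ],
  "options": "--privileged"
}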
@pabjusae Mounting the path from the VM into the container will solve the issue. Note that Batch can automatically mount the bucket into the container at the same path as on the host VM, but it only does that if the container's "volumes" field is empty.
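For example, a task spec along these lines (a trimmed sketch of the job above; it drops the nvidia bind mounts, so it only illustrates the bucket auto-mount) should see /mnt/disks/MYBUCKET inside the container without any container-level "volumes" entries:

"taskSpec": {
  "runnables": [
    {
      "container": {
        "imageUri": "tensorflow/tensorflow:2.11.0",
        "commands": ["-c", "echo $(ls /mnt/disks/MYBUCKET)"],
        "entrypoint": "/bin/sh"
      }
    }
  ],
  "volumes": [
    {
      "gcs": {
        "remotePath": "MYBUCKET"
      },
      "mountPath": "/mnt/disks/MYBUCKET"
    }
  ]
}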