
How to reduce overhead of running container jobs in Batch?

I have the following volume mounting setup for my Batch jobs.

host machine (my development server that I use to submit the Batch job request): bucket_name -> mount_path

guest machine (the VM provisioned by Batch to actually run the Batch task): bucket_name -> mount_path

docker (the container that runs the task on the guest machine): mount_path -> mount_path

The reason I do this is so that the same command, echo "hello" > /mount_path/hello.txt (in reality the command will be much more complicated and the use of a container much more justified; this is just an example), that I run locally on the host machine works the same way in Batch, with the file written to GCS.

However, this creates a lot of overhead because:

1. we need to run gcsfuse on every guest machine, and

2. the Docker container image has to be downloaded to the guest machine every time.

I am wondering if there is a way to achieve the same effect with less overhead. For example, could I keep a machine image that already has the bucket gcsfuse-mounted and the Docker image pre-pulled? My job config is as follows. Thanks!

"taskSpec": {
  "runnables": [
    {
      "container": {
        "imageUri": image_uri,
        "commands": echo hello > mount_path/hello.txt,
        "volumes": ["mount_path:mount_path"]
      },
    }
  ],
  "volumes": [
    {
      "mountPath": mount_path,
      "gcs": {"remotePath": bucket_name}
    }
  ]
}
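
To illustrate the machine-image idea above, here is a minimal sketch of what pointing the job at a pre-built VM image could look like. allocationPolicy sits at the job level, alongside the task group that holds taskSpec; the machine type, project, and image name my-batch-image (assumed to be built from a Batch-supported base with gcsfuse configured and the container image pre-pulled) are placeholders, and the field names follow the Batch REST API, so check them against the current docs:

"allocationPolicy": {
  "instances": [
    {
      "policy": {
        "machineType": "e2-standard-4",
        "bootDisk": {
          "image": "projects/my-project/global/images/my-batch-image"
        }
      }
    }
  ]
}

Whether this actually removes the per-VM gcsfuse mount and image-pull overhead is worth verifying on a small test job.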


2 REPLIES

Or is there a way to mount GCS directly into the Docker container, skipping the intermediate step? Though I am not sure how much that would help in reducing overhead.

@gradientopt  

Do you build and use your own images for your Batch jobs, or do you specify other images? If not, the default Batch images typically have gcsfuse built in. If you do build your own, you can certainly put all of your dependencies on the image.
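
For illustration, a rough sketch of how a job could point at such a pre-built setup through an instance template instead of an inline policy (my-batch-template is a placeholder, and the template is assumed to reference a VM image that already has gcsfuse set up and the container image pulled):

"allocationPolicy": {
  "instances": [
    { "instanceTemplate": "my-batch-template" }
  ]
}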

As for container download, we'll soon have the container image streaming feature, which could help.
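
Once that feature lands, enabling it will presumably be a per-container setting. A rough sketch of what it might look like on the container runnable, assuming an enableImageStreaming field (the field name is an assumption here) and an image hosted in Artifact Registry (the imageUri below is a placeholder):

"container": {
  "imageUri": "us-docker.pkg.dev/my-project/my-repo/my-image:latest",
  "enableImageStreaming": true,
  "volumes": ["/mount_path:/mount_path"]
}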