When running a job with a fairly large image (a few GB), the transition from SCHEDULED to RUNNING takes about 6 minutes.
When I enable image streaming on the container runnable with `enableImageStreaming`, I don't observe any difference in startup time. What is the expected improvement?
When describing the job, the only change I observe is that an extra label is added:
```
labels {
key: "goog-batch-managed-container"
value: "enabled"
}
```
In addition, the container runnable's volumes strings have `:false` appended to the supplied paths.
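For reference, here is a minimal sketch of where the flag sits in a job config, written in Python so the JSON can be generated and inspected. The helper name `build_job_config` and the single-task layout are assumptions for illustration, and the image URI is just the example path that appears in this thread; only `enableImageStreaming` on the container runnable comes from the question itself.

```python
import json


def build_job_config(image_uri: str, enable_streaming: bool = True) -> dict:
    """Return a minimal Batch job config with one container runnable.

    Hypothetical helper: the overall layout is a sketch, not a complete
    or recommended job definition.
    """
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        {
                            "container": {
                                "imageUri": image_uri,
                                # The flag discussed in this thread.
                                "enableImageStreaming": enable_streaming,
                            }
                        }
                    ]
                },
                "taskCount": 1,
            }
        ],
        # Send task logs to Cloud Logging so pull messages can be inspected.
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


if __name__ == "__main__":
    config = build_job_config(
        "us-central1-docker.pkg.dev/batch-project/test/image:test"
    )
    print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and passed to `gcloud batch jobs submit` with `--config`, assuming the rest of the job (machine type, location, and so on) is filled in for your project.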
The improvement from `enableImageStreaming` comes from accessing files in the container image without waiting for the whole image to be pulled. Therefore, it depends on the access pattern. Does your task use most of the content in the container within a relatively short time (in which case it will see less benefit)?
To add more points, there are some limitations for using image streaming. Here are some common limitations:
In addition, there is an image-pulling log entry in Cloud Logging that is specific to Image Streaming. An example format is:
```
Pulling images us-central1-docker.pkg.dev/batch-project/test/image:test...
```
You can roughly get an idea of the image streaming time from it.
If the image streaming requirements are not met, will the submission return an error, or will the job proceed without image streaming enabled?
Can one tell based on the log format?
If the image streaming requirements are not met, we fall back to pulling images without image streaming. Currently, the most straightforward way to check whether the image is streamed is the VM system log:
```
journalctl -u snapshotter
```
The command above prints logs if your image is streamed, something like:
```
image xxx is backed by image streaming
```
We have a logging improvement in our backlog to make all image streaming information more accessible through Cloud Logging.
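Until then, the check can be scripted on the VM. The sketch below scans text captured from `journalctl -u snapshotter` for the message quoted above. The function name `streamed_images` is hypothetical, and the regular expression assumes the log line looks exactly like the "image xxx is backed by image streaming" example, which is only described here as "something like" the real output, so it may need adjusting.

```python
import re
from typing import List

# Assumed message shape, taken from the example line in this thread.
STREAMED_RE = re.compile(r"image\s+(\S+)\s+is backed by image streaming")


def streamed_images(log_text: str) -> List[str]:
    """Return the image names reported as streamed, in first-seen order."""
    seen: List[str] = []
    for line in log_text.splitlines():
        match = STREAMED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Example with captured log text; an empty list suggests the job fell back
# to a regular pull (or that the message format differs from the assumption).
sample = "image xxx is backed by image streaming\nunrelated line\n"
print(streamed_images(sample))
```

On a VM, the log text could be captured with something like `journalctl -u snapshotter > snapshotter.log` and then read from that file before calling the function.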