I currently have a training task that loads sharded CSV files from GCS
using the TorchData library (the training code is in PyTorch). However, I notice
that my GPU sits at 0% utilisation for roughly 2-3 minutes after each
epoch, which I presume is due to I/O is...