I am looking for a way to import multiple 200 MB+ image files into a folder of a storage bucket. My problem is that I don't control the source and have to deal with their exports, which I will receive via an API call.
My go-to approach would be a Cloud Function that requests the images and stores them in a bucket. After the upload of all files has finished, I would like to trigger a Cloud Run service to analyze the images for objects. The code is in Python, and I don't want to deal with Dataflow or Vertex AI unless there is no way around it.
My concern is the maximum-runtime limitation of Cloud Functions, since the upload will have to happen over the internet.
To sum it up:
I'm looking for either a way to parallelize the upload of multiple files from a single API, or any other ideas that would help me stay under the one-hour limit of Cloud Functions (rough sketch of what I have in mind below). Buckets are used because multiple teams will be working with the same raw data, and they are most familiar with Python.
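To make that concrete, this is roughly the shape I have in mind: a thread pool fanning out one download-and-store task per file. All names and the listing call are placeholders, since I don't control the provider's API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def list_export_urls(area: str) -> list[str]:
    """Placeholder: however the provider's API lists the exported files."""
    raise NotImplementedError

def transfer_one(url: str) -> None:
    """Placeholder: download one file and store it in the bucket."""
    raise NotImplementedError

def transfer_all(area: str, max_workers: int = 8) -> None:
    urls = list_export_urls(area)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transfer_one, url): url for url in urls}
        for future in as_completed(futures):
            future.result()  # re-raise so a failed file is not silently skipped
```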
Thank you
Hi @Aftermath5428,
Welcome to Google Cloud Community!
I would suggest the following approaches, along with notes on how they work:
For resumable uploads, please take note of the blob.chunk_size property, the storage.BlobWriter class, and the storage.Blob.open(mode='w') method. For these methods, the default buffer size is 40 MiB. You can also use Resumable Media to manage resumable uploads.

For streaming uploads, please refer to the documentation and sample code on streaming transfers.
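As a minimal sketch of the streaming option (assuming the source exposes a plain HTTPS URL per image; bucket and blob names here are placeholders), you can pipe an HTTP response straight into a blob via blob.open without ever holding the whole file in memory:

```python
import requests
from google.cloud import storage

def stream_to_bucket(source_url: str, bucket_name: str, blob_name: str) -> None:
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)

    with requests.get(source_url, stream=True, timeout=300) as response:
        response.raise_for_status()
        # blob.open('wb') returns a BlobWriter, which performs a resumable
        # upload under the hood, flushing one chunk_size buffer at a time
        # (must be a multiple of 256 KiB; the default is 40 MiB).
        with blob.open("wb", chunk_size=10 * 1024 * 1024) as writer:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                writer.write(chunk)
```

Since each buffer is flushed to the upload session as soon as it fills, peak memory per transfer stays around one chunk_size rather than the full file size.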
Hope this helps.
Thank you for the reply. The thing with resumable downloads is that I don't know how many files there will be, and I don't even know their sizes. I will be downloading satellite data at least once a month. I define the area in my request, and the provider does the rest in their response.
So I would lean towards streaming rather than resumable downloads. The biggest unknowns are still the size of the total download and the size of each chunk of it. I still believe that streaming the data is the desired approach.
The other thing I am concerned about is the time it will take to download all images, the whole catalog.
Cloud Functions (2nd gen) should have a maximum timeout of one hour for HTTP-triggered functions; I hope that is enough. Is there anything memory-related I would have to take care of when streaming?
Thank you