
Cloud Function memory limit exceeded | using bigquery.Client().load_table_from_csv

Hello everyone,

I am running into an issue with a Cloud Function and have a few related questions.

What I am trying to do: a Cloud Function that takes tabular data from a CSV file stored in a Cloud Storage bucket and uploads it to a dataset in BigQuery. The memory currently allocated to the function is 256 MB.

The function works as follows: it reads the CSV from the bucket, loads it into a DataFrame (pandas.read_csv), defines a load job, and then uploads it to BigQuery using bigquery.Client().load_table_from_csv. A rough sketch of the flow is shown below.
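
For context, here is a rough sketch of the flow, not my exact code: the bucket, file, dataset, and table names are placeholders, the trigger is simplified to a storage-triggered background function, and I have written the upload step as load_table_from_dataframe, since that is the DataFrame upload call I mean.

import io

import pandas as pd
from google.cloud import bigquery, storage

def load_csv_to_bq(event, context):
    # Read the CSV from the Cloud Storage bucket (names are placeholders).
    blob = storage.Client().bucket(event["bucket"]).blob(event["name"])
    df = pd.read_csv(io.BytesIO(blob.download_as_bytes()))

    # Define the load job and upload the DataFrame to BigQuery.
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
    job = bigquery.Client().load_table_from_dataframe(
        df, "my_dataset.my_table", job_config=job_config
    )
    job.result()  # the memory limit is exceeded around this upload step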

The main issue: the function works well on small files, but when I tried to read a 12 MB file from the bucket, the memory limit was exceeded.

So far: I have tried every way I could find to trace the function's memory usage at runtime, but I couldn't fully pin down the problem. I have also tried to free memory with "del" and "gc.collect()" to release any unused space directly before the point where it gets stuck (before load_table_from_csv).

What I found: using several print() calls, I found that the function exceeds the limit on bigquery.Client().load_table_from_csv. I also found that converting the content of the CSV into a pandas DataFrame increases its size from 12 MB to about 60 MB; however, that alone should not push it over the limit.

What I need:

     - Any advice on how to properly trace memory usage inside the Cloud Function would be helpful.

     - Any information about bigquery.Client().load_table_from_csv and whether it inflates the data like this.

     - Any general recommendations to make the system work, or on how to determine the smallest memory allocation that is sufficient given the maximum CSV file size that could be read, without wasting resources.

     - I have also tried psutil.Process(os.getpid()) to trace RAM usage between different checkpoints in the function code (roughly as sketched after this list), but the differences did not give me any helpful insight into the issue.
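
For reference, the checkpoints looked roughly like this (the label names are just illustrative):

import os

import psutil

def log_rss(label):
    # Print the process's resident set size (RSS) in MB at a checkpoint.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)
    print(f"[memory] {label}: {rss_mb:.1f} MB")

# Called between the steps, e.g.:
# log_rss("after read_csv")
# log_rss("before the BigQuery upload")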

   

 

1 REPLY

From what I found, it looks like there’s a 10 MB limit [1], so it seems that when you try sending CSV files larger than 10 MB you’re hitting that limit, and that is what is causing the error.

I did find another link showing how this can be done by loading from Cloud Storage into BigQuery directly [2], in case you find it helpful; a sketch of that approach is below.
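
For example, something along these lines (the source URI and destination table ID are placeholders) would let BigQuery read the CSV straight from Cloud Storage, so the function only submits the load job and never has to hold the file or a DataFrame in memory:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the CSV header row
    autodetect=True,      # let BigQuery infer the schema
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://your-bucket/your-file.csv",        # placeholder source URI
    "your-project.your_dataset.your_table",  # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to complete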

If you aren’t already using Cloud Trace then it would be helpful to use it to track exactly what’s going on with the request [3]. Here’s a quickstart on setting it up [4].

As for the initial out-of-memory issue, that’s likely due to the limit mentioned in the public documentation [1].

[1] https://cloud.google.com/functions/quotas#resource_limits
[2] https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv
[3] https://cloud.google.com/trace/docs/setup#instrumenting_tracing_for_applications
[4] https://codelabs.developers.google.com/codelabs/cloud-function-logs-traces#0