Hello, I was wondering whether it is possible to export the result of a query in a .sqlx file to a CSV and save it to a bucket. I tried searching for this but could only find information about importing CSVs into Dataform, so I would appreciate any references or brief examples. In my case I am planning to run a daily query that will return on average 300M-500M records. I think Dataform will handle this without problems since it uses BigQuery directly, but I am not sure whether it is possible to leave the results in a single CSV file in the bucket.
You could try using an EXPORT DATA statement in an operations-type .sqlx file, like this:
config {
  type: "operations"
}
EXPORT DATA [WITH CONNECTION connection_name] OPTIONS (export_option_list) AS query_statement
Note that BigQuery can only export up to 1 GB of data to a single file, so for larger results you will need a wildcard in the URI, which splits the export across multiple files.
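For example, a concrete version could look roughly like this (a sketch only; the bucket path, file prefix, and query are placeholders I made up, and the * wildcard in the URI is what lets BigQuery split the output into multiple files):

config {
  type: "operations"
}

EXPORT DATA OPTIONS (
  uri = 'gs://my-export-bucket/dataform_export/part-*.csv',  -- placeholder bucket and path
  format = 'CSV',
  overwrite = true,
  header = false  -- easier to concatenate the shards later without repeated header rows
) AS
SELECT * FROM my_dataset.my_table  -- placeholder query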
Disclaimer: I have not done this before but this is how I would approach the problem if I had to do it in Dataform.
Thanks, this looks like the right approach. Just a few questions, if you can maybe confirm:
* In my use case I need a single CSV with all the results, so I was planning to use Cloud Storage object composition: https://cloud.google.com/storage/docs/composing-objects
I would need to create an Eventarc-triggered Cloud Function that runs when Dataform finishes exporting all the CSVs (using the wildcard you mention to export more than 1 GB of data). In that function I would check how many files were generated and compose them into a single CSV. Since compose seems to be limited to 32 source objects per API call, and heavy days will certainly produce more than 32 CSVs, I will need some batching logic here (sketched below). Do you see any other approach, or what do you think?
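For reference, the batched composition could look roughly like this (an untested Python sketch written in the shape of a background Cloud Function handler; the bucket name, prefix, and output object name are placeholders, and it assumes the export shards share a common prefix):

from google.cloud import storage

COMPOSE_LIMIT = 32                              # max source objects per compose request
BUCKET = "my-export-bucket"                     # placeholder
PREFIX = "dataform_export/part-"                # placeholder, must match the export wildcard
FINAL_NAME = "dataform_export/full_export.csv"  # placeholder

def compose_exported_csvs(event=None, context=None):
    """Compose all exported CSV shards into one object, 32 at a time."""
    client = storage.Client()
    bucket = client.bucket(BUCKET)
    # List the shards produced by EXPORT DATA.
    blobs = sorted(client.list_blobs(BUCKET, prefix=PREFIX), key=lambda b: b.name)
    if not blobs:
        return

    round_num = 0
    while len(blobs) > 1:
        next_round = []
        # Compose in batches of at most 32, producing intermediate composite objects.
        for i in range(0, len(blobs), COMPOSE_LIMIT):
            target = bucket.blob(f"tmp/compose_round{round_num}_{i:06d}")
            target.compose(blobs[i:i + COMPOSE_LIMIT])
            next_round.append(target)
        blobs = next_round
        round_num += 1

    # Copy the single remaining composite object to its final name.
    bucket.copy_blob(blobs[0], bucket, FINAL_NAME)

Two caveats the sketch skips: it leaves the intermediate tmp/ objects in place, and compose just concatenates bytes, so the export should be written without a header row (or you accept a repeated header line per shard).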
You could also use Colab Enterprise or Workflows (which is actually also powered by Dataform) to run a notebook with the %%bigquery magic command and save the result as a CSV, like this:
%%bigquery df
query_statement

and then in a second cell:

df.to_csv('name_of_csv.csv')
<insert code to move/save this file to a bucket here>
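For that last placeholder step, saving the file to a bucket could be as simple as this (a minimal sketch; the bucket and object names are placeholders):

from google.cloud import storage

# Upload the locally written CSV to Cloud Storage (names are placeholders).
client = storage.Client()
bucket = client.bucket("my-export-bucket")
bucket.blob("exports/name_of_csv.csv").upload_from_filename("name_of_csv.csv")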
This seems more straightforward to me, and you would probably find it easier than orchestrating multiple services.