I am trying to load data from a .csv file in Google Cloud Storage into BigQuery using Dataform, but the jobs fail whenever a STRING field contains an escaped quote of the form \". I have tried several approaches with --quote, but none of them work for me. Is there a way to handle this from Dataform?
Handling escaped quotes in CSV files can be challenging when loading data from Google Cloud Storage into BigQuery with Dataform. The default CSV parsing may not interpret escape sequences like \" the way your file intends, which causes the load jobs to fail.
To address this, you need to configure the CSV parsing options directly in the SQLX script that declares the external table; Dataform does not support a csv_options configuration in the dataform.json file.
The correct approach involves setting options such as quote, allow_quoted_newlines, and field_delimiter in your SQLX script. For instance, you might use the following configuration:
```js
config {
  type: "table",
  external: {
    source: {
      type: "gcs",
      uris: ["gs://your-bucket/path/to/yourfile.csv"],
      format: "CSV",
      options: {
        skip_leading_rows: 1,
        allow_quoted_newlines: true,
        field_delimiter: ",",
        quote: "\"",
        allow_jagged_rows: true,
        encoding: "UTF-8"
      }
    }
  }
}
```
This setup should let Dataform parse quoted CSV fields correctly. Note, however, that BigQuery's CSV parser follows RFC 4180, which escapes a quote inside a quoted field by doubling it (""), not with a backslash. If your file genuinely uses \" escapes, setting the quote option alone will not be enough and you will need to preprocess the file before loading. After making these adjustments, re-run your Dataform job to see the changes take effect.
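As a sketch of that preprocessing step, the snippet below rewrites backslash-escaped quotes into RFC 4180 doubled quotes so BigQuery's parser accepts them. The function name and sample row are my own illustration, and it assumes backslashes appear in the file only as quote escapes; files that also contain literal backslashes need a proper state-machine parser instead of a plain replace.

```python
def fix_escaped_quotes(line: str) -> str:
    """Rewrite backslash-escaped quotes (\\") as RFC 4180 doubled quotes ("")."""
    # Assumption: backslash is used only to escape quotes in this file.
    return line.replace('\\"', '""')

# Example row that BigQuery's CSV parser would reject as-is:
raw = '1,"He said \\"hello\\" twice"'
fixed = fix_escaped_quotes(raw)
print(fixed)  # 1,"He said ""hello"" twice"
```

You would run this over the file (line by line, to keep memory flat for large exports) and re-upload the fixed copy to GCS before the Dataform job reads it.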