Hi there,
I am new to BigQuery. All my files are CSV files with a simple schema:
DateTime:TIMESTAMP,
Mth_yr:STRING,
kWh_gas:FLOAT,
kWh_elec:FLOAT
However, when I try to upload a file like this, I get this error:
Error while reading data, error message: Could not parse 'DateTime' as TIMESTAMP for field DateTime (position 0) starting at location 0 with message 'Could not parse 'DateTime' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] or YYYY/MM/DD HH:MM[:SS[.SSSSSS]]'
But this is the beginning of my file, which looks as expected:
DateTime,Mth-yr,kWh_gas,kWh_elec
2019-01-01 00:00:00,Jan-19,,
2019-01-01 00:30:00,Jan-19,,
Notice that those two kWh values are nullable.
I am not sure what I am doing wrong.
The error message indicates that BigQuery is encountering an issue when trying to parse a value in the DateTime column as a TIMESTAMP. The required format for TIMESTAMP values in BigQuery is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] or YYYY/MM/DD HH:MM[:SS[.SSSSSS]].
From the sample data you provided, the format of the DateTime column appears correct. One potential issue could be that BigQuery is mistakenly trying to interpret the header "DateTime" as a TIMESTAMP value, which would lead to this error. It's also possible that other rows in your dataset have incorrect or missing DateTime values.
Before resorting to more complex solutions, ensure the following: the header row is being skipped during the load (for example, by setting skip_leading_rows to 1), and every row actually matches the required DateTime format.
If the issue persists, a workaround is to first load the data as STRING and then cast it to TIMESTAMP:
CREATE TABLE staging_table (
  DateTime STRING,
  Mth_yr STRING,
  kWh_gas FLOAT64,
  kWh_elec FLOAT64
);
bq load --source_format=CSV --skip_leading_rows=1 my_dataset.staging_table my_csv_file.csv
CREATE TABLE final_table (
  DateTime TIMESTAMP,
  Mth_yr STRING,
  kWh_gas FLOAT64,
  kWh_elec FLOAT64
);
INSERT INTO final_table (
DateTime,
Mth_yr,
kWh_gas,
kWh_elec
)
SELECT
CAST(DateTime AS TIMESTAMP),
Mth_yr,
kWh_gas,
kWh_elec
FROM staging_table;
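Before loading, you can also check locally which rows (if any) fail the timestamp format. A minimal sketch with pandas, using the two sample rows from the post inline (in practice you would read your real CSV file instead):

```python
import io

import pandas as pd

# Sample rows copied from the post; replace with your real CSV file.
csv_text = """DateTime,Mth-yr,kWh_gas,kWh_elec
2019-01-01 00:00:00,Jan-19,,
2019-01-01 00:30:00,Jan-19,,
"""

df = pd.read_csv(io.StringIO(csv_text))
# Rows whose DateTime cannot be parsed become NaT instead of raising.
parsed = pd.to_datetime(df["DateTime"], errors="coerce")
bad_rows = df[parsed.isna()]
print(len(bad_rows))  # 0 if every DateTime is well formed
```

If this prints 0 for a file that BigQuery still rejects, that strengthens the case that the loader is parsing the header row as data.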
Hello, thanks for the suggestions. This is definitely a bug because:
a) all my rows are formatted correctly (the file is produced from pandas)
b) I have other files where, by chance, the first rows are not null and the schema is correctly inferred
I can show a quick simple file to prove my point.
In terms of ingestion, I have a lot of files and I would need to do it via Python. Do you have any pointers on which API I should call?
If you believe it's a bug, especially given the consistency in your data formatting, it might be worth raising the issue with Google Cloud Support or checking BigQuery's known issues to see if others have encountered a similar problem.
In the meantime, you can use the following Python code to ingest your large number of CSV files into BigQuery:
import os

from google.cloud import bigquery

# Create a BigQuery client.
client = bigquery.Client()

# Create (or reuse) a dataset to store your data in.
dataset = bigquery.Dataset(f"{client.project}.my_dataset")
dataset.location = "US"
dataset = client.create_dataset(dataset, exists_ok=True)

# Shared load configuration: CSV input, skip the header row,
# and truncate any existing table contents.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Iterate over the CSV files in the current directory and load each
# one into a table named after the file.
for csv_file in os.listdir():
    if csv_file.endswith(".csv"):
        with open(csv_file, "rb") as f:
            job = client.load_table_from_file(
                f,
                dataset.table(csv_file[:-4]),
                job_config=job_config,
            )
        job.result()  # wait for the load job to finish

# Print a success message.
print("All CSV files have been loaded into BigQuery successfully!")
This code will iterate over all the CSV files in the current directory and load them into BigQuery, truncating the existing tables if they already exist. You can modify the code to suit your specific needs, such as changing the destination dataset or the write disposition.