I have a very weird problem trying to create an EXT new-line formatted JSON table. I say weird because I was able to create the same exact EXT table in another Project-Dataset just a week ago.
I created a NEW Project and Dataset and when I try to create the table I get this:
"Failed to create table: Invalid field name "completed-in". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 300 characters long."
The NEW Dataset looks like this:
The ORIGINAL Dataset (where I was able to create the table) looks as follows:
The ORIGINAL EXT Table looks like this:
Why was I able to create the same exact EXT Table (not a single change) in the original dataset but now I cannot create it in a NEW Project/Dataset?
Totally perplexed...
Solved! Go to Solution.
Ok, I am going to close this issue. I believe that this does indeed have to do with changes that have been made since creating the original Project. There is no other explanation that makes sense.
Below is how I was finally able to create the EXT JSON table. NOTE: Fortunately, I dont need the col with the offending '-' char. If I did, then I dont this would be possible since you need to match the schema with the fields in the JSON in order to query those values.
CREATE OR REPLACE EXTERNAL TABLE `ford-analytics-408918.Daily_JSON_Logs.day_01`
(
meta STRUCT
<
count INTEGER--,
--`completed-in` INTEGER # CANNOT MATCH THIS FIELD
>,
data ARRAY <
STRUCT <
country STRING,
language STRING,
entityTags ARRAY <STRING>,
modelTags ARRAY <STRING>,
makeTags ARRAY <STRING>,
sa STRING,
messageType STRING,
classTags ARRAY <STRING>,
published TIMESTAMP,
author STRING,
content STRING,
link STRING,
queue STRING,
userTags ARRAY <STRING>,
title STRING,
dataSource STRING,
threadId STRING,
dataCategory STRING,
id STRING
>
>
)
OPTIONS (
format = 'NEWLINE_DELIMITED_JSON',
uris = ['https://drive.google.com/open?id=1YhNpquVz9YqU8PODkL7EiTn2NOTh7eYU'],
ignore_unknown_values = TRUE
);
And now, querying the table yeilds a valid result set:
Thank you for your help with this matter.
Happy Holidays! 🎄⛄🎅🏼🤶🦌❄
There a few Potential Causes for the issue you are seeing:
Field Naming Restrictions:
Project/Dataset Configuration:
Steps to resolve:
Rename the Field:
Check Project/Dataset Permissions:
Review Configuration:
Thanks for the quick reply.
Yes indeed, there is a '-' in one of the JSON tags. However, why was I able to create these exact same tables on my other Project-Dataset? The perms for both projects/datasets are identical and I am owner of both. The issue is that it will be be difficult to change the tag name w/out some ETL in Python. And bc this process will be managed by non-technical users (BA's), the expectation is that they will just download a daily JSON, e.g., 'day01'-'day31' into a commom shared folder. So changing the tag name in the CREATE process is really not feasible (the col name is changed downstream though).
So my real question is this - Why was I able to create these JSON EXT tables in one (original) dataset but not another (new one). All the files are identical, as are the permissions.
NEW Dataset Perms:
ORIGINAL Dataset Perms:
While permissions and files appear identical, there are likely subtle configuration differences or other factors at play. Here are key areas to investigate further:
Case Sensitivity:
Default Dataset Setting:
Creation Method:
Alternative Approach:
Additional Considerations:
Testing with Known Good Data:
Remember, consistency in file format, schema definitions, and creation methods is key in managing BigQuery datasets effectively.
1. The schema was intended to be auto-detected. This is how I created the EXT tables in the original dataset.
2. Not exactly sure what a "default dataset" is. I've never set or changed the default. I will check.
3. Everything is literally identical except for a) the Project and b) the dataset. The ext json file has not changed at all.
4. I am not sure I understand. If I can't create the table how can I create a view to query and change the col name. Maybe I am missing what you mean....
5. I am not sure I know how to get "Project" level settings. I dont see any properties at the Project level except server region, which in both projects is set exactly the name. I may end up creating the schemas directly in sql. I have yet to try that and was hoping to avoid. The actual jsons must be fine as they read perfectly in the other Project/DS and also in Python. I will also use another json file and test creation. I suspect it will work if no '-' char. At this point I am just baffled as to why I was able to create in my other project. The one diff I can see is the orig project was created several yrs ago and the new proj was just created. As for the Datasets, both were created just recently. In the one that sits within the older project, the json - with the '-' char is auto read and schema'd just fine. In the one, I get the error.
Thanks for clarification. Here’s a an approach to troubleshooting the issue with creating external tables from JSON in BigQuery:
Schema Auto-Detection:
Default Dataset:
Project and Dataset Differences:
Creating a View:
Project-Level Settings:
Direct Schema Definition:
Testing with Different JSON Files:
Version or Feature Updates in BigQuery:
Next Steps:
Ok, I am going to close this issue. I believe that this does indeed have to do with changes that have been made since creating the original Project. There is no other explanation that makes sense.
Below is how I was finally able to create the EXT JSON table. NOTE: Fortunately, I dont need the col with the offending '-' char. If I did, then I dont this would be possible since you need to match the schema with the fields in the JSON in order to query those values.
CREATE OR REPLACE EXTERNAL TABLE `ford-analytics-408918.Daily_JSON_Logs.day_01`
(
meta STRUCT
<
count INTEGER--,
--`completed-in` INTEGER # CANNOT MATCH THIS FIELD
>,
data ARRAY <
STRUCT <
country STRING,
language STRING,
entityTags ARRAY <STRING>,
modelTags ARRAY <STRING>,
makeTags ARRAY <STRING>,
sa STRING,
messageType STRING,
classTags ARRAY <STRING>,
published TIMESTAMP,
author STRING,
content STRING,
link STRING,
queue STRING,
userTags ARRAY <STRING>,
title STRING,
dataSource STRING,
threadId STRING,
dataCategory STRING,
id STRING
>
>
)
OPTIONS (
format = 'NEWLINE_DELIMITED_JSON',
uris = ['https://drive.google.com/open?id=1YhNpquVz9YqU8PODkL7EiTn2NOTh7eYU'],
ignore_unknown_values = TRUE
);
And now, querying the table yeilds a valid result set:
Thank you for your help with this matter.
Happy Holidays! 🎄⛄🎅🏼🤶🦌❄