Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Please Help! Unable to Create an External JSON Table

I have a very weird problem trying to create an EXT new-line formatted JSON table. I say weird because I was able to create the same exact EXT table in another Project-Dataset just a week ago.

I created a NEW Project and Dataset and when I try to create the table I get this:

"Failed to create table: Invalid field name "completed-in". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 300 characters long."

The NEW Dataset looks like this:

 mcancell_1-1703301469639.png

The ORIGINAL Dataset (where I was able to create the table) looks as follows: 

mcancell_0-1703301218303.png

The ORIGINAL EXT Table looks like this:

mcancell_2-1703301672889.png

Why was I able to create the same exact EXT Table (not a single change) in the original dataset but now I cannot create it in a NEW Project/Dataset?

Totally perplexed...

Solved Solved
0 6 1,543
1 ACCEPTED SOLUTION

Ok, I am going to close this issue. I believe that this does indeed have to do with changes that have been made since creating the original Project. There is no other explanation that makes sense.

Below is how I was finally able to create the EXT JSON table. NOTE: Fortunately,  I dont need the col with the offending '-' char. If I did, then I dont this would be possible since you need to match the schema with the fields in the JSON in order to query those values.

CREATE OR REPLACE EXTERNAL TABLE `ford-analytics-408918.Daily_JSON_Logs.day_01`
(
  meta STRUCT 
  <
    count INTEGER--,	
    --`completed-in` INTEGER # CANNOT MATCH THIS FIELD
  >,
  data ARRAY <
    STRUCT <
      country       STRING,
      language      STRING,
      entityTags    ARRAY <STRING>,
      modelTags     ARRAY <STRING>,
      makeTags      ARRAY <STRING>,
      sa	          STRING,
      messageType	  STRING,
      classTags     ARRAY <STRING>,
      published	    TIMESTAMP,	
      author	      STRING,
      content	      STRING,	
      link	        STRING,
      queue	        STRING,	
      userTags      ARRAY <STRING>,
      title	        STRING,
      dataSource	  STRING,	
      threadId	    STRING,
      dataCategory  STRING,	
      id	          STRING
    >
  >
)
OPTIONS (
  format = 'NEWLINE_DELIMITED_JSON',
  uris = ['https://drive.google.com/open?id=1YhNpquVz9YqU8PODkL7EiTn2NOTh7eYU'],
  ignore_unknown_values = TRUE
);

 And now, querying the table yeilds a valid result set:

mcancell_0-1703418421117.png

Thank you for your help with this matter.

Happy Holidays! 🎄🎅🏼🤶🦌

 

View solution in original post

6 REPLIES 6

There a few Potential Causes for the issue you are seeing:

  1. Field Naming Restrictions:

    • The field name "completed-in" includes a hyphen (-), which BigQuery does not permit in field names.
    • Permissible characters are letters, numbers, and underscores.
    • The field name must start with a letter or underscore.
    • The maximum length for a field name is 300 characters.
  2. Project/Dataset Configuration:

    • Possible differences in configuration, permissions, or settings between the new and original projects/datasets might be causing this issue.

Steps to resolve:

  1. Rename the Field:

    • Modify "completed-in" to a BigQuery-compliant name, such as "completed_in".
  2. Check Project/Dataset Permissions:

    • Verify that you have the necessary permissions to create tables in the new project/dataset.
  3. Review Configuration:

    • Compare the settings between the two projects/datasets to identify any discrepancies.

Thanks for the quick reply. 

Yes indeed, there is a '-' in one of the JSON tags. However, why was I able to create these exact same tables on my other Project-Dataset? The perms for both projects/datasets are identical and I am owner of both. The issue is that it will be be difficult to change the tag name w/out some ETL in Python. And bc this process will be managed by non-technical users (BA's), the expectation is that they will just download a daily JSON, e.g., 'day01'-'day31' into a commom shared folder. So changing the tag name in the CREATE process is really not feasible (the col name is changed downstream though).    

So my real question is this - Why was I able to create these JSON EXT tables in one (original) dataset but not another (new one). All the files are identical, as are the permissions.

NEW Dataset Perms:

mcancell_2-1703342286998.png

ORIGINAL Dataset Perms:

mcancell_1-1703341717578.png

 

While permissions and files appear identical, there are likely subtle configuration differences or other factors at play. Here are key areas to investigate further:

  1. Case Sensitivity:

    • BigQuery treats JSON field names as case-sensitive. Ensure that the JSON field names in your files exactly match the schema definitions.
    • If case sensitivity is causing issues, verify the consistency of case usage in both your JSON files and schema definitions.
  2. Default Dataset Setting:

    • While the "default dataset" setting primarily affects table references in queries, it's worth checking for any indirect impacts on table creation from JSON files.
  3. Creation Method:

    • Examine the exact method used to create the tables in the original dataset. Ensure that the same process, tools, and parameters are used in the new dataset.
  4. Alternative Approach:

    • If renaming fields in the JSON is not feasible, consider creating a view on top of the problematic table. This view can rename fields during the SELECT process, providing a workaround without altering the underlying data.
  5. Additional Considerations:

    • Project-Level Settings: Review any project-level settings that might differ between the original and new datasets and could impact behavior.
    • Schema Validation: Investigate if the auto-detection of schemas in BigQuery is functioning differently in the new dataset. In some cases, defining the schema manually can be more reliable.
    • File Integrity: Confirm that the JSON files are well-formed and consistent in structure, as discrepancies can lead to issues in table creation.
  6. Testing with Known Good Data:

    • To further diagnose the issue, try creating tables with a simple, known-good JSON file in the new dataset. This can help determine if the problem is with the specific files or the system setup.

Remember, consistency in file format, schema definitions, and creation methods is key in managing BigQuery datasets effectively.

1. The schema was intended to be auto-detected. This is how I created the EXT tables in the original dataset.

2. Not exactly sure what a "default dataset" is. I've never set or changed the default. I will check.

3. Everything is literally identical except for a) the Project and b) the dataset. The ext json file has not changed at all.

 4. I am not sure I understand. If I can't create the table how can I create a view to query and change the col name. Maybe I am missing what you mean....

5.  I am not sure I know how to get "Project" level settings. I dont see any properties at the Project level except server region, which in both projects is set exactly the name. I may end up creating the schemas directly in sql. I have yet to try that and was hoping to avoid. The actual jsons must be fine as they read perfectly in the other Project/DS and also in Python. I will also use another json file and test creation. I suspect it will work if no '-' char. At this point I am just baffled as to why I was able to create in my other project. The one diff I can see is the orig project was created several yrs ago and the new proj was just created. As for the Datasets, both were created just recently. In the one that sits within the older project, the json - with the '-' char is auto read and schema'd just fine. In the one, I get the error.

 

Thanks for clarification. Here’s a an approach to troubleshooting the issue with creating external tables from JSON in BigQuery:

  1. Schema Auto-Detection:

    • Since the schema was auto-detected in the original dataset, ensure that this setting is consistent in the new dataset. Sometimes, auto-detection might behave differently due to updates in BigQuery's algorithms or settings.
  2. Default Dataset:

    • If you haven't set a default dataset, it's unlikely to be the cause of the issue. This setting typically affects query execution rather than table creation.
  3. Project and Dataset Differences:

    • Given that the only differences are the project and dataset, it's possible that there have been changes in BigQuery's handling of external data sources, especially if the original project was created several years ago.
  4. Creating a View:

    • The suggestion to create a view was under the assumption that the table could be created but with issues in field names. Since the table creation itself is failing, this approach isn't applicable.
  5. Project-Level Settings:

    • If both projects have the same server region and no other visible differences in settings, it's less likely that project-level configurations are causing the issue.
  6. Direct Schema Definition:

    • Defining the schema directly in SQL might be a viable workaround. This approach can sometimes bypass issues encountered with auto-detection, especially with complex JSON structures or special characters like '-'.
  7. Testing with Different JSON Files:

    • Testing with another JSON file, particularly one without a '-' character, is a good diagnostic step. If this works, it suggests that the issue is related to how BigQuery's auto-detection handles certain characters.
  8. Version or Feature Updates in BigQuery:

    • Since the original project was created several years ago, it's possible that BigQuery has undergone updates that changed how external JSON data is processed. These updates might affect new projects or datasets differently.
  9. Next Steps:

    • If direct schema definition resolves the issue, it might be the most straightforward solution.
    • If testing with a different JSON file works, consider preprocessing your JSON files to replace or remove problematic characters.
    • If the issue persists, reaching out to BigQuery support with specific details of your case (including the error messages and differences between the two projects) could provide more targeted assistance.

 

Ok, I am going to close this issue. I believe that this does indeed have to do with changes that have been made since creating the original Project. There is no other explanation that makes sense.

Below is how I was finally able to create the EXT JSON table. NOTE: Fortunately,  I dont need the col with the offending '-' char. If I did, then I dont this would be possible since you need to match the schema with the fields in the JSON in order to query those values.

CREATE OR REPLACE EXTERNAL TABLE `ford-analytics-408918.Daily_JSON_Logs.day_01`
(
  meta STRUCT 
  <
    count INTEGER--,	
    --`completed-in` INTEGER # CANNOT MATCH THIS FIELD
  >,
  data ARRAY <
    STRUCT <
      country       STRING,
      language      STRING,
      entityTags    ARRAY <STRING>,
      modelTags     ARRAY <STRING>,
      makeTags      ARRAY <STRING>,
      sa	          STRING,
      messageType	  STRING,
      classTags     ARRAY <STRING>,
      published	    TIMESTAMP,	
      author	      STRING,
      content	      STRING,	
      link	        STRING,
      queue	        STRING,	
      userTags      ARRAY <STRING>,
      title	        STRING,
      dataSource	  STRING,	
      threadId	    STRING,
      dataCategory  STRING,	
      id	          STRING
    >
  >
)
OPTIONS (
  format = 'NEWLINE_DELIMITED_JSON',
  uris = ['https://drive.google.com/open?id=1YhNpquVz9YqU8PODkL7EiTn2NOTh7eYU'],
  ignore_unknown_values = TRUE
);

 And now, querying the table yeilds a valid result set:

mcancell_0-1703418421117.png

Thank you for your help with this matter.

Happy Holidays! 🎄🎅🏼🤶🦌