
Problem querying a table in one region and inserting the result into a new table in another region

Hello,

I'm currently working on a script that retrieves information from datasets in different projects and inserts this information into a new table, in another dataset, in another project.

The script follows the steps below:
- retrieve the list of projects to which I have access
- create a table suffixed with _currentDate in an existing dataset of a project dedicated to metrics
- for each project:
    - retrieve the list of regions and multi-regions in which its datasets exist
    - run a query against `{project_id}`.`region-{location}`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS to extract the information
    - insert the query results into the metrics table (a simplified sketch of this loop follows below)
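
For context, the per-project query step looks roughly like this; it's a simplified sketch, and the selected columns and the row handling are placeholders for my real logic:

from google.cloud import bigquery

client = bigquery.Client(project=project_id)

# Collect the set of locations that actually hold datasets in this project;
# INFORMATION_SCHEMA views are regional, so one query per location is needed.
locations = set()
for item in client.list_datasets(project=project_id):
    locations.add(client.get_dataset(item.reference).location)

for location in sorted(locations):
    query = f"""
    SELECT catalog_name, schema_name, option_name, option_value
    FROM `{project_id}`.`region-{location.lower()}`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
    """
    for row in client.query(query).result():
        ...  # map each row to the metrics table schema (placeholder)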


Sometimes it works, but sometimes I get a 404 error on the POST: Table bigquery_storage.metrics_20241205 not found.

However, the table is indeed created, and it is found when I run the following just after creating it:

from google.cloud.exceptions import NotFound

try:
    client.get_table(my_table_ref)  # raises NotFound if the table were missing
except NotFound:
    print("table not found")  # never reached: get_table always succeeds here

What's also strange is that in this scenario:
    - I create the metrics table in dataset A of project A.
    - my Python script starts by processing the datasets from project A, then from projects B, C, etc.

then it works.

Conversely, in the following scenario:
    - I create the metrics table in dataset A of project A.
    - my Python script starts by processing datasets from a project other than A.

then it doesn't work.

Can you help me, please?


Hi @aalshikh2,

Welcome to Google Cloud Community!

The 404 error could be caused by a short delay between the creation of a table and its availability for write operations. You can review this documentation, which describes the error that might have caused the issue and a potential solution.

Check this related discussion: BigQuery: 404 table not found even when the table exists. It might also be helpful for your scenario.
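
As a rough illustration (a sketch, not code from the linked docs), retrying the write with a short backoff is the usual way to absorb this propagation delay:

import time
from google.api_core.exceptions import NotFound

def insert_with_retry(client, table_ref, rows, attempts=5, base_delay=2):
    # Retry the streaming insert while the newly created table's metadata propagates.
    for attempt in range(attempts):
        try:
            return client.insert_rows_json(table_ref, rows)
        except NotFound:
            if attempt == attempts - 1:
                raise  # still missing after all attempts: give up
            time.sleep(base_delay * (attempt + 1))  # linear backoff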

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Hello Caryna,

Thank you for your response and the helpful resources!

The short delay after table creation is an interesting potential cause, and I appreciate the suggestion. However, I have already implemented a time.sleep(40) after the table creation, and the issue persists. Is there a more reliable way to ensure the table is fully available for write operations immediately after creation?
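
For reference, the creation step currently looks like this (simplified; metrics_project_id, metrics_dataset_id, and metrics_schema are stand-ins for my real values):

import time
from datetime import date
from google.cloud import bigquery

client = bigquery.Client(project=metrics_project_id)
table_id = f"{metrics_project_id}.{metrics_dataset_id}.metrics_{date.today():%Y%m%d}"
client.create_table(bigquery.Table(table_id, schema=metrics_schema))
time.sleep(40)  # crude wait after creation; the 404 still occurs afterwards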

In my script, I use the following function to retrieve and insert data:

 

from google.cloud import bigquery

...
def execute_query_for_project_and_datasets(dataset_id, table_id, project_client_id, project_id, location):
    client = bigquery.Client(project=project_client_id)

    query = f"""
    my_request_using_project_id_and_location
    """

    try:
        query_job = client.query(query)
        results = query_job.result()  # wait for the query to finish

        rows_to_insert = []
        for row in results:
            rows_to_insert.append({ ... })  # map the row to the metrics table schema

        # Stream the collected rows into the metrics table.
        # Note: Client.dataset() is deprecated; a "project.dataset.table" string also works.
        target_client = bigquery.Client(project=project_client_id)
        table_ref = target_client.dataset(dataset_id).table(table_id)
        errors = target_client.insert_rows_json(table_ref, rows_to_insert)
        if errors:
            print(f"Error inserting rows: {errors}")
        else:
            print("Success")

    except Exception as e:
        print(f"Error querying in project {project_id}: {e}")
        return []
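
One alternative I'm considering instead of the streaming insert (only a sketch, not tested at scale): writing the rows with a batch load job, which I understand avoids the streaming path entirely:

job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = target_client.load_table_from_json(rows_to_insert, table_ref, job_config=job_config)
load_job.result()  # block until the load job completes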

 


Could the issue be related to the differences in regions? Specifically, data is retrieved from one region and stored in memory before being written into a table in another project and region. Could this regional difference impact the process, even though the data is handled in-memory before insertion?
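
For example, would explicitly pinning the query job's location change anything (hypothetical; I haven't tried it yet)?

query_job = client.query(query, location=location)  # e.g. "EU" or "us-central1"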

 

Thanks again for your help, and I look forward to your insights!