using system variable dataset_project_id while exe...

crazyrevol

Hi,

Best-practice question -

When executing a stored procedure, how can I set the dataset_project_id system variable ? Is it recommended to set the variable right before the CALL of procedure or inside stored procedure by handling it as an IN param in the procedure ?

example of what I was thinking. What if the stored procedure is not in the same project of tables that am using inside the stored procedure. SHould I set the dataset_project_id inside stored procedure again to the project where the table reside ?

What if I also want to set @@project_id at session level or if I have a different billing project_id already setup in the connection object? Where does BQ look to find the dataset where procedure is and where does it look to find the dataset that is referenced inside the procedure ?

-- Set the default project for dataset references
SET @@dataset_project_id = 'your-project-id';
 

-- Now call the stored procedure
CALL your_dataset.your_procedure(...);

ruthseki

Hi crazyrevol,

Welcome to Google Cloud Community!

Best practice is to be explicit. A stored procedure should be self-contained and not depend on the caller's session state. The procedure's code should contain all the information it needs (like full project IDs for tables) to run correctly on its own. It should not require the user to run "setup" commands like SET beforehand, because that makes the procedure's behavior unpredictable and creates a hidden dependency that is easy to forget and leads to serious bugs.

You can try to use fully qualified names inside the procedure. This is the most robust and recommended approach. The procedure explicitly states which project's data it operates on, making it completely independent of the calling environment.

Another good practice when you need to create a generic, reusable procedure that is designed to operate on data in different projects is to pass the Project ID as a parameter.

On the other hand, set @@dataset_project_id before the CALL works is considered fragile for production code because it creates a hidden dependency and is prone to failures.

To directly answer your final question: Where does BQ look to find the dataset where the procedure is and where does it look to find the dataset that is referenced inside the procedure ?

To find the Stored Procedure:
- BigQuery looks at the CALL statement (e.g., CALL proj.dataset.proc()).
- If you provide a full project.dataset.procedure path, it uses that.
- If you only provide dataset.procedure, it uses the current @@project_id (billing project) to resolve the project. CALL my_dataset.proc() is interpreted as CALL my-billing-project.my_dataset.proc().
To find Tables/Views referenced inside the Procedure:
- BigQuery first checks if the table name is fully qualified (e.g., my-data-project.my_dataset.my_table). If it is, that path is used, and the search stops.
- If the table name is not fully qualified (e.g., my_dataset.my_table), BigQuery takes the value of the @@dataset_project_id session variable and prepends it.
- If @@dataset_project_id has not been explicitly set in the session, its value defaults to the current @@project_id (the billing project).
- Check these documentations for reference: Introduction to datasets and Big Query public datasets

Feel free to check this similar case, you might find this useful:

How can I use the @@dataset_project_id system variable within a stored procedure?

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

using system variable dataset_project_id while executing stored procedure