Re: PK and FK syntax to make my script idempotent

crazyrevol

I have below script where I create or replace tables in my pipeline. I add PK and FK constraint after the table are created. How can I make the PK FK statements idempotent? I can't seem to add IF NOT EXISTS as indicated in the docs.

CREATE OR REPLACE TABLE `dataset.X`
PARTITION BY xxxx
CLUSTER BY xxxx AS
SELECT *
FROM
`work_table` where <condition>;

CREATE OR REPLACE TABLE `dataset.Y`
PARTITION BY xxxx
CLUSTER BY xxxx AS
SELECT *
FROM
`work_table2` where <condition>
;

ALTER table dataset.X
ADD primary key(Key) NOT ENFORCED;

ALTER table dataset.Y
ADD FOREIGN KEY(Key) references dataset.X(Key) NOT ENFORCED;

ibaui

Hi @crazyrevol,

Welcome to Google Cloud Community!

Currently, BigQuery doesn't support the use of IF NOT EXISTS in ALTER TABLE statements for adding primary or foreign key constraints. Instead, you need to check whether the constraints already exist before attempting to add them. You may want to look into these limitations.

Because of the CREATE OR REPLACE TABLE and the declarative nature of the constraints, you have to manage the constraint application outside the standard ALTER TABLE commands. Here's an approach you can consider for addressing your use case:

Check for Constraint Existence using INFORMATION_SCHEMA: Query the INFORMATION_SCHEMA.TABLE_CONSTRAINTS (for the primary key) and INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS (for the foreign key) views to see if the constraints already exist. You can refer to the documents here. If they do, skip the ALTER TABLE commands.
Create Stored Procedures: Improve your pipeline's organization and reusability by using stored procedures to handle both table creation and constraint application.You can refer to the main documentation page for BigQuery stored procedures, along with the detailed reference for the procedural language they support.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

crazyrevol

got it! Thank you for your reply @ibaui

We are using this in pipelines where we "create or replace table" to create data assets in our gold layer yesterday. The pipelines do not have access to INFORMATION_SCHEMA. some kind of security think imposed by devOps. Not sure why! But a side question - does INFORMATION_SCHEMA contain any sensitive data ? am be for queries using PII?

So, the way am making this idempotent is by following these steps

1. DELETE constraint if already exits.
ALTER TABLE `<table>` DROP CONSTRAINT IF EXISTS Key;
2. CREATE OR REPLACE table statement.
3. Create constraint.
ALTER table `<table>` ADD primary key(Key) NOT ENFORCED;

Since constraints dont create any new data structures, there is no cost compute or storage wise for us as far as I understand. Is this approach valid to make our pipeline idempotent ? Do you see any problems ?