Hello,
I am trying to create a Datastream stream from Postgres to BigQuery. I followed the instructions: logical decoding is enabled, the replication slot and publication are there, and the PG user account can run SELECTs on the tables.
When I create the stream and validate, I get all check marks except for the backfill permissions. When I try to run the stream, it says the table is not there or the user account does not have permissions.
Everything seems to be in place, and I have tried everything I can think of.
Does anyone know what is going on?
Thank you
Here is a comprehensive checklist to ensure that all the necessary permissions and configurations are correctly set up:
Verify PostgreSQL User Permissions:
Ensure that the PostgreSQL user configured for Datastream has SELECT on the tables, USAGE on the schema, and the REPLICATION attribute.
You can grant these with the following SQL commands (see the verification queries below as well):
GRANT SELECT ON ALL TABLES IN SCHEMA your_schema TO your_datastream_user;
GRANT USAGE ON SCHEMA your_schema TO your_datastream_user;
-- Only if functions/triggers are used:
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA your_schema TO your_datastream_user;
-- The REPLICATION attribute is a role attribute, not a grantable privilege, so use ALTER:
ALTER USER your_datastream_user WITH REPLICATION;
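To double-check from the database side, here is a quick verification sketch (your_schema and your_datastream_user are placeholders; note that has_table_privilege also returns true for privileges inherited through roles):
-- Check SELECT privilege on every table in the schema
SELECT schemaname, tablename,
       has_table_privilege('your_datastream_user',
                           format('%I.%I', schemaname, tablename),
                           'SELECT') AS can_select
FROM pg_tables
WHERE schemaname = 'your_schema';
-- Check the replication attribute on the role
SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'your_datastream_user';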
Check Logical Replication Setup:
Ensure logical replication is enabled in your PostgreSQL configuration (postgresql.conf):
wal_level = logical
max_replication_slots = [number of slots needed]
max_wal_senders = [number of senders needed]
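You can confirm the values actually in effect (rather than what is in the file) with a quick query:
-- Show the live settings; changing wal_level requires a restart to take effect
SELECT name, setting FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');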
Validate Publication and Slot Configuration:
Ensure that the publication and replication slot are correctly set up and the user has the necessary permissions:
-- Create publication if not already done
CREATE PUBLICATION your_publication FOR ALL TABLES;
-- Verify publication
SELECT * FROM pg_publication;
-- Check replication slots
SELECT * FROM pg_replication_slots;
-- Verify that tables are in the publication:
SELECT * FROM pg_publication_tables WHERE pubname = 'your_publication';
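Since a publication created FOR ALL TABLES should cover everything, a quick way to spot gaps is to list the tables in a schema that the publication does not include (your_schema and your_publication are placeholders):
-- Tables in the schema that are missing from the publication
SELECT t.schemaname, t.tablename
FROM pg_tables t
WHERE t.schemaname = 'your_schema'
  AND NOT EXISTS (
    SELECT 1
    FROM pg_publication_tables p
    WHERE p.pubname = 'your_publication'
      AND p.schemaname = t.schemaname
      AND p.tablename = t.tablename
  );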
Ensure Datastream Service Account Permissions:
Ensure the Datastream service account has the required permissions to access and manage the PostgreSQL database. This typically involves granting IAM roles for Cloud SQL if using a managed PostgreSQL instance.
Network Configuration:
Ensure that there are no network issues preventing Datastream from accessing your PostgreSQL instance. Check firewall rules, VPC peering, and ensure that the Datastream IP ranges are whitelisted.
Verify Table Existence and Visibility:
Ensure that the tables are present in the database and that there are no schema visibility issues. Sometimes, tables might not be visible to the user if the search path is not set correctly.
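For example, you can check the search path and confirm the table resolves with a fully qualified name (placeholder names again):
SHOW search_path;
-- If this works but the unqualified name does not, the search_path is the issue
SELECT 1 FROM your_schema.your_table LIMIT 1;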
Review Datastream Logs and Metrics:
Check the logs and metrics in the Google Cloud Console for Datastream. Look for any specific error messages or warnings that could provide more details on why the backfill permissions check is failing.
Check for PostgreSQL Extensions and Dependencies:
Ensure that any required extensions or dependencies are installed and configured properly in your PostgreSQL instance (e.g., PostGIS for geospatial data).
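A quick way to see what is actually installed in the database:
-- Extensions currently installed, with versions
SELECT extname, extversion FROM pg_extension ORDER BY extname;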
Example Configuration Check:
The publication and replication slot SQL above is a good end-to-end test of the setup.
Additional Debugging Steps:
If the issue persists after working through this checklist, contact Google Cloud Support with detailed logs and error messages for further assistance.
Thank you for your response, I really appreciate it.
Verify PostgreSQL User Permissions:
I logged into the Postgres instance as the user Datastream uses, and the user can SELECT from all the tables I am trying to pull. I verified the replication slot and that the user has replication rights. I ran the USAGE grant for the user and it was successful, so that is not an issue. As for EXECUTE, is this required? We are not running any functions or triggers.
Check Logical Replication Setup:
I know this works because we had CDC set up with a 3rd party and it was working. We are using datastream as a potential replacement.
Validate Publication and Slot Configuration:
I validated the publication and replication slot and ran the SQL to show which tables the publication includes. Oddly, only a small number of the tables were listed. I re-ran the stream with one of the tables that was listed in the publication and it still failed.
Ensure Datastream Service Account Permissions:
For the DB, all the permissions were granted in Cloud SQL. Does Datastream need other IAM permissions?
Network Configuration:
We just went with IP whitelisting, and the Datastream IPs are listed in the Cloud SQL instance.
Verify Table Existence and Visibility:
Not sure what you mean by this. Are there specific things I should look at?
Review Datastream Logs and Metrics:
There are no useful logs from Cloud SQL or Datastream. I did meet with Google support and, when running the stream, they did not find any logs or warnings from the backend that helped.
Check for PostgreSQL Extensions and Dependencies:
As mentioned before, we used a 3rd party tool which was able to run CDC, so I assume there is nothing that needs to be added. Are there any extensions or add-ons besides logical decoding that need to be added / enabled?
As mentioned, I did meet with Google support and they could not find what was going on. They submitted a bug to the product team and I am waiting to hear back.
If you have any additional suggestions, please let me know. I have been trying to get this running for close to 2 weeks, and it is something our analytics team needs.
Thank you.
Please provide the case number so I can check the status internally.
To make sure you're not missing anything, please double-check the following:
1. Check the user's search_path:
SHOW search_path;
2. Recreate the publication if tables are missing from it:
DROP PUBLICATION your_publication;
CREATE PUBLICATION your_publication FOR ALL TABLES;
3. Re-grant the schema and table permissions (replace your_schema and your_table with your actual values):
GRANT USAGE ON SCHEMA your_schema TO your_datastream_user;
GRANT SELECT ON TABLE your_schema.your_table TO your_datastream_user;
4. Confirm the Datastream service account has these IAM roles: roles/datastream.admin, roles/cloudsql.client, roles/bigquery.dataEditor
Specific Checks:
Inspect the replication slots and confirm the slot uses the pgoutput plugin (pgoutput is PostgreSQL's built-in logical decoding output plugin, so it will not appear in pg_available_extensions):
SELECT slot_name, plugin, active FROM pg_replication_slots;
Enable detailed logging in postgresql.conf to capture what Datastream is doing:
log_statement = 'all'
log_connections = on
log_disconnections = on
For the Datastream service account, there was nothing about granting roles / permissions to the SA. Is that something we have to do?
Yes, granting the right roles and permissions to the service account used by Datastream is crucial for its proper functioning. If the service account lacks the necessary permissions, it may lead to issues such as the "Backfill Permissions Failed" error you're encountering.
Here's how to ensure that the Datastream service account has the required permissions:
1. Identify the Service Account:
Determine which service account Datastream is using to access your resources.
2. Grant Required IAM Roles:
Roles Needed: roles/datastream.admin, roles/cloudsql.client, roles/bigquery.dataEditor
Assign Roles Using the Console: go to IAM & Admin > IAM, locate the service account, and add the roles above.
Assign Roles Using gcloud:
# Replace [SERVICE_ACCOUNT] with your service account email
SERVICE_ACCOUNT=[SERVICE_ACCOUNT]
# Grant Datastream Admin
gcloud projects add-iam-policy-binding [PROJECT_ID] \
--member=serviceAccount:$SERVICE_ACCOUNT \
--role=roles/datastream.admin
# Grant Cloud SQL Client
gcloud projects add-iam-policy-binding [PROJECT_ID] \
--member=serviceAccount:$SERVICE_ACCOUNT \
--role=roles/cloudsql.client
# Grant BigQuery Data Editor
gcloud projects add-iam-policy-binding [PROJECT_ID] \
--member=serviceAccount:$SERVICE_ACCOUNT \
--role=roles/bigquery.dataEditor
3. Verify Network Access and Firewall Rules:
Confirm the Datastream IP ranges are still whitelisted on the Cloud SQL instance and that nothing on the network path blocks the connection.
4. Verify Service Account Scope and Credentials:
Make sure the stream is actually running as the service account you granted the roles to.
5. Check Datastream Configuration:
After granting the roles, re-run the stream validation so the backfill permissions check is evaluated again.
Troubleshooting Steps:
If validation still fails, capture the exact error messages and timestamps; they will be needed if you escalate to support.
@RyanSchulte were you ever able to resolve this issue? Running into the same.
Hi @RyanSchulte it seems like you’ve covered most of the requirements for configuring a Datastream stream between PostgreSQL and BigQuery. However, the backfill permissions issue indicates that some table-level or user-level settings might need extra attention. Let’s go through the steps to resolve it:
1. Validate PostgreSQL Configuration
Ensure PostgreSQL Settings Are Properly Configured
Enable Logical Decoding
Confirm that logical decoding is enabled on your PostgreSQL server by checking the postgresql.conf file. Make sure the following settings are in place:
wal_level = logical
max_replication_slots = [number of slots needed]
max_wal_senders = [number of senders needed]
After updating these settings, restart PostgreSQL.
Replication Slot and Publication
Ensure the replication slot and publication include the tables you need:
SELECT * FROM pg_publication_tables WHERE pubname = 'your_publication';
SELECT * FROM pg_replication_slots;
Verify User Permissions
Make sure the PostgreSQL user that Datastream uses has the necessary privileges:
GRANT USAGE ON SCHEMA your_schema TO your_datastream_user;
GRANT SELECT ON ALL TABLES IN SCHEMA your_schema TO your_datastream_user;
ALTER USER your_datastream_user WITH REPLICATION;
2. Check Datastream Backfill Configuration
Backfill Permissions: Datastream's backfill runs plain SELECT queries against each table, so the user needs SELECT granted on every table included in the stream; see the sketch below.
Handle Schema Changes: tables created after the initial grants will not be readable unless default privileges are set up.
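A minimal sketch of the grants, assuming your_schema and your_datastream_user as placeholders (note that ALTER DEFAULT PRIVILEGES only affects tables created later by the role that runs it):
-- Cover all existing tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA your_schema TO your_datastream_user;
-- Cover tables created in the future by the role running this statement
ALTER DEFAULT PRIVILEGES IN SCHEMA your_schema
  GRANT SELECT ON TABLES TO your_datastream_user;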
3. Troubleshoot the "Table Not Found" Error
Verify Table Visibility:
Check the user's search_path (SHOW search_path;) and confirm the tables resolve with fully qualified schema.table names.
Check Network Connectivity:
Confirm the Datastream IP ranges are whitelisted on the Cloud SQL instance.
If the issue persists, you might want to explore third-party tools for data replication. These tools often simplify the setup process and handle errors more effectively.
For instance, Windsor.ai provides connectors for PostgreSQL and BigQuery, making data transfer seamless without requiring complex configurations. This could be a time-saving alternative to Google Datastream.
@RyanSchulte for reference, we were able to get this working when we explicitly granted SELECT permissions to the Postgres user we set up for Datastream; for some reason Datastream did not recognize that it had permissions when granted via a role.
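For anyone hitting the same thing, here is a sketch of that workaround with placeholder names; the relacl column shows which grants were made directly to the user rather than inherited through a role:
-- Grant SELECT directly to the user instead of relying on role membership
GRANT SELECT ON ALL TABLES IN SCHEMA your_schema TO your_datastream_user;
-- Inspect the per-table ACLs: direct grants show up as your_datastream_user=r/...
SELECT relname, relacl
FROM pg_class
WHERE relnamespace = 'your_schema'::regnamespace
  AND relkind = 'r';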