
Using log-based CDC for Cloud SQL Datastream replication

Hello,

We are currently using Datastream with SQL Server CDC to replicate data to a BigQuery dataset. We often have schema changes (usually new columns to support new features) and have to run through some manual steps to get the new columns replicating:

1) Remove the tables from the source connection in Datastream.

2) Disable, then re-enable, CDC on the tables with the new columns in the SQL Server instance (see the T-SQL sketch right after these steps).

3) Re-add the tables to the stream (this initiates a backfill for those tables).
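
For context, step 2 is just the standard disable/enable pair run per table. A rough sketch of what that looks like, using a hypothetical dbo.orders table and capture instance name (yours will differ), with @role_name left NULL:

    -- Drop the existing capture instance; its column list is fixed at enable time,
    -- so new columns are never picked up by it.
    EXEC sys.sp_cdc_disable_table
        @source_schema    = N'dbo',
        @source_name      = N'orders',
        @capture_instance = N'dbo_orders';

    -- Re-enable CDC so the new capture instance includes the newly added columns.
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'orders',
        @role_name     = NULL;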

We have some rather large tables, so the backfill takes quite a while (not to mention the time it pulls away from other tasks).

That's the background.

While researching, we found that the transaction-log CDC method should automatically pick up new columns and alter the destination schema, so no manual intervention would be needed.

The issue we are running into is that we cannot grant any SQL Server login/user the permissions needed to access the fn_dblog system function, and the sqlserver account does not have those permissions either.
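
To make the blocker concrete, even a minimal probe of the log fails for us with a permissions error. A sketch of the kind of call involved (fn_dblog is the real, if undocumented, system function; the exact error text will vary):

    -- Minimal fn_dblog probe; on Cloud SQL for SQL Server this fails because the
    -- sqlserver user lacks the server-level rights it needs, and those rights
    -- cannot be granted on the managed instance.
    SELECT TOP (10) [Current LSN], Operation, AllocUnitName
    FROM fn_dblog(NULL, NULL);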

Is there any known workaround, outside of standing up a self-managed SQL Server instance?

reference:

https://cloud.google.com/datastream/docs/configure-cloudsql-sqlserver#additional_steps_required_for_...

Solved

1 ACCEPTED SOLUTION

Strangely enough, after having issues with the sqlserver user account, our DevOps person tried again and was able to set it up using the sqlserver user. I wonder if we had something set differently before? Either way, thank you for your input.


2 REPLIES

Hey cbuckley,

Yes, I totally understand your frustration here: schema changes with CDC and Datastream are notoriously manual.

You're correct: log-based replication (vs. table-based CDC) is supposed to detect schema changes automatically. However, as you found, fn_dblog access is required, and on managed SQL Server (especially on cloud platforms), that access is typically restricted and non-grantable, even to sysadmin-level users.

Microsoft intentionally blocks access to fn_dblog in managed environments (e.g., Azure SQL, RDS for SQL Server), making log-based change capture impractical without full server access.

Here are a few alternative strategies that helped me:

  • Switch to timestamp-based delta extraction: If your tables include a reliable updated_at field, you can skip full backfills and just pull incremental changes via a custom ETL or Dataflow job. Much faster, and it avoids CDC restarts (rough sketch after this list).

  • Consider using a separate change log table: Some teams maintain shadow tables or triggers to log changes explicitly, which lets you decouple from CDC entirely (trigger sketch after this list).

  • If you're on a managed cloud SQL Server, unfortunately, there's no workaround to access fn_dblog short of running your own VM-based SQL Server where you have full control. That might be necessary if zero-downtime schema evolution is critical.
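
To make the first two bullets concrete, here are rough sketches rather than drop-in code, using a hypothetical dbo.orders table with order_id and updated_at columns; adjust names and column lists to your schema.

Timestamp-based delta pull:

    -- Read only rows modified since the last watermark, then advance the
    -- watermark in whatever state store your ETL or Dataflow job uses.
    DECLARE @last_watermark DATETIME2 = '2024-01-01T00:00:00';  -- loaded from ETL state

    SELECT order_id, updated_at   -- plus whatever other columns you replicate
    FROM dbo.orders
    WHERE updated_at > @last_watermark
    ORDER BY updated_at;

And the shadow-table idea: a trigger writes every change into an explicit change log your pipeline reads, so schema changes on the base table never force a CDC re-enable or backfill.

    -- Explicit change log table; fully decoupled from CDC.
    CREATE TABLE dbo.orders_changelog (
        change_id   BIGINT IDENTITY(1,1) PRIMARY KEY,
        order_id    INT       NOT NULL,
        change_type CHAR(1)   NOT NULL,   -- 'I' insert, 'U' update, 'D' delete
        changed_at  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    CREATE TRIGGER dbo.trg_orders_changelog
    ON dbo.orders
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Rows in inserted are new rows ('I') or the new image of updated rows ('U').
        INSERT INTO dbo.orders_changelog (order_id, change_type)
        SELECT i.order_id,
               CASE WHEN d.order_id IS NULL THEN 'I' ELSE 'U' END
        FROM inserted AS i
        LEFT JOIN deleted AS d ON d.order_id = i.order_id;

        -- Rows only in deleted are deletes ('D').
        INSERT INTO dbo.orders_changelog (order_id, change_type)
        SELECT d.order_id, 'D'
        FROM deleted AS d
        WHERE NOT EXISTS (SELECT 1 FROM inserted AS i WHERE i.order_id = d.order_id);
    END;

The trade-off is trigger overhead on every write and one more table to maintain per source table, but it keeps working through column additions without any disable/re-enable/backfill cycle.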

Let me know if you want a sample Dataflow or custom ETL setup — happy to share what worked for us.
