Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Google DataStream Fails with “UNKNOWN ERROR” When Connecting DigitalOcean PostgreSQL to BigQuery

🔹 Problem Description

I am trying to set up Google Cloud DataStream to replicate data from DigitalOcean Managed PostgreSQL to BigQuery. However, DataStream fails with “UNKNOWN ERROR”, and I am unable to get detailed logs or insights into the root cause.

 

🔹 Setup Details

Google Cloud Project ID: parang-preseller-903a7

Source: DigitalOcean PostgreSQL (db-sfa-phpr-prod-do-user-2002914-0.c.db.ondigitalocean.com)

Destination: BigQuery Dataset (phapros_prod)

Google Cloud Region: asia-southeast2

PostgreSQL Version: 14

WAL Level: Logical (SHOW wal_level; returns logical)

Replication Slot: datastream_slot (Created manually)

Publication: datastream_publication (Created manually)

Authentication Method: MD5 (SHOW password_encryption; returns MD5)

Google DataStream Service Account IAM Roles:

roles/datastream.streamsAdmin

roles/datastream.connectionsAdmin

roles/bigquery.admin

 

🔹 Steps Taken & Verifications

 

1. Verified Logical Replication & Slots

SHOW wal_level;  -- Returns 'logical'

SHOW max_replication_slots;  -- Returns > 0

SHOW max_wal_senders;  -- Returns > 0

SELECT * FROM pg_publication;  -- Shows 'datastream_publication'

SELECT * FROM pg_replication_slots;  -- Shows 'datastream_slot'

👉 Logical replication appears to be enabled correctly.

 

2. Verified Network Connectivity

Checked PostgreSQL connection from Google Cloud Shell:

 

PGPASSWORD="my-password" psql -h db-sfa-phpr-prod-do-user-2002914-0.c.db.ondigitalocean.com -U doadmin -d preseller-phpr -p 5432

Successfully connected. No firewall issues.

 

Whitelisted Google Cloud’s IPs in DigitalOcean firewall.

 

curl -s https://www.gstatic.com/ipranges/goog.json | jq '.prefixes[] | select(.service=="Cloud") | .ipv4Prefix'

 

Ensured Google Cloud can access PostgreSQL.

 

3. Checked Google Cloud Logs

 

gcloud logging read "resource.type=datastream.googleapis.com" --limit 50 --format=json

 

Empty result

 

4. Verified IAM Permissions for DataStream

 

gcloud projects get-iam-policy parang-preseller-903a7 --flatten="bindings[].members" --format="table(bindings.role, bindings.members)"

 

Service account has correct roles (datastream.streamsAdmin, connectionsAdmin, bigquery.admin).

 

5. Tried Using the DataStream API Instead of gcloud

 

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \

     -H "Content-Type: application/json" \

     -d '{ ... }' \

     "https://datastream.googleapis.com/v1/projects/parang-preseller-903a7/locations/asia-southeast2/strea..."

 

Still fails with “UNKNOWN ERROR”.

 

🔹 Latest Error Log

 

gcloud datastream operations describe operation-1738566109614-62d377761d365-01899bfc-f2590ebb --location=asia-southeast2

 

🔴 Error Output:

 

done: true

error:

  code: 2

  details:

  - '@type': type.googleapis.com/google.rpc.ErrorInfo

    domain: datastream.googleapis.com

    metadata:

      message: An unknown error occurred. Please try again. If the error persists,

        contact Google support.

      originalMessage: ''

      time: '2025-02-03T07:01:54.870652Z'

      uuid: 4913cd5a-e8b8-4774-ab8d-4210dfa7daef

    reason: UNKNOWN

  message: Unknown Error.

metadata:

  '@type': type.googleapis.com/google.cloud.datastream.v1.OperationMetadata

  apiVersion: v1

  createTime: '2025-02-03T07:01:49.937024429Z'

  endTime: '2025-02-03T07:01:56.194644089Z'

  requestedCancellation: false

  target: projects/parang-preseller-903a7/locations/asia-southeast2/streams/PhaprosProdToBigQuery

  verb: create

name: projects/parang-preseller-903a7/locations/asia-southeast2/operations/operation-1738566109614-62d377761d365-01899bfc-f2590ebb

 

🔹 Questions

1.Does Google DataStream officially support DigitalOcean PostgreSQL?

2.Are there additional logs/debugging tools to see why DataStream fails?

3.Does DataStream require access to shared_preload_libraries?

•(I cannot check this on DigitalOcean as it requires pg_read_all_settings or superuser.)

4.Is there a workaround for DigitalOcean’s limited superuser permissions?

5.Would migrating to Cloud SQL (Google Managed PostgreSQL) be the only solution?

6.Has anyone successfully used Google DataStream with DigitalOcean PostgreSQL?

 

🔹 Any Help is Appreciated!

Solved Solved
0 1 125
1 ACCEPTED SOLUTION

Hi @mfhanif,

Welcome to Google Cloud Community!

Thank you for providing the information about your configuration and the steps you took to troubleshoot—it’s clear you’ve done an excellent job pinpointing this issue. I understand why this is frustrating, particularly with the “UNKNOWN ERROR” and lack of helpful logs.

The problem is probably related to the restrictions of DigitalOcean’s managed PostgreSQL, not an issue with Google Cloud. Due to DigitalOcean limiting superuser access and specific WAL settings/extensions, DataStream may encounter permissions or configuration problems that it cannot manage correctly—leading to the ambiguous error.

Workarounds

🔹 Option 1: Use Cloud SQL as an Intermediate PostgreSQL
Since Google Cloud officially supports Cloud SQL for PostgreSQL as a DataStream source, you could:

  1. Set up a Cloud SQL instance.
  2. Use PostgreSQL’s logical replication to sync data from DigitalOcean to Cloud SQL.
  3. Then use DataStream to replicate from Cloud SQL to BigQuery like in this example.

This creates a fully supported pipeline without running into DigitalOcean’s restrictions.

🔹 Option 2: Use an Alternative ETL Approach
If setting up Cloud SQL isn’t ideal, you could try:

  • Cloud Composer (Apache Airflow): Automate incremental data pulls.
  • Cloud Functions + BigQuery Streaming API: A serverless approach for near real-time ingestion.
  • Third-party ETL tools (Fivetran, Stitch, or Airbyte): Some of these support DigitalOcean PostgreSQL directly.

To Answer Your Questions

  1. Does Google DataStream officially support DigitalOcean PostgreSQL?
    No official documentation confirms this other than what’s currently listed here. Lack of superuser access may block DataStream.
  2. Are there additional logs/debugging tools to see why DataStream fails?
    Try gcloud datastream routes list and gcloud logging read "severity>=ERROR".
  3. Does DataStream require access to shared_preload_libraries?
    Likely, but DigitalOcean may not allow it.
  4. Is there a workaround for DigitalOcean’s limited superuser permissions?
    Ask DigitalOcean support for pg_read_all_settings or migrate to Cloud SQL.
  5. Would migrating to Cloud SQL (Google Managed PostgreSQL) be the only solution?
    It’s the most reliable option if DigitalOcean’s limitations persist.
  6. Has anyone successfully used Google DataStream with DigitalOcean PostgreSQL?
    No confirmed reports; most use Cloud SQL or alternative pipelines.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

View solution in original post

1 REPLY 1

Hi @mfhanif,

Welcome to Google Cloud Community!

Thank you for providing the information about your configuration and the steps you took to troubleshoot—it’s clear you’ve done an excellent job pinpointing this issue. I understand why this is frustrating, particularly with the “UNKNOWN ERROR” and lack of helpful logs.

The problem is probably related to the restrictions of DigitalOcean’s managed PostgreSQL, not an issue with Google Cloud. Due to DigitalOcean limiting superuser access and specific WAL settings/extensions, DataStream may encounter permissions or configuration problems that it cannot manage correctly—leading to the ambiguous error.

Workarounds

🔹 Option 1: Use Cloud SQL as an Intermediate PostgreSQL
Since Google Cloud officially supports Cloud SQL for PostgreSQL as a DataStream source, you could:

  1. Set up a Cloud SQL instance.
  2. Use PostgreSQL’s logical replication to sync data from DigitalOcean to Cloud SQL.
  3. Then use DataStream to replicate from Cloud SQL to BigQuery like in this example.

This creates a fully supported pipeline without running into DigitalOcean’s restrictions.

🔹 Option 2: Use an Alternative ETL Approach
If setting up Cloud SQL isn’t ideal, you could try:

  • Cloud Composer (Apache Airflow): Automate incremental data pulls.
  • Cloud Functions + BigQuery Streaming API: A serverless approach for near real-time ingestion.
  • Third-party ETL tools (Fivetran, Stitch, or Airbyte): Some of these support DigitalOcean PostgreSQL directly.

To Answer Your Questions

  1. Does Google DataStream officially support DigitalOcean PostgreSQL?
    No official documentation confirms this other than what’s currently listed here. Lack of superuser access may block DataStream.
  2. Are there additional logs/debugging tools to see why DataStream fails?
    Try gcloud datastream routes list and gcloud logging read "severity>=ERROR".
  3. Does DataStream require access to shared_preload_libraries?
    Likely, but DigitalOcean may not allow it.
  4. Is there a workaround for DigitalOcean’s limited superuser permissions?
    Ask DigitalOcean support for pg_read_all_settings or migrate to Cloud SQL.
  5. Would migrating to Cloud SQL (Google Managed PostgreSQL) be the only solution?
    It’s the most reliable option if DigitalOcean’s limitations persist.
  6. Has anyone successfully used Google DataStream with DigitalOcean PostgreSQL?
    No confirmed reports; most use Cloud SQL or alternative pipelines.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.