I'm using Terraform to create a DataStream that connects to a private MySQL instance in Cloud SQL. However, I'm encountering connection errors despite seemingly correct configurations.
My Terraform provisions the following resources:
I don't understand why it is not able to connect to the MySQL server:
Here are some steps to troubleshoot the connection issues between Datastream and your private Cloud SQL MySQL instance:

1. Fundamentals: Network and Cloud SQL Setup
2. DataStream Configuration
3. Hands-on Testing
- Install mysql-client on the VM and try connecting: mysql -u your_username -p -h [Cloud SQL private IP]
- Check that the port is reachable: telnet [Cloud SQL private IP] 3306
4. Deeper Checks
- Confirm the Datastream service account has the Cloud SQL Client role (roles/cloudsql.client); a Terraform sketch follows below.

Additional Tips
- Run terraform plan to get a preview of your infrastructure changes before applying them.
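For the service-account check in step 4, here is a minimal Terraform sketch of granting roles/cloudsql.client. Note that the Datastream service agent address used in the member field (service-PROJECT_NUMBER@gcp-sa-datastream.iam.gserviceaccount.com) is an assumption on my part, so verify the exact principal in your project's IAM page before applying:

# Hypothetical sketch: grant the Cloud SQL Client role to the Datastream service agent.
# The service agent email format is assumed; confirm it in your project's IAM page.
data "google_project" "this" {
  project_id = var.project
}

resource "google_project_iam_member" "datastream_cloudsql_client" {
  project = var.project
  role    = "roles/cloudsql.client"
  member  = "serviceAccount:service-${data.google_project.this.number}@gcp-sa-datastream.iam.gserviceaccount.com"
}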
Got it! Thank you.
I have created resources using Terraform for DataStream to MySQL private connectivity. Previously, I created a source connection profile and used an IP address instead of a hostname with the same configuration. Here is an example:
But now I want to create it using a hostname instead of an IP address. I have written Terraform code for that; however, I'm facing issues while creating it using a hostname.
Could you please help me with this issue?
Error:
Error: Error waiting for Updating ConnectionProfile: {"@type":"type.googleapis.com/google.rpc.ErrorInfo","domain":"datastream.googleapis.com","metadata":{"message":"We can't connect to the data source using the hostname that you provided. Make sure that the hostname is correct.","originalMessage":"(2003, \"Can't connect to MySQL server on '192.168.59.52' ([Errno 111] Connection refused)\")","time":"2024-04-11T12:42:25.686357Z","uuid":"0ff7fc6f-cd38-42f0-b76f-498e0e3bbfa1"},"reason":"WRONG_HOSTNAME"}
│ {"code":"VALIDATE_CONNECTIVITY","description":"Validates that Datastream can connect to the source database.","message":[{"code":"WRONG_HOSTNAME","level":"ERROR","message":"We can't connect to the data source using the hostname that you provided. Make sure that the hostname is correct.","metadata":{"original_error":"(2003, \"Can't connect to MySQL server on '192.168.59.52' ([Errno 111] Connection refused)\")"}}],"state":"FAILED"}
│ with google_datastream_connection_profile.source_connection_profile,
│ on main.tf line 143, in resource "google_datastream_connection_profile" "source_connection_profile":
│ 143: resource "google_datastream_connection_profile" "source_connection_profile" {
Terraform:
resource "google_compute_instance" "custom_hostname_instance" {
name = "custom-hostname-instance-name"
project = var.project
machine_type = var.proxy_machine_type
zone = var.zone
# Set a custom hostname below
hostname = "xxxxxx.com"
boot_disk {
initialize_params {
image = "debian-cloud/debian-11"
}
}
network_interface {
network = google_compute_network.network.name
subnetwork = google_compute_subnetwork.private-1.self_link
access_config {
// Ephemeral public IP
}
}
}
resource "google_datastream_connection_profile" "source_connection_profile" {
display_name = "Source connection profile"
location = var.region
connection_profile_id = "source-profile"
project = var.project
mysql_profile {
hostname = google_compute_instance.custom_hostname_instance.hostname
username = var.user_name
password = var.user_password
}
private_connectivity {
private_connection = google_datastream_private_connection.private.id
}
# depends_on = [google_datastream_private_connection.private]
}
The "connection refused" error from Google Cloud Datastream to your MySQL server (192.168.59.52) likely stems from one of these:
Troubleshooting Steps:
Verify Hostname Resolution:
nslookup xxxxxx.com
. The output should match your MySQL server's private IP (192.168.59.52).Check MySQL Server Accessibility:
bind-address
in your MySQL config file (my.cnf
or my.ini
).Test Direct Connectivity:
mysql -h xxxxxx.com -u your_username -p
to connect directly to the MySQL server using the hostname.Review Terraform Configuration:
google_datastream_connection_profile
resource, ensure the hostname
under mysql_profile
directly points to your MySQL server's hostname (not a Compute Engine instance unless it's specifically forwarding MySQL traffic).Example:
resource "google_datastream_connection_profile" "source_connection_profile" {
# ... other attributes ...
mysql_profile {
hostname = "your_mysql_server_hostname" # Use the actual MySQL hostname
# ... other attributes ...
}
}
Further Debugging:
If the issue persists:
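One additional thing worth double-checking is that a VPC firewall rule allows traffic from the Datastream private connection's reserved range to the MySQL host (or the forwarding VM) on port 3306. A minimal sketch, reusing the network from your configuration; the CIDR and network tag below are placeholders, so substitute the range you allocated for google_datastream_private_connection and the tag on your VM:

# Hypothetical sketch: permit the Datastream private connection range to reach MySQL.
resource "google_compute_firewall" "allow_datastream_mysql" {
  name    = "allow-datastream-mysql"
  project = var.project
  network = google_compute_network.network.name

  allow {
    protocol = "tcp"
    ports    = ["3306"]
  }

  # Placeholder range: replace with the CIDR reserved for your Datastream private connection.
  source_ranges = ["10.1.0.0/29"]
  # Assumes the VM forwarding MySQL traffic carries this network tag.
  target_tags = ["mysql-proxy"]
}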
Thank you for the guidance! I have another question: can we use the same proxy for multiple Datastream streams?
Yes, you can use the same Cloud SQL Auth Proxy for multiple Google Cloud Datastream instances.
That's Great! Thank you.
I have multiple instances. Can I use the same Cloud SQL Auth Proxy for multiple Google Cloud Datastream instances?
E.g., I have 3 MySQL instances and I want to create a separate Datastream stream for each instance. Can I use the same Cloud SQL Auth Proxy for all of them?
resource "google_compute_instance" "private" {
project = var.gcp_project_id
name = "datastream-proxy"
machine_type = var.proxy_machine_type
zone = var.gcp_zone_c
boot_disk {
initialize_params {
image = "debian-cloud/debian-11"
}
}
network_interface {
# network = data.google_compute_network.network.name
# subnetwork = data.google_compute_subnetwork.private-1.self_link
network = var.vpc_name
subnetwork = var.subnet_name
network_ip = google_compute_address.static.address
access_config {
// Ephemeral public IP
}
}
metadata_startup_script = <<EOT
#!/bin/sh
apt-get update
sudo apt-get install wget -y
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
chmod +x cloud_sql_proxy
./cloud_sql_proxy -instances=${data.google_sql_database_instance.main_primary.connection_name}=tcp:0.0.0.0:3306
EOT
service_account {
scopes = ["cloud-platform"]
}
}
Yes, you can indeed use a single Cloud SQL Auth Proxy to handle connections for multiple Google Cloud Datastream instances, even if these Datastream instances are intended to connect to different MySQL instances hosted on Google Cloud SQL. The Cloud SQL Auth Proxy is designed to manage connections to multiple Cloud SQL instances, making it an efficient solution for scenarios where multiple databases need to be accessible through a centralized, secure channel.
In your Terraform configuration, you can adapt the Cloud SQL Auth Proxy setup to handle connections to multiple Cloud SQL instances by specifying each instance in the startup script of the proxy server. Here’s how you can modify your Terraform script to connect to multiple MySQL instances:
Modify the metadata_startup_script to include multiple Cloud SQL instance connections. You can specify multiple instances in the Cloud SQL Auth Proxy command by separating them with commas. Here's an example of how to modify the script to handle three MySQL instances:
metadata_startup_script = <<EOT
#!/bin/sh
apt-get update
sudo apt-get install wget -y
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
chmod +x cloud_sql_proxy
# Assuming three Cloud SQL instances
./cloud_sql_proxy -instances=${data.google_sql_database_instance.instance1.connection_name}=tcp:0.0.0.0:3306,${data.google_sql_database_instance.instance2.connection_name}=tcp:0.0.0.0:3307,${data.google_sql_database_instance.instance3.connection_name}=tcp:0.0.0.0:3308
EOT
In this script, each Cloud SQL instance is exposed on its own port (3306, 3307, 3308). Make sure the data sources (instance1, instance2, instance3) are defined correctly in your Terraform configurations to pull the connection names; each Datastream connection profile can then point at the proxy's IP with the matching port (see the sketch after this list).

Adjust Network Settings:
network_ip should be set if you are assigning a static internal IP address, ensuring it is accessible from where the Datastream instances will be running.

Service Account Permissions:
Security Considerations:
Testing and Validation:
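As referenced above, here is a minimal sketch of a second source connection profile that reuses the same proxy VM and simply targets the port the proxy exposes for the second instance. The credential variables are placeholders borrowed from your existing configuration:

# Hypothetical sketch: second Datastream source profile through the same proxy VM,
# pointing at the port (3307) mapped to the second Cloud SQL instance.
resource "google_datastream_connection_profile" "source_connection_profile2" {
  display_name          = "mysql-bq-source2"
  project               = var.gcp_project_id
  location              = var.gcp_region
  connection_profile_id = "source-profile2"

  mysql_profile {
    hostname = google_compute_instance.private.network_interface.0.network_ip
    port     = 3307             # matches the proxy's -instances mapping for instance2
    username = var.user_name    # placeholder: use the credentials for this instance
    password = var.user_password
  }

  private_connectivity {
    private_connection = google_datastream_private_connection.private.id
  }
}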
I have configured multiple instances in Datastream and created a separate stream for each one.
In my case, I already have multiple datasets and tables in BigQuery.
Now, I want to replicate newly added data into BigQuery in the same dataset, but I am not able to replicate it. I am facing the following issues:
Is Datastream not able to replicate data into the same dataset and table, or does it create a new dataset and table for each stream?
resource "google_datastream_connection_profile" "source_connection_profile1" {
display_name = "mysql-bq-source1"
project = var.gcp_project_id
location = var.gcp_region
connection_profile_id = "source-profile1"
mysql_profile {
hostname = google_compute_instance.private.network_interface.0.network_ip
username = data.google_secret_manager_secret_version.app_user_secret.secret_data
password = data.google_secret_manager_secret_version.app_pwd_secret.secret_data
}
private_connectivity {
private_connection = google_datastream_private_connection.private.id
}
depends_on = [
google_compute_address.static,
data.google_secret_manager_secret_version.app_pwd_secret,
data.google_secret_manager_secret_version.app_user_secret
]
}
resource "google_datastream_connection_profile" "destination_connection_profile1" {
display_name = "mysql-bq-destination1"
location = var.gcp_region
connection_profile_id = "destination-profile1"
project = var.gcp_project_id
bigquery_profile {}
}
# Create Datastream Stream
resource "google_datastream_stream" "mysql_to_bigquery1" {
display_name = "mysql_to_bigquery1"
location = var.gcp_region
project = var.gcp_project_id
stream_id = "mysql_to_bigquery1"
desired_state = "NOT_STARTED"
source_config {
source_connection_profile = google_datastream_connection_profile.source_connection_profile1.id
mysql_source_config {
include_objects {
mysql_databases {
database = "ss_v2_db"
mysql_tables {
table = "transaction_monitors"
}
}
}
}
}
destination_config {
destination_connection_profile = google_datastream_connection_profile.destination_connection_profile1.id
bigquery_destination_config {
source_hierarchy_datasets {
dataset_template {
location = var.gcp_region
# kms_key_name = "bigquery-kms-name"
}
}
}
}
backfill_none {
}
}
The error "BIGQUERY_DYNAMIC_DATASET_LOCATION_CONFLICT", "level": "ERROR", "message": "Datastream can't create one or more BigQuery datasets in the specified location, because datasets with the same name already exist in a different location.", "metadata": {"dataset_conflicts":"ss_v2_db (US) "}}],"state": "FAILED" }" is related to a location conflict for BigQuery datasets. According to the error message, it appears that Datastream is attempting to create or write to a dataset in a location that conflicts with existing datasets.
Here’s how to troubleshoot and potentially resolve this issue:
Check Dataset Location: Verify the existing dataset's location with the bq command-line tool:
bq show --format=prettyjson your-dataset
Update Datastream Configuration: Set the location in your dataset_template to match the existing BigQuery dataset's location. Here is the modified section of your Terraform script:
bigquery_destination_config {
source_hierarchy_datasets {
dataset_template {
location = "YOUR_DATASET_LOCATION" # Match this with your BigQuery dataset location
}
}
}
Reapply Terraform Configuration: Run terraform apply to apply the updated configuration.
Validate and Monitor:
Logging and Debugging:
If you continue to encounter difficulties, review the Google Cloud Datastream documentation for any updates or notices regarding limitations or configuration specifics.
@ms4446
In my implementation there are two cases:
1. I have existing datasets and tables; I just want to migrate MySQL data into the respective dataset and table.
2. In the second case I don't have the dataset and tables on the BQ side, so they need to be created in BQ.
My question: in the 2nd case, is there any need to give the dataset and table name in the destination configuration? When I start the stream, neither the dataset nor the table is created or migrated. Can you analyze both configurations and suggest what's wrong with them?
Are both of my destination configurations correct?
Or do I need to give only the US region for BQ?
Case 1 Terraform:
resource "google_datastream_connection_profile" "source_connection_profile6" {
display_name = "mysql-bq-source6"
project = var.gcp_project_id
location = var.gcp_region
connection_profile_id = "source-profile6"
mysql_profile {
hostname = google_compute_instance.private.network_interface.0.network_ip
username = data.google_secret_manager_secret_version.app_user_secret.secret_data
password = data.google_secret_manager_secret_version.app_pwd_secret.secret_data
port = 3306
}
private_connectivity {
private_connection = google_datastream_private_connection.private.id
}
depends_on = [
google_compute_address.static,
data.google_secret_manager_secret_version.app_pwd_secret,
data.google_secret_manager_secret_version.app_user_secret
]
}
resource "google_datastream_connection_profile" "destination_connection_profile6" {
display_name = "mysql-bq-destinatio6"
location = var.gcp_region
connection_profile_id = "destination-profile6"
project = var.gcp_project_id
bigquery_profile {}
}
resource "google_datastream_stream" "mysql_to_bigquery6" {
display_name = "ss-prd1-v2-rds-redshift"
location = var.gcp_region
project = var.gcp_project_id
stream_id = "mysql_to_bigquery6"
desired_state = "NOT_STARTED"
source_config {
source_connection_profile = google_datastream_connection_profile.source_connection_profile6.id
mysql_source_config {
include_objects {
mysql_databases {
database = "ss_v2_db"
dynamic "mysql_tables" {
for_each = var.ss-prd1-v2-rds-redshift-tables
content {
table = mysql_tables.value
}
}
}
}
}
}
destination_config {
destination_connection_profile = google_datastream_connection_profile.destination_connection_profile6.id
bigquery_destination_config {
data_freshness = "900s"
single_target_dataset {
dataset_id = "sample:ss_v2_db"
}
}
}
backfill_all {
}
}
Case 2 Terraform:
resource "google_datastream_connection_profile" "source_connection_profile7" {
display_name = "mysql-bq-source7"
project = var.gcp_project_id
location = var.gcp_region
connection_profile_id = "source-profile7"
mysql_profile {
hostname = google_compute_instance.private.network_interface.0.network_ip
username = data.google_secret_manager_secret_version.engagement_user_secret.secret_data
password = data.google_secret_manager_secret_version.engagement_pwd_secret.secret_data
port = 3308
}
private_connectivity {
private_connection = google_datastream_private_connection.private.id
}
depends_on = [
google_compute_address.static,
data.google_secret_manager_secret_version.engagement_pwd_secret,
data.google_secret_manager_secret_version.engagement_user_secret
]
}
resource "google_datastream_connection_profile" "destination_connection_profile7" {
display_name = "mysql-bq-destinatio7"
location = var.gcp_region
connection_profile_id = "destination-profile7"
project = var.gcp_project_id
bigquery_profile {}
}
resource "google_datastream_stream" "mysql_to_bigquery7" {
display_name = "exp-prod-engagement-rds-dms"
location = var.gcp_region
project = var.gcp_project_id
stream_id = "mysql_to_bigquery7"
desired_state = "NOT_STARTED"
source_config {
source_connection_profile = google_datastream_connection_profile.source_connection_profile7.id
mysql_source_config {
include_objects {
mysql_databases {
database = "ss_v2_engagement_db"
dynamic "mysql_tables" {
for_each = var.exp-prod-engagement-rds-dms-tables
content {
table = mysql_tables.value
}
}
}
}
}
}
destination_config {
destination_connection_profile = google_datastream_connection_profile.destination_connection_profile7.id
bigquery_destination_config {
source_hierarchy_datasets {
dataset_template {
location = var.gcp_region
}
}
}
}
backfill_none {
}
}
Hi @Nikita_G,

To successfully replicate data from MySQL to BigQuery using Datastream in both scenarios (existing or new BigQuery datasets and tables), it's important to configure Datastream accurately. Here are some recommendations based on your Terraform configurations:

Case 1: Existing Datasets and Tables
When replicating into existing BigQuery resources, ensure precise mapping in Datastream:
- The dataset_id in your single_target_dataset configuration must exactly match the existing dataset name in BigQuery (including the project ID if applicable). Datastream automatically creates BigQuery tables with the same names as your MySQL tables, so no explicit table configuration is needed here.
- Your data_freshness setting (900 seconds) and backfill_all choice are generally sound. Ensure these align with your requirements for data consistency and historical data replication.

Case 1:
bigquery_destination_config {
data_freshness = "900s"
single_target_dataset {
dataset_id = "sample:ss_v2_db" // Potential issue: Should likely be just 'ss_v2_db'
}
}
The dataset_id should typically be just the dataset name itself (e.g., "ss_v2_db"). The "sample:" prefix might be a typo, as Google Cloud dataset IDs typically don't use colons except in fully qualified resource identifiers.

Case 2: Creating New Datasets and Tables
To automatically create BigQuery datasets and tables during replication:
- Ensure the location within dataset_template (under source_hierarchy_datasets) matches your desired BigQuery region. Inconsistent locations will lead to errors.
- Your use of source_hierarchy_datasets is correct for this purpose. Datastream will mirror the MySQL schema hierarchy in BigQuery, creating datasets and tables as needed.

Case 2:
bigquery_destination_config {
source_hierarchy_datasets {
dataset_template {
location = var.gcp_region // Ensure this is a valid BigQuery region
}
}
}
Make sure var.gcp_region resolves to a valid BigQuery location (e.g., "US", "EU").

Recommendations:
Ensure the service account used by Datastream has the necessary BigQuery roles (a minimal Terraform sketch follows this list):
- roles/bigquery.dataEditor
- roles/bigquery.datasetAdmin (if creating new datasets)
- roles/bigquery.jobUser
Hi @Nikita_G, it looks like the issue is related to connectivity between Google DataStream and your Cloud SQL private MySQL instance. Since you're using a private instance, here are a few things you might want to check to resolve the problem:
1️⃣ Check VPC and Subnet Configuration
🔹 Make sure that Cloud SQL and DataStream are on the same VPC or that there’s a proper connection established between them.
2️⃣ Review IAM Permissions
🔹 Google DataStream requires the right permissions to access Cloud SQL. Double-check that the service account has the necessary access.
3️⃣ Enable Private Access
🔹 Since the IP 192.168.58.84 appears to be a private address, ensure that you've properly configured a Cloud SQL Private IP and that DataStream has access to it.
4️⃣ Check Firewall Rules
🔹 Verify that the required ports are open in your VPC firewall rules. For MySQL, the default port is 3306.
5️⃣ Test Connectivity from a VM
🔹 Try connecting to the Cloud SQL instance from a Compute Engine VM within the same network. This can help identify if the issue is related to network connectivity.
Looking for an Easier Alternative?
If you need to move data from Cloud SQL to BigQuery or other platforms without dealing with complex Google DataStream configurations, you might want to explore Windsor.ai. It simplifies data integration and automation without requiring advanced network infrastructure management.
Hope this helps!