Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Replicate data from MySQL to BQ using DataStream

Hello,

Previously, I successfully conducted a POC using DataStream to replicate data from MySQL to BigQuery (BQ) using Cloud Auth Proxy. In my POC project, I utilized Terraform to create all necessary resources, including the MySQL instance, VPC, DataStream connection profile, DataStream private connection, Cloud Auth Proxy, and other required components. This setup worked well, allowing me to establish a connection profile for DataStream successfully.

However, in a subsequent attempt to replicate this approach using an existing VPC network and SQL instance, I encountered same issue . Although I integrated the required parts using Terraform and ensured that the Cloud Auth Proxy successfully connects to the MySQL database, I consistently face an issue during the creation of the DataStream source connection profile. The error repeatedly indicates that the MySQL server is unable to establish a connection. I don't able to understand why it is like this?

connected to mysql using proxy.png

If you have any insights or suggestions on resolving this connectivity issue, I would greatly appreciate your guidance.

Solved Solved
0 1 594
1 ACCEPTED SOLUTION

Given the connectivity issue you're encountering with DataStream while attempting to create the source connection profile, here are several troubleshooting steps and considerations to help resolve the problem:

1. Verify Network Configuration

  • VPC Peering: Ensure that your existing VPC network has a proper VPC peering connection established with the Datastream VPC. This peering is essential for private connectivity between the Datastream service and your MySQL instance.
  • Firewall Rules: Verify that your firewall rules in both VPCs (Datastream and your existing VPC) allow the necessary traffic. This includes:
    • Ingress on MySQL Instance: Allow TCP traffic on port 3306 (or your MySQL port) from the Datastream IP range.
    • Egress from Datastream VPC: Allow traffic to your MySQL instance's private IP on port 3306.
  • Cloud Auth Proxy: Confirm that the Cloud Auth Proxy is running correctly and listening on the appropriate port (typically 3306 for MySQL). Test connectivity to your MySQL instance through the proxy using a tool like mysql from the proxy VM:
 
mysql -u <your_user> -p -h 127.0.0.1 --port <proxy_port>
  1. Review Connection Profile:
  • Private IP: In the Datastream connection profile, ensure you're using the private IP address of your MySQL instance.
  • Authentication: Double-check the database username and password in the connection profile. Ensure the user has the necessary privileges to access the databases you want to replicate.
  • Connection Name: If your MySQL instance uses a Unix socket file for connections, make sure the Connection Name in the Datastream connection profile matches the socket file path. However, typically for network-based connections, the private IP and port are more relevant.
  1. Advanced Troubleshooting:
  • Cloud Logging: Examine Datastream and Cloud SQL logs for any error messages that might shed light on the connection issue.
  • Connectivity Test: If you're still having trouble, try connecting to your MySQL instance from a VM within the Datastream VPC (using the private IP). This can help isolate network issues.
  • Terraform Configuration: If you're using Terraform, carefully review your configuration for any misconfigurations related to VPC peering, firewall rules, or the Datastream connection profile.

Example (Terraform):

Here's a snippet to ensure your VPC peering and firewall rules are set up correctly in Terraform:

 
resource "google_compute_network_peering" "datastream_peering" {
  name         = "datastream-to-my-vpc-peering"
  network      = "my-existing-vpc"  # Replace with your VPC name
  peer_network = "datastream-vpc-network"
}

resource "google_compute_firewall" "allow_mysql" {
  name    = "allow-mysql-traffic"
  network = "my-existing-vpc"

  allow {
    protocol = "tcp"
    ports    = ["3306"]
  }

  source_ranges = ["<datastream-vpc-cidr-range>"]  # Replace with actual range
}

 

View solution in original post

1 REPLY 1

Given the connectivity issue you're encountering with DataStream while attempting to create the source connection profile, here are several troubleshooting steps and considerations to help resolve the problem:

1. Verify Network Configuration

  • VPC Peering: Ensure that your existing VPC network has a proper VPC peering connection established with the Datastream VPC. This peering is essential for private connectivity between the Datastream service and your MySQL instance.
  • Firewall Rules: Verify that your firewall rules in both VPCs (Datastream and your existing VPC) allow the necessary traffic. This includes:
    • Ingress on MySQL Instance: Allow TCP traffic on port 3306 (or your MySQL port) from the Datastream IP range.
    • Egress from Datastream VPC: Allow traffic to your MySQL instance's private IP on port 3306.
  • Cloud Auth Proxy: Confirm that the Cloud Auth Proxy is running correctly and listening on the appropriate port (typically 3306 for MySQL). Test connectivity to your MySQL instance through the proxy using a tool like mysql from the proxy VM:
 
mysql -u <your_user> -p -h 127.0.0.1 --port <proxy_port>
  1. Review Connection Profile:
  • Private IP: In the Datastream connection profile, ensure you're using the private IP address of your MySQL instance.
  • Authentication: Double-check the database username and password in the connection profile. Ensure the user has the necessary privileges to access the databases you want to replicate.
  • Connection Name: If your MySQL instance uses a Unix socket file for connections, make sure the Connection Name in the Datastream connection profile matches the socket file path. However, typically for network-based connections, the private IP and port are more relevant.
  1. Advanced Troubleshooting:
  • Cloud Logging: Examine Datastream and Cloud SQL logs for any error messages that might shed light on the connection issue.
  • Connectivity Test: If you're still having trouble, try connecting to your MySQL instance from a VM within the Datastream VPC (using the private IP). This can help isolate network issues.
  • Terraform Configuration: If you're using Terraform, carefully review your configuration for any misconfigurations related to VPC peering, firewall rules, or the Datastream connection profile.

Example (Terraform):

Here's a snippet to ensure your VPC peering and firewall rules are set up correctly in Terraform:

 
resource "google_compute_network_peering" "datastream_peering" {
  name         = "datastream-to-my-vpc-peering"
  network      = "my-existing-vpc"  # Replace with your VPC name
  peer_network = "datastream-vpc-network"
}

resource "google_compute_firewall" "allow_mysql" {
  name    = "allow-mysql-traffic"
  network = "my-existing-vpc"

  allow {
    protocol = "tcp"
    ports    = ["3306"]
  }

  source_ranges = ["<datastream-vpc-cidr-range>"]  # Replace with actual range
}