
Debezium for MySQL

I want to configure Debezium to capture data from MySQL and send it to Pub/Sub, and then configure a Cloud Function with an event trigger on Pub/Sub that writes the changes to BigQuery.
My main goal is to replicate the data with low latency (max 5 sec).
Is Debezium suitable for achieving such low latency?
MySQL ( Private ) ==> Debezium ==> Pub/Sub ==> Cloud Function ( Event Trigger on Pub/Sub ) ==> BigQuery

As my MySQL instance is private, I have configured a VM instance running the Cloud SQL Auth Proxy.

Can you please help me with the detailed steps for configuring Debezium for MySQL? Also, could you share the Debezium configuration file ( application.properties )?

I was referring to this documentation: https://medium.com/nerd-for-tech/debezium-server-to-cloud-pubsub-a-kafka-less-way-to-stream-changes-...
But I was getting an error for the server name, to which I had passed the Cloud SQL MySQL instance name: "resource not found".


To set up Debezium for capturing data from MySQL and sending it to Pub/Sub, followed by using Cloud Functions to write the changes to BigQuery, follow these detailed steps:

1. Set Up the Environment

  • MySQL Instance: Ensure you have a private MySQL instance on Cloud SQL.
  • VM Instance: Create a VM instance on Google Compute Engine. This will host the Cloud SQL Auth proxy and Debezium (preferably in a Docker container).
  • Cloud SQL Auth Proxy: Configure the Cloud SQL Auth proxy on the VM instance to securely connect to your private MySQL database. Refer to the official documentation for detailed instructions: https://cloud.google.com/sql/docs/mysql/connect-auth-proxy
  • Pub/Sub Topic: Create a Pub/Sub topic to which Debezium will send change events.
  • BigQuery Dataset and Table: Prepare a BigQuery dataset and table to store the replicated data.

2. Install and Configure Debezium

  • Install Docker (if not already present): Follow the instructions for your operating system to install Docker.

  • Run Debezium (using Docker):

     
    # --network host lets the container reach the Cloud SQL Auth Proxy
    # listening on 127.0.0.1:3307 on the VM. Note: the logical server name
    # (topic.prefix since Debezium Server 2.x, formerly database.server.name)
    # is a name you choose, NOT the Cloud SQL instance name -- passing the
    # instance name is a typical cause of "resource not found" errors,
    # because events are published to Pub/Sub topics named
    # <topic.prefix>.<db>.<table>, which must already exist.
    docker run -it --name debezium --network host \
        -e DEBEZIUM_SOURCE_CONNECTOR_CLASS=io.debezium.connector.mysql.MySqlConnector \
        -e DEBEZIUM_SOURCE_OFFSET_STORAGE_FILE_FILENAME=/debezium/data/offsets.dat \
        -e DEBEZIUM_SOURCE_SCHEMA_HISTORY_INTERNAL=io.debezium.storage.file.history.FileSchemaHistory \
        -e DEBEZIUM_SOURCE_SCHEMA_HISTORY_INTERNAL_FILE_FILENAME=/debezium/data/schema_history.dat \
        -e DEBEZIUM_SOURCE_DATABASE_HOSTNAME=127.0.0.1 \
        -e DEBEZIUM_SOURCE_DATABASE_PORT=3307 \
        -e DEBEZIUM_SOURCE_DATABASE_USER=[your_db_user] \
        -e DEBEZIUM_SOURCE_DATABASE_PASSWORD=[your_db_password] \
        -e DEBEZIUM_SOURCE_DATABASE_INCLUDE_LIST=[your_database_name] \
        -e DEBEZIUM_SOURCE_DATABASE_SERVER_ID=[unique_server_id] \
        -e DEBEZIUM_SOURCE_TOPIC_PREFIX=[logical_server_name] \
        -e DEBEZIUM_SOURCE_TABLE_INCLUDE_LIST=[your_db].[your_table] \
        -e DEBEZIUM_SINK_TYPE=pubsub \
        -e DEBEZIUM_SINK_PUBSUB_PROJECT_ID=[your_gcp_project_id] \
        -e DEBEZIUM_FORMAT_VALUE=json \
        debezium/server:latest
    
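Since the original question asked for an application.properties file, the same settings can be expressed directly as a Debezium Server configuration. This is a sketch for a recent (2.x+) Debezium Server; the project ID, database names, and file paths are placeholders you must replace:

```properties
# Sink: Google Cloud Pub/Sub. Topics are derived from the event
# destination (<topic.prefix>.<db>.<table>) and must exist beforehand.
debezium.sink.type=pubsub
debezium.sink.pubsub.project.id=your_gcp_project_id

# Source: MySQL reached through the Cloud SQL Auth Proxy on localhost
debezium.source.connector.class=io.debezium.connector.mysql.MySqlConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.database.hostname=127.0.0.1
debezium.source.database.port=3307
debezium.source.database.user=your_db_user
debezium.source.database.password=your_db_password
debezium.source.database.server.id=184054
# A logical name of your choosing -- not the Cloud SQL instance name
debezium.source.topic.prefix=mysql-cdc
debezium.source.table.include.list=your_db.your_table
debezium.source.schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory
debezium.source.schema.history.internal.file.filename=data/schema_history.dat
debezium.format.value=json
```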

3. Set Up Cloud Function

  • Create a Cloud Function triggered by the Pub/Sub topic you created.
  • Write the Cloud Function code (Python example):
import base64
import json
from google.cloud import bigquery

def pubsub_to_bigquery(event, context):
    """Triggered by a Pub/Sub message carrying a Debezium change event."""
    client = bigquery.Client()
    table_id = "your_project.your_dataset.your_table"

    # Decode and parse the Pub/Sub message
    data = base64.b64decode(event['data']).decode('utf-8')
    message = json.loads(data)

    # Debezium wraps each change in an envelope; the new row state is
    # under payload.after (it is null for delete events).
    payload = message.get('payload', message)
    row = payload.get('after')
    if row is None:
        print("No 'after' image (e.g., a delete event); nothing to insert.")
        return

    # Stream the row into BigQuery
    errors = client.insert_rows_json(table_id, [row])
    if errors:
        # Raising lets the function fail so Pub/Sub can redeliver
        # (enable retries on the trigger for this to take effect).
        raise RuntimeError(f"Errors inserting row into BigQuery: {errors}")
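Note that Debezium does not publish bare rows: with `debezium.format.value=json`, each Pub/Sub message is a change-event envelope whose new row state sits under `payload.after`. A small self-contained sketch of that decoding step (the sample payload and table are invented, and no GCP client is needed to try it):

```python
import base64
import json

def extract_after_image(event):
    """Return the 'after' row image from a Debezium change event, or None."""
    data = base64.b64decode(event['data']).decode('utf-8')
    message = json.loads(data)
    # With JSON formatting, the before/after images sit under 'payload'
    payload = message.get('payload', message)
    return payload.get('after')

# Simulate what Pub/Sub delivers for an INSERT on a hypothetical table
change_event = {
    "payload": {
        "before": None,
        "after": {"id": 42, "name": "alice"},
        "op": "c",  # c = create/insert
    }
}
event = {"data": base64.b64encode(json.dumps(change_event).encode()).decode()}

print(extract_after_image(event))  # {'id': 42, 'name': 'alice'}
```

Delete events have a null `after` image, which is why the Cloud Function should handle that case explicitly rather than attempting an insert.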

4. Testing and Monitoring

  • Test: Make changes to your MySQL database (insert, update, delete rows) and verify the changes are replicated to BigQuery in near real-time.
  • Monitor:
    • Use Cloud Logging to check Debezium logs for errors or warnings.
    • Monitor Pub/Sub message throughput and latency.
    • Monitor Cloud Function execution logs for errors.
    • Monitor BigQuery streaming insert statistics.

Important Considerations:

  • Latency: Achieving a consistent 5-second latency might require careful tuning of Debezium, Pub/Sub, and Cloud Function configurations, as well as optimizing your MySQL instance.

  • Schema Evolution: Plan for how you'll handle schema changes in your MySQL database and ensure the Cloud Function can adapt to them.

  • Error Handling: Implement robust error handling in your Cloud Function to deal with scenarios like BigQuery insertion failures, invalid message formats, etc.
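For the latency goal specifically, a few Debezium source knobs are worth experimenting with; the values below are illustrative starting points, not recommendations. Cloud Function cold starts and BigQuery streaming-insert latency also contribute, so configuring a minimum instance count on the function can help.

```properties
# How often Debezium polls for new binlog events (default 500 ms)
debezium.source.poll.interval.ms=100
# Smaller batches reduce buffering delay at the cost of throughput
debezium.source.max.batch.size=512
debezium.source.max.queue.size=2048
```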

I have successfully implemented Debezium Server to capture changes from MySQL and send them to Pub/Sub.

As I have multiple tables, I want to capture changes from multiple tables using Debezium Server (i.e., capturing changes from 2 or more tables in the same database) and send messages to a single Pub/Sub topic.

I referred to the documentation here, which provides configuration for singleTopicMode and multipleTopicMode.

Additionally, I am considering the following approaches:

  1. Creating multiple configuration files (application.properties) for Debezium, each for different tables.
  2. Setting up different Debezium Servers for each table.

Could you please provide guidance on best practices for capturing changes from multiple tables and sending them to a single Pub/Sub topic, multiple Pub/Sub topics, or multiple config files/servers? Also, which approach would be more efficient and scalable?

As I have installed Debezium Server on a VM, do we need to create a separate VM for each Debezium Server?

Thank you!




Hi, a single Debezium Server instance can capture data from multiple tables. Look up the table.include.list configuration, which accepts a comma-separated list of tables.
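To expand on that: one Debezium Server can capture several tables via `table.include.list`. By default each table gets its own topic (named `<topic.prefix>.<db>.<table>`), so to funnel everything into a single Pub/Sub topic you can reroute events with the standard Kafka Connect RegexRouter transform. A sketch, with placeholder table and topic names:

```properties
# Capture two tables with one server
debezium.source.table.include.list=your_db.orders,your_db.customers

# Default routing is one topic per table. To send everything to a
# single Pub/Sub topic instead, reroute with an SMT:
debezium.transforms=reroute
debezium.transforms.reroute.type=org.apache.kafka.connect.transforms.RegexRouter
debezium.transforms.reroute.regex=.*
debezium.transforms.reroute.replacement=your_single_topic
```

A single server on a single VM is usually the simplest and most efficient option; separate servers per table mainly pay off when tables have very different load profiles. If you do run several servers, each needs its own offset and schema-history files, but separate VMs are not required.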