I want to configure debezium for capturing the data from MySQL and then Sending it to Pub/Sub and then I want to configure Cloud Function for Event Trigger on Pub/Sub and writes the changes to BigQuery.
Main goal for this is I want to replicate the data in shorter Timestamp ( max 5 sec )
Is Debezium useful to get shorter Timestamp.
MySQL ( Private ) ==> Debezium ==> Pub/Sub ==> Cloud Function ( Event Trigger on Pub/Sub) ==> BigQuery
As my MySQL Instance is private I have configured VM Instance for Cloud SQL Auth proxy.
Can you please help me with the detailed steps for configuring the debezium for MySQL ? also the configuration file for debezium ( appplication.properties )
I was refering this documentation https://medium.com/nerd-for-tech/debezium-server-to-cloud-pubsub-a-kafka-less-way-to-stream-changes-...
But I was getting error for server name which I have passed Cloud SQL MySQL Instance name. But getting the error as resource not found
To set up Debezium for capturing data from MySQL and sending it to Pub/Sub, followed by using Cloud Functions to write the changes to BigQuery, follow these detailed steps:
1. Set Up the Environment
2. Install and Configure Debezium
Install Docker (if not already present): Follow the instructions for your operating system to install Docker.
Run Debezium (using Docker):
docker run -it --name debezium \
-e DEBEZIUM_SOURCE_CONNECTOR_CLASS=io.debezium.connector.mysql.MySqlConnector \
-e DEBEZIUM_SOURCE_DATABASE_HOSTNAME=127.0.0.1 \
-e DEBEZIUM_SOURCE_DATABASE_PORT=3307 \
-e DEBEZIUM_SOURCE_DATABASE_USER=[your_db_user] \
-e DEBEZIUM_SOURCE_DATABASE_PASSWORD=[your_db_password] \
-e DEBEZIUM_SOURCE_DATABASE_DBNAME=[your_database_name] \
-e DEBEZIUM_SOURCE_DATABASE_SERVER_ID=[unique_server_id] \
-e DEBEZIUM_SOURCE_DATABASE_SERVER_NAME=[logical_server_name] \
-e DEBEZIUM_SOURCE_TABLE_INCLUDE_LIST=[your_db].[your_table] \
-e DEBEZIUM_SINK_TYPE=pubsub \
-e DEBEZIUM_SINK_PUBSUB_PROJECT_ID=[your_gcp_project_id] \
-e DEBEZIUM_SINK_PUBSUB_TOPIC_ID=[your_topic_name] \
-e DEBEZIUM_FORMAT_VALUE=json \
debezium/server:latest
3. Set Up Cloud Function
import base64
import json
from google.cloud import bigquery
def pubsub_to_bigquery(event, context):
"""Triggered by a Pub/Sub message."""
try:
client = bigquery.Client()
table_id = "your_project.your_dataset.your_table"
# Decode and parse the Pub/Sub message
data = base64.b64decode(event['data']).decode('utf-8')
row = json.loads(data)
# Insert the row into BigQuery
errors = client.insert_rows_json(table_id, [row])
if errors:
raise Exception(f"Errors inserting row into BigQuery: {errors}")
# (Optional) Acknowledge the message if successful
# context.ack()
except Exception as e:
print(f"Error processing Pub/Sub message: {e}")
# (Optional) Handle errors (e.g., retry, log to Error Reporting, etc.)
4. Testing and Monitoring
Important Considerations:
Latency: Achieving a consistent 5-second latency might require careful tuning of Debezium, Pub/Sub, and Cloud Function configurations, as well as optimizing your MySQL instance.
Schema Evolution: Plan for how you'll handle schema changes in your MySQL database and ensure the Cloud Function can adapt to them.
Error Handling: Implement robust error handling in your Cloud Function to deal with scenarios like BigQuery insertion failures, invalid message formats, etc.
I have successfully implemented Debezium Server to capture changes from MySQL and send them to Pub/Sub.
As I have multiple tables, I want to capture changes from multiple tables using Debezium Server (i.e., capturing changes from 2 or more tables in the same database) and send messages to a single Pub/Sub topic.
I referred to the documentation here, which provides configuration for singleTopicMode and multipleTopicMode.
Additionally, I am considering the following approaches:
Could you please provide guidance on the best practices for capturing changes from multiple tables and sending them to a single Pub/Sub topic or Multiple Pub/Sub topic or Multiple Config File/ Servers? Also, which approach would be more efficient and scalable?
As I have Installed this Debezium-server on VM , do we need to create separate VM for each Debezium Server?
Thank you!
Hi, You should be able to get data from multiple table. Look up the configurations for getting data from one than one table.