I need some help with a GCP based architecture that I am trying to design.
Here is the scenario: Some point-of-sale devices are writing to a transactional database. Suppose I want to move this sales data into a cloud data warehouse (say BigQuery, for OLAP). Can I use the Database Migration Service (DMS) with CDC enabled, land the change data in object storage (a GCS bucket), and use that to populate the data warehouse instead of a Pub/Sub + Dataflow system?
It is transaction data, meaning I am tracking sales at the point-of-sale devices. In an ideal case, I agree that a Pub/Sub + Dataflow pipeline would be very scalable and flexible in terms of data transformations.
But if the sales volume is not too high, can I just connect DMS to the transactional database storing this sales data and replicate the changes to an object store every time there is a transaction?
My thought process is that a DMS-based architecture might be much easier to implement than the Pub/Sub architecture and much more economical.
Is this a reasonable architecture? Please share your thoughts and correct me if I am getting anything wrong.
If it is right, what would be some bottlenecks for this architecture?
Hi @ameyabhave96,
Welcome to Google Cloud Community!
Compared to configuring Pub/Sub and Dataflow jobs, setting up DMS for CDC and writing to a Google Cloud Storage bucket can be more straightforward. If your transaction volume is genuinely low and infrequent, the cost of DMS plus Cloud Storage might be less than that of a constantly running Dataflow pipeline.
However, a Google Cloud Storage bucket is primarily an object store and is not designed for the efficient incremental updates that CDC typically requires. You'd likely end up rewriting or reprocessing entire files, leading to inefficiencies as your data grows, and you would still need a separate step to load those files into BigQuery and apply the changes there.
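To give a rough idea of what that extra step looks like, here is a minimal sketch of periodically loading CDC files from a bucket into a staging table and applying them to the analytical table with a MERGE. The project, bucket, dataset, table, and field names are placeholders, and the assumption that the CDC files are newline-delimited JSON with an `op` column is mine, not something DMS guarantees:

```python
# Hypothetical sketch: apply CDC files that have landed in GCS to a BigQuery table.
# All names below (project, bucket, tables, columns) are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# 1. Load the current batch of CDC files into a staging table.
#    WRITE_TRUNCATE means the whole wildcard match is reloaded each run,
#    which illustrates the reprocessing overhead mentioned above.
load_job = client.load_table_from_uri(
    "gs://pos-cdc-landing/sales/*.json",
    "my-project.sales_olap.sales_staging",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
load_job.result()  # wait for the load to finish

# 2. Merge staged changes into the analytical table so that updates and
#    deletes from the source database are reflected, not just appends.
merge_sql = """
MERGE `my-project.sales_olap.sales` AS t
USING `my-project.sales_olap.sales_staging` AS s
ON t.sale_id = s.sale_id
WHEN MATCHED AND s.op = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.updated_at = s.updated_at
WHEN NOT MATCHED AND s.op != 'DELETE' THEN
  INSERT (sale_id, amount, updated_at) VALUES (s.sale_id, s.amount, s.updated_at)
"""
client.query(merge_sql).result()
```

You would also need something like Cloud Scheduler or a Cloud Function triggered by object notifications to run this regularly, which is part of the hidden operational cost of the seemingly simpler design.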
Yes, DMS provides some basic transformation options, but it's not as powerful as Dataflow for complex data manipulation if you need that later. The DMS approach might seem tempting for its simplicity, but if you anticipate growth in transaction volume or need more advanced transformations, Dataflow remains the more robust choice.
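For comparison, the Pub/Sub + Dataflow path for the same data can be a fairly small streaming pipeline. This is only an illustrative Apache Beam sketch under my own assumptions; the topic, table, schema, and event format are made-up placeholders:

```python
# Hypothetical sketch of the Pub/Sub + Dataflow alternative: a streaming
# Apache Beam pipeline that parses sale events and writes them to BigQuery.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Add --runner=DataflowRunner, --project, --region, etc. to run on Dataflow.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        # Read raw sale events published by the point-of-sale layer.
        | "ReadSales" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/pos-sales")
        # Decode and parse each message; enrichment or filtering would go here.
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append the parsed rows to the analytical table.
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:sales_olap.sales",
            schema="sale_id:STRING, amount:NUMERIC, updated_at:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```

The trade-off is that this pipeline runs (and bills) continuously, whereas the DMS + GCS path is batch-oriented and cheaper at very low volumes.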
At the end of the day, choose the solution that best balances ease of implementation with your long-term scalability and data processing requirements.
I hope the above information is helpful.