This is a repost of something I wrote on Stack Overflow a few days ago that got no takers. I guess there's not much of a thriving CDAP/Data Fusion community there. 😕
I'm trying to read data from MongoDB, wrangle it a bit, and send it to a BigQuery sink in order to analyze it.
This works well for a single collection, using the MongoDB source and the BigQuery sink. However, I want to be able to send multiple collections through the same pipeline.
To that end, I've looked into the BigQuery Multi Table sink. However, it doesn't seem to accept inputs with different schemas, which makes it hard for me to use multiple MongoDB sources, since the output schemas for the different collections (and thus the input schemas for the BigQuery tables) have different formats.
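To make the mismatch concrete, here's roughly the situation (made-up documents, not my real data); two collections like these produce output schemas with almost no fields in common:

```python
# Hypothetical documents from two different MongoDB collections.
# The inferred schemas (and thus the BigQuery table schemas) differ
# in both field names and field types.
customer = {"_id": "5f1d...", "name": "Ada Lovelace", "signup_date": "2021-03-01"}
order = {"_id": "5f1e...", "customer_id": "5f1d...", "items": [{"sku": "A1", "qty": 2}], "total": 19.95}
```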
I see that most people who use the BigQuery Multi Table sink pair it with the Multiple Database Tables source. I've tried that, but it presents another problem: it uses JDBC, and there's no JDBC driver for MongoDB on the Hub. I've tried finding one and uploading it myself, but I'm not sure what the driver's entrypoint/main class is, or whether it would even work, since the plugin is presumably designed for relational databases. I'm kind of stabbing in the dark here.
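For anyone wanting to reproduce my attempts: a JDBC 4 compliant driver declares its java.sql.Driver implementation in `META-INF/services/java.sql.Driver` inside the JAR, so a quick way to check what class name a candidate driver expects is something like this (the JAR filename is just a placeholder for whatever driver you're testing):

```python
import zipfile

JAR_PATH = "mongodb-jdbc-driver.jar"  # placeholder; whichever JAR you're testing

with zipfile.ZipFile(JAR_PATH) as jar:
    # JDBC 4+ drivers list their java.sql.Driver implementation in this
    # service file, so this prints the fully qualified class name that
    # the driver upload form asks for.
    with jar.open("META-INF/services/java.sql.Driver") as f:
        print(f.read().decode().strip())
```

That at least answers the class-name question, but it still leaves open whether the Multiple Database Tables source will accept a non-relational driver at all.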
Has anyone successfully consumed multiple MongoDB collections and written them to BigQuery in one go using Cloud Data Fusion?
Kind regards,
Jacob