
BigQuery -> Firestore

Hi, a number of our teams want to export some of the data being ingested and processed in BigQuery to Firestore, a.k.a. reverse ETL.

I have seen lots about going from Firestore to BigQuery, but not vice versa. Why is that? What are we missing?

I think a BQ query executed periodically by Dataflow could write each result row into a new FS document, given a column for the docId, but I don't know enough about handling a TableRow -> FSDocument transform with arbitrary nested fields in Beam/Java.

Has anyone done this before?

Thanks, Charles

 

1 ACCEPTED SOLUTION

BigQuery is a data warehouse, while Firestore is a NoSQL database. Data warehouses are typically used for analytical workloads, while NoSQL databases are typically used for operational workloads. This means that it is more common to export data from a NoSQL database to a data warehouse for analysis than it is to export data from a data warehouse to a NoSQL database for operational use.

However, there are cases where you might want to export data from BigQuery to Firestore. For example:

  • To use Firestore to store a real-time view of a subset of the data in BigQuery.
  • To use Firestore to store data that needs to be highly available and responsive.

Steps to Export Data from BigQuery to Firestore:

  1. Create a Dataflow pipeline.
  2. Read the data from BigQuery.
  3. Transform the BigQuery data into Firestore documents.
  4. Write the Firestore documents to Firestore.

Transform BigQuery Data:

Use Apache Beam's ParDo transform (available in both the Java and Python SDKs) to apply a function to each element of a PCollection. This function converts each BigQuery row into a Firestore document.

To handle nested fields, you can write a custom transformation function that processes each field and subfield to construct the Firestore document.
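
As one concrete approach, here is a small recursive helper. It is a sketch, assuming rows arrive as the Python dicts that beam.io.ReadFromBigQuery yields: it walks RECORD (dict) and REPEATED (list) fields and converts the BigQuery values Firestore does not accept directly.

import datetime
import decimal

def to_firestore_value(value):
    # BigQuery STRUCT/RECORD fields arrive as dicts; recurse so nested
    # records become nested Firestore maps.
    if isinstance(value, dict):
        return {k: to_firestore_value(v) for k, v in value.items()}
    # REPEATED fields arrive as lists; convert each element.
    if isinstance(value, list):
        return [to_firestore_value(v) for v in value]
    # BigQuery NUMERIC arrives as decimal.Decimal, which Firestore
    # rejects; use float (or str, if exact precision matters).
    if isinstance(value, decimal.Decimal):
        return float(value)
    # BigQuery DATE arrives as datetime.date; Firestore accepts only
    # datetime.datetime, so promote the date to midnight.
    if isinstance(value, datetime.date) and not isinstance(value, datetime.datetime):
        return datetime.datetime(value.year, value.month, value.day)
    return value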

Considerations:

  • Schema Mapping: Ensure that the schema of your BigQuery data matches the expected schema in Firestore. Handle any necessary type conversions.
  • Document IDs: Decide how you will determine document IDs in Firestore. You can use a column from BigQuery as the document ID, or generate new IDs (see the sketch after this list).
  • Error Handling: Implement robust error handling to manage issues that arise during the transfer.
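
To make the document-ID consideration concrete, here is a minimal sketch of both options above. The helper name and the id_column parameter are illustrative, not part of any library.

import uuid

def choose_document_id(row, id_column=None):
    # Option 1: use a BigQuery column as the document ID. Popping it
    # keeps the ID out of the document body.
    if id_column is not None:
        return str(row.pop(id_column))
    # Option 2: generate a fresh random ID.
    return uuid.uuid4().hex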

Example (Conceptual in Python):

 

import apache_beam as beam

class BigQueryToFirestore(beam.DoFn):
    def process(self, element):
        # beam.io.ReadFromBigQuery yields each row as a Python dict.
        firestore_document = {}
        for name, value in element.items():
            # Handle nested fields and type conversions here.
            firestore_document[name] = value

        yield firestore_document

with beam.Pipeline() as pipeline:
    (
        pipeline
        | 'Read from BigQuery' >> beam.io.ReadFromBigQuery(
            table='my-project:my_dataset.my_table')
        | 'Transform to Firestore document' >> beam.ParDo(BigQueryToFirestore())
        # Custom transform to write to Firestore (see note below)
        | 'Write to Firestore' >> beam.ParDo(
            CustomWriteToFirestore(collection='my_firestore_collection'))
    )

# The with block runs the pipeline and waits for it to finish on exit,
# so a separate pipeline.run() call is not needed.

Note:

  • CustomWriteToFirestore is a placeholder for a custom transform that writes to Firestore. You would need to implement this transform yourself; a rough sketch follows after this list.
  • The process method in BigQueryToFirestore should be expanded to handle nested fields and other complexities.
  • Be sure to test thoroughly and handle all edge cases and errors in the actual implementation.
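
For reference, here is one possible shape for the CustomWriteToFirestore placeholder, built on the google-cloud-firestore client library inside a DoFn. This is a minimal sketch under stated assumptions: each incoming dict carries its document ID under an 'id' key, and writes are grouped into batches of up to 500, Firestore's per-batch limit.

import apache_beam as beam

class CustomWriteToFirestore(beam.DoFn):
    """Sketch only: batches dict elements into Firestore writes."""

    def __init__(self, collection, project=None):
        self.collection = collection
        self.project = project

    def setup(self):
        # Create the client on the worker, not at pipeline construction.
        from google.cloud import firestore
        self.client = firestore.Client(project=self.project)

    def start_bundle(self):
        self.batch = self.client.batch()
        self.count = 0

    def process(self, element):
        # Assumption: the document ID travels in the row under 'id'.
        doc_id = str(element.pop('id'))
        ref = self.client.collection(self.collection).document(doc_id)
        self.batch.set(ref, element)
        self.count += 1
        if self.count >= 500:  # Firestore allows at most 500 writes per batch.
            self.batch.commit()
            self.batch = self.client.batch()
            self.count = 0

    def finish_bundle(self):
        if self.count > 0:
            self.batch.commit()

For the error-handling consideration, the commit calls are the natural place to add retries or to route failed rows to a dead-letter output.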
