
BigQuery -> Firestore

Hi, a number of our teams want to export some of the data being ingested and processed in BigQuery to Firestore, a.k.a. reverse ETL.

I have seen lots about going from Firestore to BigQuery, but not vice versa. Why is that? What are we missing?

I think a BQ query executed periodically by Dataflow could write each result row into a new FS document, given a column for the docId, but I don't know enough about handling a TableRow -> FSDocument transform with arbitrary nested fields in Beam/Java.

Has anyone done this before?

Thanks, Charles

 

1 ACCEPTED SOLUTION

BigQuery is a data warehouse, while Firestore is a NoSQL database. Data warehouses are typically used for analytical workloads, while NoSQL databases are typically used for operational workloads. This means that it is more common to export data from a NoSQL database to a data warehouse for analysis than it is to export data from a data warehouse to a NoSQL database for operational use.

However, there are cases where you might want to export data from BigQuery to Firestore. For example:

  • To use Firestore to store a real-time view of a subset of the data in BigQuery.
  • To use Firestore to store data that needs to be highly available and responsive.

Steps to Export Data from BigQuery to Firestore:

  1. Create a Dataflow pipeline.
  2. Read the data from BigQuery.
  3. Transform the BigQuery data into Firestore documents.
  4. Write the Firestore documents to Firestore.

Transform BigQuery Data:

Use Apache Beam's ParDo transform (available in both the Java and Python SDKs) to apply a function to each element of a PCollection. This function converts each BigQuery row into a Firestore document.

To handle nested fields, you can write a custom transformation function that processes each field and subfield to construct the Firestore document.
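
As one concrete approach, here is a small recursive helper. It is a sketch, assuming rows arrive as the Python dicts that beam.io.ReadFromBigQuery yields: it walks RECORD (dict) and REPEATED (list) fields and converts the BigQuery values Firestore does not accept directly.

import datetime
import decimal

def to_firestore_value(value):
    # BigQuery STRUCT/RECORD fields arrive as dicts; recurse so nested
    # records become nested Firestore maps.
    if isinstance(value, dict):
        return {k: to_firestore_value(v) for k, v in value.items()}
    # REPEATED fields arrive as lists; convert each element.
    if isinstance(value, list):
        return [to_firestore_value(v) for v in value]
    # BigQuery NUMERIC arrives as decimal.Decimal, which Firestore
    # rejects; use float (or str, if exact precision matters).
    if isinstance(value, decimal.Decimal):
        return float(value)
    # BigQuery DATE arrives as datetime.date; Firestore accepts only
    # datetime.datetime, so promote the date to midnight.
    if isinstance(value, datetime.date) and not isinstance(value, datetime.datetime):
        return datetime.datetime(value.year, value.month, value.day)
    return value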

Considerations:

  • Schema Mapping: Ensure that the schema of your BigQuery data matches the expected schema in Firestore. Handle any necessary type conversions.
  • Document IDs: Decide how you will determine document IDs in Firestore. You can use a column from BigQuery as the document ID, or generate new IDs (see the sketch after this list).
  • Error Handling: Implement robust error handling to manage issues that arise during the transfer.
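
To make the document-ID consideration concrete, here is a minimal sketch of both options above. The helper name and the id_column parameter are illustrative, not part of any library.

import uuid

def choose_document_id(row, id_column=None):
    # Option 1: use a BigQuery column as the document ID. Popping it
    # keeps the ID out of the document body.
    if id_column is not None:
        return str(row.pop(id_column))
    # Option 2: generate a fresh random ID.
    return uuid.uuid4().hex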

Example (Conceptual in Python):

 

import apache_beam as beam

class BigQueryToFirestore(beam.DoFn):
    def process(self, element):
        # beam.io.ReadFromBigQuery yields each row as a Python dict.
        firestore_document = {}
        for name, value in element.items():
            # Handle nested fields and type conversions here.
            firestore_document[name] = value

        yield firestore_document

with beam.Pipeline() as pipeline:
    (
        pipeline
        | 'Read from BigQuery' >> beam.io.ReadFromBigQuery(
            table='my-project:my_dataset.my_table')
        | 'Transform to Firestore document' >> beam.ParDo(BigQueryToFirestore())
        # Custom transform to write to Firestore (see note below)
        | 'Write to Firestore' >> beam.ParDo(
            CustomWriteToFirestore(collection='my_firestore_collection'))
    )

# The with block runs the pipeline and waits for it to finish on exit,
# so a separate pipeline.run() call is not needed.

Note:

  • CustomWriteToFirestore is a placeholder for a custom transform that writes to Firestore. You would need to implement this transform yourself; a rough sketch follows after this list.
  • The process method in BigQueryToFirestore should be expanded to handle nested fields and other complexities.
  • Be sure to test thoroughly and handle all edge cases and errors in the actual implementation.
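
For reference, here is one possible shape for the CustomWriteToFirestore placeholder, built on the google-cloud-firestore client library inside a DoFn. This is a minimal sketch under stated assumptions: each incoming dict carries its document ID under an 'id' key, and writes are grouped into batches of up to 500, Firestore's per-batch limit.

import apache_beam as beam

class CustomWriteToFirestore(beam.DoFn):
    """Sketch only: batches dict elements into Firestore writes."""

    def __init__(self, collection, project=None):
        self.collection = collection
        self.project = project

    def setup(self):
        # Create the client on the worker, not at pipeline construction.
        from google.cloud import firestore
        self.client = firestore.Client(project=self.project)

    def start_bundle(self):
        self.batch = self.client.batch()
        self.count = 0

    def process(self, element):
        # Assumption: the document ID travels in the row under 'id'.
        doc_id = str(element.pop('id'))
        ref = self.client.collection(self.collection).document(doc_id)
        self.batch.set(ref, element)
        self.count += 1
        if self.count >= 500:  # Firestore allows at most 500 writes per batch.
            self.batch.commit()
            self.batch = self.client.batch()
            self.count = 0

    def finish_bundle(self):
        if self.count > 0:
            self.batch.commit()

For the error-handling consideration, the commit calls are the natural place to add retries or to route failed rows to a dead-letter output.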
