Hi, a number of our teams want to export data that is being ingested and processed in BigQuery out to Firestore, a.k.a. reverse ETL.
I have seen lots about going from Firestore to BigQuery, but not vice versa. Why is that? What are we missing?
I think a BigQuery query executed periodically by Dataflow could write each result row into a new Firestore document, given a column to use as the docId, but I don't know enough about handling a TableRow -> Firestore document transform with arbitrary nested fields in Beam/Java.
Has anyone done this before?
Thanks, Charles
BigQuery is a data warehouse and Firestore is a NoSQL database: warehouses typically serve analytical workloads, while NoSQL databases serve operational ones. That is why the well-trodden path is Firestore -> BigQuery for analysis; going the other way, pushing warehouse data out to an operational store, is much less common, which is why you find so little written about it.
However, there are cases where you might want to export data from BigQuery to Firestore, for example to serve pre-computed query results to a client application at low latency, which is exactly the reverse ETL pattern you describe.
Steps to Export Data from BigQuery to Firestore:

Transform BigQuery Data: Use the Apache Beam SDK's ParDo transform (available in both Java and Python) to apply a function to each element of a PCollection; this function converts each BigQuery row into a Firestore document. To handle nested fields, write a custom transformation function that walks each field and subfield to construct the Firestore document.
Considerations: Firestore documents are capped at 1 MiB, each document write is billed individually, and you need a stable document ID (such as the docId column you mention) so that re-running the export overwrites documents instead of creating duplicates.
Example (Conceptual in Python):
import apache_beam as beam

class BigQueryToFirestore(beam.DoFn):
    def process(self, element):
        # ReadFromBigQuery yields each row as a plain Python dict, with
        # RECORD fields as nested dicts and REPEATED fields as lists.
        firestore_document = {}
        for name, value in element.items():
            # Handle nested fields and type conversions here
            firestore_document[name] = value
        yield firestore_document

with beam.Pipeline() as pipeline:
    (
        pipeline
        | 'Read from BigQuery' >> beam.io.ReadFromBigQuery(table='my_bigquery_table')
        | 'Transform to Firestore document' >> beam.ParDo(BigQueryToFirestore())
        # Custom transform to write to Firestore
        | 'Write to Firestore' >> beam.ParDo(CustomWriteToFirestore(collection='my_firestore_collection'))
    )
    # The `with` block runs the pipeline and waits for it to finish,
    # so no explicit pipeline.run() call is needed.
Note:

- CustomWriteToFirestore is a placeholder for a custom transform that writes to Firestore; you would need to implement this transform yourself.
- The process method in BigQueryToFirestore should be expanded to handle nested fields and other complexities.
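To make the nested-field handling concrete, here is a minimal sketch of a recursive converter the process method could call. It assumes rows arrive as Python dicts from ReadFromBigQuery, with RECORD fields as nested dicts and REPEATED fields as lists, and that NUMERIC and date values may arrive as decimal.Decimal and date objects; the helper name to_firestore_value is this sketch's own invention.

import datetime
import decimal

def to_firestore_value(value):
    # Recursively convert a BigQuery row value into something the
    # Firestore client accepts. Nested dicts and lists map naturally
    # onto Firestore maps and arrays.
    if isinstance(value, dict):
        return {key: to_firestore_value(val) for key, val in value.items()}
    if isinstance(value, list):
        return [to_firestore_value(val) for val in value]
    if isinstance(value, decimal.Decimal):
        # Firestore has no decimal type; store as float (or str to keep exactness).
        return float(value)
    if isinstance(value, datetime.date) and not isinstance(value, datetime.datetime):
        # The client accepts datetime natively, but plain dates are
        # easiest to store as ISO strings.
        return value.isoformat()
    return value

With this in place, the loop body in BigQueryToFirestore.process becomes firestore_document[name] = to_firestore_value(value).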
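For completeness, here is a minimal sketch of what the CustomWriteToFirestore placeholder could look like, written against the google-cloud-firestore client library inside a DoFn (as far as I know the Beam Python SDK does not ship a native-mode Firestore sink, so this is a common workaround). The id_field parameter, which picks the column to use as the document ID (your docId column), and the simple batching without retries are both assumptions of this sketch.

import apache_beam as beam

class CustomWriteToFirestore(beam.DoFn):
    # id_field names the column to use as the Firestore document ID (assumed).
    def __init__(self, collection, id_field='docId', batch_size=500):
        self.collection = collection
        self.id_field = id_field
        self.batch_size = batch_size  # Firestore allows at most 500 writes per batch

    def setup(self):
        # Create one client per worker, lazily, on the worker itself.
        from google.cloud import firestore
        self.client = firestore.Client()

    def start_bundle(self):
        self.batch = self.client.batch()
        self.pending = 0

    def process(self, document):
        # Copy before popping so we do not mutate the input element.
        data = dict(document)
        doc_id = str(data.pop(self.id_field))
        ref = self.client.collection(self.collection).document(doc_id)
        self.batch.set(ref, data)
        self.pending += 1
        if self.pending >= self.batch_size:
            self.batch.commit()
            self.start_bundle()

    def finish_bundle(self):
        if self.pending > 0:
            self.batch.commit()

Using set() with a stable document ID keeps the export idempotent: re-running the pipeline overwrites existing documents rather than creating duplicates.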