
Converting GA4 events in BigQuery into structured tables

Is there a way to convert (extract) the events_ data into tables using Dataflow jobs or any other method within GCP itself, instead of using Azure Data Factory?

If it is possible, could you please provide the steps?

Thank you

Solved
1 ACCEPTED SOLUTION

Hello,

Thank you for contacting Google Cloud Community!

You can definitely convert (extract) events_ data into tables in Google Cloud Platform (GCP) without resorting to Azure Data Factory. Here are two effective methods using Dataflow Jobs:

Method 1: Cloud Dataflow with Apache Beam:

  1. Define a Cloud Storage source in your Dataflow job configuration. This source points to the location of your events_ data files (e.g., JSON, CSV).
  2. Create an Apache Beam pipeline within your Dataflow job.
  3. Use io.ReadFromText (for text files) or io.ReadFromAvro (for Avro files) to read the events_ data from Cloud Storage.
  4. Parse and manipulate the data as needed, using Beam transforms or libraries such as pandas within your pipeline.
  5. Use io.WriteToBigQuery to write the processed data to a BigQuery table you define. Provide schema information for the table to ensure appropriate data types and formatting (a sketch follows this list).
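
For reference, here is a minimal Python sketch of Method 1, assuming the events_ data has been exported to Cloud Storage as newline-delimited JSON. The bucket, project, dataset, table, and field names are placeholders you would replace with your own:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(line):
    """Parse one JSON line and keep only the columns defined in the table schema."""
    event = json.loads(line)
    return {
        "event_date": event.get("event_date"),
        "event_name": event.get("event_name"),
        "user_pseudo_id": event.get("user_pseudo_id"),
    }


options = PipelineOptions(
    runner="DataflowRunner",       # use "DirectRunner" to test locally
    project="my-project",          # placeholder project ID
    region="us-central1",
    temp_location="gs://my-bucket/temp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromText("gs://my-bucket/ga4_export/*.json")
        | "ParseEvents" >> beam.Map(parse_event)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.events_flat",
            schema="event_date:STRING,event_name:STRING,user_pseudo_id:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )

You can extend parse_event to flatten nested GA4 fields into whatever columns you need; the schema string must match the dictionaries the pipeline produces.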


Method 2: Cloud Dataflow with Pub/Sub:

  1. If your events_ data is already streamed to a Pub/Sub topic, configure your Dataflow job to use that topic as the source.
  2. As in Method 1, create a Beam pipeline within your Dataflow job.
  3. Use io.ReadFromPubSub to read the events_ data from the Pub/Sub topic.
  4. Follow steps 4 and 5 from Method 1 to parse the data and write it to a BigQuery table (see the streaming sketch below).
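
Here is a minimal streaming sketch of Method 2, assuming the events arrive on a Pub/Sub topic as JSON-encoded messages. The topic, project, dataset, table, and field names are placeholders:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def to_row(message):
    """Decode the Pub/Sub payload and keep only the columns defined in the table schema."""
    event = json.loads(message.decode("utf-8"))
    return {
        "event_date": event.get("event_date"),
        "event_name": event.get("event_name"),
        "user_pseudo_id": event.get("user_pseudo_id"),
    }


options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",          # placeholder project ID
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    streaming=True,                # Pub/Sub sources require a streaming pipeline
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/ga4-events")
        | "ToRows" >> beam.Map(to_row)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.events_stream",
            schema="event_date:STRING,event_name:STRING,user_pseudo_id:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )

The main difference from Method 1 is the streaming flag and the Pub/Sub source; the parsing and BigQuery sink steps are otherwise the same.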

Regards,
Jai Ade


2 REPLIES


Thank you for the solution, but I am not familiar with Apache Beam.