Is it possible to flatten a nested Pub/Sub payload and insert it into BigQuery?
Example payload:
{"data": [
{
"field1": "some_value1",
"field2": "some_value2",
"details": {
"field3": "some_value3",
"field4": "some_value4"
}
}
]
}
I want the BigQuery columns to be field1, field2, field3, field4, etc.
Maybe we take PubSub out of the equation (just for contemplation purposes). I'm sensing that you have a nested and repeated JSON document and you want to expose the records in that document as the target of a SQL SELECT statement. We might want to look at importing the JSON data without transformation. We would then end up with a column in a table of type JSON, where you can refer to fields such as:
SELECT doc.field1, doc.field2 .... FROM my_table
If you are looking to flatten the record structure, consider creating a view:
CREATE VIEW my_view AS (
SELECT doc.field1 as field1, doc.field2 as field2 ... FROM my_table
)
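For the payload in the original question, where the records sit inside a "data" array, a minimal sketch of such a view might look like the following. It assumes the raw message body was loaded unchanged into a JSON column named doc on a table named my_table; the table, column and view names here are all placeholders.
-- Placeholder names: my_table, doc (JSON column holding the raw payload), my_view.
CREATE OR REPLACE VIEW my_view AS (
  SELECT
    JSON_VALUE(item.field1)         AS field1,
    JSON_VALUE(item.field2)         AS field2,
    JSON_VALUE(item.details.field3) AS field3,
    JSON_VALUE(item.details.field4) AS field4
  FROM my_table,
       -- JSON_QUERY_ARRAY turns the "data" array into ARRAY<JSON>,
       -- and UNNEST expands it into one row per record.
       UNNEST(JSON_QUERY_ARRAY(doc.data)) AS item
);
JSON_VALUE then extracts each leaf as a STRING, which gives you the flat field1..field4 columns you described.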
Yes, this works. But is it possible to do it with a topic schema? Suppose I have a nested message in a topic. Is it possible to flatten the payload directly using the schema and sink to BigQuery?
If the message being written to the PubSub topic was nested ... for example:
{
  "A": "Hello",
  "B": {
    "X": "Val1",
    "Y": "Val2"
  },
  "C": 123
}
And the table you want to populate looks like:
A     | X    | Y    | C
Hello | Val1 | Val2 | 123
Then I can't think of a way to "automatically" flatten the structured content of the topic message. What you appear to be describing is a mapping/transformation. Technologies such as Dataflow, Data Fusion, Workflows, App Integration, Cloud Functions or Cloud Run could sit between the topic and the table insert to perform that mapping. Alternatively, land the structured data into BigQuery directly and then perform ELT to transform it within the data warehouse.
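As one rough illustration of the ELT route (the names here are assumptions for the sketch, not something PubSub gives you automatically): suppose a BigQuery subscription lands the untouched message body as a JSON string in a STRING column named data on a staging table called raw_messages. A view or scheduled query could then project the flattened columns:
-- Hypothetical staging table raw_messages, with the raw JSON body in a
-- STRING column named data; JSON_VALUE pulls out each leaf field.
CREATE OR REPLACE VIEW flattened_messages AS
SELECT
  JSON_VALUE(data, '$.A')                     AS A,
  JSON_VALUE(data, '$.B.X')                   AS X,
  JSON_VALUE(data, '$.B.Y')                   AS Y,
  SAFE_CAST(JSON_VALUE(data, '$.C') AS INT64) AS C
FROM raw_messages;
That produces the A | X | Y | C shape shown above, but the flattening happens inside BigQuery rather than on the way in from the topic.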
To answer your question directly: my answer to "Is it possible to flatten the payload directly using the schema and sink to BigQuery?" is "no".