
BigQuery EXPORT DATA OPTIONS to JSON not retaining cells with NULL values

We are using EXPORT DATA OPTIONS to export tables to newline-delimited JSON files.

Rows that contain a NULL in any column are not exported correctly: columns whose value is NULL are dropped from the JSON output entirely.

A | B    | C
1 | 2    | 3
4 | NULL | 6

{"A": 1, "B": 2, "C":3}
{"A": 4, "C":6}

What we need:
{"A": 1, "B": 2, "C": 3}
{"A": 4, "B": null, "C": 6}


Hi @souravsingh13,

Welcome to Google Cloud Community!

As mentioned by @Betjens in a similar case, there are two options for handling NULL values when exporting BigQuery data to JSON format:

  • Calling BigQuery directly from Python with the BigQuery client library:
    from google.cloud import bigquery
    import json

    client = bigquery.Client()

    query = "SELECT NULL AS field1, NULL AS field2"
    query_job = client.query(query)

    # Build one dict per row; Python None serializes as JSON null,
    # so NULL columns are kept instead of being dropped.
    rows = []
    for row in query_job:
        rows.append({'field1': row[0], 'field2': row[1]})

    # Write newline-delimited JSON, one object per line.
    with open('test.json', 'w') as file:
        for r in rows:
            file.write(json.dumps(r) + '\n')
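    Running this sketch against the sample query writes a test.json containing:
    {"field1": null, "field2": null}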
  • You can also use an Apache Beam Dataflow pipeline with Python and BigQuery to handle the NULL values:
    import argparse
    import json

    import apache_beam as beam
    from apache_beam.io import BigQuerySource
    from apache_beam.io import WriteToText
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.options.pipeline_options import SetupOptions


    def add_null_field(row, field):
        # Make sure the field exists in the row; a missing field defaults to
        # None, which json.dumps serializes as JSON null.
        row.update({field: row.get(field, None)})
        return row


    def run(argv=None, save_main_session=True):
        parser = argparse.ArgumentParser()
        parser.add_argument(
            '--output',
            dest='output',
            required=True,
            help='Output file to write results to.')
        known_args, pipeline_args = parser.parse_known_args(argv)

        pipeline_options = PipelineOptions(pipeline_args)
        pipeline_options.view_as(SetupOptions).save_main_session = save_main_session

        with beam.Pipeline(options=pipeline_options) as p:
            (p
             | beam.io.Read(BigQuerySource(query='SELECT NULL AS field1, NULL AS field2'))
             | beam.Map(add_null_field, field='field1')
             | beam.Map(add_null_field, field='field2')
             | beam.Map(json.dumps)
             | WriteToText(known_args.output, file_name_suffix='.json'))


    if __name__ == '__main__':
        run()
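    Assuming the script above is saved as pipeline.py (the filename and output path are placeholders), it can be run locally with the default DirectRunner:
    python pipeline.py --output /tmp/rows
    Each output shard ends in .json and holds one JSON object per line, with NULL columns written as JSON null.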

I hope the above information is helpful.