
BigQuery EXPORT DATA OPTIONS to JSON not retaining cells with NULL values

We are using EXPORT DATA OPTIONS to export tables to newline-delimited JSON files.

Rows that contain a NULL in any column are not exported correctly: columns whose value is NULL are dropped from the JSON output entirely.

A | B    | C
1 | 2    | 3
4 | NULL | 6

{"A": 1, "B": 2, "C":3}
{"A": 4, "C":6}

What we need:
{"A": 1, "B": 2, "C": 3}
{"A": 4, "B": null, "C": 6}


Hi @souravsingh13,

Welcome to Google Cloud Community!

As mentioned by @Betjens in a similar case, there are two options for handling NULL values when exporting BigQuery data to JSON format:

  • Calling BigQuery directly from Python with the BigQuery client library:
    from google.cloud import bigquery
    import json

    client = bigquery.Client()

    query = "SELECT NULL AS field1, NULL AS field2"
    query_job = client.query(query)

    # Build one dict per row; Python None serializes as JSON null,
    # so NULL columns are kept instead of being dropped.
    rows = []
    for row in query_job:
        rows.append({'field1': row[0], 'field2': row[1]})

    # Write newline-delimited JSON, one object per line.
    with open('test.json', 'w') as file:
        for r in rows:
            file.write(json.dumps(r) + '\n')
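    Running this sketch against the sample query writes a test.json containing:
    {"field1": null, "field2": null}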
  • You can also use an Apache Beam Dataflow pipeline with Python and BigQuery to handle the NULL values:
    import argparse
    import json

    import apache_beam as beam
    from apache_beam.io import BigQuerySource
    from apache_beam.io import WriteToText
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.options.pipeline_options import SetupOptions


    def add_null_field(row, field):
        # Make sure the field exists in the row; a missing field defaults to
        # None, which json.dumps serializes as JSON null.
        row.update({field: row.get(field, None)})
        return row


    def run(argv=None, save_main_session=True):
        parser = argparse.ArgumentParser()
        parser.add_argument(
            '--output',
            dest='output',
            required=True,
            help='Output file to write results to.')
        known_args, pipeline_args = parser.parse_known_args(argv)

        pipeline_options = PipelineOptions(pipeline_args)
        pipeline_options.view_as(SetupOptions).save_main_session = save_main_session

        with beam.Pipeline(options=pipeline_options) as p:
            (p
             | beam.io.Read(BigQuerySource(query='SELECT NULL AS field1, NULL AS field2'))
             | beam.Map(add_null_field, field='field1')
             | beam.Map(add_null_field, field='field2')
             | beam.Map(json.dumps)
             | WriteToText(known_args.output, file_name_suffix='.json'))


    if __name__ == '__main__':
        run()
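    Assuming the script above is saved as pipeline.py (the filename and output path are placeholders), it can be run locally with the default DirectRunner:
    python pipeline.py --output /tmp/rows
    Each output shard ends in .json and holds one JSON object per line, with NULL columns written as JSON null.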

I hope the above information is helpful.