We are using EXPORT DATA OPTIONS to export tables to newline-delimited JSON files.
We are not able to export table rows correctly when any column contains a NULL value: columns with NULL values are dropped from the resulting JSON.
A | B    | C
1 | 2    | 3
4 | NULL | 6
What we get:
{"A": 1, "B": 2, "C": 3}
{"A": 4, "C": 6}
What we need (JSON uses the lowercase literal null):
{"A": 1, "B": 2, "C": 3}
{"A": 4, "B": null, "C": 6}
Hi @souravsingh13,
Welcome to Google Cloud Community!
As mentioned by @Betjens in a similar case, there are two options for preserving NULL values when exporting BigQuery data to JSON: query the table with the BigQuery client library and serialize the rows yourself, or run an Apache Beam (Dataflow) pipeline.
from google.cloud import bigquery
import json

client = bigquery.Client()

query = "SELECT NULL AS field1, NULL AS field2"
query_job = client.query(query)

# Write one JSON object per line (newline-delimited JSON).
# json.dumps serializes Python None as the JSON literal null,
# so NULL columns are preserved instead of being dropped.
with open('test.json', 'w') as file:
    for row in query_job:
        json_row = {'field1': row[0], 'field2': row[1]}
        file.write(json.dumps(json_row) + '\n')
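The key point in the snippet above is that Python's json module serializes None as the JSON literal null, which is exactly the output format asked for. A minimal, self-contained sketch of that round trip (sample rows are hypothetical; no BigQuery connection needed):

```python
import json

# Rows as they might come back from BigQuery; None stands in for a NULL column.
rows = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": None, "C": 6},
]

# Build newline-delimited JSON; None becomes the JSON literal null.
ndjson = "\n".join(json.dumps(row) for row in rows)
print(ndjson)
# {"A": 1, "B": 2, "C": 3}
# {"A": 4, "B": null, "C": 6}
```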
import argparse
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions

def add_null_field(row, field):
    # Re-insert the field with None if it is missing from the row,
    # so json.dumps emits it as JSON null.
    if field != 'skip':
        row.update({field: row.get(field, None)})
    return row

def run(argv=None, save_main_session=True):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--output',
        dest='output',
        required=True,
        help='Output file to write results to.')
    known_args, pipeline_args = parser.parse_known_args(argv)
    pipeline_options = PipelineOptions(pipeline_args)
    pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
    with beam.Pipeline(options=pipeline_options) as p:
        (p
         | beam.io.Read(beam.io.BigQuerySource(query='SELECT NULL AS field1, NULL AS field2'))
         | beam.Map(add_null_field, field='skip')
         | beam.Map(json.dumps)
         | beam.io.Write(beam.io.WriteToText(known_args.output, file_name_suffix='.json')))

if __name__ == '__main__':
    run()
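The add_null_field transform can be checked in isolation without running Beam. Assuming a row dict in which a NULL column was dropped by the source (the sample row below is hypothetical), it restores the field with None so that json.dumps emits null:

```python
import json

def add_null_field(row, field):
    # Re-insert the field with None if it is missing from the row.
    if field != 'skip':
        row.update({field: row.get(field, None)})
    return row

row = {"field1": 4}  # field2 was NULL and is missing from the dict
row = add_null_field(row, "field2")
print(json.dumps(row))
# {"field1": 4, "field2": null}
```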
I hope the above information is helpful.