nullMarker on empty string breaking change

samirlv · 07-26-2024 09:30 AM

Hello, we noticed a breaking change in our ingestion systems on 24 July, caused by a change in nullMarker behavior.

In our CSV load job parameters we always explicitly set nullMarker to "" by default. According to the documentation, this should be transparent. From https://cloud.google.com/bigquery/docs/reference/rest/v2/Job:

nullMarker

string

Optional. Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

Before 24 July, our systems always set nullMarker to "" and our pipeline worked as expected: on CSV files containing empty strings, the empty strings were kept intact.

On 24 July, the behavior changed: these empty strings were ingested as null values, which broke our pipeline because of non-nullable data quality constraints. As a fix, we removed the explicit nullMarker setting of "".
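
To make the workaround concrete, here is a minimal sketch of the load job request body before and after the fix. It is not our actual pipeline code: the helper name build_csv_load_config and the bucket, project, dataset and table values are made up for illustration, and it only builds the JSON body (using the field names from the Job reference linked above) without submitting anything.

```python
from typing import Optional


def build_csv_load_config(
    source_uri: str,
    project_id: str,
    dataset_id: str,
    table_id: str,
    null_marker: Optional[str] = None,
) -> dict:
    """Build a jobs.insert request body for a CSV load (hypothetical helper)."""
    load = {
        "sourceUris": [source_uri],
        "sourceFormat": "CSV",
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
    }
    # None means "do not send the field at all", which is different from
    # sending an explicit empty string.
    if null_marker is not None:
        load["nullMarker"] = null_marker
    return {"configuration": {"load": load}}


# Placeholder values, for illustration only.
args = ("gs://example-bucket/data.csv", "example-project", "example_dataset", "example_table")

# Before the fix: nullMarker explicitly set to "".
before = build_csv_load_config(*args, null_marker="")

# After the fix: nullMarker omitted, so the service default applies.
after = build_csv_load_config(*args)
```

The only difference between the two bodies is whether the nullMarker key is present, which is the change that restored the previous behavior for us.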

Our ingestion systems were not updated around the time of the change, yet the same ingestion runtime changed behavior, so we strongly suspect the change is on Google's side. There was no mention of it in the release notes.

Do you know if there was an internal change that caused this breaking behavior? Thanks.