BigQuery Subscription with proto3: NULL values wrongly written to BigQuery table

Hi all,

It appears that the BigQuery subscription writes NULL to the BigQuery table for fields that carry proto3 default values in the Pub/Sub message, even though those values are explicitly set.

Here is my setup:

Pub/Sub Topic Schema:

 

    syntax = "proto3";
    message MyMessage {
      int64 firstInt = 1;
      int64 secondInt = 2;
      string firstString = 3;
      string secondString = 4;
    }
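
As a side note on the schema above: proto3 also supports explicit field presence via the optional keyword (protoc 3.15+). Whether Pub/Sub schema validation and the BigQuery subscription honor it in this scenario is an assumption I have not verified, but it is the proto3 mechanism for telling "explicitly set to the default" apart from "unset":

    syntax = "proto3";
    message MyMessage {
      int64 firstInt = 1;
      optional int64 secondInt = 2;     // explicit presence: a set 0 is serialized
      string firstString = 3;
      optional string secondString = 4; // explicit presence: a set "" is serialized
    }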

 

BigQuery Table Schema:

 

[
  {
    "name": "firstInt",
    "type": "INTEGER"
  },
  {
    "name": "secondInt",
    "type": "INTEGER"
  },
  {
    "name": "firstString",
    "type": "STRING"
  },
  {
    "name": "secondString",
    "type": "STRING"
  }
]

 

The Node.js code to publish the proto3 message to the topic:

 

const protobuf = require('protobufjs');
const { PubSub } = require('@google-cloud/pubsub');

const topicName = 'test1'; // TODO DEVELOPER: Change this to your topic!

const pubSubClient = new PubSub();
const topic = pubSubClient.topic(topicName);

const run = async () => {
    const jsonMessage = { "firstInt": 223, "secondInt": 0, "firstString": "hello", "secondString": "" };
    const root = await protobuf.load('./my.proto');
    const myMessage = root.lookupType('MyMessage');

    const protoMessage = myMessage.create(jsonMessage);
    console.log('MyMessage:', JSON.stringify(protoMessage, null, 2));
/*
MyMessage:  {
  "firstInt": "223",
  "secondInt": "0",
  "firstString": "hello",
  "secondString": ""
}
*/
    // Publishing a JSON string assumes the topic schema's message encoding
    // is JSON (not BINARY).
    const dataBuffer = Buffer.from(JSON.stringify(protoMessage.toJSON()));
    const ret = await topic.publishMessage({ data: dataBuffer }); // resolves to the message ID
    console.log(`ret: ${ret}`);
};

run().catch(console.error);
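
Since the message is published as a JSON payload, one way to narrow the problem down is to confirm that the payload itself still carries the explicit defaults. A standalone sketch with plain objects (whether protobufjs' toJSON() also keeps default-valued fields is worth verifying separately by logging the real dataBuffer):

```javascript
// Standalone sanity check (plain JS, no protobufjs or GCP needed):
// JSON serialization itself preserves an explicitly-set 0 and "".
const message = { firstInt: 223, secondInt: 0, firstString: 'hello', secondString: '' };
const payload = JSON.stringify(message);
console.log(payload); // both "secondInt":0 and "secondString":"" are present
```

If the real dataBuffer logged just before publishing also contains "secondInt":0 and "secondString":"", then the NULLs must be introduced on the subscription side rather than by the publisher.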

 

The above code runs without problems, and I can see new entries in the BigQuery table:

[Screenshot: BigQuery table preview showing the new rows, with NULL in the secondInt and secondString columns]

However, the values for "secondInt" and "secondString" are NULL in BigQuery, even though I explicitly set them to 0 and "". Since those are the proto3 default values, I suspect some conversion logic on the GCP side along the lines of "if the value is the default value, write NULL into the BigQuery table".
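
The suspected conversion would look something like this (purely an illustrative guess at the observed behavior, not actual Google code):

```javascript
// Illustrative guess at the suspected GCP-side logic, NOT actual Google code.
// Proto3 defaults: 0 for int64, "" for string.
function suspectedWriteValue(value, proto3Default) {
  // "if the value is the default value, write NULL into the BigQuery table"
  return value === proto3Default ? null : value;
}

console.log(suspectedWriteValue(223, 0)); // written as 223
console.log(suspectedWriteValue(0, 0));   // written as NULL
console.log(suspectedWriteValue('', '')); // written as NULL
```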

This behavior might be related to the thread "BigQuery Subscription: Incompatible Schema using proto3 and BigQuery Subscriber", although that issue is slightly different.

I believe proto3 default values are not transferred over the wire (I think I read that somewhere), which is fine in itself, but then the default value should at least be written into BigQuery instead of NULL. 0 and the empty string are perfectly valid values and shouldn't be replaced with NULL, especially since proto3 itself has no notion of NULL.
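
The "defaults are not on the wire" behavior can be seen in miniature with a hand-rolled encoder (illustrative only; it handles just small non-negative varint fields and is nothing like a full proto3 codec). Note this applies to the binary encoding; with JSON encoding the analogue is a serializer omitting default-valued fields:

```javascript
// Base-128 varint for small non-negative integers (not a full int64 codec).
function encodeVarint(value) {
  const bytes = [];
  let v = value;
  do {
    let b = v & 0x7f;          // low 7 bits
    v = Math.floor(v / 128);
    if (v !== 0) b |= 0x80;    // continuation bit
    bytes.push(b);
  } while (v !== 0);
  return bytes;
}

// Proto3 implicit presence: a scalar at its default (0) is simply not
// serialized, so the receiver cannot tell "explicitly 0" from "never set".
function encodeInt64Field(fieldNumber, value) {
  if (value === 0) return [];               // default value: omitted entirely
  const tag = (fieldNumber << 3) | 0;       // wire type 0 = varint
  return [...encodeVarint(tag), ...encodeVarint(value)];
}

console.log(encodeInt64Field(1, 223)); // firstInt = 223 -> bytes on the wire
console.log(encodeInt64Field(2, 0));   // secondInt = 0  -> nothing on the wire
```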

 

Is Google aware of this issue, and is there any estimate for when this bug will be fixed?

Thanks in advance.


Update:
The Google Product team has acknowledged this as a bug and is working on a fix.

Any updates on this?  I submitted it as an issue last year, and also haven't heard anything.  It's a shame that Google will not fully support Google products.  It is also a shame that there is no follow up or comment on this topic.

https://issuetracker.google.com/issues/294033723