
Kafka to BigQuery Dataflow flex template doesn't migrate the key

Hi

I'm trying to migrate Kafka JSON messages to BigQuery, including the Kafka `key` value. I'm following the docs here: https://cloud.google.com/dataflow/docs/guides/templates/provided/kafka-to-bigquery . The `key` is not part of the JSON message itself; it's metadata that Kafka attaches to every message (alongside partition, offset, timestamp, etc.). The messages themselves are migrated properly (the `msg` column contains the expected data), but the `_key` column is null for all rows. Any ideas what could be wrong? I run the flex template like this:

```
gcloud dataflow flex-template run kafka-to-bq \
--template-file-gcs-location gs://dataflow-templates/latest/flex/Kafka_to_BigQuery_Flex \
--additional-experiments=use_runner_v2 \
--region europe-west1 \
--service-account-email=<SA_MAIL> \
--network=<NETWORK> \
--subnetwork=<SUBNETWORK> \
--additional-experiments=use_network_tags=<NET_TAG> \
--additional-experiments=use_network_tags_for_flex_templates=<NET_TAG> \
--parameters '^&^readBootstrapServerAndTopic=<KAFKA_SERVERS>;<TOPIC>' \
--parameters outputTableSpec=<PROJECT.DATASET.TABLE>,\
outputDeadletterTable=<PROJECT.DATASET2.TABLE2>,\
writeDisposition=WRITE_APPEND,\
writeMode=SINGLE_TABLE_NAME,\
useBigQueryDLQ=false,\
messageFormat=JSON,\
kafkaReadAuthenticationMode=NONE,\
javascriptTextTransformGcsPath=gs://<BUCKET>/js/udf.js,\
javascriptTextTransformFunctionName=transformRow,\
persistKafkaKey=true
```
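For what it's worth, one way to double-check that `persistKafkaKey=true` actually reached the job is to inspect the job description (a sketch; `<JOB_ID>` is a placeholder, and the `--full` flag is assumed to return the full job view including the pipeline options):

```
# Sketch: confirm persistKafkaKey=true was applied to the running job.
# <JOB_ID> is the ID shown when the flex template was launched.
gcloud dataflow jobs describe <JOB_ID> \
  --region europe-west1 \
  --full \
  --format json | grep -i persistKafkaKey
```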


My UDF function looks like this (simplified):

```
function transformRow(inStr) {
  return JSON.stringify({'msg': inStr});
}
```

My BigQuery table:

```
CREATE OR REPLACE TABLE `<PROJECT.DATASET.TABLE>`
(
  _key BYTES,
  msg STRING
);
```
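For reference, this is roughly how I check the column (a minimal sketch; `SAFE_CONVERT_BYTES_TO_STRING` just renders the BYTES key as text, assuming the producer wrote UTF-8 keys):

```
# Sketch: inspect the _key column; keys written by the template are BYTES.
bq query --use_legacy_sql=false \
'SELECT _key, SAFE_CONVERT_BYTES_TO_STRING(_key) AS key_str, msg
FROM `<PROJECT.DATASET.TABLE>`
LIMIT 10'
```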

What I tried so far:
- Removing the UDF and creating the BigQuery table with columns for all of the JSON fields. Result: all of those columns are populated properly (with their respective JSON fields), but the `_key` column is still empty.

Thank you for any tips/help.


Hi @Kossak,

Welcome to the Google Cloud Community!

It seems the Kafka key isn't making it into your BigQuery table, leaving the `_key` column empty.

Here are a few things you may want to check that might help resolve your use case:

  • Verify Kafka Key Extraction: Ensure the Kafka key is being correctly extracted by the Dataflow template, and confirm that your producers actually set a key on each message; messages produced without a key will naturally end up with a null `_key` (see the consumer sketch after this list).
  • Modify UDF Function: Your current UDF does not include the Kafka key. Update the `transformRow` function to include the key value, and make sure the key is actually passed to `transformRow`.
  • Check Dataflow Parameters: Verify the parameters you're passing to the Dataflow template. Ensure `persistKafkaKey` is set to `true`, and confirm all other parameters are correctly configured.
  • Inspect Logs and Errors: Review the Dataflow job logs for any errors or warnings that might explain why the key isn't being populated.
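
For the first point, a quick way to confirm that the topic's messages actually carry keys is to read a few of them with the stock console consumer (a sketch; `kafka-console-consumer.sh` ships with the Kafka distribution, and `<KAFKA_SERVERS>`/`<TOPIC>` stand for the same placeholders as in your command). If the keys print as `null`, the producer isn't setting them, and an empty `_key` column in BigQuery would be expected:

```
# Sketch: print key and value for a few messages from the topic.
kafka-console-consumer.sh \
  --bootstrap-server <KAFKA_SERVERS> \
  --topic <TOPIC> \
  --from-beginning \
  --max-messages 5 \
  --property print.key=true \
  --property key.separator=' | '
```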

You may also refer to the documentation on writing data from Kafka to BigQuery with Dataflow, which might help you understand the pipeline in more detail.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.