I am writing a pipeline to migrate data from GCS to Bigtable. The data is in JSON format. The pipeline works fine, but the number of records written by the Dataflow job and the count I get from BigQuery (using Bigtable as an external table) don't match.
I have set `"onlyReadLatest": false` to get all records when I read from BigQuery.
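As a sketch, the external table definition looks roughly like this (the column family name `cf` and the project/instance/table IDs are placeholders, not my actual values):

```json
{
  "sourceFormat": "BIGTABLE",
  "sourceUris": [
    "https://googleapis.com/bigtable/projects/PROJECT_ID/instances/INSTANCE_ID/tables/TABLE_ID"
  ],
  "bigtableOptions": {
    "readRowkeyAsString": true,
    "columnFamilies": [
      {
        "familyId": "cf",
        "onlyReadLatest": false
      }
    ]
  }
}
```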
```java
CloudBigtableTableConfiguration bigtableTableConfig =
    new CloudBigtableTableConfiguration.Builder()
        .withProjectId(options.getBigtableProjectId())
        .withInstanceId(options.getBigtableInstanceId())
        .withTableId(options.getBigtableTableId())
        .build();

PDone tableRows = btRow.get(successTag)
    .apply("WriteToBT", CloudBigtableIO.writeToTable(bigtableTableConfig));
```
I recommend checking the logs to rule out a Dataflow job issue. You can do this from the Dataflow page in the Cloud Console, or by running the following command in Cloud Shell:

```
gcloud logging read "resource.type=dataflow_step AND resource.labels.job_id=JOB_ID" --project=PROJECT_ID
```