
BigQuery External Table Creation Error with Hudi using BigQuerySyncTool

Hello everyone, I'm new to Hudi, and we're running it on Google Cloud Dataproc. I'm trying to create an external table in BigQuery with BigQuerySyncTool, but each of the commands below fails with an error. Could someone please explain the correct way to pass the parameters?

Run1 Command:

gcloud dataproc jobs submit spark --cluster=test-streaming-2 --region=us-central1 --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool --jars=file://usr/lib/hudi/tools/bq-sync-tool/hudi-gcp-bundle-0.15.0.jar --properties=hoodie.gcp.bigquery.sync.project_id=dev-test-apps,dataset-name=streaming,dataset-location=us,source-uri=gs://streaming/dataproc/trips/partitionpath=*,source-uri-prefix=gs://streaming/dataproc/trips/,base-path=gs://streaming/dataproc/trips/,partitioned-by=partitionpath,use-bq-manifest-file=true 

error: Exception in thread "main" com.beust.jcommander.ParameterException: The following options are required: [--dataset-location], [--source-uri], [--project-id], [--dataset-name]

Run2 Command:

gcloud dataproc jobs submit spark --cluster=test-streaming-2 --region=us-central1 --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool --jars=file://usr/lib/hudi/tools/bq-sync-tool/hudi-gcp-bundle-0.15.0.jar -- project_id=dev-test-apps 

error: Exception in thread "main" com.beust.jcommander.ParameterException: Was passed main parameter 'project_id=dev-test-apps' but no main parameter was defined in your arg class

Run3 Command:

gcloud dataproc jobs submit spark --cluster=test-streaming-2 --region=us-central1 --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool --jars=file://usr/lib/hudi/tools/bq-sync-tool/hudi-gcp-bundle-0.15.0.jar --project-id dev-test-apps 

error: ERROR: (gcloud.dataproc.jobs.submit.spark) unrecognized arguments: --project-id (did you mean '--project'?)
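One way the three runs might fit together is sketched below as a bash script. It rests on a few assumptions. The "--properties" flag of "gcloud dataproc jobs submit spark" sets Spark configuration for the job. It does not pass arguments to the main class, which would explain why Run1 reports every required option as missing. Everything after a bare "--" is handed to the main class, so Run2 got past gcloud. However, the tool parses its arguments with JCommander and expects "--option value" pairs, so "project_id=dev-test-apps" was treated as a positional ("main") parameter. Run3 put "--project-id" before any "--" separator, so gcloud tried to parse it as one of its own flags. The script also writes the jar URI with three slashes (file:///usr/lib/...), because with only two slashes "usr" is read as a host name. The four required option names come from the Run1 error. The remaining names are the ones used in Run1 and are assumed to work in the same form. "--use-bq-manifest-file" is assumed to be a flag that takes no value. SYNC_ARGS, GCLOUD_CMD and DRY_RUN are illustrative names, not part of gcloud or Hudi.

```shell
#!/usr/bin/env bash
# Sketch only: the BigQuerySyncTool option names below are assumptions (see above).
set -euo pipefail

# Arguments for BigQuerySyncTool itself, parsed by JCommander as "--option value".
SYNC_ARGS=(
  --project-id dev-test-apps
  --dataset-name streaming
  --dataset-location us
  --source-uri 'gs://streaming/dataproc/trips/partitionpath=*'  # quoted so the local shell never expands "*"
  --source-uri-prefix gs://streaming/dataproc/trips/
  --base-path gs://streaming/dataproc/trips/
  --partitioned-by partitionpath
  --use-bq-manifest-file  # assumed: on/off flag that takes no value
)

# gcloud's own flags come first; the bare "--" marks where the job arguments begin.
GCLOUD_CMD=(
  gcloud dataproc jobs submit spark
  --cluster=test-streaming-2
  --region=us-central1
  --class=org.apache.hudi.gcp.bigquery.BigQuerySyncTool
  --jars=file:///usr/lib/hudi/tools/bq-sync-tool/hudi-gcp-bundle-0.15.0.jar  # three slashes: local path, no host
  --
  "${SYNC_ARGS[@]}"
)

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  # Print the fully quoted command instead of running it.
  printf '%q ' "${GCLOUD_CMD[@]}"
  printf '\n'
else
  "${GCLOUD_CMD[@]}"
fi
```

By default the script only prints the command, so you can check it first. Set DRY_RUN=0 in the environment to actually submit the job. If the tool still rejects an option, the names in SYNC_ARGS are the first thing to compare against the options your hudi-gcp-bundle version accepts.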
