BigQuery External Table Creation Error with Hudi using BigQuerySyncTool

 

Hello everyone, I'm new to Hudi, and we're using Google Cloud Dataproc to enable it. I'm trying to create an external table in BigQuery using the BigQuerySyncTool, but I keep encountering an error with the following command. Could someone please help me with the correct way to pass the parameters?

Run1 Command:

 

gcloud dataproc jobs submit spark --cluster=test-streaming-2 --region=us-central1 --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool --jars=file://usr/lib/hudi/tools/bq-sync-tool/hudi-gcp-bundle-0.15.0.jar --properties=hoodie.gcp.bigquery.sync.project_id=dev-test-apps,dataset-name=streaming,dataset-location=us,source-uri=gs://streaming/dataproc/trips/partitionpath=*,source-uri-prefix=gs://streaming/dataproc/trips/,base-path=gs://streaming/dataproc/trips/,partitioned-by=partitionpath,use-bq-manifest-file=true 

error: Exception in thread "main" com.beust.jcommander.ParameterException: The following options are required: [--dataset-location], [--source-uri], [--project-id], [--dataset-name]

 



Run2 Command:

 

gcloud dataproc jobs submit spark --cluster=test-streaming-2 --region=us-central1 --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool --jars=file://usr/lib/hudi/tools/bq-sync-tool/hudi-gcp-bundle-0.15.0.jar -- project_id=dev-test-apps 

error: Exception in thread "main" com.beust.jcommander.ParameterException: Was passed main parameter 'project_id=dev-test-apps' but no main parameter was defined in your arg class

 



Run3 Command:

 

gcloud dataproc jobs submit spark --cluster=test-streaming-2 --region=us-central1 --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool --jars=file://usr/lib/hudi/tools/bq-sync-tool/hudi-gcp-bundle-0.15.0.jar --project-id dev-test-apps 

error: ERROR: (gcloud.dataproc.jobs.submit.spark) unrecognized arguments: --project-id (did you mean '--project'?)

 

 


Hi @rchava3699,

Welcome to Google Cloud Community!

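The errors in runs 1 to 3 all indicate that the required arguments never reached BigQuerySyncTool as command-line options. In run 1, the --properties flag is only used to configure Spark settings, so the tool received none of its required options. In run 2, project_id=dev-test-apps was passed after the separator but without a leading option name such as --project-id, so the tool's argument parser treated it as an unexpected main parameter. In run 3, --project-id was placed before any separator, so gcloud tried to interpret it as one of its own flags.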

Here's what you can do: put a bare separator (--) after the gcloud flags, then pass each BigQuerySyncTool option after it as its own option name and value (for example, --project-id dev-test-apps). Additionally, double-check the option spellings and make sure there is no extra whitespace. For the full list of options and their expected format, refer to the Hudi documentation for BigQuerySyncTool.
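As a rough sketch of that advice, the script below assembles the command with the tool's options placed after the bare `--` separator. The cluster, region, and `gs://` values are copied from the question. Several things are assumptions rather than confirmed facts: the `--table` option and its value `trips` are hypothetical, the jar URI is written with three slashes (`file:///`) as is usual for an absolute local path, and `--use-bq-manifest-file` is assumed to be a bare switch that takes no value. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: prints the command by default; run with DRY_RUN=0 to submit it.

# Options for BigQuerySyncTool itself (these go AFTER the bare "--").
# "--table trips" is a hypothetical placeholder, not taken from the question.
set -- \
  --project-id dev-test-apps \
  --dataset-name streaming \
  --dataset-location us \
  --table trips \
  --source-uri "gs://streaming/dataproc/trips/partitionpath=*" \
  --source-uri-prefix gs://streaming/dataproc/trips/ \
  --base-path gs://streaming/dataproc/trips/ \
  --partitioned-by partitionpath \
  --use-bq-manifest-file

# Prepend the gcloud flags, then the bare "--" that ends gcloud's own parsing.
# Assumption: file:/// (three slashes) for the absolute path on the cluster.
set -- gcloud dataproc jobs submit spark \
  --cluster=test-streaming-2 \
  --region=us-central1 \
  --class org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
  --jars=file:///usr/lib/hudi/tools/bq-sync-tool/hudi-gcp-bundle-0.15.0.jar \
  -- "$@"

# Keep a printable copy of the full command line.
SYNC_CMD="$*"

if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$SYNC_CMD"
else
  "$@"
fi
```

Everything before the bare `--` is parsed by gcloud, and everything after it is handed unchanged to the main class, which is why `--project-id` is no longer rejected as an unrecognized gcloud argument.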

I hope the above information is helpful.