I am getting the below error while accessing a Google Cloud Storage bucket for the first time via a Cloudera CDH 6.3.3 Hadoop cluster. I am running the command on the edge node, where the Google Cloud SDK is installed. As of now, Google Cloud Storage is reachable only through an HTTP proxy.
The Cloudera CDH 6.3.3 cluster is on-prem.
Below is the command that I run:
hadoop fs -ls gs://distcppoc-2021-08-09/
Error is:
ls: Error accessing: bucket: distcppoc-2021-08-09
The last few lines of output when the Hadoop command is run:
21/08/10 21:07:42 DEBUG fs.FileSystem: looking for configuration option fs.gs.impl
21/08/10 21:07:42 DEBUG fs.FileSystem: FS for gs is class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
ls: Error accessing: bucket: distcppoc-2021-08-09
Below are the configurations added to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml under Cloudera Manager --> HDFS --> Configuration (see the XML sketch after this list):
fs.gs.working.dir - /
fs.gs.path.encoding - uri-path
fs.gs.auth.service.account.email - serviceaccount@dummyemail.iam.gserviceaccount.com
fs.gs.auth.service.account.private.key.id - 52d6ad0c6ecb7f6da9
fs.gs.auth.service.account.private.key - MIIEvgIBADANBgkq<FULL PRIVATE KEY>MMASBjSOTA1j+jL
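For reference, here is how I understand those safety valve entries render in core-site.xml. The fs.gs.proxy.address property at the end is not something I have set; I am including it as an assumption, since the GCS connector supports an HTTP proxy setting and our access to Google Storage is proxy-only (the host:port value shown is a hypothetical placeholder):

<property>
  <name>fs.gs.working.dir</name>
  <value>/</value>
</property>
<property>
  <name>fs.gs.path.encoding</name>
  <value>uri-path</value>
</property>
<property>
  <name>fs.gs.auth.service.account.email</name>
  <value>serviceaccount@dummyemail.iam.gserviceaccount.com</value>
</property>
<property>
  <name>fs.gs.auth.service.account.private.key.id</name>
  <value>52d6ad0c6ecb7f6da9</value>
</property>
<property>
  <name>fs.gs.auth.service.account.private.key</name>
  <value>MIIEvgIBADANBgkq<FULL PRIVATE KEY>MMASBjSOTA1j+jL</value>
</property>
<!-- Assumption, not currently set: the connector can route through an HTTP
     proxy via fs.gs.proxy.address (host:port). proxy.example.com:3128 is a
     hypothetical placeholder. -->
<property>
  <name>fs.gs.proxy.address</name>
  <value>proxy.example.com:3128</value>
</property>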
Restarted HDFS Services.
The gsutil command works fine when it is run from the on-prem cluster, presumably because it picks up the HTTP proxy from the environment/boto configuration.
Command: gsutil ls gs://distcppoc-2021-08-09
Output: gs://distcppoc-2021-08-09/sftp.png
The GCS connector is installed on all the Cloudera cluster Hadoop nodes at the location below:
Location: /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars
Jar file: gcs-connector-hadoop3-1.9.10-cdh6.3.3-shaded.jar
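A quick sanity check to confirm the jar is actually visible to the Hadoop client (a sketch; hadoop classpath --glob expands the wildcard classpath entries):

# List every classpath entry and look for the connector jar
hadoop classpath --glob | tr ':' '\n' | grep -i gcs-connector

# Confirm the shaded jar is present in the parcel's jars directory
ls /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars | grep gcs-connector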
Can I get some help here?
Hello,
You can try gathering more details on the error:
Add the "--loglevel DEBUG" option to the command to determine what is causing the bucket listing to fail with the hadoop command while it works well with gsutil.
Please see the following troubleshooting guide: https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/INSTALL.md#troubleshooting-...
The Hadoop verbosity flag (-v, --verbose) might also work; see the commands reference:
https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
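If --loglevel is not available in your Hadoop version, the same debug output can be enabled through the root logger environment variable (a sketch, again using the bucket from the question):

HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls gs://distcppoc-2021-08-09/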