Error accessing a Google Cloud Storage bucket via hadoop fs -ls on a Cloudera Hadoop CDH 6 cluster

I am getting the below error while accessing a Google Cloud Storage bucket for the first time from a Cloudera CDH 6.3.3 Hadoop cluster, which is on-premises. I am running the command on the edge node, where the Google Cloud SDK is installed. Google Cloud Storage is currently reachable only via an HTTP proxy.

Below is the command that I run:

hadoop fs -ls gs://distcppoc-2021-08-09/

Error is:

ls: Error accessing: bucket: distcppoc-2021-08-09

Last few lines of output when the Hadoop command is run:

21/08/10 21:07:42 DEBUG fs.FileSystem: looking for configuration option fs.gs.impl
21/08/10 21:07:42 DEBUG fs.FileSystem: FS for gs is class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
ls: Error accessing: bucket: distcppoc-2021-08-09

Below are the configurations added to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml in Cloudera Manager --> HDFS --> Configuration (an equivalent XML sketch follows the list):

fs.gs.working.dir - /
fs.gs.path.encoding - uri-path
fs.gs.auth.service.account.email - serviceaccount@dummyemail.iam.gserviceaccount.com
fs.gs.auth.service.account.private.key.id - 52d6ad0c6ecb7f6da9
fs.gs.auth.service.account.private.key - MIIEvgIBADANBgkq<FULL PRIVATE KEY>MMASBjSOTA1j+jL
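For reference, a minimal sketch of the equivalent XML snippet as pasted into the safety valve (same property names and redacted values as above; the commented-out fs.gs.proxy.address entry is an assumption, mentioned only because Cloud Storage is reachable here solely through an HTTP proxy):

<property>
  <name>fs.gs.working.dir</name>
  <value>/</value>
</property>
<property>
  <name>fs.gs.path.encoding</name>
  <value>uri-path</value>
</property>
<property>
  <name>fs.gs.auth.service.account.email</name>
  <value>serviceaccount@dummyemail.iam.gserviceaccount.com</value>
</property>
<property>
  <name>fs.gs.auth.service.account.private.key.id</name>
  <value>52d6ad0c6ecb7f6da9</value>
</property>
<property>
  <name>fs.gs.auth.service.account.private.key</name>
  <value>MIIEvgIBADANBgkq<FULL PRIVATE KEY>MMASBjSOTA1j+jL</value>
</property>
<!-- Assumption, not from the original post: if the connector itself must go
     through the HTTP proxy, fs.gs.proxy.address (host:port) may also be required. -->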

I then restarted the HDFS services.
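As a quick check (a sketch, not part of the original post; /etc/hadoop/conf is the usual client configuration path on CDH), you can verify on the edge node that the safety-valve entries actually reached the deployed client core-site.xml:

# Look for the fs.gs.* properties in the deployed client configuration
grep -A1 'fs.gs' /etc/hadoop/conf/core-site.xml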

The gsutil command works fine when run from the on-prem cluster:

Command: gsutil ls gs://distcppoc-2021-08-09
Output: gs://distcppoc-2021-08-09/sftp.png

The GCS connector is installed on all the Cloudera cluster Hadoop nodes at the below location:

Location: /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars
Jar file: gcs-connector-hadoop3-1.9.10-cdh6.3.3-shaded.jar
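As a quick sanity check (a sketch, not part of the original post), you can confirm on the edge node that this jar is actually visible to the Hadoop client:

# Show the client classpath and look for the GCS connector jar
hadoop classpath | tr ':' '\n' | grep -i gcs-connector

# Confirm the shaded jar is present in the parcel's jars directory
ls -l /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/gcs-connector-hadoop3-1.9.10-cdh6.3.3-shaded.jar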

Can I get some help here?

Hello,

You can try to gather more details on the error:

Add the "-loglevel debug" parameter in the command to determine what's causing the bucket listing to fail with hadoop command while it works well with gsutil.
Please see the following troubleshooting guide https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/INSTALL.md#troubleshooting-...
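For example (a sketch using the same bucket as above; HADOOP_ROOT_LOGGER is the standard way to force DEBUG output to the console for a single invocation):

# Raise the client log level via the hadoop generic option
hadoop --loglevel DEBUG fs -ls gs://distcppoc-2021-08-09/

# Alternatively, force full DEBUG logging to the console for one run
HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls gs://distcppoc-2021-08-09/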


The Hadoop verbosity flag (-v, --verbose) might also work:
https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html