Hi,
I am trying to modify an Iceberg table created in BigQuery using Apache Spark running on a Dataproc cluster.
Steps I have taken so far:
Issue:
I am trying to install the Iceberg dependencies on the Dataproc cluster, but I get a module-not-found error when running:
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.0
steps taken:
Where I want to go:
Primarily, I want to be able to run the code below, as given in the official docs:
Let me know if you need any other details. Thanks.
Hi @cassandramae,
I was able to solve this by creating a public NAT gateway and then spinning up a Dataproc cluster on the network that uses that gateway.
The issue was that the Dataproc cluster could not reach the internet to fetch the dependencies requested via --packages, and setting up the NAT fixed that.
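For anyone hitting the same thing, here is a rough sketch of what the setup might look like with gcloud. Everything named here is a placeholder I made up for illustration (region, network, router, NAT, and cluster names), and I am assuming the cluster uses internal IPs only (--no-address), which is why it needs the NAT for outbound access. The script only prints the commands unless you set DRY_RUN=0, so you can review them first.

```shell
#!/usr/bin/env bash
# Sketch only: all names below are hypothetical placeholders, not values from this thread.
REGION="${REGION:-us-central1}"
NETWORK="${NETWORK:-default}"
ROUTER="${ROUTER:-dataproc-nat-router}"
NAT="${NAT:-dataproc-nat}"
CLUSTER="${CLUSTER:-iceberg-cluster}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

main() {
  # 1. Cloud Router in the same network and region as the cluster.
  run gcloud compute routers create "$ROUTER" \
    --network="$NETWORK" --region="$REGION"

  # 2. Public NAT gateway on that router, covering all subnet ranges.
  run gcloud compute routers nats create "$NAT" \
    --router="$ROUTER" --region="$REGION" \
    --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges

  # 3. Dataproc cluster on that network (assumption: internal IPs only).
  run gcloud dataproc clusters create "$CLUSTER" \
    --region="$REGION" --network="$NETWORK" --no-address
}

main
```

Once the cluster is up behind the NAT, the spark-sql --packages command from the question should be able to download the Iceberg runtime jar.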
Thanks for reaching out and I have made a note of your solution as well.