
Need help with hybrid query support between BigQuery and Spark

I have a small use case like the one below.
I have a CSV file in GCS storage, and there is one huge table in BigQuery called emp_target.
I read the CSV file with Spark like this: df = spark.read.format("csv").option()...load()
df.createOrReplaceTempView("empTempView")

Now I need to join this view (empTempView) with emp_target, like:
query = "select e.empid, e.empname, e.salary, e.department, t.managerID from empTempView e inner join dataset.emp_target t on e.empid = t.empid"
I tried to execute the above query in two ways:
Option 1:
res_df = spark.sql(query) --> did not work; it gives an error saying empTempView does not exist in BigQuery.
Option 2:
res_df = spark.read.format("bigquery").option(..).option("dbtable", query)...
Option 2 gives the same error.
Note: I do not have the option to write the temp view into BigQuery and do the join there, and I cannot load emp_target into a Spark DataFrame since it is huge.
Please help me join these two datasets in Spark and process them on Dataproc.

Thank you
veera


Hi @Veeraravi_DE - Have you explored the option of using an external table (a BigQuery table pointing to the CSV file) and then doing the join operation?