
Vertex AI custom job training pipeline unable to query BigQuery

VAC

I have a Python script (some of the values below are changed). It works in a Vertex AI Workbench instance, and it also works as a Docker container in the same Workbench. I am now trying to run it as a Vertex AI custom training job, and that is where it hangs for no obvious reason.

My custom job:

 

from google.cloud import aiplatform

aiplatform.init(location="")

job = aiplatform.CustomContainerTrainingJob(
    display_name="name",
    container_uri="location-docker.pkg.dev/project/registry/docker_container",
    staging_bucket="myBucket",
)

job.run(
    replica_count=1,
    machine_type="e2-standard-4",
    enable_web_access=True,  # allows an interactive shell on the worker
    timeout=900,  # maximum job running time, in seconds
    args=[],
)

 

I SSHed into the worker and ran the script from the CLI to get more detailed logs. I get CUDA warnings (expected, since I have no GPUs) and then nothing; Logs Explorer shows the same. If I interrupt the process, I get this traceback:

 

^CTraceback (most recent call last):
  File "/app/main.py", line 175, in <module>
    main()
  File "/app/main.py", line 126, in main
    BQApi.log_start()
  File "/app/bqApi.py", line 78, in log_start
    self.Client.query(sqlLogStart, project=self.project, job_config= jc)
  File "/app/venv/lib/python3.12/site-packages/google/cloud/bigquery/client.py", line 3502, in query
    return _job_helpers.query_jobs_insert(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/google/cloud/bigquery/_job_helpers.py", line 159, in query_jobs_insert
    future = do_query()
             ^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/google/cloud/bigquery/_job_helpers.py", line 136, in do_query
    query_job._begin(retry=retry, timeout=timeout)
...
  File "/app/venv/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 773, in urlopen
    self._prepare_proxy(conn)
  File "/app/venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 1042, in _prepare_proxy
    conn.connect()
  File "/app/venv/lib/python3.12/site-packages/urllib3/connection.py", line 704, in connect
    self.sock = sock = self._new_conn()
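
One detail in this traceback may be worth checking: urllib3 only reaches `_prepare_proxy` when the connection is being tunnelled through a proxy, so the process appears to be picking up a proxy setting from somewhere. A minimal sketch for listing proxy-related environment variables on the worker, assuming the setting comes from the environment (the helper name `find_proxy_settings` is made up for illustration and is not part of my script):

```python
import os
import urllib.request

# Variable names that requests/urllib3 commonly honour (compared case-insensitively).
PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy", "all_proxy")


def find_proxy_settings(environ=None):
    """Return the proxy-related entries of an environment mapping."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k.lower() in PROXY_VARS}


if __name__ == "__main__":
    # Raw environment variables as seen by this process.
    for name, value in find_proxy_settings().items():
        print(f"{name}={value}")
    # Proxies as resolved by the Python standard library.
    print("urllib sees:", urllib.request.getproxies())
```

Running this both in the Workbench container and on the custom job worker would show whether the two environments differ in their proxy configuration.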

 

In bqApi I define self.Client like this:

 

from google.cloud import bigquery
from google import auth

# somewhere in __init__
self.project = "project"
credentials, project = auth.default()
self.Client = bigquery.Client(project=self.project, credentials=credentials, location="location")

 

The error seems to be happening here:

 

jc = next(self.create_bq_job_config())
self.Client.query(sqlLogStart, project=self.project, job_config= jc)

 

create_bq_job_config() does this:

 

while True:
    yield bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_time", "DATETIME", self.startTime)
        ]
    )

 

 

I think the custom training job is unable to reach BigQuery for some reason. Any ideas what could be causing this and how I could fix it?
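
To test that suspicion separately from the BigQuery client, a plain TCP check with a short timeout could be run on the worker. This is only a sketch: `can_connect` is a hypothetical helper, and it tests a direct connection, which is not the same path the client takes if a proxy is configured.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and raises OSError subclasses
        # (timeout, refused connection, DNS failure) when it cannot connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("bigquery.googleapis.com", 443)` from the worker shell would indicate whether the default BigQuery API endpoint is reachable directly; pointing it at the proxy's host and port instead would show whether the proxy itself is reachable.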
