Request for troubleshooting assistance with AutoML pipeline

I am working through an intro-to-ML course for Product Managers offered online by Duke University. I believe I have correctly set up all the relevant permissions and APIs, and I have imported a smallish structured dataset that is fairly clean: only 4 features and one target column, all numeric, with no missing entries.
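
For context, I did the import through the console UI, but my understanding is that it is roughly equivalent to the following call in the google-cloud-aiplatform Python client (a sketch only; the region, display name, and GCS path are placeholders, not my real values):

from google.cloud import aiplatform

# Placeholders -- my real region, display name, and CSV path differ.
aiplatform.init(project="optimistic-keel-413702", location="us-central1")

dataset = aiplatform.TabularDataset.create(
    display_name="regression-course-data",
    gcs_source="gs://my-bucket/training_data.csv",
)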

When I try to run a simple regression analysis, the pipeline runs for about 4 minutes and returns the following error:

The DAG failed because some tasks failed. The failed tasks are: [exit-handler-1].; Job (project_id = optimistic-keel-413702, job_id = 2610322517956493312) is failed due to the above error.; Failed to handle the job: {project_number = 780294252185, job_id = 2610322517956493312}

Browsing the Logs Explorer, I see a large number of errors. The earliest one states:

tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

The last one states:

Error: "Dataflow pipeline failed. State: FAILED, Error:\nWorkflow failed. Causes: Error:\n Message: Exceeded limit 'QUOTA_FOR_INSTANCES' on resource 'dataflow-tabular-stats-and-example-04140836-3611-harness'. Limit: 24.0\n HTTP Code: 403"
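
If I am reading that last message correctly, the Dataflow workers that AutoML launches behind the scenes exceeded my project's Compute Engine instance quota for the region (limit 24), and the earlier libcudart message looks like a GPU-library warning rather than the actual failure. As a sanity check, would something like this sketch with the google-cloud-compute Python client show me the relevant quota? (The region is a guess on my part; the error does not name it.)

from google.cloud import compute_v1

# The region is a guess -- the quota error does not say which one was used.
client = compute_v1.RegionsClient()
region = client.get(project="optimistic-keel-413702", region="us-central1")

for quota in region.quotas:
    if quota.metric == "INSTANCES":
        print(f"{quota.metric}: {quota.usage} used of limit {quota.limit}")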

How should a non-technical user proceed to diagnose the root cause when the logs contain this many errors? Should I simply configure the job to fail on the first error and slowly work my way through them one at a time? I'm not sure which expectations weren't met during setup. Or would it be quicker to dust off some very old programming skills and learn enough Python to guide the process more directly through Google Colab?
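
For reference, my mental model of the Colab route is the sketch below: load the same CSV with pandas and fit a plain linear regression with scikit-learn. The column names are made up, since I haven't listed my actual schema, but the shape matches my data (4 numeric features, 1 numeric target, no missing values).

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical file and column names -- substitute the real ones.
df = pd.read_csv("training_data.csv")
X = df[["feature_1", "feature_2", "feature_3", "feature_4"]]
y = df["target"]

# Hold out 20% of rows to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))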
