I have created cloud run jobs that run at set frequency, even so often I get an failure with internal error after the job is running for a while or the job gets cancelled with message Cancelled by server. My container or code doesn't seems to be producing the error. I have python script running in the container calling an API and persisting data to BigQuery.
I have checked container CPU utilization and Memory utilization metrics and both seem to have sufficient room. Any pointers to what else could be causing this
According to this documentation, this issue might occur when the Cloud Run service agent does not exist, or when it does not have the Cloud Run Service Agent (roles/run.serviceAgent) role.
You can also take a look at this information to view any error and share it to have a more detailed description of your issue.