custom_model_training_cpus exceeding quota despite low usage?

jim4 · 02-24-2025 09:08 AM

I have a Python script that starts a custom Vertex AI job when a file is uploaded:

import datetime

from google.cloud import aiplatform

# TEXT_PROCESSOR_IMAGE, bucket_name, and file_name are set in parts of the script not shown here.

def processUpload(event, context):
    try:
        # Initialize Vertex AI with staging bucket
        print("Initializing Vertex AI...")
        aiplatform.init(
            project='MY-PROJECT',
            location='us-central1',
            staging_bucket='jazz-function-source-bucket'
        )

        # Create timestamp for versioning
        timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

        # Create the custom job with versioned image
        print("Creating custom job...")
        job = aiplatform.CustomJob(
            display_name=f"process-text-{timestamp}",
            worker_pool_specs=[{
                "machine_spec": {
                    "machine_type": "n1-standard-4",
                },
                "replica_count": 1,
                "container_spec": {
                    "image_uri": TEXT_PROCESSOR_IMAGE,
                    "args": [bucket_name, file_name]
                },
            }]
        )
        # (job launch step not shown)
    except Exception as e:
        # Produces the "Error launching job: ..." line seen in the logs
        print(f"Error launching job: {e}")

I'm getting an error in the logs:

process-radar-upload-v2  I0Wqs0wXjS0q  2025-02-24 16:43:27.310  Error launching job: 429 The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_cpus
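
The 429 message names the quota metric that the request would exceed. As a minimal sketch, assuming the message keeps the format shown in the log line above, the metric names can be pulled out of the text so they can be logged or matched against the Quotas page. `extract_quota_metrics` is a hypothetical helper, not part of the Vertex AI SDK.

```python
import re

# Hypothetical helper (not part of any Google library).
def extract_quota_metrics(message: str) -> list[str]:
    """Return the quota metric names listed in a 429 quota error message."""
    marker = "exceed quota limits:"
    _, found, tail = message.partition(marker)
    if not found:
        return []
    # Metric names look like "<service>.googleapis.com/<metric_name>".
    return re.findall(r"[\w.-]+\.googleapis\.com/\w+", tail)

msg = ("429 The following quota metrics exceed quota limits: "
       "aiplatform.googleapis.com/custom_model_training_cpus")
print(extract_quota_metrics(msg))
# ['aiplatform.googleapis.com/custom_model_training_cpus']
```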

When I go to "IAM & Admin" > "Quotas & System Limits" and sort by "Current usage percentage", nothing is over 70%.

If I filter for `aiplatform.googleapis.com/custom_model_training_cpus`, I see "Vertex AI API" entries for different regions. For each one, the current value is 1 and I'm not able to change it. Should this be sufficient?
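
One possible reading of these numbers: the quota check compares the vCPUs the job requests against the limit, not against current usage. An n1-standard-4 has 4 vCPUs, so a single replica would ask for 4 against a limit of 1, which would fail even at 0% usage. A rough sketch of that arithmetic follows. It assumes the value of 1 on the Quotas page is the regional limit, that the quota counts vCPUs, and that the vCPU count is the trailing number in the machine type name (true for n1-standard types, not for every machine family). `requested_vcpus` is a hypothetical helper.

```python
# Hypothetical helper: estimates vCPUs requested by a worker_pool_specs list.
def requested_vcpus(worker_pool_specs: list[dict]) -> int:
    total = 0
    for spec in worker_pool_specs:
        machine_type = spec["machine_spec"]["machine_type"]
        # Assumption: names like "n1-standard-4" end with the vCPU count.
        vcpus = int(machine_type.rsplit("-", 1)[1])
        total += vcpus * spec.get("replica_count", 1)
    return total

specs = [{"machine_spec": {"machine_type": "n1-standard-4"}, "replica_count": 1}]
quota_limit = 1  # value shown on the Quotas page, assumed to be the limit
needed = requested_vcpus(specs)
print(needed, needed <= quota_limit)  # 4 False
```

If that reading is right, the job would need either a quota increase request for the region or a machine type that fits within the limit.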
