
Issue with Quoted CSV Fields in Vertex AI Model Monitoring

Hi all,

I've set up Model Monitoring for my Vertex AI batch predictions, using a CSV file on GCS as the training dataset and a JSONL file on GCS as the input config.

However, I'm running into an issue where Vertex AI does not correctly handle quoted fields in the CSV dataset. Specifically, it's alerting on every string feature due to mismatched formatting:

  • In the CSV, feature values are quoted (e.g., "aud").
  • In the JSONL, feature values are unquoted (e.g., aud).

As a result, Vertex AI incorrectly detects feature drift for every string column, even though the values are the same except for the quotes.
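
To make the mismatch concrete, here's a tiny Python illustration (the feature name "currency" and the value are just placeholders). The drift alerts suggest the monitoring job is comparing the raw CSV text, quotes included, rather than the parsed field value:

```python
import csv, io, json

csv_line = '"aud"\n'                  # how the value appears in the training CSV
jsonl_line = '{"currency": "aud"}\n'  # how the value appears in the prediction input

parsed_csv_value = next(csv.reader(io.StringIO(csv_line)))[0]  # 'aud'   (quotes treated as CSV escaping)
literal_csv_value = csv_line.strip()                           # '"aud"' (quotes kept as characters)
jsonl_value = json.loads(jsonl_line)["currency"]               # 'aud'

print(parsed_csv_value == jsonl_value)   # True  -> no drift if the CSV quoting were interpreted
print(literal_csv_value == jsonl_value)  # False -> the comparison that seems to trigger the alerts
```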

I found that removing quotes from the CSV fixes the issue, but that’s not ideal since I still need proper escaping for certain characters.

Has anyone encountered this issue before? Is there a better way to handle it without modifying the CSV format?

Thanks in advance!

1 REPLY

Hi @Andrew8888,

Welcome to Google Cloud Community!

It seems like you are experiencing issues with Vertex AI Model Monitoring incorrectly detecting feature drift due to inconsistent handling of quoted fields in your CSV training dataset versus unquoted fields in your JSONL input. This mismatch causes Vertex AI to interpret the same data as different values, triggering false drift alerts.

Here are a few approaches that might help with your use case:

  • Data Transformation Functions: Consider pre-processing your serving requests on the fly, for example with TensorFlow Transform (tf.Transform), so string values are normalized to the same format as the training data before they reach your deployed model.
  • Preprocess the CSV during Batch Prediction: You could use a Vertex AI custom prediction routine, or add a preprocessing step to your batch prediction pipeline, to strip the literal quotes from the relevant string features before the model makes predictions (see the first sketch below).
  • Custom Model Monitoring Metrics: If changing the data format isn't feasible, consider computing custom metrics that compare the distributions of the normalized (unquoted) values (see the second sketch below).
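
To make the preprocessing idea concrete, here is a minimal sketch of a normalization step you could run, for example as an extra step ahead of the batch prediction job, so both sources agree on the raw string values. This is plain Python, not a Vertex AI API, and the file paths and helper names are just placeholders:

```python
import csv
import json

def strip_outer_quotes(value):
    # Drop a single pair of wrapping double quotes, if present.
    if isinstance(value, str) and len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value

def normalize_csv(src_path, dst_path):
    # Round-trip the training CSV: csv.reader interprets the quoting, and
    # QUOTE_MINIMAL writes simple values back without any quotes.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)
        for row in csv.reader(src):
            writer.writerow(row)

def normalize_jsonl(src_path, dst_path):
    # Apply the same rule to the batch prediction JSONL, in case any string
    # features carry literal quotes there as well.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            cleaned = {k: strip_outer_quotes(v) for k, v in record.items()}
            dst.write(json.dumps(cleaned) + "\n")
```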

For more detail, you can refer to the Google Cloud documentation on pre-processing TensorFlow pipelines, batch prediction components in Vertex AI, and creating custom metrics for monitoring.
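
To give a rough idea of the custom-metric approach, here is a small sketch that compares the categorical distributions yourself after normalizing the quoting. Again, this is plain Python rather than a Model Monitoring API, and the values are placeholders:

```python
from collections import Counter

def distribution(values):
    # Frequency of each category, after dropping any literal surrounding quotes.
    cleaned = [v.strip('"') for v in values]
    return {k: c / len(cleaned) for k, c in Counter(cleaned).items()}

def linf_distance(training_values, serving_values):
    # L-infinity distance between the two distributions, similar in spirit to
    # the statistic reported for categorical feature drift.
    p, q = distribution(training_values), distribution(serving_values)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

# Identical values up to quoting -> distance 0.0, so no false drift signal.
print(linf_distance(['"aud"', '"usd"', '"aud"'], ["aud", "usd", "aud"]))  # 0.0
```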

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.