I have a training dataset with some numeric columns that are appropriately nullable. E.g., think of a propensity model for employee churn where we have a feature for days_until_next_leave. Employees with no leave booked have no value for this feature, but many employees do.
How should I treat this in Vertex AI AutoML? It seems rows with null numeric values are basically excluded? I would have expected "AutoML" to handle nulls automatically, but apparently that's not the case.
I can impute it myself, but since Google keeps the autoML algorithm secret I have no idea what an appropriate imputation value would be (since it depends on the algorithm family).
Hi @Jwaugh,
Welcome to Google Cloud Community!
When using Vertex AI AutoML for your propensity model, it's important to manage missing values properly. Here’s a simpler guide on how to manage null values:
... yyyy-mm-dd. For numeric columns, ensure all values are in decimal format (e.g., 0.0 instead of 0).
For better AutoML models, handle null values and prepare data thoroughly. Experiment with imputation and data prep techniques to optimize results.
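As a starting point for that kind of experiment, here is a minimal sketch of one common, model-agnostic approach: fill the nulls with a placeholder (the median of the observed values by default) and add a separate 0/1 flag column recording that the value was missing, so the information "no leave booked" is not lost. This is a general technique, not documented AutoML behavior. The helper name impute_with_indicator, the _is_missing suffix, and the sample rows are all hypothetical, and whether median is the right fill value for your data is an assumption to test.

```python
from statistics import median


def impute_with_indicator(rows, column, fill_value=None):
    """Return new rows with nulls in `column` filled and a 0/1 missing flag added.

    Hypothetical helper for illustration. `rows` is a list of dicts, with
    None marking a null. If `fill_value` is not given, the median of the
    observed values is used (this raises StatisticsError if every value
    in the column is null).
    """
    observed = [float(r[column]) for r in rows if r[column] is not None]
    if fill_value is None:
        fill_value = median(observed)

    prepared = []
    for r in rows:
        missing = r[column] is None
        new_row = dict(r)  # copy so the input rows are left untouched
        # Store every value as a float, matching the decimal-format advice.
        new_row[column] = float(fill_value) if missing else float(r[column])
        # Keep the "was null" signal as its own feature.
        new_row[f"{column}_is_missing"] = 1 if missing else 0
        prepared.append(new_row)
    return prepared


# Made-up sample data: employees 2 and 4 have no leave booked.
employees = [
    {"employee_id": 1, "days_until_next_leave": 12},
    {"employee_id": 2, "days_until_next_leave": None},
    {"employee_id": 3, "days_until_next_leave": 30},
    {"employee_id": 4, "days_until_next_leave": None},
]

prepared = impute_with_indicator(employees, "days_until_next_leave")
for row in prepared:
    print(row)
```

With the flag column in place, the exact fill value matters less for tree-based models, since they can split on the flag; for other model families it is worth comparing a few fill values (median, an out-of-range sentinel such as -1.0) on a validation set.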
I hope the above information is helpful.
Honestly it is not very helpful to say "experiment for the best results" in an "autoML" product which is supposed to be plug and play.