I have a training dataset with some numeric columns that are legitimately nullable. For example, think of a propensity model for employee churn with a feature days_until_next_leave. Employees who have no leave booked don't have this feature populated, but many others do.
How should I handle this in Vertex AI AutoML? It seems that rows with null numeric values are basically being excluded. I would have expected AutoML to handle nulls automatically, but apparently that's not the case.
I can impute the values myself, but since Google doesn't disclose which algorithms AutoML uses, I have no idea what an appropriate imputation value would be (the right choice depends on the algorithm family).
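One common, algorithm-agnostic workaround (an assumption on my part, not Vertex AI's documented behavior) is to fill the nulls with a sentinel value outside the feature's valid range and add an explicit missingness indicator column. The model can then learn "no leave booked" as a signal of its own, so the exact fill value matters less whether the underlying model is tree-based or linear. A minimal sketch in plain Python follows. The helper name, the `_is_missing` suffix, and the sentinel of -1 are hypothetical choices. -1 works here only because days_until_next_leave can't be negative.

```python
# Hypothetical sentinel: chosen because days_until_next_leave is never negative.
SENTINEL = -1.0


def add_missing_indicator(rows, column, sentinel=SENTINEL):
    """Return copies of `rows` with nulls in `column` replaced by `sentinel`
    and a new 0/1 column `<column>_is_missing` flagging the originally null rows."""
    prepared = []
    for row in rows:
        value = row.get(column)
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row[f"{column}_is_missing"] = 1 if value is None else 0
        new_row[column] = sentinel if value is None else float(value)
        prepared.append(new_row)
    return prepared


rows = [
    {"employee_id": 1, "days_until_next_leave": 14},
    {"employee_id": 2, "days_until_next_leave": None},  # no leave booked
]
prepared = add_missing_indicator(rows, "days_until_next_leave")
for r in prepared:
    print(r)
```

The same transformation would have to be applied to prediction inputs as well as to the training data, so that the indicator and the sentinel mean the same thing at serving time.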