Re: Imbalance DataSet for Tabular AutoML

holguinmora · 04-18-2022 10:26 AM

Hi, I have a tabular dataset with a binary target (class 0 and class 1) that is imbalanced between the two classes, as typically happens in fraud detection for financial transactions.

Does AutoML handle the imbalance automatically? Or is it possible to add SMOTE or ADASYN to the AutoML model? Any advice would be much appreciated.

josegutierrez

There are several ways of handling imbalanced datasets:

Upsampling and/or Downsampling: In upsampling, randomly chosen instances from the minority classes are duplicated in the training dataset. In downsampling, randomly chosen instances from the majority classes are left out of the training dataset. The minority class can be upsampled and the majority class downsampled at the same time.
Upweighting and/or Downweighting: In upweighting, instances from the minority classes are given a sample weight greater than 1. In downweighting, instances from the majority classes are given a sample weight less than 1. The sample weights are taken into account when computing the loss function, so errors on heavily weighted instances cost more. Upweighting and downweighting can be used together.
Data Augmentation: In this approach, synthetic instances of the minority class are generated to better balance the training dataset (SMOTE and ADASYN are examples of such techniques).
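
To make the first two approaches concrete, here is a minimal sketch in plain Python. It is only an illustration, not what AutoML does internally: the function names, the toy data (95 rows of class 0 and 5 rows of class 1), and the choice to balance the classes exactly are all assumptions made for the example. The weighting formula n_samples / (n_classes * class_count) is one common heuristic, not the only option.

```python
import random
from collections import Counter


def random_upsample(rows, labels, minority, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_count = len(labels) - len(minority_idx)
    # Number of duplicates needed (range() is empty if the classes are already balanced).
    extra = [rng.choice(minority_idx) for _ in range(majority_count - len(minority_idx))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_downsample(rows, labels, minority, seed=0):
    """Keep every minority row and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_keep = min(len(minority_idx), len(majority_idx))
    keep = sorted(minority_idx + rng.sample(majority_idx, n_keep))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy data: 95 legitimate transactions (0) and 5 fraudulent ones (1).
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(len(labels))]

up_rows, up_labels = random_upsample(rows, labels, minority=1)
down_rows, down_labels = random_downsample(rows, labels, minority=1)
weights = balanced_class_weights(labels)

# Per-row sample weights, e.g. to store in an extra column of the training table.
sample_weights = [weights[y] for y in labels]

print(Counter(up_labels), Counter(down_labels), weights)
```

Whichever approach is used, it should normally be applied to the training split only, so that the validation and test data keep the real class distribution. Whether a given AutoML product accepts a per-row weight column is something to check in its documentation.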

holguinmora

Hi José, thanks for your answer, but it is not very clear to me.

My question is whether I can upload an imbalanced dataset to AutoML as-is, whether I need to correct the imbalance somehow before uploading the data, or whether AutoML already handles imbalanced datasets well on its own.

mahewashabdi

I am also interested in the same question.