Evaluation of Oversampling Data Balancing Techniques in the Context of Ordinal Classification
Authors
Francisco Marques
Inês Campos Monteiro Sabino Domingues
José Pedro Pereira Amorim
Hugo Duarte
João Santos
Pedro Manuel Henriques da Cunha Abreu
Abstract
The machine learning field has grown considerably in recent years. There are, however, some problems still to be solved. The characteristics of the training sets, for instance, are known to affect classifier performance. Here, inspired by medical applications, we are interested in studying datasets that are both ordinal and imbalanced. Ordinal datasets have labels where only the relative ordering between different values is significant. Imbalanced datasets have very different numbers of examples per class. Building upon our previous work, we make three new contributions: (1) we extend the number of classifiers, (2) we evaluate two techniques to balance intermediate train sets in binary decomposition methods (often used in multi-class contexts, and in ordinal ones in particular), and (3) we propose a new, iterative, classifier-based over-sampling algorithm that we name InCuBAtE. Experiments were conducted on 6 private datasets, concerning the assessment of response to treatment in oncological diseases, and on 15 public datasets widely used in the literature. Compared with our previous work, results improved (or remained the same) for 4 of the 6 private datasets and for 11 of the 15 public datasets.
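Binary decomposition methods split a multi-class (here, ordinal) problem into several binary subproblems, and the intermediate train sets of those subproblems can be imbalanced even when the original data is not. The sketch below illustrates that setting under stated assumptions: it uses a Frank and Hall-style cumulative decomposition (one binary task per threshold, "is the label greater than k?") and plain random oversampling as the balancing step. Both are stand-ins chosen for illustration; they are not necessarily the decomposition or the two balancing techniques evaluated in this work, and the sketch does not implement InCuBAtE. The function names `ordinal_binary_decomposition` and `random_oversample` are hypothetical.

```python
import random
from collections import Counter


def ordinal_binary_decomposition(X, y, classes):
    """Split an ordinal K-class problem into K-1 binary problems.

    Assumption: Frank and Hall-style cumulative decomposition. For each
    threshold (every ordered class except the last), the binary label is
    1 when the original label is greater than the threshold, else 0.
    """
    ordered = sorted(classes)
    tasks = []
    for threshold in ordered[:-1]:
        labels = [1 if label > threshold else 0 for label in y]
        tasks.append((list(X), labels))
    return tasks


def random_oversample(X, y, seed=0):
    """Balance a train set by duplicating randomly chosen minority examples.

    Assumption: plain random oversampling, used here only as a simple
    stand-in for a balancing technique. Every class is grown to the size
    of the largest class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    majority_size = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        # Examples of this class that can be duplicated.
        pool = [x for x, label in zip(X, y) if label == cls]
        for _ in range(majority_size - n):
            X_out.append(rng.choice(pool))
            y_out.append(cls)
    return X_out, y_out


# Toy ordinal data with three ordered classes (0 < 1 < 2).
X = [[0.1], [0.2], [0.4], [0.5], [0.7], [0.9]]
y = [0, 0, 0, 0, 1, 2]

for threshold_index, (X_bin, y_bin) in enumerate(
    ordinal_binary_decomposition(X, y, [0, 1, 2])
):
    X_bal, y_bal = random_oversample(X_bin, y_bin)
    print(threshold_index, dict(Counter(y_bin)), dict(Counter(y_bal)))
```

In this toy example, the second binary task ("is the label greater than 1?") has 5 negative examples and only 1 positive one, so it is far more skewed than the first; balancing each intermediate train set separately, rather than only the original dataset, targets exactly this kind of imbalance.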