CISUC

Evaluation of Oversampling Data Balancing Techniques in the Context of Ordinal Classification

Authors

Abstract

The machine learning field has grown considerably in recent years. There are, however, some problems still to be solved. The characteristics of the training set, for instance, are known to affect classifier performance. Here, inspired by medical applications, we study datasets that are both ordinal and imbalanced. In ordinal datasets, only the relative ordering between label values is significant. In imbalanced datasets, the number of examples differs greatly from class to class.

Building upon our previous work, we make three new contributions: (1) we extend the number of classifiers evaluated, (2) we evaluate two techniques to balance the intermediate training sets in binary decomposition methods (often used in multi-class contexts, and ordinal ones in particular), and (3) we propose a new, iterative, classifier-based over-sampling algorithm that we name InCuBAtE. Experiments were run on 6 private datasets, concerning the assessment of response to treatment in oncologic diseases, and on 15 public datasets widely used in the literature. Compared with our previous work, results improved (or remained the same) for 4 of the 6 private datasets and for 11 of the 15 public datasets.

Keywords

diseases; learning (artificial intelligence); medical computing; pattern classification; sampling methods; oversampling data balancing techniques; ordinal classification; data imbalance; dataset; classifier; oversampling strategies; multiclass tasks; medical applications; private datasets; data balance techniques; classification results; ordinal imbalanced datasets; public datasets; MMAE; oncologic diseases; Toy manufacturing industry; Pipelines; Bibliographies; Automobiles; Encoding; Decoding

Conference

2018 International Joint Conference on Neural Networks (IJCNN), January 2019

DOI

