Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches

Title: Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches
Publication Type: Journal Article
Year of Publication: 2013
Authors: Fernandez, Alberto; López, Victoria; Galar, Mikel; del Jesus, M. J.; Herrera, Francisco
Journal: Knowledge-Based Systems
Volume: 42
Pagination: 97-110
ISSN: 0950-7051
Keywords: Cost-sensitive learning, Imbalanced data-sets, Multi-classification, Pairwise learning, Preprocessing
Abstract:

The class imbalance problem arises in real-world applications of classification in engineering. It is characterised by a highly uneven distribution of examples among the classes. The presence of multiple imbalanced classes is more restrictive when the aim of the final system is to obtain the best possible precision for each of the concepts of the problem. The goal of this work is to provide a thorough experimental analysis that will allow us to determine the behaviour of the different approaches proposed in the specialised literature. First, we will make use of binarization schemes, i.e., one versus one and one versus all, in order to apply the standard approaches to solving binary class imbalanced problems. Second, we will apply several ad hoc procedures which have been designed for the scenario of imbalanced data-sets with multiple classes. This experimental study will include several well-known algorithms from the literature such as decision trees, support vector machines and instance-based learning, with the intention of obtaining global conclusions from different classification paradigms. The extracted findings will be supported by a statistical comparative analysis using more than 20 data-sets from the KEEL repository.
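
Illustrative note (not part of the published abstract): the two binarization schemes named above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy label list, and the use of plain random oversampling as the binary-level preprocessing step are all hypothetical choices made for illustration; the paper evaluates its own set of preprocessing and cost-sensitive methods.

```python
from collections import Counter
from itertools import combinations
import random


def one_vs_one(labels):
    """Yield (class_a, class_b, indices) for every unordered pair of classes."""
    classes = sorted(set(labels))
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        yield a, b, idx


def one_vs_all(labels):
    """Yield (positive_class, binary_labels): 1 for the class, 0 for the rest."""
    for c in sorted(set(labels)):
        yield c, [1 if y == c else 0 for y in labels]


def imbalance_ratio(binary_labels):
    """Majority count divided by minority count for a two-class label list."""
    counts = Counter(binary_labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(indices, labels, seed=0):
    """Duplicate random minority examples until both classes are equally frequent.

    Assumption: random oversampling stands in for whichever binary
    imbalance technique is applied to each subproblem.
    """
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced


# Toy, hypothetical class distribution: 8 / 3 / 1 examples.
labels = ["a"] * 8 + ["b"] * 3 + ["c"] * 1
ovo = list(one_vs_one(labels))
ova = list(one_vs_all(labels))
```

For k classes, one versus one produces k(k-1)/2 binary subproblems, each restricted to the examples of two classes, while one versus all produces k subproblems over the full data-set. The one-versus-all subproblems are typically more imbalanced, since every other class is merged into a single negative class.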

URL: http://www.sciencedirect.com/science/article/pii/S0950705113000300
DOI: 10.1016/j.knosys.2013.01.018