Addressing Imbalance in Multilabel Classification: Measures and Random Preprocessing Methods

This website contains additional material for the paper: F. Charte, A.J. Rivera, M.J. del Jesus, and F. Herrera, "Addressing Imbalance in Multilabel Classification: Measures and Random Preprocessing Methods", Neurocomputing, Volume 163, pp. 3-16 (2015).


Learning from imbalanced datasets is a problem thoroughly studied in binary classification, and to a lesser extent in multiclass classification. Although most multilabel datasets suffer from a high imbalance level, the proposals on how to measure this characteristic and how to deal with this issue are scant.

The purpose of this paper is to present measures for assessing the imbalance level in multilabel datasets, as well as to propose several preprocessing algorithms designed to reduce it. Two of the proposed methods are random undersampling algorithms, called LP-RUS and ML-RUS, while the other two perform random oversampling, LP-ROS and ML-ROS. All of them are experimentally tested and their effectiveness is statistically evaluated. From the results obtained, a set of guidelines indicating when these methods should be applied is also provided.


Algorithms proposed in the paper

Four preprocessing algorithms aimed at reducing the imbalance level in multilabel datasets are proposed. Two of them are based on the LP (Label Powerset) transformation, whereas the other two analyze the imbalance of each label individually. All of them depend on a single parameter, P, which sets the percentage of instances to remove (undersampling) or to add (oversampling).
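As a rough illustration of the LP-based oversampling idea, the sketch below treats each distinct label combination as one class and clones randomly chosen instances from the rarer combinations until P% new instances have been added. It is a simplified sketch and not the code used in the paper: the function name `lp_random_oversample`, the `seed` argument, and the rule that a labelset counts as a minority when it has fewer instances than the mean group size are all assumptions made for this example.

```python
import random
from collections import defaultdict


def lp_random_oversample(labelsets, p, seed=0):
    """Return indices of the original instances followed by cloned ones.

    labelsets: one collection of labels per instance.
    p: percentage of instances to add (the parameter P).
    """
    rng = random.Random(seed)

    # Label Powerset view: each distinct label combination is one class.
    groups = defaultdict(list)
    for idx, labels in enumerate(labelsets):
        groups[frozenset(labels)].append(idx)

    # Assumption: a labelset is a minority if it is smaller than the mean group size.
    mean_size = len(labelsets) / len(groups)
    minority = [members for members in groups.values() if len(members) < mean_size]

    # Number of clones to produce, as a percentage of the dataset size.
    to_add = int(len(labelsets) * p / 100)

    clones = []
    while minority and len(clones) < to_add:
        members = rng.choice(minority)      # pick a minority labelset at random
        clones.append(rng.choice(members))  # clone one of its instances
    return list(range(len(labelsets))) + clones


# Hypothetical toy data: six instances with {a}, two with {a, b}, two with {c}.
toy = [{"a"}] * 6 + [{"a", "b"}] * 2 + [{"c"}] * 2
resampled = lp_random_oversample(toy, p=20)
```

With this toy data and P = 20, two clones are appended, both drawn from the instances labeled {a, b} or {c}. An undersampling variant would follow the same grouping step but delete instances from the larger groups instead.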



The experimentation was conducted using 13 datasets from the MULAN and MEKA repositories. Each dataset was randomly partitioned twice into five disjoint partitions in order to perform 2x5-fold cross-validation, which means 10 runs of every algorithm for each dataset. These partitions are available for download.
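The 2x5 scheme can be sketched as follows: the instance indices are shuffled twice, each shuffle is split into five disjoint partitions, and every partition serves once as the test set. This is only an illustrative sketch; the function name `two_by_five_partitions` and its parameters are hypothetical, and the partitions offered for download should be used to reproduce the reported results.

```python
import random


def two_by_five_partitions(n_instances, repetitions=2, folds=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per run."""
    rng = random.Random(seed)
    runs = []
    for _ in range(repetitions):
        indices = list(range(n_instances))
        rng.shuffle(indices)
        # Deal the shuffled indices into `folds` disjoint partitions.
        parts = [indices[k::folds] for k in range(folds)]
        for k in range(folds):
            test = parts[k]
            train = [i for j, part in enumerate(parts) if j != k for i in part]
            runs.append((train, test))
    return runs


# Hypothetical dataset size, chosen only for illustration.
runs = two_by_five_partitions(n_instances=23)
```

Here `runs` holds 2 x 5 = 10 train/test pairs, matching the 10 runs per algorithm and dataset mentioned above. The split is purely random and makes no attempt to preserve label distributions across partitions.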

Datasets and their characteristics - Download
Dataset    # instances    # features    # labels    Card

Experimentation results


Page created and maintained by Francisco Charte - 2013