Concurrence among Imbalanced Labels and its Influence on Multilabel Resampling Algorithms

This website contains additional material to the paper: F. Charte, A.J. Rivera, M.J. del Jesus, and F. Herrera "Concurrence among Imbalanced Labels and its Influence on Multilabel Resampling Algorithms". Proceedings of the 9th International Conference on Hybrid Artificial Intelligent Systems (HAIS 2014), June, Volume 8480, Salamanca (Spain), p.110-121, (2014) .

Abstract

In the context of multilabel classification, the learning from imbalanced data is getting considerable attention recently. Several algorithms to face this problem have been proposed in the late five years, as well as various measures to assess the imbalance level. Some of the proposed methods are based on resampling techniques, a very well-known approach whose utility in traditional classification has been proven.

This paper aims to describe how a specific characteristic of multilabel datasets (MLDs), the level of concurrence among imbalanced labels, could have a great impact in resampling algorithms behavior. Towards this goal, a measure named SCUMBLE, designed to evaluate this concurrence level, is proposed and its usefulness is experimentally tested. As a result, a straightforward guideline on the effectiveness of multilabel resampling algorithms depending on MLDs characteristics can be inferred.

Top of page

Datasets

The experimentation was conducted using the following 6 multilabel datasets. Each one of these datasets has been partitioned randomly twice in five separate partitions aiming to do a 2x5 folds cross validation, which means 10 runs of every algorithm for each dataset. These partitions are available to download.

Datasets and their characteristics - Download
Dataset# instances# features# labelsCard
cal5005026817426.044
corel5k50004993743.522
enron17021001533.378
genbase6621186271.252
medical9781449451.245
yeast2417198144.237
Top of page

Color version of figures

Concurrence among labels in genbase dataset
Concurrence among labels in genbase dataset

Concurrence among labels in yeast dataset
Concurrence among labels in yeast dataset

SCUMBLE vs changes in imbalance level after applying LP-ROS
SCUMBLE vs changes in imbalance level after applying LP-ROS

SCUMBLE vs changes in imbalance level after applying LP-RUS
SCUMBLE vs changes in imbalance level after applying LP-RUS

Page created and maintained by Francisco Charte - 2014