On the Impact of Dataset Complexity and Sampling Strategy in Multilabel Classifiers Performance

Francisco Charte; A.J. Rivera-Rivas; M. J. del Jesus; F. Herrera

Submitted by fcharte on Thu, 28/02/2019 - 14:01

Title	On the Impact of Dataset Complexity and Sampling Strategy in Multilabel Classifiers Performance
Publication Type	Conference Paper
Year of Publication	2016
Authors	Charte, Francisco, Rivera-Rivas A.J., del Jesus M. J., and Herrera F.
Conference Name	11th International Conference on Hybrid Artificial Intelligent Systems, HAIS 2016
Pagination	500–511
Date Published	April 2016
Conference Location	Seville (Spain)
ISBN Number	978-3-319-32033-5
Abstract	Multilabel classification (MLC) is an increasingly widespread data mining technique. Its goal is to categorize patterns into several non-exclusive groups, and it is applied in fields such as news categorization, image labeling and music classification. MLC is a more complex task than multiclass and binary classification, since the classifier must learn the presence of several outputs at once from the same set of predictive variables. The very nature of the data the classifier has to deal with implies a certain degree of complexity. Measuring this complexity level strictly from the data characteristics would be an interesting objective. At the same time, the strategy used to partition the data also influences the sample patterns the algorithm has at its disposal to train the classifier. In MLC, random sampling is commonly used to accomplish this task. This paper introduces TCS (Theoretical Complexity Score), a new characterization metric aimed at assessing the intrinsic complexity of a multilabel dataset, as well as a novel stratified sampling method specifically designed to fit the traits of multilabeled data. A detailed description of both proposals is provided, along with empirical results on their suitability for their respective tasks.
Notes	TIN2014-57251-P,TIN2012-33856,P10-TIC-06858,P11-TIC-7765
DOI	10.1007/978-3-319-32034-2_42

File:

Complexity.pdf