Synthetic Sample Generation for Label Distribution Learning

Author	M Gonzalez, Julián Luengo, José Ramón Cano De Amo, Salvador García López
Keywords	Data pre-processing, Label Distribution Learning, machine learning, Oversampling
Abstract	Label Distribution Learning (LDL) is a general learning framework that assigns an instance a distribution over a set of labels rather than a single label or multiple labels. Current LDL methods have proven effective in many machine learning applications. Since the LDL problem was first formulated, numerous studies have applied the LDL methodology to solve various real-life problems, while others have focused on proposing new algorithms. The purpose of this article is to begin addressing the LDL problem from the data pre-processing stage. The baseline hypothesis is that, due to the high dimensionality of existing LDL data sets, these data are very likely to be incomplete, and/or their poor quality will lead to poor performance once the learning algorithms are applied. In this paper, we propose an oversampling method that creates a superset of the original data set by generating new instances from existing ones. We then apply existing algorithms to the pre-processed training set to validate the efficacy of our method. The effectiveness of the proposed SSG-LDL is verified on several LDL data sets, showing significant improvements over state-of-the-art LDL methods.
Year of Publication	2021
Journal	Information Sciences
Volume	544
Pages	197-213
Date Published	01/2021
DOI	10.1016/j.ins.2020.07.071
Notes	TIN2017-89517-P
Bibliography media	Document 1-s2.0-S0020025520307544-main.pdf
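As a rough illustration of what oversampling means in the LDL setting, the sketch below generates synthetic instances by SMOTE-style interpolation: each new feature vector lies on the segment between a randomly chosen instance and one of its k nearest neighbours, and its label distribution is the same convex combination of the two parents' distributions, so it stays non-negative and sums to 1. This interpolation rule is an assumption made for illustration, not the SSG-LDL procedure described in the paper; the function name `oversample_ldl` and its parameters `n_new`, `k` and `seed` are hypothetical.

```python
import numpy as np


def oversample_ldl(X, D, n_new, k=5, seed=0):
    """Hypothetical SMOTE-style oversampler for LDL data (not SSG-LDL itself).

    X: (n, p) feature matrix; D: (n, c) label distributions (rows sum to 1).
    Returns the superset (X_aug, D_aug) with n_new synthetic rows appended.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)  # cannot have more neighbours than other instances

    # Pairwise squared Euclidean distances; a point is never its own neighbour.
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    new_X, new_D = [], []
    for _ in range(n_new):
        i = rng.integers(n)                 # random seed instance
        j = neighbours[i, rng.integers(k)]  # one of its k nearest neighbours
        lam = rng.random()                  # interpolation weight in [0, 1)
        new_X.append(X[i] + lam * (X[j] - X[i]))
        # Same convex combination for the label distribution keeps it valid.
        new_D.append(D[i] + lam * (D[j] - D[i]))

    new_X = np.array(new_X).reshape(-1, X.shape[1])
    new_D = np.array(new_D).reshape(-1, D.shape[1])
    return np.vstack([X, new_X]), np.vstack([D, new_D])


# Toy example: 4 instances, 2 features, distributions over 3 labels.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4],
              [0.2, 0.2, 0.6]])
X_aug, D_aug = oversample_ldl(X, D, n_new=6, k=2)
print(X_aug.shape, D_aug.shape)  # (10, 2) (10, 3)
```

The original rows are kept unchanged at the top of the result, so the output is a superset of the input, as in the abstract. Reusing the feature interpolation weight for the label distributions is only one simple way to keep each synthetic row a valid distribution; the paper itself should be consulted for how SSG-LDL assigns distributions to new instances.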
