Preprocessing vague imbalanced datasets and its use in genetic fuzzy classifiers

Publication Type: Conference Paper
Year of Publication: 2010
Authors: Palacios, A. M., Sánchez, L., and Couso, I.
Conference Name: International Conference on Fuzzy Systems
Date Published: July
Keywords: Classification algorithms, Context, data handling, Euclidean distance, fuzzy set theory, Fuzzy systems, genetic algorithms, genetic fuzzy classifier, genetic fuzzy system, Genetics, imbalanced dataset preprocessing, minimum error based classification system, Nearest neighbor searches, objective function, pattern classification, Pediatrics, Training

When there is a substantial difference between the number of instances in the majority and minority classes, classification systems that minimize the overall error tend to overlook the minority-class instances. This can be corrected either by preprocessing the dataset or by altering the objective function of the classifier. In this paper we analyze the first approach in the context of genetic fuzzy systems (GFS), in particular those that can operate with imprecisely observed, low-quality data. We analyze the different preprocessing mechanisms for imbalanced datasets and show that they must be extended to solve problems where the data are both imprecise and imbalanced. In addition, we include a comprehensive description of a new algorithm that is able to preprocess imprecise imbalanced datasets. Several real-world datasets are used to evaluate the proposal.
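The abstract does not describe the proposed algorithm, so the following is only a minimal sketch of what "extending a preprocessing mechanism to imprecise data" could look like, not the authors' method. It assumes the classic SMOTE idea (create synthetic minority examples by interpolating between a minority example and one of its k nearest minority neighbours) and applies it to interval-valued features. The interval representation, the endpoint-based Euclidean distance, and the function names `interval_distance`, `k_nearest`, `interpolate` and `smote_intervals` are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical representation: each feature is an interval (lower, upper);
# a crisp value x is the degenerate interval (x, x).
Interval = Tuple[float, float]
Example = List[Interval]


def interval_distance(a: Example, b: Example) -> float:
    # Assumed distance: Euclidean distance over both endpoints of every feature.
    return math.sqrt(sum((la - lb) ** 2 + (ua - ub) ** 2
                         for (la, ua), (lb, ub) in zip(a, b)))


def k_nearest(minority: List[Example], idx: int, k: int) -> List[int]:
    # Indices of the k nearest minority neighbours of minority[idx], excluding itself.
    others = [j for j in range(len(minority)) if j != idx]
    others.sort(key=lambda j: interval_distance(minority[idx], minority[j]))
    return others[:k]


def interpolate(a: Example, b: Example, gap: float) -> Example:
    # The same gap is used for both endpoints, so lower <= upper is preserved:
    # each endpoint is a convex combination of the corresponding endpoints of a and b.
    return [(la + gap * (lb - la), ua + gap * (ub - ua))
            for (la, ua), (lb, ub) in zip(a, b)]


def smote_intervals(minority: List[Example], n_new: int, k: int = 5,
                    rng: Optional[random.Random] = None) -> List[Example]:
    # Generate n_new synthetic interval-valued minority examples, SMOTE style.
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        synthetic.append(interpolate(minority[i], minority[j], rng.random()))
    return synthetic


if __name__ == "__main__":
    minority = [[(1.0, 2.0), (0.0, 0.5)],
                [(1.5, 2.5), (0.2, 0.4)],
                [(0.8, 1.0), (0.1, 0.9)]]
    for ex in smote_intervals(minority, n_new=4, k=2, rng=random.Random(0)):
        print(ex)
```

With crisp data encoded as degenerate intervals, the distance is the ordinary Euclidean distance scaled by a constant factor, so the neighbour ordering and the generated points match plain SMOTE. Whether this particular distance and interpolation rule suit a genetic fuzzy classifier is an open assumption here.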