Complementary information for the submitted paper "The Influence of Noise on the Evolutionary Fuzzy Systems for Subgroup Discovery"

The Influence of Noise on the Evolutionary Fuzzy Systems for Subgroup Discovery

Abstract

External factors such as the presence of noise in data can affect the data mining process. This is a common problem with several negative consequences, involving errors in data collection, in data preparation and, above all, in the results obtained by the data mining techniques employed. The capabilities of the models built under such circumstances depend heavily on the quality of the training data. Hence, problems containing noise are complex, and accurate solutions are often difficult to achieve. In subgroup discovery, a particular supervised learning field, the analysis of noise and its impact on the descriptions obtained has been overlooked.

This paper presents a complete analysis of the impact of noise on the most relevant evolutionary fuzzy systems for subgroup discovery. We also focus on how filtering techniques, devised for predictive tasks, may alleviate the impact of noise on descriptive fields such as subgroup discovery. Specifically, the analysis is carried out using recent filtering techniques for several class noise levels. The results show two different behaviours. On the one hand, the quality of the subgroups obtained by the SDIGA and NMEEFSD algorithms decreases as the noise increases, so noise filtering is needed to compensate for this loss of quality. On the other hand, the FuGePSD algorithm demonstrates a great capacity to work in noisy environments without the need for a preliminary filter. The study is completed with an analysis of interpretability under the influence of noise, focused on the number of rules and variables.

 

Experimental Analysis in the Application of Noise Filtering for Subgroup Discovery Algorithms

 

This complementary material includes the complete experimental results for all algorithms and datasets employed in the study.

Results obtained in datasets without filters

 

Results obtained in datasets with the Cross-Validated Committees filter (CVCF)

 

Results obtained in datasets with the Ensemble filter (EF)

 

Results obtained in datasets with the Iterative-Partitioning filter (IPF)

 

Paper published in Soft Computing