Abstract

Emerging pattern mining is a data mining task that belongs to the supervised descriptive rule discovery framework. Its objective is to find rules that describe emerging behaviour or differentiating characteristics with respect to a property of interest. A Multi-Objective Evolutionary Algorithm for the Extraction of Fuzzy Emerging Patterns (MOEA-EFEP) is described and analysed in this paper. MOEA-EFEP is the first multi-objective evolutionary algorithm proposed for emerging pattern mining. This approach allows us to get rules whose descriptions of the emerging phenomena are simpler than previous approaches. It is based on the well-known NSGA-II algorithm adapted for the extraction of emerging patterns. The proposal also uses fuzzy logic to deal with numeric variables in order to obtain a knowledge representation close to human reasoning.

An experimental study was performed to verify the validity of the proposal. Firstly, it presents a comparison of different rule representations and post-processing filter strategies, in order to determine an optimal configuration of the proposal. Finally, it is compared with other algorithms for emerging pattern mining in order to determine the quality of the knowledge extracted. The results show that MOEA-EFEP obtains rules with a better description of the emerging or discriminative behaviour than other algorithms of the task. The conclusions of this study are supported by the use of statistical tests.

Table of contents

Suplementary files

Below it is presented the suplementary files in order to replicate the study.

Datasets

The datasets used on this experimental study can be downloaded on this link.

For iEPMiner, numeric variables of the dataset have been discretised using the Fayyad discretise method. On this file, discretised dataset are availeble on the Discretised folder, and the normal dataset on the NonDiscretised folder.

The dataset has been partitioned following a 5-fold distribution optimally balanced cross validation procedure. For each dataset, a file for each training/test partition and fold are presented.

Rules extracted by all algorithms

The rules extracted by EvAEP, iEPMiner and FEPM on the bands, german and iris datasets can be downloaded on this link.

Source code

MOEA-EFEP has been written in Scala 2.10.6, below, we attached the source code in order to be used on your favourite IDE. Download link.

Results of the algorithms

In this section, the results obtained by the different algorithms on each dataset are presented. A table for each algorithm is presented. The algorithms used in the different studies of the paper are: EvAEP, iEPMiner, FEPM, MOEA-EFEP with canonical rules and MOEA-EFEP with DNF rules. In addition, the quality measures presented are: number of rules (NRULES), number of variables (NVAR), WRAcc, Con, GR, TPR and FPR.

Results of EvAEP

Below, the results for the EvAEP algorithm are presented:

Results of iEPMiner

Results obtained for iEPMiner are presented below:

Results of FEPM

The table presented below shows the results of FEPM:

Results of MOEA-EFEP with canonical representation

The table below presents the results obtained by MOEA-EFEP using the canonical (conjunctionf of attribute-value pairs) representation:

Results of MOEA-EFEP with DNF representation

Below, resuls for MOEA-EFEP with DNF (disjunctive normal form) knowledge representation is presentend:

Selection of the best knowledge representation for MOEA-EFEP

Here they are presented the results of the statistcal for the comparison of knowledge representation for MOEA-EFEP. The test used is the Wilcoxon signed rank test.

Selection of the best filter strategy

In this section, the results of the different filter strategies proposed for a post-processing filter on MOEA-EFEP are presented. The proposed filters are: minimal emerging patterns, maximal emerging patterns and rules with a confidence grater than a 60% on training. These results have been obtained by the MOEA-EFEP algorithm with a DNF rule representation.

Minimal emerging patterns filter

Results presented below show belongs to the minimal emerging pattern filter application:

Maximal emerging patterns filter

Below, the results of the application of the maximal emerging patterns filter are presented:

Confidence filter results

The results of the application of a filter based on a confidence threshold of a 60% are presented below:

Statistical results

In this section the statistical results are shown. First, the results of the Iman-Davenport test for each of the analysed measures are presented:

After that, the results obtained by the application of the Holm test for each quality measure are shown below:

Confidence

WRAcc

GR

TPR

FPR

Comparison against other algorithms

Here we presents the results of the statistical tests carried out for the comparison of MOEA-EFEP against the most relevant EPM algorithm developed so far. First, the results of the Iman-Davenport for each quality measure analysed is presented below:

After that, it is necessary the application of the Holm post-hoc in order to determine the best algorithm on each quality measure analysed. The results of this test for each measure analysed are presented below:

Execution time analysis

Here the complete table regarding to the execution time analysis is carried. In this table it is presented the average execution of each method on each dataset