Bootstrap analysis of multiple repetitions of experiments using an interval-valued multiple comparison procedure

Author	José Otero, Luciano Sánchez, Inés Couso, Ana Palacios
Keywords	Cross validation; Statistical comparisons of algorithms; Tests for interval-valued data
Abstract	A new bootstrap test is introduced for assessing the significance of the differences between stochastic algorithms in an experimental setup based on cross-validation with repeated folds. Intervals are used to model the variability of the data that can be attributed to repeating the learning and testing stages over the same folds. Numerical experiments are provided that support the following three claims: (1) Bootstrap tests can be more powerful than ANOVA or the Friedman test for comparing multiple classifiers. (2) In the presence of outliers, interval-valued bootstrap tests discriminate between stochastic algorithms better than nonparametric tests. (3) Choosing ANOVA, Friedman, or bootstrap can produce different conclusions in experiments involving actual data from machine learning tasks.
Year of Publication	2014
Journal	Journal of Computer and System Sciences
Volume	80
Pages	88-100
ISSN Number	0022-0000
URL	http://www.sciencedirect.com/science/article/pii/S0022000013000731
DOI	10.1016/j.jcss.2013.03.009
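
The abstract does not spell out the test itself. As a rough illustration only, and not the authors' procedure, the Python sketch below shows one way interval-valued cross-validation results could be compared with a bootstrap. It assumes each fold's repeated runs are summarised as a [min, max] interval, and it resamples folds to obtain a conservative percentile band for the mean difference between two algorithms. All function names, parameters, and error values are hypothetical.

```python
import random
from statistics import mean

def to_interval(runs):
    """Summarise the repeated runs on one fold as the interval [min, max]."""
    return (min(runs), max(runs))

def interval_diff(a, b):
    """Interval of all differences x - y with x in a and y in b."""
    return (a[0] - b[1], a[1] - b[0])

def bootstrap_band(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Conservative percentile band for the mean of interval-valued differences.

    Folds are resampled with replacement. The lower end is the alpha/2
    quantile of the resampled mean lower bounds; the upper end is the
    1 - alpha/2 quantile of the resampled mean upper bounds.
    """
    rng = random.Random(seed)
    n = len(diffs)
    lows, highs = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        lows.append(mean(diffs[i][0] for i in idx))
        highs.append(mean(diffs[i][1] for i in idx))
    lows.sort()
    highs.sort()
    k = int(n_boot * alpha / 2)
    return lows[k], highs[n_boot - 1 - k]

# Hypothetical error rates: 5 folds, 3 repetitions per fold, two algorithms.
errors_a = [[0.10, 0.12, 0.11], [0.09, 0.10, 0.11], [0.12, 0.13, 0.11],
            [0.10, 0.10, 0.12], [0.11, 0.09, 0.10]]
errors_b = [[0.20, 0.22, 0.21], [0.19, 0.21, 0.20], [0.22, 0.23, 0.21],
            [0.20, 0.19, 0.22], [0.21, 0.20, 0.19]]

diffs = [interval_diff(to_interval(a), to_interval(b))
         for a, b in zip(errors_a, errors_b)]
band = bootstrap_band(diffs)
print("band for mean(A - B):", band)
print("excludes zero:", band[1] < 0 or band[0] > 0)
```

Under these assumptions, a band that lies entirely on one side of zero suggests a difference that holds whichever value inside each fold's interval is taken as the fold's result. A band that straddles zero leaves the comparison undecided.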