|Title||Machine learning models, epistemic set-valued data and generalized loss functions: An encompassing approach|
|Publication Type||Journal Article|
|Year of Publication||2016|
|Authors||Couso, Inés, and Sánchez, Luciano|
|Pagination||129 - 150|
|Keywords||Classification, Generalized stochastic ordering, Loss function, Low-quality data, Regression, Set-valued data|
We study problems where the goal is to find “optimal” models with respect to some specific criterion, in regression and supervised classification. Alternatives to the usual expected loss minimization criterion are proposed, and a general framework is provided in which this criterion can be seen as a particular instance of a general family of criteria. In the new setting, each model is formally identified with a random variable that associates a loss value to each individual in the population. Based on this identification, different stochastic orderings between random variables lead to different criteria to compare pairs of models. Our general setting encompasses the classical criterion based on the minimization of the expected loss, but also other criteria where a numerical loss function is not available, and therefore the computation of its expectation does not make sense. The presentation of the new framework is divided into two stages. First, we consider the new framework under standard assumptions about the sample information, where both the collection of attributes and the response variables are observed with precision. Then, we assume that only incomplete information about them (expressed in terms of set-valued data sets) is provided. We cast some comparison criteria from the recent literature on learning methods from low-quality data as particular instances of our general approach.
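To make the idea of comparing models through stochastic orderings of their loss variables concrete, the following minimal Python sketch contrasts two criteria on precisely observed losses: the classical expected loss comparison and a comparison by first-order stochastic dominance of the empirical loss distributions. The function names, the sample values, and the choice of first-order dominance as the illustrative ordering are assumptions made here for illustration; the abstract does not specify which orderings the paper uses, and this is not the authors' implementation.

```python
from bisect import bisect_right
from statistics import mean


def prefers_by_expected_loss(loss_a, loss_b):
    """Classical criterion: A is preferred if its mean loss is strictly lower."""
    return mean(loss_a) < mean(loss_b)


def prefers_by_stochastic_dominance(loss_a, loss_b):
    """Illustrative ordering (an assumption, not taken from the paper):
    A is (weakly) preferred if its empirical loss CDF is at least as large
    as B's at every threshold, i.e. A's losses are stochastically smaller."""
    a, b = sorted(loss_a), sorted(loss_b)
    thresholds = sorted(set(a) | set(b))
    return all(
        bisect_right(a, t) / len(a) >= bisect_right(b, t) / len(b)
        for t in thresholds
    )


# Hypothetical per-individual losses for three models on the same sample.
loss_a = [0, 0, 0, 10]  # usually perfect, occasionally very wrong
loss_b = [1, 1, 1, 1]   # always slightly wrong
loss_c = [0, 0, 1, 1]   # never worse than model B

# Expected loss always ranks two models; dominance may leave them incomparable.
print(prefers_by_expected_loss(loss_b, loss_a))
print(prefers_by_stochastic_dominance(loss_a, loss_b),
      prefers_by_stochastic_dominance(loss_b, loss_a))
print(prefers_by_stochastic_dominance(loss_c, loss_b))
```

In this sketch the expected loss criterion induces a total ranking, whereas the dominance criterion is only a partial order: models A and B are incomparable under it, while model C is preferred to model B under both criteria. The dominance check relies only on the ordering of loss values, which is one way a comparison can remain meaningful when losses are ordinal rather than numerical.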