Mutual Information-Based Feature Selection in Fuzzy Databases Applied to Searching for the Best Code Metrics in Automatic Grading

Otero, José; Suárez, Rosario; Sánchez, Luciano; Polycarpou, Marios; de Carvalho, André C. P. L. F.; Pan, Jeng-Shyang; Woźniak, Michał; Quintián, Héctor; Corchado, Emilio

Title	Mutual Information-Based Feature Selection in Fuzzy Databases Applied to Searching for the Best Code Metrics in Automatic Grading
Publication Type	Conference Paper
Year of Publication	2014
Authors	Otero, José; Suárez, Rosario; Sánchez, Luciano
Editor	Polycarpou, Marios; de Carvalho, André C. P. L. F.; Pan, Jeng-Shyang; Woźniak, Michał; Quintián, Héctor; Corchado, Emilio
Conference Name	Hybrid Artificial Intelligence Systems
Pagination	330–341
Publisher	Springer International Publishing
Publisher Location	Cham
ISBN Number	978-3-319-07617-1
Abstract	Massive open online courses have a large impact in developing countries, where they help improve education in poor regions. However, instructors cannot review open-ended student work as they do in smaller classes. In computer science courses where students' code is reviewed, some methods use code metrics to assign a grade automatically. Even so, many incomplete and conflicting sources of information must be combined in the prediction process, and the number of variables involved may be too large to allow meaningful predictions. This work proposes a new method that ranks a set of metrics by their relevance to predicting student grades and that can cope with incomplete and imprecise results. Measurements taken on variable-sized sets of assignments are aggregated into fuzzy values, and a definition of mutual information based on fuzzy random variables is used to build a partial ranking of the metrics according to their predictive power. The most relevant metrics are fed to a genetic fuzzy system that models the dependence between the fuzzy code metrics and the grades of the corresponding students. A set of 800 source code files, collected in classroom Computer Science lectures taught between 2013 and 2014, was used to validate the hypotheses of this research; the new ranking method was found to significantly improve the predictive capability of the models.
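
The ranking step described in the abstract can be illustrated with a much simpler stand-in. The sketch below ranks metrics by ordinary (crisp) mutual information on discretized values; it does not reproduce the paper's fuzzy random variable-based definition, and it yields a total ordering rather than the partial ranking the abstract mentions. The metric names, bin labels, and toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Crisp mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_metrics(metrics, grades):
    """Sort metric names by decreasing mutual information with the grades."""
    scores = {
        name: mutual_information(values, grades)
        for name, values in metrics.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one discretized value per student and metric.
grades = ["pass", "pass", "fail", "fail"]
metrics = {
    "cyclomatic_bin": ["low", "low", "high", "high"],  # tracks the grade exactly
    "loc_bin": ["low", "high", "low", "high"],         # independent of the grade
}

ranking = rank_metrics(metrics, grades)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy case the metric that tracks the grade exactly is ranked first and the independent one last. Handling imprecise, fuzzy-valued measurements, as the paper does, would require replacing the crisp counts with a fuzzy definition of mutual information.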