Mutual Information-Based Feature Selection in Fuzzy Databases Applied to Searching for the Best Code Metrics in Automatic Grading

Title: Mutual Information-Based Feature Selection in Fuzzy Databases Applied to Searching for the Best Code Metrics in Automatic Grading
Publication Type: Conference Paper
Year of Publication: 2014
Authors: José Otero, Rosario Suárez, and Luciano Sánchez
Editors: Marios Polycarpou, André C. P. L. F. de Carvalho, Jeng-Shyang Pan, Michał Woźniak, Héctor Quintián, and Emilio Corchado
Conference Name: Hybrid Artificial Intelligence Systems
Pagination: 330–341
Publisher: Springer International Publishing
Conference Location: Cham
ISBN Number: 978-3-319-07617-1
Abstract

Massive open online courses have a large impact in developing countries, helping to improve education in poor regions. However, instructors cannot review open-ended work from students as they do in smaller class settings. In computer science courses where students' code is reviewed, some methods use code metrics to assign the student a grade automatically. Even so, a large number of incomplete and conflicting sources of information must be combined in the prediction process, and there may be too many variables involved to make any meaningful prediction. In this work a new method is proposed for sorting a set of metrics by their relevance in the prediction of student grades, one that can cope with incomplete and imprecise results. Measurements taken on variable-sized sets of assignments are aggregated into fuzzy values, and a fuzzy random variable-based definition of mutual information is used to build a partial ranking of metrics according to their predictive power. The most relevant metrics are fed to a genetic fuzzy system that models the dependence between the fuzzy code metrics and the grades of the corresponding students. A set comprising 800 source code files, collected in classroom Computer Science lectures taught between 2013 and 2014, was used to validate the hypotheses of this research, and the new ranking method was found to significantly improve the predictive capability of the models.
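
The aggregate-then-rank idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not give the aggregation rule or the fuzzy random variable-based definition of mutual information, so the sketch assumes a triangular summary (minimum, median, maximum) for each student's measurements and substitutes an ordinary crisp plug-in estimate of mutual information computed on the discretized median. The function names, the bin count, and the metric names `cyclomatic` and `comment_lines` are all hypothetical.

```python
import math
from collections import Counter
from statistics import median


def to_triangular(measurements):
    """Summarise a variable-sized list of readings as (low, mode, high).

    Assumption: min/median/max stands in for the paper's fuzzy aggregation.
    """
    return (min(measurements), median(measurements), max(measurements))


def discretize(values, bins=3):
    """Equal-width binning into integer labels 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Crisp plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_metrics(raw_metrics, grades, bins=3):
    """Rank metrics by crisp MI between the binned median and the grade.

    raw_metrics maps a metric name to one list of readings per student.
    The low/high ends of each triangle are computed but ignored here,
    which is the main simplification relative to a fuzzy treatment.
    """
    scores = {}
    for name, per_student in raw_metrics.items():
        modes = [to_triangular(readings)[1] for readings in per_student]
        scores[name] = mutual_information(discretize(modes, bins), grades)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): four students, grade 0 = fail, 1 = pass.
raw = {
    "cyclomatic": [[2, 3], [3], [8, 9, 10], [9, 11]],
    "comment_lines": [[5], [20, 22], [6], [21]],
}
print(rank_metrics(raw, grades=[0, 0, 1, 1], bins=2))
```

On this toy data the hypothetical `cyclomatic` metric separates the two grade classes perfectly and ranks first, while `comment_lines` carries no information about the grade. A fuzzy treatment would propagate the imprecision in each triangle into the estimate, which is why the abstract speaks of a partial rather than a total ranking.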