|Title||Mutual Information-Based Feature Selection in Fuzzy Databases Applied to Searching for the Best Code Metrics in Automatic Grading|
|Publication Type||Conference Paper|
|Year of Publication||2014|
|Authors||Otero, José; Suárez, Rosario; Sánchez, Luciano|
|Editor||Polycarpou, Marios; de Carvalho, André C. P. L. F.; Pan, Jeng-Shyang; Woźniak, Michał; Quintián, Héctor; Corchado, Emilio|
|Conference Name||Hybrid Artificial Intelligence Systems|
|Publisher||Springer International Publishing|
Massive open online courses have a large impact on developing countries, helping to improve education in poor regions. However, instructors cannot review open-ended student work as they do in smaller classes. In computer science courses where students' code is reviewed, some methods use code metrics to assign a grade to the student automatically. Even so, the prediction must combine many incomplete and conflicting sources of information, and there may be too many variables involved to make any meaningful prediction. This work proposes a new method that sorts a set of metrics by their relevance to the prediction of student grades and that can cope with incomplete and imprecise results. Measurements taken on variable-sized sets of assignments are aggregated into fuzzy values, and a definition of mutual information based on fuzzy random variables is used to build a partial ranking of the metrics according to their predictive power. The most relevant metrics are fed to a genetic fuzzy system that models the dependence between the fuzzy code metrics and the grades of the corresponding students. A set of 800 source code files, collected in classroom Computer Science lectures taught between 2013 and 2014, was used to validate the hypotheses of this research, and the new ranking method was found to significantly improve the predictive capability of the models.
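
The ranking step described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' method: it drops the fuzzy-random-variable definition of mutual information and works on crisp values (for instance, one aggregated number per student and metric), using a plain plug-in estimate on equal-width bins. The function names, the binning scheme, and the sample metric names are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def discretize(values, bins=3):
    """Equal-width binning of crisp values into labels 0..bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def rank_metrics(metrics, grades, bins=3):
    """Sort metric names by estimated mutual information with the grades."""
    scores = {
        name: mutual_information(discretize(vals, bins), grades)
        for name, vals in metrics.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: one aggregated value per student for each metric.
metrics = {
    "informative_metric": [1, 2, 9, 10],
    "noise_metric": [1, 10, 1, 10],
}
grades = ["fail", "fail", "pass", "pass"]
ranking = rank_metrics(metrics, grades, bins=2)
```

Here `ranking` is a total order over crisp scores. With fuzzy-valued data, as in the paper, the mutual information of two metrics may not be comparable, which is why the abstract speaks of a partial ranking.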