Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks

Ramos-López, Darío and Maldonado, Ana D. (2021) Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks. Mathematics, 9 (2). p. 156. ISSN 2227-7390

mathematics-09-00156.pdf - Published Version

Abstract

Multi-class classification in imbalanced datasets is a challenging problem. In these cases, common validation metrics (such as accuracy or recall) are often unsuitable. In many such problems, frequently real-world ones related to health, some classification errors may be tolerated, whereas others must be avoided entirely. Therefore, a cost-sensitive variable selection procedure for building a Bayesian network classifier is proposed. It employs a flexible validation metric (a cost/loss function) that encodes the impact of the different classification errors, so that the model is learned to optimize a cost function specified a priori. The proposed approach was applied to forecasting an air quality index using current levels of air pollutants and climatic variables from a highly imbalanced dataset. For this problem, the method yielded better results in the less frequent class states than standard validation metrics did. The possibility of fine-tuning the objective validation function can improve prediction quality in imbalanced data or when asymmetric misclassification costs have to be considered.
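
To illustrate the kind of validation metric the abstract describes, the following is a minimal sketch of an average misclassification cost computed from a confusion matrix and a cost matrix. It is not the authors' implementation: the function names (`confusion_matrix`, `average_cost`), the three-class setup, and every value in `COST` are assumptions chosen only for illustration.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count how often true class i (row) was predicted as class j (column)."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


def average_cost(y_true: Sequence[int], y_pred: Sequence[int],
                 cost: Sequence[Sequence[float]]) -> float:
    """Mean cost per sample; cost[i][j] is the penalty for predicting j when i is true."""
    n_classes = len(cost)
    counts = confusion_matrix(y_true, y_pred, n_classes)
    total = sum(cost[i][j] * counts[i][j]
                for i in range(n_classes) for j in range(n_classes))
    return total / len(y_true)


# Hypothetical asymmetric costs: missing the rare class 2 is penalized heavily.
COST = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [10.0, 5.0, 0.0],
]

# Toy imbalanced labels: eight samples of class 0, two of the rare class 2.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 2, 2]
always_majority = [0] * 10                     # never predicts the rare class
cautious = [0, 0, 0, 0, 0, 0, 1, 2, 2, 2]      # two false alarms, no misses

print(average_cost(y_true, always_majority, COST))
print(average_cost(y_true, cautious, COST))
```

Both toy predictors classify 8 of 10 samples correctly, so accuracy cannot separate them, while the cost function strongly prefers the one that never misses the rare class. A variable selection procedure could use such a function as its objective when comparing candidate variable subsets.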

Item Type: Article
Uncontrolled Keywords: multi-class classification; imbalanced data; Bayesian networks; variable selection
Subjects: STM Repository > Mathematical Science
Depositing User: Managing Editor
Date Deposited: 15 Nov 2022 04:45
Last Modified: 26 Oct 2024 04:13
URI: http://classical.goforpromo.com/id/eprint/1637