Learning Bayesian Network Structure Using Genetic Algorithm with Consideration of the Node Ordering via Principal Component Analysis

Authors

1 Department of Statistics‎, ‎Faculty of Mathematics and Computer Sciences‎, ‎Allameh Tabataba'i University‎, ‎Tehran‎, ‎Iran and School of Biological sciences‎, ‎Institute for Research in Fundamental Science (IPM)‎, ‎Tehran‎, ‎Iran

2 Department of Information Technology Management‎, ‎Faculty of Management‎, ‎Kharazmi University‎, ‎Tehran‎, ‎Iran.

3 Department of Biostatistics‎, ‎SBMU School of Para-Medical Sciences‎, ‎Shahid Beheshti University‎, ‎Tehran‎, ‎Iran.

4 The National Organization for Educational Testing (NOET)‎, ‎Ministry of Science‎, ‎Research and Technology‎, ‎Tehran‎, ‎Iran.

10.18869/acadpub.jirss.15.2.45

Abstract

‎The most challenging task in dealing with Bayesian networks is learning their structure‎. ‎Two classical approaches are often used for learning Bayesian network structure;‎ ‎Constraint-Based method and Score-and-Search-Based one‎. ‎But neither the first nor the second one are completely satisfactory‎. ‎Therefore the heuristic search such as Genetic Algorithms with a fitness score function is considered for learning Bayesian network structure‎. ‎To assure the closeness of the genetic operators‎, ‎the ordering among variables (nodes) must be determined.

‎In this paper‎, ‎we determine the  node ordering by considering the Principal Component Analysis (PCA)‎. ‎For this purpose we first determine the appropriate correlation between variables and then use the absolute value of variable's coefficients in the first component‎. ‎It means that a node X_i can only have the node X_j as a parent‎, ‎if the absolute value of   coefficient X_j in first component will be higher than X_i.  ‎We then use the Genetic Algorithm with fitness score BIC regarding the  node ordering  to construct the Bayesian Network. ‎Experimental results over well-known networks  Asia‎, ‎Alarm and Hailfinder  show that our new technique  has higher accuracy and better degree of data matching‎.  ‎In addition‎,  ‎we apply our technique to the  real data set which is related to Bank's debtor that owe over 500 million Rials to Bank Maskan (Housing Bank) in Iran‎. ‎Results also show that the proposed technique has greater modeling power than other node ordering techniques such as   Hruschka et al. (2007), Chen et al. (2008) and K2 algorithm‎.

Keywords

Volume 15, Issue 2
August 2016
Pages 45-61
  • Receive Date: 23 July 2022
  • Revise Date: 12 May 2024
  • Accept Date: 23 July 2022