Authors
1 Department of Statistics, Faculty of Mathematics and Computer Sciences, Allameh Tabataba'i University, Tehran, Iran and School of Biological sciences, Institute for Research in Fundamental Science (IPM), Tehran, Iran
2 Department of Information Technology Management, Faculty of Management, Kharazmi University, Tehran, Iran.
3 Department of Biostatistics, SBMU School of Para-Medical Sciences, Shahid Beheshti University, Tehran, Iran.
4 The National Organization for Educational Testing (NOET), Ministry of Science, Research and Technology, Tehran, Iran.
Abstract
The most challenging task in dealing with Bayesian networks is learning their structure. Two classical approaches are often used for learning Bayesian network structure; Constraint-Based method and Score-and-Search-Based one. But neither the first nor the second one are completely satisfactory. Therefore the heuristic search such as Genetic Algorithms with a fitness score function is considered for learning Bayesian network structure. To assure the closeness of the genetic operators, the ordering among variables (nodes) must be determined.
In this paper, we determine the node ordering by considering the Principal Component Analysis (PCA). For this purpose we first determine the appropriate correlation between variables and then use the absolute value of variable's coefficients in the first component. It means that a node X_i can only have the node X_j as a parent, if the absolute value of coefficient X_j in first component will be higher than X_i. We then use the Genetic Algorithm with fitness score BIC regarding the node ordering to construct the Bayesian Network. Experimental results over well-known networks Asia, Alarm and Hailfinder show that our new technique has higher accuracy and better degree of data matching. In addition, we apply our technique to the real data set which is related to Bank's debtor that owe over 500 million Rials to Bank Maskan (Housing Bank) in Iran. Results also show that the proposed technique has greater modeling power than other node ordering techniques such as Hruschka et al. (2007), Chen et al. (2008) and K2 algorithm.
Keywords