A Model for Determining Insured Premiums Based on Household Expenses Using Advanced Computational Techniques under Heterogeneous Data Conditions

Document Type: Original Article

Authors
Department of Industrial Engineering, Yazd University, Iran
Abstract
Escalating public health costs are a significant concern for governments worldwide. Managing those costs efficiently is critical, and health insurance systems play a pivotal role. However, the insurance industry must work with heterogeneous data, in which identical inputs can lead to inconsistent outputs. Traditional predictive methods such as Artificial Neural Networks (ANN) and Adaptive Neuro-Fuzzy Inference Systems (ANFIS) often fail to address these inconsistencies. This study proposes a novel two-stage model for determining insurance premiums that incorporates equity considerations and advanced computational techniques. We advocate an expenditure-based premium calculation as a superior alternative to the traditional salary-based approach, because it aligns premiums more closely with household expenses and thereby promotes fairness and efficiency. To improve accuracy, the model integrates sampling techniques to mitigate data heterogeneity and employs a genetic algorithm to optimize the weights of the neural network. The genetic algorithm iteratively evolves the network parameters, which supports robust performance on diverse data. Our results show that the expenditure-based strategy outperforms the salary-based one in controlling costs for both the insured and the insurer. The error metrics, including Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), improve significantly relative to the ANFIS method, indicating that the integrated approach reduces prediction errors and enhances the reliability of the premium calculation process. In conclusion, the proposed model offers a robust framework for premium determination under the data heterogeneity inherent in the insurance industry, and a practical, effective means of improving the accuracy and fairness of insurance premium calculations.
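The abstract's idea of using a genetic algorithm to evolve neural network weights, scored by MAE and RMSE, can be illustrated with a minimal sketch. This is not the authors' two-stage model: the network size, the population settings, the function names (`predict`, `evolve`), and the toy expenditure-to-premium data are all hypothetical choices made only for illustration, and the sampling stage is omitted.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def predict(weights, x, hidden=3):
    """One-input, one-output network with `hidden` tanh units (assumed layout).

    weights = [input weights | hidden biases | output weights | output bias]
    """
    w_in = weights[:hidden]
    b_in = weights[hidden:2 * hidden]
    w_out = weights[2 * hidden:3 * hidden]
    b_out = weights[3 * hidden]
    return b_out + sum(
        wo * math.tanh(wi * x + bi) for wi, bi, wo in zip(w_in, b_in, w_out)
    )


def evolve(xs, ys, hidden=3, pop_size=40, generations=60, seed=0):
    """Evolve network weights with elitism, uniform crossover and Gaussian mutation."""
    rng = random.Random(seed)
    n_weights = 3 * hidden + 1

    def fitness(w):
        # Lower is better: RMSE of the network on the training data.
        return rmse(ys, [predict(w, x, hidden) for x in xs])

    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    history = []  # best RMSE at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 4]  # the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [
                g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g for g in child
            ]  # mutate each gene with probability 0.2
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness), history


# Hypothetical toy data: household expenditure (scaled) -> premium (scaled).
xs = [i / 10 for i in range(20)]
ys = [0.3 * x + 0.1 for x in xs]

best, history = evolve(xs, ys)
preds = [predict(best, x) for x in xs]
print("MAE:", mae(ys, preds), "RMSE:", rmse(ys, preds))
```

Because the elite individuals are carried over unchanged, the best RMSE recorded in `history` never increases from one generation to the next. A real application would evaluate the errors on held-out data instead of the training set.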

Volume 24, Issue 1
June 2025
Pages 35-66

  • Receive Date 20 June 2024
  • Revise Date 16 June 2025
  • Accept Date 31 August 2025