Transformer Self-Attention Network for Forecasting Mortality Rates

Document Type: Original Article


1 Department of Statistics, Razi University, Kermanshah, Iran.

2 Department of Applied Statistics and Research Methods, University of Northern Colorado, Greeley, CO 80636, USA.


The transformer network is a deep learning architecture that uses self-attention mechanisms to
capture long-term dependencies in sequential data. The Poisson-Lee-Carter model, introduced to forecast mortality rates, includes an age factor and a calendar-year factor; the latter is the time-dependent component. In this paper, we use the transformer to predict the time-dependent component of the Poisson-Lee-Carter model. Using real mortality data sets from several countries, we compare the mortality rate prediction performance of the transformer with that of the long short-term memory (LSTM) neural network, the classical ARIMA time series model, and the simple exponential smoothing method. The results show that the transformer dominates, or is comparable to, the LSTM, ARIMA, and simple exponential smoothing methods.
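In the Poisson-Lee-Carter setting, observed death counts are modeled as Poisson with mean driven by an age effect and a period index, so forecasting reduces to projecting the period index series forward. The sketch below, in plain NumPy, illustrates the scaled dot-product self-attention at the core of the transformer applied to a toy window of a declining period index; the embedding dimension, the random weight matrices, and the input map are illustrative stand-ins for learned parameters, not the paper's actual configuration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (Vaswani et al., 2017).

    X: (T, d) sequence of T time steps with d features.
    Returns a (T, d) sequence in which each output position is a
    softmax-weighted mix of all time steps, so long-range dependencies
    in the window can be captured in a single step."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])           # (T, T) similarities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # softmax over time steps
    return w @ V

# Toy example: a window of 10 values of a declining period index,
# embedded into d = 4 features by a random linear map (a stand-in
# for the learned input embedding).
rng = np.random.default_rng(0)
kappa = np.linspace(20.0, -20.0, 10).reshape(-1, 1)  # (10, 1)
embed = rng.normal(size=(1, 4))
X = kappa @ embed                                     # (10, 4)
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (10, 4)
```

In a full transformer, this layer would be combined with positional encodings, multiple heads, and feed-forward sublayers, and a final linear head would produce the one-step-ahead forecast of the period index.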


Antonio, K., Bardoutsos, A., and Ouburg, W. (2015), Bayesian Poisson log-bilinear models for mortality projections with multiple populations. European Actuarial Journal, 5(2), 245-281.
Box, G. E., Jenkins, G. M., Reinsel, G. C., and Ljung, G. M. (2015), Time series analysis: forecasting and control. New York: John Wiley & Sons.
Brouhns, N., Denuit, M., and Vermunt, J. K. (2002), A Poisson log-bilinear regression approach to the construction of projected lifetables. Insurance: Mathematics and Economics, 31(3), 373-393.
Brownlee, J. (2017), Long short-term memory networks with python: develop sequence prediction models with deep learning. Machine Learning Mastery.
Choi, J. (2021), 6-parametric factor model with long short-term memory. Communications for Statistical Applications and Methods, 28(5), 521–536.
Chollet, F. et al. (2015), Keras. https://keras.io.
Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014), Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555.
Farsani, R. M., and Pazouki, E. (2021), A transformer self-attention model for time series forecasting. Journal of Electrical and Computer Engineering Innovations (JECEI), 9(1), 1-10.
Géron, A. (2019), Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media.
Glorot, X., and Bengio, Y. (2010), Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings, 249-256.
Hochreiter, S., and Schmidhuber, J. (1997), Long short-term memory. Neural computation, 9(8), 1735-1780.
Hunt, A., and Blake, D. (2021), On the structure and classification of mortality models. North American Actuarial Journal, 25(1), 215-234.
Hyndman, R. J., Koehler, A. B., Ord, J. K., and Snyder, R. D. (2008), Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media.
Hyndman, R. J., and Khandakar, Y. (2008), Automatic time series forecasting: the forecast package for R. Journal of Statistical Software, 26(3), 1-22.
Lee, R. D., and Carter, L. R. (1992), Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87(419), 659-671.
Li, J. (2013), A Poisson common factor model for projecting mortality and life expectancy jointly for females and males. Population Studies, 67(1), 111-126.
Li, N., and Lee, R. (2005), Coherent mortality forecasts for a group of populations: An extension of the Lee–Carter method. Demography, 42(3), 575-594.
Nigri, A., Levantesi, S., Marino, M., Scognamiglio, S., and Perla, F. (2019), A deep learning integrated Lee–Carter model. Risks, 7(1), 33.
Pascanu, R., Mikolov, T., and Bengio, Y. (2013), On the difficulty of training recurrent neural networks. In International conference on machine learning, PMLR, 1310-1318.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019), PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (Eds.). Curran Associates, Inc., 8024-8035.
Perla, F., Richman, R., Scognamiglio, S. and Wüthrich, M. V. (2021), Time-series forecasting of mortality rates using deep learning. Scandinavian Actuarial Journal, 1-27.
R Core Team (2021), R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Renshaw, A. E. and Haberman, S. (2006), A cohort-based extension to the Lee–Carter model for mortality reduction factors. Insurance: Mathematics and Economics, 38(3), 556-570.
Richman, R. and Wüthrich, M. V. (2019), Lee and carter go machine learning: Recurrent neural networks. Available at SSRN 3441030.
Roshani, A., Izadi, M. and Khaledi, B. (2022), Bayesian poisson common factor model with overdispersion for mortality forecasting in multiple populations. Submitted.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986), Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
Ushey, K., Allaire, J., and Tang, Y. (2021), reticulate: Interface to Python. R package version 1.22.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł. and Polosukhin, I. (2017), Attention is all you need. In Advances in neural information processing systems, 5998-6008.
Villegas, A. M., Kaishev, V. K., and Millossovich, P. (2018), StMoMo: An R package for stochastic mortality modeling. Journal of Statistical Software, 84(3), 1-38.
Wu, N., Green, B., Ben, X., and O’Banion, S. (2020), Deep transformer models for time series forecasting: The influenza prevalence case. arXiv:2001.08317.
Volume 21, Issue 1
June 2022
Pages 81-103
  • Received: 15 April 2022
  • Revised: 25 November 2022
  • Accepted: 21 December 2022