Statistical Relationship between Quantitative and Dichotomous Variables: Student’s Test and Moving Average Approach

Document Type : Original Article

Authors
1 Federal state budgetary institution of science "Institute of Industrial Ecology" Ural Branch of the Russian Academy of Sciences
2 Institute of Industrial Ecology, Ural Branch of the Russian Academy of Sciences
Abstract
A new technique is proposed for evaluating the statistical relationship between a quantitative variable Y and a dichotomous variable X that takes two values, X = 0 and X = 1. The technique divides the values of Y into strata using a moving average and computes the average values of Y and X within each stratum. Stratification turns the dichotomous variable X into a quantitative one, because the stratum average of X is the proportion of cases with X = 1 in that stratum. Once X has been transformed in this way, the statistical relationship between Y and X may be analyzed by linear regression and by analysis of variance. The proposed technique thus expands the range of methods available for analyzing statistical relationships between quantitative and dichotomous variables. Specific examples are used to compare the moving average technique with the t-test for symmetric (normal) and asymmetric distributions of the quantitative variable Y. It is shown that the statistical relationship between stratified Y and X can differ strongly for a symmetrically (normally) distributed variable Y.
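The stratification step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names, the window width of 40, the one-case step between windows, the synthetic data, and the choice to regress the stratum mean of X on the stratum mean of Y are all assumptions made for illustration.

```python
import random
from statistics import fmean


def moving_average_strata(y, x, window):
    """Sort cases by y, then average y and x over sliding windows (strata).

    Assumption: strata are overlapping windows of `window` consecutive
    cases, shifted by one case at a time.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    if not 1 <= window <= len(y):
        raise ValueError("window must be between 1 and the sample size")
    pairs = sorted(zip(y, x))  # order cases by the quantitative variable
    y_bar, x_bar = [], []
    for start in range(len(pairs) - window + 1):
        stratum = pairs[start:start + window]
        y_bar.append(fmean([p[0] for p in stratum]))
        # Mean of a 0/1 variable = proportion of cases with X = 1
        x_bar.append(fmean([p[1] for p in stratum]))
    return y_bar, x_bar


def least_squares_line(u, v):
    """Slope and intercept of the ordinary least-squares line v = a + b*u."""
    u_mean, v_mean = fmean(u), fmean(v)
    sxx = sum((ui - u_mean) ** 2 for ui in u)
    sxy = sum((ui - u_mean) * (vi - v_mean) for ui, vi in zip(u, v))
    slope = sxy / sxx
    return slope, v_mean - slope * u_mean


# Synthetic data (hypothetical): Y is normal, shifted upward when X = 1
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(200)]
y = [rng.gauss(10 + 2 * xi, 3) for xi in x]

y_bar, x_bar = moving_average_strata(y, x, window=40)
slope, intercept = least_squares_line(y_bar, x_bar)
print(f"strata: {len(y_bar)}, slope: {slope:.3f}, intercept: {intercept:.3f}")
```

Because neighbouring windows share most of their cases, the stratum averages are strongly correlated with one another, so standard errors from an ordinary regression on these points should be read with caution.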

Volume 22, Issue 1
June 2023
Pages 123-135

  • Receive Date 28 March 2023
  • Revise Date 22 June 2023
  • Accept Date 15 September 2023