On the Blocks of Interpoint Distances

Author

Department of Statistics, George Washington University, Washington, DC, USA

DOI: 10.52547/jirss.20.1.197

Abstract

We study blocks of interpoint distances: their distributions, correlations, independence, and the homogeneity of their total variances. We discuss the exact and asymptotic distributions of the interpoint distances and of their average under three models, and we relate the correlation of interpoint distances to their vector correlation and to the test of sphericity. We discuss testing independence of the blocks based on the correlation of block interpoint distances. We also present a homogeneity test of the total variances in each block and a simultaneous plot to visualize their relative ordering.

Keywords

Volume 20, Issue 1
June 2021
Pages 197-218
  • Receive Date: 23 July 2022
  • Revise Date: 19 May 2024
  • Accept Date: 23 July 2022