

Robust Multivariate Correlation Techniques: A Confirmation Analysis using Covid-19 Data Set

Friday Zinzendoff Okwonu, Nor Aishah Ahad, Joshua Sarduana Apanapudor and Festus Irismisose Arunaye

Pertanika Journal of Science & Technology, Volume 29, Issue 2, April 2021

DOI: https://doi.org/10.47836/pjst.29.2.16

Keywords: Coefficient of determination, Covid-19, multivariate correlation techniques, robust

Published on: 30 April 2021

Robust multivariate correlation techniques are proposed to determine the strength of the association between two or more variables of interest, since the existing multivariate correlation techniques are susceptible to outliers in the data set. The performance of the proposed techniques was compared with that of the conventional multivariate correlation techniques. All techniques under study were applied to Covid-19 data sets for Malaysia and Nigeria to determine the level of association between the study variables: confirmed, discharged, and death cases. The techniques were evaluated based on the multivariate correlation (R), the multivariate coefficient of determination (R^2), and the Adjusted R^2. The proposed techniques showed R = 0.99, whereas the conventional methods showed R ranging from 0.44 to 0.73. The R^2 and the Adjusted R^2 for the proposed methods are 0.98 and 0.97, while the conventional methods showed R^2 equal to 0.53, 0.44, and 0.19 and Adjusted R^2 equal to 0.52, 0.43, and 0.18, respectively. The proposed techniques strongly affirmed that for any patient to be discharged or to die of Covid-19, the patient must first be confirmed Covid-19 positive, whereas the conventional methods showed moderate to very weak affirmation. Based on these results, the proposed techniques are robust and show a much stronger association between the variables of interest than the conventional techniques.
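
For readers unfamiliar with the three evaluation measures, the following is a minimal Python sketch of the conventional (non-robust) multiple correlation R, its square R^2, and the Adjusted R^2. It is not the authors' proposed robust technique, which the abstract does not specify. The function name `multiple_correlation`, the synthetic data, and the choice of confirmed cases as the dependent variable are all assumptions made for illustration only.

```python
import numpy as np


def multiple_correlation(y, X):
    """Return (R, R^2, adjusted R^2) for y against the columns of X.

    Conventional Pearson-based multiple correlation; NOT a robust estimator.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, k = X.shape

    # Pearson correlation matrix of the predictors and the response together.
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    r_xx = corr[:k, :k]   # predictor-predictor correlations
    r_xy = corr[:k, k]    # predictor-response correlations

    # R^2 = r_xy' * inv(r_xx) * r_xy; clip to guard against rounding error.
    r2 = float(np.clip(r_xy @ np.linalg.solve(r_xx, r_xy), 0.0, 1.0))

    # The adjustment penalises R^2 for the number of predictors k.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return float(np.sqrt(r2)), r2, adj_r2


# Synthetic, hypothetical daily counts (not the Malaysian or Nigerian data).
rng = np.random.default_rng(0)
n_days = 60
confirmed = np.cumsum(rng.integers(5, 50, size=n_days)).astype(float)
discharged = 0.7 * confirmed + rng.normal(0.0, 20.0, size=n_days)
deaths = 0.02 * confirmed + rng.normal(0.0, 2.0, size=n_days)

R, R2, adj_R2 = multiple_correlation(
    confirmed, np.column_stack([discharged, deaths])
)
print(f"R = {R:.3f}, R^2 = {R2:.3f}, Adjusted R^2 = {adj_R2:.3f}")
```

Because this sketch relies on Pearson correlations, a handful of extreme observations can shift all three values considerably, which is the sensitivity to outliers that motivates the robust alternatives.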

ISSN 1511-3701

e-ISSN 2231-8542

Article ID: JST-2242-2020

