Shamihah Muhammad Ghazali, Norshahida Shaadan and Zainura Idrus
Pertanika Journal of Science & Technology, Volume 29, Issue 4, October 2021
Keywords: Air quality, empirical orthogonal functions, imputation, long gap missing values, PM10
Published on: 29 October 2021
Missing values are a major problem in many fields of environmental research, leading to inaccurate predictions and biased analysis results. This study compares the performance of the existing Empirical Orthogonal Functions (EOF) based imputation method, the EOF mean-centred approach (EOF-mean), with several proposed EOF-based methods, which include EOF-median, EOF-trimmean and the newly applied Regularised Expectation-Maximisation Principal Component Analysis based method (R-EMPCA), in estimating long gap sequences of missing values in a Single Site Temporal Time-Dependent (SSTTD) multivariate air quality (PM10) data set. The study used real PM10 data from the Klang air quality monitoring station. The methods were assessed and evaluated via a simulation plan covering four percentages of missing values (5, 10, 20 and 30) and several long gap sequences (12, 24, 168 and 720 missing hourly points). Based on several performance indicators such as RMSE, MAE, R-Square and AI, R-EMPCA outperformed the other methods. The results also show that the proposed EOF-median and EOF-trimmean perform better than the existing EOF-mean method, with EOF-trimmean the best of the three. The methodology and findings of this study offer a solution to the problem of long gap sequences of missing values in SSTTD data sets.
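To make the comparison above concrete, the following is a minimal sketch of iterative EOF-based gap filling with a choice of centring statistic, followed by RMSE and MAE on a simulated long gap. It is not the authors' implementation. The function name `eof_impute`, its parameters, the days-by-hours layout, the synthetic data and the single 12-hour gap are all assumptions made for illustration; R-EMPCA, EOF-trimmean, R-Square and AI are not covered.

```python
import numpy as np


def eof_impute(X, n_modes=2, center="mean", n_iter=200, tol=1e-6):
    """Fill NaNs in X by iterating a truncated SVD (EOF) reconstruction.

    Hypothetical helper: center="mean" mimics an EOF-mean style centring,
    any other value uses the column median (EOF-median style).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if not missing.any():
        return X.copy()
    stat = np.nanmean if center == "mean" else np.nanmedian
    # Column centre computed once from the observed values only.
    col_center = stat(X, axis=0)
    # Initial guess: replace each missing value by its column centre.
    filled = np.where(missing, col_center, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled - col_center, full_matrices=False)
        # Reconstruct from the leading n_modes EOFs and add the centre back.
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes] + col_center
        change = np.sqrt(np.mean((recon[missing] - filled[missing]) ** 2))
        filled[missing] = recon[missing]  # observed values are never altered
        if change < tol:
            break
    return filled


# Synthetic stand-in for hourly PM10 (assumption): rows are days, columns are hours.
rng = np.random.default_rng(0)
n_days, n_hours = 60, 24
hours = np.arange(n_hours)
level = 50 + rng.normal(0, 5, size=(n_days, 1))
amplitude = rng.uniform(10, 30, size=(n_days, 1))
X_true = (level + amplitude * np.sin(2 * np.pi * hours / 24)
          + rng.normal(0, 1, size=(n_days, n_hours)))

# Simulate one long gap of 12 consecutive hours within a single day.
X_gappy = X_true.copy()
X_gappy[10, 6:18] = np.nan

X_hat = eof_impute(X_gappy, n_modes=2, center="median")

# Score only the artificially removed points.
gap = np.isnan(X_gappy)
err = X_hat[gap] - X_true[gap]
rmse = float(np.sqrt(np.mean(err ** 2)))
mae = float(np.mean(np.abs(err)))
print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}")
```

Passing `center="mean"` instead gives the mean-centred variant, so the two centring choices can be scored on the same simulated gap. A fuller simulation would repeat this over the missing percentages and gap lengths listed in the abstract.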