e-ISSN 2231-8526
ISSN 0128-7680

Article ID: JST-3498-2022


Prediction of Daily Air Pollutants Concentration and Air Pollutant Index Using Machine Learning Approach

Nurul A’isyah Mustakim, Ahmad Zia Ul-Saufie, Wan Nur Shaziayani, Norazian Mohamad Noor and Sofianita Mutalib

Pertanika Journal of Science & Technology, Volume 31, Issue 1, January 2023


Keywords: data mining, decision tree, gradient boosted trees, modeling, PM10, random forest

Published on: 3 January 2023

The major air pollutants in Malaysia are carbon monoxide (CO), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and particulate matter. Predicting air pollutant concentrations can help the government monitor air quality and raise public awareness. This study therefore aims to predict air pollutant concentrations for the next day. It focuses on an industrial area, the Petaling Jaya monitoring station in Selangor. The data, covering 2004 to 2018, were obtained from the Department of Environment. Predictive models were constructed using a tree-based approach to predict next-day air pollutant concentrations. Of the three models compared, random forest performed best. For PM10, the random forest achieved an RMSE of 15.7611–19.0153, an NAE of 0.6508–0.8216, and an R² of 0.346–0.5911. For SO2, the RMSE was 0.0016–0.0017, the NAE was 0.7056–0.8052, and the R² was 0.3219–0.4676. For NO2, the RMSE was 0.0062–0.0075, the NAE was 0.7892–0.9591, and the R² was 0.0814–0.3609. For CO, the RMSE (0.3438–0.3975), NAE (0.7387–0.9015), and R² (0.2005–0.4399) were all within acceptable limits. For O3, the RMSE was 0.0051–0.0057, the NAE was 0.8386–0.9263, and the R² was 0.1379–0.2953. The air pollutant index (API) calculation results indicate that PM10 is a significant pollutant in representing the API.
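
The abstract reports RMSE, NAE, and R² without giving their formulas. As a rough illustration only, the Python sketch below computes the three measures under commonly used definitions: root mean squared error, normalized absolute error as the sum of absolute errors divided by the sum of observations, and R² as one minus the ratio of residual to total sum of squares. These definitions are assumptions and may differ from the ones used in the full article; the function names and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def nae(observed, predicted):
    """Normalized absolute error: sum of |error| over sum of observations (assumed definition)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / sum(observed)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Made-up daily concentrations for illustration; not data from the study.
observed = [40.0, 55.0, 60.0, 45.0]
predicted = [42.0, 50.0, 63.0, 47.0]

print("RMSE:", rmse(observed, predicted))
print("NAE:", nae(observed, predicted))
print("R2:", r_squared(observed, predicted))
```

Under these definitions, lower RMSE and NAE and higher R² indicate a closer fit, which is how the ranges quoted in the abstract would be read.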

