PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY

 

e-ISSN 2231-8526
ISSN 0128-7680


Random Forest Model for Software Build Time Prediction on CI/CD Pipeline

Wen Han Seow, Chia Yean Lim and Sau Loong Ang

Pertanika Journal of Science & Technology, Volume 33, Issue 2, March 2025

DOI: https://doi.org/10.47836/pjst.33.2.22

Keywords: CI/CD, machine learning, random forest, regression, software engineering

Published on: 2025-03-07

In the fast-paced world of software engineering, Continuous Integration/Continuous Delivery (CI/CD) pipelines are essential for delivering software builds continuously. However, the time software builds take to complete on these pipelines varies, which complicates delivery scheduling and can reduce productivity. To the best of the researchers' knowledge, machine learning techniques have not previously been used to predict software build time on a CI/CD pipeline. To address this gap, this research applied data science and machine learning techniques, including linear regression (LR), support vector regressor (SVR), random forest regressor (RFR), and XGBoost regressor, to predict software build completion time. Past build events were used as the dataset to train the models and to identify the one that best predicts how long a software build takes to complete. Factors contributing to build time on the CI/CD pipeline were also analyzed to identify opportunities for improvement. The random forest (RF) model performed best, with a mean squared error (MSE) of 14.306. This model could be deployed to provide completion time estimates for software builds, enabling better code delivery scheduling. The research also identified major factors behind long build times that engineers could address to shorten builds on the CI/CD pipeline.
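The abstract ranks candidate models by mean squared error (MSE), where lower is better. The sketch below illustrates that selection step only, and it is not the authors' code. The function names, the model labels `model_a` and `model_b`, the build times, and the choice of minutes as the unit are all assumptions invented for illustration.

```python
from statistics import fmean


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return fmean((a - p) ** 2 for a, p in zip(actual, predicted))


def best_model(actual, predictions_by_model):
    """Return (name, mse) for the candidate with the lowest MSE."""
    scores = {
        name: mean_squared_error(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    name = min(scores, key=scores.get)
    return name, scores[name]


# Hypothetical held-out build times (unit assumed to be minutes; invented values).
actual_minutes = [12.0, 30.0, 18.0, 45.0]

# Hypothetical predictions from two placeholder candidate models.
predictions = {
    "model_a": [10.0, 33.0, 20.0, 40.0],
    "model_b": [12.0, 29.0, 18.0, 44.0],
}

name, score = best_model(actual_minutes, predictions)
print(name, score)
```

Because MSE squares each error, a few badly mispredicted builds can dominate the score, which is worth keeping in mind when comparing models on build logs with occasional very long builds.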
