e-ISSN 2231-8526
ISSN 0128-7680
Wen Han Seow, Chia Yean Lim and Sau Loong Ang
Pertanika Journal of Science & Technology, Volume 33, Issue 2, March 2025
DOI: https://doi.org/10.47836/pjst.33.2.22
Keywords: CI/CD, machine learning, random forest, regression, software engineering
Published on: 2025-03-07
In the fast-paced world of software engineering, Continuous Integration/Continuous Delivery (CI/CD) pipelines are essential for delivering software builds continuously. However, the time that builds take to complete on these pipelines varies, which makes software delivery harder to schedule and can reduce productivity. To the best of the authors' knowledge, machine learning techniques have not previously been used to predict software build time in a CI/CD pipeline. To address this gap, this research applied data science and machine learning techniques, including linear regression (LR), support vector regressor (SVR), random forest regressor (RFR), and XGBoost regressor, to predict software build completion time. Past build events were used as the dataset to train the models, and the best-performing model was identified by evaluating how accurately each one predicted the time a software build takes to complete. Factors contributing to software build time on the CI/CD pipeline were also analyzed to identify opportunities for improvement. The random forest regressor (RFR) model achieved the best performance, with a mean squared error (MSE) of 14.306. This model could be deployed to provide completion time estimates for software builds, enabling better scheduling of code delivery. The research also identified the major factors behind long build times, which engineers could address to reduce software build time in the CI/CD pipeline.
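As an illustration of the model-selection step described in the abstract, the following minimal sketch computes the MSE of several candidate models against observed build times and picks the one with the lowest error. It is not the authors' implementation: the function names, the build times, and the per-model predictions are hypothetical toy values chosen only to show the arithmetic, and no models are actually trained here.

```python
from statistics import fmean


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return fmean((a - p) ** 2 for a, p in zip(actual, predicted))


def best_model(actual, predictions_by_model):
    """Return (name, mse) for the model with the lowest MSE."""
    scores = {
        name: mean_squared_error(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    name = min(scores, key=scores.get)
    return name, scores[name]


# Hypothetical observed build times (toy values, not the paper's data).
actual_times = [12.0, 30.0, 18.0, 45.0]

# Hypothetical predictions from four already-trained regressors.
predictions = {
    "LR": [15.0, 25.0, 20.0, 38.0],
    "SVR": [14.0, 27.0, 19.0, 40.0],
    "RFR": [12.5, 29.0, 18.5, 44.0],
    "XGBoost": [13.0, 28.0, 17.0, 43.0],
}

name, score = best_model(actual_times, predictions)
print(f"best model: {name} (MSE = {score:.3f})")
```

Because MSE squares each error, a few badly mispredicted builds weigh more heavily than many small misses, which is worth keeping in mind when the estimates are used for delivery scheduling.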