e-ISSN 2231-8526
ISSN 0128-7680

Article ID: JST-3903-2022


Workload Characterization and Classification: A Step Towards Better Resource Utilization in a Cloud Data Center

Avita Katal, Susheela Dahiya and Tanupriya Choudhury

Pertanika Journal of Science & Technology, Volume 31, Issue 5, August 2023


Keywords: Classification, cloud data center, clustering, Gaussian mixture model, K Means, workload

Published on: 31 July 2023

Advancements in virtualization technology have led to better utilization of existing infrastructure. Virtualization allows numerous virtual machines with different workloads to coexist on the same physical server, resulting in a pool of server resources. Understanding enterprise workloads is critical to correctly creating and configuring existing and future support in such pools. Managing resources in a cloud data center is one of the most difficult tasks, owing to the dynamic nature of the cloud environment and its high level of uncertainty. The diverse Quality of Service (QoS) requirements of cloud applications make data center management harder still. Accurate forecasting of future resource demand is required to meet QoS needs and ensure better resource utilization. Consequently, data center workload modeling and categorization are needed to meet software quality requirements cost-effectively. This paper uses Bitbrain’s data traces to characterize and categorize workload. Clustering (K Means and Gaussian mixture model (GMM)) and classification strategies (K Nearest Neighbors, Logistic Regression, Decision Trees, Random Forest, and Support Vector Machine) are used to characterize and model the workload traces. K Means outperforms GMM on both the Calinski-Harabasz index and the Davies-Bouldin score. The Decision Tree achieves the highest accuracy, 99.18%, followed by K Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Multi-Layer Perceptron (MLP), and Back Propagation Neural Networks.
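To make the clustering comparison concrete, the following is a minimal pure-Python sketch of the two validity measures named in the abstract. It is not the authors' implementation: the sample points (hypothetical CPU and memory utilization percentages) and the two labelings are invented for illustration, and the labelings merely stand in for the outputs of two competing clustering methods such as K Means and GMM. A higher Calinski-Harabasz index and a lower Davies-Bouldin score both indicate more compact, better-separated clusters.

```python
import math

def centroid(points):
    """Mean position of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def group_by_label(X, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for x, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(x)
    return clusters

def calinski_harabasz(X, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    clusters = group_by_label(X, labels)
    n, k = len(X), len(clusters)
    overall = centroid(X)
    between = sum(len(pts) * math.dist(centroid(pts), overall) ** 2
                  for pts in clusters.values())
    within = sum(math.dist(p, centroid(pts)) ** 2
                 for pts in clusters.values() for p in pts)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(X, labels):
    """Average worst-case similarity between each cluster and the others (lower is better)."""
    clusters = group_by_label(X, labels)
    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
               for lab, pts in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)

# Hypothetical (cpu %, memory %) samples: three light and three heavy workloads.
X = [(10, 20), (12, 22), (11, 19), (80, 70), (82, 72), (79, 69)]
labels_a = [0, 0, 0, 1, 1, 1]  # assumed output of one clustering method
labels_b = [0, 0, 1, 1, 1, 1]  # assumed output of another, with one point misassigned

for name, labels in (("A", labels_a), ("B", labels_b)):
    print(name, calinski_harabasz(X, labels), davies_bouldin(X, labels))
```

In this toy case labeling A scores higher on the Calinski-Harabasz index and lower on the Davies-Bouldin score than labeling B, which is the same kind of agreement between the two measures that the abstract reports in favor of K Means. On real traces a library implementation, for example scikit-learn's `calinski_harabasz_score` and `davies_bouldin_score`, would normally be used instead.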

