e-ISSN 2231-8526
ISSN 0128-7680

JST Vol. 29 (2) Apr. 2021 (Regular Issue), Article ID: JST-2309-2020


Assessing Malaysian University English Test (MUET) Essay on Language and Semantic Features Using Intelligent Essay Grader (IEG)

Wee Sian Wong and Chih How Bong

Pertanika Journal of Science & Technology, Volume 29, Issue 2, April 2021


Keywords: Artificial intelligence, automated essay scoring, intelligent system in education, machine learning, MUET, natural language processing

Published on: 30 April 2021

Automated Essay Scoring (AES) refers to Artificial Intelligence (AI) applications that assess and score essays. Several well-known commercial AES systems have been adopted in Western countries, and many research works have investigated automated essay scoring. However, most of these products and studies are not related to the Malaysian English test context. The AES products tend to score essays based on the scoring rubrics of a particular English test context (e.g., TOEFL, GMAT), employing proprietary scoring algorithms that are not accessible to users. In Malaysia, research and development of AES is scarce. This paper formulates a Malaysia-based AES, namely the Intelligent Essay Grader (IEG), for the Malaysian English test environment, using our collection of two Malaysian University English Test (MUET) essay datasets. We proposed an essay scoring rubric based on language and semantic features. We analyzed the correlation of the proposed language and semantic features with the essay grade using the Pearson Correlation Coefficient. Furthermore, we constructed an essay scoring model to predict the essay grades. We found that language features such as vocabulary count and advanced part of speech were highly correlated with the essay grades, and that the language features showed a greater influence on essay grades than the semantic features. The prediction model achieved its best accuracy when using the selected highly correlated essay features, followed by the language features.
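
The correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It computes the Pearson Correlation Coefficient between one essay feature and the essay grades. The function name `pearson_r`, the variable names, and the toy numbers are assumptions made for illustration only; they are not the authors' data, feature set, or implementation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (NOT taken from the paper): one feature value
# and one grade per essay.
vocabulary_counts = [120, 150, 180, 210, 260, 300]
essay_grades = [2, 3, 3, 4, 5, 6]

r = pearson_r(vocabulary_counts, essay_grades)
print(f"Pearson r between vocabulary count and grade: {r:.3f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship between the feature and the grade, while a value near 0 indicates little linear relationship. Repeating the calculation for each candidate feature is one plausible way to rank features before building a prediction model.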

