Assessing Malaysian University English Test (MUET) Essay on Language and Semantic Features Using Intelligent Essay Grader (IEG)

Wee Sian Wong and Chih How Bong

Pertanika Journal of Science & Technology, Volume 29, Issue 2, April 2021

DOI: https://doi.org/10.47836/pjst.29.2.12

Keywords: Artificial intelligence, automated essay scoring, intelligent system in education, machine learning, MUET, natural language processing

Published on: 30 April 2021

Abstract

Automated Essay Scoring (AES) refers to Artificial Intelligence (AI) applications that have the “intelligence” to assess and score essays. Several well-known commercial AES systems have been adopted in western countries, and many research works have investigated automated essay scoring. However, most of these products and research works do not address the Malaysian English test context. AES products tend to score essays against the scoring rubrics of a particular English test context (e.g., TOEFL, GMAT) using proprietary scoring algorithms that are not accessible to users. In Malaysia, research and development of AES is scarce. This paper formulates a Malaysia-based AES, namely the Intelligent Essay Grader (IEG), for the Malaysian English test environment, using our collection of two Malaysian University English Test (MUET) essay datasets. We proposed an essay scoring rubric based on language and semantic features. We analyzed the correlation of the proposed language and semantic features with the essay grade using the Pearson Correlation Coefficient. Furthermore, we constructed an essay scoring model to predict the essay grades. We found that language features such as vocabulary count and advanced part of speech were highly correlated with the essay grades, and that the language features showed a greater influence on essay grades than the semantic features. The prediction model yielded its best accuracy when built on the selected highly correlated essay features, followed by the model built on the language features.
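
The correlation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the crude whitespace-based definition of vocabulary count, and the toy numbers in the usage section are all assumptions made for illustration, and none of the values come from the MUET datasets. The sketch computes the Pearson Correlation Coefficient between each feature column and the essay grades, then ranks features by the absolute value of that coefficient, which is one plausible way to select highly correlated features.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def vocabulary_count(essay):
    """Count distinct lowercase word tokens (an assumed, simplified definition)."""
    words = (w.strip(".,;:!?\"'()").lower() for w in essay.split())
    return len({w for w in words if w})


def rank_features(feature_table, grades):
    """Return (feature_name, r) pairs sorted by |r|, strongest first."""
    scored = [(name, pearson_r(values, grades))
              for name, values in feature_table.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data, NOT taken from the paper's datasets.
    grades = [1, 2, 3, 4]
    features = {
        "vocabulary_count": [40, 55, 70, 90],
        "topic_similarity": [0.2, 0.5, 0.4, 0.6],
    }
    for name, r in rank_features(features, grades):
        print(f"{name}: r = {r:.3f}")
```

Ranking by absolute value treats strong negative correlations as equally informative as strong positive ones. The cut-off for "highly correlated" is left to the caller, since the abstract does not state the threshold used.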
