Named Entity Recognition in Hindi Using Hyperspace Analogue to Language and Conditional Random Field

Arti Jain and Anuja Arora

Pertanika Journal of Science & Technology, Volume 26, Issue 4, October 2018

Keywords: Conditional Random Field, Hindi, Hyperspace Analogue to Language, Named Entity Recognition

Published on: 24 Oct 2018

Named Entity Recognition (NER) is the identification and classification of Named Entities (NEs) into a set of well-defined categories. Many rule-based, machine-learning-based, and hybrid approaches have been devised for NER, particularly for the English language. Hindi, however, poses several challenges of its own, which are detailed in this paper. A new approach is proposed that performs Hindi NER using semantic properties in order to handle some of these Hindi-specific challenges. Because of the increasing demand for Hindi health care applications, Hindi Health Data (HHD) is crawled from four well-known Indian websites: Traditional Knowledge Digital Library, Ministry of Ayush, University of Patanjali, and Linguistic Data Consortium for Indian Languages. Four novel NE types are defined: Person NE, Disease NE, Symptom NE, and Consumable NE. For training, the HHD is converted into Hyperspace Analogue to Language (HAL) vectors, which map each word into a high-dimensional space. A Conditional Random Field (CRF) model is then applied, based on HHD feature engineering, HHD gazetteers, and HAL. Blind test data is mapped into the high-dimensional space created during the training phase, and the model outputs the annotated test data. The results obtained are significant, and HAL combined with CRF appears to provide an effective approach to Hindi NER.
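
As an illustration of the HAL step mentioned above, the following is a minimal sketch of how HAL vectors are commonly built: a window slides over the token stream, and each word accumulates weighted counts of the words that precede it, with closer words weighted more heavily. The window size, the linear weighting `window - distance + 1`, and the function names `build_hal` and `hal_vector` are assumptions made for this example; the abstract does not state the settings used in the paper.

```python
def build_hal(tokens, window=5):
    """Build a HAL co-occurrence table from a list of tokens.

    hal[w][c] accumulates the weight of context word c occurring
    before target word w within `window` positions. The weighting
    (window - distance + 1) is the usual HAL choice, assumed here.
    """
    vocab = sorted(set(tokens))
    hal = {w: {} for w in vocab}
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            context = tokens[j]
            weight = window - distance + 1
            hal[target][context] = hal[target].get(context, 0) + weight
    return vocab, hal


def hal_vector(word, vocab, hal):
    """Concatenate the row (preceding words) and column (following words)
    of `word` into one vector of length 2 * len(vocab)."""
    row = [hal[word].get(c, 0) for c in vocab]
    col = [hal[c].get(word, 0) for c in vocab]
    return row + col
```

Each word ends up with a vector whose length is twice the vocabulary size, which is the sense in which HAL maps words into a high-dimensional space. In a pipeline like the one described, such vectors (or values derived from them) could serve as additional per-token features for a CRF tagger, though how the paper combines them is not specified here.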

ISSN 0128-7680

e-ISSN 2231-8526

Article ID: JST-0994-2018
