Identification of Gender of the Author of a Written Text using Topic-Independent Features

Tatiana Litvinova, Pavel Seredin, Olga Litvinova and Olga Zagorovskaya

Pertanika Journal of Tropical Agricultural Science, Volume 26, Issue 1, March 2018

Keywords: Authorship profiling, corpus, corpus linguistics, gender attribution, gender identification, Russian language, stylometry

Published on: 20 March 2018

Authorship profiling, the process of extracting information about a text's author through linguistic analysis, is gaining momentum as an interdisciplinary subject. Scholars who employ this technique (e.g. data analysis specialists, linguists, psychologists) study the identification of demographics, personality traits, education and the native language of authors of texts, among others. Gender is the most frequently studied variable in this context. Some studies report accuracy of 80% or higher in identifying the gender of a text's author. However, many issues remain to be addressed. Firstly, most of the previous research concerns English texts. Secondly, most papers focus on content-based features, which are easy to imitate. Thirdly, many recent papers in the field use machine-learning algorithms with an emphasis on accuracy rather than on the differences between male and female writing. The objective of this paper is to reveal differences between male and female Russian written texts and to design a mathematical model that identifies the gender of authors using only high-frequency topic-independent text parameters. Special emphasis is placed on comparing the differences found between male and female written texts with those previously reported for Russian and other languages. An original mathematical solution for identifying an author's gender is set forth.
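
To illustrate what "high-frequency topic-independent text parameters" could look like in practice, the following is a minimal Python sketch. It is not the authors' model: the feature set, the function name `topic_independent_features`, and the short list of Russian function words are all assumptions chosen only for illustration. The point is that each parameter depends on how a text is written rather than on what it is about.

```python
import re

# Hypothetical, deliberately short list of Russian function words.
# It is an assumption for illustration and is not taken from the paper.
FUNCTION_WORDS = {"и", "в", "не", "на", "что", "с", "а", "как", "но", "по", "к", "у"}


def topic_independent_features(text: str) -> dict:
    """Compute a few simple style features that do not depend on topic."""
    # Word tokens: runs of Unicode word characters, lower-cased.
    tokens = re.findall(r"\w+", text.lower())
    n_tokens = len(tokens)
    if n_tokens == 0:
        # Avoid division by zero for empty or punctuation-only input.
        return {
            "function_word_ratio": 0.0,
            "avg_word_length": 0.0,
            "type_token_ratio": 0.0,
            "punct_per_token": 0.0,
        }

    # Punctuation marks: any character that is neither a word character nor whitespace.
    n_punct = len(re.findall(r"[^\w\s]", text))
    n_function = sum(1 for t in tokens if t in FUNCTION_WORDS)

    return {
        # Share of tokens that are function words.
        "function_word_ratio": n_function / n_tokens,
        # Mean token length in characters.
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        # Distinct tokens divided by total tokens (lexical diversity).
        "type_token_ratio": len(set(tokens)) / n_tokens,
        # Punctuation marks per word token.
        "punct_per_token": n_punct / n_tokens,
    }


if __name__ == "__main__":
    print(topic_independent_features("Я не знаю, что и как."))
```

Feature vectors of this kind could then be passed to any statistical model. The abstract does not specify which parameters or which model the authors used, so this sketch should be read only as an example of the general idea.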

ISSN 1511-3701

e-ISSN 2231-8542

Article ID: JSSH-1661-2016
