e-ISSN 2231-8542
ISSN 1511-3701
Nurul Adzlyana, M. S., Rosma, M. D. and Nurazzah, A. R.
Pertanika Journal of Tropical Agricultural Science, Volume 25, Issue 2, April 2017
Keywords: Categorical data, context-based, data mining, similarity measure
Published on: 27 Apr 2017
Data mining processes such as clustering, classification, regression and outlier detection are built on the similarity between two objects. Mining categorical data is found to be the most challenging case. Earlier similarity measures are context-free. In recent years, researchers have proposed context-sensitive similarity measures based on the relationships between objects. This paper provides an in-depth review of context-based similarity measures. The algorithms of four context-based similarity measures, namely the Association-based similarity measure, DILCA, CBDL and the hybrid context-based similarity measure, are described. The advantages and limitations of each measure are identified and explained. Context-based similarity measures are highly recommended for data mining tasks on categorical data. The findings of this paper will help data miners choose appropriate similarity measures to achieve more accurate classification or clustering results.