This paper discusses the training and evaluation of CRF and SVM taggers for three Indo-Aryan languages: Hindi, Odia and
Bhojpuri. The corpus is annotated with the Bureau of Indian Standards (BIS) annotation scheme, a common annotation
standard for Indian languages. The main objective of the paper is to give an idea of the error patterns of these algorithms and to
offer suggestions based on them. Each experiment uses 90k tokens of training data and 2k tokens of test data per language, for ease of
comparison across languages. The evaluation report covers each tool (SVM and CRF++) in terms of accuracy, error
analysis, error patterns and the common errors of the system. The accuracy of the SVM taggers ranges from 88% to 93.7%,
whereas that of the CRF taggers ranges from 82% to 86.7%. CRF performs worse than SVM for Odia and Hindi, but this is not the case for
Bhojpuri. In this study, we have observed that languages with more variation are better suited to CRF than to SVM.
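
The token-level accuracy and error patterns mentioned above can be illustrated with a minimal sketch. It is not the authors' evaluation script: the function name evaluate_tags and the sample tag sequences are hypothetical, and the tags are only illustrative BIS-style labels. It assumes the gold and predicted tags are already aligned token by token.

```python
from collections import Counter


def evaluate_tags(gold, predicted):
    """Return (accuracy, Counter of (gold_tag, predicted_tag) mismatches)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot evaluate an empty tag sequence")
    # Count every position where the tagger disagrees with the gold tag.
    errors = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    correct = len(gold) - sum(errors.values())
    return correct / len(gold), errors


# Hypothetical five-token example with illustrative BIS-style tags.
gold = ["N_NN", "V_VM", "PSP", "N_NN", "JJ"]
pred = ["N_NN", "V_VM", "PSP", "N_NNP", "N_NN"]

accuracy, errors = evaluate_tags(gold, pred)
print(f"accuracy = {accuracy:.1%}")
print(errors.most_common(2))
```

Sorting the mismatch counts, as most_common does here, is one simple way to surface the most frequent confusions between tag pairs for a given tagger and language.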

Added on February 23, 2016


  More Details
  • Contributed by : Atul
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Atul Kr. Ojha, Pitambar Behera, Srishti Singh and Girish Nath Jha