In this paper, we present a comparative analysis of three methods for statistical part-of-speech (POS) tagging, chunking and named entity recognition (NER) in Hindi, a morphologically rich language, using a large annotated corpus. The methods explored are Conditional Random Fields (CRF), Hidden Markov Models (HMM) and the Maximum Entropy Model (MaxEnt). We further propose an iterative approach to improve the results. To the best of our knowledge, no previous work has compared these three methods for statistical POS tagging, chunking and NER in Hindi using a large manually annotated corpus. The best accuracies achieved for POS tagging, chunking and NER are 94.00%, 91.70% and 56.03% with CRF; 92.96%, 89.23% and 48.21% with HMM; and 92.88%, 85.48% and 49.09% with MaxEnt. Our work shows that CRF performs consistently better than HMM and MaxEnt on all three tasks.
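HMMs are one of the three taggers compared above. To make concrete how an HMM tagger assigns POS tags, here is a minimal bigram HMM with add-one smoothing and Viterbi decoding. This is a sketch under stated assumptions, not the authors' implementation. The romanized Hindi training sentences and the IIIT-style tags (NNP, NN, VM, VAUX) are hypothetical toy data, and the paper's features, smoothing choices and iterative approach are not reproduced.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate add-alpha smoothed bigram HMM parameters from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = transition count
    emit = defaultdict(Counter)   # emit[tag][word] = emission count
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unknown words

    def log_trans(prev, tag):
        row = trans.get(prev, Counter())  # tags never seen as predecessors get a uniform row
        return math.log((row[tag] + alpha) / (sum(row.values()) + alpha * n_tags))

    def log_emit(tag, word):
        row = emit[tag]
        return math.log((row[word] + alpha) / (sum(row.values()) + alpha * n_words))

    return tags, log_trans, log_emit


def viterbi(words, model):
    """Return the most probable tag sequence for `words` (no end-of-sentence transition)."""
    tags, log_trans, log_emit = model
    # best[i][t]: highest log-probability of any tag sequence for words[:i+1] ending in t
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev_t = max(tags, key=lambda p: best[i - 1][p] + log_trans(p, t))
            best[i][t] = best[i - 1][prev_t] + log_trans(prev_t, t) + log_emit(t, words[i])
            back[i][t] = prev_t
    # Follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy corpus (romanized Hindi, IIIT-style tags), for illustration only
train = [
    [("raam", "NNP"), ("ghar", "NN"), ("gayaa", "VM")],
    [("siitaa", "NNP"), ("skuul", "NN"), ("gayii", "VM")],
    [("raam", "NNP"), ("kitaab", "NN"), ("padhtaa", "VM"), ("hai", "VAUX")],
]
model = train_hmm(train)
print(viterbi(["siitaa", "ghar", "gayii"], model))  # ['NNP', 'NN', 'VM']
```

Smoothing lets the decoder still tag unseen words such as a new proper noun, relying on transition probabilities. A CRF or MaxEnt tagger would instead score rich, overlapping features of the context, which is one common explanation for the CRF advantage reported above.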

Added on February 27, 2012


  More Details
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Manish Agarwal, Rahul Goutam, Ashish Jain, Sruthilaya Reddy Kesidi, Prudhvi Kosaraju, Shashikant Muktyar, Bharat Ambati, Rajeev Sangal