In this paper, we present a comparative analysis of three methods for statistical part-of-speech (POS) tagging, chunking and named entity recognition (NER) for Hindi, a morphologically rich language, using a large annotated corpus. The methods explored are Conditional Random Fields (CRF), Hidden Markov Models (HMM) and the Maximum Entropy model (MaxEnt). We further propose an iterative approach to improve the results. To the best of our knowledge, there is no previous comparative analysis of these three methods for statistical POS tagging, chunking and NER in Hindi using a large manually annotated corpus. The maximum accuracies achieved for POS tagging, chunking and NER are (94.00%, 91.70%, 56.03%) with CRF, (92.96%, 89.23%, 48.21%) with HMM and (92.88%, 85.48%, 49.09%) with MaxEnt. Our work shows that CRF performs consistently better than HMM and MaxEnt on all three tasks.