The paper presents a CRF-based hybridized chunker for Hindi and Indian English. The immediate
goal is to chunk text data in the ILCI project funded by DeitY, Govt. of India. The experiment was
conducted on 25k annotated sentences from the health and tourism domains: 23k sentences were
used for training and the remaining 2k for evaluation. The experiment involved the following
stages: training the chunker; automatic chunking and validation of the chunked output for Hindi
and Indian English; and finding measures to resolve the issues detected at different levels of the
experiment. The chunker for Indian English is developed on the ILMT chunk tag scheme to meet
the mapping requirements of the translation tool for English to Indian languages. The accuracies
of the Hindi and Indian English chunkers are 88.84% and 89.04%, respectively. For the Hindi
chunker, errors were observed in chunk categories such as noun (pronominal), verb finite, verb
non-finite (conjunct verb) and adjectival phrase. Errors such as finite-non-finite,
adverb-conjunction, wh-determiner and conjunction chunk are discussed in detail for the
development of the English chunker. A hybrid approach to error resolution has also been
attempted.
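
The abstract does not give the feature set, the CRF toolkit or the exact accuracy measure, so the following is only a minimal sketch of how such an experiment is commonly set up, not the authors' implementation. It assumes POS-tagged input, a one-token context window and token-level accuracy. The function names (`token_features`, `split_corpus`, `token_accuracy`) and the BIO-style chunk labels in the usage note are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, POS tag); POS-tagged input is an assumption


def token_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Build a feature dict for token i, of the kind typically fed to a CRF.

    Uses the word, its POS tag and the POS tags of the neighbouring tokens.
    The real feature template used in the paper is not stated in the abstract.
    """
    word, pos = sent[i]
    feats = {"word": word.lower(), "pos": pos}
    if i > 0:
        feats["prev_pos"] = sent[i - 1][1]
    else:
        feats["BOS"] = "1"  # beginning of sentence
    if i < len(sent) - 1:
        feats["next_pos"] = sent[i + 1][1]
    else:
        feats["EOS"] = "1"  # end of sentence
    return feats


def split_corpus(sents: Sequence, n_train: int) -> Tuple[Sequence, Sequence]:
    """Split a corpus into training and evaluation parts (e.g. 23k / 2k)."""
    return sents[:n_train], sents[n_train:]


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted chunk label matches the gold label."""
    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for g, p in zip(gold_sent, pred_sent):
            total += 1
            correct += int(g == p)
    return correct / total if total else 0.0
```

For example, `token_accuracy([["B-NP", "I-NP", "B-VGF"]], [["B-NP", "I-NP", "B-VGNF"]])` counts two matching labels out of three. A finite/non-finite confusion of this kind is one of the error types the abstract mentions.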

Added on June 8, 2016


  More Details
  • Contributed by : Atul
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Atul Kr. Ojha, Srishti Singh, Pitambar Behera and Girish Nath Jha