A well-annotated corpus is a valuable resource for Natural Language Processing (NLP) and can benefit research activities such as
Machine Translation, Text Summarization and Information Retrieval. However, because language is dynamic and complex, Part-of-Speech
(POS) annotation and local word grouping, or chunking, are challenging tasks for two main reasons: first, the annotation must capture
as much information as possible about the structure of a sentence; second, the tags should be easy for a machine to map and should
support the desired output of an effective application. This paper addresses issues faced in chunking verb groups in Hindi with
respect to mapping them to English verb groups for machine translation. Some verbal constructions in Hindi, such as double causatives
and serial constructions, are not present in English. Mapping Hindi verb groups to English for translation can therefore limit the
accuracy of the output. These divergences are charted with relevant examples from both languages. The purpose of describing them is
to find the most appropriate way of creating Chunk Annotation Tag-set standards, which are currently under development for Indian
languages.

Added on June 6, 2016


  More Details
  • Contributed by : Atul
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Esha Banerjee, Akanksha Bansal and Girish Nath Jha
Similar / Suggested Resources