Research in Part-of-Speech (POS) tagset
design for European and East Asian languages
started with a mere listing of important
morphosyntactic features in one language
and has matured in later years towards
hierarchical tagsets, decomposable tags,
and common frameworks for multiple languages
(such as EAGLES). Several tagsets
have been developed for these languages,
along with large amounts of annotated data
for furthering research. Indian Languages
(ILs) present a contrasting picture, with
very little research on tagset design issues.
We present our work on designing a common
POS-tagset framework for ILs, which
is the result of in-depth analysis of eight
languages from two major families, viz.
Indo-Aryan and Dravidian. Our framework
follows a hierarchical tagset layout similar to
the EAGLES guidelines, but with significant
changes as needed for the ILs.
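
To make the idea of a hierarchical, decomposable tag concrete, the following is a minimal Python sketch. It is an illustration only and is not taken from the paper or from the EAGLES guidelines: the three-level layout (category, subtype, attributes), the dotted string encoding, the names HierarchicalTag, decode, and coarsen, and the labels "N", "C", "gender", and "number" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HierarchicalTag:
    """A decomposable POS tag with three hypothetical levels."""
    category: str                                  # level 1, e.g. "N" (assumed label for noun)
    subtype: str = ""                              # level 2, e.g. "C" (assumed label for common)
    attributes: Tuple[Tuple[str, str], ...] = ()   # level 3: (name, value) pairs

    def encode(self) -> str:
        """Serialize the tag as a dotted string (format assumed for this sketch)."""
        parts = [self.category]
        if self.subtype:
            parts.append(self.subtype)
        parts.extend(f"{name}={value}" for name, value in self.attributes)
        return ".".join(parts)


def decode(text: str) -> HierarchicalTag:
    """Parse a dotted string back into its category, subtype, and attributes."""
    parts = text.split(".")
    subtype = ""
    attributes = []
    for part in parts[1:]:
        if "=" in part:
            name, value = part.split("=", 1)
            attributes.append((name, value))
        else:
            subtype = part
    return HierarchicalTag(parts[0], subtype, tuple(attributes))


def coarsen(tag: HierarchicalTag, level: int) -> HierarchicalTag:
    """Keep only the first `level` levels of the hierarchy (1, 2, or 3)."""
    if level <= 1:
        return HierarchicalTag(tag.category)
    if level == 2:
        return HierarchicalTag(tag.category, tag.subtype)
    return tag


# Hypothetical fine-grained tag: common noun, feminine, singular.
fine = HierarchicalTag("N", "C", (("gender", "fem"), ("number", "sg")))
coarse = coarsen(fine, 1)
```

Because each level can be dropped independently, the same annotation could in principle be viewed at a coarse level shared across languages or at a fine level specific to one language, which is the general motivation behind decomposable tags.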

Added on June 8, 2016


  More Details
  • Contributed by : Atul
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Sankaran Baskaran, Kalika Bali, Tanmoy Bhattacharya, Pushpak Bhattacharyya, Rajendran S, Saravanan K, Sobha L and Subbarao K V
