This article proposes an improved XML standard for storing online handwritten data in Indian languages. The standard has evolved over a period of two years and is currently used by the Consortium for online handwriting recognition of Indian languages to annotate about 100,000 handwritten words in each of six Indian languages: Tamil, Kannada, Telugu, Malayalam, Hindi and Bangla. So that the large amount of data being collected remains usable by future researchers, the data should be stored in a format that is unambiguous and easy to read. What distinguishes this refined standard is that it assigns quality labels to the data at different levels and provides for annotating all the peculiarities of writing the scripts of the Indian languages included in the current consortium project. The format also allows the use of automated and semi-automated annotation tools.
Added on August 19, 2014
Product Type : Research Paper
License Type : Freeware
Author : Swapnil Belhe, Srinivasa Chakravarthy, AG Ramakrishnan