A comprehensive recognition system has been developed for open-vocabulary, online handwritten text in the Tamil language. A page of text can be segmented at the line, word and then the symbol level. The symbols are recognized using an SVM classifier with an RBF kernel, trained to recognize 155 distinct Tamil symbols, which can make up all 313 different characters in Tamil. By analyzing
the cross‐validation performance of the classifier, the sets of confused symbols have been identified. If the recognition label of a symbol corresponds to that of a confused symbol, then the feature vector of the corresponding stroke group is fed to an expert
classifier trained only on the set of confused symbols. Then, the recognized symbols of each
word are corrected using a symbol-level bigram model derived from a large text corpus. Finally,
the sequence of symbol labels corresponding to each word is converted to the Tamil Unicode
sequence using a set of rules. The recognition engine at the level of a handwritten word has
been developed in C as a .dll and integrated with the census data collection application
developed by CDAC Pune. On the annotated dataset of 45,405 words collected from over a
hundred Tamil writers, the engine has a recognition performance of 83.2% at the symbol level
and 54.2% at the word level, without the use of expert classifiers. A separate SVM has been
trained to recognize the Indo‐Arabic numerals 0 to 9, with a cross‐validation accuracy of 98%.
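
The two-stage step described above (identify confused symbol sets from cross-validation results, then send a symbol to an expert classifier when its first-stage label falls in such a set) can be illustrated with a minimal sketch. The abstract does not state how the confused sets were derived, so the rule used here (merge any two classes whose row-normalised confusion rate exceeds a threshold) is an assumption, as are the names `confused_sets` and `route`, the threshold value, and the toy confusion counts.

```python
from collections import defaultdict


def confused_sets(confusion, threshold=0.05):
    """Group class indices that are frequently confused with each other.

    confusion[i][j] counts validation samples of true class i predicted
    as class j. Two classes are merged when the row-normalised rate
    confusion[i][j] / sum(confusion[i]) exceeds `threshold`
    (an assumed rule, not taken from the paper).
    """
    n = len(confusion)
    parent = list(range(n))  # union-find forest over class indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        total = sum(confusion[i])
        if total == 0:
            continue
        for j in range(n):
            if i != j and confusion[i][j] / total > threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(set)
    for i in range(n):
        groups[find(i)].add(i)
    # Only sets with two or more members need an expert classifier.
    return [g for g in groups.values() if len(g) > 1]


def route(label, groups):
    """Return the confused set containing `label`, or None to keep the label."""
    for g in groups:
        if label in g:
            return g
    return None


# Toy 4-class confusion matrix with made-up counts:
# classes 0 and 1 are often mistaken for each other.
toy = [
    [80, 18, 1, 1],
    [15, 83, 1, 1],
    [0, 1, 98, 1],
    [1, 0, 1, 98],
]
groups = confused_sets(toy, threshold=0.05)
```

In a full system, a non-`None` result from `route` would select the expert classifier trained on that set, and the feature vector of the stroke group would be re-classified by it; otherwise the first-stage label is kept.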
Added on September 23, 2014
Product Type : Research Paper
License Type : Freeware
Author : A. G. Ramakrishnan, Bhargava Urala, Suresh Sundaram, Harshitha PV.