How closely a machine translation into an Indian language matches the actual human translation can be evaluated on the basis of the grammatical classification of the Testsuite. Various lexical and structural categories have been identified, and these have been further subcategorized to give a fine-grained classification.
Added on November 10, 2010
Product Type : Linguistic Resources
License Type : Research
System Requirement : Windows, Linux, Mac OS X, Solaris
The Paradigm Table consists of part-of-speech categories that handle GNP (gender, number, person) information and the attachment of postpositions to the root word (with appropriate modification). The categorization follows the norms laid down by the AnglaBharati requirements.
The system uses the IITK-Punjabi Roman notation for the translation process. On the user interface side, this notation has to be converted to a standard encoding such as Punjabi Unicode. Similarly, lexical data entered in Punjabi Unicode has to be converted to IITK-Punjabi Roman notation for internal processing. The system output is also displayed in Devanagari and INSROT, to enable a non-native reader to read any target language.
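The two-way conversion described above can be illustrated with a minimal Python sketch. The mapping table here is a hypothetical toy table, not the actual IITK-Punjabi Roman notation, and the function names `roman_to_unicode` and `unicode_to_roman` are assumptions made for illustration. The sketch uses greedy longest-match lookup so that a two-letter token such as `kh` is preferred over `k`.

```python
# Hypothetical toy table: NOT the real IITK-Punjabi Roman notation.
ROMAN_TO_GURMUKHI = {
    "k": "\u0A15",   # GURMUKHI LETTER KA
    "kh": "\u0A16",  # GURMUKHI LETTER KHA
    "g": "\u0A17",   # GURMUKHI LETTER GA
    "m": "\u0A2E",   # GURMUKHI LETTER MA
    "l": "\u0A32",   # GURMUKHI LETTER LA
    "A": "\u0A3E",   # GURMUKHI VOWEL SIGN AA
}


def roman_to_unicode(text, table=ROMAN_TO_GURMUKHI):
    """Convert Roman notation to Unicode using greedy longest-match lookup."""
    keys = sorted(table, key=len, reverse=True)  # try longer tokens first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def unicode_to_roman(text, table=ROMAN_TO_GURMUKHI):
    """Convert Unicode back to Roman notation, one code point at a time."""
    reverse = {value: key for key, value in table.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = roman_to_unicode("khAl")
    print(word, unicode_to_roman(word))
```

A real converter would need the full notation table and rules for inherent vowels and conjuncts, which this sketch does not attempt.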
A post-processor is an integral part of any OCR system. This paper proposes a method for detecting and correcting errors in the recognition results of handwritten and machine-printed Gurmukhi OCR. Based on the shape similarity of characters, the consonants of the Gurmukhi script are divided into different sets.
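One way such shape-similarity sets could drive correction is sketched below in Python. This is an assumption-laden illustration, not the paper's method: the groupings in `CONFUSION_SETS` are placeholders rather than the sets defined in the paper, and the lexicon lookup step and the names `alternatives` and `correct` are hypothetical. The idea is that a word not found in the lexicon is re-spelled by swapping each character with the other members of its set, and only re-spellings that are in the lexicon are kept.

```python
from itertools import product

# Placeholder groupings for illustration only; the paper's actual sets
# are not reproduced here.
CONFUSION_SETS = [
    {"\u0A38", "\u0A2E"},  # GURMUKHI LETTER SA, GURMUKHI LETTER MA
    {"\u0A2A", "\u0A27"},  # GURMUKHI LETTER PA, GURMUKHI LETTER DHA
]


def alternatives(ch):
    """Return the characters that may be confused with ch, including ch."""
    for group in CONFUSION_SETS:
        if ch in group:
            return sorted(group)
    return [ch]


def correct(word, lexicon):
    """Return lexicon words reachable by swaps within confusion sets."""
    if word in lexicon:
        return [word]  # already a known word, nothing to correct
    candidates = (
        "".join(chars)
        for chars in product(*(alternatives(c) for c in word))
    )
    return sorted(c for c in candidates if c in lexicon)


if __name__ == "__main__":
    toy_lexicon = {"\u0A2E\u0A3E\u0A32"}  # a one-word toy lexicon
    print(correct("\u0A38\u0A3E\u0A32", toy_lexicon))
```

The number of candidates grows exponentially with the number of confusable characters in a word, so a practical system would likely prune or rank candidates rather than enumerate them all.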