e-Aksharayan is a simplified, robust OCR software for printed Indian scripts that delivers reasonable performance for converting legacy printed documents into an electronically accessible format. The system is the outcome of work by consortium members sponsored by the Ministry of Electronics and Information Technology. The preprocessing modules, such as noise cleaning, skew detection, and binarization, were developed by different consortium institutes. The language vertical tasks and integration were carried out by various consortium members.
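To make the binarization step concrete, the sketch below shows Otsu's method, a common global thresholding technique for separating ink from paper in a grayscale scan. It is an illustration only: the source does not say which binarization algorithm e-Aksharayan uses, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    `pixels` is a flat list of integer gray values in the range 0-255.
    Illustrative only; not the e-Aksharayan implementation.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break               # all pixels are already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels):
    """Map each pixel to 0 (dark, value <= threshold) or 1 (light)."""
    t = otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # A toy "scan": 50 dark ink pixels followed by 50 light paper pixels.
    scan = [10] * 50 + [200] * 50
    print(otsu_threshold(scan))
    print(binarize(scan)[:3], binarize(scan)[-3:])
```

A single global threshold works best on evenly lit scans; degraded legacy documents often need locally adaptive thresholding instead.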
The potential of e-Aksharayan is enormous, as it enables users to harness the power of computers to access printed documents in Indian languages and scripts.
The present version of e-Aksharayan supports the following major Indian languages: Hindi, Bangla, Malayalam, Gurmukhi, Tamil, Kannada, and Assamese.
It converts printed document images to editable text with recognition accuracy of up to 90-95% at the character level and 85-90% at the word level.
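For readers unfamiliar with the two figures, the sketch below shows one common way that character-level and word-level accuracy are computed: the edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. This is an assumption about the metric, since the source does not describe how its figures were measured. The function names are hypothetical, and "character" here means a Unicode code point, whereas evaluations of Indian scripts may count aksharas or glyph clusters instead.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Return 1 - (edit distance / reference length), clamped to [0, 1]."""
    if len(ref) == 0:
        return 1.0 if len(hyp) == 0 else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(ref_text, ocr_text):
    """Accuracy over Unicode code points."""
    return accuracy(ref_text, ocr_text)


def word_accuracy(ref_text, ocr_text):
    """Accuracy over whitespace-separated words."""
    return accuracy(ref_text.split(), ocr_text.split())


if __name__ == "__main__":
    truth = "the cat sat"
    ocr = "the cot sat"   # one wrong character, hence one wrong word
    print(round(char_accuracy(truth, ocr), 3))
    print(round(word_accuracy(truth, ocr), 3))
```

A single wrong character invalidates the whole word that contains it, which is why word-level accuracy is usually lower than character-level accuracy on the same page.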
The current version of e-Aksharayan takes 45 to 60 seconds to process an A4-size document.
Technology Development for Indian Languages (Room No 2072),
Ministry of Electronics & Information Technology, Electronics Niketan, 6, CGO Complex, New Delhi - 110 003
tdildc@cdac.in
tusharpatnaik@cdac.in
schaudhury@gmail.com