The objective of this work is to develop robust OCR systems for printed Indian scripts that deliver the performance needed to convert legacy printed documents into an electronically accessible format. The system has been developed for Bangla, Devanagari, Gurumukhi, Kannada, Malayalam, Tamil, Telugu, Urdu, and Assamese, and support for the Gujarati, Oriya, Tibetan, and Manipuri scripts is planned.
Indian Language OCR is a consortium-based project that takes a hybrid approach and is built from platform- and technology-independent modules. The system has been developed to facilitate the digitization of multilingual text images. It covers printed-text OCR, with features such as dictionary building and a spell checker. The system is the outcome of a joint effort by consortium members, sponsored by DeitY. The pre-processing modules, such as noise cleaning, skew detection, and binarization, were developed by various consortium institutes, and the language vertical tasks and integration were carried out by various consortium members.
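
To illustrate what a binarization pre-processing step does, here is a minimal sketch in Python using Otsu's global thresholding. This is a common textbook technique chosen only for illustration; the source does not say which binarization method the consortium modules use. The function names otsu_threshold and binarize are hypothetical, and the image is assumed to be a list of rows of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for a sequence of 8-bit grayscale values.

    Illustrative only: not the consortium's binarization module.
    """
    # Build a 256-bin histogram of the gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of gray levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 1 for ink and 0 for background.

    Assumes dark text on a light page, so pixels at or below the
    threshold are treated as ink.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]
```

In a real pipeline this step would normally come after noise cleaning, and scanned pages with uneven lighting often need a locally adaptive threshold rather than a single global one.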