Web OCR

Click here to access the web-based OCR


The objective of the OCR system is to develop robust OCRs for printed Indian scripts that perform well enough to convert legacy printed documents into an electronically accessible format. The system has been developed for Bangla, Devanagari, Gurumukhi, Kannada, Malayalam, Tamil and Telugu, and support for the Gujarati, Oriya, Tibetan, Assamese, Manipuri and Urdu scripts is planned. As a consortium-based project, the Indian Language OCR follows a hybrid approach and is designed to work with platform- and technology-independent modules. The system has been developed to facilitate the digitization of multilingual textual images, and its area of coverage is printed-text OCR.

The implementing agency is a consortium with IIT Delhi as the consortium leader. The system is the outcome of the efforts of the consortium members, sponsored by the Ministry of Communication and Information Technology. The preprocessing modules, such as noise cleaning, skew detection and binarization, have been developed by different consortium institutes. The language vertical tasks and integration have been carried out by various consortium members.
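To give a rough idea of what a binarization stage like the one named among the preprocessing modules does, here is a minimal sketch in Python. It uses Otsu's global thresholding, which is a common textbook choice and only an assumption here: the text does not say which method the consortium's module uses. The function names `otsu_threshold` and `binarize` are hypothetical and are not part of the system.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance.

    `pixels` is a flat list of integer grey values in the range 0..255.
    Illustrative only; not the consortium's actual binarization module.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of grey values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a greyscale image (list of rows) to 1 for ink and 0 for paper.

    Assumes dark text on a light background.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


if __name__ == "__main__":
    page = [[10, 10, 20],
            [200, 220, 220]]
    for row in binarize(page):
        print(row)
```

A single global threshold is the simplest option. Scanned legacy documents with uneven lighting or stained paper usually call for a locally adaptive threshold, and noise cleaning and skew detection would be separate steps alongside this one.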