Long Term Research Areas

1. Machine Translation
In machine translation, text in one natural language is translated into another by computer software, without real-time human intervention or with minimal human effort. The systems being developed under the Machine Translation project are as follows:


A) Development of English to Indian Languages Machine Translation (MT) System:

Since the majority of the Indian population cannot read or write English, while most of the information available on the web and in electronic media is in English, an automatic language translator is important for reaching the common man across all sections of society. To begin with, two specific domains, Tourism and Health, have been identified for machine translation. The project is being implemented in consortium mode, with ten institutions participating to build the system. At present, the six language pairs being targeted in the Tourism and Health domains are:

i) English to Hindi
ii) English to Marathi
iii) English to Bengali
iv) English to Odia
v) English to Tamil
vi) English to Urdu

B) Development of English to Indian Languages Machine Translation (MT) System with Angla-Bharti Technology

ANGLABHARTI is a machine-aided translation methodology designed specifically for translating English into Indian languages. It follows a pattern-directed approach using context-free-grammar-like structures. The English input is analysed only once to create an intermediate structure called PLIL (Pseudo Lingua for Indian Languages), which is then converted into each Indian language through text generation. The system provides automatic pre-editing and paraphrasing as well as named-entity recognition, and incorporates an error-analysis module and a statistical language model for automated post-editing. The purpose of the automatic pre-editing module is to transform or paraphrase the input sentence into a form that is more easily translatable. The project is being implemented in consortium mode, with four institutions participating to build the system. At present, the six language pairs being targeted in the Tourism and Health domains are:

i) English to Hindi
ii) English to Marathi
iii) English to Bengali
iv) English to Odia
v) English to Tamil
vi) English to Urdu
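The analyse-once, generate-many design described above can be illustrated with a minimal Python sketch. It is not the ANGLABHARTI implementation: the `IntermediateClause` structure merely stands in for PLIL, whose real format is not given here, and the article-dropping `pre_edit` step, the single SUBJECT VERB OBJECT pattern and the transliterated lexicons are all hypothetical toy assumptions. What it shows is that the English sentence is analysed once and the same intermediate structure is then handed to a separate generator for each target language.

```python
from dataclasses import dataclass


# Hypothetical stand-in for PLIL; the real PLIL format is not described here.
@dataclass(frozen=True)
class IntermediateClause:
    subject: str
    verb: str
    obj: str


# Toy pre-editing rule (assumption): drop articles so the simple pattern applies.
ARTICLES = {"a", "an", "the"}


def pre_edit(sentence: str) -> list[str]:
    """Lowercase, strip a final full stop and drop articles."""
    words = sentence.rstrip(".").lower().split()
    return [w for w in words if w not in ARTICLES]


def analyse(sentence: str) -> IntermediateClause:
    """Toy pattern: match a pre-edited SUBJECT VERB OBJECT sentence."""
    words = pre_edit(sentence)
    if len(words) != 3:
        raise ValueError("toy analyser only handles SUBJECT VERB OBJECT")
    subject, verb, obj = words
    return IntermediateClause(subject, verb, obj)


# Hypothetical transliterated lexicons; agreement and tense are ignored.
LEXICONS = {
    "hindi": {"ram": "raam", "eats": "khaataa hai", "apple": "seb"},
    "bengali": {"ram": "raam", "eats": "khaay", "apple": "aapel"},
}


def generate(clause: IntermediateClause, language: str) -> str:
    """Text generation from the shared intermediate structure."""
    lex = LEXICONS[language]
    # Both toy targets are verb-final, so S V O is reordered to S O V.
    return " ".join(lex[w] for w in (clause.subject, clause.obj, clause.verb))


if __name__ == "__main__":
    clause = analyse("Ram eats an apple.")  # English is analysed once...
    for lang in LEXICONS:                   # ...and generated per language
        print(lang, "->", generate(clause, lang))
```

Running the script prints `hindi -> raam seb khaataa hai` and `bengali -> raam aapel khaay`. In this toy design, adding a target language only means adding a lexicon and, where needed, a word-order rule; the English analyser is untouched, which is the economy a shared intermediate structure is meant to provide.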

C) Development of Indian Language to Indian Language Machine Translation System:
As India has 22 constitutionally recognized languages, an Indian Language to Indian Language Machine Translation (IL-ILMT) system is an important application for converting text written in one Indian language into another. The project is being implemented in consortium mode, with eleven institutions participating to build the system. At present, the nine language pairs being targeted in the Tourism and Health domains are:

i. Tamil to Hindi
ii. Telugu to Hindi
iii. Urdu to Hindi
iv. Kannada to Hindi
v. Punjabi to Hindi
vi. Marathi to Hindi
vii. Bengali to Hindi
viii. Tamil to Telugu
ix. Malayalam to Tamil

2. Development of Cross-lingual Information Access
Cross-Language Information Access is an extension of the Cross-Language Information Retrieval paradigm. It enables users to enter queries in a language they are familiar with, and retrieves information both in the language of the query and in other languages.

The objective of Cross-Language Information Access is to introduce additional post-retrieval processing that helps users make sense of the retrieved documents. This processing may take the form of machine translation of snippets, summarization followed by translation of the summaries, and/or information extraction. The project is being implemented in consortium mode, with eleven institutions participating to build the system. At present, the six languages being targeted in the Tourism and Health domains are:

i) Hindi
ii) Marathi
iii) Bengali
iv) Punjabi
v) Tamil
vi) Telugu
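The query translation and post-retrieval steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's system: the `Document` type, the toy Hindi/English collection, the word-by-word `DICTIONARIES` and the helper names `retrieve` and `snippet_in_query_language` are all hypothetical. The query is translated into each document's language for matching, and each retrieved document is then shown as a snippet translated back into the query language, the simplest form of the snippet translation mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    language: str
    text: str


# Hypothetical word-by-word dictionaries (transliterated Hindi <-> English).
DICTIONARIES = {
    ("hindi", "english"): {"goa": "goa", "yatra": "travel", "aspatal": "hospital"},
}
DICTIONARIES[("english", "hindi")] = {
    en: hi for hi, en in DICTIONARIES[("hindi", "english")].items()
}


def translate_terms(terms, src, tgt):
    """Word-by-word lookup; unknown words are passed through unchanged."""
    if src == tgt:
        return list(terms)
    table = DICTIONARIES[(src, tgt)]
    return [table.get(term, term) for term in terms]


def retrieve(query, query_lang, docs):
    """Rank documents in any language by overlap with the translated query."""
    query_terms = query.lower().split()
    hits = []
    for doc in docs:
        # Query translation: match in the document's own language.
        translated = set(translate_terms(query_terms, query_lang, doc.language))
        score = len(translated & set(doc.text.lower().split()))
        if score > 0:
            hits.append((score, doc))
    # Highest overlap first; ties broken by document id for a stable order.
    hits.sort(key=lambda hit: (-hit[0], hit[1].doc_id))
    return [doc for _, doc in hits]


def snippet_in_query_language(doc, query_lang, max_words=5):
    """Post-retrieval step: translate the leading words back for the user."""
    words = doc.text.lower().split()[:max_words]
    return " ".join(translate_terms(words, doc.language, query_lang))


if __name__ == "__main__":
    docs = [
        Document("d1", "hindi", "goa yatra jankari"),
        Document("d2", "english", "Goa travel guide"),
        Document("d3", "english", "Delhi hospital list"),
    ]
    for doc in retrieve("goa yatra", "hindi", docs):
        print(doc.doc_id, "->", snippet_in_query_language(doc, "hindi"))
```

With the Hindi query `goa yatra`, documents `d1` (Hindi) and `d2` (English) both match and `d3` does not; the English snippet is shown as `goa yatra guide`. The untranslated word `guide` shows why a real system would rely on proper machine translation or summarization rather than dictionary lookup for the post-retrieval step.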

3. Development of Robust Document Analysis & Recognition System for Indian Languages (OCR)
Optical Character Recognition (OCR) is a tool for digitizing content and is essential for developing knowledge networks such as digital libraries. OCR technology makes it possible to scan printed text and store it in an editable format. It involves three basic steps: scanning, recognition, and reading the text. The project is being implemented in consortium mode. The eleven scripts being targeted are:

i. Bengali
ii. Devanagari
iii. Gujarati
iv. Gurumukhi
v. Kannada
vi. Malayalam
vii. Odia
viii. Tamil
ix. Telugu
x. Tibetan
xi. Nepali

4. Development of On-line Handwriting Recognition System (OHWR)
An On-line Handwriting Recognition (OHWR) system converts the strokes written by a user into editable text, removing the need for a keyboard for text entry. Seven institutions are participating in consortium mode to build the On-line Handwriting Recognition System. The six languages being targeted are:

i. Bengali
ii. Devanagari
iii. Kannada
iv. Malayalam
v. Tamil
vi. Telugu

5. Development of Text to Speech System for Indian Languages
A consortium-mode project has been initiated to develop Text-to-Speech (TTS) systems in 13 Indian languages, namely Hindi, Bengali, Marathi, Tamil, Telugu, Malayalam, Gujarati, Odia, Assamese, Manipuri, Kannada, Bodo and Rajasthani. The objective of the project is to develop and deploy TTS systems for visually challenged persons, offering functionality similar to that of the JAWS screen reader for English, as an application for social benefit.

6. Development of Automatic Speech Recognition in Indian Languages
A consortium-mode project has been initiated to develop an Automatic Speech Recognition (ASR) system for accessing the prices of agricultural commodities over the telephone channel. It serves as an interface to the NIC website, which is multilingual and provides information on agricultural commodities.