Speech Resources Standards

Speech processing provides powerful capabilities for improving interaction between humans and machines, and between humans who communicate through machines. It can also be combined with Natural Language Processing (NLP) technology to model the human capacity to comprehend and process language, enabling translation of a spoken sentence from one language to another and many other intelligent linguistic applications. Speech tools also help greatly in providing information access interfaces for differently-abled persons, such as people with visual or cerebral disabilities.

Since speech resources are the key building blocks for developing speech-based systems, initiatives are being taken to develop speech resources for Indian languages. Developing speech resources for synthesis, recognition and speaker identification requires different standards and methodologies.

Role of TDIL

The TDIL Programme of MeitY is actively collaborating with bodies such as W3C and LDC (Linguistic Data Consortium) to formulate relevant standards. Initiatives are also being taken to develop speech resources and standards that promote speech- and voice-enabled solutions in Indian languages, such as:

1. A minimum of 50 hours of annotated speech corpora for Text-to-Speech systems in Indian languages
2. Pronunciation lexicons for Indian languages as per the W3C Pronunciation Lexicon Specification (PLS)
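
To illustrate what a pronunciation lexicon in the W3C format can look like, the following is a minimal Python sketch that builds a one-entry lexicon document. The helper name build_lexicon, the sample word and its IPA transcription are assumptions chosen for illustration and are not taken from any TDIL lexicon; only the element names (lexicon, lexeme, grapheme, phoneme) and the namespace come from the W3C specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Pronunciation Lexicon Specification (PLS) 1.0.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
# Standard XML namespace, needed for the xml:lang attribute.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang, alphabet="ipa"):
    """Return a PLS document (as a string) for (grapheme, phoneme) pairs.

    Hypothetical helper for illustration only.
    """
    # Serialize PLS elements in the default namespace (no prefix).
    ET.register_namespace("", PLS_NS)
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {
            "version": "1.0",
            "alphabet": alphabet,
            f"{{{XML_NS}}}lang": lang,
        },
    )
    for grapheme, phoneme in entries:
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(root, encoding="unicode")


# Illustrative entry: a Hindi word with an assumed IPA transcription.
lexicon_xml = build_lexicon([("नमस्ते", "nəməsteː")], lang="hi")
print(lexicon_xml)
```

A speech synthesizer or recognizer that supports this format could load such a file to look up the pronunciation of each written form; a real lexicon would contain many lexeme entries per language.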

Speech corpora for Assamese, Bangla, Hindi, Manipuri, Marathi and Punjabi have already been built and are freely available to Indian researchers for non-commercial research purposes.