Indian Language Technology Proliferation and Deployment Centre

HINDI ASR CHALLENGE

Details

Speech Lab, IIT Madras, presents the Automatic Speech Recognition (ASR) in Hindi challenge. This challenge is part of the National Language Translation Mission funded by MeitY. It aims to help and encourage the advancement of ASR in Indian languages.

CHALLENGE OVERVIEW

Recent advancements in speech technology have shown that ASR systems can perform on par with humans. Building an efficient ASR system requires large amounts of training data and high-end computational resources.

However, when it comes to Indian languages, not everyone has access to these resources, especially academic institutions and startups. As part of this challenge, we will be releasing speech data in Hindi. Everyone who participates in this challenge will then be free to use this data for research purposes.

DATA SET AND BASELINE RECIPES

The data set comprises Hindi read speech along with the corresponding transcriptions. The text was crawled from newspapers, and volunteers were then asked to read it aloud. It covers genres such as politics, sports, and entertainment. The following data sets will be released for this challenge:

Train set - 40 hours

Development set - 5 hours

Evaluation set - 5 hours

Lexicon, baseline models, results and recipes to replicate the baseline experiments will also be made available.

Closed Hindi-ASR Challenge: Only the training data distributed as part of the challenge can be used to train the models (both acoustic and language models).

Open Hindi-ASR Challenge: You can use any external/additional data to train the acoustic and language models.

Check out this link for more details.
