Speech Lab, IIT Madras, presents the Automatic Speech Recognition (ASR) in Hindi challenge. This challenge is part of the National Language Translation Mission funded by MeitY. It aims to encourage and support the advancement of ASR in Indian languages.
CHALLENGE OVERVIEW
Recent advances in speech technology have shown that ASR systems can perform on par with humans. Building an efficient ASR system requires large amounts of training data and high-end computational resources.
However, when it comes to Indian languages, not everyone, especially academic institutions and startups, has access to these resources. As part of this challenge, we will be releasing speech data in Hindi. Everyone who participates in this challenge will then be free to use this data for research purposes.
DATA SET AND BASELINE RECIPES
The data set comprises Hindi read speech along with the corresponding transcriptions. The text was crawled from newspapers, and volunteers were then asked to read it aloud. It covers topics such as politics, sports, and entertainment. The following data sets will be released for this challenge:
Train set - 40 hours
Development set - 5 hours
Evaluation set - 5 hours
A lexicon, baseline models, results, and recipes for replicating the baseline experiments will also be made available.
Closed Hindi-ASR Challenge: Only the training data distributed as part of the challenge can be used to train the models (both acoustic and language models).
Open Hindi-ASR Challenge: You can use any external/additional data to train the acoustic and language models.
Check out this link for more details.