Hindi ASR Challenge Data (ASR Speech Data released under 1st Challenge) - NLTMP
  • Contributor: ASR Consortia
  • Product Code: NLTMP-ASR-CHALLENGE-HIN-002

Available Under License: Research  

Sample Download | size: 66MB | type: zip
Added on: 10 Jun 2021

The data set comprises Hindi read speech data along with the corresponding transcriptions. The text was crawled from newspapers, and volunteers were then asked to read it aloud. It covers genres such as politics, sports, and entertainment. A lexicon, baseline models, results, and recipes to replicate the baseline experiments are also made available.

The following data sets are released for this challenge:
  • Train set: 40 hours
  • Development set: 5 hours
  • Evaluation set: 5 hours


Speech Data Attributes
  • Language: Hindi
  • Transcription: Yes, available
  • Duration: 50 hours
  • Recording Environment: Read Speech and Lectures on Tablets
  • Speaker Type: Hindi Native Speaker
  • BitRate (sampling rate): 16 kHz
  • No. of Audio Segments: Tokens: 15.5k words; Sentences: 33.5k (train+dev+eval)
  • Speaker Gender: Both Male & Female

Tags: Hindi, ASR Challenge Data, ASR, Speech Data, NLTM Pilot

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.