Hindi ASR Speech Data

Hindi ASR Challenge Data (ASR Speech Data released under 1st Challenge) - NLTMP

Contributor: ASR Consortia
Product Code: NLTMP-ASR-CHALLENGE-HIN-002

Available Under License: Research

Added on: 10 Jun 2021

The data set comprises Hindi read speech data along with the corresponding transcriptions. The text was crawled from newspapers, and volunteers were then asked to read it aloud. It covers genres such as politics, sports, and entertainment. A lexicon, baseline models, results, and recipes to replicate the baseline experiments are also made available.

The following data sets are released for this challenge:
Train set - 40 hours
Development set - 5 hours
Evaluation set - 5 hours
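As a quick sanity check on the split sizes quoted above, the following minimal sketch adds up the three durations and computes each split's share of the total. The names `SPLIT_HOURS` and `split_fractions` are illustrative only and are not part of the released recipes.

```python
# Split durations in hours, copied from the description above.
SPLIT_HOURS = {"train": 40, "dev": 5, "eval": 5}


def split_fractions(hours):
    """Return each split's share of the total duration."""
    total = sum(hours.values())
    return {name: value / total for name, value in hours.items()}


total_hours = sum(SPLIT_HOURS.values())
fractions = split_fractions(SPLIT_HOURS)

for name, share in fractions.items():
    print(f"{name}: {SPLIT_HOURS[name]} h ({share:.0%} of {total_hours} h)")
```

The total matches the 50 hours listed under the speech data attributes, giving an 80/10/10 split across train, development, and evaluation.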

Speech Data Attributes
Language	Hindi
Transcription	Yes, Available
Duration	50 hours
Recording Environment	Read Speech and Lectures on Tablets
Speaker Type	Hindi Native Speaker
Sampling Rate	16 kHz
No. of Audio Segments	Tokens: 15.5k words; Sentences: 33.5k (train+dev+eval)
Speaker Gender	Both Male & Female
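
The 16 kHz figure in the table reads as the audio sampling rate. A downloaded file could be checked against it with a short sketch like the one below. It assumes the audio is stored as uncompressed WAV, which this listing does not state. The function name `check_sample_rate` and any file path passed to it are hypothetical.

```python
import wave

# Rate taken from the attribute table (16 kHz).
EXPECTED_RATE_HZ = 16_000


def check_sample_rate(path, expected=EXPECTED_RATE_HZ):
    """Return (matches, actual_rate_hz, duration_seconds) for a WAV file.

    Assumption: the file is uncompressed PCM WAV, which the standard
    library `wave` module can read.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate == expected, rate, duration
```

Calling `check_sample_rate("utterance.wav")` on a placeholder file name returns whether the rate matches, the rate found, and the clip length in seconds. Other containers such as FLAC would need a different reader.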

Tags: Hindi, ASR Challenge Data, ASR, Speech Data, NLTM Pilot
