Indian English ASR Challenge Data (ASR Speech Data) - NLTM Pilot

Contributor: ASR Consortia
Product Code: NLTMP-ASR-CHALLENGE-ENG-001

Available Under License: Research

Added on: 10 Jun 2021

The data set comprises Indian English read speech and lecture speech, along with the corresponding transcriptions. The read speech covers genres such as politics, sports, and entertainment. It was collected by Speech Lab, IITM: text was crawled from newspapers, and volunteers were asked to read it aloud. The lecture speech was obtained from Computer Science and Electrical lectures of NPTEL. The read speech corpus is named IITM, whereas the lecture speech corpus is referred to as NPTEL. A lexicon, baseline models, results, and recipes to replicate the baseline experiments are also made available.

The following data sets are released for this challenge:

Train set - 280 hours --- IITM (80 hours) + NPTEL (200 hours)
Development set IITM - 6 hours --- IITM
Development set NPTEL - 5 hours --- NPTEL
Evaluation set IITM - 6 hours --- IITM
Evaluation set NPTEL - 5 hours --- NPTEL
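The split durations listed above add up to the 302 hours reported for the corpus as a whole. A minimal Python sketch of that arithmetic follows; the dictionary keys and function names are chosen here for illustration only and are not part of the release.

```python
# Split durations in hours, copied from the description above.
# The key names ("train_iitm", ...) are illustrative, not official identifiers.
SPLIT_HOURS = {
    "train_iitm": 80,
    "train_nptel": 200,
    "dev_iitm": 6,
    "dev_nptel": 5,
    "eval_iitm": 6,
    "eval_nptel": 5,
}


def total_hours(splits):
    """Sum the durations of all splits."""
    return sum(splits.values())


def hours_by_corpus(splits):
    """Group durations by corpus, taken from the suffix after the last underscore."""
    totals = {}
    for name, hours in splits.items():
        corpus = name.rsplit("_", 1)[1]
        totals[corpus] = totals.get(corpus, 0) + hours
    return totals


print(total_hours(SPLIT_HOURS))      # 302
print(hours_by_corpus(SPLIT_HOURS))  # {'iitm': 92, 'nptel': 210}
```

Of the 302 hours, 92 come from the IITM read speech corpus and 210 from the NPTEL lecture corpus.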

Speech Data Attributes
Language	Indian-accented English
Transcription	Yes, Available
Duration	302 Hours
Recording Environment	Studio recorded or classroom recorded, Read Speech and Lectures
Speaker Type	Indian speakers of any native language
Sampling Rate	16 kHz
No. of Audio Segments	Tokens: 48k, Sentences: 168k
Speaker Gender	Both Male & Female
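
When preparing the data for training, it can be useful to confirm that each audio file matches the 16 kHz value listed in the attributes above. The sketch below assumes the audio is stored as WAV files, which this listing does not state; the function name `has_expected_rate` is hypothetical, and only Python's standard `wave` module is used.

```python
import wave

# 16 kHz, as listed in the attributes table.
EXPECTED_RATE_HZ = 16_000


def has_expected_rate(path, expected=EXPECTED_RATE_HZ):
    """Return True if the WAV file at `path` has the expected sampling rate.

    Assumes uncompressed PCM WAV input; other formats would need a
    different reader.
    """
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected
```

Calling `has_expected_rate("utterance.wav")` on a file (the file name is a placeholder) returns `False` when the file needs resampling before use with 16 kHz models.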

Tags: Indian English, ASR Challenge Data, ASR Speech Data, NLTM Pilot, Speech Corpus, Speech, Corpus