Available Under License: Research
The data set comprises Indian English read speech and lecture speech, along with the corresponding transcriptions. The read speech covers genres such as politics, sports, and entertainment. It was collected by Speech Lab, IITM: text was crawled from newspapers, and volunteers were asked to read it aloud. The lecture speech was obtained from Computer Science and Electrical Engineering lectures of NPTEL. The read speech corpus is named IITM, whereas the lecture speech corpus is referred to as NPTEL. A lexicon, baseline models, results, and recipes to replicate the baseline experiments are also made available.

The following data sets are released for this challenge:
Train set - 280 hours --- IITM (80 hours) + NPTEL (200 hours)
Development set IITM - 6 hours --- IITM
Development set NPTEL - 5 hours --- NPTEL
Evaluation set IITM - 6 hours --- IITM
Evaluation set NPTEL - 5 hours --- NPTEL
Tags: Indian English, ASR Challenge Data, ASR Speech Data, NLTM Pilot, Speech Corpus, Speech, Corpus