Speech Corpus

NLTM Pilot TTS Data for Indian Languages — Hindi, Punjabi, Tamil, and Indian English.

TTS data for Indian languages: Hindi, Punjabi, Tamil, and Indian English. Text and corresponding speech data recorded in a studio environment...

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 423.2MB | type: zip

Added on : 16 Aug 2021

Indian English ASR Challenge Data (ASR Speech Data released under 3rd Challenge) - NLTMP

The data set comprises English read and conversational speech data along with the corresponding transcriptions. This speech data was collected by S..

Available Under License:
Research  

Added on : 26 Jul 2021

Hindi ASR Challenge Data (ASR Speech Data released under 3rd Challenge) - NLTMP

The data set comprises Hindi read and conversational speech data along with the corresponding transcriptions. This speech data was collected by Spe..

Available Under License:
Research  

Added on : 26 Jul 2021

Hindi ASR Challenge Data (ASR Speech Data released under 1st Challenge) - NLTMP

The data set comprises Hindi read speech data along with the corresponding transcriptions. The text data was crawled from newspapers, and then volu..

Available Under License:
Research  

Sample Download | size: 66MB | type: zip

Added on : 10 Jun 2021

Tamil ASR Challenge Data (ASR Speech Data released under 3rd Challenge) - NLTMP

The data set comprises Tamil read and conversational speech data along with the corresponding transcriptions. This speech data was collected by Spe..

Available Under License:
Research  

Added on : 26 Jul 2021

Indian English ASR Challenge Data (ASR Speech Data) - NLTM Pilot

The data set comprises Indian English read speech and lecture speech data along with the corresponding transcriptions. The read speech covers genre..

Available Under License:
Research  

Sample Download | size: 23.7MB | type: tar

Added on : 10 Jun 2021

Telugu Speech Data- ASR

This corpus contains 6019 Telugu audio files from approx. 1000 native speakers. This data was prepared for Agricultural Commo..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 1.7MB | type: zip

Added on : 21 Jan 2021

BIHARI SPEECH DATA - ASR

This corpus contains 54866 Bihari audio files from approx. 1000 native speakers. This corpus also contains word and its correspond..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 1.4MB | type: zip

Added on : 21 Jan 2021

Bengali Speech Data – ASR

This corpus contains more than 43134 Bengali audio files from approx. 1000 native speakers. This corpus also contains word and its corre..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 981.8KB | type: zip

Added on : 12 Jan 2021

HINDI Speech Data – ASR

This corpus contains more than 194714 Hindi audio files from approx. 1000 native speakers. This corpus also contains word and its c..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 2.7MB | type: zip

Added on : 12 Jan 2021

Marathi Speech Data - ASR

This corpus contains more than 44521 Marathi audio files from 1500 speakers, plus a dic file which contains word and its corresponding phonetic..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 2.1MB | type: zip

Added on : 11 Dec 2020

Tamil Speech Data- ASR

This corpus contains more than 88175 Tamil audio files from approx. 1000 native speakers. This corpus contains word and its correspondin..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 2.7MB | type: zip

Added on : 04 Dec 2020

Odia Speech Data – ASR

This corpus contains more than 11940 Odia audio files from approx. 1000 native speakers. This corpus contains word and its corresponding..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 1.6MB | type: zip

Added on : 04 Dec 2020

Kannada Speech Data – ASR

This corpus contains more than 93803 Kannada audio files from 1000 native speakers, plus a Callflow1.dic file which contains word and its corre..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 973.4KB | type: zip

Added on : 04 Dec 2020

HINDI (JHARKHAND) Speech Data – ASR

This corpus contains more than 36694 Hindi (Jharkhand) audio files from approx. 1000 native speakers. This corpus also contains wo..

Available Under License:
CC BY-SA 2.0  

Sample Download | size: 2MB | type: zip

Added on : 03 Dec 2020

Disclaimer: The information provided on this page has been procured through different sources. Please write back to us at nplt_support[at]cdac[dot]in in case you would like to suggest an update.