National Platform for Language Technology

Products meeting the search criteria

NLTM Pilot TTS Data for Indian Languages — Hindi, Punjabi, Tamil, and Indian English.

TTS data for Indian languages: Hindi, Punjabi, Tamil, and Indian English. Text and corresponding speech data recorded in a studio environment....

Contributor:  TTS Consortia
Tags:  TTS Data, Speech Data, Hindi TTS Data, Punjabi TTS Data, Tamil TTS Data, Indian English TTS Data, IITM

Indian English ASR Challenge Data (ASR Speech Data released under 3rd Challenge) - NLTMP

The data set comprises English read and conversational speech data along with the corresponding transcriptions. This speech data was collected by Speech Lab IITM and several startups. The text data...

Contributor:  ASR Consortia
Tags:  Indian English, ASR Challenge Data, ASR Speech Data, NLTM Pilot, Speech Corpus, Speech, Corpus

Hindi ASR Challenge Data (ASR Speech Data released under 3rd Challenge) - NLTMP

The data set comprises Hindi read and conversational speech data along with the corresponding transcriptions. This speech data was collected by Speech Lab IITM and several startups. The text data w...

Contributor:  ASR Consortia
Tags:  Hindi, ASR Challenge Data, ASR Speech Data, NLTM Pilot, Speech Corpus, Speech, Corpus

Hindi ASR Challenge Data (ASR Speech Data released under 1st Challenge) - NLTMP

The data set comprises Hindi read speech data along with the corresponding transcriptions. The text data was crawled from newspapers, and then volunteers were asked to read them. It covers genres l...

Contributor:  ASR Consortia
Tags:  Hindi, ASR Challenge Data, ASR, Speech Data, NLTM Pilot

Indian English Raw Speech Corpus - Kannada Variant

Dataset Description: 23:43:04 Hours | 15.3 GB | 56 Speakers | 14,455 Audio Segments | 48 kHz | 16 bit wav. English language is a blend of Anglo-Saxon which is the prominent language of Britain in mi...

Contributor:  CIIL Mysore
Tags:  Indian English, Raw Speech Corpus, Kannada Variant, Speech Corpus

Indian English Raw Speech Corpus - Bengali Variant

Dataset Description: 25:47:11 Hours | 15.5 GB | 53 Speakers | 16,044 Audio Segments | 48 kHz | 16 bit wav. English language is a blend of Anglo-Saxon which is the prominent language of Britain in mi...

Contributor:  CIIL Mysore
Tags:  Indian English, Raw Speech Corpus, Bengali Variant, Speech Corpus

Multilingual Raw Speech Corpus

Dataset Description: 97:43:54 Hours | 62.2 GB speech data | 1,916 Speakers | 1,916 Audio segment...

Contributor:  CIIL Mysore
Tags:  Multilingual, Raw Speech Corpus, Speech Corpus

Tamil Raw Speech Corpus

Dataset Description: 139:11:41 Hours | 86 GB speech data | 452 Speakers | 60,287 Audio segments | 48 kHz | 16 bit wav. Tamil is one of the longest-surviving classical languages in the world. ...

Contributor:  CIIL Mysore
Tags:  Tamil, Raw Speech Corpus, Speech Corpus

Odia Raw Speech Corpus

Dataset Description: 138:06:18 Hours | 89 GB | 474 Speakers | 73,418 Audio segments | 48 kHz | 16 bit wav. Odia is an Indo-Aryan language, which is mainly spoken in the state...

Contributor:  CIIL Mysore
Tags:  Odia, Raw Speech Corpus, Speech Corpus

Kashmiri Raw Speech Corpus

Dataset Description: 28:10:07 Hours | 18 GB speech data | 150 Speakers | 16,380 Audio segments | 48 kHz | 16 bit wav. The Kashmiri language belongs to the Dardic group...

Contributor:  CIIL Mysore
Tags:  Kashmiri, Raw Speech Corpus, Speech Corpus

Gujarati Raw Speech Corpus (Mono Recordings)

Dataset Description: 64:44:02 Hours | 7.1 GB | 233 Speakers | 26,223 Audio Segments | 16 kHz | 16 bit wav. Gujarati is one of the major literary languages of India and it is t...

Contributor:  CIIL Mysore
Tags:  Gujarati, Raw Speech Corpus, Mono Recordings, Speech Corpus

Gujarati Raw Speech Corpus

Dataset Description: 57:17:08 Hours | 37 GB | 204 Speakers | 25,712 Audio Segments | 48 kHz | 16 bit wav. Gujarati is one of the major literary languages of India and it is the off...

Contributor:  CIIL Mysore
Tags:  Gujarati, Raw Speech Corpus, Speech Corpus

Dogri Raw Speech Corpus

Dataset Description: 17:10:26 Hours | 11 GB speech data | 61 Speakers | 12,036 Audio segments | 48 kHz | 16 bit wav. Dogri, the language ...

Contributor:  CIIL Mysore
Tags:  Dogri, Raw Speech Corpus, Speech Corpus

Assamese Raw Speech Corpus

Dataset Description: 54:21:12 Hours | 32.5 GB | 304 Speakers | 37,570 Audio Segments | 48 kHz | 16 bit wav. Assamese is the official language of Assam. ...

Contributor:  CIIL Mysore
Tags:  Assamese, Raw Speech Corpus, Speech Corpus
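
The raw speech corpus entries above share a pipe-separated summary (duration | size | speakers | segments | sample rate | bit depth). As a rough sketch for comparing corpora, the summary can be split into numeric fields and used to derive an average segment length. The function name parse_description and the assumed field order are illustrative only; they are not part of the portal or any published API.

```python
import re


def parse_description(line: str) -> dict:
    """Split a 'Dataset Description' summary into numeric fields.

    Assumes the field order duration | size | speakers | segments,
    as seen in the listings above. Hypothetical helper for illustration.
    """
    parts = [p.strip() for p in line.split("|")]

    # Duration is written as HH:MM:SS, where HH may exceed two digits.
    hours, minutes, seconds = map(
        int, re.search(r"(\d+):(\d{2}):(\d{2})", parts[0]).groups()
    )
    duration_s = hours * 3600 + minutes * 60 + seconds

    # Size in GB, e.g. "15.3 GB" or "62.2 GB speech data".
    size_gb = float(re.search(r"\d+(?:\.\d+)?", parts[1]).group())

    # Counts may carry thousands separators, e.g. "14,455".
    speakers = int(re.search(r"[\d,]+", parts[2]).group().replace(",", ""))
    segments = int(re.search(r"[\d,]+", parts[3]).group().replace(",", ""))

    return {
        "duration_s": duration_s,
        "size_gb": size_gb,
        "speakers": speakers,
        "segments": segments,
        "avg_segment_s": duration_s / segments,
    }


# Summary copied from the Indian English (Kannada Variant) entry.
kannada = parse_description(
    "23:43:04 Hours | 15.3 GB | 56 Speakers| 14,455 Audio Segments | 48 kHz | 16 bit wav."
)
print(kannada)
```

Truncated summaries (those ending in "...") may lack some fields, so a production version would need to handle missing parts rather than assume all four are present.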

Tamil ASR Challenge Data (ASR Speech Data released under 3rd Challenge) - NLTMP

The data set comprises Tamil read and conversational speech data along with the corresponding transcriptions. This speech data was collected by Speech Lab IITM and several startups. The text data w...

Contributor:  ASR Consortia
Tags:  Tamil, ASR Challenge Data, ASR Speech Data, NLTM Pilot, Speech Corpus, Speech, Corpus

National Platform for Language Technology © 2025