TTS Challenge

CHALLENGE OVERVIEW

Speech synthesis, also known as text-to-speech (TTS), has attracted increasing attention in recent years. Recent advances in speech synthesis have shown that TTS systems can produce very natural-sounding speech from text. Around the world, TTS systems are built using a variety of approaches. In India, the speech research community has grown significantly in recent years, and one can witness a speech revolution in progress. It is necessary to understand and compare the various research techniques used to build Indian-language TTS systems. The primary objective of this challenge is to understand and compare the various approaches to building TTS systems and, at the same time, to identify efficient speech groups in the country.

Good-quality, studio-recorded speech data with very accurate transcripts is required to build a high-quality TTS system. However, when it comes to Indian languages, not everyone has access to these resources, especially academic institutions and startups. As part of this challenge, we will release good-quality training data for Hindi and Tamil. Everyone who participates in the challenge will then be free to use this data for research purposes.

Main Task

About 5 hours of speech data for each of Hindi Male, Hindi Female, Tamil Male and Tamil Female, recorded by native professional speakers in high-quality studio environments, will be provided together with the corresponding text in UTF-8 format. No other information, such as segment labels, will be provided. Participants may build one voice for each of one or more subtasks and submit it for evaluation in web API form (as described below). The subtasks are identified as follows:

2020-ILTTS Hindi Male
2020-ILTTS Hindi Female
2020-ILTTS Tamil Male
2020-ILTTS Tamil Female
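
Because the transcripts are supplied as UTF-8 text for two languages, a quick sanity check before training can catch mis-encoded files or transcripts filed under the wrong subtask. The following is only a minimal sketch: the helper names `decode_transcript` and `dominant_script` are hypothetical and not part of any released tooling, and the check assumes that Hindi transcripts are written in Devanagari (Unicode block U+0900 to U+097F) and Tamil transcripts in Tamil script (U+0B80 to U+0BFF).

```python
from collections import Counter
from typing import Optional

# Unicode block ranges for the two scripts assumed to appear in the transcripts.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # assumed script for the Hindi subtasks
    "Tamil": (0x0B80, 0x0BFF),       # assumed script for the Tamil subtasks
}


def decode_transcript(raw: bytes) -> str:
    """Decode transcript bytes as strict UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    return raw.decode("utf-8")


def dominant_script(text: str) -> Optional[str]:
    """Return the script with the most characters in `text`, or None."""
    counts = Counter()
    for ch in text:
        code_point = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[name] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    sample = "நன்றி".encode("utf-8")
    print(dominant_script(decode_transcript(sample)))
```

A transcript whose dominant script does not match its subtask, or for which the function returns `None`, is worth inspecting by hand before it is used for training.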

Submission of System for Evaluation

After building a system, participants will have to host it on a server and provide us with the URL of a GUI where text can be synthesised. The organisers will use this URL to prepare the test data for the listening test. The URL may be public or private (accessible only to the organisers).

It is not permissible for a single participant to submit multiple entries to any subtask (mentioned above), because the listening test may otherwise become unmanageable. This rule may be relaxed in the event of a small number of participants.

Write Up

Along with the built system, each participant will have to submit a write-up (one or two pages) about the entry, describing the approach, technology, data, challenges faced, features, observations, etc.

Listening Test

The organisers will conduct a DMOS (“degradation” or “differential” mean opinion score) listening test to evaluate the submitted systems.
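
To give a sense of how DMOS ratings are typically summarised, the sketch below averages listener ratings for one system and attaches an approximate 95% confidence interval. It is an illustration under stated assumptions, not the organisers' evaluation procedure: it assumes the five-point degradation category scale of ITU-T P.800 (5 = degradation inaudible, 1 = degradation very annoying) and a normal-approximation interval, and the function name `dmos_summary` is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def dmos_summary(ratings: Sequence[int]) -> Tuple[float, float]:
    """Return (DMOS, 95% confidence half-width) for one system.

    Assumes ratings on a 1-5 degradation scale (an assumption; the
    challenge text does not state the scale) and uses the normal
    approximation 1.96 * s / sqrt(n) for the interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    score = mean(ratings)
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return score, half_width


if __name__ == "__main__":
    # Made-up ratings, for illustration only.
    score, half_width = dmos_summary([4, 5, 3, 4, 4])
    print(f"DMOS = {score:.2f} +/- {half_width:.2f}")
```

With few listeners the normal approximation is optimistic; a Student's t interval would be the more cautious choice.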

Benefits of Participation

Those who perform well in this challenge may get the following opportunities:

Seed funding to develop a product or solution using the technologies developed under this project.
MHRD and MeitY are planning to engage agencies for speech-to-speech translation. Good performers will get priority in this process.
Opportunity to participate in the next phase of the project.

Registration

Interested parties should register as soon as possible, using the link below:

You need to provide the following information in the form available at the above link:

Preferred team name - the organisers may adjust this so that all teams have meaningful, unique names
Affiliation - the name of your University and lab, or your Company
Contact details:
main contact person's email address - should be an institutional email address
backup email address(es)
postal address
phone number
You should only register for the challenge if you actually intend to submit an entry to the challenge and to comply with all the rules/guidelines mentioned.

Registration Fee

There is no registration fee.

Pre-Qualification

Participating teams are expected to have some experience of building TTS systems in Indian languages. After registration, please send sample output (4 sentences) from a synthesizer already built by your group, along with the corresponding text, to pranaw@cdac.in with a cc to hema@cse.iitm.ac.in. Please also mention the name of the language.

Your registration will be confirmed based on an evaluation of the submitted synthesizer output.

Provisional Timelines

Date / Month	Event
27th July 2020	Announcement of challenge
15th August 2020	Last date for registration
As soon as registration is confirmed	Database release
30th September 2020	Submission of system for evaluation (by midnight PDT)
October 2020	Evaluation of system
November 2020	Release of results

Licenses

The license for the released data will be shared with the participants. Data will be released to each participant once the appropriate license has been agreed to.

Development tools and Other Resources

Development tools, useful scripts, and other resources helpful in developing Indian-language TTS systems are available at the website below:

https://www.iitm.ac.in/donlab/tts/

These resources are provided for reference only; participants are free to use any tools or technologies to build their voices.

Use of External Data

"External data" is defined as data, of any type, that is not part of the provided database.
You are allowed to use external data in any way you wish, subject to any exclusions given in these rules.
Use of external data is entirely optional.
You must use the provided audio files.
You must not use any additional speech data from the same speakers.
You may exclude any parts of the provided databases if you wish.
Use of any provided segmentations, transcriptions or labels is optional.
If you have any doubt about how to apply these rules, please contact the organizers immediately.

How are these rules/guidelines enforced?

This is a challenge designed to answer scientific questions, not a competition. Therefore, we rely on your honesty in preparing your entry.

CALL FOR PARTICIPATION

Hema A Murthy, Pranaw Kumar