CALL FOR PARTICIPATION

Evaluation of Corpus Based TTS Built on Common Databases
Indian Language TTS Development Challenge, 2020

Hema A Murthy, Pranaw Kumar

(hema@cse.iitm.ac.in, pranaw@cdac.in)

As part of the National Translation Mission funded by MeitY, Govt of India, IIT Madras and CDAC Mumbai are jointly organising a challenge for the "Development of Text to Speech Synthesis (TTS) for Hindi and Tamil". The challenge aims to encourage and support the advancement of TTS in Indian languages. The basic task is to take the released speech data, build TTS voices, and share each voice in web API form for evaluation. The output of each synthesizer is then evaluated through extensive listening tests.

CHALLENGE OVERVIEW

Speech synthesis, also known as text-to-speech (TTS), has attracted increasing attention in recent years. Recent advances in speech synthesis have shown that TTS systems can produce very natural-sounding speech from text. Around the world, TTS systems are built using a variety of approaches. In India, the speech research community has grown significantly in recent years, and a speech revolution is under way. It is therefore necessary to understand and compare the research techniques used to build Indian language TTS systems. The primary objective of this challenge is to understand and compare the various approaches to building TTS, and at the same time to identify effective speech groups in the country.

Good-quality, studio-recorded speech data with highly accurate transcripts is required to build a high-quality TTS system. However, for Indian languages, not everyone has access to these resources, especially academic institutions and startups. As part of this challenge, we will release good-quality training data for Hindi and Tamil. Everyone who participates in the challenge will then be free to use this data for research purposes.

Main Task

About 5 hours of speech data will be provided for each of Hindi Male, Hindi Female, Tamil Male and Tamil Female, recorded by native professional speakers in high-quality studio environments, together with the corresponding text in UTF-8 format. No other information, such as segment labels, will be provided. Participants may build one voice for each of one or more subtasks and submit it for evaluation in web API form (as described below). The subtasks are named as follows:

  • 2020-ILTTS Hindi Male
  • 2020-ILTTS Hindi Female
  • 2020-ILTTS Tamil Male
  • 2020-ILTTS Tamil Female

Submission of System for Evaluation

After building a system, participants must host it on a server and provide the organisers with the URL of a GUI where text can be synthesised. The organisers will use this URL to prepare test data for the listening test. The URL may be public or private (accessible only to the organisers).
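As an illustration only, the following is a minimal sketch of one way a synthesizer could be exposed at a URL, using the Python standard library. The `text` query parameter, the port, the sample rate, and the `synthesize` placeholder (which returns one second of silence) are all assumptions made for this example; the organisers do not prescribe any particular interface here.

```python
import io
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

SAMPLE_RATE = 16000  # assumed sample rate, not specified by the challenge


def synthesize(text):
    """Hypothetical placeholder for a real TTS voice.

    Returns one second of 16-bit mono silence as WAV bytes,
    regardless of the input text.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * SAMPLE_RATE)
    return buf.getvalue()


def text_from_path(path):
    """Extract the UTF-8 'text' query parameter from a request path."""
    values = parse_qs(urlparse(path).query).get("text", [])
    return values[0].strip() if values else ""


class SynthesisHandler(BaseHTTPRequestHandler):
    """Answers GET requests such as /synthesize?text=... with WAV audio."""

    def do_GET(self):
        text = text_from_path(self.path)
        if not text:
            self.send_error(400, "Missing 'text' query parameter")
            return
        audio = synthesize(text)
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)


def run_server(port=8000):
    """Serve synthesis requests until interrupted (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), SynthesisHandler).serve_forever()
```

Calling `run_server()` starts the service. A private deployment would additionally need some form of access control, which is omitted from this sketch.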

It is not permissible for a single participant to submit multiple entries to any subtask (mentioned above), because the listening test may otherwise become unmanageable. This rule may be relaxed in the event of a small number of participants.

Write Up

Along with the built system, each participant must submit a write-up of one or two pages describing the entry: the approach, technology, data, challenges faced, features, observations, etc.

Listening Test

The organisers will conduct a DMOS ("degradation" or "differential" mean opinion score) listening test to evaluate the submitted systems.
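For readers unfamiliar with DMOS, the sketch below shows how such ratings are commonly summarised: each listener rates a synthesised sample against a reference on a 5-point degradation scale (5 = degradation inaudible, 1 = very annoying), and the scores are averaged per system. The 5-point scale, the normal-approximation confidence interval, and the ratings themselves are assumptions for illustration; the organisers' exact protocol is not specified here.

```python
import math
from statistics import mean, stdev


def dmos_summary(ratings, z=1.96):
    """Return (mean score, confidence-interval half-width) for one system.

    ratings: listener scores on an assumed 1-5 degradation scale.
    z: normal quantile; 1.96 gives an approximate 95% interval.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    m = mean(ratings)
    if len(ratings) < 2:
        return m, 0.0  # no spread can be estimated from a single rating
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


# Hypothetical ratings, invented purely for illustration.
ratings_by_system = {
    "system_A": [4, 5, 4, 3, 4, 5],
    "system_B": [3, 3, 2, 4, 3, 3],
}

for name, ratings in ratings_by_system.items():
    m, hw = dmos_summary(ratings)
    print(f"{name}: DMOS = {m:.2f} +/- {hw:.2f}")
```

Overlapping intervals between two systems suggest that the listening test has not clearly separated them, which is one reason a large number of listeners is desirable.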

Benefits of Participation

Participants who perform well in this challenge may receive the following opportunities:

  • Seed funding to develop a product or solution using the technologies developed under this project.
  • MHRD and MeitY are planning to engage some agencies for speech-to-speech translation. Good performers will be given priority in this process.
  • Opportunity to participate in the next phase of the project.

Registration

Interested parties should register as soon as possible using the link below:

REGISTER NOW

You will need to provide the following information in the form available at the link above:

  • Preferred team name - the organisers may adjust this so that all teams have meaningful, unique names
  • Affiliation - the name of your University and lab, or your Company
  • Contact details:
      • main contact person's email address - should be an institutional email address
      • backup email address(es)
      • postal address
      • phone number

You should only register for the challenge if you actually intend to submit an entry and to comply with all the rules/guidelines mentioned.

Registration Fee

There is no registration fee.

Pre-Qualification

Participating teams are expected to have some experience of building TTS systems for Indian languages. After registering, please send sample output (4 sentences) from a synthesizer already built by your group, along with the corresponding text, to pranaw@cdac.in with cc to hema@cse.iitm.ac.in. Please also mention the name of the language.

Your registration will be confirmed based on an evaluation of the submitted synthesizer output.

Provisional Timelines

Date / Month                           Event
27 July 2020                           Announcement of challenge
15 August 2020                         Last date of registration
As soon as registration is confirmed   Database release
30 September 2020                      Submission of system for evaluation (by midnight PDT)
October 2020                           Evaluation of system
November 2020                          Release of results

Licenses

The license for the released data will be shared with the participants. Data will be released to each participant once the appropriate license has been agreed to.

Development tools and Other Resources

Development tools, useful scripts, and other resources for developing Indian language TTS systems are available at the website below:

https://www.iitm.ac.in/donlab/tts/

These are provided for reference only. Participants are free to use any tools or technologies to build their voices.

Use of External Data

  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data in any way you wish, subject to any exclusions given in these rules.
  • Use of external data is entirely optional.
  • You must use the provided audio files
  • You must not use any additional speech data from the same speakers
  • You may exclude any parts of the provided databases if you wish.
  • Use of any provided segmentations, transcriptions or labels is optional.
  • If you have any doubt about how to apply these rules, please contact the organizers immediately.

How are these rules/guidelines enforced?

This is a challenge designed to answer scientific questions, not a competition. Therefore, we rely on your honesty in preparing your entry.

Contact Us

For further information please contact pranaw@cdac.in with cc to hema@cse.iitm.ac.in.

Pranaw Kumar (Mobile): +91-7303226768