The vulnerability of voice biometric systems to spoofing attacks by Synthetic Speech (SS) and Voice Converted (VC) speech has created the need for standalone Spoofed Speech Detection (SSD) systems. The present work extends the features we previously proposed (used in the relatively best-performing SSD system) at the first ASVspoof 2015 challenge, held at INTERSPEECH 2015. For the challenge, the authors proposed novel features based on Cochlear Filter Cepstral Coefficients (CFCC) and Instantaneous Frequency (IF), i.e., CFCCIF. The basic motivation is that the human ear processes speech in subbands: the envelope of each subband and its IF are important for the perception of speech, and transient information adds to the perceptual information that is captured. We observed that subband energy variations across CFCCIF, when estimated by symmetric difference (CFCCIFS), gave better discriminative properties than CFCCIF. The features are extracted at the frame level, and a Gaussian Mixture Model (GMM)-based classification system was used. Experiments were conducted on the ASVspoof 2015 challenge database with MFCC, CFCC, CFCCIF and CFCCIFS features. On the evaluation dataset, after score-level fusion with MFCC, the CFCCIFS features gave an overall Equal Error Rate (EER) of 1.45 %, compared to 1.87 % and 1.61 % with CFCCIF and CFCC, respectively. In addition to detecting known and unknown attacks, intensive experiments were conducted to study the effectiveness of the features when only SS or only VC speech is available for training. When only VC speech was used in training, both VC speech and SS could be detected. However, when only SS was used in training, VC speech was not detected. In general, amongst vocoder-based spoofs, VC speech was more difficult for the SSD system to detect than SS. However, vocoder-independent SS was the hardest to detect, with the highest EER (i.e., > 10 %).
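The two evaluation steps named in the abstract, score-level fusion and the Equal Error Rate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_scores` and `equal_error_rate`, the weighted-sum fusion rule, the fusion weight, and the toy scores are all assumptions made for illustration. The abstract does not state how the fusion weight was chosen or how the EER was computed.

```python
import numpy as np


def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted-sum fusion of two per-trial score arrays.

    Hypothetical helper: the fusion rule and weight are assumptions,
    not values taken from the paper.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight_a * a + (1.0 - weight_a) * b


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Higher scores are assumed to mean "more likely genuine speech".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False acceptance rate: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


# Toy scores, invented for illustration only.
genuine = [0.5, 3.0, 4.0, 5.0]
spoof = [-1.0, 0.0, 1.0, 2.0]
eer = equal_error_rate(genuine, spoof)
```

In a GMM-based system of the kind the abstract describes, the per-trial score would typically be a log-likelihood ratio between a genuine-speech model and a spoofed-speech model; here the scores are simply given as inputs.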

Added on April 15, 2020


  More Details
  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Tanvina B. Patel and Hemant A. Patil