Search Results | Total results found: 1181

Search refined by : All Results
  Catalogue
BIS standard "IS 16333 (Part 3)" defines the requirements for mobile handsets: text input in English, Hindi and at least one additional official Indian language, along with message readability for all 22 official Indian languages. To help mobile manufacturers with internal verification and to check the effectiveness of language support, TDIL and CDAC-GIST have prepared test data covering, for each relevant language, Consonants (C), Vowels (V), Numerals (N), Matras (M), Halant (H), Diacritics (D) and combinations of C, V, N, M, H and D, along with a word list and sentences. The test data can be used to test text input and display on mobile handsets.
For best viewing, download the SakalBharati font.
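
As an illustration of what "combinations of C, M and H" can mean in practice, the following is a minimal sketch that builds consonant + matra strings and consonant + halant + consonant conjuncts for Devanagari. The three-character samples and the function names are assumptions made for this example; they are not taken from the TDIL test data, which is not reproduced here.

```python
# Small, assumed samples of Devanagari characters (not the actual TDIL test set).
CONSONANTS = ["\u0915", "\u0916", "\u0917"]   # C: KA, KHA, GA
MATRAS = ["\u093e", "\u093f", "\u0940"]       # M: vowel signs AA, I, II
HALANT = "\u094d"                             # H: virama


def consonant_matra_combinations(consonants, matras):
    """Return every consonant followed by every matra (C + M)."""
    return [c + m for c in consonants for m in matras]


def conjunct_combinations(consonants, halant=HALANT):
    """Return every consonant + halant + consonant sequence (C + H + C)."""
    return [a + halant + b for a in consonants for b in consonants]


if __name__ == "__main__":
    # Print the generated strings so they can be pasted into a handset for a display check.
    for text in consonant_matra_combinations(CONSONANTS, MATRAS):
        print(text)
    for text in conjunct_combinations(CONSONANTS):
        print(text)
```

Strings generated this way can be entered or displayed on a handset to check that matras attach to the correct consonant and that conjuncts render as a single cluster.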

Added on July 27, 2018

  More Details
  • Contributed by : CDAC-GIST, TDIL
  • Product Type : Linguistic Resources
  • License Type : Freeware
  • System Requirement : Not Applicable

Authorship identification is the task of identifying who wrote a given piece of text from a given set of candidate authors (suspects). The rapidly growing volume of text on the Internet makes authorship identification increasingly urgent. A large amount of work has already been done for the English language. Comparatively less research has been carried out for Indian regional languages such as Tamil, Telugu, Bengali and Punjabi, and no such experiment is available for Marathi.

Added on July 17, 2018

  More Details
  • Contributed by : Individual
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Kale Sunil Digamberrao, Dr. Rajesh S. Prasad

This is an author-wise Marathi text corpus for research in data and text mining, including but not limited to author identification, author profiling, sentiment analysis and text summarization of Marathi text. It comprises two datasets in two categories: comedy articles, and mixed articles (comedy, novels, Lalit lekhan, etc.) by well-known Marathi authors.

Dataset-I is a collection of comedy articles by 5 different authors. One file is prepared per author, containing all articles by that author, so Dataset-I contains 5 files, each named after its author. The number of words per author ranges from 7,006 to 10,411.
Dataset-II is composed of mixed-category articles by 10 different authors, with between 26,874 and 33,722 words per author. As in Dataset-I, one file is prepared per author, containing all articles by that author and named after the author. These files are encoded in UTF-8.
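
The per-author layout described above lends itself to a quick sanity check of the word counts. The following is a minimal sketch, assuming each dataset is a directory holding one UTF-8 file per author and that words are separated by whitespace. The function name and directory layout are hypothetical, and the contributors' own word-counting method is not stated, so results may differ from the figures quoted.

```python
from pathlib import Path


def word_counts_by_author(dataset_dir):
    """Return {author_name: word_count} for every file in dataset_dir.

    Assumptions: the file name (without extension) is the author name,
    files are UTF-8 encoded, and words are whitespace-separated.
    """
    counts = {}
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            counts[path.stem] = len(text.split())
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_words.py <path-to-dataset-directory>
    for author, count in word_counts_by_author(sys.argv[1]).items():
        print(f"{author}\t{count}")
```

Whitespace splitting is the simplest possible tokenization; punctuation attached to words is counted as part of the word, which is usually acceptable for a rough size check.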

This resource is contributed by Mr. Sunil Digamberrao Kale & Dr. Rajesh S. Prasad

Disclaimer: The responsibility of performance of this resource/tool lies with the contributor.

Last updated on August 1, 2018

  More Details
  • Contributed by : Kale Sunil Digamberrao, Dr. Rajesh S. Prasad
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable

This corpus contains 1077 Tamil audio files from 1073 speakers and a transcriptions folder containing a .lab transcription file for each audio file. The data was prepared for the Agricultural Commodity domain; the corpus size is 5.4 GB.

Added on July 3, 2018

  More Details
  • Contributed by : ASR Consortium
  • Product Type : Speech Corpora
  • License Type : Research
  • System Requirement : Not Applicable

This corpus contains more than 62000 Tamil audio files from 1000 speakers, a .dic file listing each word with its corresponding phonetic representation, and a transcription text file listing the transcription for each audio file. The data was prepared for the Agricultural Commodity domain; the corpus size is 5.7 GB.

Added on July 3, 2018

  More Details
  • Contributed by : ASR Consortium
  • Product Type : Speech Corpora
  • License Type : Research
  • System Requirement : Not Applicable