
Search Results | Total results found: 1188

e-Aksharayan is desktop software that converts scanned, printed Indian-language documents into fully editable Unicode text. It runs on Windows 7, 8, and 10.

Input and output specifications
• Works on Windows 7, 8, and 10.
• Supported input formats are BMP, TIFF, and PNG.
• Supported output formats are RTF, TXT, and DOC.
• Gray-level and black-and-white images can be given as input.
• Image dimensions up to 3500 × 3500 pixels.
• Minimum supported scanning resolution: 300 dpi.
• Maximum supported input skew: 15 degrees.
• Equipped with a Unicode typing tool for typing in Indian languages.
• The Sakal Bharati font (11 Indian-language scripts in a single font) is also provided.
• The system recognizes up to 5 pages at a time.
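
The limits listed above can be checked before a scan is submitted. The following is a minimal sketch in Python, not part of e-Aksharayan itself: the `ScanInfo` record and both function names are hypothetical, the 3500-pixel limit is assumed to apply to each side, and `.tif` is assumed to be an accepted TIFF extension. Skew is left out because measuring it requires image analysis.

```python
import os
from dataclasses import dataclass

# Limits copied from the specification list above.
# Treating both ".tif" and ".tiff" as TIFF is an assumption.
SUPPORTED_EXTENSIONS = {".bmp", ".tif", ".tiff", ".png"}
MAX_SIDE_PX = 3500   # assumed to apply to width and height separately
MIN_DPI = 300
MAX_PAGES = 5


@dataclass
class ScanInfo:
    """Hypothetical record describing one scanned page (not an e-Aksharayan type)."""
    filename: str
    width_px: int
    height_px: int
    dpi: int


def check_scan(scan: ScanInfo) -> list[str]:
    """Return a list of problems; an empty list means the scan is within the limits."""
    problems = []
    extension = os.path.splitext(scan.filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if scan.width_px > MAX_SIDE_PX or scan.height_px > MAX_SIDE_PX:
        problems.append(
            f"image is {scan.width_px} x {scan.height_px} px, above {MAX_SIDE_PX} per side"
        )
    if scan.dpi < MIN_DPI:
        problems.append(f"resolution is {scan.dpi} dpi, below {MIN_DPI}")
    return problems


def check_batch(scans: list[ScanInfo]) -> list[str]:
    """Check the page-count limit, then every scan in the batch."""
    problems = []
    if len(scans) > MAX_PAGES:
        problems.append(f"{len(scans)} pages submitted, above {MAX_PAGES} per run")
    for scan in scans:
        problems.extend(f"{scan.filename}: {p}" for p in check_scan(scan))
    return problems
```

Calling `check_batch` with a list of `ScanInfo` records returns an empty list when every page is within the listed limits, and one message per violation otherwise.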

Added on August 1, 2018

  More Details
  • Contributed by : OCR Consortium
  • Product Type : General Tools
  • License Type : Freeware
  • System Requirement : Windows

e-Aksharayan is desktop software that converts scanned, printed Indian-language documents into fully editable Unicode text. It runs on Windows 7, 8, and 10.

Input and output specifications
• Works on Windows 7, 8, and 10.
• Supported input formats are BMP, TIFF, and PNG.
• Supported output formats are RTF, TXT, and DOC.
• Gray-level and black-and-white images can be given as input.
• Image dimensions up to 3500 × 3500 pixels.
• Minimum supported scanning resolution: 300 dpi.
• Maximum supported input skew: 15 degrees.
• Equipped with a Unicode typing tool for typing in Indian languages.
• The Sakal Bharati font (11 Indian-language scripts in a single font) is also provided.
• The system recognizes up to 5 pages at a time.

Added on August 1, 2018

  More Details
  • Contributed by : OCR Consortium
  • Product Type : General Tools
  • License Type : Freeware
  • System Requirement : Windows

The BIS standard "IS 16333 (Part 3)" defines the requirements for mobile handsets for text input in English, Hindi, and at least one additional official Indian language, along with message readability on the phone for all 22 official Indian languages. To help mobile manufacturers with internal verification and to check the effectiveness of language support, TDIL and CDAC-GIST have prepared robust test data covering the relevant language's Consonants (C), Vowels (V), Numerals (N), Matras (M), Halant (H), and Diacritics (D), combinations of C, V, N, M, H, and D, and a word list and sentences. The test data can be used to test text input and display on mobile handsets.
For the best view, download the SakalBharati font.
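
To illustrate what combinations of C, M, and H look like, here is a small sketch in Python. It is not the TDIL/CDAC-GIST test data and does not claim to reproduce how that data was built: the three Devanagari consonants and three matras are an arbitrary sample chosen for illustration, and the function names are hypothetical.

```python
from itertools import product

# A small, arbitrary Devanagari sample (assumption: chosen only for illustration).
CONSONANTS = ["\u0915", "\u0916", "\u0917"]  # KA, KHA, GA
MATRAS = ["\u093E", "\u093F", "\u0940"]      # vowel signs AA, I, II
HALANT = "\u094D"                            # Devanagari sign virama


def consonant_matra_pairs(consonants, matras):
    """Every consonant followed by every matra (C + M)."""
    return [c + m for c, m in product(consonants, matras)]


def conjuncts(consonants, halant=HALANT):
    """Every two-consonant cluster joined by the halant (C + H + C)."""
    return [c1 + halant + c2 for c1, c2 in product(consonants, repeat=2)]


if __name__ == "__main__":
    # Print the strings so they can be pasted into a handset input or display test.
    for text in consonant_matra_pairs(CONSONANTS, MATRAS) + conjuncts(CONSONANTS):
        print(text)
```

Each printed string is a short sequence whose rendering can be compared by eye against a reference font.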

Added on July 27, 2018

  More Details
  • Contributed by : CDAC-GIST, TDIL
  • Product Type : Linguistic Resources
  • License Type : Freeware
  • System Requirement : Not Applicable

Authorship identification is the task of identifying who wrote a given piece of text from a given set of candidate authors (suspects). The increasingly large volume of text on the Internet makes authorship identification an urgent need. A large amount of work has already been done for English. Comparatively less research has been carried out for Indian regional languages such as Tamil, Telugu, Bengali, and Punjabi, and no such experiment is available for Marathi.

Added on July 17, 2018

  More Details
  • Contributed by : Individual
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Kale Sunil Digamberrao, Dr. Rajesh S. Prasad

This is an author-wise Marathi text corpus for research purposes. Intended research areas are data and text mining on Marathi text, including but not limited to author identification, author profiling, sentiment analysis, and text summarization. There are two datasets, one per category: comedy articles, and mixed articles (comedy, novels, Lalit lekhan, etc.) by well-known Marathi authors.

Dataset-I is a collection of comedy articles by 5 different authors. One file is prepared for each author, containing all articles by that author, so Dataset-I contains 5 files, each named after its author. The number of words per author ranges from a minimum of 7,006 to a maximum of 10,411.
Dataset-II is composed of mixed-category articles by 10 different authors, with a minimum of 26,874 and a maximum of 33,722 words per author. One file is prepared for each author, containing all articles by that author and named after the author. These files are encoded in UTF-8.
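
Because each author's articles sit in a single UTF-8 file named after the author, per-author word counts can be computed with a few lines of Python. This is a sketch under assumptions: the directory path passed in is hypothetical, every regular file in that directory is treated as an author file, and a "word" is taken to be a whitespace-separated token, which may differ from how the contributors counted.

```python
from pathlib import Path


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def word_counts_by_author(dataset_dir: str) -> dict[str, int]:
    """Map each author (file name without extension) to that file's word count."""
    counts = {}
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.is_file():
            counts[path.stem] = count_words(path.read_text(encoding="utf-8"))
    return counts
```

Taking `min` and `max` over the returned values gives a range that can be compared with the figures quoted for each dataset.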

This resource was contributed by Mr. Sunil Digamberrao Kale and Dr. Rajesh S. Prasad.

Disclaimer: Responsibility for the performance of this resource/tool lies with the contributor.

Last updated on August 1, 2018

  More Details
  • Contributed by : Kale Sunil Digamberrao, Dr. Rajesh S. Prasad
  • Product Type : Text Corpora
  • License Type : Research
  • System Requirement : Not Applicable