Language Resource Development

1. Ontology - An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest, and the relationships that hold among them. It structures knowledge about things hierarchically by subcategorizing them according to their essential (or at least relevant and/or cognitive) qualities.
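As a minimal sketch of the hierarchical structuring described above, the following Python fragment stores a toy ontology as "is-a" links and walks upward from a concept to its more general categories. The concept names and the helper functions `ancestors` and `is_a` are hypothetical and chosen only for illustration; real ontologies also carry other kinds of relationships.

```python
# Hypothetical toy ontology: each concept maps to its parent (is-a relation).
IS_A = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": "entity",
}


def ancestors(concept):
    """Return the chain of increasingly general categories above a concept."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    """True if `concept` equals or is subcategorized under `category`."""
    return concept == category or category in ancestors(concept)
```

For example, `ancestors("dog")` walks from "dog" through "mammal" and "animal" up to "entity", and `is_a("sparrow", "animal")` holds while `is_a("dog", "bird")` does not.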

2. Corpora - Corpora are the main knowledge base in corpus linguistics. A corpus is a large and structured set of texts used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules. Corpora are a core part of computational linguistics: speech recognition and machine translation need the analysis and processing of various types of corpora to produce part-of-speech tagging, morphological analysis, semantic annotation, etc. Corpora may also carry further structured levels of analysis; such corpora are usually called Treebanks or Parsed Corpora. Corpora are further divided as:

Text Corpora - A collection of writings used for linguistic analysis. Check the Text Corpus available with us.
Speech Corpora - A collection of recorded utterances used for linguistic analysis. Speech corpora for Assamese, Bangla, Hindi, Manipuri, Marathi and Punjabi are available for Indian researchers.
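One of the simplest statistical uses of a text corpus, checking how often words occur, can be sketched as follows. The three-sentence corpus and the function names `tokenize` and `word_frequencies` are assumptions made for illustration; a real corpus would be far larger and would normally be read from files.

```python
from collections import Counter
import re

# Hypothetical three-sentence corpus, for illustration only.
CORPUS = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog met.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(corpus):
    """Count how often each token occurs across all texts in the corpus."""
    counts = Counter()
    for text in corpus:
        counts.update(tokenize(text))
    return counts


freq = word_frequencies(CORPUS)
```

Here `freq["the"]` is 4 and `freq["cat"]` is 2. Counts of this kind are the raw material for hypothesis testing and for validating linguistic rules against observed usage.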

3. Lexical Resources - Lexical resources can be divided, based on their nature and function, as:

Dictionary - A reference book containing an alphabetical list of words, with information given for each word, usually including meaning, pronunciation and etymology. There is no specified standard for creating a dictionary structure. XML is now the most widely used and recommended way of creating the structure, as it is more logical and useful for creating web-based dictionaries.
Thesaurus - A thesaurus is a book of selected words or concepts, such as the specialized vocabulary items of a particular field. It often contains synonyms and other semantically related words, including related and contrasting words and antonyms.
Term Bank - A stock of terms used in a particular profession, subject, or style.
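Since no standard dictionary structure is specified, the following is only a sketch of what an XML-based entry might look like, built with Python's standard `xml.etree.ElementTree` module. The element names (`dictionary`, `entry`, `headword`, `pronunciation`, `meaning`, `etymology`), the helper `make_entry`, and the sample entry are all assumptions made for illustration, not a prescribed schema.

```python
import xml.etree.ElementTree as ET


def make_entry(headword, pronunciation, meaning, etymology):
    """Build one dictionary entry; the element names are illustrative only."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pronunciation").text = pronunciation
    ET.SubElement(entry, "meaning").text = meaning
    ET.SubElement(entry, "etymology").text = etymology
    return entry


# Assemble a one-entry dictionary and serialize it to an XML string.
dictionary = ET.Element("dictionary")
dictionary.append(
    make_entry(
        headword="corpus",
        pronunciation="ˈkɔːpəs",
        meaning="a large and structured set of texts",
        etymology="from Latin corpus, 'body'",
    )
)
xml_text = ET.tostring(dictionary, encoding="unicode")
```

Because each piece of information sits in its own named element, a web-based dictionary can look up or display the meaning, pronunciation or etymology of a word independently.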

4. Linguistic Analysis - Linguistics is the study of the nature, structure, and variation of language. Language is analyzed on the basis of phonetics, phonology, morphology, syntax, semantics, sociolinguistics, and pragmatics:

Phonetic Analysis - Study of the sounds of speech: their production, combination and representation by written symbols.
Phonological Analysis - Study of the speech sounds of a language with reference to their distribution and patterning.
Morphological Analysis - Deals with the root / base form of a word and the morphemes affixed to it.
Syntactic Analysis - Deals with the grammatical analysis of sentences or discourse structure.
Semantic Analysis - Concept-based analysis of meaning.
Discourse Analysis - Analysis of discourse structure using knowledge of the world.
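To make one of these levels concrete, morphological analysis can be sketched as separating a base form from an affixed morpheme. The fragment below is a deliberately naive suffix stripper for English; the suffix list and the function name `analyze` are assumptions for illustration, and a real analyzer would need a lexicon and spelling rules (this one splits "making" into "mak" + "ing", for instance).

```python
# Hypothetical, deliberately tiny suffix list for English, checked in order.
SUFFIXES = ["ing", "ed", "es", "s"]


def analyze(word):
    """Split a word into (base, suffix); suffix is "" when none matches."""
    for suffix in SUFFIXES:
        # Require a base of at least three letters so short words stay whole.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)], suffix
    return word, ""
```

With this list, `analyze("walked")` returns `("walk", "ed")` and `analyze("bus")` returns `("bus", "")`, because stripping "s" would leave a base shorter than three letters.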

5. Formalism - Refers to the syntax of a language or the well-formed formulas of a grammar, such that inference rules can be derived for language processing. It is principally the study of a theoretical framework or syntax of language for computational linguistic analysis.
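As one possible illustration of a grammar formalism, the sketch below writes a toy context-free grammar in Chomsky normal form and uses the CYK algorithm to decide whether a sentence is well formed under it. The grammar, the vocabulary and the function name `is_well_formed` are hypothetical; context-free grammars are only one of many formalisms used in computational linguistics.

```python
# Hypothetical toy grammar in Chomsky normal form.
# Binary rules: (left category, right category) -> parent categories.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules: word -> categories.
LEXICAL_RULES = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
}


def is_well_formed(sentence, start="S"):
    """CYK recognition: True if the grammar derives the sentence from `start`."""
    words = sentence.split()
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that span words[i : j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICAL_RULES.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            # Try every split point between the left and right parts.
            for k in range(i, j):
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]
```

Under this grammar, `is_well_formed("the dog sees the cat")` is true, while the scrambled string "dog the sees cat the" and the bare noun phrase "the dog" are rejected as sentences.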