English Monolingual Chunked Text Corpus ILCI

Contributor: ILCI Consortia
Product Code: ILCI-ENG-MONO-TEXT-316

Available Under License: Commercial Research


Added on: 29 Jul 2020

Under the Indian Languages Corpora Initiative Phase II (ILCI Phase-II) project, initiated by MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi, collected a monolingual corpus in English. This corpus is the final outcome of the project and contains 30,000 general-domain sentences. The translated sentences have been chunk-tagged according to the BIS (Bureau of Indian Standards) tagset. The corpus has the following features: unique IDs, UTF-8 encoding, and text file format.

Text Corpus Attributes
Language	English
Parallel or Monolingual	Monolingual
Annotation	Chunked Tagged
No. of Sentences	30,000
Word-Count	640681
File Format	Text File
Encoding	UTF-8
File Size	1.91 MB
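
The figures above can be sanity-checked once the corpus is downloaded. The sketch below is only illustrative: it assumes the corpus is a single UTF-8 text file with one sentence per line, which this listing does not confirm, and the function name corpus_stats is hypothetical. Because the exact layout of IDs and chunk tags is not described here, the token count it returns may include IDs and tag markers and so may differ from the listed word count.

```python
def corpus_stats(path):
    """Count non-empty lines and whitespace-separated tokens in a UTF-8 text file.

    Assumption: one sentence per line. IDs and chunk tags, if present on the
    line, are counted as tokens too, since their layout is not documented here.
    """
    sentences = 0
    tokens = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            sentences += 1
            tokens += len(line.split())
    return sentences, tokens


# Figures taken from the attributes table in this listing.
LISTED_SENTENCES = 30000
LISTED_WORDS = 640681

# Average sentence length implied by the listed figures.
average_words = round(LISTED_WORDS / LISTED_SENTENCES, 2)
print(average_words)  # 21.36
```

Calling corpus_stats on the extracted file and comparing the first value with 30,000 is a quick way to confirm the download is complete.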

Tags: English, Monolingual, Chunked Tagged, Text Corpus, ILCI
