The primary source of test data for an Indian language search engine is the internet. The test data fed into the search engine consist mainly of search keywords, which can be collected from web pages in the various Indian languages and then used to evaluate the search engine.
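As a rough illustration of harvesting candidate keywords from Indian language web pages, the Python sketch below keeps only runs of characters from one script's Unicode block and ranks them by frequency. This is an assumed heuristic, not the procedure used to build this test set. The block ranges are standard Unicode; the function name `candidate_keywords` and its optional stopword list are hypothetical.

```python
import re
from collections import Counter

# Standard Unicode block ranges for the scripts of the nine languages.
# Assamese and Bengali share the Bengali block; Hindi and Marathi share Devanagari.
SCRIPT_RANGES = {
    "Devanagari": ("\u0900", "\u097F"),  # Hindi, Marathi
    "Bengali": ("\u0980", "\u09FF"),     # Assamese, Bengali
    "Gurmukhi": ("\u0A00", "\u0A7F"),    # Punjabi
    "Gujarati": ("\u0A80", "\u0AFF"),    # Gujarati
    "Oriya": ("\u0B00", "\u0B7F"),       # Odia
    "Tamil": ("\u0B80", "\u0BFF"),       # Tamil
    "Telugu": ("\u0C00", "\u0C7F"),      # Telugu
}

def candidate_keywords(page_text, script, top_n=10, stopwords=frozenset()):
    """Hypothetical helper: the top_n most frequent words of one script in page_text."""
    lo, hi = SCRIPT_RANGES[script]
    # A word is a maximal run of characters from the script's block, excluding
    # the danda and double danda (U+0964, U+0965) that end sentences.
    word = re.compile(f"(?:(?![\u0964\u0965])[{lo}-{hi}])+")
    counts = Counter(t for t in word.findall(page_text) if t not in stopwords)
    # most_common orders equal counts by first occurrence in the text.
    return [t for t, _ in counts.most_common(top_n)]

page = "ताजमहल आगरा में है। ताजमहल एक स्मारक है। Taj Mahal"
print(candidate_keywords(page, "Devanagari", top_n=2))  # ['ताजमहल', 'है']
print(candidate_keywords(page, "Devanagari", top_n=2, stopwords={"है", "में", "एक"}))  # ['ताजमहल', 'आगरा']
```

Frequency alone surfaces function words such as है, so a stopword list or manual review would still be needed; words written with ZWJ/ZWNJ are also split by this heuristic.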
A total of 4,735 search keywords were prepared for nine languages, namely Assamese, Bengali, Gujarati, Hindi, Marathi, Odia, Punjabi, Tamil and Telugu, across the following categories:
3. Named Entities (NER) & Acronyms
4. Data Integrity (Normalization, Spelling Variation)
5. Grammatical Form Handling (Singular/Plural, Lemmatizer, Synonyms, Spell Checker)
6. Ranking (Single Best Target)
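The Data Integrity (Normalization) category covers words that can be encoded in more than one way. One concrete case is the Devanagari nukta: क़ can be stored as the precomposed code point U+0958 or as क (U+0915) followed by the nukta sign (U+093C). The sketch below assumes Unicode NFC as the normalization form (the categories above do not say which form an engine should use) and shows that the two encodings differ as strings but agree after normalization. `normalize_query` and its whitespace collapsing are hypothetical.

```python
import unicodedata

def normalize_query(text):
    """Hypothetical helper: NFC normalization plus collapsing of runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())

precomposed = "\u0958"       # DEVANAGARI LETTER QA
decomposed = "\u0915\u093C"  # DEVANAGARI LETTER KA + SIGN NUKTA

# U+0958 is excluded from composition, so NFC maps both spellings to KA + NUKTA.
print(precomposed == decomposed)                                    # False
print(normalize_query(precomposed) == normalize_query(decomposed))  # True
```

A search engine that skips this step may return different results for visually identical queries.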
Domain Covered: Tourism
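For the Ranking (Single Best Target) category, each keyword presumably has one page judged to be the best answer. A common way to score such queries, assumed here rather than taken from this report, is the reciprocal rank of that target averaged over all queries (mean reciprocal rank). In the sketch, `EvalQuery`, the toy index and the example.org URLs are hypothetical, and `search` stands in for the engine under test.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalQuery:
    language: str    # e.g. "Hindi"
    category: str    # e.g. "Ranking (Single Best Target)"
    keyword: str
    target_url: str  # the single page judged to be the best answer

def reciprocal_rank(results: List[str], target_url: str) -> float:
    """1/rank of the target in the ranked results, or 0.0 if it is absent."""
    for rank, url in enumerate(results, start=1):
        if url == target_url:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries: List[EvalQuery], search: Callable[[str], List[str]]) -> float:
    """Average reciprocal rank over all queries; search maps a keyword to ranked URLs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(search(q.keyword), q.target_url) for q in queries) / len(queries)

# Toy tourism "engine" with hard-coded, hypothetical results.
fake_index = {
    "ताजमहल": ["https://example.org/tajmahal", "https://example.org/agra"],
    "कोणार्क": ["https://example.org/puri", "https://example.org/konark"],
}
queries = [
    EvalQuery("Hindi", "Ranking (Single Best Target)", "ताजमहल", "https://example.org/tajmahal"),
    EvalQuery("Hindi", "Ranking (Single Best Target)", "कोणार्क", "https://example.org/konark"),
]
print(mean_reciprocal_rank(queries, lambda k: fake_index.get(k, [])))  # 0.75
```

Reciprocal rank gives full credit only when the single target is ranked first and decays quickly after that, which suits a one-target judgment better than measures that assume several relevant pages.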