Lessons learned from Automatic Indexing Projects regarding to Persian Language Specifications

Ghorbani, Mahboubeh; Torkashvand, Fattaneh

Files

115-ghorbani-en.pdf (197.24 KB)

Date

2018

Authors

Ghorbani, Mahboubeh

Torkashvand, Fattaneh

Abstract

Reading and writing Persian involve some difficulties that stem from specific features of the language. This paper examines the experiences, lessons, and outcomes of automatic indexing of Persian-language documents in order to offer effective solutions for improving their indexing and retrieval. The most important problems that Persian language and script pose for automatic indexing include selecting appropriate keywords; building a vocabulary; semantic, verb, and word-sense ambiguities in sentences; spaces and pseudo-spaces in Persian script; isolated and cursive writing; the morphology of the Persian language; and typographical and spelling errors. The solutions proposed to facilitate automatic indexing of Persian texts are removing stop words; pre-processing characters and script; identifying word boundaries; equalizing different spellings; automatic stemming; weighting and scoring words; detecting phrasal verbs and compound phrases; spellchecking through morphological or even syntactic spellcheckers; designing a corrector and proposer system; and developing an infrastructure database for Persian language and script usage.
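
As an illustration of the first few pre-processing steps named in the abstract (character pre-processing, pseudo-space handling, word-boundary identification, and stop-word removal), here is a minimal Python sketch. It is not taken from the paper: the function names `normalize` and `index_terms`, the character mapping, and the very short stop-word list are assumptions chosen only to make the idea concrete.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner, the Persian "pseudo-space"

# Arabic code points that are often typed in place of their Persian forms.
# Illustrative mapping only; a production system would cover more characters.
CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian kaf
})

# Tiny hypothetical stop-word list (assumption, not from the paper).
STOP_WORDS = {"و", "در", "به", "از", "که", "را"}


def normalize(text):
    """Unify character variants and tidy spaces and pseudo-spaces."""
    text = text.translate(CHAR_MAP)
    # Drop short-vowel diacritics (U+064B..U+0652) and tatweel (U+0640).
    text = re.sub("[\u064b-\u0652\u0640]", "", text)
    # Any run of whitespace, possibly mixed with ZWNJ, becomes one space.
    text = re.sub("[\\s\u200c]*\\s[\\s\u200c]*", " ", text)
    # Repeated pseudo-spaces collapse to a single ZWNJ.
    text = re.sub("\u200c{2,}", ZWNJ, text)
    return text.strip()


def index_terms(text):
    """Split on real spaces only, so ZWNJ-joined words stay whole,
    then drop stop words."""
    tokens = normalize(text).split(" ")
    return [t for t in tokens if t and t not in STOP_WORDS]
```

Splitting only on real spaces keeps a word such as "می‌روم", whose two parts are joined by a pseudo-space, as a single index term. Stemming, weighting, and spellchecking would follow these steps and are not shown here.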

URI

https://repository.ifla.org/handle/20.500.14598/6363

Collections

World Library and Information Congress (WLIC) Papers and Presentations
