Lessons Learned from Automatic Indexing Projects Regarding Persian Language Specifications
Abstract
Reading and writing Persian involve difficulties that stem from specific features of the language. This paper examines the experiences, lessons, and outcomes of automatic indexing projects for Persian-language documents in order to propose effective solutions for improving their indexing and retrieval. The most important problems that Persian language and script pose for automatic indexing are: selecting appropriate keywords; building a vocabulary; semantic, verb, and word-sense ambiguities in sentences; spaces and pseudo-spaces in Persian script; isolated and cursive writing; Persian morphology; and typographical and spelling errors. The solutions proposed to facilitate automatic indexing of Persian texts are: removing stop words; pre-processing characters and script; identifying word boundaries; unifying variant spellings; automatic stemming; weighting and scoring words; detecting phrasal verbs and compound phrases; spellchecking through morphological or even syntactic spellcheckers and the design of a correction-and-suggestion system; and developing an infrastructure database for Persian language and script usage.