An ontology based fully automatic document classification system using an existing semi-automatic system

dc.audience: Audience::Classification and Indexing Section
dc.conference.sessionType: Classification and Indexing
dc.contributor.author: Wijewickrema, Chaaminda Manjula
dc.contributor.author: Gamage, Ruwan
dc.contributor.translator: Zhang, Shinan
dc.contributor.translator: Jiménez Huerta, Pascual
dc.date.accessioned: 2025-09-24T08:02:32Z
dc.date.available: 2025-09-24T08:02:32Z
dc.date.issued: 2013
dc.description.abstract (en): Automatic document classification has become an important research area because of the exponential growth of digital content and because manual or semi-automatic organization is not effective. On the one hand, manual and semi-automatic classification is painstaking and labor-intensive. On the other hand, misclassifications caused by vagueness in the documents and in the classification schemes are inevitable with both methods. The current study therefore seeks to shed light on these issues. It proposes a system that classifies a given text document fully automatically while minimizing vocabulary ambiguity. One of our previous studies developed a semi-automatic system for document classification; here we propose extending it to obtain a fully automatic document classification system.
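dc.description.abstract (zh, translated): Owing to the exponential growth of digital content and the inefficiency of manual and semi-automatic organization, automatic document classification has become an important research area. On the one hand, manual and semi-automatic classification demands considerable effort and is labor-intensive; on the other hand, misclassification caused by the ambiguity of documents and of classification schemes is unavoidable in both methods. This study therefore attempts to address these problems. It proposes an automated system that can classify a given text document entirely automatically by minimizing vocabulary ambiguity. We previously developed a semi-automatic document classification system; here we refine it further to obtain a fully automatic document classification system.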
dc.description.abstract (es, translated): Automatic document classification has become a very important research area, owing to the exponential growth of digital content and because manual or semi-automatic organization is not particularly effective. On the one hand, manual and semi-automatic classification is very meticulous and laborious. On the other hand, misclassifications caused by imprecision in the documents and in the classification schemes are inevitable in these two methods. The present study therefore seeks to shed light on these issues. This research proposes an automated system that can fully classify a text document while minimizing vocabulary ambiguities. One of our earlier studies developed a semi-automatic system for document classification, and here we propose extending it further to obtain a fully automatic document classification system. Keywords: Automatic classification, text classification, Ontology, tf-idf term frequency function.
dc.identifier.citation:
Abbas, M., Smaïli, K., & Berkani, D. (2010). Efficiency of TR-Classifier versus TFIDF. 2010 First International Conference on Integrated Intelligent Computing. doi: 10.1109/ICIIC.2010.60
Best, B. J., Gerhart, N., & Lebiere, C. (2010). Extracting the Ontological Structure of OpenCyc for Reuse and Portability of Cognitive Models. Proceedings of the 19th Conference on Behavior Representation in Modeling & Simulation (BRiMS 2010). Retrieved from http://www.adcogsys.com/pubs/Brims2010-best-gerhart-lebiere-opencyc.pdf
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography. Retrieved from http://courses.media.mit.edu/2002fall/mas962/MAS962/miller.pdf
Morato, J., Marzal, M. A., Llorens, J., & Moreiro, J. (2004). WordNet Applications. Proceedings of the 2nd International Conference on Global WordNet. Retrieved from http://www.fi.muni.cz/gwc2004/proc/105.pdf
Prabowo, R., Jackson, M., Burden, P., & Knoell, H. D. (2002). Ontology-Based Automatic Classification for the Web Pages: Design, Implementation and Evaluation. Proceedings of the 3rd International Conference on Web Information Systems Engineering. Retrieved from http://portal.acm.org/citation.cfm?id=674083
Prevot, L., Borgo, S., & Oltramari, A. (2005). Interfacing Ontologies and Lexical Resources. Proceedings of OntoLex 2005. Retrieved from http://www.loa-cnr.it/Papers/%5B22%5DprevotBorgoOltramari-3.pdf
Song, M. H., Lim, S. Y., Kang, D. J., & Lee, S. J. (2005). Automatic Classification of Web Pages based on the Concept of Domain Ontology. Proceedings of the 12th Asia-Pacific Software Engineering Conference (APSEC’05), 645-651. doi: 10.1109/APSEC.2005.46
Tenenboim, L., Shapira, B., & Shoval, P. (2008). Ontology-based Classification of News in an Electronic Newspaper. Proceedings of the International Conference on Intelligent Information and Engineering Systems. Retrieved from http://www.foibg.com/ibs_isc/ibs-02/IBS-02-p12.pdf
Valitutti, A., Strapparava, C., & Stock, O. (2004). Developing Affective Lexical Resources. PsychNology Journal, 2(1), 61-83. Retrieved from http://www.psychnology.org/File/PSYCHNOLOGY_JOURNAL_2_1_VALITUTTI.pdf
Wijewickrema, P. K. C. M., & Gamage, R. C. G. (2012). Automatic Document Classification Using a Domain Ontology. Proceedings of the 9th National Conference on Library and Information Science (NACLIS 2012), 85-107. ISBN 978-955-9075-17-2.
Wijewickrema, P. K. C. M., & Gamage, R. C. G. (2012). An enhanced text classifier for automatic document classification. Journal of the University Librarians’ Association of Sri Lanka, 16(2), 138-159. ISSN 1391-4081.
dc.identifier.relatedurl: http://2013.ifla.org
dc.identifier.uri: https://repository.ifla.org/handle/20.500.14598/5119
dc.language.iso: es
dc.rights: Attribution 3.0 Unported
dc.rights.accessRights: open access
dc.rights.uri: https://creativecommons.org/licenses/by/3.0/
dc.subject.keyword: Automatic classification
dc.subject.keyword: Text classification
dc.subject.keyword: Ontology
dc.subject.keyword: tf-idf weight function
dc.title (en): An ontology based fully automatic document classification system using an existing semi-automatic system
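dc.title (zh, translated): Developing an ontology-based fully automatic document classification system using an existing semi-automatic classification system
dc.title (es, translated): An ontology based on a fully automatic document classification system using an existing semi-automatic system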
dc.type: Article
ifla.Unit: Section:Classification and Indexing Section
ifla.oPubId: https://library.ifla.org/id/eprint/159/
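The abstract and keywords name two ingredients, an ontology and a tf-idf weight function, without spelling out how they combine. The Python below is a minimal sketch under stated assumptions, not the authors' system: each class is described by a hypothetical set of ontology concepts (CLASS_CONCEPTS), vocabulary ambiguity is reduced crudely by a hypothetical synonym table (SYNONYMS) standing in for a lexical resource such as WordNet, and a document is assigned to the class whose concepts carry the largest summed tf-idf weight. The smoothed idf, log((1 + N) / (1 + df)) + 1, is one common variant chosen here only for illustration.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a lexical resource such as WordNet:
# each variant word is mapped to one preferred ontology concept label.
SYNONYMS = {
    "thesaurus": "vocabulary",
    "taxonomy": "ontology",
    "cataloguing": "cataloging",
}

# Hypothetical class descriptions: each class is a set of ontology concepts.
CLASS_CONCEPTS = {
    "Knowledge organization": {"ontology", "vocabulary", "indexing"},
    "Bibliographic control": {"cataloging", "metadata", "record"},
}


def normalize(tokens, synonyms):
    """Lower-case tokens and replace known variants with their preferred concept."""
    return [synonyms.get(t.lower(), t.lower()) for t in tokens]


def tf_idf(doc, corpus):
    """Weight each term in doc by its raw count times a smoothed log idf."""
    n = len(corpus)
    doc_sets = [set(d) for d in corpus]
    weights = {}
    for term, tf in Counter(doc).items():
        df = sum(term in s for s in doc_sets)  # corpus documents containing term
        # The +1 smoothing keeps idf finite for terms absent from the corpus.
        weights[term] = tf * (math.log((1 + n) / (1 + df)) + 1)
    return weights


def classify(tokens, corpus, classes=CLASS_CONCEPTS, synonyms=SYNONYMS):
    """Return the best class and all class scores (summed tf-idf of class concepts)."""
    doc = normalize(tokens, synonyms)
    norm_corpus = [normalize(d, synonyms) for d in corpus]
    weights = tf_idf(doc, norm_corpus)
    scores = {
        name: sum(weights.get(c, 0.0) for c in concepts)
        for name, concepts in classes.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    corpus = [["metadata", "record", "cataloging"], ["ontology", "indexing"], ["library", "users"]]
    doc = ["Thesaurus", "taxonomy", "indexing", "indexing", "record"]
    best, scores = classify(doc, corpus)
    print(best)  # Knowledge organization
```

In the usage example, the synonym step maps "Thesaurus" to "vocabulary" and "taxonomy" to "ontology", so the document's terms concentrate on the concepts of "Knowledge organization" and that class wins. A real system would replace the synonym dictionary with proper word-sense disambiguation against an ontology or lexical database.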

Files

Original bundle

Name: 112-wijewickrema-en.pdf; Size: 348.54 KB; Format: Adobe Portable Document Format
Name: 112-wijewickrema-zh.pdf; Size: 752.01 KB; Format: Adobe Portable Document Format
Name: 112-wijewickrema-es.pdf; Size: 656.85 KB; Format: Adobe Portable Document Format