Between two worlds: harmonizing automated and manual term labelling

dc.audience: Audience::Knowledge Management Section
dc.audience: Audience::Digital Humanities – Digital Scholarship Special Interest Group
dc.conference.date: 22 August 2019
dc.conference.place: Corfu, Greece
dc.conference.sessionType: Knowledge Management with Digital Humanities/Digital Scholarship
dc.conference.title: Artificial Intelligence (AI) and its impact on libraries and librarianship
dc.conference.venue: Ionian University
dc.contributor.author: Sfakakis, Michalis
dc.contributor.author: Zoutsou, Kyriaki
dc.contributor.author: Papachristopoulos, Leonidas
dc.contributor.author: Tsakonas, Giannis
dc.contributor.author: Papatheodorou, Christos
dc.date.accessioned: 2025-09-24T09:13:45Z
dc.date.available: 2025-09-24T09:13:45Z
dc.date.issued: 2017
dc.description.abstract: In the era of enormous information production, human capabilities have reached their limits. The need for automatic information processing that is commensurate with human sophistication is more pressing than ever. Information scientists have focused on developing techniques and processes that assist human effort while improving, or at least guaranteeing, information quality. Automatic indexing techniques may rely on various approaches, which yield different results in information retrieval. In this paper, we introduce an automated methodology for subject analysis that covers both determining the aboutness of the documents and translating the related concepts into system terms. Focusing on a corpus of articles related to the Digital Library Evaluation domain, we use topic modeling algorithms to determine the aboutness of the documents, while the context of the words in topics, as captured by Word Embeddings, is used to translate the extracted topics into EuroVoc concepts.
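The abstract describes translating extracted topics into EuroVoc concepts using the context captured by word embeddings. The following is a minimal sketch of one way such a mapping could work, not the authors' method: it assumes each topic is represented by the average vector of its top words, each candidate concept label by the average vector of its tokens, and candidates are ranked by cosine similarity. The names `EMBEDDINGS`, `CANDIDATE_LABELS`, and `label_topic`, the toy 3-dimensional vectors, and the three labels are all hypothetical stand-ins; a real pipeline would load vectors from a trained model (e.g. word2vec or fastText) and draw labels from EuroVoc itself.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy word vectors for illustration only; a real pipeline
# would load these from a trained embedding model.
EMBEDDINGS: Dict[str, List[float]] = {
    "user": [0.9, 0.1, 0.0],
    "usability": [0.8, 0.2, 0.1],
    "interface": [0.7, 0.3, 0.0],
    "metadata": [0.1, 0.9, 0.1],
    "catalogue": [0.1, 0.8, 0.2],
    "documentation": [0.2, 0.7, 0.3],
    "energy": [0.0, 0.1, 0.9],
    "policy": [0.1, 0.2, 0.8],
}

# Illustrative candidate labels standing in for EuroVoc concepts (assumption).
CANDIDATE_LABELS = ["user interface", "documentation", "energy policy"]


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise average of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_topic(
    topic_words: List[str],
    embeddings: Dict[str, List[float]],
    labels: List[str],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate labels by cosine similarity to the topic centroid.

    Topic words and label tokens missing from the embeddings are skipped.
    """
    known = [embeddings[w] for w in topic_words if w in embeddings]
    if not known:
        return []
    topic_vec = mean_vector(known)
    scored: List[Tuple[str, float]] = []
    for label in labels:
        tokens = [embeddings[t] for t in label.lower().split() if t in embeddings]
        if tokens:
            scored.append((label, cosine(topic_vec, mean_vector(tokens))))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    topic = ["user", "usability", "interface"]  # top words of a hypothetical topic
    for label, score in label_topic(topic, EMBEDDINGS, CANDIDATE_LABELS):
        print(f"{label}: {score:.3f}")
```

Averaging word vectors is the simplest way to compose a topic or a multiword label into a single vector; it ignores word order and topic-word weights, so a fuller approach might weight each word by its probability in the topic.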
dc.identifier.citation:
Afiontzi, E., Kazadeis, G., Papachristopoulos, L., Sfakakis, M., Tsakonas, G., & Papatheodorou, C. (2013). Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology. Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, 125–134. https://doi.org/10.1145/2467696.2467713
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Brown, K., & Barrière, C. (2006). Indexing, Automatic. Encyclopedia of Language & Linguistics, 603–610. https://doi.org/10.1016/B0-08-044854-2/00963-9
Chu, C. M., & Ajiferuke, I. (1989). Quality of indexing in library and information science databases. Online Review, 13(1), 11–35.
Dunham, G. S., Pacak, M. G., & Pratt, A. W. (1978). Automatic indexing of pathology data. Journal of the American Society for Information Science, 29(2), 81–90. https://doi.org/10.1002/asi.4630290207
Fox, C. (1989). A stop list for general text. ACM SIGIR Forum, 24(1–2), 19–21. https://doi.org/10.1145/378881.378888
Fuhr, N., Tsakonas, G., Aalberg, T., Agosti, M., Hansen, P., Kapidakis, S., … Solvberg, I. (2007). Evaluation of Digital Libraries. Int. J. Digit. Libr., 8(1), 21–38. https://doi.org/10.1007/s00799-007-0011-z
Hjørland, B. (2001). Towards a theory of aboutness, subject, topicality, theme, domain, field, content and relevance. Journal of the American Society for Information Science and Technology, 52(9), 774–778. https://doi.org/10.1002/asi.1131
Lau, J. H., Grieser, K., Newman, D., & Baldwin, T. (2011). Automatic labelling of topic models. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, 1536–1545.
Lau, J. H., Newman, D., Karimi, S., & Baldwin, T. (2010). Best topic word selection for topic labelling. 605–613. Retrieved from http://dl.acm.org/citation.cfm?id=1944566.1944635
Li, Y., Xu, L., Tian, F., Jiang, L., Zhong, X., & Chen, E. (2015). Word embedding revisited: A new representation learning and explicit matrix factorization perspective. IJCAI International Joint Conference on Artificial Intelligence, 2015, 3650–3656.
Magatti, D., Calegari, S., Ciucci, D., & Stella, F. (2009). Automatic labeling of topics. 2009 Ninth International Conference on Intelligent Systems Design and Applications, 1227–1232.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space.
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., & Joulin, A. (2018). Advances in Pre-Training Distributed Word Representations. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 3111–3119.
Mimno, D. (2018). jsLDA: An implementation of latent Dirichlet allocation in javascript. Retrieved February 10, 2015, from https://github.com/mimno/jsLDA
Névéol, A., Shooshan, S. E., Humphrey, S. M., Mork, J. G., & Aronson, A. R. (2009). A recent advance in the automatic indexing of the biomedical literature. Journal of Biomedical Informatics, 42(5), 814–823. https://doi.org/10.1016/J.JBI.2008.12.007
Papachristopoulos, L., Kleidis, N., Sfakakis, M., Tsakonas, G., & Papatheodorou, C. (2015). Discovering the Topical Evolution of the Digital Library Evaluation Community. In E. Garoufallou, R. Hartley, & P. Gaitanou (Eds.), Metadata and Semantics Research SE - 9 (pp. 101–112). https://doi.org/10.1007/978-3-319-24129-6_9
Papachristopoulos, L., Tsakonas, G., Sfakakis, M., Kleidis, N., & Papatheodorou, C. (2016). The "Nomenclature of Multidimensionality" in the Digital Libraries Evaluation Domain. https://doi.org/10.1007/978-3-319-43997-6_19
Publications Office of the European Union. (2015). EuroVoc thesaurus Volume 1 Alphabetical version Part B. Retrieved from http://europa.eu
Pulgarín, A., & Gil-Leiva, I. (2004). Bibliometric analysis of the automatic indexing literature: 1956–2000. Information Processing & Management, 40(2), 365–377. https://doi.org/10.1016/S0306-4573(02)00101-2
Thellefsen, T. L., Brier, S., & Thellefsen, M. L. (2003). Problems concerning the process of subject analysis and the practice of indexing. Semiotica, 2003(144), 177–218. https://doi.org/10.1515/semi.2003.022
dc.identifier.relatedurl: https://2019.ifla.org/conference-programme/satellite-meetings/
dc.identifier.uri: https://repository.ifla.org/handle/20.500.14598/6718
dc.language.iso: en
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.keyword: Subject Indexing
dc.subject.keyword: Similarity Measures
dc.subject.keyword: Text Classification
dc.subject.keyword: Machine Learning
dc.subject.keyword: Word Embedding
dc.title: Between two worlds: harmonizing automated and manual term labelling
dc.type: Article
ifla.Unit: Section::Knowledge Management Section
ifla.Unit: Section::Digital Humanities – Digital Scholarship Special Interest Group
ifla.oPubId: https://library.ifla.org/id/eprint/2759/

Files

Original bundle

Name: s02-2019-sfakakis-en.pdf
Size: 255.2 KB
Format: Adobe Portable Document Format