Between two worlds: harmonizing automated and manual term labelling
dc.audience | Audience::Knowledge Management Section | |
dc.audience | Audience::Digital Humanities – Digital Scholarship Special Interest Group | |
dc.conference.date | 22 August 2019 | |
dc.conference.place | Corfu, Greece | |
dc.conference.sessionType | Knowledge Management with Digital Humanities/Digital Scholarship | |
dc.conference.title | Artificial Intelligence (AI) and its impact on libraries and librarianship | |
dc.conference.venue | Ionian University | |
dc.contributor.author | Sfakakis, Michalis | |
dc.contributor.author | Zoutsou, Kyriaki | |
dc.contributor.author | Papachristopoulos, Leonidas | |
dc.contributor.author | Tsakonas, Giannis | |
dc.contributor.author | Papatheodorou, Christos | |
dc.date.accessioned | 2025-09-24T09:13:45Z | |
dc.date.available | 2025-09-24T09:13:45Z | |
dc.date.issued | 2017 | |
dc.description.abstract | In the era of enormous information production, human capabilities have reached their limits. Automatic information processing that is commensurate with human sophistication has become imperative. Information scientists have focused on developing techniques and processes that assist human contribution while improving, or at least guaranteeing, information quality. Automatic indexing techniques may rely on various approaches, each offering different results in information retrieval. In this paper, we introduce an automated methodology for subject analysis, including both the determination of the aboutness of the documents and the translation of the related concepts to system terms. Focusing on a corpus of articles related to the Digital Library Evaluation domain, topic modeling algorithms are used to determine the aboutness of the documents, while the context of the words in topics, as captured by Word Embeddings, is used to translate the extracted topics to EuroVoc concepts. | en |
dc.identifier.relatedurl | https://2019.ifla.org/conference-programme/satellite-meetings/ | |
dc.identifier.uri | https://repository.ifla.org/handle/20.500.14598/6718 | |
dc.language.iso | en | |
dc.rights | Attribution 4.0 International | |
dc.rights.accessRights | open access | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject.keyword | Subject Indexing | |
dc.subject.keyword | Similarity Measures | |
dc.subject.keyword | Text Classification | |
dc.subject.keyword | Machine Learning | |
dc.subject.keyword | Word Embedding | |
dc.title | Between two worlds: harmonizing automated and manual term labelling | en |
dc.type | Article | |
ifla.Unit | Section::Knowledge Management Section | |
ifla.Unit | Section::Digital Humanities – Digital Scholarship Special Interest Group | |
ifla.oPubId | https://library.ifla.org/id/eprint/2759/ |