Topic or Metadata Modeling for Cross-Disciplinary Scholarship: Challenges and Opportunities for Academic Libraries

Abstract

At the University of Notre Dame, we have been exploring automatic classification of texts via topic modeling and user-generated metadata to support cross-disciplinary scholarship. This effort originated in 2015 from a collaboration between the libraries and the Center for Civil and Human Rights to create an online comparative research tool for exploring documents of Catholic social teaching and international human rights law. The library built the infrastructure for indexing, retrieving, and visualizing records, while the researchers provided the controlled vocabulary and initial classification scheme. From the outset, the project team realized that current library classification standards and practices had limitations. To provide satisfactory discovery for cross-disciplinary content, the group "crowdsourced" the controlled vocabulary task to researchers and students of each respective discipline. Through the selection of controlled vocabulary, initial hand-tagging, and more robust topic modeling, the researchers provided semantic linking of similar or different concepts at the full-text and paragraph levels. The modeling disambiguated terms (e.g., "child" in the biological sense vs. "child of God") and bridged the gap between different disciplines' descriptions of equivalent concepts. For example, users can select the topic “solidarity/cooperation” and explore meaningful search results from the two fields about working together to improve human lives. The modeling enables a user from one discipline to overcome the problem of nuanced vocabulary in the other domain and, hence, uncover relevant information that might otherwise remain hidden within current classification schemes. The project team is currently addressing issues of transparency by providing detailed documentation on the application of the controlled vocabulary, and is in the process of implementing features for crowdsourcing data to enhance classification.
The paper will present updates on our topic modeling endeavor and offer insights into the scalability and sustainability considerations for academic libraries that support cross-disciplinary scholarship.
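
To make the paragraph-level linking described in the abstract more concrete, the following is a minimal sketch of controlled-vocabulary tagging across two corpora. It is not the project's implementation and it is not a topic model: it is a simple phrase matcher that stands in for the hand-tagging step. The vocabulary entries, the function names (tag_paragraph, build_index), the source labels, and the sample paragraphs are all hypothetical and invented for illustration. Longer phrases are matched first so that "child of God" is not also counted as the biological sense of "child".

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: shared topic -> phrases that signal it.
VOCABULARY = {
    "solidarity/cooperation": ["solidarity", "cooperation"],
    "child (biological)": ["child", "children"],
    "child of God": ["child of god", "children of god"],
}

# Flatten to (phrase, topic) pairs, longest phrase first, so that a
# multi-word phrase claims its text before a shorter phrase can match it.
PHRASES = sorted(
    ((phrase, topic) for topic, phrases in VOCABULARY.items() for phrase in phrases),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def tag_paragraph(text):
    """Return the set of vocabulary topics found in one paragraph."""
    remaining = text.lower()
    topics = set()
    for phrase, topic in PHRASES:
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
        # Blank out each match so shorter phrases cannot reuse the same words.
        remaining, count = pattern.subn(lambda m: " " * len(m.group()), remaining)
        if count:
            topics.add(topic)
    return topics


def build_index(documents):
    """Map each topic to the (source, paragraph number) pairs tagged with it."""
    index = defaultdict(list)
    for source, paragraphs in documents.items():
        for number, paragraph in enumerate(paragraphs):
            for topic in tag_paragraph(paragraph):
                index[topic].append((source, number))
    return index


# Invented sample paragraphs, one per discipline (not quotations).
DOCUMENTS = {
    "catholic_social_teaching": [
        "Solidarity helps us to see the other as a child of God.",
    ],
    "human_rights_law": [
        "States shall promote international cooperation to protect every child.",
    ],
}

INDEX = build_index(DOCUMENTS)
```

Looking up INDEX["solidarity/cooperation"] returns paragraphs from both sources even though one uses "solidarity" and the other "cooperation", which is the kind of cross-vocabulary bridge the abstract describes. A statistical topic model would replace the fixed phrase list with topics learned from the texts.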
