Finding Old Images through a New Lens: Use of Computer Vision for Searching Historic Newspaper Collections

Abstract

The capacity to find images with granularity (i.e., finding images through image-based searching) is requisite to increase the usefulness of image-based research for the digital humanities. As evident through the National Endowment for the Humanities’ data challenge, which invited scholars and students “to produce creative web-based projects demonstrating the potential for using the [textual] data found in Chronicling America,” digital newspaper collections are a rich source for digital humanities research. Chronicling America’s search interface and application programming interface (API), however, are restricted to text-based searches, thereby limiting the findability of image-based content. In their paper “Library Collections as Humanities Data: The Facet Effect,” Thomas Padilla and Devin Higgins elucidate the value of images as a resource that might meet digital humanists’ need for images as a substantive source for research. The use of computer vision for querying image-based digital collections enables users to find image content which, due to the lack of image tagging and description, might not otherwise be found through keyword searching. For example, in a digitized newspaper collection a user might search for the image of an advertisement for spectacles, which a keyword search would not return. The authors propose a case study that applies computer vision image searching to the Farm, Field, and Fireside Collection, a collection of 22 historic agricultural newspapers published across the Midwestern United States between 1841 and 1983. The authors will investigate the success of using VGG Image Search Engine (VISE), open source computer vision software developed by the Visual Geometry Group (VGG) at the University of Oxford, for searching Farm, Field, and Fireside Collection images to find similar images. Using image recognition, VISE identifies and establishes correspondences between images to create an index of image features.
Through this research, the authors will investigate the potential for integrating VISE within existing newspaper access system frameworks, such as Open-ONI, open source software for searching and browsing digitized newspaper collections. Successful implementation of this tool would give researchers an avenue to capitalize upon extensive image collections, thus expanding the capacity for inquiries in the digital humanities.
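
The idea of an index of image features can be illustrated with a small sketch. The following is a toy example only and is not VISE's implementation: it assumes each page image has already been reduced to a list of integer feature IDs (hypothetical stand-ins for quantized visual descriptors), and the page names, feature values, and function names are invented for illustration. It builds an inverted index from features to pages and ranks pages by the number of features they share with a query image.

```python
from collections import defaultdict


def build_index(page_features):
    """Map each feature ID to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for page_id, features in page_features.items():
        for feature in features:
            index[feature].add(page_id)
    return index


def search(index, query_features, exclude=None):
    """Rank pages by how many distinct query features they share with the query."""
    scores = defaultdict(int)
    for feature in set(query_features):
        # .get avoids creating empty entries for features absent from the index
        for page_id in index.get(feature, ()):
            if page_id != exclude:
                scores[page_id] += 1
    # Highest score first; ties are broken alphabetically by page ID
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: feature IDs that a real system would derive from the images
PAGES = {
    "page_001": [3, 17, 42, 88],
    "page_002": [17, 42, 88, 91],
    "page_003": [5, 6, 7],
}

INDEX = build_index(PAGES)
RESULTS = search(INDEX, [17, 42, 88, 91])

for page_id, shared in RESULTS:
    print(page_id, shared)
```

Pages that share no features with the query are omitted from the results. A production system would additionally verify that matched features are spatially consistent between the two images, a step this sketch leaves out.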

Citation

Ashenfelder, Mike. “The NEH ‘Chronicling America’ Challenge: Using Big Data to Ask Big Questions.” The Signal (blog). 2016, August 4. https://blogs.loc.gov/thesignal/2016/08/the-neh-chronicling-america-challenge-using-big-data-to-ask-big-questions/
Padilla, Thomas G. and Devin Higgins. 2014. “Library Collections as Humanities Data: The Facet Effect.” Public Services Quarterly 10, no. 4: pp. 324-335.
Information gathered from Google Analytics statistics of the IDNC website: https://idnc.library.illinois.edu/
Visual Geometry Group. Accessed May 14, 2019, http://www.robots.ox.ac.uk/~vgg/.
VGG Image Search Engine. Accessed May 14, 2019, http://www.robots.ox.ac.uk/~vgg/software/vise/.
Franklin, Alexandra. “Where did you get that hat?” Bodleian Ballads Blog (blog). 2012, April 4. http://balladsblog.bodleian.ox.ac.uk/blog/174
Archived version of the project website: https://web.archive.org/web/20180429160809/https://www.challenge.gov/challenge/chronicling-america-historic-american-newspapers-data-challenge/
Beyond Words. Accessed May 14, 2019, http://beyondwords.labs.loc.gov.
Thomas, Deborah and Leah Weinryb Grohsgal. 2017. “Opening the Doors Wide: The US National Digital Newspaper Program, Open Data and the NEH Chronicling America Data Challenge.” Paper presented at the 2017 IFLA News Media Conference, Reykjavik, Iceland: p. 4. https://www.ifla.org/files/assets/newspapers/2017_Iceland/2017-thomas-grohsgal-en.pdf.
ScribeAPI. Accessed May 14, 2019, https://github.com/LibraryOfCongress/scribeAPI.
Zooniverse. Accessed May 14, 2019, https://www.zooniverse.org/. Other projects include Shakespeare's World, Galaxy Zoo, and Snapshot Serengeti, among dozens of others.
For more information regarding Medusa see: Rimkus, Kyle, and Thomas G. Habing. “Medusa at the University of Illinois at Urbana-Champaign: A Digital Preservation Service Based on PREMIS.” Paper presented at the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, Indianapolis, IN, 2013: pp. 49-52. http://hdl.handle.net/2142/45232.
Schlaack, William and Anna Oates. 2018. “Digital Migrations: A Case of Turn-of-the-Century Chicago-Immigrant Newspapers.” Paper presented at the 2017 IFLA News Media Conference, Gainesville, FL: p. 9. http://hdl.handle.net/2142/101852.
Olive Software, previously Olive ActivePaper Archive and maintained by Ignite Technologies, is a proprietary solution for discovery and access of newspapers: http://www.ignitetech.com/olive-software/.
“Guidelines & Resources.” National Digital Newspaper Program. Accessed May 14, 2019, https://www.loc.gov/ndnp/guidelines/.
I.e., newspaper pages. VISE treats each digitized newspaper page as an individual image.
oxvgg/vise docker container: https://hub.docker.com/r/oxvgg/vise.
See the VISE User Guide for search instructions: https://github.com/ox-vgg/vise/blob/master/UserGuide.md.
Open-ONI. Accessed May 14, 2019, https://github.com/open-oni.
chronam. Accessed May 14, 2019, https://github.com/LibraryOfCongress/chronam.
Project websites: https://chroniclingamerica.loc.gov/, https://oregonnews.uoregon.edu/, http://panewsarchive.psu.edu/
Veridian Software. Accessed May 14, 2019, https://veridiansoftware.com/.