Archiving and Accessing HTML-Based Newspapers Using XML and CDATA Strings

Abstract

This article outlines one in-house model for archiving and providing access to HTML-based news in the Kentucky Digital Newspaper Program (KDNP) at the University of Kentucky (UK). The KDNP already contains news content digitized from analog sources. To support search and retrieval of HTML-based news alongside that content, the article describes encapsulating HTML content in XML-encoded CDATA strings, which are read by a prototype open-source PHP viewer.
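
As a rough illustration of the encapsulation idea, the sketch below wraps an HTML fragment in a CDATA section inside a small XML record and then reads it back unchanged. It is written in Python for brevity, not in the PHP used by the viewer, and the element names (`article`, `title`, `body`) and function names are hypothetical; they are not the KDNP schema or the viewer's code.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_cdata(text):
    """Wrap text in CDATA, splitting any literal "]]>" across two sections.

    A CDATA section cannot contain "]]>", so each occurrence is broken up
    so that "]]" ends one section and ">" begins the next.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_article_xml(title, html):
    """Build a minimal XML record (hypothetical element names)."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<article>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  <body>{wrap_cdata(html)}</body>\n"
        "</article>\n"
    )


def read_article_html(xml_text):
    """Parse the record and return the HTML stored in <body>."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return root.findtext("body")


if __name__ == "__main__":
    html = "<p>Flood <b>closes</b> Main St. &amp; 2nd Ave.</p>"
    record = build_article_xml("Flood closes streets", html)
    print(record)
    # The parser hands the CDATA content back as plain text, markup intact.
    print(read_article_html(record) == html)
```

Because the parser treats CDATA content as character data, the HTML markup does not need to be well-formed XML, and it remains available as a plain string for indexing or display. This sketch does not handle characters that XML 1.0 forbids outright, such as most control characters.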

Citation

1. The World Wide Web Consortium (W3C). What Is Hypertext? [Internet]. The World Wide Web Consortium (W3C); [cited 2016 Feb 10]. Available from: http://www.w3.org/WhatIs.html
2. Herborth C. 2010 Jan 12. Dealing with Data in XML [Internet]. IBM DeveloperWorks; [cited 2016 Feb 10]. Available from: http://www.ibm.com/developerworks/library/x-cdata/
3. Nielsen J. 1995 Feb 01. History of Hypertext [Internet]. Nielsen Norman Group; [cited 2016 Feb 10]. Available from: https://www.nngroup.com/articles/hypertext-history/
4. Newspaper Digitization Interest Group (NDIG). 2014. Metadata Application Profile - Digital Newspapers [Internet]. Newspaper Digitization Interest Group (NDIG); [cited 2016 Feb 10]. Available from: https://sites.google.com/site/digitalnewspaperspractices/technical-specifications/metadata-specfication
5. Geiger B. 2016 Jan 20. Fate of Your Archives Is ... Uncertain [Internet]. California Newspaper Publishers Association; [cited 2016 Feb 10]. Available from: http://www.cnpa.com/california_publisher/features/fate-of-your-archives-is-uncertain/article_8f3e2cda-bfd4-11e5-86f8-9797680b24ed.html
6. Pierce V. 2012 Feb 09. Finding That Needle in the Haystack: The Power of Full Text Searching in Chronicling America [Internet]. South Carolina Digital Newspaper Program; [cited 2016 Feb 10]. Available from: http://library.sc.edu/blogs/newspaper/2012/02/09/finding-that-needle-in-the-haystack-the-power-of-full-text-searching-in-chronicling-america/
7. Lepore J. 2015. What the Web Said Yesterday. The New Yorker [Internet]. [cited 2016 Feb 10]. Available from: http://www.newyorker.com/magazine/2015/01/26/cobweb
8. Granger S. 2000. Emulation as a Digital Preservation Strategy. D-Lib Magazine [Internet]. [cited 2016 Feb 10]. Available from: http://www.dlib.org/dlib/october00/granger/10granger.html
9. Johnston L. 2014 Feb 11. Considering Emulation for Digital Preservation [Internet]. The Signal Digital Preservation: Library of Congress; [cited 2016 Feb 10]. Available from: https://blogs.loc.gov/digitalpreservation/2014/02/considering-emulation-for-digital-preservation/
10. Sawers P. 2015 Oct 22. The Internet Archive Is Rebuilding the Wayback Machine to Make Web History Easier to Search [Internet]. VentureBeat; [cited 2016 Feb 10]. Available from: http://venturebeat.com/2015/10/22/the-internet-archive-is-rebuilding-the-wayback-machine-to-make-the-webs-history-easier-to-search/
11. University of Kentucky Libraries. 2016. Newz Viewer [Internet]. GitHub Code Repository; [cited 2016 Feb 10]. Available from: https://github.com/uklibraries/newz-viewer
12. Project Blacklight. 2016. Blacklight Discovery Platform Framework [Internet]; [cited 2016 Mar 31]. Available from: http://projectblacklight.org/