Please use this identifier to cite or link to this item: https://repository.ifla.org/handle/123456789/3399
Title: High Fidelity Web Archiving of News Sites and New Media with Browsertrix
Authors: Walsh, Tessa
Wilkinson, Henry
Kreymer, Ilya
Keywords: Subject::Digital preservation
Subject::Online news
Subject::News media
Issue Date: 30-May-2024
Publisher: International Federation of Library Associations and Institutions (IFLA)
Series/Report no.: IFLA International News Media Conference 2024; Aarhus, Denmark, 29-31 May 2024
Abstract: This paper discusses how Webrecorder’s free and open source browser-based web archiving tools such as Browsertrix can be and have been used by libraries and archives to create and provide access to high fidelity web archives of online news sites, social media, digital publications, digital humanities projects, and other historically difficult-to-preserve forms of online news media. Emphasis is placed on recently developed assistive quality assurance (QA) tools implemented in Browsertrix that allow users to assess the quality of captured content with the assistance of automatically calculated metrics such as screenshot and text comparison between the site as visited by a browser during crawling and its replay from the captured archive. This exciting new development builds on existing features which differentiate Webrecorder’s browser-based crawling from alternative web archiving methods, such as the use of browser profiles to archive material behind log-ins and on personalized social media feeds, ad and cookie blocking features, and a suite of extendable behaviors that drive the browser during capture, allowing for autoscroll as well as automated navigation of certain social media sites. The paper discusses how these features enable librarians to easily and effectively preserve and provide access to news media, referencing several recent collaborations between Webrecorder, libraries, journalists, and others invested in high fidelity archiving of important and often complex online content.
URI: https://repository.ifla.org/handle/123456789/3399
Appears in Collections: Event Materials

Files in This Item:
File: Walsh_IFLA2024.pdf
Description: High Fidelity Web Archiving of News Sites and New Media with Browsertrix
Size: 198.25 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.