Collecting Online Newspapers and Bypassing Paywalls

Date

2024-05-30

Publisher

International Federation of Library Associations and Institutions (IFLA)

Abstract

The Legal Deposit Office of the National Library of Finland has systematically collected articles from online newspaper sites and media platforms, as well as other web materials, since 2007. The collection currently covers around 800 Finnish newspapers and journals, whose articles are harvested on an ongoing basis. The project does not harvest digital editions of periodicals, so it can only include periodicals that publish article content on their websites. Many online newspapers are openly accessible, but many others are subscription-based and keep their articles behind paywalls. For those titles, the web crawler can retrieve only images and the opening snippet of each article's text. To overcome this, the National Library of Finland has developed a method for accessing articles behind paywalls. There are two main strategies for harvesting paywalled articles: one relies on IP address recognition, and the other on login credentials obtained directly from the newspaper publishers. The credentials are integrated into the collection tool so that it can carry out the harvest. This approach requires a sustained partnership with publishers, particularly because they frequently revise their login procedures, and each revision requires the harvesting tool to be updated accordingly. At present, the Library successfully collects articles behind the paywalls of approximately 100 online newspapers. Harvesting paywalled articles is an ongoing task in a changing technical environment, so it demands continuous adaptability and vigilance. The effort is nonetheless worthwhile, because the content, illustrations, and headlines of online newspapers may differ from those of their printed counterparts.
Through this paywall project, the National Library of Finland addresses the complexities of archiving the evolving landscape of online media.
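
The choice between the two strategies named in the abstract can be illustrated with a minimal sketch. It is not the Library's collection tool: the `TitleConfig` fields, the `AccessMethod` names, the `choose_access_method` helper, and the example.org URLs are all hypothetical, introduced only to show how a per-title configuration might record whether a newspaper is open, recognised by IP address, or harvested with publisher-supplied credentials.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AccessMethod(Enum):
    """Hypothetical labels for how a title's articles are reached."""
    OPEN = "open"            # no paywall, the crawler harvests as usual
    IP_RECOGNITION = "ip"    # publisher recognises the harvester's IP address
    LOGIN = "login"          # crawler logs in with publisher-supplied credentials


@dataclass
class TitleConfig:
    """Assumed per-title settings; field names are illustrative only."""
    name: str
    start_url: str
    paywalled: bool = False
    ip_recognised: bool = False
    login_url: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None


def choose_access_method(cfg: TitleConfig) -> AccessMethod:
    """Pick an access strategy for one title, or fail if none is configured."""
    if not cfg.paywalled:
        return AccessMethod.OPEN
    if cfg.ip_recognised:
        return AccessMethod.IP_RECOGNITION
    if cfg.login_url and cfg.username and cfg.password:
        return AccessMethod.LOGIN
    # Without either arrangement only images and opening snippets are
    # retrievable, so the title needs follow-up with its publisher.
    raise ValueError(f"No paywall access configured for {cfg.name!r}")


# Usage with made-up titles.
open_title = TitleConfig("Example Daily", "https://example.org/daily")
ip_title = TitleConfig(
    "Example Weekly", "https://example.org/weekly",
    paywalled=True, ip_recognised=True,
)
login_title = TitleConfig(
    "Example Post", "https://example.org/post", paywalled=True,
    login_url="https://example.org/post/login",
    username="harvester", password="not-a-real-password",
)

for title in (open_title, ip_title, login_title):
    print(title.name, "->", choose_access_method(title).value)
```

Keeping the login details in per-title configuration, rather than in the crawler itself, is one way to limit the work needed when a publisher revises its login procedure: under this assumption only the affected title's entry has to change.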

Keywords

Subject::Newspapers, Subject::Legal deposit, Subject::Online news, Subject::News media