Collecting Online Newspapers and Bypassing Paywalls

dc.audience: Audience::Information Literacy Section (en_US)
dc.audience: Audience::News Media Section (en_US)
dc.audience: Audience::Digital Humanities – Digital Scholarship Special Interest Group (en_US)
dc.contributor.author: Heikkinen, Jari
dc.contributor.author: Chamchoon, Topi
dc.contributor.author: Sairanen, Samuli
dc.contributor.author: Nieminen, Joel
dc.contributor.author: Haukkala, Sanna
dc.coverage.spatial: Location::Finland (en_US)
dc.date.accessioned: 2024-06-25T09:34:59Z
dc.date.available: 2024-06-20
dc.date.available: 2024-06-25T09:34:59Z
dc.date.issued: 2024-05-30
dc.description.abstract: The Legal Deposit Office of the National Library of Finland has systematically collected articles from online newspaper sites and media platforms, as well as other web materials, since 2007. The initiative currently covers around 800 Finnish newspapers and journals, whose articles are harvested on an ongoing basis. The project does not harvest digital editions of periodicals, so it must select periodicals that publish article content on their websites. Although many online newspapers are open access, many others are subscription-based, with articles behind paywalls. For those titles, the web crawler can retrieve only images and a snippet of text from the beginning of each article. To address this, the National Library of Finland has developed a method for accessing articles behind paywalls. There are two primary strategies for harvesting paywalled articles: IP address recognition, and login credentials obtained directly from the newspaper publishers. The credentials are integrated into the collection tool so that it can harvest the articles. This approach requires a sustained partnership with publishers, especially because they frequently revise their login procedures, and the harvesting tool must then be updated to match. At present, the Library collects articles behind the paywalls of approximately 100 online newspapers. Harvesting paywalled articles is an ongoing task in a changing technical landscape, so the Library must remain adaptable and vigilant. The effort is nonetheless worthwhile, because the content, illustrations, and headlines of online newspapers may differ from those of their printed counterparts. Through this paywall project, the National Library of Finland addresses the complexities of archiving the evolving landscape of online media. (en_US)
dc.identifier.uri: https://repository.ifla.org/handle/20.500.14598/3408
dc.language.iso: en (en_US)
dc.publisher: International Federation of Library Associations and Institutions (IFLA) (en_US)
dc.rights.holder: International Federation of Library Associations and Institutions (IFLA) (en_US)
dc.rights.license: CC BY 4.0 (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.subject: Subject::Newspapers (en_US)
dc.subject: Subject::Legal deposit (en_US)
dc.subject: Subject::Online news (en_US)
dc.subject: Subject::News media (en_US)
dc.title: Collecting Online Newspapers and Bypassing Paywalls (en_US)
dc.type: Events Materials (en_US)
ifla.Unit: Units::Section::News Media Section (en_US)
ifla.Unit: Units::Special Interest Group::Digital Humanities – Digital Scholarship Special Interest Group (en_US)
ifla.Unit: Units::Section::Information Literacy Section (en_US)
ifla.oPubId: 0 (en_US)

Files

Original bundle
Name: Heikkinen_IFLA2024.pdf
Size: 320.89 KB
Format: Adobe Portable Document Format
Description: Collecting Online Newspapers and Bypassing Paywalls
