Innovative Approaches of Historical Newspapers: Data Mining, Data Visualization, Semantic Enrichment

Abstract

This paper describes how, in the age of Big Data, digital libraries can apply innovative approaches at large scale to promote historical newspapers and improve the experience of using them. On the one hand, the state-of-the-art OLR (optical layout recognition) technique used in one of the largest heritage press digitization projects in Europe (Europeana Newspapers, www.europeana-newspapers.eu, 2012-2015) was put to work in a data mining experiment. Data analysis was applied to quantitative metadata derived from an 850K-page subset of six 19th- and 20th-century French newspaper titles from the BnF collection. The METS/ALTO XML data was analyzed with data mining and data visualization techniques, which show promising ways of producing knowledge about historical newspapers that is of great interest to library professionals (management of digitization programs, curation and mediation of newspaper collections) and to end users, particularly the digital humanities community. On the other hand, the Retronews web portal showcases how advanced semantic annotation techniques can improve retrieval efficiency on a digital newspaper collection, and thus support the rediscovery and reappropriation of these documents by various types of users: teachers, students, researchers and the general public.
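
As a rough illustration of what "quantitative metadata derived from METS/ALTO XML" can mean in practice, the sketch below counts a few layout elements on a single ALTO page. It is not the pipeline used in the paper, which the abstract does not describe. The sample page is invented, the function names `local_name` and `page_metrics` are hypothetical, and the choice of counted elements (`TextBlock`, `TextLine`, `String`, `Illustration`, all defined in the ALTO schema) is an assumption about which indicators might be of interest.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented ALTO-like sample page. Real files carry a versioned namespace
# and many more attributes (coordinates, confidence scores, styles).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine><String CONTENT="Le"/><String CONTENT="Matin"/></TextLine>
          <TextLine><String CONTENT="Paris"/></TextLine>
        </TextBlock>
        <TextBlock ID="TB2">
          <TextLine><String CONTENT="Bourse"/></TextLine>
        </TextBlock>
        <Illustration ID="IL1"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

# Assumed set of layout elements to count per page.
COUNTED = ("TextBlock", "TextLine", "String", "Illustration")


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def page_metrics(alto_xml):
    """Return per-page counts of the layout elements listed in COUNTED."""
    root = ET.fromstring(alto_xml)
    counts = Counter(local_name(el.tag) for el in root.iter())
    return {name: counts.get(name, 0) for name in COUNTED}


metrics = page_metrics(SAMPLE_ALTO)
print(metrics)
```

Matching on local element names rather than a fixed namespace keeps the sketch independent of the ALTO version. Aggregating such per-page counts by title and date would be one plausible way to obtain series suitable for visualization.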
