Empirical Research on Web Harvesting in the Process of Text and Data Mining in National Libraries of EU Member States

Papadopoulos, Marinos and Botti, Maria and Paraskevi (Vicky) Ganatsiou, M. A. and Zampakolas, Christos (2020) Empirical Research on Web Harvesting in the Process of Text and Data Mining in National Libraries of EU Member States. Open Journal of Philosophy, 10 (01). pp. 88-112. ISSN 2163-9434

Abstract

Web harvesting and archiving now draw on almost two decades of experience, and the subject has been of prime interest to researchers, technologists, and librarians and information scientists. Web harvesting projects and pilot programs for archiving content found on the Web are becoming priorities for national libraries and cultural heritage organizations in the EU. This paper treats web harvesting as a process for mining data from the web, and only through the web (the “pull” function). It elaborates on research carried out within the funded research project “Web Archiving in Public Libraries and IP Law”, which focused on the processes of web harvesting and archiving as well as Text and Data Mining (TDM) operations in the national libraries of EU Member States. Web archiving, as an official operation in national libraries of EU Member States, creates web collections and preserves them so that they remain accessible and usable in perpetuity. The paper examines various components of web harvesting and archiving through an online survey (qualitative research) that targeted the national libraries of EU Member States. The research team posed seventeen questions to EU national libraries; the survey output comes from answers delivered by 22 national libraries of EU Member States. The questionnaire was created with Google Forms. The researchers reached the EU national libraries via email and follow-up telephone calls, seeking the libraries’ participation in the research. The aim of the research was to examine the participant libraries’ Text and Data Mining operations, which leverage Web harvesting and Web archiving technologies and operations.
Analysis of the results reveals that web harvesting is considered among national libraries’ top priorities: the relevant projects are increasing in number, the web collections are growing, and the technological infrastructures and tools for web harvesting are improving. Yet many issues remain unresolved. A significant number of surveyed libraries consider that legal and technical issues remain the most important to resolve. Access to harvested material is still subject to legal restrictions. Directive 2019/790/EU on Copyright in the Digital Single Market (DSM) creates a favorable legal foundation for the deployment of web harvesting operations in national libraries of the EU Member States. TDM technologies make new areas of research possible. Web harvesting, initially aimed at preservation, now expands to unprecedented research on national heritage through state-of-the-art automated TDM processes.

Item Type: Article
Subjects: STM Repository > Social Sciences and Humanities
Depositing User: Managing Editor
Date Deposited: 14 Oct 2023 04:30
Last Modified: 14 Oct 2023 04:30
URI: http://classical.goforpromo.com/id/eprint/3638
