Digital preservation has emerged in recent years as a fast-growing community of practice that is relevant almost everywhere, but in which capability is unevenly distributed. In the research community, digital preservation aligns closely with the FAIR principles and is delivered, albeit unevenly, through a complex specialist infrastructure comprising not only technology but also the capacity of staff and the 'know why' of policy (see Currie, Amy, & Kilbride, William (2021). FAIR Forever? Long Term Data Preservation Roles and Responsibilities, Final Report (Version 7). Zenodo. https://doi.org/10.5281/zenodo.4574234).
The European Open Science Cloud (EOSC) initiative has worked extensively to promote and enable access to Open Science data, with the stated aim of ensuring that researchers can maximize the value of their research processes while sharing large-scale Research Infrastructures (RIs). The importance of advanced long-term preservation for the reproducibility of research results is emphasized by the EOSC Strategic Research and Innovation Agenda (SRIA) and by reports from relevant bodies such as the Digital Preservation Coalition.
Started in January 2019, ARCHIVER is a unique initiative currently running in the EOSC framework that is competitively procuring R&D services for archiving and digital preservation. The ARCHIVER tenderers were selected through an open and competitive procurement process. Between December 2020 and August 2021, three consortia worked on innovative prototype solutions for long-term data preservation, in close collaboration with CERN, EMBL-EBI, DESY and PIC. ARCHIVER procured R&D services that address long-term preservation needs across the entire research data management cycle. The resulting services are sustainable and provide, at scale, the functionality needed to implement FAIR Data Management Plans, using Trustworthy Digital Repositories (TDRs) certified according to best practices (e.g. ISO 16363 and CoreTrustSeal).
To become familiar with the tool, the initial assessment started by gathering basic information about the current repositories of the organisations involved in the ARCHIVER project, namely EMBL-EBI, DESY, PIC and CERN.
The following information was shared:
The following data sets were used for a preliminary test of the FAIR assessment tool:
|Organisation|Dataset|
|---|---|
|EMBL-EBI|The ‘1000 genomes’ dataset, containing 1000 human genomes, all publicly available with no restriction|
|DESY|Serial femtosecond crystallography data and metadata, including links to the CrystFEL Beam File, CrystFEL Geometry File, Processing Scripts and diffraction patterns|
|CERN|Audiovisual recordings of talks at a conference; an example of a CMS collision dataset in AOD format; an example of a CMS simulated dataset in AODSIM format; a simple example of an OPERA neutrino event dataset|
|PIC|Fake dataset mimicking one night of raw data from the MAGIC Telescopes|