Archiving and preservation for research environments

How are the innovative companies involved in ARCHIVER supporting EOSC

09 September 2020

On the 31st of August, the open consultation for the EOSC Strategic Research and Innovation Agenda closed after one and a half months, during which stakeholders from inside and outside the EOSC Community had the opportunity to help develop the work programmes for EOSC in Horizon Europe.
The European Open Science Cloud (EOSC) is the envisioned federation of research (data) infrastructures that will enable the Web of FAIR Data and Services, helping researchers to perform Open Science, and open up and exploit their data, publications and code.
The ARCHIVER project took part in the open consultation by submitting a response that collected the input of the five commercial consortia participating in the Design Phase of the project: Arkivum – Google Cloud; GMV – PIQL – AWS – SafeSpring; LIBNOVA – CSIC – University of Barcelona – Giaretta Associates; RHEA System Spa – DEDAGROUP – GTT; and T-Systems International – GWDG – Onedata. These innovative companies and organisations, of different sizes and with complementary fields of expertise, gave valuable feedback on the issues posed by the consultation, illustrating the priorities that the private sector envisions for EOSC.

FAIR principles not limited to data; must be applicable to software, workflows & services

One of the EOSC guiding principles states “Towards a Web of FAIR Data and Related Services for Science”. ARCHIVER recognises that FAIR principles are critical, and highlights that this guiding principle should go further and also cover the implications of FAIR for the actual data objects. The concept of FAIR should be extended and adapted to any other research-associated products, such as software, workflows, services and even infrastructures, taking into account not only live data or tools, but also those to be preserved.
The ARCHIVER contribution raised the need for a new way to certify the FAIRness of data, research objects and services, and for the development of new skills and training for researchers (and their institutions). Researchers who make this extra effort need to be rewarded and recognised. Therefore, the priority should be to encourage, promote, reward and recognise behaviours that lead to an environment where FAIR is the default and not the exception.
Another aspect highlighted by ARCHIVER in its contribution is the need for transparency of machine-run algorithms, standards, guidelines and regulations.
Finally, ARCHIVER wishes the EOSC principles to be applied to research fields across national borders, at an international level.

Improving EOSC Action Areas to incorporate the innovations from the private sector

The EOSC governing bodies have identified fourteen Action Areas (AA) to help deploy the EOSC ecosystem. With regard to these, ARCHIVER highlighted that, from a business perspective, stable and fair business models are key to incorporating the innovations that are fostered by the private sector into this initiative.
An Action Area that drew the attention of ARCHIVER and the commercial consortia of the project is AA4 – Authentication and Authorisation Infrastructure. Here, it was noted that federation is currently being over-emphasised as a key building block, although it does not create any added value by itself. To be adopted at a large scale, EOSC should be able to offer more and better services, e.g. for specific analytics or long-term observations that otherwise would not be available.
When asked whether any Action Areas are missing, some ARCHIVER companies emphasised that stimulating use cases, innovation aspects and optimising capital expenditure are underrepresented in EOSC. They also noted that the actions should include a better quantification and optimisation of the current use of capital expenditure in e-Infrastructures. In addition, it is important to support the growing volume of data and objects produced within the whole research life cycle, including the stage of preserving the knowledge produced.
Another important point made by ARCHIVER is that in the Action Areas there is not enough emphasis on the long-term curation and stewardship of data, i.e. ensuring that data is interoperable and re-usable over very long timescales. This means not only having certified repositories and ensuring data is preserved and accessible, but also ensuring that data is long-lived by design and that longevity and reuse are part of the mindset when data is first created, stimulating the creation of a “Preservation by Default” mentality.

Prioritising the use of interoperable services and resources across Europe

The general comment ARCHIVER made about EOSC priorities is that “everything should be interoperable at European level, where possible”. Otherwise, the risk is that local and national infrastructures, initiatives, standards, policies, practices, etc. will never be fully integrated, which makes EOSC harder to achieve.
Some ARCHIVER Contractors suggested that the use of pan-European services and resources should be prioritised, and they see this as a missing priority.
In addition, one of the priorities that should be included is the identification and support of sustainable European federated e-Infrastructures capable of handling the full research life cycle, including the resources needed to preserve the knowledge produced by the diverse communities and scientific domains involved.
Another point raised by ARCHIVER is the need for a task force or working group to look at the long-term aspects of achieving FAIR, such as the curation and quality of data, the importance of Trustworthy Digital Repositories (TDRs) and the TRUST principles, the role of digital preservation in data stewardship, and the frequently overlooked economics of long-term data access and reuse.

Adopting an “EOSC first” approach in Horizon Europe

The ARCHIVER project includes in its R&D a hybrid cloud service approach that relies on most of the principles highlighted by EOSC. The results of the R&D activities in the PCP Project (Design, Prototype and Pilot) can be very relevant as a baseline for the definition, development and implementation of the EOSC SRIA.
In addition, collaboration between ongoing initiatives would be a great opportunity to align and improve all of them. For example, the results of projects such as OpenAIRE and Freya, which cover domains targeted by the Action Areas, must be taken as the baseline for the actions implemented by EOSC, and this must be integrated into the strategy.
In line with the above, Horizon Europe should implement an “EOSC first” principle in order to avoid projects developing alternative solutions when this is not necessary.
Clear EOSC guidelines must be established and must be dutifully fulfilled in a coordinated manner by future projects under the Horizon Europe umbrella in order to avoid fragmentation at national level and by scientific domain.