This document presents the NEP Data Management Plan (NEP-DMP) and describes the measures envisaged to efficiently manage the Research Data collected and generated during the project.
The NEP-DMP is intended to be a living document in which information can be made available on a finer level of granularity through updates as the implementation of the project progresses and when significant changes occur. The document is therefore versioned in order to keep track of changes and improvements.
The NEP-DMP describes the standards and methodologies for the collection and generation of Research Data that will be applied throughout the duration of the project, as well as the conditions for publishing such data. This document aims to facilitate the creation of common understanding and, where possible, common practices.
The present document, NFFA-Europe Pilot Data Management Plan (NEP-DMP), addresses the management of all the Research Data produced within the NEP project. This is done in compliance with EU legislation and rules. Its purpose is to define a common strategy related to the management of data throughout the entire project life cycle.
The NEP project aims at enhancing Open Access to the NFFA-Europe distributed European research infrastructure at the nanoscale, which was set up with the previous NFFA-Europe project and is available to academic and industrial researchers. The wide spectrum of Instruments and Measurement Techniques available at Access Providers’ Sites across Europe is accessible to Registered Users after the submission and approval of a Proposal.
The majority of the Research Data produced within NEP are created by Research Users during User Access to NFFA-Europe Infrastructure and by researchers affiliated with a Beneficiary or Third Party within the Joint Activities (WPs 11 to 15) and while performing in-house research related to the NFFA-Europe PILOT project. These individuals are hereinafter referred to as Recipients.
For an overview of the obligations and responsibilities to which the Recipients are subjected in terms of data management, please refer to the Research Data Policy (Annex 3).
Readers can consider the NEP-DMP as a living document, which can and will be updated throughout the entire project lifecycle. In order to keep track of different versions, the version number of each NEP-DMP is always included in the administrative section above. All Beneficiaries will be notified when a new version of the NEP-DMP is released.
The objective of this document is to describe the Research Data generated and processed during the entire project lifecycle, and how they will be managed, curated and preserved inside and outside the project.
Moreover, this document provides guidance on managing Research Data in a FAIR way. In compliance with the rules and recommendations described in this deliverable, each Laboratory will provide a DMP (Lab-DMP) related to the management of its Research Data.
This document is complemented by a list of other documents that provide information on the way data will be collected and managed within the NEP project:
Annex 1: Glossary containing the definitions needed to deal with data management and NEP procedures. The glossary has the aim of providing a common language with all terms clearly defined.
Annex 2: Proposal Metadata Schema, with all the metadata available for each Research Users’ accepted Proposal, acquired centrally through the Proposal submission form on the NFFA-Europe portal.
Annex 3: Research Data Policy document that articulates the responsibilities of the individuals involved in Research Data management within the entire life of the NEP project.
The NEP project will generate data - including associated Metadata - in a wide range of R&D activities, including those needed to validate the Results of the project that will be presented in Scientific Publications and those associated with reports and other documents.
The format of the data and associated Metadata collected during NEP activities will be mainly electronic and can be classified in two major categories as follows:
Before the publication of any Research Data, the Heads of Laboratory that operate within NFFA-Europe infrastructure are bound to draft a DMP (Lab-DMP) related to the management of the Research Data produced during the project. In the case of Access Providers, the Lab-DMP will be drafted before welcoming the first Research User in the Laboratory. The Lab-DMP is drafted and updated whenever needed using an online tool, called Data Stewardship Wizard, made available to Beneficiaries and Third Parties at the link https://dsw.nffa.eu/.
The Lab-DMPs produced by every Laboratory will integrate and extend the NEP-DMP in the next versions of the document.
In case of discrepancy or disagreement between the DMPs, the NEP-DMP shall prevail.
As a project participating in the Open Research Data Pilot (ORD-Pilot) in Horizon 2020, NEP will work to make its Research Data Findable, Accessible, Interoperable, and Reusable (FAIR).
Work Package 16, Implementing FAIR Data approach within NEP, is fully devoted to this challenging task and aims at consolidating what was achieved within NFFA-Europe and at further developing new tools and services to provide guidelines and procedures for a FAIR Data approach. This specific activity will strongly benefit from the suggestions and contributions of the EOSC experts within the executive and strategy committee (ESC) of NEP. This joint activity is actively working to provide data services and support to Recipients.
For each of the Proposals approved within the infrastructure, the objective of NEP is to provide Research Users and Access Providers with tools ensuring that the Research Data are managed in a FAIR by design way.
Making data FAIR ensures they can be found, understood and reused by the creators as well as by others. A useful tool for researchers and providers is the FAIR Data checklist.
Figure: General scheme of FAIR principles.
Every Recipient will have the possibility to choose whether to adopt tools provided by the project or to use their own tools and good practices that have to be compliant with the FAIR principles.
Recipients are bound to make Research Data needed to validate the Results presented in a Scientific Publication or appearing in it (Publication Data) identifiable and locatable by means of a persistent identifier (PID), such as a Digital Object Identifier (DOI), depositing the data in an appropriate OpenAIRE (http://www.openaire.eu) compatible open access Data Repository of their choice. Recipients generating the Research Data are allowed to choose a discipline-specific Data Repository, an institutional one, or a multi-disciplinary open repository like Zenodo (https://zenodo.org). Thus, NFFA-Europe Pilot relies on external services as regards the supply of persistent identifiers to the generated datasets.
Metadata of deposited Publication Data will be released under a Creative Commons Public Domain Dedication (CC 0), a Creative Commons Attribution International licence (CC BY) or a licence with equivalent rights and will include at least the following:
To promote findability and reusability of Publication Data, Task 16.1 of JA6 on FAIR data foresees the realisation of the MetaRepo, a generic metadata repository and Metadata Schema registry which provides metadata versioning and, at a later time, Data Provenance, metadata search and visualisation. It enables Data Curators to register Metadata Schemas in one of the two supported formats, XML Schema Definition (XSD) or JSON Schema, and it allows Research Users to store Metadata Documents, which are automatically validated at upload time against the corresponding registered Metadata Schema. A more detailed description of the MetaRepo deployment is provided in Deliverable 9.1.
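The validate-at-upload behaviour described above can be illustrated with a minimal Python sketch. It is not the MetaRepo implementation: the schema, the field names (`proposal_id`, `title`, `technique`, `sample_count`) and the `validate` and `upload` helpers are hypothetical, and only the `required` and `type` keywords of JSON Schema are checked. A real deployment would rely on a complete validator, such as the `jsonschema` package, rather than this reduced stand-in.

```python
import json

# Hypothetical, heavily reduced schema used for illustration only.
# The actual Proposal Metadata Schema is the one given in Annex 2.
SCHEMA = {
    "type": "object",
    "required": ["proposal_id", "title", "technique"],
    "properties": {
        "proposal_id": {"type": "string"},
        "title": {"type": "string"},
        "technique": {"type": "string"},
        "sample_count": {"type": "integer"},
    },
}

# Mapping from JSON Schema type names to the Python types that represent them.
_PY_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(document, schema):
    """Return a list of error messages; an empty list means the document passed.

    Only the 'required' and 'type' keywords are checked (a small subset of
    JSON Schema), which is enough to show the principle.
    """
    if not isinstance(document, dict):
        return ["document must be a JSON object"]
    errors = []
    for name in schema.get("required", []):
        if name not in document:
            errors.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name not in document:
            continue
        value = document[name]
        expected = _PY_TYPES[rule["type"]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and rule["type"] in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            errors.append(f"field {name} should be of type {rule['type']}")
    return errors


def upload(raw_json, schema, store):
    """Validate a Metadata Document and store it only if validation succeeds.

    'store' is a plain list standing in for the repository back end.
    Returns the index of the stored document.
    """
    document = json.loads(raw_json)
    errors = validate(document, schema)
    if errors:
        raise ValueError("; ".join(errors))
    store.append(document)
    return len(store) - 1
```

In this sketch a document that lacks a required field or carries a value of the wrong type is rejected before it reaches the store, so every stored Metadata Document conforms to its registered schema.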
The MetaRepo will be populated with Metadata Documents linked to scientific Datasets resulting from Transnational Access Experiments, as well as with administrative Metadata Documents regarding the related Proposal and the Research User information.
A standard set of Metadata for each Research Users’ accepted Proposal, namely the Proposal Metadata Document, will be automatically mapped from the NEP central database and registered into the MetaRepo after an accepted Proposal is assigned to the Site(s) where the Measurements will be carried out. The Proposal Metadata Document contains the information acquired centrally through the Proposal submission form on the NFFA-Europe portal. The related Metadata Schema can be found in Annex 2 (Proposal Metadata Schema).
This will allow all the Research Data produced and made available within the NEP to be linked to the corresponding Proposal and to be accessible to the Team Members only (unless access lists are modified by the owners of the resources). Proposal metadata will not be made openly available unless the Research User decides to publish them via the MetaRepo.
We underline the fact that with more than 180 different Measurement Techniques over a wide spectrum of different scientific disciplines it is not possible to identify a unique and meaningful Metadata Schema to describe the Experiment parameters.
Metadata definition and acquisition associated with scientific Instruments and Measurement Techniques require a strong commitment and involvement of the research groups. NEP Beneficiaries and Third Parties generating Research Data are strongly recommended to use Electronic Laboratory Notebooks (ELNs) in order to facilitate good data management practices and the sharing of data and documentation among researchers, to prove provenance and to protect against data loss.
One of the main goals of WP16 is to provide guidance on the definition of procedures and associated Metadata to help Recipients to have full control of data provenance.
In particular, Task 16.3 within the JA6 (EPFL/CNR/eXact lab/KIT) will elaborate and implement FAIR-oriented procedures and recommendations to enforce data provenance in the NFFA scientific Experiment’s workflow, from data creation to data usage. The set of procedures will be developed by taking into account needs coming from various communities within NEP. Close attention will be paid to identify and tailor existing Electronic Laboratory Notebooks (ELN) and Laboratory Information Management System (LIMS) solutions for describing Sample processing workflows and (semi-) automated Metadata recording during the Experiments as initial steps for implementing FAIR by design Datasets. KIT can provide support for provenance (e.g., versioning) of existing Metadata created using an already adopted schema and for its storage.
In the NEP project, Publication Data will be made openly available as soon as possible and at the latest by the date the Scientific Publication is published, using an appropriate Data Repository, as stated at point 3.1.
To make data interoperable, that is, to allow data exchange and integration between researchers, Institutions, organisations, countries, etc., Publication Data produced in the NEP project will be in a file format that can be opened with open-source (or at least free) multi-platform software, making it possible for third parties to access, mine, exploit, reproduce and disseminate the data free of charge. NEP encourages the use of platform-independent and non-proprietary file formats to ensure accessibility by others, long-term preservation and future reusability.
To improve interoperability, Recipients are recommended to use standard and ratified Metadata Schemas to describe Research Data (Metadata Standards), such as those listed on the Research Data Alliance’s Metadata Standards Directory (http://rd-alliance.github.io/metadata-directory/), on the Digital Curation Center website (http://www.dcc.ac.uk/resources/metadata-standards) or on FAIRsharing.org (https://fairsharing.org/standards).
One of the main objectives of NEP is the creation of an advanced system for data and Metadata management and the implementation of Metadata Schemas for some of the Measurement Techniques in the NEP catalogue for which a commonly accepted Metadata Standard is not available. These Metadata Schemas aim to be commonly accepted by the relevant scientific community.
The reference format we propose for the creation of the nomenclature, Vocabulary and the Metadata Schemas is NeXus (https://www.nexusformat.org/), suitably enriched with new entries consistent with it, if necessary.
NeXus is an international standard for the storage and exchange of neutron, x-ray, and muon Experiment data, but can in principle be extended to other techniques. The structure of NeXus files is extremely flexible, allowing the storage of both simple data sets, such as a single data array and its axes, and highly complex data and their associated Metadata, such as Measurements on a multi-component Instrument or numerical simulations. NeXus is built on top of the container format HDF5 (Hierarchical Data Format 5), and adds domain-specific rules for organising data within HDF5 files in addition to a dictionary of well-defined domain-specific field names. The documentation of the NeXus format can be found at the following link: https://manual.nexusformat.org/ref_doc.html
The Metadata Schemas to be implemented should be able to describe, with a standardised structure and standard parameters, Datasets obtained from various types of Experiments, with the objective of becoming a standard. A minimum set of mandatory and recommended parameters will be accompanied by a dictionary containing the largest set of Metadata that may be needed depending on the Experiment and the Instrumentation used, to be added as needed.
Providers and experts involved in NEP are invited to cooperate and to submit their suggestions, any Metadata Standards already in use for their Instrument and their ideas for new ones.
New professional figures, defined as Data Curators or Data Stewards, will be trained and will act as a reference point for the curation and management of data within the NEP project. They will be in charge of reviewing, enhancing, cleaning, or standardising Metadata and the associated data, ensuring the FAIRness of the data.
To allow for the widest possible reuse, all Publication Data will be published using the latest available version of the Creative Commons Attribution International Public Licence (CC BY) or Creative Commons Public Domain Dedication (CC 0) or a licence with equivalent rights.
In accordance with the FAIR guidelines, the intent is that the Research Data generated within the infrastructure can be reusable even after the end of the project, if allowed, and supported by the project resources and the necessary infrastructure. The exact length of time the data will remain reusable will be defined at a later stage.
Costs for repository development, infrastructure and curation will be covered by EC funding within the timeframe of the project and imputed to a specific Work Package (WP16).
Furthermore, CNR-IOM, the coordinator of the project, signed an agreement with Area Science Park in order to host all data services and data infrastructures available to NEP within ORFEO, the AREA Science Park data centre.
Resources for long term preservation have not yet been discussed, as this involves the general progress of the project and will be addressed in further versions of NEP-DMP.
NEP offers all Recipients and all Registered Users NFFA Datashare (https://datashare.nffa.eu), a file sharing and collaboration platform hosted on Italian servers managed by the project Beneficiary eXact lab. Although its use is not mandatory, the Consortium recommends its use to all Beneficiaries and Third Parties, especially if the facilities offering transnational access are not equipped with a secure and efficient cloud storage and sharing system. This instrument offers a secure tool for data storage and retrieval and gives the possibility to access, process and share scientific data, collaborating in real time with other team members. Therefore, team members will not carry data with them (e.g. on laptops, USB sticks, or other external media).
Authentication to the data management tools used in the project (NFFA Datashare, Data Stewardship Wizard and MetaRepo) is managed by the single sign-on system; authentication data are stored exclusively in the Identity Provider database (Keycloak), while only an anonymous identifier is propagated. The MetaRepo collects monitoring data (the ID of the authenticated user and the ID of the service) in order to send them to the NEP backend, without storing them locally.
All data centres where project data are stored carry sufficient certifications. All project web services are addressed via secure Hypertext Transfer Protocol (https). Any personal data contained in the Metadata Documents stored in the MetaRepo are neither checked nor analysed, and are under the responsibility of the creator of the resource. Metadata Documents registered on the MetaRepo are stored on servers of NEP partner Karlsruher Institut für Technologie, which offer the necessary cyber security standards to guarantee protection from possible data leaks.
The archive will be backed up both on-site and off-site to protect the data against disasters. The archive is protected against loss or theft. An access control policy is implemented to provide physical access to the archive. Personal data will be protected from unauthorised access, corruption, or theft throughout the entire project’s lifecycle.
Research Data generated by Research Users and by research staff within the NFFA-Europe PILOT project do not include questionnaires dealing with personal data nor do they raise ethical questions for their sharing to the scientific community.
All personal data collected for the execution of NEP services comply with the principles of purpose limitation, data minimisation, accuracy, storage limitation, integrity and confidentiality.
The Proposal Metadata Document (the set of metadata mapped automatically from the NEP central database and registered in the MetaRepo after an accepted Proposal is assigned to the Access Provider(s)) contains some non-sensitive personal data of the members of the proposing research team, i.e. user ID, email, first name, last name, affiliation, country and role of the Research User in the Proposal. These personal data are deemed useful for describing the datasets produced within the project and for attributing their authorship, but they are initially accessible only to the members of the Proposal team and can be published only upon explicit action by the user on the MetaRepo.
Furthermore, among the metadata describing the datasets generated within the research institutes, there may be non-sensitive personal data of the author, for the purpose of recognizing the authorship and correct citation and recognition of the research work. We suggest that participating institutions use metadata formats that contain the minimum of personal data necessary to acknowledge the authorship of the dataset, but we do not have full control over the practices of each laboratory. The formats supported in NEP are NeXus and JSON, and the personal data we recommend collecting is: name, role, affiliation, email and ORCID.
This personal data is deliberately provided by the researcher who generates the research data, who can upload the Metadata Document on the MetaRepo, managing its accessibility (private or public).
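As an illustration of the recommended minimal set of author data (name, role, affiliation, email and ORCID), the following Python sketch builds a JSON-serialisable author record and checks the ORCID before accepting it. The key names and the `author_record` helper are hypothetical and are not prescribed by NEP. The check-digit rule is the ISO 7064 MOD 11-2 scheme that ORCID iDs use.

```python
import json
import re

# An ORCID iD has four hyphen-separated groups of four characters;
# the final character is a check digit that may be 'X' (standing for 10).
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_is_valid(orcid):
    """Check the format and the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    chars = orcid.replace("-", "")
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


def author_record(name, role, affiliation, email, orcid):
    """Build a minimal author record holding only the recommended fields.

    The key names are an assumption made for this sketch.
    """
    if not orcid_is_valid(orcid):
        raise ValueError(f"invalid ORCID iD: {orcid}")
    return {
        "name": name,
        "role": role,
        "affiliation": affiliation,
        "email": email,
        "orcid": orcid,
    }


if __name__ == "__main__":
    # Placeholder values; 0000-0002-1825-0097 is a sample iD from the ORCID documentation.
    record = author_record(
        "Jane Doe", "Research User", "Example Institute",
        "jane.doe@example.org", "0000-0002-1825-0097",
    )
    print(json.dumps(record, indent=2))
```

Restricting the record to these five fields keeps personal data to the minimum needed to acknowledge authorship, and validating the ORCID at creation time prevents mistyped identifiers from entering the metadata.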
In conjunction with the submission of the first version of the project DMP (D1.1), an extensive glossary was produced with the aim of standardising the terms used in the context of the NFFA-Europe Infrastructure.
Its content is constantly being updated and it is reachable online at https://www.nffa.eu/apply/data-policy/glossary
Metadata describing the accepted Research User Proposal, automatically acquired from the database.
Online version: https://www.nffa.eu/apply/data-policy/