Data infrastructure of NFFA-Europe is taking shape
The First NFFA-EUROPE Science Workshop, recently held in Trieste, featured the first official release of the Information and Data management Repository Platform (IDRP), the tool that allows NFFA-EUROPE users to register, access, share and publish all the data produced within the project.
The IDRP is meant to be a pervasive tool within the project as a central metadata platform, allowing users to semi-automate the registration of scientific data collected at the NFFA-EUROPE facilities, and making the associated metadata retrievable according to the FAIR Data Principles (www.nature.com/articles/sdata201618). A metadata schema has already been defined and accepted within the NFFA-EUROPE project, and is currently under discussion with other nanoscience communities. Both the IDRP prototype and the associated metadata schema are complemented by a well-defined data policy document, approved by the general assembly of the project. This document defines and regulates the usage of the IDRP in compliance with the EU requirements for open data access and with the FAIR principles.
A pervasive tool within the project as a central metadata platform
On schedule, the first IDRP prototype was presented by CNR-IOM and the Karlsruhe Institute of Technology (KIT). The prototype was developed jointly by the two institutions and then deployed on the CNR-IOM OpenStack cloud platform. The infrastructure comprises several elements, as sketched in the Figure, where the numbers indicate the typical workflow:
- The user accesses the global entry point of the prototype, which is a modified version of the already well-known NFFA-EUROPE portal (www.nffa.eu), including a link to access the IDRP instance;
- A Python Command Line Interface called 'click' (Command Line Interface Cnr-iom Kit) (https://gitlab.com/NFFA-Europe-JRA3/CLI-KITDM) was developed to ingest instrument data and basic metadata produced at the CNR-IOM facilities into the local Data Management Service (DMS), which is based on KIT Data Manager, and to download them from it. A first example of seamless integration with the local instrumentation is an ad hoc plugin for the Scanning Electron Microscope (SEM) that includes metadata automatically.
- As a first example of a data analysis service, the SEM plugin already integrates a deep neural network engine for image recognition, specifically trained to classify SEM images automatically. An equivalent plugin for the Beamline for Advanced diCHroism (BACH) is currently under development.
- The IDRP is linked to the local DMS at CNR-IOM to register the metadata associated with each ingested data set. The dashed arrow indicates that we are currently working to also integrate MaterialsCloud (www.materialscloud.org), the data repository developed at École polytechnique fédérale de Lausanne (EPFL).
- The user can access and manage their data registered on the IDRP. As a data pilot of the EUDAT project (https://www.eudat.eu/communities/information-and-data-management-repository-platform-for-nanoscience-in-europe), NFFA-EUROPE also integrates the B2SHARE service (https://b2share.eudat.eu/) into the IDRP, to access, share, and publish data with an associated DOI.
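To make the ingest-and-register steps above more concrete, here is a minimal Python sketch of how a metadata record could be assembled for a single instrument file before it is registered. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the `build_record` and `to_json` helpers, and the example values are hypothetical, and they reproduce neither the NFFA-EUROPE metadata schema nor the interface of the 'click' tool or of KIT Data Manager.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; NOT the actual NFFA-EUROPE schema."""
    facility: str
    instrument: str
    file_name: str
    sha256: str          # checksum, so the file can be verified after download
    size_bytes: int
    registered_at: str   # ISO 8601 timestamp in UTC
    extra: dict = field(default_factory=dict)  # instrument-specific metadata


def build_record(facility, instrument, file_name, payload, extra=None):
    """Build a record from the raw bytes of one instrument file."""
    return DatasetRecord(
        facility=facility,
        instrument=instrument,
        file_name=file_name,
        sha256=hashlib.sha256(payload).hexdigest(),
        size_bytes=len(payload),
        registered_at=datetime.now(timezone.utc).isoformat(),
        extra=dict(extra or {}),
    )


def to_json(record):
    """Serialize the record to JSON, the form a metadata platform could index."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real SEM image; the class label is
    # an invented example of what an image classifier might attach.
    record = build_record(
        "CNR-IOM",
        "SEM",
        "sample_001.tif",
        b"placeholder image bytes",
        extra={"predicted_class": "example-class"},
    )
    print(to_json(record))
```

The checksum and timestamp are the kind of fields a plugin can fill in without user input, which is the sense in which registration can be semi-automated; the `extra` dictionary is where instrument-specific values, such as a classifier label for SEM images, could be attached.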
Since the first public release, we have been receiving positive feedback from project partners, and we are currently developing a more user-friendly IDRP web interface. The next release of the prototype is planned for the end of the year. We look forward to working with interested NFFA-EUROPE partners to use, exploit, and further enhance the IDRP prototype by including other facilities.