In Part 1, in our Spring issue, we concluded that Open Science requires flexible, adaptable, affordable and usable solutions. Now let’s look at the people creating these solutions for EUROfusion.
Back in 2016 a consortium of scientists published data-sharing principles called “FAIR”, an acronym for its four foundational principles: Findability, Accessibility, Interoperability, and Reusability. “Or reproducibility,” adds Pär Strand. “Because recreating data or redoing subsets of analysis is what’s important. For this we need metadata: data about data. For example if we want to reproduce electron temperature measurements we need to know which methods were used and what processes were followed.”
Put simply, the more metadata the better. “Including who created the data,” advises David, “in order to give recognition. And who has used the data because if we discover an error we need to fix it everywhere.” However, the storage and use of personal metadata is governed by Europe’s GDPR (General Data Protection Regulation), which complicates its collection and requires special administration.
This raises the question: how do you store all this data in a way that is FAIR, compliant and cost-effective? “Science is self-correcting,” says David. “But sometimes this takes years, even decades, to disprove. So determining how long we should store and share data is another aspect we will be exploring.”
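To make the idea of metadata concrete, here is a minimal sketch in Python of what a record for one measurement might hold. It is an illustration only: the class name, field names, device name and example values are all assumptions made for this sketch and do not reflect any actual EUROfusion schema. It simply groups the items mentioned above: the method used, the processing steps followed, who created the data and who has used it.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class MeasurementMetadata:
    """Hypothetical metadata record; every field name here is an assumption."""
    quantity: str                      # what was measured
    device: str                        # which fusion device produced it
    shot: int                          # experiment (shot) number
    method: str                        # measurement method used
    processing_steps: List[str] = field(default_factory=list)  # processes followed
    created_by: str = ""               # personal data: subject to GDPR handling
    used_by: List[str] = field(default_factory=list)           # who to notify of errors

    def is_reproducible(self) -> bool:
        # A crude check: both the method and the processing chain must be recorded.
        return bool(self.method) and bool(self.processing_steps)


# Illustrative values only.
record = MeasurementMetadata(
    quantity="electron_temperature",
    device="ExampleDevice",
    shot=12345,
    method="Thomson scattering",
    processing_steps=["calibration", "profile fit"],
    created_by="researcher-001",
)
record.used_by.append("analysis-group-A")

print(json.dumps(asdict(record), indent=2))
```

Recording creators and users as pseudonymous identifiers rather than names, as in this sketch, is one possible way to limit the personal data held in each record. Whether that would satisfy GDPR in practice is a separate question the sketch does not answer.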
Even with all this metadata, reproducing fusion data is a challenge. “There is always variability between fusion devices,” explains David. “Each is one-of-a-kind right down to its software.” This is because, prior to the 2014 EUROfusion collaborative agreement, each institute set its own standards. As a result, each experimental fusion device generates different outputs (data) from the same process. Comparing the results of a single aspect of a shot (an experiment) across devices requires a detailed understanding of each machine, so data has to be decoded before it can be shared and analysed between machines. Converting results into a comparable format can take months. This is now changing.
“We’ve adopted an Interface Data Structure (IDS) which we’ll use for all of ITER’s research data,” explains Denis Kalupin. “Then for each research machine we’ll build a step that converts their disparate results into the IDS format. We’ll also have an entire package of tools able to analyse IDS data. Each institute or researcher can use them according to their needs. Once in place, EUROfusion scientists will be able to efficiently compare data from different machines, leveraging decades of investment in fusion research.”
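The approach described here, one conversion step per machine feeding a shared set of analysis tools, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the IDS specification or the IMAS software: the machine names, raw field names, units and the common record layout are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical common layout shared by all machines:
#   {"device": str, "shot": int, "electron_temperature_eV": List[float]}
CommonRecord = Dict[str, object]


def convert_machine_a(raw: dict) -> CommonRecord:
    # Assumption: machine A calls a shot a "pulse" and stores temperature in keV.
    return {
        "device": "machine_a",
        "shot": raw["pulse"],
        "electron_temperature_eV": [t * 1000.0 for t in raw["Te_keV"]],
    }


def convert_machine_b(raw: dict) -> CommonRecord:
    # Assumption: machine B already uses eV but has its own field names.
    return {
        "device": "machine_b",
        "shot": raw["shot_no"],
        "electron_temperature_eV": list(raw["te_ev"]),
    }


# One converter per machine; adding a machine means adding one entry here.
CONVERTERS: Dict[str, Callable[[dict], CommonRecord]] = {
    "machine_a": convert_machine_a,
    "machine_b": convert_machine_b,
}


def to_common(device: str, raw: dict) -> CommonRecord:
    """Convert a machine-specific record into the common layout."""
    return CONVERTERS[device](raw)


def peak_temperature(record: CommonRecord) -> float:
    """An analysis tool written once, against the common layout only."""
    values: List[float] = record["electron_temperature_eV"]  # type: ignore[assignment]
    return max(values)


a = to_common("machine_a", {"pulse": 1, "Te_keV": [1.0, 2.5]})
b = to_common("machine_b", {"shot_no": 7, "te_ev": [900.0, 3100.0]})
print(peak_temperature(a), peak_temperature(b))
```

The design point is that the analysis function never needs to know which machine produced the data. All machine-specific knowledge is confined to the converters.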
Already the WEST fusion research device has adopted IDS. Given ITER’s importance, IDS is expected to become the global standard for fusion research data. It will be the backbone of the ITER Integrated Modelling & Analysis Suite (IMAS) that all ITER members will use.
EIROforum represents seven intergovernmental scientific research organisations and EUROfusion. Standing on the shoulders of these giants is the charmingly enthusiastic Rupert Lück. Coming from EIROforum’s IT group, he is setting up the governance for the European Open Science Cloud (EOSC) initiative. EOSC is designing the technology infrastructure for Open Science. They’re already sharing petabytes (PB) of data.
Combining vision with pragmatism, Rupert speaks convincingly about the objectives of EOSC and the promotion of Open Science. “Pollution, cancer, food security, sustainable energy… there are so many challenges we need to solve and the best way is to do it together,” shares Rupert. “Cross-disciplinary sharing of scientific information and data is creating new insights and solutions – some we could never have predicted.”
EOSC is committed to finding practical ways to increase sharing and build new momentum within the broader scientific community. It is exploring services, policies, frameworks, data storage standards, and ways to encourage scientists to share and use Open Science. “We want to create the best possible infrastructure for Open Science that can bring together resources and build on what is there,” says Rupert.
“The more we cross-share within wider scientific communities, the more we can benefit,” enthuses Rupert. It comes down to what can be activated and accessed. This is why increasing sharing is the goal instead of forcing everyone to share everything. “Creativity, openness and of course funding are the critical factors. Let’s see what happens!”
This openness and support is reflected at EUROfusion. “This will be an enormous overhaul but there is the potential for a lot of good to come out of it,” agrees Tony Donné, EUROfusion Programme Manager. “We are coming up with a practical proposal that fits our field of science. This is the key to making the most of Open Science.”
As EUROfusion adopts Open Science, we are already realising benefits and efficiencies. Sharing massive amounts of research data, metadata and tools in a cost-effective manner will definitely be a challenge, but the prospective gains are tremendous. “Having the entire world using one language for fusion will take collaboration and sharing to a whole new level,” says Pär Strand. “It will accelerate fusion research and increase returns on public investment.”
We have started Fair4fusion, a project that brings together technical expertise and EUROfusion data providers to prototype open data access solutions. The aim is to provide a blueprint for a full-scale implementation on which the European fusion community can decide and then build its future data access.