Efforts in materials science face challenges due to the heterogeneity and complexity of data sources, disparate data formats, and the need for standardized metadata. The NFDI-MatWerk ontology (MWO) [1] and the Materials Science and Engineering Knowledge Graph (MSE-KG) [2] aim to address these challenges by providing a unified framework for representing and integrating diverse data types and...
NFFA-DI (Nanoscience Foundries and Fine Analysis – Digital Infrastructure) is the NFFA upgrade for realizing a Full-Spectrum Research Infrastructure for nanoscience and nanotechnology, capable of enhancing Italian research competitiveness on the fundamental interactions of multi-atomic matter to explore the origins of materials behaviour. The rationale of NFFA-DI is to integrate...
In order to fulfill the interoperability requirement for FAIR research data, (meta)data need to comply with a community-agreed-upon language.
In the NOMAD Archive, materials science data are collected from heterogeneous sources, spanning synthesis, experimental characterization, and computations for modelling and analysis. This diversity necessitates flexible storage options, allowing users...
NOMAD [nomad-lab.eu] [1] is an open-source, community-driven data infrastructure, focusing on materials science data. Originally built as a repository for data from DFT calculations, the NOMAD software can automatically extract data from the output of over 60 simulation codes. Over the past 2 years, NOMAD’s functionalities have been extensively expanded to support advanced many-body...
When gathering your research data and creating a knowledge graph, two aspects are key for achieving high data quality: making your data globally understandable and meaningful by semantic enrichment, and ensuring local conformance and completeness of your data by running validations. There are widely used languages within the Resource Description Framework (RDF) ecosystem to support these...
An extensible open-source platform to support digitalization in materials science is proposed. The platform provides a modular framework for flexible web-based implementation of research data management strategies at scales ranging from a single laboratory to international collaborative projects involving multiple organizations.
The platform natively supports object types related to...
Developing new materials requires extensive experimentation in synthesis and characterization, generating vast data sets. To keep this wealth of knowledge and adhere to the FAIR principles, effective data management is essential, involving standardized metadata schemas and integrated analysis tools. NOMAD has recently incorporated structured metadata schemas to manage experimental data from...
Research data management (RDM) has been receiving much attention, being in the focus of many institutes, often under pressure from funding agencies and in striving for good scientific practice.
Several solutions are being developed, mostly focusing on central database systems allowing for structured data storage. These solutions allow for classification, access control, publishing of data....
Material databases contain vast amounts of information, often harboring intricate connections and dependencies within material systems, some of which may remain undiscovered. Their structured organization naturally lends itself to the application of machine learning techniques. Through machine learning, we can unlock the tools necessary to discover potentially hidden structure-property...
In materials development, creating new data points is often very costly due to the effort needed for materials synthesis, sample preparation and characterization. Therefore, all available knowledge in terms of data, physical models and expert knowledge should be exploited in the most efficient way (optimal knowledge exploitation). Moreover, the number of new samples/data points to be...
Abstract:
With advancements in the sensitivity of present synchrotron facilities and the refinement of analytical methods, X-ray based techniques have become a standard approach for the structural characterization of intricate solid material systems. X-ray absorption spectroscopy (XAS) stands out as one of the most effective methodologies utilized for the analysis of various...
X-ray absorption spectroscopy (XAS) is a characterisation technique that can be employed to probe the electronic structure as well as the local structure of functional materials. XAS data analysis involves comparison with theoretical or experimental references, and processing of the data includes steps such as calibration, background subtraction, and normalization. Thus, for the extraction of...
Automating instrumentation is a big challenge for any lab. In established labs, there is often a large amount of existing infrastructure, with the benefits of automation only tangible after several components of a set-up are automated. In smaller labs, automation is often hampered by a lack of personnel and know-how.
Here, we present tomato, an open-source, Python-based, cross-platform...
The advent of data-driven approaches in materials science requires the aggregation of heterogeneous data from various sources, including simulation and experiments, which span different length scales and encompass a wide range of compositions, structures and thermodynamic conditions. In materials design, a major challenge arises from the combination of different software and file formats,...
The aim of this development is to create a tool to automatically gather and represent ‘Knowledge’. The user of this tool should be able to present a collection of Portable Document Format files (PDFs) to the system, and to generate therefrom graphs representing Concepts and Relationships mentioned within the files.
An application for the tool is the Analysis of norms and standards, which...
In modern materials science, the amount of generated experimental data is rapidly increasing while analysis methods still require many manual work hours. This is especially the case for X-ray photoelectron spectroscopy (XPS), where quantification is a complex task and, in many cases, can be properly done by experts only. However, these problems could be overcome by the use of a neural...
Achieving an interoperable representation of knowledge for experiments and computer simulations [1-4] is the key motivation behind the implementation of tools for FAIR research data management in the condensed-matter physics and materials engineering communities. Electron microscopy and atom probe tomography are two key materials characterization techniques used globally and across...
The FAIR principles (Findable, Accessible, Interoperable, Reusable) serve as a reference for assessing the quality of data storage and publication [1]. NOMAD [nomad-lab.eu] [2, 3] is an open-source data infrastructure for materials science data that is built upon these principles. In this contribution, we will demonstrate the interplay between high-quality data and knowledge using the...
Recent discoveries in astroparticle physics, including cosmic accelerators, gravitational waves from black-hole mergers, and astronomical neutrino sources, underscore the importance of a multi-messenger approach. The transient and rare nature of these astrophysical phenomena necessitates interdisciplinary work with diverse modern and historical data, emphasizing the need for FAIR (Findable,...
Optical spectroscopy covers experimental techniques such as ellipsometry, Raman spectroscopy, or photoluminescence spectroscopy. In the upcoming transformation process of the research environment towards FAIR data structures, these techniques will play a crucial role as they govern various fundamental and easily accessible material properties such as reflectivity, light absorption, bandgap, or...
The emergence of big data in science underscores the need for FAIR (Findable, Accessible, Interoperable, Reusable) [1] data management. NOMAD [nomad-lab.eu] [2, 3] is an open-source data infrastructure that meets this demand in materials science, enabling cross-disciplinary data sharing and annotation for both computational and experimental users. In this contribution, we will present our...
The PATOF project builds on work at the MAMI particle physics experiment A4. For many years, A4 produced a stream of valuable data, which has already yielded scientific output of high quality and still provides a solid basis for future publications. The A4 data set consists of 100 TB and 300 million files of different types (Vague context because of hierarchical folder structure and file format with...
There has been a distinct lack of FAIR data principles in the field of photoemission spectroscopy (PES). Within the FAIRmat consortium, we have been developing an end-to-end workflow for data management in PES experiments using NOMAD and NeXus, a community-driven data-modeling framework for experiments [1]. We will present an extensive and elaborated standard (NXmpes) for harmonizing PES data...
In order to achieve interoperability for data of different origin, FAIRmat is contributing to the materials science data management platform, NOMAD. It features flexible, but structured data modeling, allows custom data ingestion, while providing efficient search capabilities and online visualization of datasets. Several standard data formats are supported by NOMAD including the NeXus format...
State-of-the-art Bayesian optimization algorithms have the shortcoming of relying on a rather fixed experimental workflow. The possibility of making on-the-fly decisions about changes in the planned sequence of experiments is usually excluded and the models often do not take advantage of known structure in the problem or of information given by intermediate proxy measurements [1-3]. We...
Nanophotonic structures that enhance light-matter interaction can increase the sensitivity of spectroscopic optical measurements, such as detection and enantiomer discrimination of chiral molecules. However, this improved sensitivity comes at the cost of complicated modification of the spectra, and it is necessary to account for this during the experiment and in data analysis. This calls for...
Digital twin platform design for battery manufacturing and battery materials life cycle assessment
BatCAT is the project that realizes the Battery2030+ manufacturability programme from 2024 to 2027 by developing a digital twin platform and data space for battery manufacturing; primarily, BatCAT considers vanadium-based redox-flow batteries as well as Li-ion and Na-ion coin cells. The...
In the rapidly evolving field of materials science, the shift towards data-centric research needs enhanced strategies for data management, sharing, and publication. This presentation introduces NOMAD (https://nomad-lab.eu), a web-based platform developed by the NFDI consortium FAIRmat. Designed to address these challenges, NOMAD pioneers the application of FAIR principles (Findable,...
Data Science (DS) is a multidisciplinary field combining different aspects of mathematics, statistics, computer science, and domain-specific knowledge to extract meaningful insights from diverse data sources. DS and AI involve various artifacts, e.g., datasets, models, ontologies, code repositories, execution platforms, repositories, etc. The NFDI4DataScience (NFDI4DS) project endeavors to...
Self-assembling peptides (SAPs) are a type of biomaterial consisting of short amino acid sequences that can be controlled under specific physicochemical conditions. SAPs form nanostructures that can mimic biological scaffolds, giving them numerous applications such as in drug delivery, tissue engineering, and biosensors.
In this project we will create new SAP sequences based on desired...
Many phenomena and functional devices in optics and photonics rely on discrete objects, called scatterers, that interact with light in a predefined way. The optical properties of these scatterers are entirely described by the T-matrix. The T-matrix is computed for a given scatterer from a larger number of solutions to the Maxwell equations. Still, once known, various photonic materials made...
Vapor deposition encompasses a vast array of techniques ranging from chemical vapor deposition (CVD) processes like metal-organic vapor phase epitaxy (MOVPE) to physical vapor deposition (PVD) processes like pulsed laser deposition (PLD). These processes are used within a diverse set of industries to deposit thin films and coatings for everything from television screens to corrosion...
Introduction
The acquisition and storage of experimental data in the field of catalysis according to the FAIR principles (Findable, Accessible, Interoperable, and Reusable) necessitates the automation and digitization of experimental setups. In this work, we present our local solutions, in which we have integrated the concept of Standard Operating Procedures (SOPs) into automation...
The advancement of digitalization in catalysis and other scientific domains is marked by a transition from paper-based documentation to electronic lab notebooks, standardized protocols, and experiment automation. This shift promises enhanced reproducibility, comparability, and overall scientific progress. However, at the moment the field of catalysis still lacks universal standards for...
A transition from polluting fossil fuels to cleaner energy sources is underway. However, the intermittent nature of renewables such as solar and wind, dependent on fluctuating environmental conditions, presents a challenge for maintaining a reliable energy supply. Water electrolysis offers a solution by employing excess renewable energy to split water into $\mathrm{H}_2$ and $\mathrm{O}_2$,...
Computational databases are pivotal in modern chemistry, enabling the advanced data-driven exploration of chemical space. Transition metal complexes are a particularly versatile class of molecules due to their tunability of metal center and coordinating ligands, offering broad applications in therapeutics, catalysis and supramolecular chemistry. However, exploring the vast chemical space of...
Atom Probe Tomography (APT) is widely used for nanoscale structure and composition characterization across various disciplines, including materials science, geosciences, and biological sciences. Therefore, it is essential to have standardized workflows for analysis and post-processing that can combine software tools from different research communities in an interoperable manner. We demonstrate...
Aiming at data-driven design of magnetic materials as a demonstration of using NOMAD to integrate automated workflows, metadata formulation, and machine learning, we elucidate how research data management can be implemented for first-principles calculations on magnetic materials. On the one hand, we have established workflows to perform high-throughput calculations on the intrinsic magnetic...
Structured data, in which properties of materials, systems, or devices are tabulated in a systematic way, is a foundation for the methodical optimization and design of novel materials or devices. One of the most widely known databases in materials science is the metal-halide perovskite solar cells database. While this database found widespread use, it is difficult to update and extend as it has...
Inorganic halide perovskites are promising for optoelectronic applications, offering greater thermal stability over hybrid counterparts but are prone to phase instabilities. Phase stability can be improved by compositional engineering, e.g., varying the Cs/Pb and I/Br ratio. Combinatorial vacuum coevaporation allows the investigation of the large compositional space of Cs(Sn,Pb)(I,Br)3 in the...
New materials are conventionally developed via trial and error in laboratory experiments. This process is in general slow and involves significant resources and research efforts. Furthermore, it can overlook potential candidates, properties, or business-case criteria related to their use. Computational simulation methods can help solve these problems by accelerating the screening process...
While rapid exploration and optimisation of solution-processable materials in self-driving laboratories (SDLs) is advanced, adapting these approaches for inorganic materials using physical vapour deposition (PVD) presents challenges due to increased experimental complexity and higher time and energy demands for sample production. It is thus critical that the SDL’s underlying algorithms learn...
Enhancing data interpretation and interfacing in energy systems analysis are key concepts to make research in the energy domain more FAIR and, thus, more efficient.
When producing data, their enrichment with metadata, describing them in a standardised way, is a challenge for every researcher. The publication of these metadata on a distributed data infrastructure (e.g. the databus within the...
Infrared Spectroscopy (IR) is crucial in heterogeneous catalysis for identifying active sites, yet existing simulations lack comprehensive peak broadening output. We propose an application to generate complete spectra from Density Functional Theory (DFT) data, facilitating comparison with experimental results. Built on CaRMeN, it manages data in an SQL database, ensuring efficiency and...
Most current explainable AI methods are post-hoc methods that analyze trained models and only generate importance annotations, which often leads to an accuracy-explainability tradeoff and limits interpretability. Here, we propose a self-explaining multi-explanation graph attention network (MEGAN) [1]. Unlike existing graph explainability methods, our network can produce node and edge...
NOMAD [nomad-lab.eu] [1, 2] is an open-source data infrastructure for materials science data. NOMAD already supports an array of computational codes and techniques, with over 60 parsers that automatically extract essential (meta)data from the raw output of standard calculations. Traditionally, the NOMAD repository has focused on contributions from DFT calculations, accumulating over 12.5...
Advancements in materials science are significantly dependent on the detailed characterization of samples, which in turn generates complex measurement data. This poses challenges in data management, notably in metadata preservation and the need for extensive manual processing, often exceeding the expertise of researchers. The FAIR principles offer a pathway towards resolving these issues...
The rise of digitalization has significantly reshaped scientific practices, positioning research data as a valuable asset. New research paradigms have emerged that extend the use of these data beyond their original research purposes. As a result, proper data preservation in line with the FAIR principles,[1] as well as the legal aspects relevant to the preservation and reuse of these data, have...
A key challenge in experimental high-resolution microscopy is the real-time interpretation of the observed images in conjunction with the parameters adjusted by the experimenter during data acquisition, e.g. to obtain a certain contrast. The parameter space of candidate structures, experimental parameters, and resulting image contrast can be vast and complex, often requiring a scientist who is...
Electronic Laboratory Notebooks (ELNs) are crucial for moving research data from paper to digital formats, streamlining lab workflows and digitizing data. This study examines integrating ELNs into Research Data Management (RDM) platforms like NOMAD, focusing on challenges like user acceptance and data structuring.
ELNs need to be user-friendly and structure data effectively for integration...
Recently, funding agencies have begun to require sections on research data management in grant applications and the submission of a detailed Data Management Plan (DMP) during the initial phase of a funded research project. These DMPs are set as milestones to be achieved for a successful research project. Scientists often view DMPs as a burden and additional work that distracts them from active...
The vast chemical landscape of metal–organic frameworks (MOFs) offers a rich array of compositions, structures, and potential applications.[1] Advancements in Artificial Intelligence (AI) and computer-assisted techniques have not only enhanced MOF discovery but also the field of MOF synthesis.[2] In this presentation, I will offer an experimentalist’s perspective on integrating AI into MOF...
Integrating artificial intelligence (AI) with metal-organic frameworks (MOFs), highly versatile and structurally diverse materials, heralds a new era in materials science, offering groundbreaking solutions to longstanding challenges in engineering and data analytics. MOFs, known for their exceptional porosity and customizable frameworks, have shown promising applications across various...
Hydride materials that can reversibly abs-/desorb hydrogen have been intensively investigated due to their potential as a hydrogen storage medium and functional materials for many applications. The dynamic reaction between hydride/hydride-forming material and the gaseous phase is complex and has several intermediary processes, making the insightful description of these phenomena impractical,...
ALBA synchrotron has pledged to follow the FAIR data management principles, by which results produced by academic users will be available to the public. One key step of this commitment is to standardize the process by which data are stored. At ALBA, this is being done by rigorously following NeXusFormat application definitions that determine which metadata are essential to replicate the...
Surprisingly, despite the rapid progress of machine learning in materials science, the prediction of optical spectra for crystalline materials remains underexplored, although this gap presents an opportunity to discover novel or tailored materials for various optical applications, including photovoltaic systems, photocatalytic water splitting, epsilon-near-zero materials, optical sensors, and...
In an era of rapid technological advancement and data proliferation, the ability to efficiently access and utilise scientific knowledge has become paramount. Here, we present the development of an advanced research assistant chatbot specifically designed to navigate and interpret scientific publications. Our approach uses the Retrieval-Augmented Generation (RAG) architecture, a...
Much materials knowledge is obtained indirectly, e.g. by fitting model parameters to data acquired in a potentially very complex experiment. Electron microscopy data, for example, can amount to several tens of GB; especially for such very large data sets, complex data analysis workflows (DAWs) must then be run to extract the materials property information...