Description
The advent of data-driven approaches in materials science requires the aggregation of heterogeneous data from various sources, including simulations and experiments, which span different length scales and encompass a wide range of compositions, structures and thermodynamic conditions. In materials design, a major challenge arises from the combination of different software and file formats, which leads to interoperability issues. To achieve workflow and data reusability, as well as meaningful interpretation, it is crucial to ensure well-described (meta)data at each step of the simulation workflow. Our aim is to establish a machine-readable standard for representing material structures, workflows and calculated properties, including their intrinsic relationships.
To describe simulations at the atomistic level, we have developed an ontology for computational material samples, CMSO, complemented by ontologies for crystallographic defects, which are often neglected in standardization approaches. Another essential aspect of achieving interoperability is describing the simulation method; this is facilitated by the Atomistic Simulation Methods Ontology (ASMO). Data annotation using these ontologies is embedded directly in the workflow with the software atomRDF. This allows users to semantically annotate jobs, with pyiron serving as an example workflow environment, and to build an application-level knowledge graph.
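The idea of annotating simulation output as a graph can be illustrated with a minimal sketch in plain Python, where each fact is stored as a subject-predicate-object triple. This is not the atomRDF API: the `Triple` type, the `annotate_sample` and `query` helpers, the `ex:` term names and the numerical values are all hypothetical placeholders chosen for illustration, and they do not correspond to actual CMSO or ASMO terms.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement in the graph."""
    subject: str
    predicate: str
    obj: object


def annotate_sample(graph, sample_id, n_atoms, energy_ev, method):
    """Add one simulated sample and its metadata to the graph.

    The predicate names are hypothetical, not real ontology terms.
    """
    graph.update({
        Triple(sample_id, "rdf:type", "ex:AtomicScaleSample"),
        Triple(sample_id, "ex:hasNumberOfAtoms", n_atoms),
        Triple(sample_id, "ex:hasTotalEnergy", energy_ev),
        Triple(sample_id, "ex:hasMethod", method),
    })


def query(graph, predicate):
    """Return a {subject: object} mapping for every triple with this predicate."""
    return {t.subject: t.obj for t in graph if t.predicate == predicate}


# A set of triples acts as a tiny in-memory knowledge graph.
graph = set()

# Placeholder values, not results from the work described here.
annotate_sample(graph, "ex:bulk_Al", 108, -362.88, "ex:MolecularStatics")
annotate_sample(graph, "ex:vacancy_Al", 107, -358.85, "ex:MolecularStatics")

atom_counts = query(graph, "ex:hasNumberOfAtoms")
```

Because every sample is described with the same predicates, data written by different jobs or tools can be merged into one set and queried uniformly, which is the property the ontologies are meant to guarantee at scale.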
We demonstrate the benefits of such a knowledge graph for: (i) aggregating data from heterogeneous sources in a scale-bridging fashion; (ii) allowing complex queries through an automated system to explore the data; and (iii) identifying new trends and extracting material properties that were not explicitly calculated. We illustrate these benefits with two examples: the calculation of formation energies of crystal defects and the extraction of thermodynamic quantities from existing simulation data. By combining simulation workflows with semantic technologies, this approach accelerates the analysis, sharing and reuse of data. Leveraging the advantages of a knowledge graph enhances interoperability and data quality, increasing compliance with the FAIR principles.
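As a sketch of how a property that was not explicitly calculated might be derived from annotated data, the example below computes a vacancy formation energy from two stored total energies using the common textbook expression E_f = E_defect - ((N - 1) / N) * E_bulk. The abstract does not state which expression the authors use, so this formula is an assumption, as are the record layout, the function name and the numerical values, which are placeholders standing in for query results.

```python
def vacancy_formation_energy(e_defect, e_bulk, n_bulk):
    """Formation energy of a single vacancy, in the units of the inputs.

    e_defect: total energy of the cell containing one vacancy (N - 1 atoms)
    e_bulk:   total energy of the defect-free cell (N atoms)
    n_bulk:   number of atoms N in the defect-free cell
    """
    return e_defect - (n_bulk - 1) / n_bulk * e_bulk


# Placeholder records, standing in for what a graph query might return.
records = [
    {"sample": "bulk", "n_atoms": 108, "energy_ev": -362.88},
    {"sample": "vacancy", "n_atoms": 107, "energy_ev": -358.85},
]
by_name = {r["sample"]: r for r in records}

e_f = vacancy_formation_energy(
    e_defect=by_name["vacancy"]["energy_ev"],
    e_bulk=by_name["bulk"]["energy_ev"],
    n_bulk=by_name["bulk"]["n_atoms"],
)
print(f"Vacancy formation energy: {e_f:.2f} eV")
```

Neither record stores the formation energy itself; it follows from combining two independently annotated samples, which is the kind of derived property the third benefit refers to.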