November 27, 2024
Europe/Berlin timezone

Achieving semantic interoperability in materials science data and simulation workflows

Nov 27, 2024, 11:10 AM
25m

Speaker

Abril Azocar Guzman

Description

The multiscale and multidisciplinary nature of materials science leads to complex scientific workflows and high-dimensional data. A lack of structured (meta)data hinders researchers' ability to find, access, interoperate, and reuse data [1], all of which are critical limitations for data-driven approaches and for research sustainability. To address the manifold challenges of digitalization within the materials science community, we develop ontologies with the goal of achieving semantic interoperability across various applications and use cases in the context of NFDI-MatWerk.

In the field of atomistic simulations specifically, several challenges impair data reusability: (1) Understanding and reusing atomic structure data requires well-described and harmonized metadata, yet most existing approaches focus solely on perfect crystal structures and often overlook defects. (2) Calculations frequently combine different software tools and diverse file formats, resulting in heterogeneous metadata that lacks semantic interoperability. (3) Workflow provenance detailing the processes used to set up digital samples is often absent.

To tackle these challenges and facilitate data reuse, we have developed the Computational Materials Sample Ontology, an application-level ontology initially focused on describing structures at the atomistic level [2]. Its use is complemented by the development of domain-level ontologies that describe crystallographic defects [3] and atomistic simulation concepts [4]. To assist domain scientists in applying ontologies in their research, the software tool atomRDF [5] enables users to annotate their data with ontology terms automatically, creating application-level knowledge graphs. This makes research data easier to query and find.
The combination of controlled vocabularies and software tools for generating linked open data promotes interoperability across file formats and software, while also offering potential for knowledge engineering and AI-ready data that can accelerate materials discovery.
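
The idea of annotating samples with ontology terms and then querying the resulting knowledge graph can be illustrated with a minimal sketch. It uses only the Python standard library and a plain list of triples rather than atomRDF or an RDF library, so it does not show the actual atomRDF API. The namespace prefix is taken from reference [2]; the class and property names (AtomicScaleSample, hasElement, hasCrystalStructure, hasDefect), the example.org namespace, and the sample data are illustrative placeholders that have not been checked against the published ontologies.

```python
# Minimal in-memory triple store. This is an illustration only: a real
# workflow would use atomRDF or an RDF library and the published ontology terms.

CMSO = "https://purls.helmholtz-metadaten.de/cmso/"  # prefix from reference [2]
EX = "https://example.org/sample/"  # hypothetical namespace for local samples


def annotate(sample_id, element, structure, has_defect):
    """Return (subject, predicate, object) triples describing one sample.

    All term names appended to CMSO below are placeholders (assumptions).
    """
    subject = EX + sample_id
    return [
        (subject, "rdf:type", CMSO + "AtomicScaleSample"),
        (subject, CMSO + "hasElement", element),
        (subject, CMSO + "hasCrystalStructure", structure),
        (subject, CMSO + "hasDefect", has_defect),
    ]


def match(graph, s=None, p=None, o=None):
    """Yield triples matching a pattern; None acts as a wildcard."""
    for triple in graph:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


# Samples that might originate from different tools and file formats are
# described with the same vocabulary, which is what makes them comparable.
graph = []
graph += annotate("fe_bulk", "Fe", "bcc", False)
graph += annotate("fe_vacancy", "Fe", "bcc", True)
graph += annotate("cu_bulk", "Cu", "fcc", False)

# Query: all bcc samples that contain a defect.
bcc = {t[0] for t in match(graph, p=CMSO + "hasCrystalStructure", o="bcc")}
defective = {t[0] for t in match(graph, p=CMSO + "hasDefect", o=True)}
hits = sorted(bcc & defective)
print(hits)
```

Because every sample is described with the same predicates, the query does not need to know which simulation code or file format produced each structure. In practice the pattern matching would be replaced by a SPARQL query over a proper RDF graph.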
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al., Sci Data, 2016, 3, 160018.
[2] https://purls.helmholtz-metadaten.de/cmso/
[3] https://github.com/OCDO/
[4] https://purls.helmholtz-metadaten.de/asmo/
[5] https://github.com/pyscal/atomRDF
