Description
NOMAD [nomad-lab.eu] [1] is an open-source, community-driven data infrastructure focused on materials science data. Originally built as a repository for data from DFT calculations, the NOMAD software can automatically extract data from the output of over 60 simulation codes. Over the past two years, NOMAD’s functionality has been extended to support advanced many-body calculations, including GW, the Bethe-Salpeter equation (BSE), and dynamical mean-field theory (DMFT), as well as classical molecular dynamics simulations. Both standardized and custom complex simulation workflows not only streamline data provenance and analysis but also facilitate the curation of AI-ready datasets. In this contribution, we will show how these features, along with NOMAD’s adherence to the FAIR principles (Findability, Accessibility, Interoperability, Reusability) [2], provide a powerful framework for enhancing data utility and discovery [3]. In particular, we will use this FAIR-compliant perspective to highlight the characteristics that distinguish NOMAD from other big-data infrastructures, e.g., the ability of users to clearly specify their own data quality needs. Finally, we will present an outlook, demonstrating NOMAD’s potential for creating a cohesive, interconnected scientific data landscape, where datasets can synergistically find a second life beyond their initial publications.
[1] Scheidgen, M. et al., JOSS 8, 5388 (2023).
[2] Wilkinson, M. D. et al., Sci. Data 3, 160018 (2016).
[3] Scheffler, M. et al., Nature 604, 635-642 (2022).