Jul 2 – 3, 2025
Ruhr-Universität Bochum
Europe/Berlin timezone

Creating Data Analysis Pipelines Using the MADAS Framework

Jul 2, 2025, 11:30 AM
30m
UFO (Ruhr-Universität Bochum)

Querenburger Höhe 283, 44801 Bochum

Speaker

Martin Kuban

Description

The NOMAD data infrastructure provides access to vast amounts of data that can be used for data analytics and machine learning (ML). Often, however, not all (meta)data are relevant for every task, making it necessary to apply filtering and processing steps to prepare input data for ML.
Here, we present MADAS, a Python framework that supports all steps of data analytics and machine learning, including automated download and storage of data, generation of material descriptors, and computation of similarity metrics. It integrates well with established ML frameworks and libraries. MADAS makes it possible to write robust, reusable data analysis pipelines, and its modular structure makes it easy to extend the data processing with custom functions.
We demonstrate its capabilities and features by finding interoperable data within a large computational dataset hosted on NOMAD, and by finding distinct materials that exhibit similar electronic structures.
