FAIR-DI European Conference on Data Intelligence 2024

Europe/Berlin
Achat Hotel Karlsruhe City

Mendelssohnplatz, 76131 Karlsruhe
Description

Building on the success of previous editions, we are pleased to announce the FAIR-DI European Conference on Data Intelligence 2024. Following the remarkable achievements and insightful discussions at the 2020 and 2022 Conferences on a FAIR Data Infrastructure for Materials Genomics, we are excited to continue the discussion and to foster further collaboration and innovation in this growing field under the theme of Data Intelligence.

Conference topics:

  • The “NFDI Cosmos”
  • Data management, stewardship & databases (FAIR research, repositories, ontologies, error quantification, etc.)
  • Machine learning (ML) applications using existing data repositories
  • New strategies for materials synthesis based on ML-approaches
  • Developing strategies for scale-bridging workflows in computational materials science
  • Digital twins
  • High-throughput simulation
  • Automation (autonomous experimentation)

Timetable:

    • 4:30 PM - 6:30 PM
      Arrival
    • 6:30 PM - 8:00 PM
      Dinner
    • 8:00 PM - 9:00 PM
      Welcome
    • 8:45 AM - 10:30 AM
      Session
      • 8:45 AM
        Machine-learning assisted discovery and characterization of materials 40m

        The development of density-functional theory in the 1960s and the spread of computers led to a revolution in materials science. A third kind of physics, computational physics, emerged to complement its theoretical and experimental sisters. Nowadays, with the availability of ever faster supercomputers and novel computational methods, we are living through what one might call the second computer revolution in materials science. High-throughput techniques running on these supercomputers allow for the automatic screening of thousands or even millions of hypothetical materials in search of solutions to present technological challenges. Moreover, machine-learning methods are used to accelerate materials discovery by complementing density-functional theory with extremely efficient statistical models. In this talk we summarize our recent attempts to discover, characterize, and understand inorganic compounds using these novel approaches. We start by explaining why the search for new materials is nowadays one of the most pressing technological problems. Then we summarize our recent work on using crystal-graph attention neural networks to predict materials properties. To train these networks, we constructed a dataset of over 4.5 million density-functional calculations with consistent calculation parameters. Combining these data with the newly developed networks, we have already scanned more than two thousand prototypes spanning a space of several billion materials and identified tens of thousands of theoretically stable compounds. We then show how these data can be used to search for materials with interesting properties.
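The abstract does not say how "theoretically stable" is decided. A common criterion in high-throughput screening is the energy above the convex hull of formation energies; the Python sketch below illustrates that criterion for a binary A(1-x)B(x) system, assuming it is the one used. The function names, the tolerance, and the example energies are hypothetical, and the sketch is not the speaker's workflow.

```python
# Hypothetical screening criterion (assumed, not taken from the talk): a compound
# counts as "theoretically stable" if its formation energy lies on, or within a
# tolerance of, the lower convex hull. Binary system A(1-x)B(x), 0 < x < 1.

def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull of (x, E_form) points via Andrew's monotone chain."""
    hull = []
    for p in sorted(set(points)):
        # Drop the last hull point while it does not make a counter-clockwise turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return e0 + (e1 - e0) * (x - x0) / (x1 - x0)
    raise ValueError("composition outside the hull range")

def screen(candidates, tol=1e-6):
    """Return (name, energy above hull, is_stable) for (name, x, E_form) tuples."""
    # The elemental end members define zero formation energy at x = 0 and x = 1.
    points = [(0.0, 0.0), (1.0, 0.0)] + [(x, e) for _, x, e in candidates]
    hull = lower_hull(points)
    results = []
    for name, x, e in candidates:
        e_above = e - hull_energy(hull, x)
        results.append((name, e_above, e_above <= tol))
    return results

if __name__ == "__main__":
    # Made-up formation energies in eV/atom, for illustration only.
    demo = [("AB", 0.5, -0.40), ("A3B", 0.25, -0.10), ("AB3", 0.75, -0.30)]
    for name, e_above, stable in screen(demo):
        print(name, round(e_above, 3), stable)
```

In a real screening campaign the hull would be built from all known and predicted phases of the chemical system, and the tolerance would absorb the prediction error of the model.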

        Speaker: Miguel Marques
      • 9:25 AM
        C01 20m
      • 9:45 AM
        C02 20m
      • 10:05 AM
        C03 20m
    • 10:30 AM - 11:00 AM
      Coffee Break
    • 11:00 AM - 1:00 PM
      Session
      • 11:00 AM
        Auto-generated Materials Databases and Language Models 40m

        Data-driven materials discovery is coming of age, given the rise of 'big data' and machine-learning (ML) methods. However, the most sophisticated ML methods need large amounts of training data. Such data may be custom materials databases that comprise chemical names and their cognate properties for a given functional application, or a large corpus of text to train a language model. This talk showcases our home-grown open-source software tools that have been developed to auto-generate custom materials databases for a given application. The presentation will also demonstrate how domain-specific language models can now be used as interactive engines for data-driven materials science. The talk concludes with a forecast of how this 'paradigm shift' away from the use of static databases is likely to shape next-generation materials science.
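As a rough illustration of what auto-generating a materials database from text involves, the Python sketch below uses one hand-written regular expression to pull (compound, property, value, unit) records out of sentences and groups them by compound and property. It is a deliberately simplified, hypothetical stand-in: the pattern, the property list, and the function names are assumptions, and the tools presented in the talk are not reproduced here; practical text-mining tools rely on much richer natural-language processing than a single pattern.

```python
import re

# Toy pattern for sentences such as "The band gap of TiO2 is 3.2 eV".
# The property names, sentence template, and units are all assumptions.
PATTERN = re.compile(
    r"(?P<prop>band gap|Curie temperature|melting point)\s+of\s+"
    r"(?P<compound>[A-Z][A-Za-z0-9]*)\s+(?:is|was)\s+"
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>eV|K|°C)"
)

def extract_records(text):
    """Turn free text into structured (compound, property, value, unit) records."""
    records = []
    for m in PATTERN.finditer(text):
        records.append({
            "compound": m.group("compound"),
            "property": m.group("prop"),
            "value": float(m.group("value")),
            "unit": m.group("unit"),
        })
    return records

def build_database(documents):
    """Aggregate records from many documents, keyed by (compound, property)."""
    db = {}
    for doc_id, text in documents.items():
        for rec in extract_records(text):
            key = (rec["compound"], rec["property"])
            # Keep the source document so every value stays traceable.
            db.setdefault(key, []).append((rec["value"], rec["unit"], doc_id))
    return db

if __name__ == "__main__":
    docs = {
        "paper1": "The band gap of TiO2 is 3.2 eV, while the band gap of ZnO is 3.37 eV.",
        "paper2": "We found that the Curie temperature of Fe was 1043 K.",
    }
    for key, values in build_database(docs).items():
        print(key, values)
```

Keeping the document identifier next to each value is a small design choice that makes the generated database auditable, which matters once records from many sources disagree.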

        Speaker: Jacqui Cole
      • 11:40 AM
        C04 20m
      • 12:00 PM
        C05 20m
      • 12:20 PM
        C06 20m
      • 12:40 PM
        C07 20m
    • 1:00 PM - 2:00 PM
      Lunch
    • 2:00 PM - 4:00 PM
      Session
      • 2:00 PM
        From the Roots of Data Science to Machine Learning of Materials’ Microstructures and Properties 40m

        We start our journey by looking at the roots of "data science" and observe that data and data analysis were often related to imaging.
        Therefore, a major part of this presentation is dedicated to image data, be it from microscopy or from simulations. We demonstrate how deep-learning approaches allow us to quantitatively analyze in-situ TEM data, uncovering the complex motion of crystalline defects.
        Furthermore, we take a look at microstructure-property relations and discuss how creating this bi-directional link between (again, image) data and material properties through tailored machine-learning approaches might speed up, or sometimes even replace, costly computer simulations and experiments: one of the stepping stones towards accelerated materials design.

        Speaker: Stefan Sandfeld
      • 2:40 PM
        C08 20m
      • 3:00 PM
        C09 20m
      • 3:20 PM
        C10 20m
      • 3:40 PM
        C11 20m
    • 4:00 PM - 4:30 PM
      Coffee Break
    • 4:30 PM - 6:30 PM
      Session
      • 4:30 PM
        Transforming chemistry with transformers 40m

        The chemical sciences have seen significant advances through data-driven techniques, particularly those built on large datasets structured in tabular form.
        However, collecting data in this format is often challenging in practical chemistry, where text-based records are more common [1]. Such text data are also difficult to use in traditional machine-learning approaches. Recent developments in applying large language models (LLMs) to chemistry have shown promise in overcoming this challenge: LLMs can convert unstructured text into structured form and can even solve predictive tasks in chemistry directly [2, 3]. In my talk, I will present the impressive results of using LLMs, showcasing how they can autonomously use tools and leverage structured data and “fuzzy” inductive biases. To enable the training of a chemistry-specific large language model, we have curated a new dataset along with a comprehensive toolset for using data from knowledge graphs, preprints, and unlabeled molecules. To evaluate frontier models trained on such a dataset, we designed a dedicated benchmark for chemical knowledge and reasoning abilities. I will present the latest results, demonstrating the potential of LLMs in advancing chemical research [4].

        References:
        [1] Jablonka, K. M.; Patiny, L.; Smit, B. Nat. Chem. 2022, 14 (4), 365–376.
        [2] Jablonka, K. M.; et al. Digital Discovery 2023, 2 (5), 1233–1250.
        [3] Jablonka, K. M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 2024, 6, 161–169.
        [4] Mirza, A.; Alampara, N.; Kunchapu, S.; Emoekabu, B.; Krishnan, A.; Wilhelmi, M.; Okereke, M.; Eberhardt, J.; Elahi, A. M.; Greiner, M.; Holick, C. T.; Gupta, T.; Asgari, M.; Glaubitz, C.; Klepsch, L. C.; Köster, Y.; Meyer, J.; Miret, S.; Hoffmann, T.; Kreth, F. A.; Ringleb, M.; Roesner, N.; Schubert, U. S.; Stafast, L. M.; Wonanke, D.; Pieler, M.; Schwaller, P.; Jablonka, K. M. Are Large Language Models Superhuman Chemists? arXiv 2024. https://doi.org/10.48550/ARXIV.2404.01475.

        Speaker: Kevin Jablonka
      • 5:10 PM
        C12 20m
      • 5:30 PM
        C13 20m
      • 5:50 PM
        C14 20m
      • 6:10 PM
        C15 20m
    • 6:30 PM - 8:00 PM
      Dinner
    • 8:00 PM - 9:00 PM
      Poster Session
    • 8:45 AM - 10:30 AM
      Session
      • 8:45 AM
        Machine-learning you can trust: interpretability and uncertainty quantification in chemical machine learning 40m

        Molecular dynamics simulations combined with first-principles calculations have long been the gold standard of atomistic modeling, but they have also been associated with steep computational costs and with limits on the accessible time and length scales. Machine-learning models have greatly extended the range of systems that can be studied, promising an accuracy comparable to that of the first-principles reference they are fitted against.
        Given the interpolative nature of machine-learning models, it is crucial to be able to determine how reliable the predictions of simulations that rely on them are, as well as to understand the physical underpinnings (if any) of the successes and failures of different frameworks.
        I will discuss a few examples of how understanding the mathematical structure of ML models helps us use them to interpret the outcome of atomistic simulations in terms of familiar concepts such as the locality, range, and body order of interactions.
        Then, I will give a brief overview of the approaches available to obtain a quantitative measure of the uncertainty in a machine-learning prediction, and discuss in particular an inexpensive and reliable scheme based on an ensemble of models. Using a scheme that we refer to as "direct propagation of shallow ensembles" (DPOSE), we estimate not only the accuracy of individual predictions but also that of the final properties resulting from molecular dynamics and sampling based on ML interatomic potentials.
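The idea of propagating an ensemble all the way to a derived property can be sketched in a few lines of Python. The sketch assumes, as the name suggests, that the members of a "shallow" ensemble share a common feature extractor and differ only in their last layer; each member's energies are pushed through the full calculation of a derived property (here a Boltzmann-weighted mean energy over a few configurations), and the spread across members is taken as the uncertainty of that property. The feature map, weights, and function names are hypothetical, and the sketch leaves out how such an ensemble is trained and calibrated.

```python
import math
import statistics

def shared_features(x):
    """Hypothetical shared backbone: a fixed feature map of a scalar descriptor x."""
    return [1.0, x, x * x]

# Hypothetical shallow ensemble: all members reuse shared_features and differ
# only in the weights of their final linear layer (one row per member).
HEADS = [
    [0.00, -1.00, 1.00],
    [0.05, -1.10, 1.02],
    [-0.03, -0.90, 0.97],
    [0.02, -1.05, 1.05],
]

def member_predictions(x, heads=HEADS):
    """Energy predicted by every ensemble member for one configuration."""
    phi = shared_features(x)  # computed once, reused by all heads
    return [sum(w * p for w, p in zip(head, phi)) for head in heads]

def boltzmann_mean_energy(energies, kT):
    """Derived property: Boltzmann-weighted mean energy over configurations."""
    e_min = min(energies)  # shift exponents for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    return sum(w * e for w, e in zip(weights, energies)) / sum(weights)

def propagate(configs, kT, heads=HEADS):
    """Compute the derived property once per member, then take mean and spread."""
    preds = [member_predictions(x, heads) for x in configs]  # preds[config][member]
    per_member = [
        boltzmann_mean_energy([p[m] for p in preds], kT)
        for m in range(len(heads))
    ]
    return statistics.fmean(per_member), statistics.stdev(per_member)

if __name__ == "__main__":
    configs = [0.0, 0.25, 0.5, 0.75, 1.0]
    mean, std = propagate(configs, kT=0.1)
    print(f"<E> = {mean:.4f} +/- {std:.4f}")
```

Evaluating the derived property per member, rather than once on the ensemble-mean energies, is what lets the uncertainty follow the nonlinear averaging; because the members share the expensive backbone, the extra cost is only that of the small final layers.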

        Speaker: Michele Ceriotti
      • 9:25 AM
        C16 20m
      • 9:45 AM
        C17 20m
      • 10:05 AM
        C18 20m
    • 10:30 AM - 11:00 AM
      Coffee Break
    • 11:00 AM - 1:00 PM
      Session
      • 11:00 AM
        AI-ready materials science data 40m

        In the rapidly evolving field of materials science, the shift towards data-centric research requires enhanced strategies for data management, sharing, and publication. This presentation introduces NOMAD (https://nomad-lab.eu), a web-based platform developed by the NFDI consortium FAIRmat. Designed to address these challenges, NOMAD pioneers the application of the FAIR principles (Findable, Accessible, Interoperable, and Reusable) to materials science data, thereby fostering a more efficient, open, and collaborative research environment within a federated infrastructure. The core focus of this talk is the striking transformation NOMAD has undergone, from an archive and repository for ab-initio calculations to a global platform for managing materials science data. I will introduce NOMAD Oasis, a locally installable and customizable version of the platform designed to enable the creation of FAIR data from its inception; each Oasis installation simultaneously becomes a node in a rapidly expanding network of interconnected data hubs. These platforms support a broad spectrum of data-driven research activities within materials science. I will showcase how NOMAD's infrastructure serves as a critical backbone for data-driven research across various domains, including the accelerated synthesis of materials via physical vapor deposition, complex computational workflows, big-data strategies for developing novel solar cells, the hosting of databases for experimental heterogeneous catalysis and metal-organic frameworks, and the application of generative AI in materials research.

        Speaker: Jose Marquez (Humboldt University of Berlin)
      • 11:40 AM
        C19 20m
      • 12:00 PM
        C20 20m
      • 12:20 PM
        C21 20m
      • 12:40 PM
        C22 20m
    • 1:00 PM - 2:00 PM
      Lunch
    • 2:00 PM - 4:00 PM
      Session
      • 2:00 PM
        Correlative characterization and data science in functional materials 40m
        Speaker: Francesca Toma
      • 2:40 PM
        C23 20m
      • 3:00 PM
        C24 20m
      • 3:20 PM
        C25 20m
      • 3:40 PM
        C26 20m
    • 4:00 PM - 4:30 PM
      Coffee Break
    • 4:30 PM - 6:30 PM
      Session
      • 4:30 PM
        The Role of Data Intelligence in Chemistry Research Data Infrastructures 40m

        The use of data intelligence tools offers numerous advantages for scientists and holds significant potential to streamline and accelerate research across various domains. In particular, research data infrastructures must address the opportunities and obstacles posed by data intelligence to ensure optimal support for their users and the broader scientific community. This talk will address two general aspects of chemistry research data: (1) How does a research data infrastructure benefit from the development and implementation of data intelligence tools, and how can data intelligence support its different areas? (2) How can a research data infrastructure help promote the development of data analysis tools, and what measures are suitable for shaping the future of chemistry work by promoting data intelligence in the long run? For both aspects, examples from the Chemotion ELN and the Chemotion repository will illustrate tools and workflows that are already implemented, as well as those planned within NFDI4Chem. A highlight on the semi-automated curation of data will show the impact of data intelligence on the efficient review of scientific data. In this context, the impact of data intelligence on the establishment of automated synthesis platforms will also be discussed.

        Speaker: Nicole Jung
      • 5:10 PM
        C27 20m
      • 5:30 PM
        C29 20m
      • 5:50 PM
        C30 20m
      • 6:10 PM
        C28 20m
    • 6:30 PM - 8:00 PM
      Dinner
    • 8:00 PM - 9:00 PM
      Poster Session
    • 8:45 AM - 10:30 AM
      Session
      • 8:45 AM
        Active learning for data-efficient optimisation of materials and processes 40m

        The arrival of materials science data infrastructures in the past decade has ushered in the era of data-driven materials science based on artificial intelligence (AI) algorithms, which has facilitated breakthroughs in materials optimisation and design. Of particular interest are active learning algorithms, in which datasets are collected on the fly during the search for optimal solutions. We implemented such a probabilistic algorithm in the Bayesian Optimization Structure Search (BOSS) Python tool for materials optimisation [1]. BOSS builds N-dimensional surrogate models of materials’ energy or property landscapes to infer global optima, allowing us to conduct targeted materials engineering. The models are iteratively refined by sequentially sampling materials data with high information content, which yields compact and informative datasets. We have used this approach in computational density functional theory studies of molecular surface adsorbates [2], thin-film growth [3], solid-solid interfaces [4], and molecular conformers [5]. With experimental colleagues, we applied BOSS to accelerate the development of novel materials with targeted properties and to optimise materials processing [7]. With recent multi-objective and multi-fidelity implementations for active learning, BOSS can combine different information sources to help us discover optimal solutions faster in both academic and industrial settings.

        [1] npj Comput. Mater. 5, 35 (2019)
        [2] Beilstein J. Nanotechnol. 11, 1577-1589 (2020); Adv. Funct. Mater. 31, 2010853 (2021)
        [3] Adv. Sci. 7, 2000992 (2020)
        [4] ACS Appl. Mater. Interfaces 14 (10), 12758-12765 (2022)
        [5] J. Chem. Theory Comput. 17, 1955 (2020)
        [6] MRS Bulletin 47, 29-37 (2022)
        [7] ACS Sustainable Chem. Eng. 10, 9469 (2022)
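The active-learning loop described above (fit a surrogate, choose the next point with an acquisition function, run the expensive evaluation, refit) can be sketched in plain Python. This is not the BOSS API: it is a hypothetical one-dimensional illustration that assumes a zero-mean Gaussian-process surrogate with a squared-exponential kernel and a lower-confidence-bound acquisition, with made-up hyperparameters and function names.

```python
import math

def rbf(a, b, length=0.3, var=1.0):
    """Squared-exponential kernel; length and var are hypothetical choices."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    """Return lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-6):
    """Condition a zero-mean GP surrogate on the data collected so far."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))  # alpha = K^-1 y
    return list(xs), L, alpha

def gp_predict(model, xq):
    """Posterior mean and standard deviation of the surrogate at xq."""
    xs, L, alpha = model
    k_star = [rbf(xq, a) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = forward(L, k_star)
    var = max(rbf(xq, xq) - sum(t * t for t in v), 0.0)  # clip round-off
    return mean, math.sqrt(var)

def bayes_opt(f, grid, init, n_iter=10, kappa=2.0):
    """Minimise an expensive f over a 1D grid by sequential sampling."""
    xs = list(init)
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        candidates = [x for x in grid if x not in xs]
        if not candidates:
            break
        model = gp_fit(xs, ys)  # refit the surrogate on all data so far

        def lcb(x):
            mean, std = gp_predict(model, x)
            # Low mean favours exploitation, high std favours exploration.
            return mean - kappa * std

        x_next = min(candidates, key=lcb)
        xs.append(x_next)
        ys.append(f(x_next))  # the expensive step, e.g. one DFT calculation
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

if __name__ == "__main__":
    energy = lambda x: (x - 0.7) ** 2  # cheap stand-in for an energy landscape
    grid = [i / 50 for i in range(51)]
    print(bayes_opt(energy, grid, init=[0.0, 0.5, 1.0]))
```

The parameter `kappa` sets the balance between exploring uncertain regions and exploiting low predicted values; each iteration adds exactly one evaluation, which is why the resulting dataset stays compact.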

        Speaker: Milica Todorovic
      • 9:25 AM
        C31 20m
      • 9:45 AM
        C32 20m
      • 10:05 AM
        C33 20m
    • 10:30 AM - 11:00 AM
      Coffee Break
    • 11:00 AM - 1:00 PM
      Session
      • 11:00 AM
        Accelerating formulation design by understanding the physical properties of complex molecular ensembles 40m

        Modern chemical technology makes extensive use of formulated products, from flavours and fragrances to surfactants and resins. These products are traditionally created and optimised by trial and error, an inefficient and costly process. This is partly due to the complexity of the task: understanding and accounting for the huge number of (inter)molecular interactions in a design task is currently a major challenge. The Big Chemistry Consortium aims to transform formulation from an art into a science-based technology by establishing the RobotLab, an autonomous, self-driving laboratory combining AI, chemical data, and high-throughput experimentation. Our consortium is spread across several institutions in the Netherlands (Radboud University Nijmegen, TU Eindhoven, AMOLF, Rijksuniversiteit Groningen, and Fontys Hogeschool), each contributing to a pool of experimental methods, data, and expertise. In this talk, I will present a selection of our initial investigations in high-throughput data collection, how we are approaching AI methods such as chemical language models for property prediction, and how we are working towards efficient and secure data sharing within the consortium.

        Speaker: William Robinson
      • 11:40 AM
        C34 20m
      • 12:00 PM
        C35 20m
      • 12:20 PM
        C36 20m
      • 12:40 PM
        C37 20m
    • 1:00 PM - 2:00 PM
      Lunch and Departure