Speakers
Description
Scientific progress increasingly depends on the ability to transform unstructured information into accessible, structured data. However, the rapid growth of scientific literature has made manual data extraction and curation a major bottleneck across disciplines. Recent advances in artificial intelligence offer new opportunities to automate much of this process and to extract knowledge from far more publications than manual curation can cover.
This session will explore emerging AI-assisted approaches for scientific data extraction, focusing on multimodal workflows that convert diverse information sources into structured, machine-readable datasets. We will present recent developments in NOMAD that leverage large language models and domain-specific validation to extract scientific information from research publications, enabling the creation of continuously updated knowledge resources. In addition, we will showcase new capabilities that extend data extraction beyond traditional documents, including the use of speech and audio inputs as alternative pathways for capturing and structuring scientific knowledge.
Through examples from materials science and photovoltaics, we will discuss the opportunities, challenges, and limitations of AI-driven extraction systems, including accuracy, validation, reproducibility, and integration with existing scientific infrastructures. The session aims to give researchers an overview of how AI can accelerate the transformation of scientific content into reusable data and support data-driven discovery as scientific output continues to expand.