Sixth FAIRmat Users Meeting

Name: Sixth FAIRmat Users Meeting
Start: 2025-07-02T09:00:00+02:00
End: 2025-07-03T16:00:00+02:00
Location: Ruhr-Universität Bochum

Europe/Berlin timezone

Contact

fairmat-events@physik.hu-berlin.de

Text Mining in Materials Science

Jul 3, 2025, 11:00 AM

30m

UFO (Ruhr-Universität Bochum)

Querenburger Höhe 283, 44801 Bochum

Language Models for Materials Science

Markus Stricker

Lei Zhang, Doaa Mohamed, Sepideh Baghaee Ravari, Markus Stricker

Beyond the direct raw data sources of experiments and simulations, scientific publications are an underused resource at scale. The content of scientific publications can be converted into high-dimensional vector representations to gain access to the underlying correlations. Raw text can be converted to word embeddings (word2vec), and combined vision-language models can be used to extract structured datasets from scientific publications. These high-dimensional representations can then be used for data mining. I will demonstrate the potential of text mining using two examples: (1) how correlations in word embedding space can accelerate active learning loops in materials discovery, and (2) workflows for converting unstructured scientific publications to structured datasets. Robust and standardized pipelines for these methods are still a work in progress, but these results, among others, already demonstrate useful applications.
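
As a rough illustration of what "correlations in word embedding space" could mean in practice, the sketch below ranks candidate terms by cosine similarity to a target term. This is not the speakers' pipeline: the names `toy_embeddings`, `rank_candidates`, `target_property`, and `candidate_A` to `candidate_C` are hypothetical, and the three-dimensional vectors are invented for illustration. In a real workflow the vectors would presumably come from a word2vec model trained on a publication corpus and would have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(embeddings, target, candidates):
    """Sort candidate terms by similarity to the target term, highest first."""
    target_vec = embeddings[target]
    scored = [
        (name, cosine_similarity(embeddings[name], target_vec))
        for name in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy vectors (assumption): stand-ins for learned word embeddings.
toy_embeddings = {
    "target_property": [0.9, 0.1, 0.2],
    "candidate_A": [0.8, 0.2, 0.1],
    "candidate_B": [0.7, 0.3, 0.3],
    "candidate_C": [0.1, 0.9, 0.4],
}

ranking = rank_candidates(
    toy_embeddings,
    "target_property",
    ["candidate_C", "candidate_B", "candidate_A"],
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A ranking of this kind could serve as a prior for deciding which candidates an active learning loop evaluates first. How the similarity scores are actually combined with experimental or simulation feedback is not specified in the abstract.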
