Jul 2 – 3, 2025
Ruhr-Universität Bochum
Europe/Berlin timezone

Text Mining in Materials Science

Jul 3, 2025, 11:00 AM
30m
UFO (Ruhr-Universität Bochum)

Querenburger Höhe 283, 44801 Bochum

Speaker

Markus Stricker

Description

Lei Zhang, Doaa Mohamed, Sepideh Baghaee Ravari, Markus Stricker

Beyond the direct raw-data sources of experiments and simulations, scientific publications are an underused resource at scale. The content of scientific publications can be converted into high-dimensional vector representations to gain access to the underlying correlations. Raw text can be converted to word embeddings (word2vec), and combined vision-language models can be used to extract structured datasets from scientific publications. These high-dimensional representations can then be used for data mining. I will demonstrate the potential of text mining using two examples: (1) how correlations in word embedding space can accelerate active learning loops in materials discovery, and (2) workflows for converting unstructured scientific publications into structured datasets. Robust and standardized pipelines for these methods are still a work in progress, but these results, among others, already demonstrate useful applications.
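
As a rough illustration of example (1), the following minimal sketch ranks candidate materials by their cosine similarity to a target property word in embedding space. It is not the authors' method: the three-dimensional vectors, the material names, and the helpers `cosine_similarity` and `rank_candidates` are hypothetical and invented for illustration. In practice the vectors would presumably come from a word2vec model trained on a publication corpus, and such a ranking might be used to decide which candidates an active learning loop evaluates first.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(embeddings, target, candidates):
    """Sort candidates by similarity to the target word, highest first."""
    scores = [
        (name, cosine_similarity(embeddings[name], embeddings[target]))
        for name in candidates
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for illustration only (assumption: real embeddings
# would have hundreds of dimensions and be learned from text).
toy_embeddings = {
    "ductility": [1.0, 0.0, 0.0],
    "alloy_A": [0.9, 0.1, 0.0],
    "alloy_B": [0.0, 1.0, 0.0],
    "alloy_C": [0.5, 0.5, 0.0],
}

ranking = rank_candidates(
    toy_embeddings, "ductility", ["alloy_A", "alloy_B", "alloy_C"]
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the candidate whose vector points most nearly in the direction of the target word is listed first.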
