Knowledge-Grounded Machine Learning for Biomedical Domains: Extraction, Injection and Evaluation

IRWIN, CHRISTOPHER

2026-05-25

Abstract

The adoption of machine learning in biomedical and clinical domains is frequently constrained by structural challenges: data scarcity, high dimensionality, and a misalignment between statistical optimization and established medical practice. While data-driven approaches can learn complex patterns, they may lack the transparency and reasoning capabilities required in high-stakes healthcare environments. This thesis proposes a knowledge-grounded framework that bridges this gap by integrating structured domain knowledge into the machine learning (ML) pipeline along three aspects: extraction, injection, and evaluation. First, to address knowledge injection in sequential data, this work introduces a novel Structural Positional Encoding (SPE) method for Transformer-based process monitoring. Applied to stroke management guidelines, this approach embeds clinical ontologies directly into the model architecture, yielding better performance and closer adherence to guidelines than standard baselines. Second, to address injection in high-dimensional data, the thesis presents a Graph Representation Learning (GRL) framework for multi-omics microbiome data. By encoding taxonomic knowledge into a graph structure, this method handles feature sparsity and produces an encoder for microbiome data that can then be optimized for downstream tasks. Finally, to address the extraction and evaluation gap, the thesis presents a framework that leverages Large Language Models (LLMs) to extract structured clinical reasoning trees from medical literature. This yields a benchmark for assessing whether LLM reasoning follows clinical knowledge, moving evaluation beyond simple accuracy metrics. Collectively, these contributions demonstrate that explicit knowledge integration acts as a powerful inductive bias, producing systems that are not only statistically robust but also aligned with the logical structures of medical reasoning.

	Year of defense

				25 May 2026

	Keywords

				Graph Neural Networks
				Machine Learning
				Deep Learning
				Multiomics
				Data mining
				Large Language Models
				Microbiome
				Clinical AI

	Citation

				Knowledge-Grounded Machine Learning for Biomedical Domains: Extraction, Injection and Evaluation / Christopher Irwin - Torino, 2026 May 25. 38th cycle, Academic Year 2024/2025.

	Appears in types:

				8.1 Doctoral thesis

Files in this item:

File	Size	Format
PhD_Irwin_Christopher.pdf (open access; type: doctoral thesis; license: Creative Commons)	12.92 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/95179
