
From AI for Personalized Medicine to Ethical Model Transparency: A Journey Through Explainable AI, Model Editing, and Bias Mitigation / Davide Venditti, 2026. 38th cycle

From AI for Personalized Medicine to Ethical Model Transparency: A Journey Through Explainable AI, Model Editing, and Bias Mitigation

VENDITTI, DAVIDE
2026-01-01

Abstract

This thesis investigates the theory of trustworthiness in deep learning, focusing on the critical challenges of bias, privacy, and control in Large Language Models (LLMs). Through a diverse body of research, it offers insights and advances toward building more reliable AI systems. My focus is on empirically quantifying bias in a medical context, establishing systematic methodologies for its measurement, and developing novel techniques for model editing and memory control. These findings contribute to the broader field of AI ethics and safety, advancing the development of effective and reliable machine learning systems. After the introduction provided in Chapter 1, Chapter 2 analyzes algorithmic bias using a Treatment Prediction System (TPS) as a case study, quantifying its impact and establishing the need for standardized tools such as the Prompt Association Test (P-AT) and its Italian adaptation, Ita-P-AT. Chapter 3 introduces novel model editing techniques, including Private Association Editing (PAE) and Private Memorization Editing (PME) for enhancing data privacy, as well as the MeMo framework for improving a model's associative memory mechanisms. Chapter 4 applies these principles in a legal context through the CINi project, showcasing a domain-specific model for legal summarization. The final chapter summarizes the key findings and contributions and outlines possible future directions. Collectively, this research contributes to understanding the challenges of AI deployment and enables the development of more reliable and accountable models.
2026
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/93744