
From AI for Personalized Medicine to Ethical Model Transparency: A Journey Through Explainable AI, Model Editing, and Bias Mitigation / Davide Venditti, 2026. 38th cycle

From AI for Personalized Medicine to Ethical Model Transparency: A Journey Through Explainable AI, Model Editing, and Bias Mitigation

VENDITTI, DAVIDE
2026-01-01

Abstract

This thesis investigates the theory of trustworthiness in deep learning, focusing on the critical challenges of bias, privacy, and control in Large Language Models (LLMs). Through a diverse body of research, it offers insights and advances toward building more reliable AI systems. My focus is on empirically quantifying bias in a medical context, establishing systematic methodologies for its measurement, and developing novel techniques for model editing and memory control. These findings contribute to the broader field of AI ethics and safety, advancing the development of effective and reliable machine learning systems. After the introduction provided in Chapter 1, Chapter 2 analyzes algorithmic bias using a Treatment Prediction System (TPS) as a case study, quantifying its impact and establishing the need for standardized tools such as the Prompt Association Test (P-AT) and its Italian adaptation, Ita-P-AT. Chapter 3 introduces novel model editing techniques, including Private Association Editing (PAE) and Private Memorization Editing (PME) for enhancing data privacy, as well as the MeMo framework for improving a model's associative memory mechanisms. Chapter 4 applies these principles in a legal context through the CINi project, showcasing a domain-specific model for legal summarization. The final chapter summarizes the key findings and contributions and outlines possible future directions. Collectively, this research contributes to understanding the challenges of AI deployment and enables the development of more reliable and accountable models.
2026
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/93744