Enhancing Neoantigen Identification: The Role of Artificial Intelligence and Proteogenomics in Personalized Cancer Therapy


Enhancing Neoantigen Identification: The Role of Artificial Intelligence and Proteogenomics in Personalized Cancer Therapy / Michele Mastromattei, 2025 Jun 03. 37th cycle, Academic Year 2021/2022.


MASTROMATTEI, MICHELE

2025-06-03

Abstract

Cancer neoantigens, proteins unique to tumor cells, are emerging as promising targets for personalized cancer vaccines. Identifying them accurately remains difficult, primarily because of the limitations of computational prediction tools. Proteogenomics, a field that integrates genomics and proteomics, offers a more comprehensive approach to neoantigen discovery. At the same time, artificial intelligence (AI), particularly transformer models, has transformed many fields, including healthcare.

This thesis presents a comprehensive study of the design, implementation, and improvement of AI models for neoantigen classification. We begin with a detailed review of cancer biology and then turn to a technical exploration of contemporary AI methodologies, with emphasis on state-of-the-art models and the datasets needed to train them. To support the training and evaluation of our models, we re-analyzed 75 publicly accessible mass spectrometry datasets, which resulted in the CARMEN dataset. Using this dataset together with custom algorithms, we curated multiple corpora specifically for training machine learning models for neoantigen classification. Evaluated against established benchmarks, these models showed comparable or superior performance. Surprisingly, models trained and tested solely on peptide sequences achieved similar or better results than models that also incorporated HLA information.

In response to the growing complexity and computational cost of AI models, we introduced a novel pruning algorithm named KEN. The algorithm automatically resets non-essential parameters within a trained model, yielding smaller, more efficient subnetworks with negligible performance loss. Applied to our neoantigen classification models, KEN produced smaller and more efficient models. Notably, in these pruned models, integrating HLA sequences improved performance relative to models relying solely on peptide sequences.

In summary, this thesis contributes to cancer immunotherapy through the development and optimization of AI models for neoantigen classification. By combining proteogenomics datasets with new AI methodologies, we have established robust, efficient models capable of accurately identifying neoantigens, laying the groundwork for personalized cancer vaccines that hold promise for improved patient outcomes.


Year of defense: 3 Jun 2025

Keywords: Artificial intelligence; Personalized medicine; Neoantigens; Pruning algorithm
Appears in types: 8.1 Doctoral thesis

Files in this item:

File	Description	Type	Access	License	Size	Format
PhD_Mastromattei_Michele.pdf	PhD Thesis Mastromattei Michele	Doctoral thesis	Open access	Creative Commons	51.78 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/94244
