Cancer neoantigens, proteins unique to tumor cells, are emerging as promising targets for personalized cancer vaccines. However, identifying them accurately remains a significant challenge, primarily because of the limitations of computational prediction tools. Proteogenomics, a field that integrates genomics and proteomics, offers a more comprehensive approach to neoantigen discovery. At the same time, artificial intelligence (AI), particularly advanced transformer models, has transformed numerous fields, including healthcare. This thesis presents a comprehensive study of the design, implementation, and improvement of AI models for neoantigen classification. We begin with a detailed review of cancer biology, then move to a technical exploration of contemporary AI methodologies, with emphasis on state-of-the-art models and the datasets needed to train them. To support the training and evaluation of our AI models, we re-analyzed 75 publicly accessible mass spectrometry datasets, which resulted in the CARMEN dataset. Using this dataset together with custom algorithms, we defined and curated multiple corpora specifically for training our machine learning models for neoantigen classification. These models were evaluated against established benchmarks and showed comparable or superior performance. Surprisingly, models trained and tested solely on peptide sequences performed as well as or better than those that also used HLA information. In response to the growing complexity and computational cost of AI models, we introduced a novel pruning algorithm named KEN. This algorithm automatically resets non-essential parameters within a trained model, yielding smaller, more efficient subnetworks with negligible performance loss. When applied to our neoantigen classification models, KEN produced models that were both smaller and more efficient.
Notably, integrating HLA sequences into these pruned models improved their performance relative to models relying solely on peptide sequences. In summary, this thesis advances cancer immunotherapy through the development and optimization of AI models for neoantigen classification. By leveraging proteogenomics datasets and new AI methodologies, we have established robust, efficient models capable of accurately identifying neoantigens, laying the groundwork for personalized cancer vaccines that hold promise for improved patient outcomes.
Enhancing Neoantigen Identification: The Role of Artificial Intelligence and Proteogenomics in Personalized Cancer Therapy / Michele Mastromattei, 2025 Jun 03. 37th cycle, Academic Year 2021/2022.
Enhancing Neoantigen Identification: The Role of Artificial Intelligence and Proteogenomics in Personalized Cancer Therapy
MASTROMATTEI, MICHELE
2025-06-03
| File | Size | Format |
|---|---|---|
| PhD_Mastromattei_Michele.pdf (open access). Description: PhD Thesis Mastromattei Michele. Type: Doctoral thesis. License: Creative Commons | 51.78 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


