Resilient Multimodal Learning with Incomplete Information in Biomedical Applications

Caruso, Camillo Maria

The deployment of Artificial Intelligence (AI) in clinical settings is increasingly viewed as a transformative step toward precision medicine and improved patient outcomes. However, the clinical reality of healthcare data is far from ideal. One of the most pervasive challenges is missing and incomplete data, which is a systemic characteristic of clinical data rather than a sporadic error. Conventional predictive models are often inadequately equipped to handle such irregularities, relying heavily on preprocessing and imputation strategies that introduce biases and limit generalizability. This dissertation proposes a paradigm shift in the development of machine learning models, advocating resilient learning systems: architectures inherently designed to operate under partial observability while maintaining high performance and stability. Rather than viewing missingness as a problem to be corrected, the proposed approach treats it as a fundamental property of clinical data that must be embraced and strategically incorporated into the learning process. A key focus is the application of these principles to multimodal learning, where information is drawn from heterogeneous sources, including clinical records, laboratory tests, and radiological images. In such contexts, the absence of entire modalities can significantly hinder model performance if not explicitly addressed. This thesis explores how to design multimodal systems that are resilient to missing and incomplete data. To this end, the dissertation first presents NAIM (Not Another Imputation Method), a transformer-based model tailored for tabular data characterized by structural missingness. NAIM combines feature-specific embeddings, a masked self-attention mechanism, and a novel regularization strategy that randomly masks inputs during training. This framework enables the model to learn directly from incomplete feature sets, avoiding the biases introduced by imputation.
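The three NAIM ingredients named above can be illustrated with a minimal NumPy sketch. It assumes a simple form of feature-specific embedding (each feature j owns a weight vector W[j] and bias b[j], so its token is x_j * W[j] + b[j]), a single attention head without learned query/key/value projections, and a training-time regularizer that hides each value with probability p. All function names and these design details are hypothetical simplifications, not NAIM's actual implementation.

```python
import numpy as np


def embed_features(x, W, b):
    """Feature-specific embeddings (assumed form): token_j = x_j * W[j] + b[j].

    x    : (n_features,) raw values, np.nan marks a missing entry.
    W, b : (n_features, d) per-feature parameters (hypothetical; random here).
    Missing values are zero-filled only to keep the arithmetic finite;
    the attention mask and the pooling step keep them out of the pooled result.
    """
    observed = ~np.isnan(x)
    x_filled = np.where(observed, x, 0.0)
    tokens = x_filled[:, None] * W + b
    return tokens, observed


def masked_self_attention(tokens, observed):
    """Single-head self-attention in which missing features are never used as keys.

    Simplification: queries, keys and values are the tokens themselves.
    """
    n, d = tokens.shape
    if not observed.any():
        # Nothing observed: return zeros instead of NaNs from an all -inf softmax.
        return np.zeros_like(tokens), np.zeros((n, n))
    scores = tokens @ tokens.T / np.sqrt(d)
    scores = np.where(observed[None, :], scores, -np.inf)  # mask missing keys
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)                               # exp(-inf) == 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ tokens, weights


def pooled_representation(x, W, b):
    """Average the attention outputs of the observed features only."""
    tokens, observed = embed_features(x, W, b)
    out, _ = masked_self_attention(tokens, observed)
    if not observed.any():
        return np.zeros(W.shape[1])
    return out[observed].mean(axis=0)


def random_input_mask(x, p, rng):
    """Training-time regularization sketch: hide each value with probability p."""
    drop = rng.random(x.shape) < p
    return np.where(drop, np.nan, x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    x = np.array([0.5, np.nan, -1.2, 2.0])           # feature 1 is missing
    x_train = random_input_mask(x, p=0.25, rng=rng)  # extra masking during training
    print(pooled_representation(x_train, W, b))
```

Because missing features receive zero attention weight and are excluded from pooling, the representation of a sample with a missing feature equals the one obtained by deleting that feature altogether, so no imputed placeholder ever enters the computation.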
Indeed, unlike traditional methods that depend on complex imputation pipelines, NAIM bypasses the reconstruction step entirely, training directly on what is observed. The model is benchmarked against a wide spectrum of state-of-the-art machine learning and deep learning models across five public classification datasets, as well as on a real-world clinical application involving the prediction of overall survival in cancer patients. Across both experimental settings, NAIM demonstrates strong and consistent performance, particularly under high levels of missingness. These results highlight the model’s capacity to generalize to realistic clinical scenarios and confirm its practical relevance for resilient AI systems in healthcare. Building on these insights, the thesis then explores the transition to multimodal modeling by investigating whether combining radiological images (CT scans) and structured clinical data can enhance predictive performance in a real-world setting. A late fusion ensemble approach is employed, where each modality is processed independently and combined at the decision level. This setup reflects a common and practical design in clinical AI systems and ensures a certain degree of modularity. However, by treating modalities independently until the final stage, this approach cannot fully leverage the interdependencies and shared patterns across data types. This observation, supported by the findings of our systematic review on multimodal fusion strategies, underscores the need for more integrated fusion mechanisms capable of modeling cross-modal interactions throughout the learning process. To overcome this limitation, the dissertation introduces MARIA (Multimodal Attention Resilient to Incomplete datA), a novel transformer-based architecture designed for resilient multimodal data fusion. MARIA integrates modality-specific encoders with a shared attention-based fusion module, using a generalized masking mechanism to handle missing modalities. 
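To contrast decision-level and intermediate fusion, the NumPy sketch below encodes each modality with its own hypothetical linear encoder, represents each modality by a single token, and fuses the tokens with self-attention in which missing modalities are masked out as keys and excluded from pooling. A late-fusion baseline that averages the available unimodal class probabilities is included for comparison. The function names, the one-token-per-modality simplification and the encoder form are assumptions for illustration; this is not the MARIA architecture itself.

```python
import numpy as np


def encode(inputs, encoders):
    """Modality-specific encoders (hypothetical: one linear layer + tanh each).

    inputs   : dict modality -> 1-D feature vector, or None when the modality is missing.
    encoders : dict modality -> (W, b) with W of shape (d, in_dim) and b of shape (d,).
    Returns one d-dim token per modality and a boolean availability mask.
    """
    names = list(encoders)
    d = encoders[names[0]][0].shape[0]
    tokens = np.zeros((len(names), d))
    available = np.array([inputs.get(m) is not None for m in names])
    for i, m in enumerate(names):
        if available[i]:
            W, b = encoders[m]
            tokens[i] = np.tanh(W @ inputs[m] + b)
    return tokens, available


def fuse(tokens, available):
    """Intermediate fusion: masked self-attention across modality tokens,
    then mean pooling over the available modalities only."""
    if not available.any():
        return np.zeros(tokens.shape[1])
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores = np.where(available[None, :], scores, -np.inf)  # missing modalities are never keys
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ tokens)[available].mean(axis=0)


def late_fusion(probs):
    """Late-fusion baseline: each unimodal model predicts on its own and the
    available class-probability vectors are averaged at the decision level."""
    available = [p for p in probs.values() if p is not None]
    return np.mean(available, axis=0) if available else None


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    encoders = {
        "clinical": (rng.normal(size=(8, 5)), rng.normal(size=8)),
        "ct": (rng.normal(size=(8, 16)), rng.normal(size=8)),
    }
    sample = {"clinical": rng.normal(size=5), "ct": None}  # CT scan missing
    print(fuse(*encode(sample, encoders)))
```

In `late_fusion`, modalities meet only in the final average; in `fuse`, each available modality token is recombined using attention weights computed against the other available tokens before pooling, which is the kind of cross-modal interaction intermediate fusion is meant to capture.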
MARIA's intermediate fusion design allows the model to dynamically learn interactions between the available inputs while remaining resilient to missing components. Empirical evaluations across a range of simulated missing-data scenarios confirm MARIA's ability to deliver accurate and stable predictions, even under severe data fragmentation. Altogether, this work outlines a unified methodological framework for designing AI systems that are resilient to data incompleteness in both unimodal and multimodal settings. Rather than relying on preprocessing fixes, the framework embeds resilience directly into model architectures and training objectives. This design philosophy aligns more closely with the operational constraints of modern clinical environments, where decisions often need to be made on partial information and data cannot always be re-collected or completed. By embracing the imperfection and variability inherent in healthcare data, this dissertation lays the groundwork for practical, adaptable, and resilient AI models capable of supporting clinicians even in the most uncertain and data-sparse scenarios.
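Simulated missing-data scenarios of the kind mentioned above are commonly produced by dropping inputs at controlled rates and re-evaluating the same model at each rate. The sketch below assumes missing-completely-at-random (MCAR) removal of whole modalities, applied independently per sample and per modality; the rates, function names and sample format are hypothetical and do not reproduce the thesis protocol.

```python
import numpy as np


def simulate_missing_modalities(n_samples, modalities, rate, rng):
    """Boolean availability mask of shape (n_samples, n_modalities).

    Each modality of each sample is dropped independently with probability
    `rate` (MCAR assumption, chosen for illustration).
    """
    return rng.random((n_samples, len(modalities))) >= rate


def apply_mask(samples, modalities, mask):
    """Return copies of `samples` in which dropped modalities are replaced by None."""
    return [
        {m: (sample[m] if keep else None) for m, keep in zip(modalities, row)}
        for sample, row in zip(samples, mask)
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    modalities = ["clinical", "ct"]
    samples = [{"clinical": rng.normal(size=5), "ct": rng.normal(size=16)} for _ in range(1000)]
    for rate in (0.0, 0.25, 0.5, 0.75):
        mask = simulate_missing_modalities(len(samples), modalities, rate, rng)
        masked = apply_mask(samples, modalities, mask)
        # A trained model would be evaluated on `masked` at this point.
        print(f"rate={rate:.2f}  observed fraction={mask.mean():.3f}")
```

Under this scheme a sample can lose every modality, so the model under test needs a defined behavior for that case (for example, a neutral output).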

Resilient Multimodal Learning with Incomplete Information in Biomedical Applications / Camillo Maria Caruso, 2025 Dec 01. 37th cycle.