Background: Biomedical natural language processing (NLP) increasingly relies on large language models and extensive datasets, presenting significant computational challenges. Methods: We propose Blue5, a multi-task model based on SciFive that incorporates instance selection (IS) to enable efficient, multi-task learning (MTL) on biomedical data. We adapt the E2SC-IS framework for the biomedical domain, integrating a calibrated SVM classifier to reduce computational costs. Results: Our approach achieves an average data reduction of 26.6% across the several tasks of the BLUE (Biomedical Language Understanding Evaluation) Benchmark, while maintaining performance comparable with state-of-the-art models. The multi-task SVM configuration emerges as the most effective, demonstrating the power of combining IS with MTL for biomedical NLP. As a result of the unified framework, Blue5 effectively selects the most informative instances across tasks, ensuring model generalization while efficiently handling multiple NLP tasks. Conclusion: Our work offers a practical solution to address growing computational demands, enabling more scalable and accessible applications of advanced NLP techniques in biomedical research and healthcare.

Efficient multi-task learning with instance selection for biomedical NLP

Bacco L.
;
Pecchia L.;Merone M.;
2025-01-01

Abstract

Background: Biomedical natural language processing (NLP) increasingly relies on large language models and extensive datasets, presenting significant computational challenges. Methods: We propose Blue5, a multi-task model based on SciFive that incorporates instance selection (IS) to enable efficient, multi-task learning (MTL) on biomedical data. We adapt the E2SC-IS framework for the biomedical domain, integrating a calibrated SVM classifier to reduce computational costs. Results: Our approach achieves an average data reduction of 26.6% across the several tasks of the BLUE (Biomedical Language Understanding Evaluation) Benchmark, while maintaining performance comparable with state-of-the-art models. The multi-task SVM configuration emerges as the most effective, demonstrating the power of combining IS with MTL for biomedical NLP. As a result of the unified framework, Blue5 effectively selects the most informative instances across tasks, ensuring model generalization while efficiently handling multiple NLP tasks. Conclusion: Our work offers a practical solution to address growing computational demands, enabling more scalable and accessible applications of advanced NLP techniques in biomedical research and healthcare.
2025
Biomedical NLP; BLUE benchmark; Computational efficiency; Instance selection; Multi-task learning
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0010482525004019-main-2.pdf

accesso aperto

Tipologia: Versione Editoriale (PDF)
Licenza: Creative commons
Dimensione 1.94 MB
Formato Adobe PDF
1.94 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12610/87924
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact