Efficient multi-task learning with instance selection for biomedical NLP
Bacco L.; Pecchia L.; Merone M.
2025-01-01
Abstract
Background: Biomedical natural language processing (NLP) increasingly relies on large language models and extensive datasets, which poses significant computational challenges. Methods: We propose Blue5, a multi-task model based on SciFive that incorporates instance selection (IS) to enable efficient multi-task learning (MTL) on biomedical data. We adapt the E2SC-IS framework to the biomedical domain, integrating a calibrated SVM classifier to reduce computational costs. Results: Our approach achieves an average data reduction of 26.6% across the tasks of the BLUE (Biomedical Language Understanding Evaluation) benchmark while maintaining performance comparable to state-of-the-art models. The multi-task SVM configuration proves the most effective, showing the benefit of combining IS with MTL for biomedical NLP. Through this unified framework, Blue5 selects the most informative instances across tasks, preserving model generalization while efficiently handling multiple NLP tasks. Conclusion: Our work offers a practical answer to growing computational demands, enabling more scalable and accessible applications of advanced NLP techniques in biomedical research and healthcare.
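The confidence-based selection idea behind the Methods can be illustrated with a short sketch. It assumes that a calibrated classifier (for instance, an SVM with Platt scaling, such as scikit-learn's `CalibratedClassifierCV` wrapped around `LinearSVC`) has already produced, for each training instance, a probability for its gold label, and it treats the most confidently classified instances as redundant. The function names `select_instances` and `reduction_percentage` and the fixed `reduction_rate` parameter are hypothetical. E2SC-IS as published may remove instances stochastically and estimate the reduction rate itself, so this is a deliberate simplification, not the authors' implementation.

```python
from typing import List, Sequence


def select_instances(true_label_probs: Sequence[float], reduction_rate: float) -> List[int]:
    """Return the indices of the training instances to KEEP (hypothetical helper).

    true_label_probs[i] is assumed to be a calibrated classifier's probability
    for instance i's gold label. The `reduction_rate` fraction with the highest
    probability is treated as redundant ("easy") and dropped.
    """
    if not 0.0 <= reduction_rate < 1.0:
        raise ValueError("reduction_rate must be in [0, 1)")
    n = len(true_label_probs)
    n_remove = int(round(n * reduction_rate))
    # Order indices from most to least confident; the first n_remove are removed.
    by_confidence = sorted(range(n), key=lambda i: true_label_probs[i], reverse=True)
    removed = set(by_confidence[:n_remove])
    # Keep the remaining instances in their original order.
    return [i for i in range(n) if i not in removed]


def reduction_percentage(n_original: int, n_kept: int) -> float:
    """Share of the training set that was removed, in percent."""
    return 100.0 * (n_original - n_kept) / n_original
```

For example, with the toy probabilities `[0.99, 0.40, 0.95, 0.60, 0.97, 0.55, 0.30, 0.85]` and `reduction_rate=0.25`, the two most confident instances (indices 0 and 4) are dropped, the call returns `[1, 2, 3, 5, 6, 7]`, and `reduction_percentage(8, 6)` gives `25.0`. These numbers are illustrative only and unrelated to the 26.6% average reported above.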
File: 1-s2.0-S0010482525004019-main-2.pdf (open access; type: publisher's version (PDF); license: Creative Commons; size: 1.94 MB; format: Adobe PDF)