Statistical and machine learning methods for cancer research and clinical practice: A systematic review

Lopez-Perez L.;Georga E.;Conti C.;Vicente V.;Garcia R.;Pecchia L.;Fotiadis D.;Licitra L.;Cabrera M. F.;Arredondo M. T.;Fico G.

2024-01-01

Abstract

Background: Cancer is progressively becoming the most prevalent disease worldwide, accompanied by significantly increasing investment in research to improve its prevention, early detection, diagnosis, prognosis and treatment. Predictive analytics show promising performance when applied to these tasks, and recent reporting guidelines support unbiased data analytics whose outcomes demonstrate a clinical benefit.

Methods: A systematic review was conducted to analyse statistical- and ML-based prediction model studies in cancer research published from 2010 to 2020. The PRISMA and PROBAST methodologies were adopted.

Findings: Statistical analysis (46.4 %) and linear ML-based methods (36.4 %) predominate over non-linear ML-based methods (17.2 %) among the examined studies. Only 11 % of the studies are associated with a low risk of bias (ROB), whereas the majority (69 %) were judged to have an unclear ROB, a consequence of incomplete (non-transparent) reporting. Lastly, 81.6 % of the investigated studies do not report any data quality assessment procedure. A qualitative analysis of studies from 2021 to 2023 shows a shift towards combining data-driven and systems biology computational approaches.

Interpretation: Alignment with systematic procedures for reporting and assessing prediction model studies is a prerequisite for responsible research. These procedures will enable ML-based interventions in the field of cancer research by demonstrating the clinical value of their findings.

Year: 2024

Keywords: Cancer research; Data quality; Knowledge transfer; Machine learning; Statistical analysis

Appears in types: 1.1 Journal article

Files in this product:

File	Size	Format
1-s2.0-S1746809424001253-main.pdf (open access; licence: Creative Commons)	1.82 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/78603
