Statistical and machine learning methods for cancer research and clinical practice: A systematic review
Pecchia L.
2024-01-01
Abstract
Background: Cancer is progressively becoming the most prevalent disease worldwide, accompanied by significantly increasing investments in research to improve its prevention, early detection, diagnosis, prognosis and treatment. Predictive analytics are showing promising performance when applied to these tasks, with recent reporting guidelines supporting unbiased data analytics whose outcomes demonstrate a clinical benefit.

Methods: A systematic review was conducted to analyse statistical- and machine learning (ML)-based prediction model studies in cancer research from 2010 to 2020. The PRISMA and PROBAST methodologies were adopted.

Findings: Statistical analysis (46.4 %) and linear ML-based methods (36.4 %) predominate over non-linear ML-based methods (17.2 %) among the examined studies. Only 11 % of the studies are associated with a low risk of bias (ROB), whereas the majority (69 %) have been judged as having an unclear ROB, a consequence of incomplete (non-transparent) reporting. Lastly, 81.6 % of the investigated studies do not report any data quality assessment procedure. A qualitative analysis of the studies from 2021 to 2023 shows a shift towards combining data-driven and systems biology computational approaches.

Interpretation: Alignment with systematic procedures for reporting and assessing prediction model studies is a prerequisite for responsible research. These procedures will enable ML-based interventions in the field of cancer research by demonstrating the clinical value of their findings.