Variability in axillary lymph node delineation for breast cancer radiotherapy in presence of guidelines on a multi-institutional platform
Ippolito E.; Silipigni S.
2017-01-01
Abstract
Aim: To quantify the variability between radiation oncologists (ROs) when outlining axillary nodes in breast cancer. Material and methods: For each participating center, three ROs with different levels of expertise, i.e., junior (J), senior (S) and expert (E), contoured the axillary nodal levels (L1, L2, L3 and L4) on the CT images of three different patients (P) of increasing anatomical complexity (from P1 to P2 to P3), according to contouring guidelines. Consensus contours were generated using the simultaneous truth and performance level estimation (STAPLE) method. Results: Fifteen centers and 42 ROs participated. Overall, the median Dice similarity coefficient was 0.66. Statistically significant differences were observed according to the level of expertise (better agreement for J and E, worse for S), the axillary level (better agreement for L1 and L4, worse for L3), and the patient (better agreement for P1, worse for P3). Statistically significant differences in contouring were found in 18% of the inter-center comparisons. Fewer than half of the centers showed good agreement among their own ROs. Conclusions: The overall intra-institute and inter-institute agreement was moderate. The central lymph-node levels were the most critical, and variability increased with the complexity of the patient’s anatomy. These findings might affect the interpretation of results from multicenter and even mono-institute studies.
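The Dice similarity coefficient reported in the abstract measures the overlap between two contours as twice the shared volume divided by the sum of the two volumes, so 1 means identical contours and 0 means no overlap. The following is a minimal sketch of that calculation, not the study's actual pipeline: it assumes each contour has already been rasterized to a flattened binary voxel mask, and the function name `dice_coefficient` and the example masks are hypothetical. The abstract does not state whether each observer was compared with the STAPLE consensus or with other observers; the example compares one observer with a consensus mask purely for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)  # voxels inside contour A
    size_b = sum(1 for v in mask_b if v)  # voxels inside contour B
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)  # voxels in both
    if size_a + size_b == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical flattened masks (1 = voxel inside the contour).
observer = [0, 1, 1, 1, 0, 0, 1, 0]
consensus = [0, 0, 1, 1, 1, 0, 1, 0]

score = dice_coefficient(observer, consensus)
print(round(score, 2))  # 3 shared voxels, 4 + 4 in total: 2 * 3 / 8 = 0.75
```

On this scale, the study's median value of 0.66 corresponds to contours that share roughly two thirds of their combined volume.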