Tasks and Evaluation 


Tasks

There are two tasks in this data challenge. Participants may enter either task or both; participation in both is not required.

  • Task 1: Segmentation of tumor volumes (GTVp and GTVn) on pre-RT MRI. See more details on expected model inputs/outputs on the Dataset page.
  • Task 2: Segmentation of tumor volumes (GTVp and GTVn) on mid-RT MRI. See more details on expected model inputs/outputs on the Dataset page.


Evaluation Metric 

Both tasks will be evaluated in the same general manner using the aggregated Dice Similarity Coefficient (DSCagg). DSCagg was employed by Andrearczyk et al. for the segmentation task of the 2022 edition of the HECKTOR Challenge (doi: 10.1007/978-3-031-27420-6_1).

Specifically, the DSCagg metric is defined as:

$$\mathrm{DSC}_{agg} = \frac{2 \sum_i |A_i \cap B_i|}{\sum_i \left( |A_i| + |B_i| \right)}$$

where A_i and B_i are the ground truth and predicted segmentations for image i, and i spans the entire test set.

The 2022 edition of the HECKTOR Challenge had segmentation outputs similar to those of our proposed challenge (GTVp and GTVn for head and neck cancer patients), so we deem this an appropriate metric. Because GTVp and GTVn will not both be present in every case, DSCagg is well suited to this task. The conventional volumetric DSC, computed per case, can be disproportionately affected by a single false negative result (yielding a DSC of 0 for that case). DSCagg pools all cases before dividing, so one such case has only a limited effect on the score.
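The contrast between the two metrics can be illustrated with a minimal Python sketch. The voxel counts below are made up for illustration, the function names are hypothetical, and returning 1.0 for a case where both volumes are empty is an assumed convention, not something the challenge specifies. Both functions use the standard Dice form (twice the intersection over the sum of the two volumes).

```python
def dice(intersection, gt_volume, pred_volume):
    """Conventional per-case DSC from voxel counts."""
    denom = gt_volume + pred_volume
    if denom == 0:
        return 1.0  # assumed convention when both volumes are empty
    return 2.0 * intersection / denom


def dice_aggregated(cases):
    """Aggregated DSC: sum numerators and denominators over all cases first."""
    total_inter = sum(inter for inter, _, _ in cases)
    total_denom = sum(gt + pred for _, gt, pred in cases)
    return 2.0 * total_inter / total_denom if total_denom else float("nan")


# Hypothetical cases: (intersection, ground truth volume, predicted volume)
cases = [
    (900, 1000, 1000),  # good overlap
    (800, 1000, 900),   # good overlap
    (0, 40, 0),         # small structure missed entirely (false negative)
]

per_case = [dice(*c) for c in cases]
mean_dsc = sum(per_case) / len(per_case)
agg_dsc = dice_aggregated(cases)

print(f"per-case DSC: {[round(d, 3) for d in per_case]}")
print(f"mean per-case DSC: {mean_dsc:.3f}")
print(f"aggregated DSC:    {agg_dsc:.3f}")
```

In this toy example the missed 40-voxel structure pulls the mean per-case DSC down by about a third, while the aggregated value changes only slightly because the missed voxels are small relative to the pooled volume.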

For both GTVp and GTVn, we will accumulate, across all images, the intersections between the ground truth GTVs and the respective predicted volumes, together with the sums of the ground truth and predicted volumes. Note that both quantities can be zero in an image for GTVp or GTVn, as some cases may not contain GTVn or GTVp. Ultimately, we will divide twice the aggregated intersection by the aggregated volume sum, separately for GTVp and GTVn, and will compute the average of these two aggregated indices. In other words, DSCagg will be computed separately for GTVp and GTVn on the test set, and the average of the two will be used for the final ranking. This choice was made to give equal importance to the two GTV types, since both primary tumors and metastatic lymph nodes serve as target structures for RT and should be given the full treatment dose.

The metric will be calculated separately for Task 1 (pre-RT segmentations) and Task 2 (mid-RT segmentations). Within each task, the GTVp and GTVn values are averaged for the final ranking, as in HECKTOR 2022.
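The procedure described above can be sketched in Python with NumPy. This is an illustrative sketch, not the official evaluation code: the label values (1 for GTVp, 2 for GTVn), the function names, and the tiny 2D arrays standing in for 3D label maps are all assumptions made for the example.

```python
import numpy as np

GTVP_LABEL = 1  # assumed label value for GTVp
GTVN_LABEL = 2  # assumed label value for GTVn


def dsc_agg(gt_maps, pred_maps, label):
    """Aggregated DSC for one structure over a list of label maps."""
    inter = 0
    denom = 0
    for gt, pred in zip(gt_maps, pred_maps):
        gt_mask = gt == label
        pred_mask = pred == label
        # Accumulate over cases; a case without this structure adds zeros.
        inter += int(np.logical_and(gt_mask, pred_mask).sum())
        denom += int(gt_mask.sum()) + int(pred_mask.sum())
    return 2.0 * inter / denom if denom > 0 else float("nan")


def ranking_score(gt_maps, pred_maps):
    """Average of the GTVp and GTVn aggregated DSC values."""
    return 0.5 * (
        dsc_agg(gt_maps, pred_maps, GTVP_LABEL)
        + dsc_agg(gt_maps, pred_maps, GTVN_LABEL)
    )


# Toy data: the second case has no GTVn in the ground truth.
gt_maps = [
    np.array([[1, 1, 0], [0, 2, 2]]),
    np.array([[1, 1, 1], [0, 0, 0]]),
]
pred_maps = [
    np.array([[1, 0, 0], [0, 2, 2]]),
    np.array([[1, 1, 1], [0, 0, 2]]),
]

print("GTVp DSCagg:", dsc_agg(gt_maps, pred_maps, GTVP_LABEL))
print("GTVn DSCagg:", dsc_agg(gt_maps, pred_maps, GTVN_LABEL))
print("ranking score:", ranking_score(gt_maps, pred_maps))
```

Under these assumptions the same function would be run once on the Task 1 predictions and once on the Task 2 predictions, giving one score per task.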


Docker Submission

Test data will not be made public, so participants will be required to submit Docker containers of their solutions. More details will be provided soon.