KDD CUP 2008 - Predicting Cancer from Mammography Data

Claudia Perlich
IBM

The KDD CUP 2008 was organized by Siemens Medical Solutions (http://www.kddcup2008.com/), which provided mammography-based data for around 1700 patients. Siemens used proprietary software to extract candidate regions from the original digital image data and to characterize each region in terms of 117 normalized numeric features with unknown interpretation. Task 1 was the identification of malignant candidate regions in mammography pictures, with a ranking-based evaluation measure similar to ROC. Task 2 required submitting the longest possible list of healthy patients; any submission with even one false negative was disqualified. Our winning submissions to both tasks exploited (a) the properties of the evaluation metrics to improve the model scores from a linear SVM and (b) some form of data leakage that resulted in predictive information in the patient identifiers.
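To make the Task 2 rule concrete, the following is a minimal sketch of how such a submission could be assembled and checked. It is an illustration only, not the method used in the winning entry: the function names, the patient identifiers, and the choice to score a patient by the maximum of its candidate-region scores are all assumptions introduced here.

```python
from typing import Dict, List, Set


def patient_scores(candidate_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse candidate-region scores to one score per patient.

    Assumption: a patient is as suspicious as its most suspicious region,
    so the maximum is used. Other aggregations are possible.
    """
    return {pid: max(scores) for pid, scores in candidate_scores.items()}


def longest_safe_list(scores: Dict[str, float], malignant: Set[str]) -> List[str]:
    """Return the longest prefix of the least-suspicious-first ranking
    that contains no malignant patient.

    This needs the true labels, so it is only usable on held-out data,
    for example to choose a cut-off before submitting.
    """
    ranked = sorted(scores, key=lambda pid: (scores[pid], pid))
    safe: List[str] = []
    for pid in ranked:
        if pid in malignant:
            break  # including this patient would be a false negative
        safe.append(pid)
    return safe


def is_disqualified(submitted: List[str], malignant: Set[str]) -> bool:
    """A submission is disqualified if it lists any malignant patient as healthy."""
    return any(pid in malignant for pid in submitted)


if __name__ == "__main__":
    # Hypothetical toy data: four patients, one of them malignant.
    toy = {"p1": [0.1, 0.2], "p2": [0.05], "p3": [0.9, 0.3], "p4": [0.4]}
    truth = {"p3"}
    ranking = patient_scores(toy)
    healthy = longest_safe_list(ranking, truth)
    print(healthy, is_disqualified(healthy, truth))
```

Because a single false negative voids the whole list, the length of the list is limited by the lowest-scored malignant patient, which is why the ranking at the low-score end matters far more than overall accuracy.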
