Towards a Clinically Useful AI Tool for Prostate Cancer Detection: Recommendations from a PANDA Dataset Analysis
T. J. Hart, Chloe Engler Hart, Aaryn S. Frewing, Dr. Paul M. Urie M.D./ Ph.D., Dr. Dennis Della Corte Dr. rer. nat*
Department of Physics and Astronomy, Brigham Young University, Provo.
*Corresponding author
Dr. Dennis Della Corte, Assistant Professor, Department of Physics & Astronomy, Brigham Young University, N361 ESC, BYU, Provo.
Email: Dennis.dellacorte@byu.edu
DOI: 10.55920/JCRMHS.2023.05.001216
Figure 1: Binary comparison of regions of interest between the pathologist's annotation rounds (Patho 1 and Patho 2), annotation labels from the PANDA test set (Labels), and annotations from network inference (Model). Agreement is shown in green, disagreement in red. Kappa (K; see Methods for details) and accuracy (Acc) for each of the six combinations are shown below the corresponding panel.
The model yields an accuracy of 0.82 and a kappa of 0.64 when evaluated on the PANDA test set labels. Similar values are found between the pathologist's two sets of annotations (Fig. 1). Note that the pathologist's first and second annotations were compared on the same regions used to evaluate the model on the test set. By contrast, agreement between the pathologist's annotations and either the model or the labels is low; all of these comparisons gave similar accuracy and kappa values (Fig. 1).
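To make the two agreement measures concrete, the following minimal sketch computes accuracy and kappa for two sets of binary region labels. The Methods section is not part of this excerpt, so which kappa variant the authors used is an assumption here: the sketch shows both Cohen's kappa and the prevalence- and bias-adjusted kappa (PABAK) described by Byrt et al. in the reference list. All function names are illustrative and not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two annotation sets agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotation sets must be non-empty and equally long")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = accuracy(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label frequencies.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def pabak_from_accuracy(p_o: float, n_categories: int = 2) -> float:
    """PABAK (Byrt et al.): kappa with chance agreement fixed at 1/k."""
    return (n_categories * p_o - 1.0) / (n_categories - 1.0)


# Illustrative labels only (1 = region marked cancerous, 0 = benign).
rater_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(accuracy(rater_1, rater_2))
print(cohen_kappa(rater_1, rater_2))
print(pabak_from_accuracy(accuracy(rater_1, rater_2)))
```

For binary labels PABAK reduces to 2·Acc − 1, so an accuracy of 0.82 corresponds to a PABAK of 0.64. That matches the pair of values reported above, but it is an inference from the numbers, not a statement of the authors' method.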
Figure 2: Overview of the features expected in the ideal prostate adenocarcinoma dataset. Each feature is highlighted in the discussion section.
Based on our findings and previous research [7], we put forth criteria for what would constitute the ideal high-quality prostate adenocarcinoma pathology dataset, as depicted in Figure 2. The dataset should comprise full-size WSIs rather than patches or pixel clusters, because WSIs capture the entire tissue section at high resolution and allow analysis of tissue structures, cell morphology, and other relevant features. The ideal dataset should be large enough for an algorithm to train on; we estimate this number to be 20,000 WSIs. Sufficient variation is necessary in three categories: patient demographics, prostate adenocarcinoma type, and adenocarcinoma severity [25]. Finally, the dataset should be easily accessible to the public. It should be organized and stored in a consistent, vendor-agnostic format that allows researchers to retrieve and use the data efficiently. Providing open access or appropriate permissions for accessing the dataset encourages collaboration, accelerates research progress, and enables the development of innovative techniques for prostate adenocarcinoma diagnosis and treatment.
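The measurable parts of these criteria can be expressed as a simple checklist over a dataset manifest. The sketch below is illustrative only: the record fields, the function name, and the rule that "sufficient variation" means at least two distinct values per category are all assumptions, since the text does not quantify variation. Only the 20,000-slide threshold comes from the text. Accessibility and file format are not checked because they cannot be derived from per-slide records.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

MIN_SLIDES = 20_000  # size estimate stated in the text


@dataclass(frozen=True)
class SlideRecord:
    # All field names are hypothetical placeholders for manifest columns.
    demographic_group: str
    adenocarcinoma_type: str
    severity: str
    is_full_wsi: bool


def check_dataset(
    records: Sequence[SlideRecord],
    min_slides: int = MIN_SLIDES,
    min_distinct: int = 2,  # assumed stand-in for "sufficient variation"
) -> Dict[str, bool]:
    """Report which of the measurable criteria a manifest satisfies."""
    return {
        "full_size_wsis": len(records) > 0 and all(r.is_full_wsi for r in records),
        "sufficient_size": len(records) >= min_slides,
        "varied_demographics": len({r.demographic_group for r in records}) >= min_distinct,
        "varied_types": len({r.adenocarcinoma_type for r in records}) >= min_distinct,
        "varied_severity": len({r.severity for r in records}) >= min_distinct,
    }


# Toy manifest with made-up category values, far below the real size threshold.
toy = [
    SlideRecord("group_a", "type_x", "low", True),
    SlideRecord("group_b", "type_x", "high", True),
    SlideRecord("group_a", "type_x", "high", True),
]
print(check_dataset(toy))
```

A report of this kind only flags which criteria fail; deciding how much variation is enough remains a judgment for the dataset curators.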
- Luca AR, Ursuleanu TF, Gheorghe L, et al. Impact of quality, type and volume of data used by deep learning models in the analysis of medical images. Informatics in Medicine Unlocked. 2022:100911.
- Komura D, Ishikawa S. Advanced deep learning applications in diagnostic pathology. Translational and Regulatory Sciences. 2021;3(2):36-42.
- Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature medicine. 2019;25(8):1301-1309.
- Van der Laak J, Litjens G, Ciompi F. Deep learning in histopathology: the path to the clinic. Nature medicine. 2021;27(5):775-784.
- Bulten W, Kartasalo K, Chen P-HC, et al. [dataset] Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nature medicine. 2022;28(1):154-163.
- Chen N, Zhou Q. The evolving Gleason grading system. Chinese Journal of Cancer Research. 2016;28(1):58.
- Frewing A, Gibson A, Robertson R, Urie P, Della Corte D. Don't fear the artificial intelligence: a systematic review of machine learning for prostate cancer detection in pathology. Archives of Pathology & Laboratory Medicine. 2023;5(1173)
- Raciti P, Sue J, Ceballos R, et al. Novel artificial intelligence system increases the detection of prostate cancer in whole slide images of core needle biopsies. Modern Pathology. 2020;33(10):2058-2066.
- Kott O, Linsley D, Amin A, et al. Development of a deep learning algorithm for the histopathologic diagnosis and Gleason grading of prostate cancer biopsies: a pilot study. European urology focus. 2021;7(2):347-351.
- da Silva LM, Pereira EM, Salles PG, et al. Independent real‐world application of a clinical‐grade automated prostate cancer detection system. The Journal of pathology. 2021;254(2):147-158.
- Jung M, Jin M-S, Kim C, et al. Artificial intelligence system shows performance at the level of uropathologists for the detection and grading of prostate cancer in core needle biopsy: an independent external validation study. Modern Pathology. 2022;35(10):1449-1457.
- Pantanowitz L, Quiroga-Garza GM, Bien L, et al. An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study. The Lancet Digital Health. 2020;2(8):e407-e416.
- Ayyad SM, Shehata M, Shalaby A, et al. Role of AI and histopathological images in detecting prostate cancer: a survey. Sensors. 2021;21(8):2586.
- Bulten W, Pinckaers H, van Boven H, et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. The Lancet Oncology. 2020;21(2):233-241.
- Ström P, Kartasalo K, Olsson H, et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. The Lancet Oncology. 2020;21(2):222-232.
- Data from: [dataset] The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD). 2023. National Cancer Institute Center for Cancer Genomics.
- Data from: [dataset] Digital Pathology Dataset for Prostate Cancer Diagnosis. 2022. Zenodo. doi:10.5281/zenodo.5971764
- Data from: [dataset] SICAPv2 - Prostate Whole Slide Images with Gleason Grades Annotations. 2020. Mendeley Data. doi:10.17632/9xxm58dvs3.2
- Data from: [dataset] Prostate cancer ndpi images. 2016. Harvard Dataverse. doi:10.7910/DVN/GG0D7G
- Kirillov A, Wu Y, He K, Girshick R. PointRend: Image segmentation as rendering. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2020:9799-9808.
- Byrt T, Bishop J, Carlin JB. Bias, prevalence and kappa. Journal of clinical epidemiology. 1993;46(5):423-429.
- Plazas M, Ramos-Pollán R, León F, Martínez F. Towards reduction of expert bias on Gleason score classification via a semi-supervised deep learning strategy. SPIE; 2022:710-717.
- Qiu Y, Hu Y, Kong P, et al. Automatic Prostate Gleason Grading Using Pyramid Semantic Parsing Network in Digital Histopathology. Frontiers in Oncology. 2022;12
- Nagpal K, Foote D, Liu Y, et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ digital medicine. 2019;2(1):48.
- Homeyer A, Geißler C, Schwen LO, et al. Recommendations on test datasets for evaluating AI solutions in pathology. arXiv preprint arXiv:2204.14226. 2022.