Description

Objectives: This review aims to critically evaluate current evidence on the application of deep learning in oral cancer screening, with a focus on its effectiveness in early detection, diagnostic accuracy, and potential to improve patient outcomes.

Methods: A comprehensive literature search was conducted across multiple databases, including PubMed, EBSCO, Google Scholar, and the National Institutes of Health (NIH). The reviewers used various MeSH terms to identify titles and abstracts relevant to the study. Screening criteria were: peer-reviewed original articles, written in English, human-based studies, and published between 2020 and 2025. Full-text articles of the selected literature were evaluated against these screening criteria.

Results: Eleven studies were included: nine evaluated convolutional neural networks (CNNs), two assessed large language models (LLMs), and one compared a Swin transformer to a CNN. Six studies used clinical intraoral photographs, two used smartphone-captured images, and one study each used optical coherence tomography (OCT), portable endoscope images, and narrow-band imaging (NBI) endoscopic video frames. CNNs consistently achieved the highest diagnostic performance, with accuracies of 83–99%, sensitivities of 83–99%, specificities of 81–100%, and AUC values of 0.83–0.99. LLMs (ChatGPT 3.5, 4.0, and 4o; Gemini) showed lower and more variable performance, with accuracies of 36–67%, sensitivities ranging from 18% (image only) to 100% (image plus clinical context), and specificities of 52–97%. The Swin transformer achieved F1-scores of 0.83–0.84 and AUCs of 0.81–0.93, performing comparably to the high-performing CNNs.

Conclusions: Deep learning models, particularly CNNs and transformers, showed strong potential for enhancing early detection and diagnostic accuracy in oral cancer screening. However, large language models (LLMs) demonstrated limited reliability for diagnostic applications.
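
For readers less familiar with the metrics reported in the Results, the sketch below shows how accuracy, sensitivity, specificity, and F1-score are conventionally derived from a binary confusion matrix. It is an illustration only, not the method of any included study: the class `ConfusionCounts`, the function `screening_metrics`, and the counts in the example are all hypothetical, and the code assumes every denominator is non-zero. AUC is omitted because it requires per-case model scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for illustration)."""
    tp: int  # lesions correctly flagged as suspicious
    fp: int  # benign cases incorrectly flagged
    tn: int  # benign cases correctly cleared
    fn: int  # lesions missed by the model


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute standard diagnostic metrics; assumes all denominators are non-zero."""
    sensitivity = c.tp / (c.tp + c.fn)   # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)   # true negative rate
    precision = c.tp / (c.tp + c.fp)     # positive predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up counts, NOT taken from any study in the review.
example = ConfusionCounts(tp=90, fp=15, tn=85, fn=10)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are reported separately because a screening tool can score well on one while failing on the other, as the wide LLM sensitivity range in the Results suggests.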

Document Type

Poster


Advancing Oral Cancer Screening Through Deep Learning Models: A Systematic Review
