23 Generative artificial intelligence for automated unstructured MRI data extraction in prostate cancer care

William Pace; Andrew Liu; Marvin Carlisle; Robert Krumm; Janet Cowan; Peter Carroll; Matthew Cooperberg; Anobel Odisho

doi:10.1017/cts.2024.714


Published online by Cambridge University Press: 11 April 2025


William Pace: University of California, San Francisco
Andrew Liu: University of California, San Francisco
Marvin Carlisle: University of California, San Francisco
Robert Krumm: University of California, San Francisco
Janet Cowan: University of California, San Francisco
Peter Carroll: University of California, San Francisco
Matthew Cooperberg: University of California, San Francisco
Anobel Odisho: University of California, San Francisco


Abstract


Objectives/Goals: Magnetic resonance imaging (MRI) reports are stored as unstructured text in the electronic health record (EHR), rendering the data inaccessible. Large language models (LLMs) are a new tool for analyzing and generating unstructured text. We aimed to evaluate how well an LLM extracts data from MRI reports compared to manually abstracted data.

Methods/Study Population: The University of California, San Francisco has deployed a HIPAA-compliant internal LLM tool that uses GPT-4 technology and is approved for PHI use. We developed a detailed prompt instructing the LLM to extract data elements from prostate MRI reports and to output the results in a structured, computer-readable format. A data pipeline was built using the OpenAI Application Programming Interface (API) to automatically extract distinct data elements from the MRI report that are important in prostate cancer care. Each prompt was executed five times, and the responses were compared with the modal response to determine response variability. Accuracy was also assessed.

Results/Anticipated Results: Across 424 prostate MRI reports, GPT-4 response accuracy was consistently above 95% for most parameters. Individual field accuracies were 98.3% (96.3–99.3%) for PSA density, 97.4% (95.4–98.7%) for extracapsular extension, and 98.1% (96.3–99.2%) for TNM stage. Across fields, the median accuracy was 98.1% (96.3–99.2%), the mean was 97.2% (95.2–98.3%), and the range was 87.7% (84.2–90.7%) to 99.8% (98.7–100.0%). Response variability over five repeated runs ranged from 0.14% to 3.61% and differed based on the data element extracted (p …).

Discussion/Significance of Impact: GPT-4 was highly accurate in extracting data points from prostate cancer MRI reports with low upfront programming requirements. This represents an effective tool to expedite medical data extraction for clinical and research use cases.
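The repeated-run comparison described in the methods can be illustrated with a short sketch. This is not the authors' pipeline: the function names, the example values, and the choice to score the modal response against the manually abstracted value are all assumptions made for illustration. It shows one plausible way to take the modal response over repeated runs, measure how often individual runs deviate from it, and compute accuracy against a manual reference.

```python
from collections import Counter


def modal_response(responses):
    """Return the most frequent response among repeated runs.

    Ties are resolved in favor of the value seen first (an assumption;
    the abstract does not say how ties were handled).
    """
    return Counter(responses).most_common(1)[0][0]


def variability(responses):
    """Fraction of runs whose response differs from the modal response."""
    mode = modal_response(responses)
    return sum(r != mode for r in responses) / len(responses)


def accuracy(extracted, reference):
    """Fraction of reports where the extracted value matches the reference.

    Both arguments are dicts keyed by a hypothetical report identifier.
    """
    matches = sum(extracted[k] == reference[k] for k in reference)
    return matches / len(reference)


# Hypothetical data: five runs per report for one field.
runs_by_report = {
    "report_1": ["absent", "absent", "present", "absent", "absent"],
    "report_2": ["present"] * 5,
}
manual_abstraction = {"report_1": "absent", "report_2": "absent"}

modal_by_report = {k: modal_response(v) for k, v in runs_by_report.items()}
field_accuracy = accuracy(modal_by_report, manual_abstraction)
```

In this made-up example, one of five runs for `report_1` deviates from the mode, and the modal answer for `report_2` disagrees with the manual value, so stable output and correct output are measured separately.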

Type: Informatics, AI and Data Science
Information: Journal of Clinical and Translational Science, Volume 9, Issue s1, April 2025, p. 8

DOI: https://doi.org/10.1017/cts.2024.714
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
