366 Universal representation of human diseases using large language models

Geoffrey Siwo; Ellen R. Bowen; Akbar K. Waljee

doi:10.1017/cts.2024.991

Published online by Cambridge University Press: 11 April 2025

Geoffrey Siwo: University of Michigan Medical School
Ellen R. Bowen: University of Michigan Medical School
Akbar K. Waljee: University of Michigan Medical School

Abstract

Objectives/Goals: Understanding the interconnections among more than 20,000 human diseases spanning organ systems could inform more precise diagnosis and treatment. Here, we examine whether the ability of large language models (LLMs) to learn universal representations of concepts can be leveraged to discover complex relationships across human diseases.

Methods/Study Population: To address the challenge of computationally representing thousands of diseases spanning multiple organ systems, we used the LLMs' internal representations of concepts to encode diseases based on their descriptions in standard disease ontologies (ICD10 and Phecodes). To do this, we used the application programming interfaces (APIs) of three LLMs (GPT-3.5, Mistral, and Voyage) to encode the diseases. We then performed unsupervised clustering of the diseases using their encodings (embeddings) from each LLM to determine whether the resulting clusters reflect disease relationships. To enable deeper exploration of disease relationships, we developed interactive plots that provide a system-level view of the relationships among thousands of diseases and their association with specific organ systems.

Results/Anticipated Results: We found that unsupervised analysis of disease relationships using the LLM encodings reveals high similarity among diseases based on the organ systems they affect. All the LLMs clustered diseases into groups largely defined by the organ systems they affect, without being trained specifically to classify diseases by organ system. An exception was tumors: most tumors cluster together as a group irrespective of the organs they affect. Interestingly, we found that tumors affecting anatomically related organs show higher similarity to each other than to those affecting distantly related organs. In addition to anatomical relationships between diseases, we found that the LLM embeddings capture genetic relationships between diseases.

Discussion/Significance of Impact: Overall, we found that the LLM-derived encodings preserve biologically and clinically significant relationships across organ systems and disease types. These results suggest that LLM encodings could provide a universal framework for representing diseases as computable phenotypes and enable the discovery of complex disease relationships.
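The Methods describe two steps: encoding each disease description as a vector, then clustering the vectors without supervision. The abstract does not give the API calls, clustering algorithm, or parameters, so the following is only a minimal sketch under stated assumptions. The four disease descriptions are hypothetical toy text (not ICD10 or Phecode entries), `embed` is a bag-of-words stand-in for an LLM embedding API call, and the threshold-based single-linkage grouping is an illustrative substitute for whatever clustering method the authors used.

```python
import math
from collections import Counter

# Hypothetical toy descriptions; NOT taken from ICD10 or Phecodes.
DISEASES = {
    "asthma": "chronic inflammatory disease of the lung airways",
    "pneumonia": "infection causing inflammation of the lung airways",
    "gastritis": "irritation of the stomach lining",
    "peptic ulcer": "sore in the stomach lining",
}


def embed(text):
    """Stand-in for an LLM embedding API call (assumption): word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(embeddings, threshold):
    """Single-linkage grouping: link any pair with similarity >= threshold."""
    names = list(embeddings)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent pointers to the group representative (union-find).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


embeddings = {name: embed(desc) for name, desc in DISEASES.items()}
clusters = cluster(embeddings, threshold=0.5)  # threshold is an arbitrary choice
print(clusters)
```

On this toy input the two lung conditions and the two stomach conditions fall into separate groups, which mirrors in miniature the organ-system grouping reported in the Results. A real analysis would replace `embed` with calls to an embedding API and would likely use an established clustering library.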

Type: Informatics, AI and Data Science
Information: Journal of Clinical and Translational Science, Volume 9, Issue s1, April 2025, p. 113

DOI: https://doi.org/10.1017/cts.2024.991
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
