Electronic Health Records (EHR) analysis is pivotal in advancing medical research. Numerous real-world EHR data providers offer data access through exported datasets. While enabling profound research possibilities, exported EHR data requires quality control and restructuring for meaningful analysis. Challenges arise in the sequence analysis of medical events (e.g., diagnoses or procedures), which provides critical insights into the progression of conditions, treatments, and outcomes. Identifying causal relationships, patterns, and trends requires a more complex approach to data mining and preparation.
Methods:
This paper introduces EHRchitect – an application written in Python that addresses the quality control challenges by automating dataset transformation, facilitating the creation of a clean, formatted, and optimized MySQL database (DB), and sequential data extraction according to the user’s configuration.
Results:
The tool creates a clean, formatted, and optimized DB, enabling medical event sequence data extraction according to users’ study configuration. Event sequences encompass patients’ medical events in specified orders and time intervals. The extracted data are presented as distributed Parquet files, incorporating events, event transitions, patient metadata, and events metadata. The concurrent approach allows effortless scaling for multi-processor systems.
Conclusion:
EHRchitect streamlines the processing of large EHR datasets for research purposes. It facilitates extracting sequential event-based data, offering a highly flexible framework for configuring event and timeline parameters. The tool delivers temporal characteristics, patient demographics, and event metadata to support comprehensive analysis. The developed tool significantly reduces the time required for dataset acquisition and preparation by automating data quality control and simplifying event extraction.
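As a rough illustration of what extracting event sequences "in specified orders and time intervals" can involve, the sketch below pairs each patient's earliest first event with a later second event that falls within a maximum gap. The record layout, the function name `extract_transitions`, and the event codes are hypothetical; this is not EHRchitect's actual API, schema, or output format.

```python
from collections import defaultdict
from datetime import date


def extract_transitions(events, first_code, second_code, max_gap_days):
    """Hypothetical helper (not part of EHRchitect).

    For each patient, take the earliest `first_code` event and look for the
    first later `second_code` event within `max_gap_days`.
    Returns sorted tuples: (patient_id, first_date, second_date, gap_days).
    """
    by_patient = defaultdict(list)
    for patient_id, code, when in events:
        by_patient[patient_id].append((when, code))

    transitions = []
    for patient_id, items in by_patient.items():
        items.sort()  # chronological order
        first_dates = [d for d, c in items if c == first_code]
        if not first_dates:
            continue  # patient never had the first event
        start = first_dates[0]
        for d, c in items:
            gap = (d - start).days
            if c == second_code and 0 < gap <= max_gap_days:
                transitions.append((patient_id, start, d, gap))
                break
    return sorted(transitions)


# Made-up sample events: (patient_id, event_code, event_date)
events = [
    ("p1", "DX_A", date(2020, 1, 1)),
    ("p1", "PROC_B", date(2020, 1, 20)),
    ("p2", "DX_A", date(2020, 3, 1)),
    ("p2", "PROC_B", date(2020, 9, 1)),  # outside the 30-day window
    ("p3", "PROC_B", date(2020, 2, 1)),  # no first event at all
]
result = extract_transitions(events, "DX_A", "PROC_B", max_gap_days=30)
```

Only patient `p1` satisfies both the order and the interval constraint here; a real configuration would presumably also control which occurrence of each event is used.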
Within an infrastructure to monitor vaccine effectiveness (VE) against hospitalization due to COVID-19 and COVID-19 related deaths from November 2022 to July 2023 in seven countries in real-world conditions (VEBIS network), we compared two approaches: (a) estimating VE of the first, second or third COVID-19 booster doses administered during the autumn of 2022, and (b) estimating VE of the autumn vaccination dose regardless of the number of prior doses (autumnal booster approach). Retrospective cohorts were constructed using Electronic Health Records at each participating site. Cox regressions with time-changing vaccination status were fit and site-specific estimates were combined using random-effects meta-analysis. VE estimates with both approaches were mostly similar, particularly shortly after the start of the vaccination campaign, and showed a similar timing of VE waning. However, autumnal booster estimates were more precise and showed a clearer trend, particularly compared to third booster estimates, as calendar time increased after the vaccination campaign and during periods of lower SARS-CoV-2 activity. Moreover, the decrease in protection by increasing calendar time was more clear and precise than when comparing protection by number of doses. Therefore, estimating VE under an autumnal booster framework emerges as a preferred method for future monitoring of COVID-19 vaccination campaigns.
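The pooling step can be sketched as follows, assuming the common DerSimonian-Laird estimator for the between-site variance (the abstract does not say which random-effects estimator was used) and the usual conversion VE = (1 − HR) × 100. All numbers in the example are invented for illustration.

```python
import math


def random_effects_pool(log_hrs, std_errs):
    """Pool site-specific log hazard ratios with a DerSimonian-Laird
    random-effects model. Returns (pooled_log_hr, pooled_se, tau2)."""
    w = [1.0 / se ** 2 for se in std_errs]           # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))  # Cochran's Q
    df = len(log_hrs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-site variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled_se, tau2


def vaccine_effectiveness(log_hr):
    """Convert a log hazard ratio to VE in percent: (1 - HR) * 100."""
    return (1.0 - math.exp(log_hr)) * 100.0


# Invented site estimates (hazard ratios and standard errors of the log HR).
site_log_hrs = [math.log(0.45), math.log(0.60), math.log(0.52)]
site_ses = [0.10, 0.15, 0.20]
pooled, pooled_se, tau2 = random_effects_pool(site_log_hrs, site_ses)
ve = vaccine_effectiveness(pooled)
ve_low = vaccine_effectiveness(pooled + 1.96 * pooled_se)   # lower VE bound
ve_high = vaccine_effectiveness(pooled - 1.96 * pooled_se)  # upper VE bound
```

Because VE decreases as the hazard ratio increases, the upper confidence limit of the hazard ratio gives the lower limit of VE.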
A substantial subset of patients with major depressive disorder (MDD) experience treatment-resistant depression (TRD), typically defined as failure to respond to at least two sequential antidepressant trials at adequate dose and length.
Aims
To examine clinical and service-level associations of TRD, and the experiences of people with TRD and clinicians involved in their care within a large, diverse National Health Service trust in the UK.
Method
This mixed-methods study integrated quantitative analysis of electronic health records with thematic analysis of semi-structured interviews. Chi-squared tests and one-way analysis of variance were used to assess associations between lines of antidepressant treatments and sociodemographic and clinical variables, and binary logistic regression was used to identify associations of TRD status.
Results
Nearly half (48%) of MDD patients met TRD criteria, with 36.9% having trialled ≥4 antidepressant treatments. People with TRD had higher rates of recurrent depression (odds ratio = 1.24, 95% CI: 1.05–1.45, P = 0.008), comorbid anxiety disorders (odds ratio = 1.21, 95% CI: 1.03–1.41, P = 0.019), personality disorders (odds ratio = 1.35, 95% CI: 1.10–1.65, P = 0.003), self-harm (odds ratio = 1.76, 95% CI: 1.06–2.93, P = 0.029) and cardiovascular diseases (odds ratio = 1.46, 95% CI: 1.02–2.07, P = 0.0374). Greater treatment resistance was linked to increased economic inactivity and functional loss. Qualitative findings revealed severe emotional distress and frustration with existing treatments, as well as organisational and illness-related barriers to effective care.
Conclusions
TRD is characterised by increasing mental and physical morbidity and functional decline, with individuals experiencing barriers to effective care. Improved pathways, service structures and more effective biological and psychological interventions are needed.
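For readers less familiar with the odds ratios and confidence intervals reported above, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2×2 table. This is a simpler technique than the binary logistic regression used in the study, and the counts are invented purely for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: 30 of 100 exposed and 20 of 100 unexposed have the outcome.
odds_ratio, lower, upper = odds_ratio_ci(a=30, b=70, c=20, d=80)
```

An interval that excludes 1 corresponds to a two-sided P value below 0.05 under the same normal approximation.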
The availability of data is a condition for the development of AI. This is no different in the context of healthcare-related AI applications. Healthcare data are required in the research, development, and follow-up phases of AI. In fact, data collection is also necessary to establish evidence of compliance with legislation. Several legislative instruments, such as the Medical Devices Regulation and the AI Act, enacted data collection obligations to establish (evidence of) the safety of medical therapies, devices, and procedures. Increasingly, such health-related data are collected in the real world from individual data subjects. The relevant legal instruments therefore explicitly mention they shall be without prejudice to other legal acts, including the GDPR. Following an introduction to real-world data, evidence, and electronic health records, this chapter considers the use of AI for healthcare from the perspective of healthcare data. It discusses the role of data custodians, especially when confronted with a request to share healthcare data, as well as the impact of concepts such as data ownership, patient autonomy, informed consent, and privacy and data protection-enhancing techniques.
Exposure to maternal mental illness during foetal development may lead to altered development, resulting in permanent changes in offspring functioning.
Aims
To assess whether there is an association between prenatal maternal psychiatric disorders and offspring behavioural problems in early childhood, using linked health administrative data and the Australian Early Development Census from New South Wales, Australia.
Method
The sample included all mother–child pairs of children who commenced full-time school in 2009 in New South Wales, and met the inclusion criteria (N = 69 165). Univariable logistic regression analysis assessed unadjusted associations between categories of maternal prenatal psychiatric disorders with indicators of offspring behavioural problems. Multivariable logistic regression adjusted the associations of interest for psychiatric categories and a priori selected covariates. Sensitivity analyses included adjusting the final model for primary psychiatric diagnoses and assessing association of interest for effect modification by child's biological gender.
Results
Children exposed in the prenatal period to maternal psychiatric disorders had greater odds of being developmentally vulnerable in their first year of school. Children exposed to maternal anxiety disorders prenatally had the greatest odds for behavioural problems (adjusted odds ratio 1.98; 95% CI 1.43–2.69). A statistically significant interaction was found between child biological gender and prenatal hospital admissions for substance use disorders, for emotional subdomains, aggression and hyperactivity/inattention.
Conclusions
Children exposed to prenatal maternal mental illness had greater odds for behavioural problems, independent of postnatal exposure. Those exposed to prenatal maternal anxiety were at greatest risk, highlighting the need for targeted interventions for, and support of, families with mental illness.
Electronic health records and patient portals are increasingly utilized to enhance research recruitment efficiency, yet response patterns across patient groups remain unclear. We examined 10 studies at Emory Healthcare that used these tools to identify and recruit 24,000 patients over 1 year. Response rates were lower among males and Black individuals, though study interest was higher among respondents. Interest was also greater among those with frequent healthcare interactions and lower comorbidity. In a large academic health system, portal-based recruitment offered a streamlined approach to research recruitment and patient engagement, with minor variations across patient characteristics warranting continued study.
Attempts to use artificial intelligence (AI) in psychiatric disorders show moderate success, highlighting the potential of incorporating information from clinical assessments to improve the models. This study focuses on using large language models (LLMs) to detect suicide risk from medical text in psychiatric care.
Aims
To extract information about suicidality status from the admission notes in electronic health records (EHRs) using privacy-sensitive, locally hosted LLMs, specifically evaluating the efficacy of Llama-2 models.
Method
We compared the performance of several variants of the open source LLM Llama-2 in extracting suicidality status from 100 psychiatric reports against a ground truth defined by human experts, assessing accuracy, sensitivity, specificity and F1 score across different prompting strategies.
Results
A German fine-tuned Llama-2 model showed the highest accuracy (87.5%), sensitivity (83.0%) and specificity (91.8%) in identifying suicidality, with significant improvements in sensitivity and specificity across various prompt designs.
Conclusions
The study demonstrates the capability of LLMs, particularly Llama-2, in accurately extracting information on suicidality from psychiatric records while preserving data privacy. This suggests their application in surveillance systems for psychiatric emergencies and improving the clinical management of suicidality by improving systematic quality control and research.
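The four reported metrics all derive from a confusion matrix of model output against the expert-defined ground truth. A minimal sketch, with invented counts rather than the study's data:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on positive (suicidal) cases
    specificity = tn / (tn + fp)   # recall on negative cases
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Invented counts for 100 hypothetical reports.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

Sensitivity and specificity do not depend on how common suicidality is in the sample, whereas accuracy and F1 do, which is one reason to report all four.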
Social determinants of health (SDoH), such as socioeconomics and neighborhoods, strongly influence health outcomes. However, the current state of standardized SDoH data in electronic health records (EHRs) is lacking, a significant barrier to research and care quality.
Methods:
We conducted a PubMed search using “SDOH” and “EHR” Medical Subject Headings terms, analyzing included articles across five domains: 1) SDoH screening and assessment approaches, 2) SDoH data collection and documentation, 3) Use of natural language processing (NLP) for extracting SDoH, 4) SDoH data and health outcomes, and 5) SDoH-driven interventions.
Results:
Of 685 articles identified, 324 underwent full review. Key findings include implementation of tailored screening instruments, census and claims data linkage for contextual SDoH profiles, NLP systems extracting SDoH from notes, associations between SDoH and healthcare utilization and chronic disease control, and integrated care management programs. However, variability across data sources, tools, and outcomes underscores the need for standardization.
Discussion:
Despite progress in identifying patient social needs, further development of standards, predictive models, and coordinated interventions is critical for SDoH-EHR integration. Additional database searches could strengthen this scoping review. Ultimately, widespread capture, analysis, and translation of multidimensional SDoH data into clinical care is essential for promoting health equity.
The Society of Thoracic Surgeons Congenital Heart Surgery Database is the largest congenital heart surgery database worldwide but does not provide information beyond the primary episode of care. Linkage to hospital electronic health records would capture complications and comorbidities along with long-term outcomes for patients undergoing CHD surgery. The current study explores linkage success between the Society of Thoracic Surgeons Congenital Heart Surgery Database and electronic health record data in North Carolina and Georgia.
Methods:
The Society of Thoracic Surgeons Congenital Heart Surgery Database was linked to hospital electronic health records from four North Carolina hospitals performing congenital heart surgery, using indirect identifiers such as date of birth, sex, and admission and discharge dates, from 2008 to 2013. Indirect linkage was performed at the admissions level and compared to two other linkages using a “direct identifier,” medical record number: (1) linkage between Society of Thoracic Surgeons Congenital Heart Surgery Database and electronic health records from a subset of patients from one North Carolina institution and (2) linkage between Society of Thoracic Surgeons data from two Georgia facilities and Georgia’s CHD repository, which also uses direct identifiers for linkage.
Results:
Indirect identifiers successfully linked 79% (3692/4685) of Society of Thoracic Surgeons Congenital Heart Surgery Database admissions across four North Carolina hospitals. Direct linkage techniques successfully matched Society of Thoracic Surgeons Congenital Heart Surgery Database to 90.2% of electronic health records from the North Carolina subsample. Linkage between Society of Thoracic Surgeons and Georgia’s CHD repository was 99.5% (7,544/7,585).
Conclusions:
Linkage methodology was successfully demonstrated between surgical data and hospital-based electronic health records in North Carolina and Georgia, uniting granular procedural details with clinical, developmental, and economic data. Indirect identifiers linked most patients, consistent with similar linkages in adult populations. Future directions include applying these linkage techniques with other data sources and exploring long-term outcomes in linked populations.
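A minimal sketch of admission-level linkage on indirect identifiers, assuming exact matching on all four fields and accepting only unambiguous matches. The study's actual matching rules are not described in the abstract, and the field names and records below are hypothetical.

```python
def link_admissions(registry_rows, ehr_rows,
                    keys=("dob", "sex", "admit", "discharge")):
    """Link registry admissions to EHR admissions on indirect identifiers.

    Assumption: exact agreement on every key, and a link is accepted only
    when exactly one EHR admission matches.
    """
    index = {}
    for row in ehr_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["ehr_id"])

    links = {}
    for row in registry_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            links[row["registry_id"]] = candidates[0]
    return links


def linkage_rate(linked, total):
    """Percentage of records successfully linked."""
    return 100.0 * linked / total


# Hypothetical records.
registry = [
    {"registry_id": "r1", "dob": "2010-05-01", "sex": "F",
     "admit": "2011-02-03", "discharge": "2011-02-10"},
    {"registry_id": "r2", "dob": "2009-08-15", "sex": "M",
     "admit": "2012-06-01", "discharge": "2012-06-09"},
]
ehr = [
    {"ehr_id": "e7", "dob": "2010-05-01", "sex": "F",
     "admit": "2011-02-03", "discharge": "2011-02-10"},
    {"ehr_id": "e9", "dob": "2009-08-15", "sex": "M",
     "admit": "2012-06-02", "discharge": "2012-06-09"},  # admit date differs
]
links = link_admissions(registry, ehr)

# The reported counts reproduce the reported percentages.
indirect_rate = linkage_rate(3692, 4685)   # about 78.8, reported as 79%
georgia_rate = linkage_rate(7544, 7585)    # about 99.5
```

The second record shows why indirect linkage falls short of direct linkage: a one-day discrepancy in a single date field is enough to lose an exact match.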
The expansion of electronic health record (EHR) data networks over the last two decades has significantly improved the accessibility and processes around data sharing. However, there lies a gap in meeting the needs of Clinical and Translational Science Award (CTSA) hubs, particularly related to real-world data (RWD) and real-world evidence (RWE).
Methods:
We adopted a mixed-methods approach to construct a comprehensive needs assessment that included: (1) A Landscape Context analysis to understand the competitive environment; and (2) Customer Discovery to identify stakeholders and the value proposition related to EHR data networks. Methods included surveys, interviews, and a focus group.
Results:
Thirty-two CTSA institutions contributed data for analysis. Fifty-four interviews and one focus group were conducted. The synthesis of our findings pivots around five emergent themes: (1) CTSA segmentation needs vary according to resources; (2) Team science is key for success; (3) Quality of data generates trust in the network; (4) Capacity building is defined differently by researcher career stage and CTSA existing resources; and (5) Researchers’ unmet needs.
Conclusions:
Based on the results, EHR data networks like ENACT that would like to meet the expectations of academic research centers within the CTSA consortium need to consider filling the gaps identified by our study: foster team science, improve workforce capacity, achieve data governance trust and efficiency of operation, and aid Learning Health Systems with validating, applying, and scaling the evidence to support quality improvement and high-value care. These findings align with the NIH NCATS Strategic Plan for Data Science.
The progression of long-term diabetes complications has led to a decreased quality of life. Our objective was to evaluate the adverse outcomes associated with diabetes based on a patient’s clinical profile by utilizing a multistate modeling approach.
Methods:
This was a retrospective study of diabetes patients seen in primary care practices from 2013 to 2017. We implemented a five-state model to examine the progression of patients transitioning from one complication to having multiple complications. Our model incorporated high dimensional covariates from multisource data to investigate the possible effects of different types of factors that are associated with the progression of diabetes.
Results:
The cohort consisted of 10,596 patients diagnosed with diabetes and no previous complications associated with the disease. Most of the patients in our study were female, White, and had type 2 diabetes. During our study period, 5928 did not develop complications, 3323 developed microvascular complications, 1313 developed macrovascular complications, and 1129 developed both micro- and macrovascular complications. From our model, we determined that patients had a 0.1334 [0.1284, 0.1386] rate of developing a microvascular complication compared to 0.0508 [0.0479, 0.0540] rate of developing a macrovascular complication. The area deprivation index score we incorporated as a proxy for socioeconomic information indicated that patients who reside in more disadvantaged areas have a higher rate of developing a complication compared to those who reside in least disadvantaged areas.
Conclusions:
Our work demonstrates how a multistate modeling framework is a comprehensive approach to analyzing the progression of long-term complications associated with diabetes.
Multisector stakeholders, including community-based organizations, health systems, researchers, policymakers, and commerce, increasingly seek to address health inequities that persist due to structural racism. They require accessible tools to visualize and quantify the prevalence of social drivers of health (SDOH) and correlate them with health to facilitate dialog and action. We developed and deployed a web-based data visualization platform to make health and SDOH data available to the community. We conducted interviews and focus groups among end users of the platform to establish needs and desired platform functionality. The platform displays curated SDOH and de-identified and aggregated local electronic health record data. The resulting Social, Environmental, and Equity Drivers (SEED) Health Atlas integrates SDOH data across multiple constructs, including socioeconomic status, environmental pollution, and built environment. Aggregated health prevalence data on multiple conditions can be visualized in interactive maps. Data can be visualized and downloaded without coding knowledge. Visualizations facilitate an understanding of community health priorities and local health inequities. SEED could facilitate future discussions on improving community health and health equity. SEED provides a promising tool that members of the community and researchers may use in their efforts to improve health equity.
Concern that self-harm and mental health conditions are increasing in university students may reflect widening access to higher education, existing population trends and/or stressors associated with this setting.
Aims
To compare population-level data on self-harm, neurodevelopmental and mental health conditions between university students and non-students with similar characteristics before and during enrolment.
Method
This cohort study linked electronic records from the Higher Education Statistics Agency for 2012–2018 to primary and secondary healthcare records. Students were undergraduates aged 18 to 24 years at university entry. Non-students were pseudo-randomly selected based on an equivalent age distribution. Logistic regressions were used to calculate odds ratios. Poisson regressions were used to calculate incidence rate ratios (IRR).
Results
The study included 96 760 students and 151 795 non-students. Being male, self-harm and mental health conditions recorded before university entry, and higher deprivation levels, resulted in lower odds of becoming a student and higher odds of drop-out from university. IRRs for self-harm, depression, anxiety, autism spectrum disorder (ASD), drug use and schizophrenia were lower for students. IRRs for self-harm, depression, attention-deficit hyperactivity disorder, ASD, alcohol use and schizophrenia increased more in students than in non-students over time. Older students experienced greater risk of self-harm and mental health conditions, whereas younger students were more at risk of alcohol use than non-student counterparts.
Conclusions
Mental health conditions in students are common and diverse. While at university, students require person-centred stepped care, integrated with local third-sector and healthcare services to address specific conditions.
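An incidence rate ratio compares events per unit of person-time between two groups. The sketch below computes a crude IRR with a Wald 95% CI; this matches what a Poisson regression with only a group indicator and a log person-time offset would give, whereas the study's models may have included further covariates. The counts are invented.

```python
import math


def incidence_rate_ratio(events_a, person_years_a,
                         events_b, person_years_b, z=1.96):
    """Crude IRR of group A versus group B with a Wald confidence interval."""
    rate_a = events_a / person_years_a
    rate_b = events_b / person_years_b
    irr = rate_a / rate_b
    se_log_irr = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(irr) - z * se_log_irr)
    upper = math.exp(math.log(irr) + z * se_log_irr)
    return irr, lower, upper


# Invented example: 50 events in 10,000 student person-years versus
# 100 events in 10,000 non-student person-years.
irr, irr_lower, irr_upper = incidence_rate_ratio(50, 10_000, 100, 10_000)
```

An IRR below 1 with an interval that excludes 1 would indicate a lower recorded incidence in the first group.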
The serotonin 4 receptor (5-HT4R) is a promising target for the treatment of depression. Highly selective 5-HT4R agonists, such as prucalopride, have antidepressant-like and procognitive effects in preclinical models, but their clinical effects are not yet established.
Aims
To determine whether prucalopride (a 5-HT4R agonist and licensed treatment for constipation) is associated with reduced incidence of depression in individuals with no past history of mental illness, compared with anti-constipation agents with no effect on the central nervous system.
Method
Using anonymised routinely collected data from a large-scale USA electronic health records network, we conducted an emulated target trial comparing depression incidence over 1 year in individuals without prior diagnoses of major mental illness, who initiated treatment with prucalopride versus two alternative anti-constipation agents that act by different mechanisms (linaclotide and lubiprostone). Cohorts were matched for 121 covariates capturing sociodemographic factors, and historical and/or concurrent comorbidities and medications. The primary outcome was a first diagnosis of major depressive disorder (ICD-10 code F32) within 1 year of the index date. Robustness of the results to changes in model and population specification was tested. Secondary outcomes included a first diagnosis of six other neuropsychiatric disorders.
Results
Treatment with prucalopride was associated with significantly lower incidence of depression in the following year compared with linaclotide (hazard ratio 0.87, 95% CI 0.76–0.99; P = 0.038; n = 8572 in each matched cohort) and lubiprostone (hazard ratio 0.79, 95% CI 0.69–0.91; P < 0.001; n = 8281). Significantly lower risks of all mood disorders and psychosis were also observed. Results were similar across robustness analyses.
Conclusions
These findings support preclinical data and suggest a role for 5-HT4R agonists as novel agents in the prevention of major depression. These findings should stimulate randomised controlled trials to confirm if these agents can serve as a novel class of antidepressant within a clinical setting.
There is a lack of data on mental health service utilisation and outcomes for people with experience of forced migration living in the UK. Details about migration experiences documented in free-text fields in electronic health records might be harnessed using novel data science methods; however, there are potential limitations and ethical concerns.
Public health data available for research are booming with the expansion of Big Data. This reshapes the data sources for DOHaD enquiries while offering ample opportunities to advance epidemiological modelling within the DOHaD framework. However, Big Data also raises a plethora of methodological challenges related to accurately characterising population health trajectories and biological mechanisms, within heterogeneous and dynamic sociodemographic contexts, and a fast-moving technological landscape. In this chapter, we explore the methodological challenges of research into the causal mechanisms of the transgenerational transfer of disease risks that characterise the DOHaD research landscape and consider these challenges in the light of novel technologies within artificial intelligence (AI) and Big Data. Such technologies could push further the collating of multidimensional data, including electronic health records and tissue banks, to offer new insights. While such methodological and technological innovations may drive clearer and reproducible evidence within DOHaD research, as we argue, many challenges remain, including data quality, interpretability, generalisability, and ethics.
This study serves as an exemplar to demonstrate the scalability of a research approach using survival analysis applied to general practice electronic health record data from multiple sites. Collection of these data, the subsequent analysis, and the preparation of practice-specific reports were performed using a bespoke distributed data collection and analysis software tool.
Background:
Statins are a very commonly prescribed medication, yet there is a paucity of evidence for their benefits in older patients. We examine the relationship between statin prescriptions for general practice patients over 75 and all-cause mortality.
Methods:
We carried out a retrospective cohort study using survival analysis applied to data extracted from the electronic health records of five Australian general practices.
Findings:
The data from 8025 patients were analysed. The median duration of follow-up was 6.48 years. Overall, 52 015 patient-years of data were examined, and the outcome of death from any cause was measured in 1657 patients (21%), with the remainder being censored. Adjusted all-cause mortality was similar for participants not prescribed statins versus those who were (HR 1.05, 95% CI 0.92–1.20, P = 0.46), except for patients with diabetes for whom all-cause mortality was increased (HR = 1.29, 95% CI: 1.00–1.68, P = 0.05). In contrast, adjusted all-cause mortality was significantly lower for patients deprescribed statins compared to those who were prescribed statins (HR 0.81, 95% CI 0.70–0.93, P < 0.001), including among females (HR = 0.75, 95% CI: 0.61–0.91, P < 0.001) and participants treated for secondary prevention (HR = 0.72, 95% CI: 0.60–0.86, P < 0.001). This study demonstrated the scalability of a research approach using survival analysis applied to general practice electronic health record data from multiple sites. We found no evidence of increased mortality due to statin-deprescribing decisions in primary care.
People under the care of mental health services are at increased risk of suicide. Existing studies are small in scale and lack comparisons.
Aims
To identify opportunities for suicide prevention and underpinning data enhancement in people with recent contact with mental health services.
Method
This population-based study includes people who died by suicide in the year following a mental health services contact in Wales, 2001–2015 (cases), paired with similar patients who did not die by suicide (controls). We linked the National Confidential Inquiry into Suicide and Safety in Mental Health and the Suicide Information Database – Cymru with primary and secondary healthcare records. We present results of conditional logistic regression.
Results
We matched 1031 cases with 5155 controls. In the year before their death, 98.3% of cases were in contact with healthcare services, and 28.5% presented with self-harm. Cases had more emergency department contacts (odds ratio 2.4, 95% CI 2.1–2.7) and emergency hospital admissions (odds ratio 1.5, 95% CI 1.4–1.7), but fewer primary care contacts (odds ratio 0.7, 95% CI 0.6–0.9) and out-patient appointments (odds ratio 0.2, 95% CI 0.2–0.3) than controls. Odds ratios were larger in females than males for injury and poisoning (odds ratio: 3.3 (95% CI 2.5–4.5) v. 2.6 (95% CI 2.1–3.1)).
Conclusions
We may be missing existing opportunities to intervene, particularly in emergency departments and hospital admissions with self-harm presentations and with unattributed self-harm, especially in females. Prevention efforts should focus on strengthening routine care contacts, responding to emergency contacts and better self-harm care. There are benefits to enhancing clinical audit systems with routinely collected data.
Social and environmental determinants of health (SEDoH) are crucial for achieving a holistic understanding of patient health. In fact, geographic factors may have more influence on health outcomes than patients’ genetics. Integrating SEDoH into the electronic health record (EHR), however, poses notable technical and compliance-related challenges. We evaluated barriers to the integration of SEDoH in the EHR and developed a privacy-preserving strategy to mitigate risk of protected health information exposure. Using coded identifiers for patient addresses, the strategy evaluates an alternative approach to ensure efficient, secure geocoding of data while preserving privacy throughout the data enrichment processes from numerous SEDoH data sources.
The focus on social determinants of health (SDOH) and their impact on health outcomes is evident in U.S. federal actions by the Centers for Medicare & Medicaid Services and the Office of the National Coordinator for Health Information Technology. The disproportionate impact of COVID-19 on minorities and communities of color heightened awareness of health inequities and the need for more robust SDOH data collection. Four Clinical and Translational Science Award (CTSA) hubs comprising the Texas Regional CTSA Consortium (TRCC) undertook an inventory to understand what contextual-level SDOH datasets are offered centrally and which individual-level SDOH are collected in structured fields in each electronic health record (EHR) system potentially for all patients.
Methods:
Hub teams identified American Community Survey (ACS) datasets available via their enterprise data warehouses for research. Each hub’s EHR analyst team identified structured fields available in their EHR for SDOH using a collection instrument based on a 2021 PCORnet survey and conducted an SDOH field completion rate analysis.
Results:
One hub offered ACS datasets centrally. All hubs collected eleven SDOH elements in structured EHR fields. Two collected Homeless and Veteran statuses. Completeness at the four hubs was 80%–98% for Ethnicity and Race, and < 10% for Education, Financial Strain, Food Insecurity, Housing Security/Stability, Interpersonal Violence, Social Isolation, Stress, and Transportation.
Conclusion:
Completeness levels for SDOH data in EHR at TRCC hubs varied and were low for most measures. Multiple system-level discussions may be necessary to increase standardized SDOH EHR-based data collection and harmonization to drive effective value-based care, health disparities research, translational interventions, and evidence-based policy.
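A field completion rate analysis of the kind described can be sketched as the share of patient records with a non-empty value in each structured field. The field names, the treatment of empty strings as missing, and the sample records are assumptions for illustration, not the hubs' actual data model.

```python
def completion_rates(records, fields):
    """Percentage of records with a non-missing value for each field.

    Assumption: None and the empty string both count as missing.
    """
    total = len(records)
    rates = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        rates[field] = 100.0 * filled / total if total else 0.0
    return rates


# Hypothetical structured-field extract for four patients.
patients = [
    {"race": "White", "ethnicity": "Not Hispanic", "food_insecurity": None},
    {"race": "Black", "ethnicity": "Not Hispanic", "food_insecurity": "No"},
    {"race": "Asian", "ethnicity": "", "food_insecurity": None},
    {"race": "White", "ethnicity": "Hispanic"},  # field absent entirely
]
rates = completion_rates(patients, ["race", "ethnicity", "food_insecurity"])
```

Whether placeholder values such as "Unknown" or "Declined" count as complete is a design decision that would change the reported percentages.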