Background:
Electronic health record (EHR) analysis is pivotal in advancing medical research. Numerous real-world EHR data providers offer data access through exported datasets. While such exports enable profound research possibilities, they require quality control and restructuring before meaningful analysis is possible. Challenges arise in the analysis of medical event sequences (e.g., diagnoses or procedures), which provide critical insights into the progression of conditions, treatments, and outcomes. Identifying causal relationships, patterns, and trends requires a more complex approach to data mining and preparation.
Methods:
This paper introduces EHRchitect, an application written in Python that addresses these quality control challenges by automating dataset transformation: it creates a clean, formatted, and optimized MySQL database (DB) and extracts sequential data according to the user's configuration.
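The abstract does not describe EHRchitect's configuration format, so the sketch below is purely illustrative of the kind of study configuration mentioned here (ordered events with time constraints); every field name is hypothetical and not taken from the tool's documentation.

    # Hypothetical study configuration; the field names are illustrative
    # only and do not reflect EHRchitect's actual schema.
    study_config = {
        "events": [
            {"name": "t2d_diagnosis", "codes": ["E11"], "vocabulary": "ICD-10"},
            {"name": "metformin_start", "codes": ["860975"], "vocabulary": "RxNorm"},
        ],
        # Require the diagnosis to precede the prescription by 0-365 days.
        "transitions": [
            {"from": "t2d_diagnosis", "to": "metformin_start",
             "min_days": 0, "max_days": 365},
        ],
        "output": {"format": "parquet", "partitions": 8},
    }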
Results:
The tool creates a clean, formatted, and optimized DB, enabling medical event sequence data extraction according to the user's study configuration. Event sequences encompass a patient's medical events in specified orders and time intervals. The extracted data are presented as distributed Parquet files incorporating events, event transitions, patient metadata, and event metadata. The concurrent approach allows effortless scaling on multi-processor systems.
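Because the output is described as distributed Parquet files, downstream analysis can rely on standard Parquet readers. The snippet below is a minimal sketch using pandas with the pyarrow engine; the directory layout and column names are assumptions, not EHRchitect's documented schema.

    import pandas as pd

    # pandas (with the pyarrow engine) reads every part file in a
    # Parquet dataset directory. Paths and columns are hypothetical.
    events = pd.read_parquet("output/events/")            # one row per medical event
    transitions = pd.read_parquet("output/transitions/")  # one row per event-to-event transition

    # Example: distribution of time gaps between consecutive events.
    print(transitions["days_between"].describe())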
Conclusion:
EHRchitect streamlines the processing of large EHR datasets for research purposes. It facilitates the extraction of sequential, event-based data and offers a highly flexible framework for configuring event and timeline parameters. The tool delivers temporal characteristics, patient demographics, and event metadata to support comprehensive analysis. By automating data quality control and simplifying event extraction, it significantly reduces the time required for dataset acquisition and preparation.
Raw data require a great deal of cleaning, coding, and categorizing of observations. Vague standards for this data work can make it troublingly ad hoc, with much opportunity and temptation to influence the final results. Preprocessing rules and assumptions are not often seen as part of the model, but they can influence the result just as much as control variables or functional form assumptions. In this chapter, we discuss the main data processing decisions that analysts often face and how they can affect the results: coding and classifying of variables, processing anomalous and outlier observations, and the use of sample weights.
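As a minimal, made-up illustration of how a preprocessing rule can move a result as much as a modelling choice, the sketch below compares a mean computed before and after a common outlier-trimming rule; the data and the 1.5 x IQR threshold are purely illustrative.

    import numpy as np

    # Made-up values; the last observation is an extreme outlier.
    income = np.array([32, 41, 38, 45, 29, 36, 40, 420], dtype=float)
    mean_raw = income.mean()

    # One common rule: drop points beyond 1.5 * IQR from the quartiles.
    # The choice of rule and threshold is itself an analyst decision.
    q1, q3 = np.percentile(income, [25, 75])
    iqr = q3 - q1
    keep = (income >= q1 - 1.5 * iqr) & (income <= q3 + 1.5 * iqr)
    mean_trimmed = income[keep].mean()

    print(f"mean with outlier:   {mean_raw:.1f}")      # ~85.1
    print(f"mean after trimming: {mean_trimmed:.1f}")  # ~37.3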
Once the information about nodes, links, and their substantive attributes has been collected, a bit more work is needed to prepare to use the data. This chapter covers this intermediate step, with tips for organizing and cleaning the data. Reading this chapter before collecting the data in the first place will help avoid some serious pitfalls. It covers ethical issues pertaining to collecting names (a necessary step in most methods of network elicitation), a method for automating the cleaning of name data, and robustness checks that can be done to assess the cleaning.
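The chapter's own procedure for automating name cleaning is not reproduced here; as a generic sketch of the idea, the snippet below groups near-duplicate name strings using Python's standard-library difflib. The names and the similarity threshold are illustrative, and the threshold is exactly the kind of choice the robustness checks mentioned above would probe.

    from difflib import SequenceMatcher

    # Illustrative roster entries containing spelling variants of the same person.
    names = ["Jon Smith", "John Smith", "J. Smith", "Maria Garcia", "Maria Garica"]

    def similar(a: str, b: str, threshold: float = 0.85) -> bool:
        """Treat two name strings as the same person if they are highly similar."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

    # Greedy clustering: assign each name to the first cluster it matches.
    clusters = []
    for name in names:
        for cluster in clusters:
            if similar(name, cluster[0]):
                cluster.append(name)
                break
        else:
            clusters.append([name])

    # "J. Smith" falls below the 0.85 threshold and stays separate,
    # which is why the chosen cutoff should be checked for robustness.
    print(clusters)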
High-quality data are necessary for drawing valid research conclusions, yet errors can occur during data collection and processing. These errors can compromise the validity and generalizability of findings. To achieve high data quality, one must approach data collection and management anticipating the errors that can occur and establishing procedures to address errors. This chapter presents best practices for data cleaning to minimize errors during data collection and to identify and address errors in the resulting data sets. Data cleaning begins during the early stages of study design, when data quality procedures are set in place. During data collection, the focus is on preventing errors. When entering, managing, and analyzing data, it is important to be vigilant in identifying and reconciling errors. During manuscript development, reporting, and presentation of results, all data cleaning steps taken should be documented and reported. With these steps, we can ensure the validity, reliability, and representative nature of the results of our research.
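As one concrete illustration (a generic sketch, not the chapter's own protocol), the snippet below flags out-of-range and logically inconsistent records in a small, made-up data set so they can be reconciled against source documents; the variables and rules are assumptions.

    import pandas as pd

    # Made-up study records; the variables and validation rules are illustrative.
    df = pd.DataFrame({
        "id":        [1, 2, 3, 4],
        "age":       [34, 212, 51, 27],  # 212 is an implausible age
        "enrolled":  ["2021-03-01", "2021-04-15", "2021-05-02", "2021-06-20"],
        "follow_up": ["2021-09-01", "2021-02-01", "2021-11-10", "2021-12-05"],
    })
    df["enrolled"] = pd.to_datetime(df["enrolled"])
    df["follow_up"] = pd.to_datetime(df["follow_up"])

    # Range check: plausible ages; consistency check: follow-up after enrolment.
    bad_age = ~df["age"].between(0, 120)
    bad_dates = df["follow_up"] < df["enrolled"]

    # Records flagged for review and reconciliation.
    print(df[bad_age | bad_dates])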
Edited by
Ruth Kircher, Mercator European Research Centre on Multilingualism and Language Learning, and Fryske Akademy, Netherlands, and Lena Zipp, Universität Zürich
The questionnaire, one of the most frequently used methods in the study of language attitudes, can be used to elicit both qualitative and quantitative data. This chapter focuses on the questionnaire as a means of eliciting quantitative data by means of closed questions. It begins by examining the strengths of doing this (e.g. the fact that the resulting data can easily be compared and analysed across participants) as well as the limitations (e.g. the fact that issues unforeseen by the researcher usually do not come to the fore). The chapter then discusses key issues in research planning and design: for example, question types, question wording, question order, reliability and validity, and more general issues regarding questionnaire design. The chapter also considers questionnaire distribution. The exploration of data analysis and interpretation focuses on data cleaning and coding, statistical analyses, and some points of caution regarding the interpretation of findings from questionnaire-based studies. A case study of language attitudes in Quebec serves to illustrate the main points made in the chapter. The chapter concludes with further important considerations regarding the context-specificity of findings and the benefits of combining questionnaires with other methods of attitude elicitation.
This chapter presents a detailed example that applies the compensation analytics concepts developed in Chapter 6. The reader is assumed to be a compensation consultant charged with evaluating whether gender-based discrimination in pay is present in a public university system in the sciences. Section 7.1 walks through the analysis step by step, from formulating the business question, to acquiring and cleaning data, to analyzing the data and interpreting the results from voluminous statistical output in light of the business question. Section 7.2 covers exploratory data mining, causality, and experiments. Exploratory data mining covers situations in which the manager does not know in advance which relationships in the data will be of interest, in contrast to the example in Section 7.1, in which a statistical model and specific measures could be constructed that were directly tailored to address the business question at hand. Section 7.2 also covers the challenges associated with establishing causality in compensation research and how experiments can sometimes be designed to address those challenges. Randomization and some pitfalls associated with compensation experiments are also covered.
This chapter responds to the growing importance of business analytics on "big data" in managerial decision-making, by providing a comprehensive primer on analyzing compensation data. All aspects of compensation analytics are covered, starting with data acquisition, types of data, and formulation of a business question that can be informed by data analysis. A detailed, hands-on treatment of data cleaning is provided, equipping readers to prepare data for analysis by detecting and fixing data problems. Descriptive statistics are reviewed, and their utility in data cleaning explicated. Graphical methods are used in examples to detect and trim outliers. The basics of linear regression analysis are covered, with an emphasis on application and interpreting results in the context of the business question(s) posed. One section covers the question of whether or not the pay measure (as a dependent variable) should be transformed via a logarithm, and the implications of that choice for interpreting the results are explained. Precision of regression estimates is covered via an intuitive, non-technical treatment of standard errors. An appendix covers nonlinear relationships among variables.
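To make the log-transformation point concrete (a generic sketch with simulated data, not the chapter's worked example), the snippet below regresses log pay on years of experience; with log pay as the dependent variable, a slope b is read as roughly a 100*b percent pay difference per additional year.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated data in which pay grows by roughly 3% per year of experience.
    experience = rng.uniform(0, 30, size=500)
    pay = 40_000 * np.exp(0.03 * experience + rng.normal(0, 0.1, size=500))

    # Least-squares fit of log(pay) on experience.
    slope, intercept = np.polyfit(experience, np.log(pay), deg=1)

    # With a log dependent variable, the slope is approximately the
    # proportional change in pay per additional year of experience.
    print(f"estimated effect: {slope:.4f}  (~{100 * slope:.1f}% per year)")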