6 - Is Bigger Always Better?

On Sample Size, Statistical Significance, and Big Data

from Part II - Rethinking Research

Published online by Cambridge University Press: 13 March 2025

Karen B. Schmaling, Washington State University
Robert M. Kaplan, Stanford University

Summary

Large sample size is often among the criteria used to evaluate research studies, on the assumption that studies with large samples are more meaningful than those with fewer participants. This chapter explores biases associated with the traditional application of null hypothesis testing. Statisticians now challenge the idea that retention of the null hypothesis signifies that a treatment is not effective. A finding with an exact probability value of p = 0.049 is not meaningfully different from one with p = 0.051, yet the two results can be interpreted very differently, and their likelihood of publication can differ as well. Large studies are not necessarily more accurate or less biased. In fact, biases in sampling strategy are amplified in studies with large sample sizes. These problems are of increasing concern in the era of big data and the analysis of electronic health records. Studies that are overpowered because of very large sample sizes can identify statistically significant differences that are of no clinical importance.
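The point about overpowered studies can be illustrated with a small numerical sketch. The example below is not taken from the chapter: the outcome scale, the 0.5-point difference, and the standard deviation of 15 are hypothetical values chosen for illustration, and the helper names are assumptions. It holds a trivially small group difference fixed and shows how the p-value of a simple two-sample z-test depends on sample size alone, and it also computes how close the test statistics behind p = 0.049 and p = 0.051 are.

```python
import math
from statistics import NormalDist


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test.

    Assumes two equal-sized groups and a known, common standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def standardized_effect(mean_diff, sd):
    """Standardized mean difference (Cohen's d); it does not depend on n."""
    return mean_diff / sd


def z_for_two_sided_p(p):
    """Absolute z-statistic that corresponds to a two-sided p-value."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


if __name__ == "__main__":
    # Hypothetical values: a 0.5-point difference on a scale with SD 15.
    diff, sd = 0.5, 15.0
    print(f"standardized effect d = {standardized_effect(diff, sd):.3f}")
    for n in (100, 10_000, 1_000_000):
        print(f"n per group = {n:>9,}: p = {two_sided_p(diff, sd, n):.3g}")

    # The test statistics behind p = 0.049 and p = 0.051 are nearly identical.
    gap = z_for_two_sided_p(0.049) - z_for_two_sided_p(0.051)
    print(f"difference in z between p = 0.049 and p = 0.051: {gap:.4f}")
```

Under these assumptions the effect size stays the same in every row while the p-value moves from clearly non-significant to extremely small as the sample grows, so the p-value here reflects the sample size rather than the importance of the difference.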

Type: Chapter
Book: Rethinking Clinical Research: Methodology and Ethics, pp. 117–136
Publisher: Cambridge University Press
Print publication year: 2025
