The nonparametric Wilcoxon-Mann-Whitney test is commonly used by experimental economists for detecting differences in central tendency between two samples. This test is only theoretically appropriate under certain assumptions concerning the population distributions from which the samples are drawn, and is often used in cases where it is unclear whether these assumptions hold, and even when they clearly do not hold. Fligner and Policello's (1981, Journal of the American Statistical Association, 76, 162-168) robust rank-order test is a modification of the Wilcoxon-Mann-Whitney test, designed to be appropriate in more situations than Wilcoxon-Mann-Whitney. This paper uses simulations to compare the performance of the two tests under a variety of distributional assumptions. The results are mixed. The robust rank-order test tends to yield too many false positive results for medium-sized samples, but this liberalness is relatively invariant across distributional assumptions, and seems to be due to a deficiency of the normal approximation to its test statistic's distribution, rather than to the test itself. The performance of the Wilcoxon-Mann-Whitney test varies hugely, depending on the distributional assumptions; in some cases, it is conservative, in others, extremely liberal. The tests have roughly similar power. Overall, the robust rank-order test performs better than Wilcoxon-Mann-Whitney, though when critical values for the robust rank-order test are not available, so that the normal approximation must be used, their relative performance depends on the underlying distributions, the sample sizes, and the level of significance used.
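To make the kind of comparison described here concrete, the following is a minimal Monte Carlo sketch (illustrative only, not the paper's code): it estimates false-positive rates for the Wilcoxon-Mann-Whitney test via SciPy and for the robust rank-order test under its normal approximation, with the Fligner-Policello statistic coded as it is commonly stated; sample sizes, distributions, and replication counts are arbitrary choices, and the exact critical values studied in the paper are not used.

```python
# Illustrative Monte Carlo sketch (not the paper's code): empirical Type I error
# rates of the Wilcoxon-Mann-Whitney test and the robust rank-order test (normal
# approximation) when both populations have the same centre but different spreads.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def robust_rank_order_U(x, y):
    """Fligner-Policello robust rank-order statistic, as commonly stated.
    Ties are ignored because the simulated data are continuous."""
    px = np.array([np.sum(y < xi) for xi in x], dtype=float)  # placements of x among y
    py = np.array([np.sum(x < yj) for yj in y], dtype=float)  # placements of y among x
    vx = np.sum((px - px.mean()) ** 2)
    vy = np.sum((py - py.mean()) ** 2)
    return (px.sum() - py.sum()) / (2.0 * np.sqrt(vx + vy + px.mean() * py.mean()))

def rejection_rates(m=16, n=16, sd_y=4.0, reps=5000, alpha=0.05):
    rej_wmw = rej_rro = 0
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, size=m)   # both populations centred at zero,
        y = rng.normal(0.0, sd_y, size=n)  # but possibly with different spreads
        if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
            rej_wmw += 1
        if 2 * stats.norm.sf(abs(robust_rank_order_U(x, y))) < alpha:
            rej_rro += 1
    return rej_wmw / reps, rej_rro / reps

print(rejection_rates(sd_y=1.0))  # identical distributions: rates near the nominal 5%
print(rejection_rates(sd_y=4.0))  # unequal variances: rejection rates can drift from 5%
```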
The classical trinity of tests is used to check for the presence of a tremble in economic experiments in which the response variable is binary. A tremble is said to occur when an agent makes a decision completely at random, without regard to the values taken by the explanatory variables. The properties of the tests are discussed, and an extension of the methodology is used to test for the presence of a tremble in binary panel data from a well-known economic experiment.
Limit theory is developed for least squares regression estimation of a model involving time trend polynomials and a moving average error process with a unit root. Models with these features can arise from data manipulation such as overdifferencing and model features such as the presence of multicointegration. The impact of such features on the asymptotic equivalence of least squares and generalized least squares is considered. Problems of rank deficiency that are induced asymptotically by the presence of time polynomials in the regression are also studied, focusing on the impact that singularities have on hypothesis testing using Wald statistics and matrix normalization. The chapter is largely pedagogical but contains new results, notational innovations, and procedures for dealing with rank deficiency that are useful in cases of wider applicability.
I first review and critique the prevailing use of hypothesis tests to compare treatments. I then describe my application of statistical decision theory. I compare Bayes, maximin, and minimax regret decisions. I consider choice of sample size in randomized trials from the minimax regret perspective.
In Chapter 31 we study three commonly used techniques for proving minimax lower bounds, namely, Le Cam’s method, Assouad’s lemma, and Fano’s method. Compared to the results in Chapter 29, which are geared toward large-sample asymptotics in smooth parametric models, the approach here is more generic, less tied to mean-squared error, and applicable in non-asymptotic settings such as nonparametric or high-dimensional problems. The common rationale of all three methods is reducing statistical estimation to hypothesis testing.
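As a concrete reminder of how the reduction to testing produces a bound, Le Cam's two-point method, in one standard formulation (the notation here is generic and not necessarily the chapter's), lower-bounds the minimax risk of estimating $\theta$ under a metric $d$ using two hypotheses $P_0, P_1$ with parameters $\theta_0, \theta_1$:
$$\inf_{\hat{\theta}} \; \max_{i \in \{0,1\}} \mathbb{E}_{P_i}\!\bigl[d(\hat{\theta}, \theta_i)\bigr] \;\ge\; \frac{d(\theta_0, \theta_1)}{4}\,\bigl(1 - \mathrm{TV}(P_0, P_1)\bigr),$$
which rests on the testing identity $\inf_{\psi}\{P_0(\psi = 1) + P_1(\psi = 0)\} = 1 - \mathrm{TV}(P_0, P_1)$, the infimum being taken over all tests $\psi$ of $P_0$ against $P_1$.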
In a variety of measurement situations, the researcher may wish to compare the reliabilities of several instruments administered to the same sample of subjects. This paper presents eleven statistical procedures which test the equality of m coefficient alphas when the sample alpha coefficients are dependent. Several of the procedures are derived in detail, and numerical examples are given for two. Since all of the procedures depend on approximate asymptotic results, Monte Carlo methods are used to assess the accuracy of the procedures for sample sizes of 50, 100, and 200. Both control of Type I error and power are evaluated by computer simulation. Two of the procedures are unable to control Type I errors satisfactorily. The remaining nine procedures perform properly, but three are somewhat superior in power and Type I error control.
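As a reminder of the raw ingredient these procedures operate on, the sketch below (not from the paper) computes the sample coefficient alpha for two instruments administered to the same simulated subjects; the eleven equality tests themselves are not reproduced here, and the data-generating choices are arbitrary apart from the sample size, which matches one of those used in the paper.

```python
# Sketch (not from the paper): sample coefficient alpha for two instruments
# administered to the same subjects, the raw ingredient of dependent-alpha tests.
import numpy as np

def cronbach_alpha(scores):
    """scores: (n_subjects, k_items) array of item scores for one instrument."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
n = 100                                   # one of the sample sizes studied in the paper
common = rng.normal(size=(n, 1))          # shared trait makes the two instruments dependent
instrument_a = common + rng.normal(scale=1.0, size=(n, 5))
instrument_b = common + rng.normal(scale=1.5, size=(n, 5))
print(cronbach_alpha(instrument_a), cronbach_alpha(instrument_b))
```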
Diagnostic classification models (DCMs) have seen wide applications in educational and psychological measurement, especially in formative assessment. DCMs in the presence of testlets have been studied in recent literature. A key ingredient in the statistical modeling and analysis of testlet-based DCMs is the superposition of two latent structures, the attribute profile and the testlet effect. This paper extends the standard testlet DINA (T-DINA) model to accommodate the potential correlation between the two latent structures. Model identifiability is studied and a set of sufficient conditions are proposed. As a byproduct, the identifiability of the standard T-DINA is also established. The proposed model is applied to a dataset from the 2015 Programme for International Student Assessment. Comparisons are made with DINA and T-DINA, showing that there is substantial improvement in terms of the goodness of fit. Simulations are conducted to assess the performance of the new method under various settings.
In the social sciences we are often interested in comparing models specified by parametric equality or inequality constraints. For instance, when examining three group means $\{\mu_1, \mu_2, \mu_3\}$ through an analysis of variance (ANOVA), a model may specify that $\mu_1 < \mu_2 < \mu_3$, while another one may state that $\{\mu_1 = \mu_3\} < \mu_2$, and finally a third model may instead suggest that all means are unrestricted. This is a challenging problem, because it involves a combination of nonnested models, as well as nested models having the same dimension. We adopt an objective Bayesian approach, requiring no prior specification from the user, and derive the posterior probability of each model under consideration. Our method is based on the intrinsic prior methodology, suitably modified to accommodate equality and inequality constraints. Focussing on normal ANOVA models, a comparative assessment is carried out through simulation studies. We also present an application to real data collected in a psychological experiment.
A paired composition is a response (upon a dependent variable) to the ordered pair <j, k> of stimuli, treatments, etc. The present paper develops an alternative analysis for the paired compositions layout previously treated by Bechtel's [1967] scaling model. The alternative model relaxes the previous one by including row and column scales that provide an expression of bias for each pair of objects. The parameter estimation and hypothesis testing procedures for this model are illustrated by means of a small group analysis, which represents a new approach to pairwise sociometrics and personality assessment.
A variety of distributional assumptions for dissimilarity judgments are considered, with the lognormal distribution being favored for most situations. An implicit equation is discussed for the maximum likelihood estimation of the configuration with or without individual weighting of dimensions. A technique for solving this equation is described and a number of examples offered to indicate its performance in practice. The estimation of a power transformation of dissimilarity is also considered. A number of likelihood ratio hypothesis tests are discussed and a small Monte Carlo experiment described to illustrate the behavior of the test of dimensionality in small samples.
The recent surge of interest in cognitive assessment has led to the development of cognitive diagnosis models. Central to many such models is a specification of the Q-matrix, which relates items to latent attributes that have natural interpretations. In practice, the Q-matrix is usually constructed subjectively by the test designers. This could lead to misspecification, which could result in lack of fit of the underlying statistical model. To test possible misspecification of the Q-matrix, traditional goodness-of-fit tests, such as the Chi-square test and the likelihood ratio test, may not be applied straightforwardly due to the large number of possible response patterns. To address this problem, this paper proposes a new statistical method to test the goodness of fit of the Q-matrix, by constructing test statistics that measure the consistency between a provisional Q-matrix and the observed data for a general family of cognitive diagnosis models. Limiting distributions of the test statistics are derived under the null hypothesis, which can be used for obtaining the test p-values. Simulation studies as well as a real data example are presented to demonstrate the usefulness of the proposed method.
A method for externally constraining certain distances in multidimensional scaling configurations is introduced and illustrated. The approach defines an objective function which is a linear composite of the loss function of the point configuration X relative to the proximity data P and the loss of X relative to a pseudo-data matrix R. The matrix R is set up such that the side constraints to be imposed on X's distances are expressed by the relations among R's numerical elements. One then uses a double-phase procedure with relative penalties on the loss components to generate a constrained solution X. Various possibilities for constructing actual MDS algorithms are conceivable: the major classes are defined by the specification of metric or nonmetric loss for data and/or constraints, and by the various possibilities for partitioning the matrices P and R. Further generalizations are introduced by replacing R with a set of matrices R_i, i = 1, …, r, which opens the way for formulating overlapping constraints as, e.g., in patterns that are both row- and column-conditional at the same time.
The properties of nonmetric multidimensional scaling (NMDS) are explored by specifying statistical models, proving statistical consistency, and developing hypothesis testing procedures. Statistical models with errors in the dependent and independent variables are described for quantitative and qualitative data. For these models, statistical consistency often depends crucially upon how error enters the model and how data are collected and summarized (e.g., by means, medians, or rank statistics). A maximum likelihood estimator for NMDS is developed, and its relationship to the standard Shepard-Kruskal estimation method is described. This maximum likelihood framework is used to develop a method for testing the overall fit of the model.
Chapter 6 introduces the hypothesis-testing process and the relevance of the standard error in reaching statistical conclusions about whether to accept or reject the null hypothesis using the z-test statistic. Type I and Type II errors are presented, along with the types of statistical tests researchers apply in testing hypotheses, including one-tailed (directional) versus two-tailed (nondirectional) tests. Three important decision rules are the sampling distribution of means, the level of significance, and critical regions. Type I and Type II errors influence the decisions we make about our predictions of relationships between variables. Statistical decision-making is never error-free, but we have some control in reducing these types of errors.
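For concreteness, here is a small worked sketch of the kind of decision the chapter describes (the numbers are hypothetical, not taken from the chapter): a one-sample z-test at the 5% level of significance, evaluated against both one-tailed and two-tailed critical regions.

```python
# Hypothetical example (not from the chapter): a one-sample z-test illustrating the
# level of significance and one- vs two-tailed critical regions.
import math
from scipy import stats

mu0, sigma = 100.0, 15.0    # hypothesised population mean and known population SD
xbar, n = 104.0, 36         # sample mean and sample size
alpha = 0.05

se = sigma / math.sqrt(n)   # standard error of the mean
z = (xbar - mu0) / se       # z-test statistic

p_two_tailed = 2 * stats.norm.sf(abs(z))   # nondirectional test
p_one_tailed = stats.norm.sf(z)            # directional test, H1: mu > mu0
crit_two = stats.norm.ppf(1 - alpha / 2)   # two-tailed critical value (about 1.96)
crit_one = stats.norm.ppf(1 - alpha)       # one-tailed critical value (about 1.645)

print(f"z = {z:.2f}")
print(f"two-tailed: p = {p_two_tailed:.3f}, reject H0: {abs(z) > crit_two}")
print(f"one-tailed: p = {p_one_tailed:.3f}, reject H0: {z > crit_one}")
```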
Discusses statistical methods, covering random variables and variates, sample and population, frequency distributions, moments and moment measures, probability and stochastic processes, discrete and continuous probability distributions, return periods and quantiles, probability density functions, parameter estimation, hypothesis testing, confidence intervals, covariance, regression and correlation analysis, time-series analysis.
This chapter covers ways to explore your network data using visual means and basic summary statistics, and how to apply statistical models to validate aspects of the data. Data analysis can generally be divided into two main approaches, exploratory and confirmatory. Exploratory data analysis (EDA) is a pillar of statistics and data mining and we can leverage existing techniques when working with networks. However, we can also use specialized techniques for network data and uncover insights that general-purpose EDA tools, which neglect the network nature of our data, may miss. Confirmatory analysis, on the other hand, grounds the researcher with specific, preexisting hypotheses or theories, and then seeks to understand whether the given data either support or refute the preexisting knowledge. Thus, complementing EDA, we can define statistical models for properties of the network, such as the degree distribution, or for the network structure itself. Fitting and analyzing these models then recapitulates effectively all of statistical inference, including hypothesis testing and Bayesian inference.
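A minimal sketch of the exploratory side of this workflow is shown below, assuming the networkx library and its built-in karate-club example graph (the chapter's own tooling and data may differ): it computes basic summary statistics and then a simple confirmatory-style check that compares the observed clustering coefficient against degree-preserving randomized versions of the network.

```python
# A minimal sketch of exploratory network summaries, assuming networkx is available.
import networkx as nx
import numpy as np

G = nx.karate_club_graph()  # a small example network

# Basic summary statistics
degrees = np.array([d for _, d in G.degree()])
print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
print("mean degree:", degrees.mean(), "max degree:", degrees.max())
print("density:", nx.density(G))
print("clustering coefficient:", nx.average_clustering(G))

# A simple confirmatory-style check: compare observed clustering with that of
# degree-preserving randomizations obtained by double edge swaps.
obs = nx.average_clustering(G)
null = []
for seed in range(100):
    H = G.copy()
    nx.double_edge_swap(H, nswap=5 * G.number_of_edges(), max_tries=10**5, seed=seed)
    null.append(nx.average_clustering(H))
print("observed clustering:", obs, "null mean:", np.mean(null))
```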
This chapter elaborates on the calibration and validation procedures for the model. First, we describe our calibration strategy in which a customised optimisation algorithm makes use of a multi-objective function, preventing the loss of indicator-specific error information. Second, we externally validate our model by replicating two well-known statistical patterns: (1) the skewed distribution of budgetary changes and (2) the negative relationship between development and corruption. Third, we internally validate the model by showing that public servants who receive more positive spillovers tend to be less efficient. Fourth, we analyse the statistical behaviour of the model through different tests: validity of synthetic counterfactuals, parameter recovery, overfitting, and time equivalence. Finally, we make a brief reference to the literature on estimating SDG networks.
A quick introduction to the standard model of particle physics is given. The general concepts of elementary particles, interactions and fields are outlined. The experimental side of particle physics is also briefly discussed: how elementary particles are produced with accelerators or from cosmic rays and how to observe them with detectors via the interactions of particles with matter. The various detector technologies leading to particle identification are briefly presented. The way in which the data collected by the sensors are analysed is also presented: the most frequent probability density functions encountered in particle physics are outlined. How measurements can be used to estimate a quantity from data, and how to obtain the best estimate of that quantity and its uncertainty, are explained. As measurements can also be used to test a hypothesis based on a particular model, the hypothesis-testing procedure is explained.
Separation commonly occurs in political science, usually when a binary explanatory variable perfectly predicts a binary outcome. In these situations, methodologists often recommend penalized maximum likelihood or Bayesian estimation. But researchers might struggle to identify an appropriate penalty or prior distribution. Fortunately, I show that researchers can easily test hypotheses about the model coefficients with standard frequentist tools. While the popular Wald test produces misleading (even nonsensical) p-values under separation, I show that likelihood ratio tests and score tests behave in the usual manner. Therefore, researchers can produce meaningful p-values with standard frequentist tools under separation without the use of penalties or prior information.
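The contrast can be illustrated with a small hand-coded sketch (hypothetical simulated data and a directly coded logistic log-likelihood, not the article's replication code): a binary predictor that perfectly predicts the outcome inflates the Wald standard error, driving the Wald p-value toward 1, while the likelihood ratio test of the same coefficient remains informative.

```python
# Hand-rolled sketch (not the article's code) of the Wald vs. likelihood ratio
# contrast under separation, using a logistic log-likelihood coded directly.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
n = 100
x = rng.integers(0, 2, size=n)                        # binary explanatory variable
y = np.where(x == 1, 1, rng.integers(0, 2, size=n))   # x = 1 perfectly predicts y = 1

X = np.column_stack([np.ones(n), x])

def negloglik(beta):
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

# Unrestricted fit: the estimated x coefficient diverges, but the log-likelihood converges.
full = optimize.minimize(negloglik, x0=np.zeros(2), method="BFGS")
# Null fit: intercept only, whose maximized log-likelihood has a closed form.
p0 = y.mean()
ll_null = n * (p0 * np.log(p0) + (1 - p0) * np.log(1 - p0))

# Wald test: standard error from the observed information at the (huge) estimate.
p_hat = 1.0 / (1.0 + np.exp(-(X @ full.x)))
info = X.T @ (X * (p_hat * (1 - p_hat))[:, None])
se = np.sqrt(np.diag(np.linalg.inv(info)))
p_wald = 2 * stats.norm.sf(abs(full.x[1] / se[1]))

# Likelihood ratio test of H0: the coefficient on x equals zero.
lr_stat = 2 * (-full.fun - ll_null)
p_lr = stats.chi2.sf(lr_stat, df=1)

print("coefficient on x:", full.x[1])   # very large: separation
print("Wald p-value:", p_wald)          # close to 1, despite a strong effect
print("LR p-value:", p_lr)              # small and interpretable
```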
For this book, we assume you’ve had an introductory statistics or experimental design class already! This chapter is a mini refresher of some critical concepts we’ll be using and lets you check you understand them correctly. The topics include understanding predictor and response variables, the common probability distributions that biologists encounter in their data, the common techniques, particularly ordinary least squares (OLS) and maximum likelihood (ML), for fitting models to data and estimating effects, including their uncertainty. You should be familiar with confidence intervals and understand what hypothesis tests and P-values do and don’t mean. You should recognize that we use data to decide, but these decisions can be wrong, so you need to understand the risk of missing important effects and the risk of falsely claiming an effect. Decisions about what constitutes an “important” effect are central.