Choice behavior is typically evaluated by assuming that the data are generated by one latent decision-making process or another. What if there are two (or more) latent decision-making processes generating the observed choices? Some choices might then be better characterized as being generated by one process, and other choices by the other process. A finite mixture model can be used to estimate the parameters of each decision process while simultaneously estimating the probability that each process applies to the sample. We consider the canonical case of lottery choices in a laboratory experiment and assume that the data are generated by expected utility theory and prospect theory decision rules. We jointly estimate the parameters of each theory as well as the fraction of choices characterized by each. The methodology provides the wedding invitation, and the data consummates the ceremony followed by a decent funeral for the representative agent model that assumes only one type of decision process. The evidence suggests support for each theory, and goes further to identify the demographic domains in which one can expect to see one theory perform better than the other. We therefore propose a reconciliation of the debate over two of the dominant theories of choice under risk, at least for the tasks and samples we consider. The methodology is broadly applicable to a range of debates over competing theories generated by experimental and non-experimental data.
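A minimal sketch of the estimation idea, assuming the per-choice likelihoods of each observed choice under each theory's (already estimated) parameters are available; all placeholder values below are illustrative, and the paper's full estimator maximizes over the theory parameters jointly with the mixing probability.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
p_eut = rng.uniform(0.3, 0.9, size=200)    # P(choice i | EUT parameters), placeholder
p_pt = rng.uniform(0.3, 0.9, size=200)     # P(choice i | PT parameters), placeholder

def neg_loglik(pi):
    # Finite mixture: each choice comes from EUT with prob pi, PT with 1 - pi.
    return -np.sum(np.log(pi * p_eut + (1.0 - pi) * p_pt))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
pi_hat = res.x                              # estimated fraction of EUT choices
# Posterior probability that choice i was generated by the EUT process:
post_eut = pi_hat * p_eut / (pi_hat * p_eut + (1.0 - pi_hat) * p_pt)
```

The posterior responsibilities in the last line are what allow individual choices to be characterized as better described by one process or the other.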
The elicitation of an ordinal judgment on multiple alternatives is often required in many psychological and behavioral experiments to investigate preference/choice orientation of a specific population. The Plackett–Luce model is one of the most popular and frequently applied parametric distributions to analyze rankings of a finite set of items. The present work introduces a Bayesian finite mixture of Plackett–Luce models to account for unobserved sample heterogeneity of partially ranked data. We describe an efficient way to incorporate the latent group structure in the data augmentation approach and the derivation of existing maximum likelihood procedures as special instances of the proposed Bayesian method. Inference can be conducted with the combination of the Expectation-Maximization algorithm for maximum a posteriori estimation and the Gibbs sampling iterative procedure. We additionally investigate several Bayesian criteria for selecting the optimal mixture configuration and describe diagnostic tools for assessing the fit of ranking distributions conditionally and unconditionally on the number of ranked items. The utility of the novel Bayesian parametric Plackett–Luce mixture for characterizing sample heterogeneity is illustrated with several applications to simulated and real preference ranked data. We compare our method with the frequentist approach and a Bayesian nonparametric mixture model, both assuming the Plackett–Luce model as a mixture component. Our analysis of real datasets reveals the importance of an accurate diagnostic check for an appropriate in-depth understanding of the heterogeneous nature of the partial ranking data.
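For concreteness, a minimal sketch of the Plackett–Luce building block: the likelihood of a complete ranking is a product of sequential choice probabilities, and a finite mixture combines such components with weights. The worths and rankings below are illustrative; the paper's Bayesian machinery (data augmentation, Gibbs sampling) is not reproduced here.

```python
import numpy as np
from scipy.special import logsumexp

def plackett_luce_loglik(ranking, w):
    """Log-likelihood of one complete ranking (most preferred first)
    under Plackett-Luce with positive 'worth' parameters w."""
    ranking, w = np.asarray(ranking), np.asarray(w, dtype=float)
    ll = 0.0
    for k in range(len(ranking) - 1):
        rest = ranking[k:]                      # items still unranked at stage k
        ll += np.log(w[ranking[k]]) - np.log(w[rest].sum())
    return ll

def mixture_loglik(ranking, ws, pi):
    """Log-likelihood of a ranking under a finite mixture of PL components."""
    comp = np.array([plackett_luce_loglik(ranking, w) for w in ws])
    return logsumexp(np.log(pi) + comp)

print(mixture_loglik([2, 0, 1], ws=[[1.0, 2.0, 3.0], [3.0, 1.0, 1.0]],
                     pi=[0.5, 0.5]))
```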
A model is presented for item responses when different subjects employ different strategies, but only responses, not choice of strategy, can be observed. Using substantive theory to differentiate the likelihoods of response vectors under a fixed set of strategies, we model response probabilities in terms of item parameters for each strategy, proportions of subjects employing each strategy, and distributions of subject proficiency within strategies. The probabilities that an individual subject employed the various strategies can then be obtained, along with a conditional estimate of proficiency under each. A conceptual example discusses response strategies for spatial rotation tasks, and a numerical example resolves a population of subjects into subpopulations of valid responders and random guessers.
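As an illustration of the two-strategy measurement idea, a hedged sketch of the marginal likelihood of a response vector under a mixture of "valid responders" (a Rasch model with ability integrated out numerically) and "random guessers"; the item difficulties and the guessing probability of 1/2 are placeholders, not the paper's specification.

```python
import numpy as np
from scipy.stats import norm

def marginal_loglik(x, b, pi_valid):
    """Marginal log-likelihood of a 0/1 response vector x under a mixture of
    'valid responders' (Rasch, ability ~ N(0,1) integrated by quadrature)
    and 'random guessers' (each item correct with probability 1/2)."""
    x, b = np.asarray(x), np.asarray(b, dtype=float)
    quad = np.linspace(-4, 4, 61)                        # ability grid
    wq = norm.pdf(quad); wq /= wq.sum()                  # quadrature weights
    p = 1 / (1 + np.exp(-(quad[:, None] - b)))           # quadrature x items
    lik_valid = wq @ np.prod(p**x * (1 - p)**(1 - x), axis=1)
    lik_guess = 0.5 ** len(x)
    return np.log(pi_valid * lik_valid + (1 - pi_valid) * lik_guess)

print(marginal_loglik([1, 1, 0, 1], b=[-1.0, 0.0, 0.5, 1.5], pi_valid=0.9))
```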
Item response theory models posit latent variables to account for regularities in students' performances on test items. Wilson's “Saltus” model extends the ideas of IRT to development that occurs in stages, where expected changes can be discontinuous, show different patterns for different types of items, or even exhibit reversals in probabilities of success on certain tasks. Examples include Piagetian stages of psychological development and Siegler's rule-based learning. This paper derives marginal maximum likelihood (MML) estimation equations for the structural parameters of the Saltus model and suggests a computing approximation based on the EM algorithm. For individual examinees, empirical Bayes probabilities of learning-stage membership are given, along with proficiency parameter estimates conditional on stage membership. The MML solution is illustrated with simulated data and an example from the domain of mixed number subtraction.
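A hedged sketch of the empirical Bayes stage-membership computation: success probabilities follow a Rasch-type model shifted by a stage-by-item-class parameter, ability is integrated out by quadrature, and Bayes' rule gives stage probabilities. All parameter values below are illustrative stand-ins, not the paper's estimates.

```python
import numpy as np
from scipy.stats import norm

b = np.array([-1.0, 0.0, 1.0, 2.0])        # item difficulties (illustrative)
item_class = np.array([0, 0, 1, 1])        # item type k(i)
tau = np.array([[0.0, 0.0],                # stage-by-class shifts tau[h, k];
                [0.0, 1.5]])               # stage 2 'unlocks' class-1 items
pi_stage = np.array([0.5, 0.5])            # stage prevalences
quad = np.linspace(-4, 4, 61)              # ability grid
wq = norm.pdf(quad); wq /= wq.sum()        # quadrature weights

def stage_posterior(x):
    """Empirical Bayes P(stage h | response vector x)."""
    lik = np.empty(len(pi_stage))
    for h in range(len(pi_stage)):
        z = quad[:, None] - b + tau[h, item_class]
        p = 1 / (1 + np.exp(-z))                            # quadrature x items
        lik[h] = wq @ np.prod(p**x * (1 - p)**(1 - x), axis=1)
    post = pi_stage * lik
    return post / post.sum()

print(stage_posterior(np.array([1, 1, 0, 0])))
```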
We consider models which combine latent class measurement models for categorical latent variables with structural regression models for the relationships between the latent classes and observed explanatory and response variables. We propose a two-step method of estimating such models. In its first step, the measurement model is estimated alone, and in the second step the parameters of this measurement model are held fixed when the structural model is estimated. Simulation studies and applied examples suggest that the two-step method is an attractive alternative to existing one-step and three-step methods. We derive estimated standard errors for the two-step estimates of the structural model which account for the uncertainty from both steps of the estimation, and show how the method can be implemented in existing software for latent variable modelling.
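A rough sketch of the two-step logic under stand-in assumptions: the paper treats latent class measurement models for categorical indicators, whereas a Gaussian mixture is used below purely to keep the example self-contained, and all names and data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=500)                       # latent class
y = rng.normal(loc=2.0 * z[:, None], size=(500, 3))    # measurement indicators
x = z + rng.normal(size=500)                           # structural covariate

# Step 1: estimate the measurement model alone.
measurement = GaussianMixture(n_components=2, random_state=0).fit(y)

# Step 2: hold the measurement parameters fixed and estimate the structural
# model for class membership. (Classes are assigned modally here for brevity;
# the paper instead maximizes the structural likelihood with the step-1
# parameters held fixed, and corrects standard errors for step-1 uncertainty.)
classes = measurement.predict(y)
structural = LogisticRegression().fit(x.reshape(-1, 1), classes)
print(structural.coef_)
```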
In this paper, various types of finite mixtures of confirmatory factor-analysis models are proposed for handling data heterogeneity. Under the proposed mixture approach, observations are assumed to be drawn from mixtures of distinct confirmatory factor-analysis models. However, each observation need not be assigned to a particular model prior to model fitting. Several classes of mixture models are proposed, differing in how they represent data heterogeneity. Three different sampling schemes for these mixture models are distinguished, and a mixed type of these three sampling schemes is considered throughout this article. The proposed mixture approach reduces to regular multiple-group confirmatory factor analysis under a restrictive sampling scheme, in which the structural equation model for each observation is assumed to be known. By assuming a mixture of multivariate normals for the data, maximum likelihood estimation procedures based on the EM (Expectation-Maximization) algorithm and the AS (Approximate Scoring) method are developed. Some mixture models were fitted to a real data set to illustrate the application of the theory. Although the EM algorithm and the AS method gave similar sets of parameter estimates, the AS method was found to be computationally more efficient than the EM algorithm. Some comments on applying the mixture approach to structural equation modeling are made.
The literature on clustering for continuous data is rich and wide; by contrast, the literature for categorical data is still limited. In some cases, the clustering problem is made more difficult by the presence of noise variables/dimensions that do not contain information about the clustering structure and could mask it. The aim of this paper is to propose a model for the simultaneous clustering and dimensionality reduction of ordered categorical data that detects the discriminative dimensions and discards the noisy ones. Following the underlying response variable approach, the observed variables are considered a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To separate discriminative and noise dimensions, these variables are modelled as linear combinations of two independent sets of second-order latent variables, only one of which contains the information about the cluster structure while the other contains noise dimensions. The model specification involves multidimensional integrals that make maximum likelihood estimation cumbersome and in some cases infeasible. To overcome this issue, parameter estimation is carried out through an EM-like algorithm maximizing a composite log-likelihood based on low-dimensional margins. Applications to real and simulated data illustrate the effectiveness of the proposal.
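To make the underlying-variable idea concrete, a sketch of how one ordinal margin is modelled: the observed category arises from thresholding a latent Gaussian mixture, and a composite log-likelihood sums such low-dimensional margin contributions. The paper works with pairwise margins; a univariate margin keeps this sketch short, and all thresholds and parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm

# One ordinal variable observed in category c when the latent Gaussian falls
# in (tau[c], tau[c+1]]; the latent variable follows a two-component mixture.
tau = np.array([-np.inf, -0.5, 0.5, np.inf])        # thresholds -> 3 categories
pi = np.array([0.6, 0.4])                           # mixture weights
mu, sd = np.array([-1.0, 1.5]), np.array([1.0, 1.0])

# P(X = c) = sum_g pi_g [Phi((tau_{c+1}-mu_g)/sd_g) - Phi((tau_c-mu_g)/sd_g)]
upper = norm.cdf((tau[1:, None] - mu) / sd)
lower = norm.cdf((tau[:-1, None] - mu) / sd)
probs = (pi * (upper - lower)).sum(axis=1)          # one probability per category

x_obs = np.array([0, 2, 1, 1, 2])                   # observed categories
composite_ll = np.log(probs[x_obs]).sum()           # univariate-margin composite
print(probs, composite_ll)
```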
Human response time (RT) data are widely used in experimental psychology to evaluate theories of mental processing. Typically, the data constitute the times taken by a subject to react to a succession of stimuli under varying experimental conditions. Because of the sequential nature of the experiments there are trends (due to learning, fatigue, fluctuations in attentional state, etc.) and serial dependencies in the data. The data also exhibit extreme observations that can be attributed to lapses, intrusions from outside the experiment, and errors occurring during the experiment. Any adequate analysis should account for these features and quantify them accurately. Recognizing that Bayesian hierarchical models are an excellent modeling tool, we focus on the elaboration of a realistic likelihood for the data and on a careful assessment of the quality of fit that it provides. We judge quality of fit in terms of the predictive performance of the model. We demonstrate how simple Bayesian hierarchical models can be built for several RT sequences, differentiating between subject-specific and condition-specific effects.
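One ingredient the abstract highlights, extreme observations, can be captured with a contaminated likelihood; below is a minimal sketch mixing a lognormal RT process with a uniform "lapse" component. The paper's hierarchical structure, trends, and serial dependence are not reproduced, and all parameter names are illustrative.

```python
import numpy as np

def rt_loglik(rt, mu, sigma, lapse, t_max=10.0):
    """Log-likelihood of reaction times under a lognormal process
    contaminated by lapses, modelled as uniform draws on (0, t_max)."""
    rt = np.asarray(rt, dtype=float)
    lognorm_pdf = (np.exp(-(np.log(rt) - mu) ** 2 / (2.0 * sigma ** 2))
                   / (rt * sigma * np.sqrt(2.0 * np.pi)))
    # Each observation is a lapse with probability `lapse`:
    return np.log((1.0 - lapse) * lognorm_pdf + lapse / t_max).sum()

print(rt_loglik([0.4, 0.5, 3.2], mu=-0.7, sigma=0.3, lapse=0.05))
```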
The analysis of insurance and annuity products issued on multiple lives requires statistical models which account for lifetime dependence. This paper presents a Dirichlet process mixture-based approach for modelling dependent lifetimes within a group, such as married couples, accounting for individual as well as group-specific covariates. The model is analyzed in a fully Bayesian setting and illustrated by jointly modelling the lifetimes of male–female couples in a portfolio of joint and last survivor annuities of a Canadian life insurer. The inferential approach accommodates right censoring and left truncation, which are common features of data in survival analysis. The model shows improved in-sample and out-of-sample performance compared to traditional approaches assuming independent lifetimes, and offers additional insights into the determinants of the dependence between lifetimes and their impact on joint and last survivor annuity prices.
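A minimal sketch of the Dirichlet process ingredient: mixture weights drawn by (truncated) stick-breaking, one standard construction for such models; the concentration and truncation values are illustrative.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, seed=0):
    """Truncated stick-breaking construction of Dirichlet process weights:
    w_k = beta_k * prod_{j<k} (1 - beta_j), with beta_k ~ Beta(1, alpha)."""
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

w = stick_breaking(alpha=2.0, n_atoms=50)   # weights sum to ~1 for large n_atoms
print(w[:5], w.sum())
```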
With the increasing prevalence of big data and sparse data, and rapidly growing data-centric approaches to scientific research, students must develop effective data analysis skills at an early stage of their academic careers. This detailed guide to data modeling in the sciences is ideal for students and researchers keen to develop their understanding of probabilistic data modeling beyond the basics of p-values and fitting residuals. The textbook begins with basic probabilistic concepts; models of dynamical systems and likelihoods are then presented to build the foundation for Bayesian inference, Monte Carlo samplers, and filtering. Modeling paradigms are then seamlessly developed, including mixture models, regression models, hidden Markov models, state-space models and Kalman filtering, continuous time processes, and uniformization. The text is self-contained and includes practical examples and numerous exercises. It would be an excellent resource for courses on data analysis within the natural sciences, or as a reference text for self-study.
People often perform poorly on stock-flow reasoning tasks, with many (but not all) participants appearing to erroneously match the accumulation of the stock to the inflow – a response pattern attributed to the use of a “correlation heuristic”. Efforts to improve understanding of stock-flow systems have been limited by the lack of a principled approach to identifying and measuring individual differences in reasoning strategies. We present a principled inferential method known as Hierarchical Bayesian Latent Mixture Models (HBLMMs) to analyze stock-flow reasoning. HBLMMs use Bayesian inference to classify different patterns of responding as coming from multiple latent populations. We demonstrate the usefulness of this approach using a dataset from a stock-flow drawing task which compared performance in a problem presented in a climate change context, a problem in a financial context, and a problem in which the financial context was used as an analogy to assist understanding in the climate problem. The hierarchical Bayesian model showed that the proportion of responses consistent with the “correlation heuristic” was lower in the financial context and financial analogy context than in the pure climate context. We discuss the benefits of HBLMMs and implications for the role of contexts and analogy in improving stock-flow reasoning.
We present a classification methodology that jointly assigns to a decision maker a best-fitting decision strategy for a set of choice data as well as a best-fitting stochastic specification of that decision strategy. Our methodology utilizes normalized maximum likelihood as a model selection criterion to compare multiple, possibly non-nested, stochastic specifications of candidate strategies. In addition to single strategy with “error” stochastic specifications, we consider mixture specifications, i.e., strategies comprised of a probability distribution over multiple strategies. In this way, our approach generalizes the classification framework of Bröder and Schiffer (2003a). We apply our methodology to an existing dataset and find that some decision makers are best fit by a single strategy with varying levels of error, while others are best described as using a mixture specification over multiple strategies.
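To illustrate the NML criterion in the simplest case, consider a single strategy with a constant error rate applied to n binary choices: NML divides the maximized likelihood of the observed data by its sum over all possible datasets. This is a toy instance under stated assumptions, not the paper's full comparison across stochastic specifications.

```python
from math import comb

def nml_score(k, n):
    """Normalized maximum likelihood for a single strategy with a constant
    error rate eps in [0, 1/2]: the strategy predicts each of n binary
    choices and disagrees with the data on k of them."""
    def max_lik(j):
        eps = min(j / n, 0.5)              # MLE of the error rate, capped at 1/2
        return eps ** j * (1 - eps) ** (n - j)
    # Complexity term: maximized likelihood summed over all possible datasets.
    denom = sum(comb(n, j) * max_lik(j) for j in range(n + 1))
    return max_lik(k) / denom              # higher = better NML fit

print(nml_score(k=3, n=20))
```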
Population seroprevalence can be estimated from serosurveys by classifying quantitative measurements into positives (past infection/vaccinated) or negatives (susceptible) according to a fixed assay cut-off. The choice of assay cut-offs has a direct impact on seroprevalence estimates. A time-resolved fluorescence immunoassay (TRFIA) was used to test exposure to human parvovirus 4 (HP4). Seroprevalence estimates were obtained after applying the diagnostic assay cut-off under different scenarios using simulations. Alternative methods for estimating assay cut-offs were proposed based on mixture modelling with component distributions for the past infection/vaccinated and susceptible populations. Seroprevalence estimates were compared to those obtained directly from the data using mixture models. Simulation results showed that when there was good distinction between the underlying populations all methods gave seroprevalence estimates close to the true one. For high overlap between the underlying components, the diagnostic assay cut-off generally gave the most biased estimates. However, the mixture model methods also gave biased estimates which were a result of poor model fit. In conclusion, fixed cut-offs often produce biased estimates but they also have advantages compared to other methods such as mixture models. The bias can be reduced by using assay cut-offs estimated specifically for seroprevalence studies.
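A hedged sketch of the mixture-based alternative to a fixed cut-off: fit a two-component mixture to the quantitative measurements and read seroprevalence off the mixing weight of the higher-mean component. The simulated titres and cut-off value are illustrative, and the paper's component distributions need not be Gaussian.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Illustrative log-antibody measurements: susceptible vs past-infection groups.
titres = np.concatenate([rng.normal(1.0, 0.5, 700),
                         rng.normal(3.0, 0.7, 300)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(titres)
pos = np.argmax(gm.means_.ravel())          # component with the higher mean
prev_mixture = gm.weights_[pos]             # seroprevalence from the mixture

cutoff = 2.0                                # a fixed assay cut-off (illustrative)
prev_cutoff = (titres.ravel() > cutoff).mean()
print(prev_mixture, prev_cutoff)
```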
Insurers have long been concerned about surrenders, especially in the savings business, where huge sums are at stake. The emergence of the European directive Solvency II, which promotes the development of internal risk models (among which a complete unit is dedicated to surrender risk management), strengthens the need to study and understand this risk in depth. In this paper, we investigate the segmentation and modelling of surrenders in order to better take into account the main risk factors impacting policyholders' decisions. We find that several complex aspects must be specifically dealt with to predict surrenders, in particular the heterogeneity of behaviour as well as the context faced by the insured. Combining them, we develop a new methodology that appears to provide good results on given business lines and that, moreover, can be adapted for other products with little effort.
This paper considers the modelling of claim durations for existing claimants under income protection insurance policies. A claim is considered to be terminated when the claimant returns to work. Data used in the analysis were provided by the Life and Risk Committee of the Institute of Actuaries of Australia. Initial analysis of the data suggests the presence of a long-run probability, of the order of 7%, that a claimant will never return to work. This phenomenon suggests the use of mixed parametric regression models as a description of claim duration which include the prediction of a long-run probability of not returning to work. A series of such parametric mixture models was investigated, and it was found that the generalised F mixture distribution provided a good fit to the data and also highlighted the impact of a number of statistically significant predictors of claim duration.
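A minimal sketch of such a mixed parametric ("cure"-type) duration likelihood with right censoring; a Weibull stands in for the generalised F family the paper selects, covariates are omitted, and all values are illustrative.

```python
import numpy as np

def cure_weibull_loglik(t, event, p, shape, scale):
    """Mixture ('cure') survival log-likelihood: with probability p the
    claimant never returns to work; otherwise durations are Weibull.
    t: claim durations; event: 1 = return to work observed, 0 = censored."""
    t, event = np.asarray(t, dtype=float), np.asarray(event, dtype=int)
    S0 = np.exp(-(t / scale) ** shape)                      # Weibull survival
    f0 = (shape / scale) * (t / scale) ** (shape - 1) * S0  # Weibull density
    ll_event = np.log((1 - p) * f0)        # observed return to work
    ll_cens = np.log(p + (1 - p) * S0)     # still on claim at censoring
    return np.where(event == 1, ll_event, ll_cens).sum()

print(cure_weibull_loglik(t=[3.0, 12.0, 30.0], event=[1, 1, 0],
                          p=0.07, shape=1.2, scale=10.0))
```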
The estimation of probabilistic deformable template models in computer vision, or of probabilistic atlases in Computational Anatomy, is a core issue in both fields. A first coherent statistical framework, in which the geometrical variability is modelled as a hidden random variable, was given by [S. Allassonnière et al., J. Roy. Stat. Soc. 69 (2007) 3–29], who introduce a Bayesian approach and a mixture of deformable template models. A consistent stochastic algorithm was introduced in [S. Allassonnière et al. (in revision)] to address the convergence problems encountered in the 2007 paper for the estimation algorithm of the one-component model in the presence of noise. We propose here to continue in this direction, using an “SAEM-like” algorithm to approximate the MAP estimator in the general Bayesian setting of a mixture of deformable template models. We also prove the convergence of our algorithm toward a critical point of the penalised likelihood of the observations, and illustrate this with handwritten digit images and medical images.
The aim is to study the asymptotic behavior of estimators and tests for the components of identifiable finite mixture models of nonparametric densities with a known number of components. Conditions for identifiability of the mixture components and convergence of identifiable parameters are given. The consistency and weak convergence of the identifiable parameters and test statistics are presented for several models.
This paper deals with the likelihood ratio test (LRT) for testing hypotheses on the mixing measure in mixture models with or without a structural parameter. The main result gives the asymptotic distribution of the LRT statistics under some conditions that are proved to be almost necessary. A detailed solution is given for two testing problems: the test of a single distribution against any mixture, with application to Gaussian, Poisson and binomial distributions; and the test of the number of populations in a finite mixture with or without a structural parameter.
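For reference, the LRT statistic for the first of the two problems (a single distribution against a two-component mixture) can be written as below; the notation is illustrative, and the point of the paper is that the asymptotic distribution of this statistic is nonstandard because the mixing parameters are not identified under the null.

```latex
% LRT statistic for testing a single distribution f(\cdot\,;\gamma) against a
% two-component mixture (notation illustrative):
\lambda_n \;=\; 2\Big(\sup_{\pi,\,\gamma_1,\,\gamma_2}\;
    \sum_{i=1}^{n}\log\big[\pi f(x_i;\gamma_1)+(1-\pi)f(x_i;\gamma_2)\big]
  \;-\; \sup_{\gamma}\;\sum_{i=1}^{n}\log f(x_i;\gamma)\Big)
```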