A Note on Improving Variational Estimation for Multidimensional Item Response Theory

Chenchen Ma; Jing Ouyang; Chun Wang; Gongjun Xu

doi:10.1007/s11336-023-09939-0

Published online by Cambridge University Press: 01 January 2025

Chenchen Ma: University of Michigan
Jing Ouyang: University of Michigan
Chun Wang*: University of Washington
Gongjun Xu*: University of Michigan
*: Correspondence should be made to Chun Wang, College of Education, University of Washington, 312 E Miller Hall, 2012 Skagit Lane, Seattle, WA 98105, USA. Email: [email protected]
Correspondence should be made to Gongjun Xu, Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI 48109, USA. Email: [email protected]

Abstract

Survey instruments and assessments are frequently used in many domains of social science. When the constructs that these assessments try to measure become multifaceted, multidimensional item response theory (MIRT) provides a unified framework and a convenient statistical tool for item analysis, calibration, and scoring. However, the computational challenge of estimating MIRT models prohibits their wide use, because many of the extant methods can hardly provide results in a realistic time frame when the number of dimensions, the sample size, and the test length are large. Variational estimation methods, such as the Gaussian variational expectation–maximization (GVEM) algorithm, have recently been proposed to address this challenge by providing a fast and accurate solution. However, results have shown that variational estimation methods may produce some bias in the discrimination parameters during confirmatory model estimation. This note proposes an importance-weighted version of GVEM (i.e., IW-GVEM) to correct for such bias under MIRT models. We also use the adaptive moment estimation method to update the learning rate for gradient descent automatically. Our simulations show that IW-GVEM can effectively correct the bias with a modest increase in computation time compared with GVEM. The proposed method may also shed light on improving variational estimation for other psychometric models.
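To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of an importance-weighted bound for one respondent and of a single adaptive moment estimation (Adam) update. It is an illustration under stated assumptions, not the authors' implementation: it assumes a two-parameter logistic MIRT model with logit a_j'θ - b_j, a standard normal prior on θ, and a Gaussian variational distribution N(mu, cov). The function names (`log_mvn`, `iw_bound`, `adam_step`) and default hyperparameters are hypothetical.

```python
import numpy as np


def log_mvn(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x."""
    k = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape (K, S)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (k * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0))


def iw_bound(y, a, b, mu, cov, n_samples, rng):
    """Importance-weighted estimate of log p(y) for one respondent.

    Assumed model (illustrative): P(y_j = 1 | theta) = sigmoid(a_j' theta - b_j),
    theta ~ N(0, I), proposal q(theta) = N(mu, cov).
    """
    theta = rng.multivariate_normal(mu, cov, size=n_samples)  # (S, K)
    logits = theta @ a.T - b                                   # (S, J)
    # log sigmoid(x) = -logaddexp(0, -x); log(1 - sigmoid(x)) = -logaddexp(0, x)
    log_lik = np.sum(
        -y * np.logaddexp(0.0, -logits) - (1 - y) * np.logaddexp(0.0, logits),
        axis=1,
    )
    log_prior = log_mvn(theta, np.zeros_like(mu), np.eye(mu.shape[0]))
    log_q = log_mvn(theta, mu, cov)
    log_w = log_lik + log_prior - log_q      # log importance weights
    m = log_w.max()                          # stabilise the log-mean-exp
    return m + np.log(np.mean(np.exp(log_w - m)))


def adam_step(params, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for minimising a loss (e.g. the negative bound)."""
    t += 1
    m = beta1 * m + (1 - beta1) * grad       # first-moment running average
    v = beta2 * v + (1 - beta2) * grad**2    # second-moment running average
    m_hat = m / (1 - beta1**t)               # bias corrections
    v_hat = v / (1 - beta2**t)
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v, t
```

With a single sample the estimate reduces to a one-draw evidence lower bound estimate; averaging more importance weights inside the logarithm tightens the bound in expectation, which is the general mechanism that importance weighting relies on. In this sketch the gradients passed to `adam_step` would have to be supplied separately, for example by automatic differentiation.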

Keywords

multidimensional item response theory; Gaussian variational EM; importance sampling

Type: Theory & Methods
Information: Psychometrika, Volume 89, Issue 1, March 2024, pp. 172–204

DOI: https://doi.org/10.1007/s11336-023-09939-0
Copyright: Copyright © 2023 The Author(s), under exclusive licence to The Psychometric Society.

