A Spectral Method for Identifiable Grade of Membership Analysis with Binary Responses

Ling Chen; Yuqi Gu

doi:10.1007/s11336-024-09951-y

Published online by Cambridge University Press: 27 December 2024

Ling Chen: Columbia University
Yuqi Gu*: Columbia University
*Correspondence should be made to Yuqi Gu, Department of Statistics, Columbia University, New York, NY 10027, USA. Email: [email protected]


Abstract

Grade of membership (GoM) models are popular individual-level mixture models for multivariate categorical data. GoM allows each subject to have mixed memberships in multiple extreme latent profiles. Therefore, GoM models have a richer modeling capacity than latent class models that restrict each subject to belong to a single profile. The flexibility of GoM comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a singular value decomposition (SVD)-based spectral approach to GoM analysis with multivariate binary responses. Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. We also establish the consistency of our estimator in the double-asymptotic regime where both the number of subjects and the number of items grow to infinity. Our spectral method has a huge computational advantage over Bayesian or likelihood-based methods and is scalable to large-scale and high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate our method by applying it to a personality test dataset.
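The estimation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a notation the abstract does not spell out: an N x K membership matrix `Pi` whose rows lie on the probability simplex, a J x K item parameter matrix `Theta` with entries in [0, 1], and a binary data matrix `R` with expectation `Pi @ Theta.T`. It also assumes that every extreme profile has at least one pure subject. The function names `simulate_gom`, `successive_projection`, and `spectral_gom` are hypothetical.

```python
import numpy as np


def simulate_gom(N, J, K, seed=0):
    """Simulate binary responses with E[R] = Pi @ Theta.T (assumed notation)."""
    rng = np.random.default_rng(seed)
    Pi = rng.dirichlet(np.ones(K), size=N)      # membership scores, rows on the simplex
    Pi[:K] = np.eye(K)                          # assumption: one pure subject per profile
    Theta = rng.uniform(0.1, 0.9, size=(J, K))  # item parameters for each extreme profile
    R = rng.binomial(1, Pi @ Theta.T)           # Bernoulli responses
    return Pi, Theta, R


def successive_projection(U, K):
    """Successive projection: pick K rows of U that approximate the simplex vertices."""
    Y = U.astype(float).copy()
    picked = []
    for _ in range(K):
        norms = np.linalg.norm(Y, axis=1)
        i = int(np.argmax(norms))               # row farthest from the origin
        picked.append(i)
        u = Y[i] / norms[i]
        Y = Y - np.outer(Y @ u, u)              # project out the chosen direction
    return picked


def spectral_gom(R, K):
    """Estimate (Pi, Theta) from the top-K SVD of R; a sketch, not the published algorithm."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U, s, Vt = U[:, :K], s[:K], Vt[:K]          # keep only the K leading singular triplets
    S = successive_projection(U, K)             # indices of estimated pure subjects
    Pi = U @ np.linalg.inv(U[S])                # rows of U are convex combinations of U[S]
    Pi = np.clip(Pi, 0.0, None)
    Pi = Pi / Pi.sum(axis=1, keepdims=True)     # renormalize rows onto the simplex
    Theta = np.clip((Vt.T * s) @ U[S].T, 0.0, 1.0)  # Theta = V Sigma U_S^T
    return Pi, Theta, S


Pi, Theta, R = simulate_gom(N=500, J=50, K=3, seed=0)
Pi_hat, Theta_hat, S = spectral_gom(R, K=3)
```

On a noiseless input such as `Pi @ Theta.T`, this sketch recovers `Pi` and `Theta` up to a permutation of the profiles; on the sampled matrix `R` the estimates are only approximate. The published method may include refinements, such as handling of noisy vertex selection, that are not shown here.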

Keywords

grade of membership model; identifiability; latent variable model; mixed membership model; successive projection algorithm; singular value decomposition; spectral method

Type: Theory & Methods
Information: Psychometrika, Volume 89, Issue 2, June 2024, pp. 626–657

DOI: https://doi.org/10.1007/s11336-024-09951-y
Copyright: Copyright © 2024 The Author(s), under exclusive licence to The Psychometric Society
