
A Spectral Method for Identifiable Grade of Membership Analysis with Binary Responses

Published online by Cambridge University Press: 27 December 2024

Ling Chen
Affiliation:
Columbia University
Yuqi Gu*
Affiliation:
Columbia University
* Correspondence should be made to Yuqi Gu, Department of Statistics, Columbia University, New York, NY 10027, USA. Email: [email protected]

Abstract

Grade of membership (GoM) models are popular individual-level mixture models for multivariate categorical data. GoM allows each subject to have mixed memberships in multiple extreme latent profiles. Therefore, GoM models have a richer modeling capacity than latent class models that restrict each subject to belong to a single profile. The flexibility of GoM comes at the cost of more challenging identifiability and estimation problems. In this work, we propose a singular value decomposition (SVD)-based spectral approach to GoM analysis with multivariate binary responses. Our approach hinges on the observation that the expectation of the data matrix has a low-rank decomposition under a GoM model. For identifiability, we develop sufficient and almost necessary conditions for a notion of expectation identifiability. For estimation, we extract only a few leading singular vectors of the observed data matrix and exploit the simplex geometry of these vectors to estimate the mixed membership scores and other parameters. We also establish the consistency of our estimator in the double-asymptotic regime where both the number of subjects and the number of items grow to infinity. Our spectral method has a huge computational advantage over Bayesian or likelihood-based methods and is scalable to large-scale and high-dimensional data. Extensive simulation studies demonstrate the superior efficiency and accuracy of our method. We also illustrate our method by applying it to a personality test dataset.
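To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the expected response matrix factors as Pi @ Theta.T, where Pi (subjects by profiles) has rows on the probability simplex and Theta (items by profiles) holds item response probabilities. It also assumes that each extreme profile has at least one "pure" subject, and it uses a successive-projection step as one possible way to locate the simplex vertices. The names `successive_projection` and `spectral_gom`, and all simulation settings, are hypothetical.

```python
import numpy as np


def successive_projection(U, K):
    """Pick K rows of U as approximate simplex vertices (successive projection)."""
    Y = U.copy()
    idx = []
    for _ in range(K):
        # The row with the largest residual norm is a vertex in the noiseless case.
        j = int(np.argmax(np.sum(Y ** 2, axis=1)))
        idx.append(j)
        u = Y[j] / np.linalg.norm(Y[j])
        # Project every row onto the orthogonal complement of the chosen row.
        Y = Y - np.outer(Y @ u, u)
    return idx


def spectral_gom(R, K):
    """Estimate membership scores and item parameters from an N x J matrix R."""
    # Keep only the K leading singular triplets.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U, s, Vt = U[:, :K], s[:K], Vt[:K]
    S = successive_projection(U, K)           # indices of estimated pure subjects
    # Rows of U are (approximately) convex combinations of the vertex rows U[S].
    Pi = U @ np.linalg.inv(U[S])
    Pi = np.clip(Pi, 0.0, None)
    Pi = Pi / np.maximum(Pi.sum(axis=1, keepdims=True), 1e-12)
    # Item parameters are the rank-K reconstruction restricted to the pure subjects.
    Theta = np.clip(((U[S] * s) @ Vt).T, 0.0, 1.0)
    return Pi, Theta, S


# Simulated example (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
N, J, K = 500, 40, 3
Pi_true = rng.dirichlet(np.ones(K), size=N)
Pi_true[:K] = np.eye(K)                       # one pure subject per profile
Theta_true = rng.uniform(0.1, 0.9, size=(J, K))
R0 = Pi_true @ Theta_true.T                   # expected responses, rank K
R = rng.binomial(1, R0)                       # observed binary responses

# On the noiseless expectation, recovery is exact up to a relabeling of profiles.
Pi_hat, Theta_hat, S = spectral_gom(R0, K)
# On the binary data, the same steps give noisy estimates.
Pi_noisy, Theta_noisy, S_noisy = spectral_gom(R.astype(float), K)
```

Column k of the estimates corresponds to the profile of the k-th selected pure subject, so any comparison with the true parameters has to account for that relabeling. The clipping and renormalization steps are ad hoc choices made here to keep the estimates in their valid ranges.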

Type
Theory & Methods
Copyright
Copyright © 2024 The Author(s), under exclusive licence to The Psychometric Society

