The Use of Data Imputation when Investigating Dimensionality in Sparse Data from Computerized Adaptive Tests
Keywords:
Computerized Adaptive Testing, CART, Imputation, MICE, Sparseness

Abstract
The development of a Computerized Adaptive Test (CAT) for operational use begins with several important steps, such as creating a large item bank, piloting the items on a sizable and representative sample of examinees, assessing the dimensionality of the item bank, and estimating item parameters. Among these steps, testing the dimensionality of the item bank is particularly important because the subsequent analyses depend on the confirmation of the hypothesized factor structure (e.g., unidimensionality). After the CAT becomes operational, it remains important to periodically assess the dimensionality of the item bank because both the examinee population and the item bank may change over time. However, the extreme sparseness of the response data returned from a CAT makes the test of dimensionality very difficult. This study investigated whether data imputation can be a feasible solution to the sparseness problem when examining test dimensionality in sparse data returned from CATs. Sparse data with unidimensional, multidimensional, and bi-factor test structures were simulated based on real data from a large-scale, operational CAT. Two-way imputation and Multivariate Imputation by Chained Equations (MICE) methods were used to replace missing responses in the data. Using confirmatory factor analysis, the imputed datasets were analyzed to examine whether the true test structure was retained after imputation. Results indicated that MICE with classification and regression trees (MICE-CART) was highly accurate in retaining the true structure, whereas the performance of the other imputation methods was quite poor. Data imputation with MICE-CART appears to be a promising solution to data sparsity when examining test dimensionality for CATs.
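The core procedure described above, replacing the missing cells of a sparse CAT response matrix with values drawn from tree-based chained-equations models before factor-analyzing the completed data, can be sketched in Python. This is a minimal illustration only, not the study's actual pipeline (which used the R `mice` package with the CART method and confirmatory factor analysis in Mplus): it uses scikit-learn's `IterativeImputer` with a decision-tree estimator as a CART-style stand-in, and the response matrix, sample sizes, and missingness rate are all invented for the example.

```python
# Hypothetical sketch: MICE-style imputation of a sparse 0/1 response matrix
# using a tree-based conditional model, loosely analogous to MICE-CART.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
n_examinees, n_items = 200, 10  # invented sizes for illustration

# Simulate complete dichotomous responses, then mask ~60% of the cells
# to mimic the sparseness of an operational CAT response matrix.
responses = rng.integers(0, 2, size=(n_examinees, n_items)).astype(float)
mask = rng.random(responses.shape) < 0.6
sparse = responses.copy()
sparse[mask] = np.nan

# Chained-equations imputation: each item is modeled in turn from the
# others, here with a regression tree as the conditional model.
imputer = IterativeImputer(
    estimator=DecisionTreeRegressor(max_depth=5, random_state=0),
    max_iter=10,
    random_state=0,
)
imputed = imputer.fit_transform(sparse)

# Round back to 0/1 so the completed matrix stays dichotomous before
# passing it to a factor-analysis routine.
imputed = np.clip(np.rint(imputed), 0.0, 1.0)
print(imputed.shape)
```

The completed matrix can then be submitted to confirmatory factor analysis to check whether the hypothesized structure (unidimensional, multidimensional, or bi-factor) is recovered.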