Group comparison is one of the fundamental analytical procedures in psychology and in science more generally. When comparing groups, it is assumed that the scale, questionnaire, or metric measures the same construct in all groups. Owing to measurement non-invariance, however, this is not always the case. It is therefore important to test for measurement invariance when using such measures, in order to establish their psychometric equivalence across groups and thereby enable meaningful comparisons between them. Measurement invariance testing using factor analysis involves a sequence of nested models. First, configural invariance (an equivalent factor structure) is tested, allowing a unified description of test scores. This is followed by a test of metric invariance (equality of factor loadings), which permits the same interpretation of the factors across groups. Once the third nested model, assessing scalar invariance (equality of intercepts), has been supported, direct comparisons between groups of participants can be made. At each step, the fit of the more restrictive model is compared with that of the baseline model. Model fit is assessed using fit indices such as the CFI, RMSEA, and SRMR. This research used a simulation study to explore whether methods for numerical or for categorical data are more suitable for analysing measurement invariance. As Likert scales are often used for group comparisons in psychological research, measurement invariance was tested on simulated Likert-type survey responses. With these data, the behaviour of the fit indices and of the p-values of the χ² test was investigated as a function of sample size, number of response categories, number of items, data asymmetry, and simulated data bias. It was then examined whether the fit indices stayed within their threshold values for metric and scalar invariance. Measurement invariance was confirmed when three out of four indicators showed a good fit.
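The three-out-of-four decision rule described above can be sketched as a small helper function. This is a minimal illustration, not the study's actual analysis code: the specific cut-off values (ΔCFI ≤ .01, RMSEA ≤ .06, SRMR ≤ .08, α = .05) are commonly used conventions assumed here for demonstration, and the fit values in the usage example are invented; in practice they would come from fitting the configural, metric, and scalar models in a CFA/SEM package.

```python
def invariance_step_holds(cfi_baseline, cfi_restricted,
                          rmsea_restricted, srmr_restricted, chi2_diff_p,
                          delta_cfi_cutoff=0.01, rmsea_cutoff=0.06,
                          srmr_cutoff=0.08, alpha=0.05):
    """Return True if at least three of four indicators support invariance
    for one nested-model step (e.g. metric vs. configural).

    Cut-offs are illustrative defaults, not values taken from the study.
    """
    checks = [
        # Drop in CFI from baseline to restricted model stays small
        (cfi_baseline - cfi_restricted) <= delta_cfi_cutoff,
        # Absolute fit of the restricted model remains acceptable
        rmsea_restricted <= rmsea_cutoff,
        srmr_restricted <= srmr_cutoff,
        # Non-significant chi-square difference test
        chi2_diff_p >= alpha,
    ]
    return sum(checks) >= 3


# Hypothetical fit values: all four indicators pass, so invariance holds.
print(invariance_step_holds(0.970, 0.965, 0.050, 0.060, 0.20))
# Hypothetical fit values: all four indicators fail, so invariance is rejected.
print(invariance_step_holds(0.970, 0.940, 0.080, 0.100, 0.01))
```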
The results therefore indicate that categorical methods are more suitable for analysing measurement invariance in samples of 500 and 1,000 participants, while numerical methods are similarly or better suited for samples using a seven-point Likert scale. The CFI and the p-value of the χ² test proved to be the most appropriate measures for evaluating differences between models for categorical data. This study is one of very few to have explored the impact of several factors on fit indices when assessing the measurement invariance of Likert scales.