This document provides an overview of structural equation modeling (SEM). It discusses the basic components and goals of SEM, including latent and observed variables, measurement models, structural models, and error terms. It also covers model identification, parameters, building SEM models through specification, estimation, and modification. Goodness-of-fit indices for model testing are discussed, including chi-square, GFI, AGFI, and RMR. The document is intended as an introduction and guide to conceptualizing and conducting SEM analyses.
Spring 2012
1 What is structural equation modeling (SEM)?
SEM is used to test hypotheses about potential interrelationships among constructs, as well as their relationships to the indicators or measures that assess them.
2 Theory of planned behavior (TPB)
Goals of SEM
To determine whether the theoretical model is supported by the sample data, that is, whether the model fits the data well. SEM helps us understand complex relationships among constructs.
3 [Path diagram: two factors (Factor1, Factor2), six indicators (Indica1-Indica6), and six error terms (error1-error6)]
4 Example of SEM
5
[Path diagram labeled with its two measurement models and its structural model] Example of SEM
Basic components of SEM
Latent variables (constructs/factors): the hypothetical constructs of interest in a study, such as self-control, self-efficacy, and intention. They cannot be measured directly.
Observed variables (indicators): the variables that are actually measured during data collection using a developed instrument or test. They are used to define or infer the latent variable or construct. Each observed variable represents one definition of the latent variable.
6 Basic components of SEM
Endogenous variables (dependent variables): variables that have at least one arrow leading into them from another variable.
Exogenous variables (independent variables): variables that do not have any arrow leading into them.
7 Basic components of SEM
Measurement error terms: represent the amount of variation in an indicator that is due to measurement error.
Structural error terms (disturbance terms): the unexplained variance in the latent endogenous variables due to all unmeasured causes.
8 Basic components of SEM
Covariance: a measure of how much two variables change together. A two-way arrow shows a covariance.
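The covariance just defined can be illustrated numerically. The following Python sketch (not part of AMOS; the data are made up for illustration) computes a sample covariance by hand and checks it against numpy:

```python
import numpy as np

# Toy data: two variables measured on five cases (made-up numbers).
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Sample covariance: average product of deviations from the means,
# with an n - 1 denominator.
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
print(cov_xy)              # 10.0
print(np.cov(x, y)[0, 1])  # same value from numpy's covariance matrix
```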
9 Graphs in AMOS
Rectangle: observed variable
Circle or ellipse: unobserved (latent) variable
Two-way arrow: covariance or correlation
One-way arrow: unidirectional relationship
10-11 [Annotated path diagram: latent variables, observed variables, measurement error terms, a covariance, paths, and a structural error term]
12 Model parameters
Model parameters are those characteristics of the model that are unknown to the researchers. They have to be estimated from the sample covariance or correlation matrix.
Model parameters include regression weights/factor loadings, structural coefficients, variances, and covariances. Each potential parameter in a model must be specified as a fixed, free, or constrained parameter.
13 Model parameters
Free parameters: unknown and need to be estimated.
Fixed parameters: not free; fixed to a specified value, typically 0 or 1.
Constrained parameters: unknown, but constrained to equal one or more other parameters.
14-15 [Diagram marking fixed and free parameters; if opp_v1 = opp_v2, they are constrained parameters]
Build SEM models
Model specification: the exercise of formally stating a model. Prior to data collection, develop a theoretical model based on theory, empirical studies, etc.: which variables are included in the model, and how these variables are related.
Misspecified model: results from errors of omission and/or inclusion of any variable or parameter.
16 Model identification: whether the model can, in theory and in practice, be estimated with observed data.
Under-identified model: one or more parameters cannot be uniquely determined from the observed data; it is not possible to estimate all of the model's parameters.
17 Model identification
Just-identified model (saturated model): all of the parameters are uniquely determined; for each free parameter, a value can be obtained through only one manipulation of the observed data. The degrees of freedom equal zero (the number of free parameters exactly equals the number of known values), so the model fits the data perfectly.
Over-identified model: all the parameters are identified and there are more knowns than free parameters.
18 A just- or over-identified model is an identified model. If a model is under-identified, additional constraints may make it identified. The number of free parameters to be estimated must be less than or equal to the number of distinct values in the matrix S. The number of distinct values in S is p(p + 1)/2, where p is the number of observed variables.
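The p(p + 1)/2 counting rule above can be sketched in a few lines of Python (an illustration only; the function names are made up for this example):

```python
# Number of distinct values (knowns) in the sample covariance matrix S
# for p observed variables: the p variances plus p(p - 1)/2 covariances.
def known_values(p):
    return p * (p + 1) // 2

# df = knowns - free parameters; df == 0 -> just-identified (saturated),
# df > 0 -> over-identified, df < 0 -> under-identified.
def degrees_of_freedom(p, free_params):
    return known_values(p) - free_params

print(known_values(6))            # 21 knowns for 6 observed variables
print(degrees_of_freedom(6, 13))  # 8 -> over-identified
```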
19 How to avoid identification problems
To achieve identification, one of the factor loadings for each latent variable must be fixed to one. The variable with a fixed loading of one is called a marker variable or reference item. This method solves the scale indeterminacy problem.
Have enough indicators of each latent variable. A simple rule that works most of the time: at least two indicators per latent variable, with uncorrelated indicator errors.
Use a recursive model.
Design a parsimonious model.
20 Rules for building SEM model
All variances of independent variables are model parameters.
All covariances between independent variables are model parameters.
All factor loadings connecting the latent variables and their indicators are parameters.
All regression weights between observed or latent variables are parameters.
21 Rules for building SEM model
The variances and covariances of dependent variables, and covariances between dependent and independent variables, are NOT parameters.
*For each latent variable included in the model, the metric of its latent scale needs to be set. For any independent latent variable, one path leaving the latent variable is set to 1.
*Paths leading from the error terms to their corresponding observed variables are assumed to be equal to 1.
22-23 [Diagrams illustrating these rules]
Build SEM models: Model estimation
How do SEM programs estimate the parameters? The proposed model makes certain assumptions about the relationships between the variables in the model, and therefore has specific implications for the variances and covariances of the observed variables.
24 We want to estimate the parameters specified in the model that produce the implied covariance matrix Σ, making Σ as close as possible to S, the sample covariance matrix of the observed variables. If the elements of S minus the elements of Σ equal zero, then chi-square equals zero and we have a perfect fit.
25 How do SEM programs estimate the parameters? In SEM, the parameters of a proposed model are estimated by minimizing the discrepancy between the empirical covariance matrix S and the covariance matrix implied by the model, Σ. How should this discrepancy be measured? That is the role of the discrepancy function. S is the sample covariance matrix calculated from the observed data; Σ is the reproduced (model-implied) covariance matrix determined by the proposed model.
26 How do SEM programs estimate the parameters? If the difference between S and Σ (the distance between the matrices) is small, one can conclude that the proposed model is consistent with the observed data. If the difference between S and Σ is large, one can conclude that the proposed model does not fit the data: either the proposed model is deficient or the data are not good.
27 Build SEM models: Model estimation
The estimation process uses a particular fit function to minimize the difference between S and Σ. If the difference is 0, the model fits the data perfectly.
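As a concrete (and simplified) illustration of a discrepancy function, the standard maximum-likelihood fit function is F_ML = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p, which is zero when Σ equals S. The Python sketch below (made-up matrices, not AMOS output) evaluates it with numpy:

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy between sample covariance S and model-implied
    covariance Sigma; equals 0 when the two matrices are identical."""
    p = S.shape[0]
    return (np.log(np.linalg.det(Sigma))
            + np.trace(S @ np.linalg.inv(Sigma))
            - np.log(np.linalg.det(S)) - p)

S = np.array([[4.0, 1.2],
              [1.2, 2.5]])
print(f_ml(S, S.copy()))  # ~0: perfect fit when Sigma equals S
```

Estimation then amounts to choosing the free parameters so that the Σ they imply minimizes this function.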
28 Model estimation methods
The two most commonly used estimation techniques are maximum likelihood (ML) and normal-theory generalized least squares (GLS). ML and GLS assume a large sample size, continuous data, and multivariate normality.
Unweighted least squares (ULS): scale dependent.
Asymptotically distribution free (ADF; weighted least squares, WLS): for serious departures from normality.
29-30 [Table: estimation methods that assume normality (ML, GLS) vs. those that do not (ULS, ADF)]
Model testing
We want to know how well the model fits the data. If S and Σ are similar, we may say the proposed model fits the data (model fit indices). For each individual parameter, we want to know whether a free parameter is significantly different from zero, and whether its estimate makes sense.
31 Chi-square test Value ranges from zero for a saturated model with all paths included to a maximum for the independence model (the null model or model with no parameters estimated).
32 Build SEM models: Model modification
If the model does not fit the data, we need to modify the model. Perform a specification search: change the original model in search of a better-fitting model.
33 Goodness-of-fit tests based on predicted vs. observed covariances (absolute fit indices)
Chi-square (CMIN): a non-significant χ² value indicates that S and Σ are similar; χ² should NOT be significant if the model fits well.
Goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI): GFI measures the amount of variance and covariance in S that is predicted by Σ; AGFI adjusts GFI for the degrees of freedom of the model relative to the number of variables.
34 Goodness-of-fit tests based on predicted vs. observed covariances (absolute fit indices)
Root-mean-square residual (RMR): the closer RMR is to 0, the better the model fit.
Hoelter's critical N (the Hoelter index) is used to judge whether the sample size is adequate. By convention, the sample size is adequate if Hoelter's N > 200; a Hoelter's N under 75 is considered unacceptably low to accept a model by chi-square. Two N's are output, one at the .05 and one at the .01 level of significance.
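The RMR just described is simply the root mean square of the residuals between S and Σ over the non-duplicated elements of the matrices. A minimal Python sketch (made-up 2×2 matrices; one common RMR convention, averaging over the lower triangle including the diagonal):

```python
import numpy as np

def rmr(S, Sigma):
    """Root-mean-square residual between the sample covariance S and the
    model-implied covariance Sigma, over the non-duplicated elements."""
    idx = np.tril_indices_from(S)   # lower triangle including the diagonal
    resid = (S - Sigma)[idx]
    return np.sqrt(np.mean(resid ** 2))

S     = np.array([[4.0, 1.2], [1.2, 2.5]])
Sigma = np.array([[4.0, 1.0], [1.0, 2.5]])
print(rmr(S, Sigma))  # small residual only in the covariance element
```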
35 Information-theoretic goodness of fit (absolute fit indices)
Measures in this set are appropriate when comparing models estimated with maximum likelihood: AIC, BIC, CAIC, and BCC.
AIC: for model comparison, the lower AIC reflects the better-fitting model. AIC also penalizes for lack of parsimony.
BIC: the Bayesian information criterion penalizes for sample size as well as model complexity. It is recommended when the sample size is large or the number of parameters in the model is small.
36 Information-theoretic goodness of fit (absolute fit indices)
CAIC: an alternative that, like AIC, penalizes for sample size as well as model complexity (lack of parsimony). The penalty is greater than for AIC or BCC but less than for BIC. The lower the CAIC, the better the fit.
BCC: the Browne-Cudeck criterion penalizes for model complexity (lack of parsimony) more than AIC; as with AIC, lower values indicate better fit.
37 Goodness-of-fit tests comparing the given model with a null or an alternative model: CFI, NFI.
Goodness-of-fit tests penalizing for lack of parsimony: parsimony ratio (PRATIO), PNFI, PCFI.
38 Scaling and normality assumption
Maximum likelihood and normal-theory generalized least squares assume that the measured variables are continuous and have a multivariate normal distribution. In the social sciences, many variables are dichotomous or ordered-categorical rather than truly continuous, and it is common for the distributions of observed variables to depart substantially from multivariate normality.
39 Scaling and normality assumption
Nominal or ordinal variables should have at least five categories and should not be strongly skewed or kurtotic: values of skewness and kurtosis should be within -1 and +1.
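The ±1 screening rule above can be checked outside AMOS as well. A hedged Python sketch using scipy (simulated data, not a real indicator; scipy reports excess kurtosis, which is 0 for a normal distribution):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
x = rng.normal(size=500)   # simulated, roughly normal "indicator"

sk = skew(x)
ku = kurtosis(x)           # excess kurtosis: 0 for a normal distribution
ok = (-1 <= sk <= 1) and (-1 <= ku <= 1)
print(sk, ku)
print("within the +/-1 screening range:", ok)
```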
40 Problems of non-normality (practical implications)
Inflated χ² goodness-of-fit statistics, which can lead to inappropriate modifications of theoretically adequate models. Findings can be expected to fail to replicate, contributing to confusion in research areas.
41 How to detect non-normality of observed data
Screen the data before the analysis to check the distributions. Skewness and kurtosis assess univariate normality. AMOS provides normality results.
42 Solutions to non-normality
Asymptotically distribution free (ADF) estimation: ADF produces asymptotically unbiased estimates of the χ² goodness-of-fit test, parameter estimates, and standard errors. Limitation: requires a large sample size.
43 Solutions to non-normality
Unweighted least squares (ULS): no assumption of normality, but no significance tests are available; scale dependent.
Bootstrapping: does not rely on a normal distribution.
Bayesian estimation: useful if ordered-categorical data are modeled.
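To illustrate the bootstrapping idea mentioned above: resample cases with replacement and recompute the statistic of interest each time, building an empirical sampling distribution that does not assume normality. A Python sketch with simulated, deliberately skewed data (everything here is made up for illustration; AMOS has its own bootstrap facility):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
# Simulated non-normal data: two correlated, right-skewed indicators.
latent = rng.exponential(size=n)
x1 = latent + 0.5 * rng.exponential(size=n)
x2 = latent + 0.5 * rng.exponential(size=n)

# Bootstrap: resample cases with replacement, recompute the covariance.
boots = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    boots.append(np.cov(x1[idx], x2[idx])[0, 1])

lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"95% bootstrap CI for cov(x1, x2): [{lo:.3f}, {hi:.3f}]")
```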
44 Sample size (rules of thumb)
10 or 20 subjects per variable; 250-500 subjects (Schumacker & Lomax, 2004).
45 Computer programs for SEM: AMOS, EQS, LISREL, MPLUS, SAS.
46 AMOS is short for Analysis of MOment Structures. It is software for structural equation modeling (SEM) and a program for visual SEM.
47 Path diagrams
Path diagrams are the way to communicate a SEM model: pictures showing the relationships among latent and observed variables. In AMOS, rectangles represent observed variables and ellipses represent latent variables.
48 Example of using the AMOS toolbar to draw a diagram.
Example: two latent variables (intention and self-efficacy), four observed variables (intention01, intention02, self_efficacy01, and self_efficacy02), and five error terms.
49 The model should look like this.
50 Go to All Programs from Start > IBM SPSS Statistics > IBM SPSS AMOS19 > AMOS Graphics.
51-52 [Screenshots: latent variables, observed variables, and the toolbar]
Toolbar: draw observed variables with the Rectangle tool, latent variables with the Ellipse tool, and error terms with the unique-variable (error) tool.
53 Use Duplicate Objects to get another part of the model, then use Reflect.
54-55 Open data: File > Data Files, then click your file.
56 Put observed variable names into the graph: go to View > Variables in Dataset, then drag each variable onto its rectangle.
57 Put latent variable names into the graph: put the mouse over a latent variable, right-click to get the menu, click Object Properties, and type Self-efficacy.
58 For error terms, double-click the ellipse to get the Object Properties window.
Constrain parameters: double-click a path from Self-efficacy to self_efficacy01, type 1 for the regression weight, then click Close.
59 The data are from the AMOS examples (IBM SPSS). Attig repeated the study with the same 40 subjects after a training exercise intended to improve memory performance, so there were three performance measures before training and three performance measures after training.
60 Draw the diagram.
61 Conduct the analysis: Analyze > Calculate Estimates.
62 Text output:
1. Number of distinct sample moments: sample means, variances, and covariances (AMOS ignores means here). With four observed variables, 4(4+1)/2 = 10.
2. Number of distinct parameters to be estimated: 4 variances and 6 covariances.
3. Degrees of freedom: number of distinct sample moments minus number of distinct parameters.
63 Text output: there is no null hypothesis being tested in this example, so the chi-square result is not very interesting.
For a hypothesis test, the chi-square value is a measure of the extent to which the data are incompatible with the hypothesis, and the result will have positive degrees of freedom. A chi-square value of 0 indicates no departure from the null hypothesis.
64-65 Text output: "Minimum was achieved" indicates that AMOS successfully estimated the variances and covariances. When AMOS fails, it is because you have posed a problem that has no solution, or no unique solution (a model identification problem).
66 Text output:
1. Estimate means the covariance estimate; for example, the covariance between recall1 and recall2 is 2.556.
2. S.E. is an estimate of the standard error of the covariance, 1.16.
3. C.R. is the critical ratio, obtained by dividing the covariance estimate by its standard error.
4. For a significance level of 0.05, a critical ratio that exceeds 1.96 is called significant. This ratio is relevant to the null hypothesis that the covariance between recall1 and recall2 is 0.
67 Text output:
5. In this example, 2.203 is greater than 1.96, so the covariance between recall1 and recall2 is significantly different from 0 at the 0.05 level.
6. The p value of 0.028 (two-tailed) tests the null hypothesis that the parameter value is 0 in the population.
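The critical-ratio arithmetic above can be reproduced directly. A Python sketch using the estimate (2.556) and standard error (1.16) from the AMOS output, with scipy's standard normal distribution for the two-tailed p value:

```python
from scipy.stats import norm

estimate, se = 2.556, 1.16       # covariance of recall1/recall2 and its S.E.
cr = estimate / se               # critical ratio
p = 2 * norm.sf(abs(cr))         # two-tailed p under the standard normal
print(round(cr, 3), round(p, 3)) # 2.203 0.028, matching the AMOS output
```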