
Development and Psychometric Evaluation of Scales: A Survey of Published Articles

2016, DOAJ (Directory of Open Access Journals)

Review Article

Development and Psychometric Evaluation of Scales: A Survey of Published Articles

Foroozan Atashzadeh-Shoorideh, PhD (1); Farideh Yaghmaei, PhD (2)*
1. PhD in Nursing, Nursing Management Department, School of Nursing & Midwifery, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2. Department of Nursing, Zanjan Branch, Islamic Azad University, Zanjan, Iran
* Corresponding author: Farideh Yaghmaei, Associate Professor, Department of Nursing, Zanjan Branch, Islamic Azad University, Zanjan, Iran. Email: farideh.yaghmaei@iauz.ac.ir

Journal of Medical Education Summer 2015; 14(4):174-205

Abstract
Background and purpose: Using valid and reliable instruments is an important part of collecting data in quantitative research. This paper reports a study conducted to examine the extent to which psychometric properties of scales are reported in research papers published in the Journal of Advanced Nursing.
Methods: The Journal of Advanced Nursing was chosen for systematic review. All articles published in this journal during 2007-2009 were collected, and articles related to instrument development were selected. Each article was reviewed in full to identify the methods used to establish instrument validity and reliability.
Results: Of the 980 articles published in the Journal of Advanced Nursing during 2007-2009, 41 (4.18%) concerned research methodology. Of these, 12 articles (29.27%) were related to developing an instrument. The review showed that some of the articles did not measure psychometric properties properly, and some of the developed scales still need other necessary types of validity to be assessed. In addition, reliability testing needs to be performed on each instrument used in a study before other statistical analyses are performed. All 12 articles measured and reported Cronbach's alpha, but four of them did not measure test-retest reliability.
Conclusions: Although researchers put great emphasis on methodology and statistical analysis, they pay less attention to the psychometric properties of their new instruments. The authors of this article hope to draw researchers' attention to the importance of measuring the psychometric properties of new instruments.
Keywords: PSYCHOMETRIC, SCALES, CRITICAL REVIEW

Introduction
The credibility of results from a study is totally dependent on identifying, measuring, and collecting the right variables. Instruments are used to measure variables directly from subjects (1), and research instruments refer to questionnaires or inventories on which data from a research project can be entered and stored for later analysis. An important part of the process of developing a questionnaire is ensuring its validity and reliability (2). Using a valid and reliable instrument is an integral part of any research; since the interpretation of results depends on the validity of the instruments used, researchers should be sure about it (3). Validity is a significant and complicated issue that is considered by authors as well as readers (4). Types of validity include face validity, content validity, construct validity (factor analysis, convergent validity, divergent validity, discriminant analysis), criterion validity (concurrent validity and predictive validity), and successive verifications (5).
Measuring and reporting the content validity of instruments is very important (6). Some authors have reported the process of measuring content validity in their articles, while others have not. This type of validity can also help to ensure construct validity and gives readers and researchers confidence in the instrument. Content validity is used to establish that an instrument measures the variables of interest; it is also known as content-related validity, intrinsic validity, relevance validity, representative validity, and logical or sampling validity (7-9). Content validity therefore measures the comprehensiveness and representativeness of the content of a scale (10, 11).
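As an illustration of how content validity is commonly quantified, the sketch below computes an item-level content validity index (I-CVI, the proportion of experts rating an item as relevant) and an averaged scale-level CVI (S-CVI/Ave). The expert panel, the ratings and the 4-point relevance scale are hypothetical; the cited texts (11, 14, 16) should be consulted for acceptable cut-off values.

```python
# Illustrative content validity index (CVI) calculation with hypothetical ratings.
# Each of five experts rates each item for relevance on a 4-point scale (1-4);
# ratings of 3 or 4 are counted as "relevant".

ratings = {  # hypothetical ratings from five experts
    "item_1": [4, 4, 3, 4, 3],
    "item_2": [4, 3, 2, 4, 3],
    "item_3": [2, 3, 2, 1, 3],
}

def item_cvi(scores):
    """I-CVI: proportion of experts rating the item 3 or 4."""
    return sum(1 for s in scores if s >= 3) / len(scores)

i_cvis = {item: item_cvi(scores) for item, scores in ratings.items()}
s_cvi_ave = sum(i_cvis.values()) / len(i_cvis)  # scale-level CVI, averaging method

for item, cvi in i_cvis.items():
    print(f"{item}: I-CVI = {cvi:.2f}")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
```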
Construct validity concerns the theoretical frame or feature of a concept that the instrument measures, such as intelligence, sorrow, or prejudice. Construct validity can be assessed by different methods, including contrasted groups, convergent and divergent analysis, discriminant analysis, and factor analysis (12).
Criterion validity indicates to what degree the subject's performance on the measurement instrument and the subject's actual behavior are related. Two forms of criterion-related validity are concurrent and predictive. Concurrent validity refers to an instrument's ability to distinguish among people who differ in their present status on the same criterion (13); predictive validity refers to an instrument's ability to differentiate between people's performances or behaviors on the same criterion in the future (12).
Reliability refers to the consistency with which participants of similar characteristics and outlook understand and respond to the questions (2). The most common method of testing a scale's reliability is the Cronbach's alpha coefficient (14), and to determine the stability of an instrument a test-retest must be carried out (15, 16). Internal consistency may be a necessary condition for the homogeneity or unidimensionality of a scale, and Cronbach's alpha should be 0.70 or higher (14, 17, 18). Test-retest is used to determine the stability of an instrument (15, 16); it is accomplished by administering the instrument, waiting a reasonable period of time, and then re-administering it. The desired correlation coefficient between the two sets of scores is 0.70 or higher (1, 16). Since a strong measurement strategy is critical for proper research (1, 19), this study was conducted to evaluate the process of measuring validity and reliability in 12 instrument-development papers published in the Journal of Advanced Nursing (JAN) during 2007-2009.
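To make the reliability criteria above concrete, the following sketch computes Cronbach's alpha from a respondents-by-items score matrix and a test-retest correlation from two administrations. The scores, the retest offsets and the sample size are invented for illustration; only the 0.70 rule of thumb comes from the sources cited above (14, 16, 17).

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical scores: 6 respondents answering a 4-item scale (1-5 Likert).
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 3, 4],
    [3, 2, 3, 3],
])

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}  (rule of thumb: >= 0.70)")

# Hypothetical test-retest: total scores from two administrations a few weeks apart.
test1 = scores.sum(axis=1)
test2 = test1 + np.array([0, 1, -1, 0, 1, 0])  # invented retest scores
r = np.corrcoef(test1, test2)[0, 1]
print(f"Test-retest correlation = {r:.2f}  (rule of thumb: >= 0.70)")
```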
Methods
In this study, the Journal of Advanced Nursing was chosen for review. All articles published in this journal during 2007-2009 were collected, and articles related to instrument development were included. Each article was reviewed in full to identify the methods used for instrument validation and reliability.

Results
Of the 980 articles published in the Journal of Advanced Nursing during 2007-2009, 41 (4.18%) concerned research methodology. Among these research methodology papers, 12 articles (29.27%) were related to developing an instrument. Table 1 shows the features of these articles. None of the 12 articles reported their psychometric properties completely (Table 1).

Table 1. Instruments' characteristics of published articles in Journal of Advanced Nursing, 2007-2009.

1) Ushiro R (2009) (20): Nurse-Physician Collaboration Scale (NPCS)
- Type of validity: Content validity was established by revising the content and wording of the NPCS based on the responses of physicians and nurses. Exploratory factor analysis was carried out, with CFI < 0.8 and RMSEA > 0.08 for the single-factor model and CFI < 0.9 and RMSEA < 0.08 for the three-factor model. Concurrent validity was demonstrated by statistically significant small negative correlations between nurses' responses to the NPCS and the Intergroup Conflict Scale for all three factors (r = -0.20 to -0.236, P < 0.01), and between physicians' responses to the two scales for sharing of patient information (r = -0.165, P < 0.01) and cooperativeness (r = -0.152, P < 0.01). Convergent validity was examined with the Team Characteristic Scale: correlations with the NPCS were r = 0.360-0.523 (P < 0.01) for nurses' responses and r = 0.435-0.639 (P < 0.01) for physicians' responses.
- Criticism of validity: Content validity is an initial step in establishing validity, but the best method in this regard is the Content Validity Index (14), which was not used in this study; the recommended number of persons for assessing content validity, between 15 and 20 (9), was also not mentioned. Factor analysis was acceptable, but the cut-off value for factor loadings was not reported. Concurrent and convergent validity were reported, but the ranges of the correlations were low; concurrent and convergent validity values should range from 0 to +1 (4, 5). The scale used for convergent validity had not been validated, or its validity and reliability were not reported.
- Type of reliability: Cronbach's alpha coefficients and test-retest reliability were measured. Reported alpha values for the three factors (sharing of patient information, joint participation in the cure/care decision-making process, and cooperativeness) for nurses' and physicians' responses included 0.911, 0.926, 0.842 and 0.629. The test-retest correlation coefficients (2-3 week interval) for nurses were 0.710 (P < 0.01) for sharing of patient information, 0.658 (P < 0.01) for joint participation in the cure/care decision-making process, and 0.676 (P < 0.01) for cooperativeness; for physicians they were 0.624, 0.798 and 0.774 respectively (all P < 0.01).
- Criticism of reliability: Cronbach's alpha coefficients of 0.70 and above indicate that the scales are internally consistent. The test-retest results were satisfactory, which confirms the stability of the scales. The item-total correlation coefficients, compared with those obtained when an item was eliminated, were mentioned and are acceptable.

2) Chang HJ et al. (2009) (21): Chinese version of the Positive and Negative Suicide Ideation (PANSI) Inventory
- Type of validity: Content validity was not measured. Factor structure was examined using both exploratory and confirmatory factor analysis; item-total coefficients ranged from 0.42 to 0.71, and the two-factor oblique model had the best fit (confirmatory factor analysis: CFI = 0.950, RMSEA = 0.078). Convergent validity was demonstrated by statistically significant positive correlations between total scores on the negative suicide ideation subscale (PANSI-NSI) and the Children's Depression Inventory (CDI) (r = 0.61) and the Cognitive Triad Inventory for Children (CTI-C) (r = 0.65), and between the positive suicide ideation subscale (PANSI-PI) and the Self-Control Schedule (SCS) (r = 0.46). Divergent validity was demonstrated by statistically significant negative correlations between total scores on the PANSI-PI and the CDI (r = -0.52) and the CTI-C (r = -0.52), and between the PANSI-NSI and the SCS (r = 0.30); all correlations were statistically significant at the P < 0.01 level. Predictive validity was measured one year after the first-wave study with the Chinese version of the PANSI: logistic regression showed that the total PANSI-NSI score in the first-wave study statistically significantly predicted attempted-suicide behaviour after 1 year (coefficient = 0.095, P < 0.001; CI = 1.05-1.15), with a good overall classification rate of 89.4%; the total PANSI-PI score also significantly predicted attempted-suicide behaviour after 1 year (coefficient = -0.084, P < 0.05; CI = 0.86-0.99).
- Criticism of validity: Content (or face) validity is an initial step in establishing validity (6) and was not measured in this study. Factor analysis was measured and is acceptable, but the cut-off value for factor loadings was not reported. Convergent and divergent validity were reported, but the correlations were at a moderate level; convergent validity values should range from 0 to +1 and divergent values from -1 to 0, and validity is strengthened when convergent measures are closely related and when the instrument is negatively correlated with divergent measures (Burns and Grove, 2007). The predictive validity reported in this study is acceptable.
- Type of reliability: Cronbach's alpha coefficients were 0.86 and 0.94 for total scores on the PANSI-PI and PANSI-NSI respectively. Test-retest (4-week interval) was carried out; intra-class correlation coefficients were 0.82 and 0.70 for total scores on the PANSI-PI and PANSI-NSI respectively. All correlations were statistically significant at the P < 0.05 level.
- Criticism of reliability: A Cronbach's alpha coefficient of 0.70 or above indicates adequate internal consistency (14); the reported coefficients and correlations are acceptable.

3) Eizenberg MM et al. (2009) (22): Moral Distress Questionnaire for Clinical Nurses
- Type of validity: Content validity was not measured. Exploratory factor analysis was examined; all item-total coefficients ranged from 0.56 to 0.90, three factors were yielded, and the cut-off value was mentioned. Discriminant validity: to provide additional evidence for the construct validity of the questionnaire, a comparison was made between two groups (hospital nurses and community clinic nurses), as it was assumed that differences would be observed in the pressure resulting from different moral dilemmas; t-tests for independent samples showed a statistically significant difference between the means for two of the three factors (relationships, t = 2.171; time, t = 2.208), which provides further evidence for the discriminant validity of the questionnaire.
- Criticism of validity: Content validity is an initial step in establishing validity (6, 16) and was not measured; measuring and reporting content validity in questionnaire development is necessary and important (16), and it is recommended to determine content validity before construct validity. Exploratory factor analysis was measured, but it is necessary to report its results, and the authors did not report the CFI or other results of the factor analysis (23).
- Type of reliability: Internal consistency was measured using Cronbach's alpha; the internal consistency of all three factors was above 0.79 (0.851, 0.791 and 0.804). Stability was examined by test-retest of the items in the second version of the questionnaire (1-month interval between the first and second test); the correlations between the two measurements for the three factors were 0.385, 0.535 and 0.624.
- Criticism of reliability: Cronbach's alpha coefficients of 0.70 and above indicate that the scales are internally consistent (15, 16). The test-retest correlation coefficients were mentioned but are low (1).

4) Liu M et al. (2009) (24): Competency Inventory for Registered Nurses in Macao
- Type of validity: Content validity and a Content Validity Index (CVI) were reported based on other studies. Exploratory factor analysis gave CFI < 0.8 and RMSEA > 0.08 for the single-factor model and CFI < 0.9 and RMSEA < 0.08 for the three-factor model, and confirmatory factor analysis was employed to test the construct validity of the instrument. Factor loadings across the 55 questionnaire items ranged from 0.310 to 0.725; a cut-off value of 0.3 for factor loadings was applied, as this is considered to indicate statistical significance.
- Criticism of validity: Content validity is an initial step in establishing validity (6) and supports construct validity (3), but it was not measured in this study; measuring and reporting content validity in questionnaire development is necessary and important (16), and it is recommended to determine content validity and a CVI for every questionnaire that is developed (6, 16). Factor analysis, both exploratory and confirmatory, was measured and reported.
- Type of reliability: Internal consistency and stability were estimated by Cronbach's alpha and a paired t-test, respectively. Cronbach's alpha was 0.90 for the overall scale and 0.71-0.90 for the subscales; internal consistency between the first and second test was 0.74, but the interval between the two tests was not reported.
- Criticism of reliability: The reported internal consistency and stability are acceptable and indicate a high degree of stability over a period of time and a satisfactory degree of homogeneity (8). The best interval for test-retest is 2-4 weeks (5, 16), and it is recommended to report the interval between the two tests.

5) Zisberg A, Young HM & Schepp K (2009) (25): Scale of Older Adults' Routine (SOAR)
- Type of validity: Content validity: items were generated on the basis of a literature review and then systematically tested for content validity, which was rated on the basis of item relevance to older adult routine in the pilot sample. Convergent validity: the mean deviation scores at the subscale level were correlated with functional indicators (ADL and IADL); the ADL score was negatively correlated with the consistency of time spent (mean deviation score for duration) on each basic and rest activity (r = -0.41 and -0.34; P < 0.01 respectively), as well as with the consistency of total time spent on basic and rest activities (mean deviation score for total duration; r = -0.56 and -0.33; P < 0.01 respectively).
- Criticism of validity: Measuring and reporting content validity in questionnaire development is necessary and important (16, 19); the relevance, clarity and simplicity of the items based on a Content Validity Index (CVI) were not reported. Convergent validity was reported, but the correlations were at a moderate level; the convergent validity value should range from 0 to +1, and if the convergent measures are closely related, the validity of each instrument is strengthened (5).
- Type of reliability: Intra-class correlation coefficients (ICC) and kappa coefficients were used to test score reliability at the item level for continuous and nominal scores, as well as test-retest reliability at the subscale level. Twenty-one items (50%) consistently presented moderate to high ICC scores and, across all types of scores, most presented substantial test-retest reliability (ICC > 0.41); six items (14.3%) presented poor reliability (ICC < 0.40): time spent shopping, passive transportation, medical treatment, attending concerts/movies/sports events, participating in group activities, and taking care of an older person. At the subscale level, over 73% of the scores showed high to substantial reliability and none showed poor reliability. Kappa coefficients, calculated for nominal variables, were over 0.75 (item agreement = 88.4%-100%); only 16.6% of items had kappa coefficients in the low range (k < 0.40). Test-retest reliability for the subscales was 0.46 to 0.85; the interval between the first and second test was not reported.
- Criticism of reliability: Intra-class correlation should be interpreted using four levels: high (ICC > 0.80), substantial (0.60 < ICC < 0.80), moderate (0.41 < ICC < 0.60) and poor to fair (ICC < 0.40) (26); a kappa coefficient over 0.75 is almost perfect (27). Some of the test-retest correlation coefficients mentioned are low (1), and reliability for the overall scale was not mentioned.

6) Pelander T, Leino-Kilpi H & Katajisto J (2009) (28): Child Care Quality at Hospital (CCQH) instrument
- Type of validity: Content validity: following a literature review and drawings and interviews by hospitalized children (n = 40), the instrument was designed and an expert panel (n = 7) assessed its content validity, judging the items and subcategories on a scale from one to four for relevance and clarity, whether or not a subcategory belonged to a particular main category, whether or not the nursing characteristics and nursing activities subcategories measured quality, and whether or not there was any overlap between subcategories. The least relevant subcategories (0.38 and 0.67) were deleted. The clarity of two subcategories was 0.65 and 0.69, whereas the level of agreement for all other subcategories was over 0.90; the level of agreement among nurses was over 0.95 for subcategories measuring quality, except for appearance (0.37), sense of humour (0.69) and humanity (0.93). In the nurses' assessments, the subcategories of humanity (0.31), caring and communication (0.31), and education (0.31) showed the greatest overlap with other subcategories. Factor analysis of the subcategories was assessed using principal component analysis to measure the level of congruence of the empirical results with the main CCQH categories of nursing activities and environment; no principal component analysis was carried out for the main category of nurse characteristics.
- Criticism of validity: A scale-level CVI of 75% or higher is acceptable, and the reporting of the content validity index should be based on percentages (3, 16). The factor analysis was not reported clearly; its process should be made clear.
- Type of reliability: Internal consistency was measured using Cronbach's alpha; alpha values were 0.373-0.812 for the subscales and showed a tendency to increase during the course of instrument development (reported ranges included 0.383-0.557, 0.584-0.761 and 0.763-0.809). Item-to-total and inter-item correlations were calculated for the various subcategories of nursing activities and environment and for the main category of nurse characteristics; item-to-total correlations ranged from 0.062 to 0.611, with the lowest values obtained for the subcategories of physical care and treatment, and entertainment. The items 'takes account of child's food preferences' and 'provides relief for pain' were the most problematic, but were not deleted from the instrument as their contents are crucial in this context.
- Criticism of reliability: A correlation coefficient of 0.90 is desirable, but 0.70 is acceptable for new instruments (29); the alpha for the overall scale was not reported. Combining certain items, especially in the least relevant subcategories, could improve reliability (30).

7) Carlson C (2008) (31): Carlson's Prior Conditions Instruments (CPCIs), assessing four theoretically derived conditions (previous practice, felt needs/problems, innovativeness, and norms of the social system) that influence nurses' decisions to adopt evidence-based pain management practices
- Type of validity: Content validity was established by reviewing the literature and theoretical definitions and was supported through review by experts. The average CVI scores for the relevancy of all items within each instrument were 1.0 for the Previous Practice Instrument, 0.79 for the Felt Needs/Problems Instrument, 0.94 for the Innovativeness Instrument, and 0.98 for the Norms of the Social System Instrument; clarity and simplicity based on the CVI were not reported. Factor analysis was examined through principal components factor analysis with varimax rotation and reported for each factor of the instruments; factors were established using the Kaiser rationale by retaining eigenvalues over 1.0, and to establish salient factors, items with correlations above 0.3 on more than one factor were deleted as repetitious. The Kaiser-Meyer-Olkin (KMO) measure of sample adequacy was then determined; the KMOs of the CPCIs ranged from 0.655 to 0.841.
- Criticism of validity: The reporting of content validity in this study is acceptable; the reporting of the content validity index must be based on percentages (3, 16). Rattray and Jones suggest that a KMO greater than 0.5 supports a factor analysis and that anything less than 0.5 is probably not amenable to factor analysis, so this KMO measure is acceptable. For accurate analysis, another type of construct validity, such as predictive validity, is needed (6).
- Type of reliability: Cronbach's alpha coefficients were measured; each instrument demonstrated internal consistency (alpha range 0.731-0.825), and inter-item correlations were between 0.2 and 0.7. After item analysis for reliability, the Previous Practice Instrument was reduced to 13 items, the Felt Needs/Problems Instrument to 14 items, the Innovativeness Instrument to nine items, and the Norms of the Social System Instrument to nine items; the alphas were 0.825, 0.76, 0.731 and 0.775 respectively.
- Criticism of reliability: Alpha coefficients of 0.70 and above indicate that the scales are internally consistent (16, 19). In addition, test-retest reliability needs to be confirmed to assess the stability of the measure over time (6).

8) Pisanti R et al. (2008) (32): Occupational Coping Self-Efficacy Scale for Nurses (OCSE-N)
- Type of validity: Content validity was not measured. Exploratory and confirmatory factor analysis were carried out, with CFI < 0.75 and RMSEA > 0.15 for the first model and CFI < 0.92 and RMSEA < 0.08 for the second model. Criterion (concurrent) validity was assessed by estimating correlations between the two OCSE-N dimensions and external criteria: the Maslach Burnout Inventory (MBI) dimensions and coping dimensions. Pearson's correlation coefficients between the OCSE-N dimensions and both the MBI variables and the Coping Inventory for Stressful Situations - Short Version (CISS-SV) dimensions were all statistically significant: the OCSE-N dimensions were positively associated with task coping strategies (r = 0.07 to 0.08, P < 0.05), negatively associated with both emotion-focused and avoidant strategies (r = -0.09 to -0.08, P < 0.01), negatively correlated with emotional exhaustion (r = -0.31 to -0.21, P < 0.01) and depersonalization (r = -0.25 to -0.19, P < 0.01), and positively associated with personal accomplishment (r = 0.21 to 0.22, P < 0.01). These patterns of correlations support the construct validity of the OCSE-N.
- Criticism of validity: Content validity is an initial step in establishing validity (6) and supports construct validity (3), but it was not measured in this study. The exploratory and confirmatory factor analysis was measured and reported carefully.
- Type of reliability: Internal reliability was estimated by calculating Cronbach's alpha coefficients for the scales derived from the factor analysis ('CSE to manage nursing burden', alpha = 0.77; 'CSE to manage relational difficulties in the workplace', alpha = 0.79).
- Criticism of reliability: Alpha coefficients of 0.70 and above indicate that these scales are internally consistent and acceptable (16, 18). In addition, test-retest reliability needs to be confirmed to assess the stability of the measures over time (6).

9) Barnes CR & Adamson-Macedo EN (2007) (33): Perceived Maternal Parenting Self-Efficacy (PMP S-E) instrument
- Type of validity: Content validity was established by reviewing the literature and theoretical definitions and was supported through review by participants in a pilot study. Construct validity was examined with exploratory factor analysis, applying a cut-off value of 0.3 for factor loadings as this is considered to indicate statistical significance. Factor 1 had an eigenvalue of 8.235 and explained 41% of the variance, factor 2 had an eigenvalue of 1.496 and explained 7.48%, factor 3 had an eigenvalue of 1.314 and explained 6.57%, and factor 4 explained 6.27% of the variance. Divergent validity using the Maternal Self-Report Inventory was rs = 0.4 (P < 0.05) and using the Maternal Postnatal Attachment Scale was rs = 0.31 (P < 0.01).
- Criticism of validity: Content validity is an initial step in establishing validity, but the best method in this regard is the Content Validity Index (14), which was not used in this study. Factor analysis was measured, but the authors did not report the CFI or other results of the factor analysis (23), and the cut-off point is low. Divergent validity was reported, but the correlations were moderate; the divergent validity value should range from -1 to 0, and if the measure is negatively correlated with other measures, the validity of the instrument is strengthened (4).
- Type of reliability: Cronbach's alpha was used to calculate internal consistency reliability estimates for the PMP S-E; the whole instrument reached an acceptable coefficient (0.91), and the estimates for each of the subscales were also acceptable [subscale 1 (0.74), 2 (0.89), 3 (0.74) and 4 (0.72)]. Item-total correlations showed that all items correlated statistically significantly with total scores (0.30-0.77). The test-retest correlation coefficient (10-day interval between the first and second test) was 0.96.
- Criticism of reliability: Alpha coefficients of 0.70 and above indicate that the scales are internally consistent (16, 18). The test-retest correlation coefficient was mentioned and is acceptable.

10) Van Laar D et al. (2007) (34): Work-Related Quality of Life (WRQoL) scale for healthcare workers
- Type of validity: A survey of the literature and qualitative expert reviews were used to assess the content validity of the measure. Exploratory and confirmatory factor analysis were done, with a cut-off value of 0.5 for factor loadings. A first data set with 481 cases was used in the exploratory step (data set EXPLORE) and a second data set with 472 cases in the confirmatory step (data set CONFIRM). A preliminary principal component analysis (PCA) on the EXPLORE data set generated twelve components with eigenvalues above 1.0; using this procedure, 34 items were removed, leaving 24 items representing six factors [Factor 1: Job and Career Satisfaction (JCS), six items; Factor 2: General Well-Being (GWB), six items; Factor 3: Home-Work Interface (HWI), three items; Factor 4: Stress at Work (SAW), two items; Factor 5: Control at Work (CAW), three items; Factor 6: Working Conditions (WCS), three items]. Confirmatory factor analysis conducted on the remaining 23 items supported the model in the CONFIRM data set (P < 0.01, CFI = 0.93, GFI = 0.90, NFI = 0.89, RMSEA = 0.06).
- Criticism of validity: Content validity is an initial step in establishing validity, but the best method in this regard is the Content Validity Index (14), which was not used in this study. Exploratory and confirmatory factor analysis was measured and reported. The criteria for establishing model fit via goodness-of-fit indices generally suggest that values around 0.90 are acceptable and higher values are considered a good fit for the CFI, GFI and NFI (35); values below 0.05 for the RMSEA indicate a close fit, whereas values between 0.05 and 0.10 represent adequate to mediocre fit (36).
- Type of reliability: Internal consistency was measured using Cronbach's alpha; alpha was 0.75-0.86 for the subscales and 0.96 for the overall scale.
- Criticism of reliability: Alpha coefficients of 0.70 and above indicate that the scales are internally consistent (16, 18). In addition, another type of reliability, such as split-half or test-retest, needs to be reported to assess the stability of the measure over time (6).

11) Otieno OG et al. (2007) (37): An instrument to measure nurses' use of, quality of, and satisfaction with Electronic Medical Record (EMR) systems
- Type of validity: Content validity was addressed by basing the items on previous surveys and by reviewing the developing instrument with a panel of nurses experienced in nursing informatics. Factor analysis was examined, with a cut-off value of 0.4 for factor loadings; it revealed three subscales within the use of EMR scale and two subscales within 'quality of EMR', and the subscales of 'user satisfaction' were also determined by factor analysis. Concurrent validity was assessed by the degree of correlation between the scales of the instrument and a global measure. Criterion-related validity was not addressed explicitly in this study; however, the degree of correlation between the scores of the two subscales of EMR use (Nursing Care Management and Order Entry), the two subscales of quality of EMR (Information Quality and Service Quality) and the subscale of user satisfaction (Impact of EMR systems on Clinical Care) was revealed in all cases.
- Criticism of validity: Measuring and reporting content validity in questionnaire development is necessary and important (16, 19); in this study the reporting of content validity is acceptable, but a CVI was not reported. Exploratory factor analysis was measured and reported and is acceptable. Concurrent validity was measured, but the degree of correlation was not mentioned.
- Type of reliability: The reliability of each resultant factor was computed using the Cronbach's alpha coefficient; internal consistency and item-total correlations within subscales were examined, and items were deleted where necessary to achieve an alpha value of at least 0.7. The Cronbach's alpha coefficient of the overall instrument was not mentioned, coefficients being reported for each subscale instead, and three subscales with low Cronbach's alpha were removed from the final instrument.
- Criticism of reliability: Cronbach's alpha coefficients of 0.7 or above within a construct indicate that these scales are internally consistent and acceptable (15, 16). In addition, test-retest reliability needs to be confirmed to assess the stability of the measures over time (6), and it is recommended to report the reliability of the overall instrument.

12) Fu MR, McDaniel RW & Rhodes VA (2007) (38): Adapted Symptom Distress Scale: the Symptom Experience Index (SEI)
- Type of validity: Content validity of the SEI was ensured by the 15 medical-surgical and oncology patients in the study who had tested the reliability and validity of the Adapted Symptom Distress Scale version 2 (ASDS-2); in addition, content validity of the SEI is supported by the inclusion of symptoms that have been identified by patients in other studies, as well as those perceived by patients with cancer in a series of the investigators' studies. Construct validity: the authors used multiple comparisons (with the Kruskal-Wallis test) to estimate construct validity by determining statistically significant differences between pairs of contrasting groups.
- Criticism of validity: The reporting of content validity is acceptable, but a CVI was not reported, so the validity testing in this study is not complete. Construct validity was measured through comparisons; however, factor analysis can be used as an exploratory or confirmatory technique to estimate the underlying dimensions of the items in an instrument or to reduce redundant items.
- Type of reliability: Cronbach's alpha was computed to measure internal consistency; analysis for the total symptom experience revealed a Cronbach's alpha of 0.91, for total occurrence 0.85, and for total distress 0.84. Reliability for the subscales was estimated using Cronbach's alpha for each subscale: respiratory (0.80), cognitive (0.79), eating/gastrointestinal (0.73), pain/discomfort (0.76), neurological (0.78), fatigue/sleep/restlessness (0.81), eliminations (0.74) and appearance (0.77). To measure the stability of the SEI, a test-retest method (two administrations during two different periods 2-4 hours apart) was used with 63 healthy adult participants; intra-class correlation coefficients were calculated to estimate test-retest reliability, and test-retest scores were strongly correlated for total symptom experience (r = 0.93), occurrence (r = 0.94) and distress (r = 0.92).
- Criticism of reliability: The reliability is reported correctly and indicates a high degree of stability over a period of time and a satisfactory degree of homogeneity (8). For test-retest procedures, a second administration is generally recommended about 2-14 days after the first (39); because of the attributes of the phenomena being measured (symptom occurrence and distress), only healthy adult participants were asked to complete the SEI during two different periods 2-4 hours apart. This time lapse was sufficient to avert participants' recall of their previous test responses (i.e. absence of flu symptoms) and to preclude events (i.e. onset of flu symptoms after 2 weeks) that might have affected the stability of the characteristic (symptom experience) being measured (40).
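Several of the reviewed articles summarized in Table 1 establish construct validity with exploratory factor analysis or principal component analysis, retaining components with eigenvalues above 1.0 (the Kaiser rationale) and screening factor loadings against a cut-off. The sketch below illustrates that procedure on invented data; the item set, the simulated sample and the 0.3 loading cut-off are assumptions for illustration only, not values taken from any of the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item responses: 200 respondents, 6 items driven by two latent factors.
latent = rng.normal(size=(200, 2))
loadings_true = np.array([[0.8, 0.0], [0.7, 0.1], [0.9, 0.0],
                          [0.0, 0.8], [0.1, 0.7], [0.0, 0.9]])
items = latent @ loadings_true.T + rng.normal(scale=0.4, size=(200, 6))

# Principal component analysis on the correlation matrix.
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals > 1.0                      # Kaiser rationale: retain eigenvalues > 1.0
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])

print("Eigenvalues:", np.round(eigvals, 2))
print("Components retained:", keep.sum())
print("Loadings (|value| >= 0.3 marked *):")
for i, row in enumerate(loadings, start=1):
    marks = ["*" if abs(v) >= 0.3 else " " for v in row]
    print(f"  item {i}: " + "  ".join(f"{v:+.2f}{m}" for v, m in zip(row, marks)))
```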
Discussion
Appropriate instruments have a significant influence on the validity of a study: invalid and unreliable instruments may produce incorrect results, and the use of such findings is doubtful. In addition, it affects the implications of research findings for the population under study (19). In this study, the review of 12 articles published in the Journal of Advanced Nursing during 2007-2009 showed that psychometric properties were not fully presented: only 2 of the 12 articles documented content validity completely, 5 reported incomplete content validity, and 5 did not measure it. With regard to measuring construct validity, factor analysis is a useful method; of the 12 articles reviewed, 4 measured factor analysis completely, 4 measured or reported it incompletely, and 4 did not measure it. With regard to other types of validity, only one article measured concurrent validity, one measured discriminant validity, one measured divergent validity, and one measured convergent validity. As stated before, measuring three types of validity for newly developed instruments is necessary; therefore, validity should be measured to determine the appropriateness of an instrument for a specific group. The findings showed that some of the articles did not measure psychometric properties properly, and thus some of the developed scales still need other necessary types of validity to be measured. In addition, reliability testing needs to be performed on each instrument used in a study before other statistical analyses are performed. All 12 articles measured and reported Cronbach's alpha, but 4 of them did not measure test-retest reliability.

Conclusion
It can be concluded that although researchers put great emphasis on methodology and statistical analysis, they pay less attention to the psychometric properties of their new instruments. The authors of this article hope to draw researchers' attention to the importance of measuring the psychometric properties of new instruments.
Acknowledgements
The authors would like to thank Dr. Zagheri Tafreshi for commenting on a draft of this paper. Her feedback was much appreciated.

References
1. Houser J. Nursing Research: Reading, Using, and Creating Evidence. 2nd ed. Sudbury: Jones and Bartlett Publishers; 2011.
2. Watson R, McKenna H, Cowman S, Keady J. Nursing Research: Designs and Methods. Elsevier Health Sciences; 2008.
3. Yaghmaie F. Subjective computer training: development of a scale. Journal of Medical Education. 2004;5(1):33-7.
4. Burns N, Grove SK. The Practice of Nursing Research: Conduct, Critique and Utilization. 6th ed. Philadelphia: Saunders; 2009.
5. Burns N, Grove SK. Understanding Nursing Research: Building an Evidence-Based Practice. 6th ed. Philadelphia: Saunders; 2014.
6. Rattray J, Jones MC. Essential elements of questionnaire design and development. Journal of Clinical Nursing. 2007;16(2):234-43.
7. Bush CT. Nursing Research. Virginia: Reston Publishing; 1985.
8. Polit DF, Hungler BP. Nursing Research: Principles and Methods. 7th ed. Philadelphia: Lippincott; 2004.
9. Dempsy PA, Dempsy AD. Using Nursing Research: Process, Critical Evaluation and Utilization. 5th ed. Philadelphia: Lippincott; 2000.
10. Kerlinger FN. Foundations of Behavioral Research. 3rd ed. New York: CBS Publishing; 1986.
11. Yaghmaie F. Content validity and its estimation. Journal of Medical Education. 2003;3(1):25-7.
12. LoBiondo-Wood G, Haber J. Nursing Research: Methods, Critical Appraisal, and Utilization. 8th ed. St. Louis: Mosby-Elsevier; 2014.
13. Polit DF, Beck CT. Essentials of Nursing Research: Methods, Appraisal and Utilization. 6th ed. Philadelphia: Lippincott Williams & Wilkins; 2006.
14. Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York: McGraw-Hill; 1994.
15. Yaghmaie F. Development of a scale for measuring user computer experience. Journal of Research in Nursing. 2007;12(2):185-90.
16. Yaghmaei F. Measuring Behavior in Research by Valid and Reliable Instruments. 2nd ed. Tehran: Shahid Beheshti Medical University Publishing; 2009. (Persian)
17. Clark LA, Watson D. Constructing validity: basic issues in objective scale development. Psychological Assessment. 1995;7(3):309-19.
18. Yaghmaie F. Reliability and its measurement in quantitative studies. Journal of Faculty of Nursing & Midwifery, Shahid Beheshti University of Medical Sciences. 2003;13(42):22-7. (Persian)
19. Zagheri-Tafreshi M, Yaghmaie F. Factor analysis of construct validity: a review of nursing articles. Journal of Medical Education. 2007;10(1):19-26.
20. Ushiro R. Nurse-Physician Collaboration Scale: development and psychometric testing. Journal of Advanced Nursing. 2009;65(7):1497-508.
21. Chang HJ, et al. Chinese version of the Positive and Negative Suicide Ideation: instrument development. Journal of Advanced Nursing. 2009;65(7):1485-96.
22. Eizenberg MM, et al. Moral distress questionnaire for clinical nurses: instrument development. Journal of Advanced Nursing. 2009;65(4):885-92.
23. Munro BH. Statistical Methods for Health Care Research. 5th ed. Philadelphia: Lippincott Williams & Wilkins; 2005.
24. Liu M, Yin L, Ma E, Lo S, Zeng L. Competency Inventory for Registered Nurses in Macao: instrument validation. Journal of Advanced Nursing. 2009;65(4):893-900.
25. Zisberg A, Young HM, Schepp K. Development and psychometric testing of the Scale of Older Adults' Routine. Journal of Advanced Nursing. 2009;65(3):672-83.
26. Woods-Dauphinee S, Berg K, Daley K. Monitoring status and evaluating outcomes: an overview of rating scales for use with patients who have sustained a stroke. Topics in Geriatric Rehabilitation. 1994;10(2):22-41.
27. Cyr L, Francis K. Measures of clinical agreement for nominal and categorical data: the kappa coefficient. Computers in Biology and Medicine. 1992;22(4):239-46.
28. Pelander T, Leino-Kilpi H, Katajisto J. The quality of paediatric nursing care: developing the Child Care Quality at Hospital instrument for children. Journal of Advanced Nursing. 2009;65(2):443-53.
29. Stotts NA, Aldrich KM. How to try this: defines your terms, evaluating instruments for use in nursing practice. American Journal of Nursing. 2007;107(10):71-2.
30. Ferketich S. Focus on psychometrics: aspects of item analysis. Research in Nursing & Health. 1991;14(2):165-8.
31. Carlson C. Development and testing of four instruments to assess prior conditions that influence nurses' adoption of evidence-based pain management practices. Journal of Advanced Nursing. 2008;64(6):632-43.
32. Pisanti R, Lombardo C, Lucidi F, Lazzari D, Bertini M. Development and validation of a brief Occupational Coping Self-Efficacy Questionnaire for Nurses. Journal of Advanced Nursing. 2008;62(2):238-47.
33. Barnes CR, Adamson-Macedo EN. Perceived Maternal Parenting Self-Efficacy (PMP S-E) instrument: development and validation with mothers of hospitalized preterm neonates. Journal of Advanced Nursing. 2007;60(5):550-60.
34. Van Laar D, Edwards JA, Easton S. The Work-Related Quality of Life scale for healthcare workers. Journal of Advanced Nursing. 2007;60(3):325-33.
35. Bentler PM, Bonett DG. Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin. 1980;88(3):588-606.
36. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Testing Structural Equation Models. Newbury Park, CA: Sage; 1993.
37. Otieno OG, Toyama H, Asonuma M. Nurses' views on the use, quality and user satisfaction with electronic medical records: questionnaire development. Journal of Advanced Nursing. 2007;60(2):209-19.
38. Fu MR, McDaniel RW, Rhodes VA. Measuring symptom occurrence and symptom distress: development of the Symptom Experience Index. Journal of Advanced Nursing. 2007;59(6):623-34.
39. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. 5th ed. New York: Oxford University Press; 2015.
40. Fu MR, Rhodes VA, Xu B. The Chinese translation: the Index of Nausea, Vomiting, and Retching (INVR). Cancer Nursing. 2002;25(2):134-40.