(Cohen) Psych Assessment Reviewer
Testing
-used to refer to everything from the administration of a test to the interpretation of scores
Psychological Testing
-the process of measuring psychology-related variables by means of devices or procedures
designed to obtain a sample of behavior
Psychological Assessment
-the gathering and integration of psychology-related data for the purpose of making a
psychological evaluation that is accomplished through the use of tools such as tests, interviews,
case studies, behavioral observations and specially designed apparatuses and measurement
procedures
Process of Assessment:
Step 1: Assessment begins with a referral
Optional: The assessor may meet with the assessee or others before the formal assessment in
order to clarify aspects of the reason for referral
Step 2: The assessor prepares for the assessment by selecting the tools of assessment
Step 3: The assessor conducts the formal assessment
Step 4: After the assessment, the assessor writes a report of the findings that is designed to
answer the referral question
Optional: Feedback sessions
Approaches in Assessment:
Collaborative Psychological Assessment
-assessor and assessee work as partners
-aka therapeutic psychological assessment
Dynamic Assessment
-interactive, changing or varying in nature
-usually employed in educational settings
1.) Test
-a measurement device or technique used to quantify behavior or aid in the understanding and
prediction of behavior
-Item –a specific stimulus to which a person responds overtly; this response can be scored or
evaluated
-Types:
A. Ability Tests
1. Achievement –previous learning
2. Aptitude – the potential for learning or acquiring a specific skill
3. Intelligence – a person’s general potential to solve problems, adapt to changing
circumstances, think abstractly and profit from experience
B. Personality Tests –related to the overt and covert dispositions of the individual
1. Structured –provides a self-report statement to which the person responds “True” or “False”,
“Yes” or “No”
2. Projective –provides an ambiguous test stimulus; response requirements are unclear
2.) Interview
-face to face
-taking note of both verbal and non-verbal behavior
-taking note of the way that the interviewee is dressed
-however, face to face contact is not always possible and interviews may be conducted
in other formats
1. Test Developer
-create test or other methods of assessment
-bring a wide array of backgrounds and interests
2. Test User
-clinicians, counselors, school psychologists, human resources personnel, consumer
psychologists, experimental psychologists and social psychologists
-the one who conducts or uses tests
3. Test Taker
-the subject of an assessment or an observation
-aka assessee
-psychological autopsy –reconstruction of deceased individual’s psychological profile
4. Society at large
-as society evolves and as the need to measure different psychological variables emerges, test
developers respond by devising new tests
5. Other parties
-people whose sole responsibility is the marketing and sales of tests
-academicians who review tests and evaluate their psychometric soundness
1. Educational Setting
-help identify children who may have special needs
-provides achievement tests, which evaluate accomplishment or the degree of learning that
has taken place
-Informal Evaluation –typically non-systematic assessment that leads to the formation of
an opinion or attitude
2. Clinical Setting
-help screen to diagnose behavior problems
-intelligence test, personality tests, etc.
-individualized
-group testing is used primarily for screening
3. Counseling
-ultimate objective is the improvement of the assesse in terms of adjustment, productivity
or some related variable
4. Geriatric
-for old age
-ultimate goal is to provide good quality of life
5. Business/Military
-decision-making about the careers of personnel
6. Other Settings
-court trials
-health psychology
Reference Sources:
Test Catalogues
-most readily accessible
-usually contain only a brief description of the test and seldom contain the kind of detailed
technical information
-the catalogue’s objective is to sell the test
Test Manuals
-detailed information concerning the development of test
-contains technical information
-requires credentials before purchase
Reference Volumes
-updated periodically
-provides detailed information for each test listed
Journal Articles
-may contain review of the test, updated or independent studies of its psychometric soundness
Online Database
-online website
Other Sources
-school library contains a number of other sources that may be used to acquire information
about tests and test-related topics
CHAPTER 2: HISTORICAL, CULTURAL AND LEGAL/ETHICAL CONSIDERATIONS
Early Antecedents
China
-Tests and testing programs first came into being in China as early as 2200 B.C.
-They were used to select which of many applicants would obtain government jobs
Song Dynasty
-emphasis was placed on knowledge of classical literature
Ancient Greco-Roman
-Attempts to categorize people’s personality types in terms of bodily fluid
Charles Darwin
-“Higher forms of life evolved partially because of differences among individual forms of life
within a species”
-“Those with the best or most adaptive characteristics survive at the expense of those who are
less fit and that the survivors pass their characteristics on to the next generation”
Francis Galton
-Classify people according to their natural gifts and to ascertain their deviation from an average
-Pioneered the use of a statistical concept central to psychological experimentation and testing:
the coefficient of correlation
Wilhelm Wundt
-First experimental psychology laboratory, founded at the University of Leipzig in Germany
Students of Wundt
Charles Spearman
-originating the concept of test reliability
-building the mathematical framework for the statistical technique of factor analysis
Victor Henri
-suggested how mental tests could be used to measure higher mental processes
Emil Kraepelin
-pioneered the use of the word association technique as a formal test
Lightner Witmer
-little-known founder of clinical psychology and school psychology
1895 – Binet and Henri published several articles in which they argued for the measurement of
abilities such as memory and social comprehension
1905 – Binet and Simon published 30-item measuring scale of intelligence designed to help
identify mentally retarded Paris school children
World War 1 – Group Intelligence tests came into being in the United States in response to the
military’s need
World War 1
-Robert Woodworth developed Personal Data Sheet (measure of adjustment and emotional
stability)
-Woodworth Psychoneurotic Inventory (first widely used self-report test of personality)
Projective Test
-an individual is assumed to project into some ambiguous stimulus his or her own unique needs,
fears, hopes and motivation
-Rorschach Inkblots (best known projective test)
Culture
-the socially transmitted behavior patterns, beliefs and products of work of a particular
population, community or group of people
Henry Goddard
-used interpreters in test administration, employed a bilingual psychologist and administered
mental tests to selected immigrants who appeared mentally retarded
-wrote extensively on the genetic nature of mental deficiency, but he did not summarily
conclude that these findings were the result of heredity
Verbal Communication
-the examiner and the examinee must speak the same language
Level A: Tests or aids that can adequately be administered, scored and interpreted with the aid
of the manual and a general orientation to the kind of institution or organization in which one is
working (for instance, achievement or proficiency tests)
Level B: Tests or aids that require some technical knowledge of test construction and use and of
supporting psychological and educational fields such as statistics, individual differences,
psychology of adjustment, personnel psychology and guidance (e.g., aptitude tests and
adjustment inventories applicable to normal populations).
Level C: Tests and aids that require substantial understanding of testing and supporting
psychological fields together with supervised experience in the use of these devices (for
instance, projective tests, individual mental tests).
CHAPTER 3: A STATISTICS REFRESHER
Scales of Measurement
Measurement
-act of assigning numbers or symbols to characteristics of things according to rules
Scales
-set of numbers whose properties model empirical properties of the objects to which the
numbers are assigned
-Continuous Scale –measures continuous variable
-Discrete Scale –categorizes; values between categories carry no meaning
Properties of Scales
1. Magnitude –the property of “moreness”
2. Equal Intervals –the difference between two points at any place on the scale has the
same meaning as the difference between two other points that differ by the same
number of scale units
3. Absolute Zero –obtained when nothing of the property being measured exists
Types/Levels of Scales
1. Nominal Scales –with classification or categorization based on one or more
distinguishing characteristics
2. Ordinal Scales –with classification and ranking or ordering
3. Interval Scales –with classification, ranking and equal intervals
4. Ratio Scales –all math operations can be meaningfully performed; has absolute zero
Describing Data
Distributions
-a set of test scores arrayed for recording or study
Raw Score
-straightforward, unmodified accounting of performance that is usually numerical
Frequency Distributions
-displays scores on a variable or a measure to reflect how frequently each value was obtained
Graphic Form
1. Histogram –a graph with vertical lines drawn at the true limits of each test score forming
a series of contiguous rectangles
2. Bar Graph –numbers indicative of frequency appear on the Y-axis; categories appear on the X-axis
3. Frequency Polygon –expressed by a continuous line connecting the points where test
scores or class intervals meet frequencies
Percentile Ranks
-answers the question, “What percent of the scores fall below a particular score (Xi)?”
Percentiles
-the specific scores or points within a distribution
-divide the total frequency for a set of observations into hundredths
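The percentile-rank question above can be answered directly in code. A minimal Python sketch with made-up scores (illustrative, not from the text):

```python
# Percentile rank: what percent of scores in a distribution fall below a given score.

def percentile_rank(scores, x):
    """Percent of scores strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100.0 * below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 80))  # 50.0 -> half of the scores fall below 80
```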
Measures of Central Tendency
-a statistic that indicates the average or midmost score between the extreme scores in a
distribution
Mean
-most commonly used
-most appropriate measure of central tendency for interval and ratio data
Median
-middle score in distribution
-most appropriate for ordinal, interval and ratio data
-useful when few scores fall at the high end or relatively few scores at the low end
Mode
-most frequently occurring score
-appropriate in nominal data
-not commonly used
-is useful in analysis of a qualitative or verbal nature
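The three measures of central tendency above can be computed with Python's standard statistics module; the scores below are made up for illustration:

```python
# Mean, median, and mode of a small, hypothetical score distribution.
import statistics

scores = [70, 75, 75, 80, 85, 90, 95]
print(statistics.mean(scores))    # ≈ 81.43
print(statistics.median(scores))  # 80 (middle score of the 7 sorted values)
print(statistics.mode(scores))    # 75 (most frequently occurring score)
```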
Measures of Variability
Variability
-an indication of how scores in a distribution are scattered or dispersed
● Standard Deviation
-a measure of variability equal to the square root of the average squared deviations
about the mean
-is equal to the square root of the variance
-variance –equal to the arithmetic mean of the squares of the differences between the
scores in a distribution and their mean
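The variance and standard-deviation definitions above translate directly to code. A minimal sketch using the population formulas and illustrative scores:

```python
# Variance = arithmetic mean of squared deviations from the mean;
# standard deviation = square root of the variance.
import math

def variance(scores):
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores) / len(scores)

scores = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(scores)
print(var)             # 4.0
print(math.sqrt(var))  # 2.0 (standard deviation)
```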
Skewness
-symmetry is absent
-presence or absence of symmetry in a distribution is simply one characteristic by which a
distribution can be described
Kurtosis
-the steepness of a distribution in its center
-platy –flat
-lepto –peaked
-meso –middle
Standard Scores
-a raw score that has been converted from one scale to another scale, where the latter scale
has some arbitrarily set mean and standard deviation
-more easily interpretable than raw scores
● Z-scores
-mean = 0 ; SD = 1
-is equal to the difference between a particular raw score and the mean divided by
standard deviation
● T-scores / McCall’s T
-mean = 50 ; SD = 10
-devised by W.A. McCall
-named a T-score in honor of his professor E.L. Thorndike
-none of the scores is negative
● Stanine
-mean = 5 ; SD = 2
● Sten
-mean = 5.5 ; SD = 2
● Deviation IQ
-mean = 100 ; SD = 15
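Each standard score listed above is a linear transformation of the z-score. A sketch with a hypothetical raw score of 65 on a test whose mean is 50 and SD is 10:

```python
# z = (raw - mean) / SD; each other scale rescales z using its own mean and SD.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

z = z_score(65, mean=50, sd=10)
print(z)             # 1.5
print(50 + 10 * z)   # 65.0   T-score (M = 50, SD = 10)
print(5 + 2 * z)     # 8.0    stanine value (M = 5, SD = 2; rounded to 1-9 in practice)
print(100 + 15 * z)  # 122.5  deviation IQ (M = 100, SD = 15)
```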
Correlation
-an expression of the degree and direction of correspondence between two things
-degree (weak-strong)
-direction (positive, negative, no correlation)
-linear relationship
-only two variables
-numerical in nature
-no causation but can predict
● Pearson R
-most widely used
-also known as the Pearson correlation coefficient and the Pearson product-moment
coefficient of correlation
-used when variables are linear and continuous
-Pearson r is related to z-scores because both are concerned with the location of an
individual within a distribution
-the smaller the p-value, the more significant the relationship
-larger correlation, means more related to each other
-coefficient of determination (r2)
-an indication of how much variance is shared by the X and Y variables
-evaluates the strength of the relationship
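The Pearson coefficient can be computed from its definitional formula (covariance divided by the product of the two standard deviations). The data below are made up for illustration:

```python
# Pearson product-moment correlation from the definitional formula.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)   # population SD of x
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)   # population SD of y
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # perfectly linear with x
r = pearson_r(x, y)
print(r)               # ≈ 1.0 (perfect positive correlation)
print(r ** 2)          # coefficient of determination: variance shared by x and y
```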
● Spearman Rho
-one commonly used alternative statistic
-also known as rank-order correlation coefficient or rank-difference correlation coefficient
-used with small samples (fewer than 30 pairs) and ordinal data
● Point-biserial correlation
-relationship when one of the variables is dichotomous and the other is continuous
● Phi-coefficient
-used when both variables are dichotomous
Regression
-a reversion to the mean over time or generations
-analysis of relationships among variables of understanding how one variable may predict other
(X) IV – Predictor Variable
(Y) DV – Outcome Variable
Multiple Regression
-the use of more than one score to predict Y
-more predictors are not necessarily better
Meta-Analysis
-analysis of data from several studies
-a family of techniques used to statistically combine information across studies to produce single
estimates of the data under study
-more weight can be given to studies that have larger numbers of subjects
-advantages:
a. meta-analyses can be replicated
b. the conclusions of meta-analyses tend to be more reliable and precise than the conclusions
from single studies
c. there is more focus on effect size rather than statistical significance alone
d. meta-analysis promotes evidence-based practice, which may be defined as professional practice
that is based on clinical and research findings
CHAPTER 4: OF TEST AND TESTING
Assumptions:
1. Psychological traits and states exist
2. Psychological traits and states can be quantified and measured
3. Test-related behavior predicts non-test related behavior
4. Test and other measurement techniques have strength and weaknesses
5. Various sources of error are part of the assessment process
6. Testing and assessment can be conducted in a fair and unbiased manner
7. Testing and assessment benefit society
-Validity
-measure what it purports to measure
-other considerations:
-a good test is one that trained examiners can administer, score and interpret with a minimum of
difficulty
-a good test contains adequate norms
Norms
-(singular)refers to behavior that is usual, average, normal, standard, expected or typical
-(psychometric context) test performance data of a particular group of testtakers that are
designed for use as a reference when evaluating test scores
Normative Sample
-group of people whose performance on a particular test is analyzed for reference in evaluating
the performance of individual testtakers
Norming
-refers to the process of deriving norms
Population –the complete universe or set of individuals with at least one common observable
characteristic
Methods:
1. Stratified Sampling
-reduces bias
-members of the sample come from different strata
2. Random Sampling
-Every member of the population has the same chance of being included in the sample
3. Purposive Sampling
-arbitrarily selects a sample because it is believed to be the best to represent
the population
-common in consumer psychology
Types of Norms
1. Percentile
-an expression of the percentage of people whose score on a test or measure falls
below a particular raw score
-a converted raw score that refers to a percentage of testtakers
-problem with using percentiles: real differences between scores may be distorted,
especially in highly skewed distributions
2. Age Norms
-also known as age-equivalent scores, age norms indicate the average performance of
different samples of testtakers who were at various ages at the time the test was
administered
3. Grade Norms
-average test performance of testtakers in a given school grade
-representative samples of children over a range of consecutive grade levels
-useful only in children who are in school or already completed a particular grade
4. National Norms
-nationally representative of the population at the time the norming study was conducted
-example: large numbers of people representative of a particular variable
5. Subgroup Norms
-any of the criteria initially used in selecting subjects for the sample
-example: educational level of out-of-school youth
6. Local Norms
-local population’s performance on some test
-example: the arithmetic ability of the people of Silang, Cavite
Norm-referenced Evaluation
-deriving meaning from test scores by evaluating an individual’s score with reference to a
particular norm
Criterion-referenced Evaluation
-deriving meaning from test scores by evaluating an individual’s score with reference to a particular
standard
-also known as domain or content referenced testing and assessment and mastery test
CHAPTER 5: RELIABILITY
Reliability
-synonyms: dependability/consistency
-refers to the consistency in measurement
-refers to the proportion of the total variance attributed to true variance
-the greater the proportion of the total variance attributed to true variance, the more reliable the test
Reliability Coefficient
-an index of reliability, a proportion that indicates the ratio between the true score variance on a test
and the total variance
Measurement Error
-all of the factors associated with the process of measuring some variable, other than the variable
being measured
Do’s
1. Randomly assign items on the halves of the test.
2. Assign odd-numbered items to one half of the test and even-numbered items to the other
half (odd-even reliability).
3. Divide the test by content.
Don’ts
Don’t split the test in the middle
-in general, a primary objective in splitting a test in half for the purpose of obtaining a split-half
reliability estimate is to create what might be called “mini-parallel-forms,” with each half equal
to the other – or as nearly equal as humanly possible – in format, stylistic, statistical, and
related aspects.
Note:
If the reliability of the original test is relatively low, it may be impractical simply to increase the
number of test items; instead, consider developing a new test, creating new test items, clarifying
the test instructions, or simplifying the scoring rules.
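The odd-even split described in the Do's above, followed by the Spearman-Brown correction (r_SB = 2r / (1 + r)) that steps the half-test correlation up to a full-test estimate, can be sketched as follows. The item-score matrix is made up for illustration:

```python
# Odd-even split-half reliability with Spearman-Brown correction.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

items = [                       # rows = testtakers, columns = item scores (1/0)
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
odd  = [sum(row[0::2]) for row in items]   # half-test scores on items 1, 3, 5
even = [sum(row[1::2]) for row in items]   # half-test scores on items 2, 4, 6
r_half = pearson_r(odd, even)
r_full = 2 * r_half / (1 + r_half)          # Spearman-Brown stepped-up estimate
print(round(r_full, 2))                     # 0.83
```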
4. INTER-ITEM CONSISTENCY
-1 test; 1 group; 1 administration
-assesses test homogeneity
-the more homogeneous the test, the higher the inter-item consistency
Kuder-Richardson Formulas
-developed by G. Frederic Kuder and M.W. Richardson
-KR-20
-named KR-20 because it was the twentieth formula developed in a series
-when to use? homogenous items and dichotomous items
-note: if items are more heterogeneous, KR-20 will yield lower reliability estimates than the
split-half method
-KR-21
-used for a test where the items are all about the same difficulty
Coefficient Alpha
-developed by Cronbach
-when to use? homogenous items and non-dichotomous items
-the preferred statistic for obtaining an estimate of internal consistency reliability
-values range from 0.00 (no similarity) to 1.00 (perfectly identical)
-note: a value of .90 or above indicates redundancy of items
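Coefficient alpha can be computed from its standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). A sketch with hypothetical item scores:

```python
# Cronbach's coefficient alpha for k items (rows = testtakers, columns = items).

def variance(xs):                       # population variance
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [
    [3, 4, 3, 4],
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
]
k = len(data[0])
item_vars = [variance([row[i] for row in data]) for i in range(k)]
total_var = variance([sum(row) for row in data])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))   # 0.98 -> very high internal consistency (items may be redundant)
```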
Criterion-referenced Test (designed to provide an indication of where a testtaker stands with respect
to some variable or criterion)
-statistical measures are not widely applicable
Domain-the universe of items that could conceivably measure that behavior, can be thought of
as a hypothetical construct: one that shares certain characteristics with (and is measured by)
the sample of items that make up the test
3. GENERALIZABILITY THEORY
-based on the idea that a person’s test scores vary from testing to testing because of variables
in the testing situation
-“Given the exact same conditions of all facets of universe, the exact same test score should
be obtained”
Generalizability Study-examines how generalizable scores from a particular test are if the
test is administered in different situations
Decision Study-developers examine the usefulness of test scores in helping the test user
make decisions
4. ITEM-RESPONSE THEORY
-the procedures provide a way to model the probability that a person with X ability will be able
to perform at a level of Y
-aka Latent-Trait Theory
CHAPTER 6: VALIDITY
VALIDITY
-used in conjunction with the meaningfulness of a test score – what the test score truly means
-estimate of how well a test measures what it purports to measure in a particular context
-no test is universally valid for all time
-the validity of a test must be proven again from time to time
Validation
-the process of gathering and evaluating evidence about validity
Trinitarian View
-Content Validity
-Criterion-related Validity
-Construct Validity
Face Validity
-what a test appears to measure to the person being tested than to what the test actually
measures
-a judgment concerning how relevant the test items appear to be
-example: high face validity = structured personality test
-example: low face validity = projective test
-lack of face validity may result in a decreased level of cooperation or motivation in the testtaker
and a loss of “buy-in” from the test user
-face validity may be more a matter of public relations than of psychometric soundness
ASPECTS OF VALIDITY:
Content Validity
-a judgment of how adequately a test samples behavior representative of the universe of
behavior that the test was designed to sample
Construct-irrelevant variance –occurs when scores are influenced by factors irrelevant to the
construct
Criterion-Related Validity
-a judgment of how adequately a test score can be used to infer an individual’s most probable
standing on some measure of interest – the measure of interest being the criterion
Criterion
-the standard against which a test or test score is evaluated
-characteristics: relevant, valid, uncontaminated
criterion contamination-the term applied to a criterion measure that has been based, at least
in part, on predictor measures
Concurrent Validity
-test scores and criterion measures are obtained at about the same time
-a test with satisfactorily demonstrated concurrent validity may therefore be appealing to
prospective users because it holds out the potential of savings of money and professional time
Predictive Validity
-tells how well a certain measure can predict future behavior
Statistical Evidence:
-Validity Coefficient-a correlation that provides a measure of the relationship between test
scores and scores on the criterion measure (0.30-0.40 high validity coefficient)
Construct Validity
-a judgment about the appropriateness of inferences drawn from test scores regarding individual
standings on a variable called a construct
-“umbrella validity”
-“viewed as the unifying concept for all validity evidence”
Construct
-an informed, scientific idea developed or hypothesized to describe or explain behavior
-unobserved , presupposed (underlying) traits that a test developer may invoke to describe test
behavior or criterion performance
-example: intelligence, self-esteem, motivation
Convergent Evidence/Validity
-shown by correlations with tests purporting to measure an identical construct and by
correlations with measures purporting to measure related constructs
Discriminant Evidence/Validity
-little (a statistically insignificant) relationship between test scores and/or other variables
with which scores on the test being construct-validated should not theoretically be
correlated
Multitrait-Multimethod Matrix
-a useful technique for examining both convergent and discriminant validity
-two or more traits + two or more methods
Factor Analysis
-also helpful in obtaining convergent and discriminant evidence
-class of mathematical procedures designed to identify factors or specific variables that are
typically attributes, characteristics, or dimensions on which people may differ
-Factor Loading
-the extent to which the factor determines the test score or scores
Test Bias
-Bias- a factor inherent in a test that systematically prevents accurate, impartial measurement
Slope Bias-when the slope of one group’s regression line differs significantly from others
Rating Error
-Rating-a numerical or verbal judgment (or both) that places a person or an attribute along a
continuum
-Rating Error-a judgment resulting from the intentional or unintentional misuse of a rating
scale
Severity Error
Central Tendency Error
Generosity and Leniency Error
(Note: To avoid these rating errors, it is advisable to use rankings)
Test Fairness-extent to which a test is used in an impartial, just and equitable way
CHAPTER 7: UTILITY
Utility
-usefulness or practical value of testing to improve efficiency
-also used to refer to the usefulness or practical value of a training program or intervention
2. Cost
-refers to disadvantages, losses or expenses in both economic and noneconomic terms
-economic, financial or budget-related in nature must certainly be taken into account
3. Benefits
-refers to profits, gains or advantages
Utility Analysis
-a family of techniques that entail a cost-benefit analysis designed to yield information relevant
to a decision about the usefulness and/or practical value of a tool of assessment
2. Brogden-Cronbach-Gleser Formula
-a formula used to calculate the dollar amount of a utility gain resulting from the use of a
particular selection instrument under specified conditions
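A sketch of the Brogden-Cronbach-Gleser computation. The formula form used here (N × T × r_xy × SD_y × mean z of selectees, minus total testing cost) follows the common textbook presentation, and every figure below is hypothetical:

```python
# Brogden-Cronbach-Gleser utility gain: dollar value of using a selection test.

def bcg_utility(n_hired, tenure_years, validity, sd_y, mean_z, n_applicants, cost):
    # Productivity gain from selecting with the test, minus the cost of testing.
    return (n_hired * tenure_years * validity * sd_y * mean_z
            - n_applicants * cost)

gain = bcg_utility(
    n_hired=10,        # N: number of applicants selected
    tenure_years=2,    # T: expected tenure of those selected
    validity=0.40,     # r_xy: criterion-related validity of the test
    sd_y=15000,        # SD of job performance expressed in dollars
    mean_z=1.0,        # mean standardized test score of those selected
    n_applicants=50,   # number of applicants tested
    cost=100,          # cost of testing one applicant
)
print(gain)            # 115000.0
```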
3. Decision Theory
-a body of methods used to quantitatively evaluate selection procedures, diagnostic
classifications, therapeutic interventions or other assessment or intervention-related
procedures in terms of how optimal they are (most typically from a cost-benefit
perspective)
1. Hit
-definition: a correct classification
-example: a qualified driver is hired; an unqualified driver is not hired
2. Miss
-definition: an incorrect classification; a mistake
-example: a qualified driver is not hired; an unqualified driver is hired
3. Hit Rate
-definition: the proportion of people that an assessment tool accurately identified as
possessing a particular variable
-example: the proportion of qualified drivers with a passing score who actually gain
permanent employee status; the proportion of unqualified drivers with a failing score who
did not gain permanent status
4. Miss Rate
-definition: the proportion of people that an assessment tool inaccurately identified as
possessing a particular variable
-example: the proportion of drivers inaccurately predicted to be qualified; the
proportion of drivers inaccurately predicted to be unqualified
5. False Positive
-definition: falsely indicates that the testtaker possesses a particular variable
-example: a driver who is hired is not qualified
6. False Negative
-definition: falsely indicates that the testtaker does not possess a particular variable
-example: the assessment tool says to not hire but driver would have been rated as
qualified
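The hit/miss terms above can be illustrated by comparing predicted and actual qualification; the driver data below are hypothetical:

```python
# Hits, misses, false positives, and false negatives from predicted vs. actual status.

predicted = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = test says "qualified, hire"
actual    = [1, 1, 0, 0, 0, 1, 1, 0]   # 1 = truly qualified

hits = sum(p == a for p, a in zip(predicted, actual))
false_pos = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
false_neg = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))

print(hits / len(actual))   # 0.75 hit rate (correct classifications)
print(false_pos)            # 1 -> hired but unqualified (false positive)
print(false_neg)            # 1 -> not hired but qualified (false negative)
```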
-relative cut score-a reference point – in a distribution of test scores used to divide a set
of data into two or more classifications – that is set based on norm-related
considerations rather than on the relationship of test scores to a criterion
-aka norm-referenced cut score
-normative
-fixed cut score-a reference point – in a distribution of test scores used to divide a set of
data into two or more classifications – that is typically set with reference to a judgment
concerning a minimum level of proficiency required to be included in a particular
classification
-aka absolute cut score
-criterion
3. IRT-Based Methods
-in order to “pass” the test, the testtaker must answer items that have some minimum
level of difficulty, which is determined by experts and serves as the
cut score
4. Other Methods:
Method of Predictive Yield
-a technique for identifying cut scores based on the number of positions to be filled
Discriminant Analysis
-a family of statistical techniques used to shed light on the relationship between certain
variables and two or more naturally occurring groups
CHAPTER 8: TEST DEVELOPMENT
Scaling
-the process of setting rules for assigning numbers in measurement
-scaling methods:
a. Likert Scales
-a type of summative rating scale
-five alternative responses (sometimes seven)
-ordinal in nature
b. Paired Comparison
-scaling method whereby one of a pair of stimuli (such as photos) is selected according to a rule
(such as “select the one that is more appealing”)
-scaling systems:
a. comparative scaling (best to worst)
b. categorical scaling (section1, section 2, section 3)
Writing Items
-When devising a standardized test using a multiple-choice format, it is usually advisable that
the first draft contain approximately twice the number of items that the final version of the test
will contain.
-item pool-the reservoir or well from which items will or will not be drawn for the final version
of the test; the collection of items to be further evaluated for possible selection for use in an
item bank
Item Format
-the form, plan, structure, arrangement and layout of individual test items
1. Selected-Response Format
-a form of test item requiring testtakers to select a response
A. Multiple-Choice Format
-has 3 elements: stem, correct alternative/option, distractors/foils
-criteria of good multiple-choice:
-has one correct alternative
-has grammatically parallel alternatives
-has alternatives of similar length
-has alternatives that fit grammatically with the stem
-includes as much of the item as possible in the stem to avoid unnecessary
repetition
-avoids ridiculous distractors
-not excessively long
B. Matching-item
-a testtaker is presented with two columns: premises and responses, and must
determine which response is best associated with which premise
-a testtaker could get a perfect score even without actually knowing all the answers
-to minimize this possibility, provide more options or state in the directions that each
response may be a correct answer once, more than once or not at all
2. Constructed-Response Items
-a form of test item requiring the testtaker to construct or create a response
B. Essay
-is useful when the test developer wants the examinee to demonstrate a depth of
knowledge about a single topic
-allows for the creative integration and expression of the material in the testtaker’s
own words
-the main problem in essay is the subjectivity in scoring
Item Bank
-a collection of questions to be used in the construction of tests; typically stored for
computerized test administration
Item Branching
-in computerized adaptive testing, the individualized presentation of test items drawn
from an item bank based on the testtaker’s previous responses
Scoring Items
1. Cumulative Model-a method of scoring whereby points or scores accumulated on
individual items or subtests are tallied and then, the higher the total sum, the higher the
individual is presumed to be on the ability, trait, or other characteristic being measured
(Example: High IQ Score > more intelligent)
2. Class or Category Scoring-a method of evaluation in which test responses earn credit
toward placement in a particular class or category with other testtakers. Sometimes
testtakers must meet a set number of responses corresponding to a particular criterion in
order to be placed in a specific category or class
(Examples: GPA of 1.50 and above will be placed on Star Section; GPA of 2 and below will
be placed on Lower Section)
3. Ipsative Scoring-an approach to test scoring and interpretation wherein the testtaker’s
responses and the presumed strength of a measured trait are interpreted relative to the
measured strength of other traits for that testtaker; often uses a forced-choice format
(Example: High Score in Extraversion; Low in Agreeableness)
Formula:
p = (# of testtakers who answered correctly) / (total # of testtakers)
Level of Difficulty:
0.00 to 0.20 – very difficult
0.21 to 0.40 – difficult
0.41 to 0.60 – average
0.61 to 0.80 – easy
0.81 to 1.00 – very easy
Standards:
0.50 – optimal average item difficulty (whole test)
0.30 to 0.80 – average item difficulty on individual items
0.75 – true or false
0.625 – multiple choice (4 choices)
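The difficulty index and the guessing-adjusted optimum can be computed directly; the helper names are ours, and the category boundaries follow the ranges listed above:

```python
# Item-difficulty index p = (number answering correctly) / (total testtakers).
# Category labels follow the ranges given in the notes above.

def item_difficulty(num_correct, num_testtakers):
    return num_correct / num_testtakers

def difficulty_label(p):
    if p <= 0.20: return "very difficult"
    if p <= 0.40: return "difficult"
    if p <= 0.60: return "average"
    if p <= 0.80: return "easy"
    return "very easy"

def optimal_difficulty(num_choices):
    """Midpoint between chance success (1/k) and a perfect 1.0 — this
    reproduces the 0.75 (true/false) and 0.625 (4-option) standards."""
    chance = 1 / num_choices
    return (chance + 1.0) / 2
```

The optimal-difficulty helper shows where the 0.75 and 0.625 standards come from: halfway between the proportion expected from pure guessing and a perfect score.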
Formula (Item-Discrimination Index, d):
d = (U – L) / n
-U = # of testtakers in the upper-scoring group who answered the item correctly
-L = # of testtakers in the lower-scoring group who answered the item correctly
-n = # of testtakers in each group
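Read as d = (U – L) / n, the discrimination index is a one-liner; the group counts below are hypothetical:

```python
# Item-discrimination index d = (U - L) / n, where U and L are the counts of
# correct answers to the item in the upper- and lower-scoring groups and
# n is the number of testtakers in each group. Values here are hypothetical.

def item_discrimination(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size
```

A d near +1 means high scorers pass the item far more often than low scorers; a negative d flags a problem item that low scorers get right more often than high scorers.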
Item-Characteristic Curve
-graphic representation of item difficulty and item discrimination
-the steeper the slope, the greater the item discrimination
-curves for easy items are shifted toward the left
-curves for difficult items are shifted toward the right
Cross Validation-a revalidation on a sample of testtakers other than the testtakers on whom
test performance was originally found to be a valid predictor of some criterion
Co-Validation-the test validation process conducted on two or more tests using the same
sample of testtakers; when used in conjunction with the creation of norms or the revision of
existing norms; this process may also be referred to as co-norming
CHAPTER 9: INTELLIGENCE AND MEASUREMENT
INTELLIGENCE
-a multifaceted capacity that manifests itself in different ways across the life
span
1. FRANCIS GALTON
-first person to publish on the heritability of intelligence, thus framing the
contemporary nature-nurture debate
-he believed that the most intelligent persons were those equipped with the
best sensory abilities
-attempted to measure this sort of intelligence in many of the sensorimotor
and other perception-related tests he devised
INTERACTIONISM
(Heredity + Environment = Intelligence)
2. ALFRED BINET
-components of intelligence: reasoning, judgment, memory and abstraction
-more complex measure of intelligence
3. DAVID WECHSLER
-intelligence as an “aggregate” or “global” capacity
-considered other factors (traits and personality) in assessing intelligence
-at first, he proposed two qualitatively different abilities: Verbal and Performance
-then, he added other factors: Verbal Comprehension, Working Memory,
Perceptual Organization, Processing Speed
4. JEAN PIAGET
-intelligence is an evolving biological adaptation to the outside world
-focused on the development of cognition in children
-schema (or schemata) -an organized action or mental structure that when
applied to the world, leads to knowing and understanding
-the basic mental operations:
-Assimilation -actively organizing new information so that it fits with what is
already perceived and thought
-Accommodation -changing what is already perceived or thought so that it
fits with the new information
-Disequilibrium -causes the individual to discover new information, perceptions,
and communication skills
5. CHARLES SPEARMAN
-Theory of General Intelligence / Two-Factor Theory of Intelligence
-(g) - general intellectual ability
-(s) - specific components
-(e) - error components
-The greater the magnitude of g in a test of intelligence, the better the test
was thought to predict overall intelligence
-g factor is based on some type of general electrochemical mental energy
available to the brain for problem solving
-tests of abstract reasoning were thought to be the best measures of g in
formal tests
-Group Factors -an intermediate class of factors common to a group of
activities but not to all
Ex: Linguistic, Mechanical, Arithmetical
8. HOWARD GARDNER
-intelligence is the ability to solve problems or to create products that are
valued within one or more cultural settings
-theory of multiple intelligences:
-logical-mathematical
-bodily-kinesthetic
-linguistic
-musical
-spatial
-interpersonal
-intrapersonal
9. RAYMOND CATTELL
-two major types of cognitive abilities:
-Crystallized Intelligence (Gc)
-acquired skills and knowledge that are dependent on exposure to a
particular culture as well as on formal and informal education
(Example: Vocabulary)
-Fluid Intelligence (Gf)
-nonverbal abilities that are relatively culture-free and independent of
specific instruction (Example: memory for digits)
INFORMATION-PROCESSING VIEW
15. Others
-PASS Model
-Planning -strategy development for problem solving
-Attention/Arousal -receptivity to information
-Simultaneous and Successive -the type of information processing
employed
Nature vs Nurture
-Preformationism
-all living organisms are preformed at birth
-all of the organism’s structures, including intelligence, are preformed at birth
and therefore cannot be improved
-analogous to a cocoon turning into a butterfly
-Predeterminism
-one’s abilities are pre-determined by genetic inheritance and no amount
of learning or other intervention can enhance what has been genetically
encoded to unfold in time
-Arnold Gesell
-”training does not transcend maturation”
-mental development as a progressive morphogenesis of patterns of behavior
-behavior patterns are predetermined by “innate processes of growth”
-Francis Galton
-believed that genius was hereditary
-Richard Dugdale
-argued that degeneracy (being immoral) was also inherited
-Henry Goddard
-role of heredity in feeblemindedness
-feeblemindedness is the product of a recessive gene
-Lewis Terman
-the father of the American version of Binet’s test
-based on his testing, he concluded that Mexicans and Native Americans
were intellectually inferior
-Karl Pearson
-”Jews are somewhat inferior physiologically and mentally”
-Wendy Johnson
-VPR (Verbal-Perceptual-Rotation) Model -strong genetic influence on mental ability
-Interactionist View
-we are free to become all that we can be
Other issues:
-Flynn effect
-the progressive rise in intelligence test scores over time (roughly 3 IQ
points per decade), which gradually makes a test’s norms obsolete
-Personality
-Street efficacy -perceived ability to avoid violent confrontations and to be safe
in one’s neighborhood
-Gender
-males have the edge when it comes to the g factor in intelligence, especially
when only the highest-scoring group on the ability test is considered
-males also tend to outperform females on tasks requiring visual spatialization
-girls may generally outperform boys on language-skill-related tasks, although
differences may be minimized when assessment is conducted by computer
-Family Environment
-divorce may have significant consequences in the life of a child, ranging from
impaired school achievement to impaired social problem-solving ability
-Culture
-Culture loading -the extent to which a test incorporates the vocabulary,
concepts, traditions, knowledge, and feelings associated with a particular culture
-Culture-Fair Intelligence Test
-designed to minimize the influence of culture with regard to various aspects
of the evaluation procedures
CHAPTER 10: TESTS OF INTELLIGENCE
Measuring Intelligence
C. Adult
-according to Wechsler, adult intelligence involves abilities such as retention
of general information, quantitative reasoning, expressive language and
memory, and social judgment
-obtained during clinical evaluation or corporate assessment
1st Edition (Stanford-Binet, 1916)
-The first published intelligence test to provide organized and detailed administration
and scoring instructions
-The first American test to employ the concept of IQ. And it was the first test to introduce
the concept of an alternate item, an item to be substituted for a regular item under
specified conditions
-Criticism: lack of representativeness of the standardization sample
Revisions:
1937
-Included the development of two equivalent forms, labeled L (for Lewis
Terman) and M (for Maud Merrill)
-New types of tasks for use with preschool-level and adult-level testtakers
-Adequate standardization sample
-Criticism: lack of representation of minority groups during the test’s development
1960
-consisted of only a single form (labeled L-M) and included the items considered to be
the best from the two forms of the 1937 test, with no new items added to the test
-the use of the deviation IQ tables in place of the ratio IQ tables
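The 1960 switch from ratio IQ to deviation IQ can be illustrated with a sketch (the SD of 16 reflects the Stanford-Binet’s historical scaling; Wechsler scales use 15; the scores below are hypothetical):

```python
# Ratio IQ: mental age divided by chronological age, times 100.
# Deviation IQ: location of the raw score within the age group's normal
# distribution, rescaled to a chosen mean and standard deviation.
# All input values here are hypothetical.

def ratio_iq(mental_age, chronological_age):
    return mental_age / chronological_age * 100

def deviation_iq(raw_score, group_mean, group_sd, mean=100, sd=16):
    z = (raw_score - group_mean) / group_sd  # standing within own age group
    return mean + sd * z
```

The advantage of the deviation IQ is that a given score means the same relative standing at every age, which the ratio IQ could not guarantee.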
1972
-the quality of the standardization sample was criticized
-norms may also have overrepresented the West, as well as large urban communities
Knowledge (KN)
-skills and knowledge acquired by formal and informal education
-Routing Test
-A task used to direct or route the examinee to a particular level of questions
-Direct an examinee to test items that have a high probability of being at an
optimal level of difficulty
-Teaching items
-designed to illustrate the task required and assure the examiner that the examinee
understands
-Basal Level -A stage in a test achieved by a testtaker by meeting some preset criterion
in order to continue to be tested; for example, responding correctly to two consecutive
items on an ability test that contains increasingly difficult items may establish a “base”
from which to continue testing
-Testing the Limit -A procedure that involves administering test items beyond the level at
which the test manual dictates discontinuance
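Basal and discontinue rules of this kind can be sketched as follows; the specific criteria (two consecutive passes or failures) are illustrative, not taken from any particular test manual:

```python
# Sketch of basal and discontinue (ceiling) rules on an ability test whose
# items are ordered by increasing difficulty. The criteria used here
# (2 consecutive correct for a basal; 2 consecutive errors to discontinue)
# are illustrative only.

def find_basal(results, run=2):
    """Index where the first `run` consecutive correct responses begin, or None."""
    streak = 0
    for i, correct in enumerate(results):
        streak = streak + 1 if correct else 0
        if streak == run:
            return i - run + 1  # start of the basal run
    return None

def discontinue_index(results, run=2):
    """Index at which testing would stop: the first `run` consecutive errors."""
    streak = 0
    for i, correct in enumerate(results):
        streak = streak + 1 if not correct else 0
        if streak == run:
            return i
    return None
```

Testing the limits, by contrast, means deliberately continuing past the index that `discontinue_index` would return.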
-WAIS-R (1981)
-new norms and materials
-alternate administration of verbal and performance tests
-WAIS-III (1997)
-contained updated and more user-friendly materials
-test materials were made physically larger to facilitate viewing by older adults
-some items were added to each of the subtests that extended the test’s floor in order to
make the test more useful for evaluating people with extreme intellectual deficits
-extensive research was designed to detect and eliminate items that may have
contained cultural bias
-norms were expanded to include testtakers in the age range 74-89
-yielded a full scale (composite) IQ as well as four Index Scores - Verbal
Comprehension, Perceptual Organization, Working Memory, and Processing Speed
-used for more in-depth interpretation of findings
-WAIS-IV (2008)
-It is made up of subtests that are designated either as core or supplemental
-Core subtest is one that is administered to obtain a composite score
-Supplemental Subtest is used for purposes such as providing additional
clinical information or extending the number of abilities or processes sampled
-Intended for use with individuals ages 16 to 90 years and 11 months
-contains ten core subtests (Block Design, Similarities, Digit Span, Matrix Reasoning,
Vocabulary, Arithmetic, Symbol Search, Visual Puzzles, Information and Coding)
-and five supplemental subtests (Letter-Number Sequencing, Figure Weights,
Comprehension, Cancellation and Picture Completion)
-more explicit administration instructions as well as the expanded use of demonstration
and sample items - this in an effort to provide assessees with practice in doing what is
required, in addition to feedback on their performance
-all of the test items were thoroughly reviewed to root out any possible cultural bias
-Floor = 40, Ceiling = 160
Wechsler Intelligence Scale for Children (WISC)
-1st edition 1949
-currently in its 5th edition
-WISC-V (2014)
-ages 6 years old to 16 years and 11 months
-FSIQ, Primary Index Scores and Ancillary Index Scores
-21 subtests; 15 composite scores
-completion time: 60 minutes
-WPPSI (2012)
-ages 2 years and 6 months up to 7 years and 7 months
-completion time:
-ages 2:6 to 3:11 = 30-45 minutes
-ages 4:0 to 7:7 = 45-60 minutes
-WASI-II (2011)
-made the test materials more user friendly and increased the psychometric
soundness of the test
-World War II
-Army General Classification Test (AGCT) -administered to more than 12 million
recruits
-Today
-group tests are still administered to prospective recruits, primarily for screening
purposes
-Screening tool -an instrument or procedure used to identify a particular trait or
constellation of traits at a gross or imprecise level
-Measures of Creativity:
-Originality -the ability to produce something that is innovative or nonobvious
-Fluency -the ease with which responses are reproduced and is usually measured by the
total number of responses produced
-Flexibility -the variety of ideas presented and the ability to shift from one approach to
another
-Elaboration -the richness of detail in a verbal explanation or pictorial display
-A criticism frequently leveled at group standardized intelligence tests (as well as at
other ability and achievement tests) is that evaluation of test performance is too heavily
focused on whether the answer is correct
-The heavy emphasis on correct response leaves little room for the evaluation of
processes such as originality, fluency, flexibility and elaboration
-Convergent thinking
-a deductive reasoning process that entails recall and consideration of facts as well as a
series of logical judgments to narrow down solutions and eventually arrive at one
solution
-Divergent thinking
-a reasoning process in which thought is free to move in many different directions,
making several solutions possible
-requires flexibility of thought, originality, and imagination
-It is interesting that many tests of creativity do not fare well when evaluated by
traditional psychometric procedures
CHAPTER 11: PRESCHOOL AND EDUCATIONAL ASSESSMENT
Infant Scales
Woodcock-Johnson III
-designed as a broad-range individually administered test to be used in
educational settings
-it assesses general intellectual ability (g), specific cognitive abilities,
scholastic aptitude, oral language and achievement
-the Woodcock-Johnson III’s cognitive ability standard battery includes 10
tests such as verbal comprehension, visual-auditory learning, spatial relations,
and visual matching
-has relatively good psychometric properties
-based on CHC Model
Visuographic Test
RtI Model
-the Response to Intervention model: a multilevel prevention framework applied
in educational settings that is designed to maximize student achievement
through the use of data that identify students at risk for poor learning
outcomes, combined with evidence-based intervention and teaching that is
adjusted on the basis of student responsiveness
Achievement Tests
● Designed to measure accomplishment
● A test of achievement may be standardized nationally, regionally, or
locally, or it may not be standardized at all
● A sound achievement test is one that adequately samples the targeted
subject matter and reliably gauges the extent to which the examinees
have learned it
● Curriculum-based assessment (CBA)
- a term used to refer to assessment of information acquired from
teachings at school
- Curriculum-based measurement (CBM)
- a type of CBA, is characterized by the use of standardized
measurement procedures to derive local norms to be used in the
evaluation of student performance on curriculum-based tasks
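Deriving a local norm in CBM amounts to ranking a student’s score against scores collected from the local group (e.g., the same grade in the same school); a minimal sketch with hypothetical data:

```python
# Sketch of a local norm in curriculum-based measurement (CBM): a student's
# score on a curriculum task is expressed as a percentile rank within the
# locally collected score distribution. The scores are hypothetical.

def percentile_rank(score, local_scores):
    """Percentage of the local norm group scoring at or below `score`."""
    at_or_below = sum(1 for s in local_scores if s <= score)
    return 100 * at_or_below / len(local_scores)
```

Because the reference group is local rather than national, the resulting rank reflects standing within the student’s own curriculum and school.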
Aptitude Tests
● Tend to focus more on informal learning or life experiences
● Also referred to as prognostic tests, are typically used to make
predictions
Pre-School Level
● Checklist - a questionnaire on which marks are made to indicate the
presence or absence of a specified behavior, thought, event, or
circumstance
● Rating Scale - a form completed by an evaluator (a rater, judge, or
examiner) to make a judgment of relative standing with regard to a
specified variable or list of variables
● Apgar number - "everybody's first test"
● A score on a rating scale developed by physician Virginia Apgar
(1909-1974), an obstetrical anesthesiologist who saw a need for
a simple, rapid method of evaluating newborn infants and
determining what immediate action, if any, is necessary
● Informal evaluation - a typically nonsystematic, relatively brief, and
"off-the-record" assessment leading to the formation of an opinion or
attitude conducted by any person, in any way, for any reason, in an
unofficial context that is not subject to the ethics or other standards of
an evaluation by a professional
● At risk - children who have documented difficulties in one or more
psychological, social, or academic areas and for whom intervention is
or may be required
Diagnostic Tests
● a tool used to identify areas of deficit to be targeted for intervention
● Evaluative Information
● typically applied to tests or test data that are used to make
judgments (such as pass-fail)
- Diagnostic Information
● Typically applied to tests or test data used to pinpoint a student's
difficulty, usually for remedial purposes
Terms
Personality - Individual's unique constellation of psychological traits that is relatively stable over
time
Personality Assessment - the measurement and evaluation of psychological traits, states,
values, interests, attitudes, worldview, acculturation, sense of humor, cognitive and behavioral
styles, and/or related individual characteristics
Personality Traits - Any distinguishable, relatively enduring way in which one individual varies
from another
Personality Type - a constellation of traits that is similar in pattern to one identified category of
personality within a taxonomy of personalities
Personality States - relatively temporary predisposition
Frame of Reference
● Frame of Reference - defined as aspects of the focus of exploration such as the time
frame (the past, the present, or the future) as well as other contextual issues that involve
people, places, and events
● Q-sort Technique - an assessment technique in which the task is to sort a group of
statements, usually in perceived rank order ranging from most descriptive to least
descriptive
Acculturation and Related Considerations
● Acculturation is an ongoing process by which an individual's thoughts, behaviors,
values, worldview, and identity develop in relation to the general thinking, behavior,
customs, and values of a particular cultural group
● Rokeach (1973) differentiated what he called instrumental from terminal values
● Instrumental Values - guiding principles that help one attain some objective
● Terminal Values - guiding principles and a mode of behavior that is an endpoint
objective
● Also intimately tied to the concept of acculturation is the concept of personal identity
● Identity - a set of cognitive and behavioral characteristics by which individuals
define themselves as members of a particular group
● Identification - process by which an individual assumes a pattern of behavior
characteristic of other people, and referred to it as one of the central issues that
ethnic minority groups must deal with
● Worldview is the unique way people interpret and make sense of their
perceptions as a consequence of their learning experiences, cultural background,
and related variables
Objective Methods
Logical-Content
Criterion-Group Strategy
● MMPI-2
● More representative standardization sample (normal control group) used in the
norming
● Items were rewritten to correct grammatical errors and to make the language
more contemporary, nonsexist, and readable
● 567 true-false items, including 394 items that are identical to the original MMPI
items, 66 items that were modified or rewritten, and 107 new items
● 18 years old and older
● The TRIN (True Response Inconsistency) scale is designed to identify acquiescent
and nonacquiescent response patterns. It contains 23 pairs of items worded in
opposite forms
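A simplified sketch of how a TRIN-style consistency check works (the item pairs and scoring below are illustrative only, not the actual MMPI-2 TRIN key):

```python
# Simplified illustration of a TRIN-style scale: for pairs of items worded
# in OPPOSITE directions, answering True to both (or False to both) is
# inconsistent — suggesting acquiescent (yea-saying) or nonacquiescent
# (nay-saying) responding. Pairs and responses here are hypothetical.

def trin_style_counts(responses, opposite_pairs):
    """Count inconsistent True-True and False-False answers to opposite pairs."""
    true_true = sum(1 for a, b in opposite_pairs
                    if responses[a] == "T" and responses[b] == "T")
    false_false = sum(1 for a, b in opposite_pairs
                      if responses[a] == "F" and responses[b] == "F")
    return true_true, false_false
```

Many True-True pairs point toward acquiescence; many False-False pairs point toward nay-saying; a consistent responder produces few of either.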
Theoretical Strategy
Combination Strategy
Projective Methods