(Cohen) Psych Assessment Reviewer
Testing
-used to refer to everything from the administration of a test to the interpretation of scores
Psychological Testing
-the process of measuring psychology-related variables by means of devices or procedures
designed to obtain a sample of behavior
Psychological Assessment
-the gathering and integration of psychology-related data for the purpose of making a
psychological evaluation that is accomplished through the use of tools such as tests, interviews,
case studies, behavioral observations and specially designed apparatuses and measurement
procedures
Process of Assessment:
Step 1: Assessment begins with a referral
Optional: The assessor may meet with the assessee or others before the formal assessment in
order to clarify aspects of the reason for referral
Step 2: The assessor prepares for the assessment by selecting the tools of assessment
Step 3: The assessor conducts the formal assessment
Step 4: After the assessment, the assessor writes a report of the findings that is designed to
answer the referral question
Optional: Feedback sessions
Approaches in Assessment:
Collaborative Psychological Assessment
-assessor and assessee work as partners
-aka therapeutic psychological assessment
Dynamic Assessment
-interactive, changing or varying in nature
-usually employed in educational settings
1.) Test
-a measurement device or technique used to quantify behavior or aid in the understanding and
prediction of behavior
-Item –a specific stimulus to which a person responds overtly; this response can be scored or
evaluated
-Types:
A. Ability Tests
1. Achievement –previous learning
2. Aptitude – the potential for learning or acquiring a specific skill
3. Intelligence – a person’s general potential to solve problems, adapt to changing
circumstances, think abstractly and profit from experience
B. Personality Tests –related to the overt and covert dispositions of the individual
1. Structured –provides a self-report statement to which the person responds “True” or “False”,
“Yes” or “No”
2. Projective –provides an ambiguous test stimulus; response requirements are unclear
2.) Interview
-face to face
-taking note of both verbal and non-verbal behavior
-taking note of the way that the interviewee is dressed
-however, face to face contact is not always possible and interviews may be conducted
in other formats
1. Test Developer
-create test or other methods of assessment
-bring a wide array of backgrounds and interests
2. Test User
-clinicians, counselors, school psychologists, human resources personnel, consumer
psychologists, experimental psychologists and social psychologists
-the one who conducts or uses tests
3. Test Taker
-the subject of an assessment or an observation
-aka assessee
-psychological autopsy –reconstruction of deceased individual’s psychological profile
4. Society at large
-as society evolves and as the need to measure different psychological variables emerges, test
developers respond by devising new tests
5. Other parties
-people whose sole responsibility is the marketing and sales of tests
-academicians who review tests and evaluate their psychometric soundness
1. Educational Setting
-help identify children who may have special needs
-provides achievement tests, which evaluate accomplishment or the degree of learning that
has taken place
-Informal Evaluation –typically non-systematic assessment that leads to the formation of
an opinion or attitude
2. Clinical Setting
-help screen to diagnose behavior problems
-intelligence test, personality tests, etc.
-individualized
-group testing is used primarily for screening
3. Counseling
-ultimate objective is the improvement of the assesse in terms of adjustment, productivity
or some related variable
4. Geriatric
-for old age
-ultimate goal is to provide good quality of life
5. Business/Military
-decision-making about the careers of personnel
6. Other Settings
-court trials
-health psychology
Reference Sources:
Test Catalogues
-most readily accessible
-usually contain only a brief description of the test and seldom contain the kind of detailed
technical information
-the catalogue’s objective is to sell the test
Test Manuals
-detailed information concerning the development of test
-contains technical information
-requires credentials before purchase
Reference Volumes
-updated periodically
-provides detailed information for each test listed
Journal Articles
-may contain review of the test, updated or independent studies of its psychometric soundness
Online Database
-online website
Other Sources
-school library contains a number of other sources that may be used to acquire information
about tests and test-related topics
CHAPTER 2: HISTORICAL, CULTURAL AND LEGAL/ETHICAL CONSIDERATIONS
Early Antecedents
China
-Tests and testing programs first came into being in China as early as 2200 B.C.
-They were used to select which of many applicants would obtain government jobs
Song Dynasty
-emphasis was placed on knowledge of classical literature
Ancient Greco-Roman
-Attempts to categorize people’s personality types in terms of bodily fluid
Charles Darwin
-“Higher forms of life evolved partially because of differences among individual forms of life
within a species”
-“Those with the best or most adaptive characteristics survive at the expense of those who are
less fit and that the survivors pass their characteristics on to the next generation”
Francis Galton
-Classify people according to their natural gifts and to ascertain their deviation from an average
-Pioneered the use of a statistical concept central to psychological experimentation and testing:
the coefficient of correlation
Wilhelm Wundt
-First experimental psychology laboratory, founded at the University of Leipzig in Germany
Students of Wundt
Charles Spearman
-originating the concept of test reliability
-building the mathematical framework for the statistical technique of factor analysis
Victor Henri
-suggested how mental tests could be used to measure higher mental processes
Emil Kraepelin
-pioneered the use of the word association technique as a formal test
Lightner Witmer
-little-known founder of clinical psychology and school psychology
1895 – Binet and Henri published several articles in which they argued for the measurement of
abilities such as memory and social comprehension
1905 – Binet and Simon published 30-item measuring scale of intelligence designed to help
identify mentally retarded Paris school children
World War 1 – Group Intelligence tests came into being in the United States in response to the
military’s need
World War 1
-Robert Woodworth developed Personal Data Sheet (measure of adjustment and emotional
stability)
-Woodworth Psychoneurotic Inventory (first widely used self-report test of personality)
Projective Test
-an individual is assumed to project into some ambiguous stimulus his or her own unique needs,
fears, hopes and motivation
-Rorschach Inkblots (best known projective test)
Culture
-the socially transmitted behavior patterns, beliefs and products of work of a particular
population, community or group of people
Henry Goddard
-used interpreters in test administration, employed a bilingual psychologist and administered
mental tests to selected immigrants who appeared mentally retarded
-wrote extensively on the genetic nature of mental deficiency, but he did not summarily
conclude that these findings were the result of heredity
Verbal Communication
-the examiner and the examinee must speak the same language
Level A: Tests or aids that can adequately be administered, scored and interpreted with the aid
of the manual and a general orientation to the kind of institution or organization in which one is
working (for instance, achievement or proficiency tests)
Level B: Tests or aids that require some technical knowledge of test construction and use and of
supporting psychological and educational fields such as statistics, individual differences,
psychology of adjustment, personnel psychology and guidance (e.g., aptitude tests and
adjustment inventories applicable to normal populations).
Level C: Tests and aids that require substantial understanding of testing and supporting
psychological fields together with supervised experience in the use of these devices (for
instance, projective tests, individual mental tests).
CHAPTER 3: A STATISTICS REFRESHER
Scales of Measurement
Measurement
-act of assigning numbers or symbols to characteristics of things according to rules
Scales
-set of numbers whose properties model empirical properties of the objects to which the
numbers are assigned
-Continuous Scale –measures continuous variable
-Discrete Scale –categorizes; values between categories carry no meaning
Properties of Scales
1. Magnitude –the property of “moreness”
2. Equal Intervals –the difference between two points at any place on the scale has the
same meaning as the difference between two other points that differ by the same
number of scale units
3. Absolute Zero –obtained when nothing of the property being measured exists
Types/Levels of Scales
1. Nominal Scales –with classification or categorization based on one or more
distinguishing characteristics
2. Ordinal Scales –with classification and ranking or ordering
3. Interval Scales –with classification, ranking and equal intervals
4. Ratio Scales –all math operations can be meaningfully performed; has absolute zero
Describing Data
Distributions
-a set of test scores arrayed for recording or study
Raw Score
-straightforward, unmodified accounting of performance that is usually numerical
Frequency Distributions
-displays scores on a variable or a measure to reflect how frequently each value was obtained
Graphic Form
1. Histogram –a graph with vertical lines drawn at the true limits of each test score forming
a series of contiguous rectangles
2. Bar Graph –numbers indicative of frequency appear on the Y-axis; categories appear on the X-axis
3. Frequency Polygon –expressed by a continuous line connecting the points where test
scores or class intervals meet frequencies
Percentile Ranks
-answers the question, “What percent of the scores fall below a particular score (Xi)?”
Percentiles
-the specific scores or points within a distribution
-divide the total frequency for a set of observations into hundredths
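The percentile-rank question above can be answered directly in code. A minimal Python sketch with made-up scores (illustrative, not from the text):

```python
# Percentile rank: what percent of scores in a distribution fall below a given score.

def percentile_rank(scores, x):
    """Percent of scores strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100.0 * below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 80))  # 50.0 -> half of the scores fall below 80
```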
Measures of Central Tendency
-a statistic that indicates the average or midmost score between the extreme scores in a
distribution
Mean
-most commonly used
-most appropriate measure of central tendency for interval and ratio data
Median
-middle score in distribution
-most appropriate for ordinal, interval and ratio data
-useful when few scores fall at the high end or relatively few scores at the low end
Mode
-most frequently occurring score
-appropriate in nominal data
-not commonly used
-is useful in analysis of a qualitative or verbal nature
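The three measures of central tendency above can be computed with Python's standard statistics module; the scores below are made up for illustration:

```python
# Mean, median, and mode of a small, hypothetical score distribution.
import statistics

scores = [70, 75, 75, 80, 85, 90, 95]
print(statistics.mean(scores))    # ≈ 81.43
print(statistics.median(scores))  # 80 (middle score of the 7 sorted values)
print(statistics.mode(scores))    # 75 (most frequently occurring score)
```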
Measures of Variability
Variability
-an indication of how scores in a distribution are scattered or dispersed
● Standard Deviation
-a measure of variability equal to the square root of the average squared deviations
about the mean
-is equal to the square root of the variance
-variance –equal to the arithmetic mean of the squares of the differences between the
scores in a distribution and their mean
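The variance and standard-deviation definitions above translate directly to code. A minimal sketch using the population formulas and illustrative scores:

```python
# Variance = arithmetic mean of squared deviations from the mean;
# standard deviation = square root of the variance.
import math

def variance(scores):
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores) / len(scores)

scores = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(scores)
print(var)             # 4.0
print(math.sqrt(var))  # 2.0 (standard deviation)
```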
Skewness
-symmetry is absent
-presence or absence of symmetry in a distribution is simply one characteristic by which a
distribution can be described
Kurtosis
-the steepness of a distribution in its center
-platy –flat
-lepto –peaked
-meso –middle
Standard Scores
-a raw score that has been converted from one scale to another scale, where the latter scale
has some arbitrarily set mean and standard deviation
-more easily interpretable than raw scores
● Z-scores
-mean = 0 ; SD = 1
-is equal to the difference between a particular raw score and the mean divided by
standard deviation
● T-scores / McCall’s T
-mean = 50 ; SD = 10
-devised by W.A. McCall
-named a T-score in honor of his professor E.L. Thorndike
-none of the scores is negative
● Stanine
-mean = 5 ; SD = 2
● Sten
-mean = 5.5 ; SD = 2
● Deviation IQ
-mean = 100 ; SD = 15
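Each standard score listed above is a linear transformation of the z-score. A sketch with a hypothetical raw score of 65 on a test whose mean is 50 and SD is 10:

```python
# z = (raw - mean) / SD; each other scale rescales z using its own mean and SD.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

z = z_score(65, mean=50, sd=10)
print(z)             # 1.5
print(50 + 10 * z)   # 65.0   T-score (M = 50, SD = 10)
print(5 + 2 * z)     # 8.0    stanine value (M = 5, SD = 2; rounded to 1-9 in practice)
print(100 + 15 * z)  # 122.5  deviation IQ (M = 100, SD = 15)
```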
Correlation
-an expression of the degree and direction of correspondence between two things
-degree (weak-strong)
-direction (positive, negative, no correlation)
-linear relationship
-only two variables
-numerical in nature
-no causation but can predict
● Pearson R
-most widely used
-also known as the Pearson correlation coefficient and the Pearson product-moment
coefficient of correlation
-used when variables are linear and continuous
-Pearson r is related to z-scores because both are concerned with the location of an
individual within a distribution
-the smaller the p-value, the more significant the relationship
-larger correlation, means more related to each other
-coefficient of determination (r2)
-an indication of how much variance is shared by the X and Y variables
-evaluates the strength of the relationship
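The Pearson coefficient can be computed from its definitional formula (covariance divided by the product of the two standard deviations). The data below are made up for illustration:

```python
# Pearson product-moment correlation from the definitional formula.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)   # population SD of x
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)   # population SD of y
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # perfectly linear with x
r = pearson_r(x, y)
print(r)               # ≈ 1.0 (perfect positive correlation)
print(r ** 2)          # coefficient of determination: variance shared by x and y
```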
● Spearman Rho
-one commonly used alternative statistic
-also known as rank-order correlation coefficient or rank-difference correlation coefficient
-used with small samples (fewer than 30 pairs) and ordinal data
● Point-biserial correlation
-relationship when one of the variables is dichotomous and the other is continuous
● Phi-coefficient
-used when both variables are dichotomous
Regression
-a reversion to the mean over time or generations
-analysis of relationships among variables of understanding how one variable may predict other
(X) IV – Predictor Variable
(Y) DV – Outcome Variable
Multiple Regression
-the use of more than one score to predict Y
-more predictors are not necessarily better
Meta-Analysis
-analysis of data from several studies
-a family of techniques used to statistically combine information across studies to produce single
estimates of the data under study
-more weight can be given to studies that have larger numbers of subjects
-advantages:
a. meta-analyses can be replicated
b. the conclusions of meta-analyses tend to be more reliable and precise than the conclusions
from single studies
c. there is more focus on effect size rather than statistical significance alone
d. meta-analysis promotes evidence-based practice, which may be defined as professional practice
that is based on clinical and research findings
CHAPTER 4: OF TEST AND TESTING
Assumptions:
1. Psychological traits and states exist
2. Psychological traits and states can be quantified and measured
3. Test-related behavior predicts non-test related behavior
4. Test and other measurement techniques have strength and weaknesses
5. Various sources of error are part of the assessment process
6. Testing and assessment can be conducted in a fair and unbiased manner
7. Testing and assessment benefit society
-Validity
-measure what it purports to measure
-other considerations:
-a good test is one that trained examiners can administer, score and interpret with a minimum of
difficulty
-a good test contains adequate norms
Norms
-(singular)refers to behavior that is usual, average, normal, standard, expected or typical
-(psychometric context) test performance data of a particular group of testtakers that are
designed for use as a reference when evaluating test scores
Normative Sample
-group of people whose performance on a particular test is analyzed for reference in evaluating
the performance of individual testtakers
Norming
-refers to the process of deriving norms
Population –the complete universe or set of individuals with at least one common observable
characteristic
Methods:
1. Stratified Sampling
-reduces bias
-members of the sample come from different strata
2. Random Sampling
-Every member of the population has the same chance of being included in the sample
3. Purposive Sampling
-arbitrarily selects a sample because it is believed to be the best to represent
the population
-common in consumer psychology
Types of Norms
1. Percentile
-an expression of the percentage of people whose score on a test or measure falls
below a particular raw score
-a converted raw score that refers to a percentage of testtakers
-problem with using percentiles: real differences between scores may be distorted,
especially in highly skewed distributions
2. Age Norms
-also known as age-equivalent scores, age norms indicate the average performance of
different samples of testtakers who were at various ages at the time the test was
administered
3. Grade Norms
-average test performance of testtakers in a given school grade
-representative samples of children over a range of consecutive grade levels
-useful only in children who are in school or already completed a particular grade
4. National Norms
-nationally representative of the population at the time the norming study was conducted
-example: large numbers of people representative of a particular variable
5. Subgroup Norms
-any of the criteria initially used in selecting subjects for the sample
-example: educational level of out-of-school youth
6. Local Norms
-local population’s performance on some test
-example: the arithmetic ability of the people of Silang, Cavite
Norm-referenced Evaluation
-deriving meaning from test scores by evaluating an individual’s score with reference to a
particular norm
Criterion-referenced Evaluation
-deriving meaning from test scores by evaluating an individual’s score with reference to a particular
standard
-also known as domain or content referenced testing and assessment and mastery test
CHAPTER 5: RELIABILITY
Reliability
-synonyms: dependability/consistency
-refers to the consistency in measurement
-refers to the proportion of the total variance attributed to true variance
-the greater the proportion of the total variance attributed to true variance, the more reliable the test
Reliability Coefficient
-an index of reliability, a proportion that indicates the ratio between the true score variance on a test
and the total variance
Measurement Error
-all of the factors associated with the process of measuring some variable, other than the variable
being measured
Do’s
1. Randomly assign items on the halves of the test.
2. Assign odd-numbered items to one half of the test and even-numbered items to the other
half (odd-even reliability).
3. Divide the test by content.
Don’ts
Don’t split the test in the middle
-in general, a primary objective in splitting a test in half for the purpose of obtaining a split-half
reliability estimate is to create what might be called “mini-parallel-forms,” with each half equal
to the other – or as nearly equal as humanly possible – in format, stylistic, statistical, and
related aspects.
Note:
If the reliability of the original test is relatively low, it may be impractical simply to increase the
number of test items; instead, consider developing a new test, creating new test items, clarifying
the test instructions, or simplifying the scoring rules.
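The odd-even split described in the Do's above, followed by the Spearman-Brown correction (r_SB = 2r / (1 + r)) that steps the half-test correlation up to a full-test estimate, can be sketched as follows. The item-score matrix is made up for illustration:

```python
# Odd-even split-half reliability with Spearman-Brown correction.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

items = [                       # rows = testtakers, columns = item scores (1/0)
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
odd  = [sum(row[0::2]) for row in items]   # half-test scores on items 1, 3, 5
even = [sum(row[1::2]) for row in items]   # half-test scores on items 2, 4, 6
r_half = pearson_r(odd, even)
r_full = 2 * r_half / (1 + r_half)          # Spearman-Brown stepped-up estimate
print(round(r_full, 2))                     # 0.83
```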
4. INTER-ITEM CONSISTENCY
-1 test; 1 group; 1 administration
-assesses test homogeneity
-the more homogeneous the test, the higher the inter-item consistency
Kuder-Richardson Formulas
-developed by G. Frederic Kuder and M.W. Richardson
-KR-20
-named KR-20 because it was the twentieth formula developed in a series
-when to use? homogenous items and dichotomous items
-note: if items are more heterogeneous, KR-20 will yield lower reliability estimates than the
split-half method
-KR-21
-used for a test where the items are all about the same difficulty
Coefficient Alpha
-developed by Cronbach
-when to use? homogenous items and non-dichotomous items
-the preferred statistic for obtaining an estimate of internal consistency reliability
-values range from 0.00 (no similarity) to 1.00 (perfectly identical)
-note: a value of .90 or above indicates redundancy of items
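Coefficient alpha can be computed from its standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). A sketch with hypothetical item scores:

```python
# Cronbach's coefficient alpha for k items (rows = testtakers, columns = items).

def variance(xs):                       # population variance
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [
    [3, 4, 3, 4],
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
]
k = len(data[0])
item_vars = [variance([row[i] for row in data]) for i in range(k)]
total_var = variance([sum(row) for row in data])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))   # 0.98 -> very high internal consistency (items may be redundant)
```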
Criterion-referenced Test (designed to provide an indication of where a testtaker stands with respect
to some variable or criterion)
-statistical measures are not widely applicable
Domain-the universe of items that could conceivably measure that behavior, can be thought of
as a hypothetical construct: one that shares certain characteristics with (and is measured by)
the sample of items that make up the test
3. GENERALIZABILITY THEORY
-based on the idea that a person’s test scores vary from testing to testing because of variables
in the testing situation
-“Given the exact same conditions of all facets of universe, the exact same test score should
be obtained”
Generalizability Study-examines how generalizable scores from a particular test are if the
test is administered in different situations
Decision Study-developers examine the usefulness of test scores in helping the test user
make decisions
4. ITEM-RESPONSE THEORY
-the procedures provide a way to model the probability that a person with X ability will be able
to perform at a level of Y
-aka Latent-Trait Theory
CHAPTER 6: VALIDITY
VALIDITY
-used in conjunction with the meaningfulness of a test score – what the test score truly means
-estimate of how well a test measures what it purports to measure in a particular context
-no test is universally valid for all time
-the validity of a test must be proven again from time to time
Validation
-the process of gathering and evaluating evidence about validity
Trinitarian View
-Content Validity
-Criterion-related Validity
-Construct Validity
Face Validity
-what a test appears to measure to the person being tested than to what the test actually
measures
-a judgment concerning how relevant the test items appear to be
-example: high face validity = structured personality test
-example: low face validity = projective test
-lack of face validity may result in a decreased level of cooperation or motivation in the testtaker
and a loss of “buy-in” from the test user
-face validity may be more a matter of public relations than of psychometric soundness
ASPECTS OF VALIDITY:
Content Validity
-a judgment of how adequately a test samples behavior representative of the universe of
behavior that the test was designed to sample
Construct-irrelevant variance –occurs when scores are influenced by factors irrelevant to the
construct
Criterion-Related Validity
-a judgment of how adequately a test score can be used to infer an individual’s most probable
standing on some measure of interest – the measure of interest being the criterion
Criterion
-the standard against which a test or test score is evaluated
-characteristics: relevant, valid, uncontaminated
criterion contamination-the term applied to a criterion measure that has been based, at least
in part, on predictor measures
Concurrent Validity
-test scores and criterion measures are obtained at about the same time
-a test with satisfactorily demonstrated concurrent validity may therefore be appealing to
prospective users because it holds out the potential of savings of money and professional time
Predictive Validity
-tells how well a certain measure can predict future behavior
Statistical Evidence:
-Validity Coefficient-a correlation that provides a measure of the relationship between test
scores and scores on the criterion measure (0.30-0.40 high validity coefficient)
Construct Validity
-a judgment about the appropriateness of inferences drawn from test scores regarding individual
standings on a variable called a construct
-“umbrella validity”
-“viewed as the unifying concept for all validity evidence”
Construct
-an informed, scientific idea developed or hypothesized to describe or explain behavior
-unobserved , presupposed (underlying) traits that a test developer may invoke to describe test
behavior or criterion performance
-example: intelligence, self-esteem, motivation
Convergent Evidence/Validity
-shown by correlations with tests purporting to measure an identical construct and by
correlations with measures purporting to measure related constructs
Discriminant Evidence/Validity
-little (a statistically insignificant) relationship between test scores and/or other variables
with which scores on the test being construct-validated should not theoretically be
correlated
Multitrait-Multimethod Matrix
-a useful technique for examining both convergent and discriminant validity
-two or more traits + two or more methods
Factor Analysis
-also helpful in obtaining convergent and discriminant evidence
-class of mathematical procedures designed to identify factors or specific variables that are
typically attributes, characteristics, or dimensions on which people may differ
-Factor Loading
-the extent to which the factor determines the test score or scores
Test Bias
-Bias- a factor inherent in a test that systematically prevents accurate, impartial measurement
Slope Bias-when the slope of one group’s regression line differs significantly from others
Rating Error
-Rating-a numerical or verbal judgment (or both) that places a person or an attribute along a
continuum
-Rating Error-a judgment resulting from the intentional or unintentional misuse of a rating
scale
Severity Error
Central Tendency Error
Generosity and Leniency Error
(Note: To avoid these rating errors, it is advisable to use rankings)
Test Fairness-extent to which a test is used in an impartial, just and equitable way
CHAPTER 7: UTILITY
Utility
-usefulness or practical value of testing to improve efficiency
-also used to refer to the usefulness or practical value of a training program or intervention
2. Cost
-refers to disadvantages, losses or expenses in both economic and noneconomic terms
-economic, financial or budget-related in nature must certainly be taken into account
3. Benefits
-refers to profits, gains or advantages
Utility Analysis
-a family of techniques that entail a cost-benefit analysis designed to yield information relevant
to a decision about the usefulness and/or practical value of a tool of assessment
2. Brogden-Cronbach-Gleser Formula
-a formula used to calculate the dollar amount of a utility gain resulting from the use of a
particular selection instrument under specified conditions
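A sketch of the Brogden-Cronbach-Gleser computation. The formula form used here (N × T × r_xy × SD_y × mean z of selectees, minus total testing cost) follows the common textbook presentation, and every figure below is hypothetical:

```python
# Brogden-Cronbach-Gleser utility gain: dollar value of using a selection test.

def bcg_utility(n_hired, tenure_years, validity, sd_y, mean_z, n_applicants, cost):
    # Productivity gain from selecting with the test, minus the cost of testing.
    return (n_hired * tenure_years * validity * sd_y * mean_z
            - n_applicants * cost)

gain = bcg_utility(
    n_hired=10,        # N: number of applicants selected
    tenure_years=2,    # T: expected tenure of those selected
    validity=0.40,     # r_xy: criterion-related validity of the test
    sd_y=15000,        # SD of job performance expressed in dollars
    mean_z=1.0,        # mean standardized test score of those selected
    n_applicants=50,   # number of applicants tested
    cost=100,          # cost of testing one applicant
)
print(gain)            # 115000.0
```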
3. Decision Theory
-a body of methods used to quantitatively evaluate selection procedures, diagnostic
classifications, therapeutic interventions or other assessment or intervention-related
procedures in terms of how optimal they are (most typically from a cost-benefit
perspective)
1. Hit
-definition: a correct classification
-example: a qualified driver is hired; an unqualified driver is not hired
2. Miss
-definition: an incorrect classification; a mistake
-example: a qualified driver is not hired; an unqualified driver is hired
3. Hit Rate
-definition: the proportion of people that an assessment tool accurately identified as
possessing a particular variable
-example: the proportion of qualified drivers with a passing score who actually gain
permanent employee status; the proportion of unqualified drivers with a failing score who
did not gain permanent status
4. Miss Rate
-definition: the proportion of people that an assessment tool inaccurately identified as
possessing a particular variable
-example: the proportion of drivers inaccurately predicted to be qualified; the
proportion of drivers inaccurately predicted to be unqualified
5. False Positive
-definition: falsely indicates that the testtaker possesses a particular variable
-example: a driver who is hired is not qualified
6. False Negative
-definition: falsely indicates that the testtaker does not possess a particular variable
-example: the assessment tool says to not hire but driver would have been rated as
qualified
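The hit/miss terms above can be illustrated by comparing predicted and actual qualification; the driver data below are hypothetical:

```python
# Hits, misses, false positives, and false negatives from predicted vs. actual status.

predicted = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = test says "qualified, hire"
actual    = [1, 1, 0, 0, 0, 1, 1, 0]   # 1 = truly qualified

hits = sum(p == a for p, a in zip(predicted, actual))
false_pos = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
false_neg = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))

print(hits / len(actual))   # 0.75 hit rate (correct classifications)
print(false_pos)            # 1 -> hired but unqualified (false positive)
print(false_neg)            # 1 -> not hired but qualified (false negative)
```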
-relative cut score-a reference point – in a distribution of test scores used to divide a set
of data into two or more classifications – that is set based on norm-related
considerations rather than on the relationship of test scores to a criterion
-aka norm-referenced cut score
-normative
-fixed cut score-a reference point – in a distribution of test scores used to divide a set of
data into two or more classifications – that is typically set with reference to a judgment
concerning a minimum level of proficiency required to be included in a particular
classification
-aka absolute cut score
-criterion
3. IRT-Based Methods
-in order to “pass” the test, the testtaker must answer items that have some minimum
level of difficulty, which is determined by experts and serves as the
cut score
4. Other Methods:
Method of Predictive Yield
-a technique for identifying cut scores based on the number of positions to be filled
Discriminant Analysis
-a family of statistical techniques used to shed light on the relationship between certain
variables and two or more naturally occurring groups
CHAPTER 8: TEST DEVELOPMENT
Scaling
-the process of setting rules for assigning numbers in measurement
-scaling methods:
a. Likert Scales
-a type of summative rating scale
-five alternative responses (sometimes seven)
-ordinal in nature
b. Paired Comparison
-scaling method whereby one of a pair of stimuli (such as photos) is selected according to a rule
(such as “select the one that is more appealing”)
-scaling systems:
a. comparative scaling (best to worst)
b. categorical scaling (section1, section 2, section 3)
Writing Items
-When devising a standardized test using a multiple-choice format, it is usually advisable that
the first draft contain approximately twice the number of items that the final version of the test
will contain.
-item pool-the reservoir or well from which items will or will not be drawn for the final version
of the test; the collection of items to be further evaluated for possible selection for use in an
item bank
Item Format
-the form, plan, structure, arrangement and layout of individual test items
1. Selected-Response Format
-a form of test item requiring testtakers to select a response
A. Multiple-Choice Format
-has 3 elements: stem, correct alternative/option, distractors/foils
-criteria of good multiple-choice:
-has one correct alternative
-has grammatically parallel alternatives
-has alternatives of similar length
-has alternatives that fit grammatically with the stem
-includes as much of the item as possible in the stem to avoid unnecessary
repetition
-avoids ridiculous distractors
-not excessively long
B. Matching-item
-a testtaker is presented with two columns: premises and responses, and must
determine which response is best associated with which premise
-a testtaker could get a perfect score even without actually knowing all the answers
-to minimize this possibility, provide more options or state in the directions that each
response may be a correct answer once, more than once or not at all
2. Constructed-Response Items
-a form of test item requiring the testtaker to construct or create a response
B. Essay
-is useful when the test developer wants the examinee to demonstrate a depth of
knowledge about a single topic
-allows for the creative integration and expression of the material in the testtaker’s
own words
-the main problem in essay is the subjectivity in scoring
Item Bank
-a collection of questions to be used in the construction of tests; typically stored for
computerized test administration
Item Branching
-in computerized adaptive testing, the individualized presentation of test items drawn
from an item bank based on the testtaker’s previous responses
Scoring Items
1. Cumulative Model-a method of scoring whereby points or scores accumulated on
individual items or subtests are tallied and then, the higher the total sum, the higher the
individual is presumed to be on the ability, trait, or other characteristic being measured
(Example: High IQ Score > more intelligent)
2. Class or Category Scoring-a method of evaluation in which test responses earn credit
toward placement in a particular class or category with other testtakers. Sometimes
testtakers must meet a set number of responses corresponding to a particular criterion in
order to be placed in a specific category or class
(Examples: GPA of 1.50 and above will be placed on Star Section; GPA of 2 and below will
be placed on Lower Section)
3. Ipsative Scoring-an approach to test scoring and interpretation wherein the testtaker’s
responses and the presumed strength of a measured trait are interpreted relative to the
measured strength of other traits for that testtaker; often uses a forced-choice format
(Example: High Score in Extraversion; Low in Agreeableness)
Formula:
p = (# of testtakers who answered correctly) / (total # of testtakers)
Level of Difficulty:
0.00 to 0.20 – very difficult
0.21 to 0.40 – difficult
0.41 to 0.60 – average
0.61 to 0.80 – easy
0.81 to 1.00 – very easy
Standards:
0.50 – optimal average item difficulty (whole test)
0.30 to 0.80 – average item difficulty on individual items
0.75 – true or false
0.625 – multiple choice (4 choices)
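The difficulty index and the guessing-adjusted optimum can be computed directly; the helper names are ours, and the category boundaries follow the ranges listed above:

```python
# Item-difficulty index p = (number answering correctly) / (total testtakers).
# Category labels follow the ranges given in the notes above.

def item_difficulty(num_correct, num_testtakers):
    return num_correct / num_testtakers

def difficulty_label(p):
    if p <= 0.20: return "very difficult"
    if p <= 0.40: return "difficult"
    if p <= 0.60: return "average"
    if p <= 0.80: return "easy"
    return "very easy"

def optimal_difficulty(num_choices):
    """Midpoint between chance success (1/k) and a perfect 1.0 — this
    reproduces the 0.75 (true/false) and 0.625 (4-option) standards."""
    chance = 1 / num_choices
    return (chance + 1.0) / 2
```

The optimal-difficulty helper shows where the 0.75 and 0.625 standards come from: halfway between the proportion expected from pure guessing and a perfect score.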
Formula (Item-Discrimination Index, d):
d = (U – L) / n
-U = # of testtakers in the upper-scoring group who answered the item correctly
-L = # of testtakers in the lower-scoring group who answered the item correctly
-n = # of testtakers in each group
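Read as d = (U – L) / n, the discrimination index is a one-liner; the group counts below are hypothetical:

```python
# Item-discrimination index d = (U - L) / n, where U and L are the counts of
# correct answers to the item in the upper- and lower-scoring groups and
# n is the number of testtakers in each group. Values here are hypothetical.

def item_discrimination(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size
```

A d near +1 means high scorers pass the item far more often than low scorers; a negative d flags a problem item that low scorers get right more often than high scorers.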
Item-Characteristic Curve
-graphic representation of item difficulty and item discrimination
-the steeper the slope, the greater the item discrimination
-curves for easy items are shifted toward the left
-curves for difficult items are shifted toward the right
Cross Validation-a revalidation on a sample of testtakers other than the testtakers on whom
test performance was originally found to be a valid predictor of some criterion
Co-Validation-the test validation process conducted on two or more tests using the same
sample of testtakers; when used in conjunction with the creation of norms or the revision of
existing norms; this process may also be referred to as co-norming
CHAPTER 9: INTELLIGENCE AND MEASUREMENT
INTELLIGENCE
-a multifaceted capacity that manifests itself in different ways across the life
span
1. FRANCIS GALTON
-first person to publish on the heritability of intelligence, thus framing the
contemporary nature-nurture debate
-he believed that the most intelligent persons were those equipped with the
best sensory abilities
-attempted to measure this sort of intelligence in many of the sensorimotor
and other perception-related tests he devised
INTERACTIONISM
(Heredity + Environment = Intelligence)
2. ALFRED BINET
-components of intelligence: reasoning, judgment, memory and abstraction
-more complex measure of intelligence
3. DAVID WECHSLER
-intelligence as an “aggregate” or “global” capacity
-considered other factors (traits and personality) in assessing intelligence
-at first, he proposed two qualitatively different abilities: Verbal and Performance
-then, he added other factors: Verbal Comprehension, Working Memory,
Perceptual Organization, Processing Speed
4. JEAN PIAGET
-intelligence is an evolving biological adaptation to the outside world
-focused on the development of cognition in children
-schema (or schemata) -an organized action or mental structure that when
applied to the world, leads to knowing and understanding
-the basic mental operations:
-Assimilation -actively organizing new information so that it fits with what is
already perceived and thought
-Accommodation -changing what is already perceived or thought so that it
fits with the new information
-Disequilibrium -causes the individual to discover new information, perceptions,
and communication skills
5. CHARLES SPEARMAN
-Theory of General Intelligence / Two-Factor Theory of Intelligence
-(g) - general intellectual ability
-(s) - specific components
-(e) - error components
-The greater the magnitude of g in a test of intelligence, the better the test
was thought to predict overall intelligence
-g factor is based on some type of general electrochemical mental energy
available to the brain for problem solving
-tests of abstract reasoning were thought to be the best measures of g in
formal tests
-Group Factors -an intermediate class of factors common to a group of
activities but not to all
Ex: Linguistic, Mechanical, Arithmetical
8. HOWARD GARDNER
-intelligence is the ability to solve problems or to create products that are
valued within one or more cultural settings
-theory of multiple intelligences:
-logical-mathematical
-bodily-kinesthetic
-linguistic
-musical
-spatial
-interpersonal
-intrapersonal
9. RAYMOND CATTELL
-two major types of cognitive abilities:
-Crystallized Intelligence (Gc)
-acquired skills and knowledge that are dependent on exposure to a
particular culture as well as on formal and informal education
(Example: Vocabulary)
-Fluid Intelligence (Gf)
-nonverbal abilities that are relatively culture-free and independent of
specific instruction (Example: memory for digits)
INFORMATION-PROCESSING VIEW
15. Others
-PASS Model
-Planning -strategy development for problem solving
-Attention/Arousal -receptivity to information
-Simultaneous and Successive -the type of information processing
employed
Nature vs Nurture
-Preformationism
-all living organisms are preformed at birth
-all of the organism’s structures, including intelligence, are preformed at birth
and therefore cannot be improved
-analogous to a cocoon turning into a butterfly
-Predeterminism
-one’s abilities are pre-determined by genetic inheritance and no amount
of learning or other intervention can enhance what has been genetically
encoded to unfold in time
-Arnold Gesell
-”training does not transcend maturation”
-mental development as a progressive morphogenesis of patterns of behavior
-behavior patterns are predetermined by “innate processes of growth”
-Francis Galton
-believed that genius was hereditary
-Richard Dugdale
-argued that degeneracy (being immoral) was also inherited
-Henry Goddard
-role of heredity in feeblemindedness
-feeblemindedness is the product of a recessive gene
-Lewis Terman
-the father of the American version of Binet’s test
-based on his testing, he concluded that Mexicans and Native Americans
were intellectually inferior
-Karl Pearson
-”Jews are somewhat inferior physiologically and mentally”
-Wendy Johnson
-VPR (Verbal-Perceptual-Rotation) Model -strong genetic influence on mental ability
-Interactionist View
-we are free to become all that we can be
Other issues:
-Flynn effect
-the progressive rise in intelligence test scores over time (roughly 3 IQ
points per decade), which gradually makes a test’s norms obsolete
-Personality
-Street efficacy -perceived ability to avoid violent confrontations and to be safe
in one’s neighborhood
-Gender
-males have the edge when it comes to the g factor in intelligence, especially
when only the highest-scoring group on the ability test is considered
-males also tend to outperform females on tasks requiring visual spatialization
-girls may generally outperform boys on language-skill-related tasks, although
differences may be minimized when assessment is conducted by computer
-Family Environment
-divorce may have significant consequences in the life of a child, ranging from
impaired school achievement to impaired social problem-solving ability
-Culture
-Culture loading -the extent to which a test incorporates the vocabulary,
concepts, traditions, knowledge, and feelings associated with a particular culture
-Culture-Fair Intelligence Test
-designed to minimize the influence of culture with regard to various aspects
of the evaluation procedures
CHAPTER 10: TESTS OF INTELLIGENCE
Measuring Intelligence
C. Adult
-according to Wechsler, adult intelligence involves abilities such as retention
of general information, quantitative reasoning, expressive language and
memory, and social judgment
-obtained during clinical evaluation or corporate assessment
1st Edition (Stanford-Binet, 1916)
-The first published intelligence test to provide organized and detailed administration
and scoring instructions
-The first American test to employ the concept of IQ. And it was the first test to introduce
the concept of an alternate item, an item to be substituted for a regular item under
specified conditions
-Criticism: lack of representativeness of the standardization sample
Revisions:
1937
-Included the development of two equivalent forms, labeled L (for Lewis
Terman) and M (for Maud Merrill)
-New types of tasks for use with preschool-level and adult-level testtakers
-Adequate standardization sample
-Criticism: lack of representation of minority groups during the test’s development
1960
-consisted of only a single form (labeled L-M) and included the items considered to be
the best from the two forms of the 1937 test, with no new items added to the test
-the use of the deviation IQ tables in place of the ratio IQ tables
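The 1960 switch from ratio IQ to deviation IQ can be illustrated with a sketch (the SD of 16 reflects the Stanford-Binet’s historical scaling; Wechsler scales use 15; the scores below are hypothetical):

```python
# Ratio IQ: mental age divided by chronological age, times 100.
# Deviation IQ: location of the raw score within the age group's normal
# distribution, rescaled to a chosen mean and standard deviation.
# All input values here are hypothetical.

def ratio_iq(mental_age, chronological_age):
    return mental_age / chronological_age * 100

def deviation_iq(raw_score, group_mean, group_sd, mean=100, sd=16):
    z = (raw_score - group_mean) / group_sd  # standing within own age group
    return mean + sd * z
```

The advantage of the deviation IQ is that a given score means the same relative standing at every age, which the ratio IQ could not guarantee.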
1972
-the quality of the standardization sample was criticized
-norms may also have overrepresented the West, as well as large urban communities
Knowledge (KN)
-skills and knowledge acquired by formal and informal education
-Routing Test
-A task used to direct or route the examinee to a particular level of questions
-Direct an examinee to test items that have a high probability of being at an
optimal level of difficulty
-Teaching items
-designed to illustrate the task required and assure the examiner that the examinee
understands
-Basal Level -A stage in a test achieved by a testtaker by meeting some preset criterion
in order to continue to be tested; for example, responding correctly to two consecutive
items on an ability test that contains increasingly difficult items may establish a “base”
from which to continue testing
-Testing the Limit -A procedure that involves administering test items beyond the level at
which the test manual dictates discontinuance
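Basal and discontinue rules of this kind can be sketched as follows; the specific criteria (two consecutive passes or failures) are illustrative, not taken from any particular test manual:

```python
# Sketch of basal and discontinue (ceiling) rules on an ability test whose
# items are ordered by increasing difficulty. The criteria used here
# (2 consecutive correct for a basal; 2 consecutive errors to discontinue)
# are illustrative only.

def find_basal(results, run=2):
    """Index where the first `run` consecutive correct responses begin, or None."""
    streak = 0
    for i, correct in enumerate(results):
        streak = streak + 1 if correct else 0
        if streak == run:
            return i - run + 1  # start of the basal run
    return None

def discontinue_index(results, run=2):
    """Index at which testing would stop: the first `run` consecutive errors."""
    streak = 0
    for i, correct in enumerate(results):
        streak = streak + 1 if not correct else 0
        if streak == run:
            return i
    return None
```

Testing the limits, by contrast, means deliberately continuing past the index that `discontinue_index` would return.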
-WAIS-R (1981)
-new norms and materials
-alternate administration of verbal and performance tests
-WAIS-III (1997)
-contained updated and more user-friendly materials
-test materials were made physically larger to facilitate viewing by older adults
-some items were added to each of the subtests that extended the test’s floor in order to
make the test more useful for evaluating people with extreme intellectual deficits
-extensive research was designed to detect and eliminate items that may have
contained cultural bias
-norms were expanded to include testtakers in the age range 74-89
-yielded a full scale (composite) IQ as well as four Index Scores - Verbal
Comprehension, Perceptual Organization, Working Memory, and Processing Speed
-used for more in-depth interpretation of findings
-WAIS-IV (2008)
-It is made up of subtests that are designated either as core or supplemental
-Core subtest is one that is administered to obtain a composite score
-Supplemental Subtest is used for purposes such as providing additional
clinical information or extending the number of abilities or processes sampled
-Intended for use with individuals ages 16 to 90 years and 11 months
-contains ten core subtests (Block Design, Similarities, Digit Span, Matrix Reasoning,
Vocabulary, Arithmetic, Symbol Search, Visual Puzzles, Information and Coding)
-and five supplemental subtests (Letter-Number Sequencing, Figure Weights,
Comprehension, Cancellation and Picture Completion)
-more explicit administration instructions as well as the expanded use of demonstration
and sample items - this in an effort to provide assessees with practice in doing what is
required, in addition to feedback on their performance
-all of the test items were thoroughly reviewed to root out any possible cultural bias
-Floor = 40, Ceiling = 160
Wechsler Intelligence Scale for Children (WISC)
-1st edition 1949
-currently in its 5th edition
-WISC-V (2014)
-ages 6 years old to 16 years and 11 months
-FSIQ, Primary Index Scores and Ancillary Index Scores
-21 subtests; 15 composite scores
-completion time: 60 minutes
-WPPSI (2012)
-ages 2 years and 6 months up to 7 years and 7 months
-completion time:
-ages 2:6 to 3:11 = 30-45 minutes
-ages 4:0 to 7:7 = 45-60 minutes
-WASI-II (2011)
-made the test materials more user friendly and increased the psychometric
soundness of the test
-World War II
-Army General Classification Test (AGCT) -administered to more than 12 million
recruits
-Today
-group tests are still administered to prospective recruits, primarily for screening
purposes
-Screening tool -an instrument or procedure used to identify a particular trait or
constellation of traits at a gross or imprecise level
-Measures of Creativity:
-Originality -the ability to produce something that is innovative or nonobvious
-Fluency -the ease with which responses are reproduced and is usually measured by the
total number of responses produced
-Flexibility -the variety of ideas presented and the ability to shift from one approach to
another
-Elaboration -the richness of detail in a verbal explanation or pictorial display
-A criticism frequently leveled at group standardized intelligence tests (as well as at
other ability and achievement tests) is that evaluation of test performance is too heavily
focused on whether the answer is correct
-The heavy emphasis on correct response leaves little room for the evaluation of
processes such as originality, fluency, flexibility and elaboration
-Convergent thinking
-a deductive reasoning process that entails recall and consideration of facts as well as a
series of logical judgments to narrow down solutions and eventually arrive at one
solution
-Divergent thinking
-a reasoning process in which thought is free to move in many different directions,
making several solutions possible
-requires flexibility of thought, originality, and imagination
-It is interesting that many tests of creativity do not fare well when evaluated by
traditional psychometric procedures
CHAPTER 11: PRESCHOOL AND EDUCATIONAL ASSESSMENT
Infant Scales
Woodcock-Johnson III
-designed as a broad-range individually administered test to be used in
educational settings
-it assesses general intellectual ability (g), specific cognitive abilities,
scholastic aptitude, oral language and achievement
-the Woodcock-Johnson III’s cognitive ability standard battery includes 10
tests such as verbal comprehension, visual-auditory learning, spatial relations,
and visual matching
-has relatively good psychometric properties
-based on CHC Model
Visuographic Test
RtI Model
-the Response to Intervention model: a multilevel prevention framework applied
in educational settings that is designed to maximize student achievement
through the use of data that identify students at risk for poor learning
outcomes, combined with evidence-based intervention and teaching that is
adjusted on the basis of student responsiveness
Achievement Tests
● Designed to measure accomplishment
● A test of achievement may be standardized nationally, regionally, or
locally, or it may not be standardized at all
● A sound achievement test is one that adequately samples the targeted
subject matter and reliably gauges the extent to which the examinees
have learned it
● Curriculum-based assessment (CBA)
- a term used to refer to assessment of information acquired from
teachings at school
- Curriculum-based measurement (CBM)
- a type of CBA, is characterized by the use of standardized
measurement procedures to derive local norms to be used in the
evaluation of student performance on curriculum-based tasks
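Deriving a local norm in CBM amounts to ranking a student’s score against scores collected from the local group (e.g., the same grade in the same school); a minimal sketch with hypothetical data:

```python
# Sketch of a local norm in curriculum-based measurement (CBM): a student's
# score on a curriculum task is expressed as a percentile rank within the
# locally collected score distribution. The scores are hypothetical.

def percentile_rank(score, local_scores):
    """Percentage of the local norm group scoring at or below `score`."""
    at_or_below = sum(1 for s in local_scores if s <= score)
    return 100 * at_or_below / len(local_scores)
```

Because the reference group is local rather than national, the resulting rank reflects standing within the student’s own curriculum and school.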
Aptitude Tests
● Tend to focus more on informal learning or life experiences
● Also referred to as prognostic tests, are typically used to make
predictions
Pre-School Level
● Checklist - a questionnaire on which marks are made to indicate the
presence or absence of a specified behavior, thought, event, or
circumstance
● Rating Scale - a form completed by an evaluator (a rater, judge, or
examiner) to make a judgment of relative standing with regard to a
specified variable or list of variables
● Apgar number - "everybody's first test"
● A score on a rating scale developed by physician Virginia Apgar
(1909-1974), an obstetrical anesthesiologist who saw a need for
a simple, rapid method of evaluating newborn infants and
determining what immediate action, if any, is necessary
● Informal evaluation - a typically nonsystematic, relatively brief, and
"off-the-record" assessment leading to the formation of an opinion or
attitude conducted by any person, in any way, for any reason, in an
unofficial context that is not subject to the ethics or other standards of
an evaluation by a professional
● At risk - children who have documented difficulties in one or more
psychological, social, or academic areas and for whom intervention is
or may be required
Diagnostic Tests
● a tool used to identify areas of deficit to be targeted for intervention
● Evaluative Information
● typically applied to tests or test data that are used to make
judgments (such as pass-fail)
- Diagnostic Information
● Typically applied to tests or test data used to pinpoint a student's
difficulty, usually for remedial purposes
Terms
Personality - Individual's unique constellation of psychological traits that is relatively stable over
time
Personality Assessment - the measurement and evaluation of psychological traits, states,
values, interests, attitudes, worldview, acculturation, sense of humor, cognitive and behavioral
styles, and/or related individual characteristics
Personality Traits - Any distinguishable, relatively enduring way in which one individual varies
from another
Personality Type - a constellation of traits that is similar in pattern to one identified category of
personality within a taxonomy of personalities
Personality States - relatively temporary predisposition
Frame of Reference
● Frame of Reference - defined as aspects of the focus of exploration such as the time
frame (the past, the present, or the future) as well as other contextual issues that involve
people, places, and events
● Q-sort Technique - an assessment technique in which the task is to sort a group of
statements, usually in perceived rank order ranging from most descriptive to least
descriptive
Acculturation and Related Considerations
● Acculturation is an ongoing process by which an individual's thoughts, behaviors,
values, worldview, and identity develop in relation to the general thinking, behavior,
customs, and values of a particular cultural group
● Rokeach (1973) differentiated what he called instrumental from terminal values
● Instrumental Values - guiding principles that help one attain some objective
● Terminal Values - guiding principles and a mode of behavior that is an endpoint
objective
● Also intimately tied to the concept of acculturation is the concept of personal identity
● Identity - a set of cognitive and behavioral characteristics by which individuals
define themselves as members of a particular group
● Identification - process by which an individual assumes a pattern of behavior
characteristic of other people, and referred to it as one of the central issues that
ethnic minority groups must deal with
● Worldview is the unique way people interpret and make sense of their
perceptions as a consequence of their learning experiences, cultural background,
and related variables
Objective Methods
Logical-Content
Criterion-Group Strategy
● MMPI-2
● More representative standardization sample (normal control group) used in the
norming
● Items were rewritten to correct grammatical errors and to make the language
more contemporary, nonsexist, and readable
● 567 true-false items, including 394 items that are identical to the original MMPI
items, 66 items that were modified or rewritten, and 107 new items
● 18 years old and older
● The TRIN (True Response Inconsistency) scale is designed to identify acquiescent
and nonacquiescent response patterns. It contains 23 pairs of items worded in
opposite forms
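A simplified sketch of how a TRIN-style consistency check works (the item pairs and scoring below are illustrative only, not the actual MMPI-2 TRIN key):

```python
# Simplified illustration of a TRIN-style scale: for pairs of items worded
# in OPPOSITE directions, answering True to both (or False to both) is
# inconsistent — suggesting acquiescent (yea-saying) or nonacquiescent
# (nay-saying) responding. Pairs and responses here are hypothetical.

def trin_style_counts(responses, opposite_pairs):
    """Count inconsistent True-True and False-False answers to opposite pairs."""
    true_true = sum(1 for a, b in opposite_pairs
                    if responses[a] == "T" and responses[b] == "T")
    false_false = sum(1 for a, b in opposite_pairs
                      if responses[a] == "F" and responses[b] == "F")
    return true_true, false_false
```

Many True-True pairs point toward acquiescence; many False-False pairs point toward nay-saying; a consistent responder produces few of either.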
Theoretical Strategy
Combination Strategy
Projective Methods