Somewhere between these two types of tests—cognitive and non-cognitive—are various measures of adaptive functioning that often include both cognitive and non-cognitive components. Psychometrics is the scientific study—including the development, interpretation, and evaluation—of psychological tests and measures used to assess variability in behavior and link such variability to psychological phenomena.
In evaluating the quality of psychological measures we are traditionally concerned primarily with test reliability i. This section provides a general overview of these concepts to help orient the reader for the ensuing discussions in Chapters 4 and 5.
In addition, given the implications of applying psychological measures with subjects from diverse racial and ethnic backgrounds, issues of equivalence and fairness in psychological testing are also presented. Reliability refers to the degree to which scores from a test are stable and results are consistent.
When constructs are not reliably measured the obtained scores will not approximate a true value in relation to the psychological variable being measured. It is important to understand that observed or obtained test scores are considered to be composed of true and error elements.
A standard error of measurement is often presented to describe, within a level of confidence e.
Test-retest: Consistency of test scores over time (stability, temporal consistency);
Parallel or alternate forms: Consistency of scores across different forms of the test (stability and equivalence); and
Internal consistency: Consistency of different items intended to measure the same thing within the test (homogeneity). A special case of internal consistency reliability is split-half, in which scores on two halves of a single test are compared and this comparison may be converted into an index of reliability.
A number of factors can affect the reliability of a test's scores.
These include the time between two testing administrations, which affects test-retest and alternate-forms reliability, and the similarity of content and expectations of subjects regarding different elements of the test in alternate forms, split-half, and internal consistency approaches. In addition, changes in subjects over time introduced by physical ailments, emotional problems, or the subject's environment, as well as test-based factors such as poor test instructions, subjective scoring, and guessing, will also affect test reliability.
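The reliability indices described above can be illustrated with a short sketch. It assumes each test-taker's responses are stored as a list of item scores and uses two standard textbook formulas that the text does not name: the Spearman-Brown correction for split-half reliability and Cronbach's alpha for internal consistency. The function names and the odd/even split are illustrative choices, not a prescribed procedure.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_half_reliability(item_scores):
    """Correlate odd- and even-position half scores, then apply the
    Spearman-Brown correction to estimate full-length reliability."""
    first_half = [sum(person[0::2]) for person in item_scores]
    second_half = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(first_half, second_half)
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(item_scores):
    """Internal consistency from item variances and total-score variance."""
    k = len(item_scores[0])
    item_vars = [pvariance([person[i] for person in item_scores]) for i in range(k)]
    total_var = pvariance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(score_sd, reliability):
    """Expected spread of observed scores around the true score."""
    return score_sd * sqrt(1 - reliability)


# Hypothetical data: 5 test-takers, 4 items scored 0 (wrong) or 1 (right).
example_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(example_scores)
split_half = split_half_reliability(example_scores)
```

With real data the different estimates will generally not agree, which is consistent with the point that inferences from different reliability estimates are not interchangeable.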
It is important to note that a test can generate reliable scores in one context and not in another, and that inferences that can be made from different estimates of reliability are not interchangeable Geisinger, While the scores resulting from a test may be deemed reliable, this finding does not necessarily mean that scores from the test have validity. In discussing validity, it is important to highlight that validity refers not to the measure itself i.
To be considered valid, the interpretation of test scores must be grounded in psychological theory and empirical evidence that demonstrates a relationship between the test and what it purports to measure Furr and Bacharach, ; Sireci and Sukin, Historically, the fields of psychology and education have described three primary types of evidence related to validity Sattler, ; Sireci and Sukin, :
Construct evidence of validity: The degree to which an individual's test scores correlate with the theoretical concept the test is designed to measure i.
Content evidence of validity: The degree to which the test content represents the targeted subject matter and supports a test's use for its intended purposes; and
Criterion-related evidence of validity: The degree to which the test's score correlates with other measurable, reliable, and relevant variables i.
Other kinds of validity with relevance to SSA have been advanced in the literature, but are not completely accepted in professional standards as types of validity per se. These include:
Diagnostic validity: The degree to which psychological tests are truly aiding in the formulation of an appropriate diagnosis.
Ecological validity: The degree to which test scores represent everyday levels of functioning e.
Cultural validity: The degree to which test content and procedures accurately reflect the sociocultural context of the subjects being tested.
Each of these forms of validity poses complex questions regarding the use of particular psychological measures with the SSA population.
For example, ecological validity is especially critical in the use of psychological tests with SSA given that the focus of the assessment is on examining everyday levels of functioning.
Measures like intelligence tests have sometimes been criticized for lacking ecological validity Groth-Marnat, ; Groth-Marnat and Teal, More recent discussions on validity have shifted toward an argument-based approach to validity, using a variety of evidence to build a case for validity of test score interpretation Furr and Bacharach, In this approach, construct validity is viewed as an overarching paradigm under which evidence is gathered from multiple sources to build a case for validity of test score interpretation.
Five key sources of validity evidence that affect the degree to which a test fulfills its purpose are generally considered AERA et al.
Test content: Does the test content reflect the important facets of the construct being measured? Are the test items relevant and appropriate for measuring the construct and congruent with the purpose of testing?
Relation to other variables: Is there a relationship between test scores and other criteria or constructs that are expected to be related?
Internal structure: Does the actual structure of the test match the theoretically based structure of the construct?
Response processes: Are respondents applying the theoretical constructs or processes the test is designed to measure?
Consequences of testing: What are the intended and unintended consequences of testing?
As part of the development of any psychometrically sound measure, explicit methods and procedures by which tasks should be administered are determined and clearly spelled out.
This is what is commonly known as standardization. Typical standardized administration procedures or expectations include (1) a quiet, relatively distraction-free environment, (2) precise reading of scripted instructions, and (3) provision of necessary tools or stimuli. All examiners use such methods and procedures during the process of collecting the normative data, and such procedures normally should be used in any other administration, which enables application of normative data to the individual being evaluated Lezak et al.
Standardized tests provide a set of normative data i. Norms consist of transformed scores such as percentiles, cumulative percentiles, and standard scores e. Without standardized administration, the individual's performance may not accurately reflect his or her ability. For example, an individual's abilities may be overestimated if the examiner provides more information or guidance than is outlined in the test administration manual. Conversely, a claimant's abilities may be underestimated if appropriate instructions, examples, or prompts are not presented.
When nonstandardized administration techniques must be used, norms should be used with caution due to the systematic error that may be introduced into the testing process; this topic is discussed in detail later in the chapter.
It is important to clearly understand the population for which a particular test is intended. The standardization sample is another name for the norm group.
Norms enable one to make meaningful interpretations of obtained test scores, such as making predictions based on evidence. Developing appropriate norms depends on size and representativeness of the sample. In general, the more people in the norm group the closer the approximation to a population distribution so long as they represent the group who will be taking the test.
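As a rough illustration of how a norm group turns a raw score into an interpretable one, the sketch below converts a raw score to a z-score, a standard score, and a percentile rank. The norm sample is invented, the function names are hypothetical, and the scale mean of 100 with a standard deviation of 15 is an assumed convention chosen for the example; published tests define their own scales and derive norms from far larger samples.

```python
from statistics import mean, pstdev


def z_score(raw, norm_sample):
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - mean(norm_sample)) / pstdev(norm_sample)


def standard_score(raw, norm_sample, scale_mean=100, scale_sd=15):
    """Rescale the z-score onto an assumed reporting scale."""
    return scale_mean + scale_sd * z_score(raw, norm_sample)


def percentile_rank(raw, norm_sample):
    """Percentage of the norm group scoring below the raw score,
    counting half of any ties (the mid-rank convention)."""
    below = sum(score < raw for score in norm_sample)
    tied = sum(score == raw for score in norm_sample)
    return 100 * (below + 0.5 * tied) / len(norm_sample)


# Hypothetical norm group of five raw scores (real norm groups are much larger).
norm_group = [10, 12, 14, 16, 18]
typical = standard_score(14, norm_group)     # a score at the norm-group mean
high_rank = percentile_rank(18, norm_group)  # the highest score in the group
```

Because every derived score depends on the mean and spread of the norm sample, a small or unrepresentative norm group shifts every interpretation built on it.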
Norms should be based upon representative samples of individuals from the intended test population, as each person should have an equal chance of being in the standardization sample.
Stratified samples enable the test developer to identify particular demographic characteristics represented in the population and more closely approximate these features in proportion to the population.
For example, intelligence test scores are often established based upon census-based norming with proportional representation of demographic features including race and ethnic group membership, parental education, socioeconomic status, and geographic region of the country.
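Proportional representation of this kind can be sketched as an allocation problem: given each stratum's share of the population, decide how many people to recruit from it. The example below is a minimal illustration using largest-remainder rounding; the region names and shares are made up for the example and are not census figures, and the function name is hypothetical.

```python
def proportional_allocation(population_shares, sample_size):
    """Allocate sample slots to strata in proportion to population share.

    Whole slots are assigned first; any slots left over after rounding
    down go to the strata with the largest fractional remainders.
    """
    exact = {name: share * sample_size for name, share in population_shares.items()}
    allocation = {name: int(value) for name, value in exact.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical shares for one demographic variable (geographic region).
region_shares = {"Northeast": 0.5, "South": 0.3, "West": 0.2}
plan = proportional_allocation(region_shares, 10)
```

In practice several demographic variables are crossed at once, so the strata are combinations such as region by education level rather than a single variable.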
When tests are applied to individuals for whom the test was not intended and who, hence, were not included as part of the norm group, inaccurate scores and subsequent misinterpretations may result. Tests administered to persons with disabilities often raise complex issues. Test users sometimes use psychological tests that were not developed or normed for individuals with disabilities. It is critical that the use of tests with such persons (including SSA disability claimants) include attention to representative norming samples; when such norming samples are not available, it is important for the assessor to note that the test or tests used are not based on representative norming samples and to describe the potential implications for interpretation Turner et al.
Performance on psychological tests often has significant implications (high stakes) in our society. Tests are in part the gatekeepers for educational and occupational opportunities and play a role in SSA determinations. As such, results of psychological testing may have positive or negative consequences for an individual.
Often such consequences are intended; however, there is the possibility for unintended negative consequences. It is imperative that issues of test fairness be addressed so no individual or group is disadvantaged in the testing process based upon factors unrelated to the areas measured by the test.
Biases simply cannot be present in these kinds of professional determinations. Moreover, it is imperative that research demonstrates that measures can be fairly and equivalently used with members of the various subgroups in our population. It is important to note that there are people from many language and cultural groups for whom there are no available tests with norms that are appropriately representative for them.
As noted above, in such cases it is important for assessors to include a statement about this situation whenever it applies, along with its potential implications for scores and the resulting interpretation.
While all tests reflect what is valued within a particular cultural context i. Bias leads to inaccurate test results given that scores reflect either overestimations or underestimations of what is being measured. When bias occurs based upon culturally related variables e. Relevant considerations pertain to issues of equivalence in psychological testing as characterized by the following Suzuki et al.
Functional: Whether the construct being measured occurs with equal frequency across groups;
Conceptual: Whether the item information is familiar across groups and means the same thing in various cultures;
Scalar: Whether average score differences reflect the same degree, intensity, or magnitude for different cultural groups;
Linguistic: Whether the language used has similar meaning across groups; and
Metric: Whether the scale measures the same behavioral qualities or characteristics and the measure has similar psychometric properties in different cultures.
It must be established that the measure is operating appropriately in various cultural contexts. Test developers address issues of equivalence through procedures including. Cultural equivalence is a higher order form of equivalence that is dependent on measures meeting specific criteria indicating that a measure may be appropriately used with other cultural groups beyond the one for which it was originally developed.
Trimble notes that there may be upward of 50 types of equivalence that affect interpretive and procedural practices in establishing cultural equivalence.
For most of the 20th century, the dominant measurement model was called classical test theory. This model was based on the notion that all scores were composed of two components: true score and error. The model further assumes that all error is random and that any correlation between error and some other variable, such as true scores, is effectively zero Geisinger, The approach leans heavily on reliability theory, which is largely derived from the premises mentioned above.
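The premises above are commonly written as follows. This is a sketch in conventional notation: the symbols X (observed score), T (true score), E (error), and \rho_{XX'} (reliability) are notational assumptions rather than symbols taken from the text.

```latex
\begin{align}
X &= T + E, \qquad \operatorname{Cov}(T, E) = 0 \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2 \\
\rho_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2} \\
\mathrm{SEM} &= \sigma_E = \sigma_X \sqrt{1 - \rho_{XX'}}
\end{align}
```

The second line follows from the first because uncorrelated components have additive variances; the last line is the standard error of measurement expressed through the reliability coefficient.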
Since the s and largely since the s, a newer, mathematically sophisticated model has been developed, called item response theory (IRT). The premise of these IRT models is most easily understood in the context of cognitive tests, where there is a correct answer to questions.
The simplest IRT model is based on the notion that the answering of a question is generally based on only two factors: the difficulty of the question and the ability level of the test-taker. Computer-adaptive testing estimates scores of the test-taker after each response to a question and adjusts the administration of the next question accordingly. For example, if a test-taker answers a question correctly, he or she is likely to receive a more difficult question next. It has been found that such computer-adaptive tests can be very efficient.
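The adaptive loop described above can be sketched in a few lines. The response function below is the usual logistic form of the one-parameter model. The item bank, the closest-difficulty selection rule, and the fixed-step ability update are simplifying assumptions made for illustration; operational computer-adaptive tests typically re-estimate ability with maximum likelihood or Bayesian methods after each response.

```python
from math import exp


def p_correct_1pl(ability, difficulty):
    """Probability of a correct answer when only ability and item
    difficulty matter (logistic one-parameter form)."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


def next_item(ability, item_bank, administered):
    """Choose the unused item whose difficulty is closest to the
    current ability estimate."""
    unused = {name: b for name, b in item_bank.items() if name not in administered}
    return min(unused, key=lambda name: abs(unused[name] - ability))


def update_ability(ability, answered_correctly, step=0.5):
    """Crude stand-in for re-estimation: move the estimate up after a
    correct answer and down after an incorrect one."""
    return ability + step if answered_correctly else ability - step


# Hypothetical three-item bank with difficulties on the ability scale.
bank = {"easy": -1.0, "medium": 0.0, "hard": 1.0}
first = next_item(0.0, bank, administered=set())
estimate = update_ability(0.0, answered_correctly=True)
second = next_item(estimate, bank, administered={first})
```

Starting from an ability estimate of 0, the sketch selects the medium item first and, after a correct answer, moves on to the hard item, mirroring the behavior described in the text.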
IRT models have made the equating of test forms far easier. Equating tests permits one to use different forms of the same examination with different test items and still obtain fully comparable scores, despite slightly different item difficulties across forms.
To convert the values of item difficulty to determine the test-taker's ability scores one needs to have some common items across various tests; these common items are known as anchor items. Using such items, one can essentially establish a fixed reference group and base judgments from other groups on these values.
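One simple way to use anchor items, sketched below, is mean-mean linking under the one-parameter model: the average difficulty of the anchor items on the reference form is compared with their average difficulty as estimated on the new form, and the difference is applied as a shift to every item on the new form. This is only one of several linking methods and is offered as an assumption-laden illustration; the item names and difficulty values are invented.

```python
from statistics import mean


def linking_shift(anchor_on_reference, anchor_on_new):
    """Constant that moves new-form difficulties onto the reference scale,
    based on the same anchor items calibrated on both forms."""
    return mean(anchor_on_reference) - mean(anchor_on_new)


def to_reference_scale(new_form_difficulties, shift):
    """Apply the shift to every item on the new form."""
    return {item: b + shift for item, b in new_form_difficulties.items()}


# Hypothetical difficulties for three anchor items, listed in the same order
# for both calibrations.
anchors_reference = [-0.5, 0.0, 0.5]
anchors_new = [-0.25, 0.25, 0.75]

shift = linking_shift(anchors_reference, anchors_new)
rescaled = to_reference_scale({"q7": 1.0, "q8": -0.5}, shift)
```

Here the anchor items look 0.25 units harder on the new calibration, so every new-form difficulty is lowered by that amount before scores are compared.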
As noted above, there are a number of common IRT models. Among the most common are the one-, two-, and three-parameter models.
The one-parameter model is the one already described; the only item parameter is item difficulty. A two-parameter model adds a second parameter to the first, related to item discrimination. Item discrimination is the ability of the item to differentiate those who hold the ability to a high degree from those who lack it. Such two-parameter models are often used for tests like essay tests, where one cannot achieve a high score by guessing or by using other means to answer correctly.
The three-parameter IRT model contains a third parameter, that factor related to chance level correct scoring. This parameter is sometimes called the pseudo-guessing parameter, and this model is generally used for large-scale multiple-choice testing programs.
These models, because of their lessened reliance on the sampling of test-takers, are very useful in the equating of tests, that is, the setting of scores to be equivalent regardless of the form of the test one takes.
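The one-, two-, and three-parameter models described above can be written as a single response function in which the simpler models are special cases. The sketch below uses the common logistic form; the parameter names a (discrimination), b (difficulty), and c (pseudo-guessing) follow widespread convention rather than the text, and some presentations include an additional scaling constant that is omitted here.

```python
from math import exp


def p_correct(ability, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the logistic IRT family.

    a: item discrimination (fixed at 1.0 in the one-parameter model)
    b: item difficulty
    c: pseudo-guessing floor (fixed at 0.0 in the one- and two-parameter models)
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (ability - b)))


# Same hypothetical item difficulty viewed under each model, for a
# test-taker whose ability equals the item difficulty.
one_pl = p_correct(0.0, b=0.0)
two_pl = p_correct(0.0, a=2.0, b=0.0)
three_pl = p_correct(0.0, a=2.0, b=0.0, c=0.25)
```

When ability equals difficulty, the first two models give a probability of one half regardless of discrimination, while a nonzero guessing floor raises it; for very low ability the three-parameter probability approaches c rather than zero.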
The test user is generally considered the person responsible for appropriate use of psychological tests, including selection, administration, interpretation, and use of results AERA et al. Test user qualifications include attention to the purchase of psychological measures that specify levels of training, educational degree, areas of knowledge within domain of assessment e.
Test user qualifications require psychometric knowledge and skills as well as training regarding the responsible use of tests e. In addition, test user guidelines highlight the importance of understanding the impact of ethnic, racial, cultural, gender, age, educational, and linguistic characteristics in the selection and use of psychological tests Turner et al. Test publishers provide detailed manuals regarding the operational definition of the construct being assessed, norming sample, reading level of test items, completion time, administration, and scoring and interpretation of test scores.
Directions presented to the examinee are provided verbatim and sample responses are often provided to assist the examiner in determining a right or wrong response or in awarding numbers of points to a particular answer. Ethical and legal knowledge regarding assessment competencies, confidentiality of test information, test security, and legal rights of test-takers are imperative.
Resources like the Mental Measurements Yearbook (MMY) provide descriptive information and evaluative reviews of commercially available tests to promote and encourage informed test selection Buros, To be included, tests must contain sufficient documentation regarding their psychometric quality e. Many instruments, such as those discussed throughout this report, would be considered qualification level C assessment methods, generally requiring an advanced degree, specialized psychometric and measurement knowledge, and formal training in administration, scoring, and interpretation.
However, some may have less stringent requirements, for example, a bachelor's or master's degree in a related field and specialized training in psychometric assessment (often classified level B), or no special requirements (often classified level A) for purchase and use. While such categories serve as a general guide for necessary qualifications, individual test manuals provide additional detail and specific qualifications necessary for administration, scoring, and interpretation of the test or measure.
Given the need for the use of standardized procedures, any person administering cognitive or neuropsychological measures must be well trained in standardized administration protocols. He or she should possess the interpersonal skills necessary to build rapport with the individual being tested in order to foster cooperation and maximal effort during testing.
Additionally, individuals administering tests should understand important psychometric properties, including validity and reliability, as well as factors that could emerge during testing to place either at risk. Many doctoral-level psychologists are well trained in test administration; in general, psychologists from clinical, counseling, school, or educational graduate psychology programs receive training in psychological test administration.
For cases in which cognitive deficits are being evaluated, a neuropsychologist may be needed to most accurately evaluate cognitive functioning see Chapter 5 for a more detailed discussion on administration and interpretation of cognitive tests. The use of non-doctoral-level psychometrists or technicians in psychological and neuropsychological test administration and scoring is also a widely accepted standard of practice APA, ; Brandt and van Gorp, ; Pearson Education, Psychometrists are often bachelor's- or master's-level individuals who have received additional specialized training in standardized test administration and scoring.
They do not practice independently or interpret test scores, but rather work under the close supervision and direction of doctoral-level clinical psychologists or neuropsychologists. Interpretation of testing results requires a higher degree of clinical training than administration alone.
Threats to the validity of any psychological measure of a self-report nature oblige the test interpreter to understand the test and principles of test construction. In fact, interpreting test results without such knowledge would violate the ethics code established for the profession of psychology APA,
Diagnosis: Psychological assessment measures can support a qualified clinician in making a formal diagnosis of a mental health problem.
Mental health assessment with the purpose of supporting a diagnosis can include the use of semi-structured diagnostic interviews and validated questionnaires.
Items in self-report measures used for diagnosis often bear a close correspondence to criteria specified in the diagnostic manuals ICD and DSM. Psychologists, CBT therapists, and other mental health professionals often ask their clients to complete self-report measures regularly to assess changes in symptom severity.
Information about specific commercial and unpublished psychological tests is amply available, including journal articles that discuss the application and scoring of a particular test. In many cases, you may be able to track down an unpublished test or measure itself, but without the scoring key or manual. There also remains a selection of tests with scoring keys that are available to general researchers.
It can be helpful to look at tests (even those without a scoring manual), such as those indexed in PsycTESTS, and reviews of commercially available psychological tests, to see how other researchers have measured a construct. This can inform your own research methods. PsycTESTS provides information on over 27, psychological tests, measures, and other assessment tools.
In many cases, the full text of the test instrument is provided. However, scoring materials are rarely provided. For non-commercial tests, you may wish to contact the test creator directly to ask whether further information can be provided to you.
Mental Measurements Yearbook with Tests in Print (TIP): Tests in Print "serves as a comprehensive bibliography to all known commercially available tests that are currently in print in the English language". The Yearbook also includes information on obtaining a test, as well as insightful reviews of a test that cover matters such as its construct validity and reliability.
This new open access journal, Psychological Test Adaptation and Development, publishes papers "on adaptations of tests to specific cultural needs, test translations, and the development of existing measures. The journal will focus on the empirical testing of the psychometric quality of these measures".
Health and Psychosocial Instruments includes information on measurement instruments (commercial or unpublished) in the health fields, psychosocial sciences, organizational behavior, and library and information science.
Links to journal articles that discuss a particular test. Tests that have been published within books or journal articles are readily available and may meet your research needs.
Note that many articles and books provide information about tests, but only some of them may include the actual test instruments.
For more detailed information on identifying tests on specific subjects see the American Psychological Association's guide: Testing and Assessment. Note: There is no straightforward way to identify books in the Library Catalogue that include tests, but a subject search for either Psychological Tests or Psychological Testing is a good start.