What has influenced test development up to now?
Content developments
Theoretical developments
-Intelligence
-Personality
Technical and methodological
Statistics
E.g., Factor analysis
Computers & the internet
Contextual needs
Future of Testing
The same influences will likely continue to shape testing into the future:
Content developments
Technical and methodological developments
Contextual changes
Content Development
Construct development
A construct is a hypothetical entity with theoretical links to other hypothesised variables, proposed to relate to a consistent set of observable behaviours, thoughts or feelings that is the target of a psychological test.
Theoretical advances, such as new constructs emerging in the literature, can suggest which tests and procedures are likely to be developed in the future.
Emerging Constructs
3:11-7:13 https://www.youtube.com/watch?v=9xTz3QjcloI
Expansion of constructs of intelligence
Gardner’s theory of multiple intelligences
Drive development of broader measures
Content Development
Big Five shaped development of a number of assessment measures
New concepts/increased attention driving new measure development, e.g.,
–Emotional intelligence
–Refers to a person’s capacity to monitor/manage emotions, understand the emotions of others, and use these insights to function better interpersonally
–Controversial: where does it sit in existing theory? Is it an amalgamation of existing personality traits?
Integrity: dependability, theft proneness, counterproductive work behaviour.
–A specific type of personality test, or a direct measure of a job applicant's honesty, trustworthiness, or integrity
Content Development
Neuroscience & brain function
Technical and Methodological Developments
Increasing access to computers and internet over time
-Computer-assisted psychological assessment (CAPA)
Smart testing
Serious gaming
Potential for virtual reality, artificial intelligence in assessment
Computer Applications
1950s: computers first available for testing and assessment
Computerised adaptive testing (CAT) conceived
New developments in test theory including item response theory
Costs/skills prohibitive for mainstream use
Computer Applications
1980s: widespread proliferation of affordable home computers
Test developer access to affordable computing power
Development of computerised testing began
1990s: widespread growth of the internet
Possibility of internet testing
Testing as big business
Rapid proliferation of tests/testing
Are computer and pen and paper forms equivalent though?
Does computer presentation fundamentally change the construct being measured?
Generally the answer is no
Cross-mode correlations of 0.97 (e.g., Mead & Drasgow, 1993 meta-analysis)
Little difference between ticking a box on a questionnaire with a pencil and clicking it with a mouse
Psychological decision-making processes remain the same
But…
-Speeded tests
-Psychomotor effects
Speeded tests are an exception (e.g., Greaud & Green, 1986)
Characterised by very simple tasks performed repetitively, as quickly as possible, within a short time limit (e.g., coding on WISC/WAIS)
On speeded tests, psychomotor effects mean that variations in response modality (i.e., paper and pencil vs. computer) do affect results
Computer-assisted testing: WISC-V as an example
https://www.youtube.com/watch?v=tp5B86ajbmw
Multidimensional Adaptive Testing (MAT)
MAT as an extension of Computerised adaptive testing (CAT) covered in educational testing lecture
-Multivariate generalisation
Revision: in CAT, a computer continuously monitors the test-taker's performance and selects the next item to administer so as to gain the most information
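The selection step above can be sketched under a 2PL IRT model, where the next item is the unadministered item with the greatest Fisher information at the current ability estimate (the item bank and its parameters below are purely illustrative):

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta, bank, administered):
    """Pick the unadministered item giving the most information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# hypothetical item bank: (discrimination a, difficulty b) per item
bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0), (2.0, 0.5)]
chosen = next_item(0.0, bank, administered={1})
```

In practice the ability estimate is re-estimated after each response and the loop repeats until a stopping rule (e.g., a target standard error) is met.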
Multidimensional Adaptive Testing (MAT)
MAT takes adaptive testing to the next level by applying this same idea to a battery of tests rather than a single test
–Capitalises on idea that many constructs measured by a test are correlated
Performance on each item then informs item selection for every subtest in the battery
Adapts simultaneously across subtests
Key advantage: reduces test time without sacrificing accuracy of measurement across a whole battery
Limitations of MAT
Like CAT, requires substantial effort to develop a sufficiently large item bank to draw from
Requires 100s of items with item parameters estimated
Requires data from large samples of examinees with extensive testing during development, even more so than in CAT
Potential for “chopping and changing” between item types, as the system can select items from any subtest in the battery
Item-Generative Testing
A possible solution to the need for large item banks in CAT and MAT
New items generated automatically by a computer based on an underlying rule or algorithm
–If the main source of item difficulty for a subtest can be captured by a rule or template, a computer can generate an effectively infinite number of items of the desired difficulty, e.g., by randomly initialising key variables and applying the rule
-Future assessments could use cognitive models of test performance to drive item generation
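For example, if operand size is treated as the rule governing difficulty, arithmetic items can be generated on the fly from a template (the difficulty rule here is a made-up illustration, not a calibrated model):

```python
import random

def generate_addition_item(difficulty, rng=random):
    """Generate an addition item from a template; larger operands
    (a higher difficulty level) make the item harder under this toy rule."""
    hi = 10 ** difficulty  # difficulty 1 -> operands up to 9, 2 -> up to 99, ...
    a = rng.randint(1, hi - 1)
    b = rng.randint(1, hi - 1)
    return {"stem": f"{a} + {b} = ?", "answer": a + b}

item = generate_addition_item(2, random.Random(0))
```

Because items are produced by the rule rather than stored, the effective item bank is unbounded, and item exposure (test-takers memorising and sharing items) is far less of a concern.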
Time Parameterisation
Speed vs. accuracy?
But computer-administered tests allow response times to be captured
The challenge is how best to use this information
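One simple illustration of using captured response times is a rate-correct score, i.e., correct responses per second, which folds speed and accuracy into a single number (just one of many possible approaches to time parameterisation):

```python
def rate_correct_score(correct_flags, response_times):
    """Correct responses per second across a set of items:
    a simple way to combine accuracy and speed into one score."""
    total_time = sum(response_times)
    return sum(correct_flags) / total_time if total_time > 0 else 0.0

# e.g., 3 of 4 items correct in 2 + 3 + 4 + 1 = 10 seconds -> 0.3 correct/sec
score = rate_correct_score([1, 1, 0, 1], [2.0, 3.0, 4.0, 1.0])
```

A score like this rewards fast, accurate responding; richer approaches model speed and accuracy jointly rather than collapsing them into one ratio.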
Internet Testing
Revolutionised testing
Larger impact on the distribution than on the development of tests
Questions can be quickly circulated to psychologists and other users
Internet versions of tests easily kept up to date and disseminated upon development
Scoring and the way questions are presented can be modified easily
Information easily returned to test developer
Internet Testing: Risks & Limitations
“Digital divide” (Bartram, 2000)
Some people have better access to the internet than others; the best access tends to go to the most privileged
Strong tradition in testing of trying to avoid discrimination
Narrowing gap in recent years as computers and internet are becoming cheaper and more widespread
Potential to bridge service gaps in rural/remote areas
–More limited access to professionals/getting to a test centre
Risks and Limitations
Security of Information
Risks and Limitations
Proliferation of non-evidence-based assessments on the internet
Pop-psych and para-psychological
Major problem for field
Testing vs. Assessment
Internet suited to testing, but not assessment
Risk of being used (inappropriately) as a replacement for psychological assessment
Very open to misuse and misinterpretation
Industrial and Organisational Testing Online
Rise of online recruiters and job markets
Potential for automatic head hunting with “web bots” trawling web for CVs (e.g., LinkedIn)
Temptation for delivery of psychological tests/assessment direct to public without a psychologist
Supervised Testing in the Digital Age
Functions of supervision of assessment
Levels of Supervision
Open (“unsupervised mode”)
E.g., tests published online, in magazines, or books
Many personal development measures
Tests that incurred significant development costs are unlikely to be offered in open mode
Only suitable for low-stakes testing
Controlled (e.g., password to access)
Suitable for first step in recruitment process
Recommended to follow-up with verified testing
Supervised mode (e.g., presence of a proctor in a non-secure environment)
E.g., NAPLAN in 2018
Managed mode (formal examination conditions with test kept secure)
May include local or remote supervision (e.g., using webcam technology, keystroke monitoring, and timing)
Raises additional complexities!