Though those are staples, I think any discussion of validity would be lacking if we didn’t give a nod to Lyle F. Bachman’s article, “Building and Supporting a Case for Test Use” (2005).
This article discusses validity practices and the adaptation of Stephen Toulmin’s Model of Argumentation to assessments. Bachman explains how this model provides a system for linking assessment (or survey) scores, assessment inferences, and assessment consequences.
Bachman summarizes other authors’ ongoing discussions of argument-based validity, which in my opinion come down to one core idea: assessment results need to be convincing. A test developer may need to defend an assessment with a convincing argument for why the consequences of the test results are valid.
You may have been in a situation where you thought, “Wow, I just can’t believe that person passed that test!” Of course you would be too polite to say anything, but the doubt would still be there deep down in your heart. It would be nice if a friendly test developer would step in and explain to you, point by point, the evidence and reasoning for why it was okay to believe the results.
Bachman describes a simple process for structuring these validity arguments using Toulmin’s model. In my experience, people seem to like the Toulmin approach because it’s easy to understand and easy to communicate to stakeholders. Toulmin’s structure includes the following elements:
- A claim
- A warrant with backing evidence
- A rebuttal with rebuttal evidence
With this model, you make a claim based on the data from the participant’s performance. You support that claim with a warrant, which has its own backing research and data (e.g., a validity study, a standard setting study). You also have to address any alternative explanations that might be raised as a rebuttal, and refute them with evidence of your own (e.g., a bias review).
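For readers who think in data structures, here is a minimal sketch of how the pieces described above fit together. It is only an illustration: the class names, field names, and the `gaps` helper are hypothetical and do not come from Bachman or Toulmin, and the example claim and evidence are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """A study, dataset, or review cited in support of part of the argument."""
    description: str


@dataclass
class Warrant:
    """The reasoning that justifies moving from performance data to the claim."""
    statement: str
    backing: list[Evidence] = field(default_factory=list)


@dataclass
class Rebuttal:
    """An alternative explanation, plus the evidence gathered to refute it."""
    alternative_explanation: str
    refuting_evidence: list[Evidence] = field(default_factory=list)


@dataclass
class ValidityArgument:
    """Hypothetical container for one Toulmin-style validity argument."""
    claim: str
    data: str
    warrants: list[Warrant] = field(default_factory=list)
    rebuttals: list[Rebuttal] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """List the parts of the argument that have no evidence behind them."""
        problems = []
        if not self.warrants:
            problems.append("claim has no warrant")
        for warrant in self.warrants:
            if not warrant.backing:
                problems.append(f"warrant lacks backing: {warrant.statement}")
        for rebuttal in self.rebuttals:
            if not rebuttal.refuting_evidence:
                problems.append(
                    f"rebuttal not refuted: {rebuttal.alternative_explanation}"
                )
        return problems


# Invented example: a backed warrant and one rebuttal not yet addressed.
argument = ValidityArgument(
    claim="The participant has mastered the course material",
    data="The participant scored above the cut score",
    warrants=[
        Warrant(
            "Scores above the cut score indicate mastery",
            backing=[Evidence("standard setting study")],
        )
    ],
    rebuttals=[Rebuttal("Test items are biased against some participants")],
)
print(argument.gaps())
```

Adding an `Evidence("bias review")` entry to the rebuttal’s `refuting_evidence` would clear the remaining gap, which mirrors the point above: a rebuttal is only handled once you have something to refute it with.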
Bachman extends this line of thinking by suggesting that test developers should be able to build this argument structure for both the validity inference of the assessment and the uses of the assessment. After all, there are plenty of valid assessments that get used in invalid ways. He defines four types of warrants we should consider when using the results to make a decision, which I paraphrase as follows:
- Is the interpretation of the score relevant to the decision being made?
- Is the interpretation of the score useful for the decision being made?
- Are the intended consequences of the assessment beneficial for the stakeholders?
- Does the assessment provide sufficient information for making the decision?
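If it helps to treat these four questions as a checklist, a rough sketch might look like the following. The short labels (`RELEVANCE`, `UTILITY`, and so on) are shorthand for the four questions above and are an assumption for illustration, as are the function name and the sample evidence.

```python
from enum import Enum


class UseWarrant(Enum):
    """The four paraphrased questions, with hypothetical short labels."""
    RELEVANCE = "Is the interpretation of the score relevant to the decision?"
    UTILITY = "Is the interpretation of the score useful for the decision?"
    INTENDED_CONSEQUENCES = "Are the intended consequences beneficial for stakeholders?"
    SUFFICIENCY = "Does the assessment provide sufficient information for the decision?"


def unsupported_warrants(evidence_by_warrant):
    """Return the warrants that have no supporting evidence recorded yet."""
    return [w for w in UseWarrant if not evidence_by_warrant.get(w)]


# Invented example: evidence has been collected for only two of the four warrants.
evidence = {
    UseWarrant.RELEVANCE: ["job task analysis"],
    UseWarrant.UTILITY: ["stakeholder interviews"],
}
missing = unsupported_warrants(evidence)
for warrant in missing:
    print(f"Still need evidence: {warrant.value}")
```

The point of the sketch is simply that each warrant needs its own evidence; having strong support for relevance does not cover sufficiency.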
Even if you don’t follow through with a whole set of documents built around this process, these are good questions to ask about your assessment. Consider alternative arguments for why participants may be passing or failing, and be sure you can convincingly refute them in the event of a challenge.
Think critically about whether your assessment is measuring what it claims to measure, and then think about what backing evidence or resources could help you support that interpretation.