Posted by Austin Fossey
In my previous posts, I introduced the student model and the task model—two of the three sections of the Conceptual Assessment Framework (CAF) in Evidence-Centered Design (ECD).
The student and task models are linked by the evidence model, which has two components: the evaluation component (evidence identification) and the measurement model component (evidence accumulation) (see Mislevy, Behrens, Dicerbo, & Levy, 2012, Design and Discovery in Educational Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining).
The evaluation component defines how we identify and collect evidence in the responses or work products produced by the participant in the context of the task model. For the evaluation component, we must ask ourselves what it is we are looking for as evidence about the participant's ability, and how we will store that evidence.
In a multiple choice item, the evaluation component is simply whether or not the participant selected the item key, but evidence identification can be more complex. Consider drag-and-drop items, where you may need to track both the options the participant chose and the order in which they were placed. In hot spot items, the evaluation component consists of capturing the coordinates of the participant's selection in relation to a set of item key coordinates.
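As an illustration of evidence identification for a hot spot item, the sketch below checks whether the participant's click coordinates fall within the item key's region. The function name, the coordinate values, and the assumption of a rectangular key region are all hypothetical, not part of any particular assessment platform's API:

```python
def in_key_region(click_x, click_y, key):
    """Return True if the click coordinates land inside the key rectangle.

    key is an assumed (x_min, y_min, x_max, y_max) bounding box.
    """
    x_min, y_min, x_max, y_max = key
    return x_min <= click_x <= x_max and y_min <= click_y <= y_max

key_region = (120, 80, 200, 140)              # illustrative key coordinates
evidence = in_key_region(150, 100, key_region)
print(evidence)  # True: the click lands inside the key region
```

The evaluation component here reduces the raw work product (a pair of coordinates) to a piece of stored evidence (inside or outside the key region).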
Some simulation assessments will collect information about the context of the participant's response (i.e., was it the correct response given the state of the simulation at that moment?), and others consider aspects of the participant's response patterns, such as sequence and efficiency (i.e., in what order did the participant perform the response steps, and were there any extraneous steps?).
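A minimal sketch of response-pattern evidence, assuming a hypothetical ideal step sequence, might capture whether the relevant steps occurred in order and which steps were extraneous. The step names and comparison logic are invented for illustration:

```python
def pattern_evidence(observed, ideal):
    """Return (correct_order, extraneous_steps) for a response sequence."""
    relevant = [step for step in observed if step in ideal]
    extraneous = [step for step in observed if step not in ideal]
    return relevant == ideal, extraneous

# Assumed ideal sequence and an observed work product with one extra step
ideal_steps = ["open_valve", "check_pressure", "log_reading"]
observed_steps = ["open_valve", "wipe_gauge", "check_pressure", "log_reading"]
print(pattern_evidence(observed_steps, ideal_steps))
# (True, ['wipe_gauge'])
```

The measurement model could then penalize extraneous steps, reward correct sequencing, or both, depending on the assessment design.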
In the measurement model component, we define how evidence is scored and how those scores are aggregated into measures that can be used in the student model.
In a multiple choice assessment using Classical Test Theory (CTT), the measurement model may be simple: if the participant selects the item key, we award one point, then create an overall score measure by adding up the points. Partial credit scoring is another option for a measurement model. Raw scores may be transformed into a percentage score, which is the aggregation method used for many assessments built with Questionmark. Questionmark also provides a Scoring Tool for external measurement models, such as rubric scoring of essay items.
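The CTT measurement model described above can be sketched in a few lines: one point per keyed response, summed to a raw score and transformed to a percentage. The keys and responses below are made up for illustration:

```python
def score_ctt(responses, keys):
    """Award one point per match with the key; return raw and percentage scores."""
    raw = sum(1 for response, key in zip(responses, keys) if response == key)
    percent = 100.0 * raw / len(keys)
    return raw, percent

keys      = ["B", "D", "A", "C", "B"]   # illustrative item keys
responses = ["B", "D", "C", "C", "B"]   # one incorrect response (item 3)
print(score_ctt(responses, keys))  # (4, 80.0)
```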
Measurement models can also be more complex depending on the assessment design. Item Response Theory (IRT) is another commonly used measurement model that provides probabilistic estimates of participants' abilities based on each participant's response pattern and the difficulty and discrimination of the items. Some simulation assessments also use logical scoring trees, regression models, Bayes Nets, network analyses, or a combination of these methods to score work products and aggregate results.
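To make the IRT idea concrete, here is a sketch of the two-parameter logistic (2PL) model, which gives the probability of a correct response as a function of the participant's ability (theta) and the item's discrimination (a) and difficulty (b). The parameter values are invented; operational IRT scoring would estimate these from response data:

```python
import math

def p_correct_2pl(theta, a, b):
    """Two-parameter logistic IRT model: P(correct | theta, a, b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A participant of average ability on an item of average difficulty
# has a 50% chance of a correct response under this model.
print(p_correct_2pl(theta=0.0, a=1.0, b=0.0))  # 0.5
```

Ability estimates are then the theta values that best explain the participant's full response pattern across all items.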