A colleague recently asked for my opinion on an organization’s level of knowledge, experience, and sophistication in applying psychometrics to their assessment program. I realized that it was difficult to summarize in words, which got me thinking about why. I concluded that it was because there is currently no common language to describe how advanced an organization is in terms of the psychometric expertise it has and the rigour it applies to its assessment program. I thought that if there were such a common vocabulary, it would make conversations like the one I had a whole lot easier.
I thought it might be fun (and perhaps helpful) to come up with a proposed first cut of a shared vocabulary around the levels of psychometric expertise. I wanted to keep it simple, yet effective in allowing people to quickly and easily communicate about where an organization would fall in terms of its level of psychometric sophistication. I thought it might make sense to break it out by areas (I thought of seven) and assign points according to the expertise an organization has and the rigour it applies in each. Not all areas are always led by psychometricians directly, but psychometricians usually play a role.
1. Item and test level psychometric analysis
- Classical Test Theory (CTT) and/or Item Response Theory (IRT)
- Pre hoc analysis (beta testing analysis)
- Ad hoc analysis (actual assessment)
- Post hoc analysis (regular reviews over time)
2. Psychometric analysis of bias and dimensionality
- Factor analysis or principal component analysis to evaluate dimensionality
- Differential Item Functioning (DIF) analysis to ensure that items are performing similarly across groups (e.g., gender, race, age)
3. Form assembly processes
- Expert review of forms or item banks
- Fixed forms, computerized adaptive testing (CAT), automated test assembly
4. Equivalence of scores and performance standards
- Standard setting
- Test equating
- Scaling scores
5. Test security
- Test security plan in place
- Regular security audits are conducted
- Statistical analyses are conducted regularly (e.g., collusion and plagiarism detection analysis)
6. Validity studies
- Validity studies conducted on new assessment programs and ongoing programs
- Industry experts review and provide input on study design and findings
- Improvements are made to the program if required as a result of studies
7. Reporting
- Provide information clearly and meaningfully to all stakeholders (e.g., students, parents, instructors)
- High quality supporting documentation designed for non-experts (interpretation guides)
- Frequently reviewed by assessment industry experts and improved as required
Each area is then assigned points according to the level of expertise/rigour:
0. None: Not rigorous, no expertise whatsoever within the organization
1. Some: Some rigour, marginal expertise within the organization
2. Full: Highly rigorous, organization has a large amount of experience
So an organization that has decades of expertise in each area would be at the top level of 14 (7 areas x 2 for expertise/rigour in each area = 14). An elementary school doing simple formative assessment would probably be at the lowest level (7 areas x 0 expertise/rigour = 0). I have provided some examples of how organizations might fall into various ranges in the illustration below.
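To make the arithmetic concrete, here is a minimal sketch of how the total could be tallied. It only illustrates the 7 areas x 0 to 2 points idea described above; the short area labels, the `total_score` function name, and the example ratings are my own shorthand for illustration, not part of any established tool or standard.

```python
# Illustrative shorthand labels for the seven areas (assumed names, not official terms).
AREAS = (
    "item_and_test_analysis",
    "bias_and_dimensionality",
    "form_assembly",
    "equivalence_and_standards",
    "test_security",
    "validity_studies",
    "reporting",
)

# Points per area: 0 = None, 1 = Some, 2 = Full.
VALID_POINTS = (0, 1, 2)


def total_score(ratings):
    """Sum the per-area points into an overall level between 0 and 14.

    `ratings` maps each area label in AREAS to 0, 1, or 2.
    """
    if set(ratings) != set(AREAS):
        raise ValueError("ratings must cover exactly the seven areas")
    for area, points in ratings.items():
        if points not in VALID_POINTS:
            raise ValueError(f"{area}: points must be 0, 1, or 2")
    return sum(ratings.values())


# Hypothetical example: full rigour everywhere gives the top level.
full_rigour = {area: 2 for area in AREAS}
print(total_score(full_rigour))  # 14

# Hypothetical example: solid item analysis and some security, nothing else.
partial = {area: 0 for area in AREAS}
partial["item_and_test_analysis"] = 2
partial["test_security"] = 1
print(total_score(partial))  # 3
```

A plain sum treats every area as equally important, which is the simplest reading of the scheme; weighting some areas more heavily would be a different design choice.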
There are obviously lots of caveats and considerations here. One thing to keep in mind is that not all organizations need to have full expertise in all areas. For example, an elementary school that administers formative tests to facilitate learning doesn’t need to have 20 psychometricians working for them doing DIF analysis and equipercentile test equating. Their organization being low on the scale is expected. Another consideration is expense: To achieve the highest level requires a major investment (and maintaining an army of psychometricians isn’t cheap!). Therefore, one would expect an organization that is conducting high stakes testing where people’s lives or futures are at stake based on assessment scores to be at the highest level. It’s also important to remember that some areas are more basic than others and are a starting place. For example, it would be pretty rare for an organization to have a great deal of expertise in the psychometric analysis of bias and dimensionality but no expertise in item and test analysis.
I would love to get feedback on this idea and start a dialog. Does this seem roughly on target? Would it be useful? Is there something similar and better out there that I don’t know about? Or am I just plain out to lunch? Please feel free to comment to me directly or on this blog.
On a related note, Questionmark CEO Eric Shepherd has given considerable thought to the concept of an “Assessment Maturity Model,” which focuses on a broader assessment context. Interested readers should check out: http://www.assessmentmaturitymodel.org/