Understanding Assessment Validity and Reliability

Assessments are not all created equal. Those that are both reliable and valid support learning and measure knowledge most effectively. But how can authors make sure they are producing valid, reliable assessments?
I picked up some tips about this while revisiting the Questionmark report, Assessments through the Learning Process.

So, what is a reliable assessment? One that works consistently. If a survey indicates that employees are satisfied with a course of instruction, it should show the same result if administered to the same employees three days later. (This type of reliability is called test-retest reliability.) If a course instructor rates employees taking a performance test, the scores should match those that any other course instructor would give for the same performances. (This is called inter-rater reliability.)
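Both kinds of reliability can be put into numbers. The sketch below is one common way to do it, not a method taken from the report: it assumes test-retest reliability is summarized as the Pearson correlation between two administrations, and inter-rater reliability as the share of performances on which two raters give the same score. The function names and all of the scores are hypothetical, made up for illustration.

```python
from math import sqrt


def retest_reliability(first, second):
    """Pearson correlation between two administrations to the same people."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    return covariance / sqrt(spread_first * spread_second)


def exact_agreement(rater_a, rater_b):
    """Share of performances on which two raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical satisfaction ratings (1 to 5) from the same five employees,
# collected three days apart.
day_one = [4, 5, 3, 4, 2]
day_four = [4, 5, 3, 5, 2]

# Hypothetical scores given by two instructors to the same four performances.
instructor_a = [3, 4, 2, 5]
instructor_b = [3, 4, 3, 5]

print("test-retest correlation:", round(retest_reliability(day_one, day_four), 2))
print("inter-rater agreement:", exact_agreement(instructor_a, instructor_b))
```

A correlation close to 1 suggests that the survey ranks people the same way on both occasions. Exact agreement is the simplest inter-rater measure; it does not correct for agreement that would happen by chance.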

And what is a valid assessment? One that measures what it is supposed to measure. If a test or survey designed to measure happiness is administered to happy people, the results should show that they’re all happy. Similarly, if a group of people who are all knowledgeable are tested, the test results should reveal that they’re all knowledgeable.

If an assessment is valid, it looks like the job: in the eyes of job experts, its content aligns with the tasks of the job. This type of validity is known as content validity. To ensure this validity, the assessment author must first undertake a job task analysis, surveying subject matter experts (SMEs) or people on the job to determine what knowledge and skills are needed to perform job-related tasks. That information makes it possible to produce a valid test.

Good assessments are both reliable and valid. If we gave a vocabulary test twice to a group of nurses, and the scores came back exactly the same both times, the test would be considered highly reliable. However, this reliability does not mean that the test is valid. To be a valid test of nursing competence, it would need to measure nursing competence in addition to being reliable.
Imagine administering a test of nursing skills to a group of skilled and unskilled nurses and finding that the scores for each examinee are different each time. The test is clearly unreliable. If it’s not reliable, it cannot be valid; fluctuating scores for the same test takers cannot be measuring anything in particular. So the test is both unreliable and invalid. The reliable and valid test of nursing skills is one that yields similar scores every time it is given to the same group of test takers and discriminates every time between competent and incompetent nurses. It is consistent and it measures what it is supposed to measure.
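The three cases above (reliable but not valid, unreliable and therefore invalid, reliable and valid) can be sketched as a pair of crude checks. This is an illustration under stated assumptions, not a real psychometric procedure: "consistent" is assumed to mean that each examinee's two scores differ by at most a couple of points, and "discriminates" is assumed to mean that skilled nurses outscore unskilled nurses by at least ten points on average. Both thresholds, the function names, and all of the scores are hypothetical.

```python
def is_consistent(first, second, tolerance=2):
    """True if every examinee's two scores differ by at most `tolerance` points."""
    return all(abs(a - b) <= tolerance for a, b in zip(first, second))


def discriminates(scores, skilled, min_gap=10):
    """True if skilled examinees outscore unskilled ones by `min_gap` on average."""
    skilled_scores = [s for s, k in zip(scores, skilled) if k]
    unskilled_scores = [s for s, k in zip(scores, skilled) if not k]
    gap = (sum(skilled_scores) / len(skilled_scores)
           - sum(unskilled_scores) / len(unskilled_scores))
    return gap >= min_gap


def classify(first, second, skilled):
    """Label a test from two sittings by the same examinees."""
    if not is_consistent(first, second):
        # Fluctuating scores cannot be measuring anything in particular.
        return "unreliable and invalid"
    if discriminates(first, skilled) and discriminates(second, skilled):
        return "reliable and valid"
    return "reliable but not valid"


# Hypothetical group: three skilled nurses followed by three unskilled nurses.
skilled = [True, True, True, False, False, False]

# Vocabulary test: stable scores, but unrelated to nursing skill.
vocabulary_1 = [80, 62, 71, 79, 64, 70]
vocabulary_2 = [81, 61, 71, 78, 65, 70]

# Erratic nursing test: scores jump around between sittings.
erratic_1 = [88, 40, 91, 75, 60, 48]
erratic_2 = [50, 85, 62, 56, 90, 70]

# Well-built nursing test: stable scores that separate the two groups.
nursing_1 = [88, 84, 91, 55, 60, 48]
nursing_2 = [87, 85, 90, 56, 59, 50]

print("vocabulary test:", classify(vocabulary_1, vocabulary_2, skilled))
print("erratic test:", classify(erratic_1, erratic_2, skilled))
print("nursing skills test:", classify(nursing_1, nursing_2, skilled))
```

Note that `classify` checks consistency first, which mirrors the argument above: a test that is not reliable is never given the chance to count as valid.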

Assessments that are both reliable and valid hit the bullseye!

Related resources

How to Measure Construct Validity

Understanding Convergent & Discriminant Validity

What is the Difference Between Validity & Reliability?
