In psychometrics, validity and reliability are used to assess how well a test measures something. When creating an assessment, organizations must consider how valid the results are, as well as how reliable (or consistent) they are across the testing process. The two concepts are closely related, but there are key differences that set them apart.
In this guide, we will run through the definitions of validity and reliability, the different types of each concept, and the differences between the two.
What is validity?
Validity means that something measures what it is meant to measure. If a particular assessment is designed to determine whether or not candidates have understood a set of compliance principles, it can be described as valid if it is able to show who understands the principles and who does not.
|Example: in order to be valid, a driving test should include a physical driving exam, not just a theory exam. Answering questions about driving a car would not be a valid test of whether a person can drive or not – this requires the physical act of driving the car.|
What are the three types of validity?
Three common ways to look at validity are:
- Content
- Construct
- Criterion
We have outlined these three types of validity in the table below.
|Type of validity||Approach||Example|
|Content||Is the content and composition of the test appropriate, given what is being measured?||If a student sits an exam to test their course knowledge, the exam would need to cover all the pertinent points outlined in the course in order to have content validity.|
|Construct||How well does the test measure the underlying construct that it is designed to measure?||If an assessment is designed to measure a candidate’s math ability, does it actually show evidence of measuring this? Would candidates who score well in this math test also score well in others? If so, the test has high construct validity.|
|Criterion||Can the results of the test be used to predict something related to the test?||If a candidate scores well on a sales skills assessment, are they actually more likely to perform better in a sales role? A test that accurately predicts real-world performance would have high criterion validity.|
To learn more about criterion validity, visit our guide to predictive validity (a subtype of criterion validity).
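In practice, criterion validity is often summarized as a correlation between test scores and the outcome the test is meant to predict. The sketch below is a minimal illustration in Python, assuming hypothetical assessment scores and later sales figures (both invented for this example). The function name `pearson_r` and the variable names are our own and not part of any particular tool.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per candidate (made up for illustration).
assessment_scores = [52, 61, 70, 74, 83, 90]  # sales skills assessment
later_sales = [11, 14, 13, 18, 21, 24]        # deals closed after hiring

criterion_validity = pearson_r(assessment_scores, later_sales)
print(f"criterion validity coefficient: {criterion_validity:.2f}")
```

A coefficient close to 1 would suggest that candidates who score well on the assessment also tend to perform well in the role; a coefficient near 0 would suggest the assessment predicts little about real-world performance.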
What is reliability?
Reliability means that something is consistent and replicable. For example, a reliable test must measure the same variables time and time again, producing the same or similar outcomes in each case.
|Example: sales assessments are used to measure candidates’ sales capabilities. The test would be reliable if, when a particular candidate took it on more than one occasion, the organization reached the same conclusions about their abilities as a salesperson each time.|
What are the four types of reliability?
There are four different types of reliability:
- Test-retest
- Inter-rater
- Parallel forms
- Internal consistency
We have outlined these four types of reliability in the table below.
|Type of reliability||Approach||Example|
|Test-retest||Conducting the same test on the same people more than once.||Pilots are tested for color-blindness multiple times – this is not a characteristic that will change, so we can test and retest to determine reliability.|
|Inter-rater||Different people conduct the same test.||Researchers conduct a test on recovery time after surgery. If the test is reliable, the researchers' results should agree closely enough to define a set recovery period for future patients.|
|Parallel forms||Different tests designed to be equivalent to each other.||Multiple different past papers are handed out to a math class, each designed to be equivalent to the final exam.|
|Internal consistency||The correlation between multiple items / questions within a test.||If testing a patient for anxiety, all of the questions should be related to symptoms of anxiety. If they rate highly on each question, it is likely they have an anxiety disorder.|
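Internal consistency is commonly quantified with Cronbach's alpha, which compares the variance of individual items with the variance of respondents' total scores. The following is a minimal sketch, assuming a hypothetical four-question anxiety questionnaire rated on a 1 to 5 scale; the responses are invented and the function name `cronbach_alpha` is our own.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the test
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so that each tuple holds every respondent's score on one item.
    items = list(zip(*responses))
    sum_item_variances = sum(variance(item) for item in items)
    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical data: 5 respondents x 4 questions (made up for illustration).
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate that the questions tend to move together, which is what we would expect if they all relate to the same underlying trait. Test-retest and parallel-forms reliability are usually checked in a different way, by correlating the scores from the two sittings or the two forms.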
What are the differences between validity and reliability?
There are a few key differences between validity and reliability to be aware of:
Consistent vs. accurate
Reliability means that something is consistent time and time again, whereas validity means that the test is an accurate way of measuring what it is supposed to measure.
Reliability is comparatively simple to measure, as it depends only on whether the results are consistent. Validity can be more difficult to measure. To show that a test is valid, the assessor needs some means of showing that it actually measures what it is meant to measure (e.g. candidates who score well on this test also score well on other tests designed to assess the same thing).
Can a test be valid but not reliable?
No. A valid test will always be reliable, but the reverse does not hold: a test may be reliable without being valid. This is because a test could produce the same result each time without actually measuring the thing it is designed to measure.
For example, candidates could all consistently get the same results in a sales assessment, so it is reliable. However, if the assessment doesn’t actually measure a salesperson’s abilities in a suitable way, then we can’t describe it as valid.
On the other hand, if the assessment was actually a good measure of a salesperson’s abilities (i.e. it was valid), we would expect the results to be broadly the same each time a particular candidate completed it (i.e. it would also be reliable).
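The sales assessment scenario can be illustrated with a small numerical sketch. The data below is entirely hypothetical: each candidate takes the same assessment twice, and we also record their later sales performance. The helper `pearson_r` and all variable names are our own, chosen for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per candidate (made up for illustration).
first_sitting = [70, 40, 60, 45, 65, 50]      # assessment score, attempt 1
second_sitting = [71, 41, 58, 46, 64, 51]     # assessment score, attempt 2
sales_performance = [10, 20, 30, 40, 50, 60]  # later on-the-job results

# Reliability: do the two sittings agree with each other?
test_retest = pearson_r(first_sitting, second_sitting)
# Validity: do the scores relate to what the test is meant to measure?
validity = pearson_r(first_sitting, sales_performance)

print(f"test-retest reliability: {test_retest:.2f}")
print(f"criterion validity:      {validity:.2f}")
```

With these invented numbers, the two sittings agree almost perfectly, yet the scores show only a weak relationship with later sales performance. That is the pattern of a test that is reliable but not valid.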
For further context on this, take a look at our infographic on the reliability and validity of compliance assessments.
Should an assessment be both valid and reliable?
Yes. In order for an assessment to be trustworthy, it must be both reliable and valid. For example, when hiring members of staff, organizations need a reliable and valid assessment. If candidates kept passing the test, but it failed to accurately measure their abilities because it asked the wrong questions, it would not be a suitable assessment.
Take a look at our guide on why reliability and validity are the key to trust.