AI & Assessments: When the stakes are high

When the question of high-stakes tests and exams crops up, most media coverage focuses on education or certification. In this article, we instead look at high-consequence tests and exams in the workplace and outline some best-practice advice while also considering the role of AI.

What are high-stakes workplace exams?

Workplace exams are used to check that someone is competent before they are allowed to carry out job tasks. Some of these exams carry very high consequences, because people without the right skills can end up mis-selling services, causing regulatory compliance failures, or even committing health and safety violations.

For example:

Pharmaceutical companies often test their sales reps to check they know the details of their products. If a sales rep is wrong, the company might get into trouble for false advertising and selling.
Energy and manufacturing companies test their staff on health and safety or safe machine operation. An error here could mean an industrial accident.
High-tech companies test their engineers on diagnosing and fixing issues before sending them into the field. A mistake can mean critical equipment is down.
Healthcare organizations need their practitioners to be competent in healthcare assessments before they are allowed to perform certain procedures. A mistake puts a patient’s health at risk.

This article is aimed at those in the workplace who create and deliver online tests that have important consequences for the organization and, often, for society too. We’ll identify some “low-hanging fruit”: practices that, if you’re not following them already, could improve your workplace exams without significant effort.

As a last but essential point, we’ll also look at how the advent of AI may help or hinder high-stakes assessments in the future.

Creating high-consequence tests

The typical team responsible for workplace assessments often has to manage a wide range of assessments for different products, departments, and roles with a relatively small team. This contrasts with some public exam bodies or large certification organizations that may have larger teams and more resources to develop their exams.

One of the key principles for workplace tests is, therefore, efficacy. You must get the job done effectively with limited resources. Tests need to be reliable, allow valid use of scores, and be fair, inclusive, authentic, secure, and legally defensible. Above all, you need to use resources wisely and get the best value for your organization.

One could easily write a whole book on writing tests (and if you are looking for one, a great book is ‘Criterion-referenced Test Development’ by Shrock and Coscarelli). If you are already creating and delivering tests, here are eight potential improvements to workplace tests that are relatively low-hanging fruit, i.e. easy to add to your process and implement.

All of these are actionable with relatively few resources. For several, we point to free, recorded Questionmark webinars that provide further guidance. If you are doing them already, all is well and good; if not, consider if you can take any of the ideas below to improve your program.

1. Review how you set pass or cut scores

The pass (or cut) score is a key tool in making tests useful and needs to be set thoughtfully.

Some organizations have a fixed pass score of 70% or 75% for all exams, but that is not usually fair or appropriate. You should instead set the pass score based on how challenging the questions are and the competence level you are looking for.

The Angoff method is one way of doing this that is widely used and pretty easy to implement. See this recorded webinar “Setting a Cut Score – What’s Fair and What’s Not?” for more information on setting pass/cut scores. The pass score is a crucial element in the usefulness of the test. Consider the diagram to the right.

Too high a pass score means that some competent people will fail the test. This is an error of rejection. Typically, someone is rejected unfairly and has to go through retraining or other remedial measures unnecessarily, and the organization loses productive time needlessly.

Too low a pass score means that some people who are not competent will pass the test. This is an error of acceptance. It means they could be let loose to do workplace tasks that they will struggle with or, in the worst-case scenario, cause a compliance, safety, or service failure. How serious this is depends on the importance and cost of such errors.
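As a rough illustration of the Angoff method mentioned above, the sketch below assumes a small panel of job experts who each estimate, for every question, the probability that a minimally competent person would answer it correctly. The cut score is then the sum of the average estimates per question. The function name angoff_cut_score and the ratings are hypothetical, and a real standard-setting exercise would also involve discussion between the experts and a review of the result.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return (raw_cut, percent_cut) from a judges-by-items matrix of estimates.

    ratings[j][i] is judge j's estimate (0.0 to 1.0) of the probability that a
    minimally competent person answers item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate every item")
    # Average the judges' estimates item by item, then sum across items.
    item_means = [mean([judge[i] for judge in ratings]) for i in range(n_items)]
    raw_cut = sum(item_means)
    return raw_cut, 100 * raw_cut / n_items


# Hypothetical ratings: three judges, four one-mark questions.
example_ratings = [
    [0.9, 0.6, 0.5, 0.8],
    [0.8, 0.7, 0.4, 0.9],
    [1.0, 0.5, 0.6, 0.7],
]
raw_cut, percent_cut = angoff_cut_score(example_ratings)
print(f"Cut score: {raw_cut:.2f} of {len(example_ratings[0])} marks ({percent_cut:.0f}%)")
```

With harder questions the experts' estimates would be lower, and so would the resulting pass score, which is the point of setting it from the questions and not from a fixed percentage.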

2. Write questions to test above knowledge

Another area where many workplace tests can be improved is testing above knowledge.

Knowledge is important in many job roles, but most real-world skills require a great deal more than recall. A test that does more than check recall will usually match job requirements better. However, it’s often much easier to write questions that ask for knowledge recall, so this is a common trap that many workplace assessments fall into.

As such, when writing new questions, it’s worth putting in effort to make the questions test above knowledge. For example, ask people to apply their knowledge or check their understanding. There are lots of ways to do this, and it’s not expensive or time-consuming. See this recorded webinar: “Beyond Recall: Taking Competency Assessments to the Next Level” for some suggested routes.

3. Check regularly that each test matches job requirements

Of course, job skills are changing rapidly. With huge changes like the internet and generative AI, the skills needed in many jobs today are very different from those needed a few years ago, so what we test and measure must change to match.

A common way to check that a test matches job requirements is via a job task analysis, which surveys existing workers to identify what they do day-to-day and what is important and frequent. You can then use the results of the surveys to create a test blueprint, ensuring the topics in the test match the necessary job skills.
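One simple way to turn survey results into a blueprint can be sketched as follows. It assumes each task has an average importance rating and an average frequency rating from the survey, weights each task by importance multiplied by frequency, and shares out the questions in proportion. The weighting rule, the 1 to 5 scale, the task names, and the function build_blueprint are all illustrative assumptions, not a prescribed method.

```python
def build_blueprint(task_ratings, total_questions):
    """Allocate questions to tasks in proportion to importance x frequency.

    task_ratings maps a task name to (mean_importance, mean_frequency), for
    example averaged survey ratings on a 1-5 scale (a hypothetical scale).
    """
    weights = {task: imp * freq for task, (imp, freq) in task_ratings.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one task needs a positive weight")
    exact = {task: total_questions * w / total_weight for task, w in weights.items()}
    # Round down first, then hand the leftover questions to the tasks with the
    # largest fractional remainders so the counts add up to total_questions.
    allocation = {task: int(share) for task, share in exact.items()}
    leftover = total_questions - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - allocation[t], reverse=True)
    for task in by_remainder[:leftover]:
        allocation[task] += 1
    return allocation


# Hypothetical survey averages: task -> (importance, frequency).
example_tasks = {
    "Isolate equipment before maintenance": (5, 4),
    "Read pressure gauges": (4, 3),
    "Report incidents": (3, 3),
    "Select protective equipment": (2, 2),
}
blueprint = build_blueprint(example_tasks, total_questions=20)
for task, count in blueprint.items():
    print(f"{count:2d} questions: {task}")
```

Job experts should still review the resulting counts, since a rarely performed task can be critical enough to deserve more questions than the arithmetic suggests.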

Even if you don’t have the time to do a thorough job analysis for every test, doing some work in this area can be very useful. For example, you could share a blueprint of the test with job experts for review, or survey test-takers on whether they feel that the questions are fair and reflect their job. Some pioneers are also using generative AI to produce job analyses by summarizing work-related documents.

4. Measure on-the-job skills with observational assessments

If job roles involve practical tasks, e.g. operating or fixing machinery or working with people, then a conventional on-screen assessment may not be the best way to measure skill. People need to know how to do the task, not just how to answer questions on it.

An observational assessment is a test where an observer watches a participant perform a task and rates their performance, making it possible to evaluate skills or abilities that are difficult to measure using “traditional” assessments.

Observational assessments often have the instructor or observer using a tablet or a smartphone and can be done “in the field”. Their results can also be collated with more traditional assessments to provide a rounded picture of a person’s capability.

See this webinar “Observational Assessments – What are they, why, and when should you consider using them?” for more practical details.

5. Create real-world scenarios using scenario-based assessments

Scenario-based assessments, sometimes called case-based assessments, are very good at bringing test-takers closer to a real-world working environment. They usually present test-takers with a case or scenario and ask the test-taker to apply their knowledge and skills using the information at hand to answer a series of questions.

For example, when measuring an oil and gas worker’s ability to follow OSHA regulations correctly during a pressure drop situation, your scenario might include real-world documents such as maintenance records and pressure reading graphs. Or, for a nursing student, you might present a scenario of a patient who has arrived at a hospital with a set of symptoms and, using material like medical history notes and ECG readings, ask the student to diagnose and recommend a treatment plan.

Whether it’s creating a financial fraud scenario to measure banking employees’ knowledge of regulatory requirements or testing an engineer’s understanding of a plane’s engine, scenario-based assessments bring real-world stakes to assessment.

One way of creating scenario-based assessments is with tools like Advanced Assessments, which are designed to help you test and validate almost any skill, at any level, in any way.

6. Consider deterrence within your test security efforts

If people cheat at workplace tests, the consequences can be severe. Someone might be permitted to do an important job role that they cannot do well and mis-sell a product or service, cause a compliance issue, or cause injury or damage.

The 2022 ITC/ATP Technology-Based Assessment Guidelines are the gold standard in how to run technology-based assessments, and they suggest dividing test security measures into three categories:

Prevention: Measures to protect against people being able to cheat.
Deterrence: Ways to persuade someone that cheating is wrong or not worth the effort.
Detect/Respond: Ways of finding that test fraud is happening and responding to it.

A lot of effort is usually put into preventing, detecting, and responding to test fraud. However, it can also be extremely valuable and relatively inexpensive to put effort into deterrence.

Ways of deterring people from cheating at workplace tests include:

Providing routes for learning the material so that people don’t feel that they need to cheat.
Making it clear in policies or agreements that violating test rules is a serious disciplinary matter.
Providing examples of people caught cheating and the sanctions they received as a means to deter others.

A small effort on deterrence can often add as much value as a larger effort in other areas. See this blog article from our partner Caveon with some useful guidance on deterrence in test security.

7. Conduct item analysis to improve question quality

Item analysis is a way of conducting a simple statistical analysis of test results to identify weak questions. It can identify questions that are too easy, too hard, potentially ambiguous, or potentially miskeyed, and also those that might be irrelevant to the test objectives.

For example, in an item analysis report chart, most questions might be color-coded green to indicate that they are fine, but some might be flagged as amber or red to indicate they require a more detailed review.

Item analysis can be done by non-psychometricians; it is relatively accessible and not that time-consuming. If you don’t make use of item analysis, your test results become less useful, because you will be including questions that don’t really help in determining whether someone is competent.
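To give a feel for what such an analysis involves, here is a minimal sketch that computes two common statistics for each question: difficulty (the proportion of people answering correctly) and discrimination (here, the correlation between getting the question right and the score on the other questions). The thresholds used for the green, amber, and red flags are illustrative assumptions only, as are the function names and the sample data. Real reports are usually based on far more than six test-takers.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either list has no variation."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    covariance = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return covariance / (sx * sy)


def item_analysis(responses):
    """responses[p][i] is 1 if participant p answered item i correctly, else 0."""
    totals = [sum(row) for row in responses]
    report = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        difficulty = mean(item)  # proportion answering correctly
        # Correlate the item with the score on the other items so that the
        # item does not inflate its own discrimination.
        rest = [total - score for total, score in zip(totals, item)]
        discrimination = pearson(item, rest)
        # Illustrative thresholds only; choose your own review criteria.
        if discrimination < 0:
            flag = "red"    # stronger candidates do worse: possible miskey
        elif difficulty > 0.9 or difficulty < 0.2 or discrimination < 0.2:
            flag = "amber"  # too easy, too hard, or weakly discriminating
        else:
            flag = "green"
        report.append({"item": i + 1, "difficulty": difficulty,
                       "discrimination": discrimination, "flag": flag})
    return report


# Hypothetical results: six participants (rows), five questions (columns).
example_responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
for row in item_analysis(example_responses):
    print(f"Q{row['item']}: difficulty {row['difficulty']:.2f}, "
          f"discrimination {row['discrimination']:.2f}, {row['flag']}")
```

A flag is only a prompt for a person to look at the question; the statistics cannot say on their own whether a question is miskeyed, ambiguous, or simply covering material that was not taught.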

See our recorded webinar “Item Analysis for Beginners” for an accessible introduction on how to do it.

8. Take action from failures and weak points

Last but not least, learn from weaknesses. If someone fails a test, then you almost certainly will have a procedure to address the issue, usually with some training or other intervention to help the person. It’s vital to do this both for business and regulatory compliance purposes.

[Figure: Reporting on individuals in the assessment platform]

But do you also check which questions or topics people often do badly in? Most assessment systems will allow you to see reports of results by topic and divide these up by job role or department. If you can identify an area that people in a particular location or sector are weak in, you have another opportunity for a potential intervention.

In doing so, you can potentially head off regulatory failures or health and safety issues before they happen. Tests given to the workforce are one of the few ways of centrally determining the skills and capabilities of your employees, and it’s worth using the data they provide to improve your organization.
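If your assessment system lets you export answer-level results, a breakdown by topic and department can be sketched in a few lines. The example below assumes an export where each row records the department, the topic, and whether the answer was correct. The field names, the 60% threshold, and the function weak_areas are hypothetical and would need adapting to whatever your platform actually provides.

```python
from collections import defaultdict


def weak_areas(results, threshold=0.6):
    """Return (department, topic, proportion_correct) rows below the threshold.

    Each result is a dict with hypothetical keys "department", "topic" and
    "correct" (True/False for one answered question).
    """
    tally = defaultdict(lambda: [0, 0])  # (department, topic) -> [correct, answered]
    for r in results:
        cell = tally[(r["department"], r["topic"])]
        cell[0] += int(r["correct"])
        cell[1] += 1
    weak = [(dept, topic, correct / answered)
            for (dept, topic), (correct, answered) in tally.items()
            if correct / answered < threshold]
    return sorted(weak, key=lambda row: row[2])  # weakest first


# Hypothetical export: one row per answered question.
example_results = [
    {"department": "Field service", "topic": "Lockout procedures", "correct": True},
    {"department": "Field service", "topic": "Lockout procedures", "correct": False},
    {"department": "Field service", "topic": "Lockout procedures", "correct": False},
    {"department": "Field service", "topic": "Incident reporting", "correct": True},
    {"department": "Field service", "topic": "Incident reporting", "correct": True},
    {"department": "Plant operations", "topic": "Lockout procedures", "correct": True},
    {"department": "Plant operations", "topic": "Lockout procedures", "correct": True},
    {"department": "Plant operations", "topic": "Incident reporting", "correct": False},
    {"department": "Plant operations", "topic": "Incident reporting", "correct": True},
]
for dept, topic, rate in weak_areas(example_results):
    print(f"{dept} / {topic}: {rate:.0%} correct")
```

With small groups a low percentage may just be noise, so treat the output as a list of areas worth a closer look before planning an intervention.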

How is AI going to change the picture in the short term?

Let’s move on to consider how AI may change workplace testing, particularly high-consequence testing.

In the short term, there are probably two developments that are of most interest.

The first of these is the real possibility that test-takers will use AI to help them answer questions. AI companies have marketed their tools by claiming that AI can get a reasonable score on public exams, and there is a genuine risk that test-takers could also use AI to help answer workplace tests. This is a test security threat that needs to be managed, much like other threats such as people using others to help them pass exams or sharing test questions. Measures like supervising or proctoring exams, encouraging people to take tests honestly, and using observational assessments can help.

The second development is that AI is increasingly being used to help write questions. Writing good questions is very time-consuming, and using AI as an assistant can reduce the time taken considerably. AI support can handle the heavy lifting of content creation, letting authors redirect their brainpower to high-level design and question review. You can select a question type and area of interest, and the AI can quickly generate a question, usually of high quality, along with learner feedback. It’s important to review the question, as AI can make mistakes, but AI has huge potential to save time authoring tests. See Learnosity’s Author Aide for an example of how one such system works.

Where will AI take us in the longer term?

We don’t yet know where AI will take us in the longer term, but we do know that AI will only improve from where it is in 2024, and the pace of change is likely to be rapid.

AI will likely be able to reliably rate human performance in practical tasks, so it may be possible for AI to take the place of a human observer in assessments of practical workplace skills. This will make it much easier to deploy practical assessments where people perform genuine workplace tasks as part of the assessment.

AI may also be able to score people while they are doing tasks, performing a so-called “stealth assessment”. This is where AI monitors and measures what is happening in the background and flags issues or suggests areas for improvement.

It’s also likely that many workforce tasks will be performed by people with AI assistance, and that people working together with AI will be more effective than people without AI. As such, the nature of what we assess is likely to change, and future assessments will need to judge how well people can achieve tasks with AI.

The future has lots of potential for AI to improve workplace tests and reduce the consequences of people making mistakes. In the meantime, we hope that the suggestions in this article will help you improve workplace tests in the present.
