AI is very good at many parts of constructing, scoring, and reporting on assessments. It can create questions from your specific course material and select appropriate questions to build a valid test. It can score test answers, and tools like those offered by Learnosity are often as accurate as a human grader.

While AI can significantly help SMEs in many areas of the assessment lifecycle, it is still susceptible to mistakes. It can be wrong while sounding very confident and is prone to bias. So, when AI-based assessment is used, it’s critical to have human oversight of AI. Stakeholders need to be able to trust test scores when making decisions about people, including hiring and firing, assigning positions of responsibility, and guiding educational outcomes.
Why human oversight is essential for ethical and trustworthy AI-based assessment
- AI lacks ethical judgment. Unlike humans, AI cannot apply moral reasoning or interpret assessment outcomes in light of societal values, fairness, or real-world consequences.
- Human accountability is often legally and professionally required. AI systems are not legally responsible for assessment outcomes. Organizations and their staff remain accountable for certifying results and addressing errors.
- Human review helps manage risk. Oversight allows humans to identify errors, bias, or inappropriate conclusions before AI-generated results are finalized or acted upon.
- Ethical context must be applied deliberately. Human reviewers provide the contextual understanding needed to ensure assessment decisions align with organizational standards, legal requirements, and social expectations.
- Trust depends on visible human involvement. Stakeholders are far more likely to trust AI-based assessments when qualified humans actively review, validate, and approve AI outputs.
The key point to remember with AI-based assessments is that AI has no moral compass and, like humans, can be biased and make mistakes. The strength of adding human oversight to AI-based assessment is that the combination of the two makes it far less likely that those mistakes and biases reach the final result.
Effective AI oversight requires a deliberate strategy for recruiting, training, and empowering human overseers. This article offers actionable steps for doing just that.
Challenges of human oversight in AI-based assessment

One difficulty in overseeing AI is automation bias: the tendency to over-rely on AI or other technology. People can trust a technology system even when their own experience, judgment, or other evidence contradicts what it says. Newspaper stories regularly describe drivers who follow automated route directions onto unpaved roads or to the water's edge, trusting the technology so much that they almost literally drive into the sea. In assessment, automation bias is usually more subtle, such as accepting a borderline AI score without review. It's therefore important that human overseers are trained to resist it.
Beyond individual psychological tendencies, organizations must also decide how to structure human involvement in AI systems. Oversight has to be structured so that a human can genuinely evaluate the AI's work; AI analyzes and produces a huge amount of information, which can make effective review and oversight difficult. The most common approach is Human In The Loop (HITL), where a human reviewer is involved in every decision: for example, reviewing every question the AI creates or every score it assigns.
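To make the HITL pattern concrete, here is a minimal sketch in Python. Everything in it (the `DraftItem` structure, the `hitl_publish` gate, the confidence field) is a hypothetical illustration, not Questionmark's or Learnosity's actual implementation; it simply shows the shape of a workflow in which no AI-generated question is published without an explicit human decision.

```python
from dataclasses import dataclass

@dataclass
class DraftItem:
    """An AI-generated question awaiting human review (hypothetical structure)."""
    text: str
    ai_confidence: float  # model-reported confidence, 0.0-1.0 (assumed field)

def hitl_publish(drafts, human_review):
    """Human-in-the-loop gate: every draft passes through a human reviewer.

    `human_review` returns the (possibly edited) question text to publish,
    or None to reject the draft. Nothing reaches the item bank without an
    explicit human decision.
    """
    published = []
    for draft in drafts:
        decision = human_review(draft)  # human accepts, edits, or rejects
        if decision is not None:
            published.append(decision)
    return published

# Example reviewer policy: never auto-accept, even when the AI is confident,
# which counters automation bias on borderline items.
def cautious_reviewer(draft):
    print(f"Review needed (AI confidence {draft.ai_confidence:.2f}): {draft.text}")
    return draft.text  # in a real tool the reviewer edits, approves, or rejects here

drafts = [DraftItem("Which of the following is a prime number?", 0.91)]
print(hitl_publish(drafts, cautious_reviewer))
```

The design point is that the human decision is the gate, not an optional annotation: the pipeline cannot produce output the reviewer hasn't acted on.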
Of course, human oversight is not effective just because you have a human in the loop. If reviewers lack the right knowledge and experience, the right training, or enough time, space, and context to do a proper review, their oversight will be ineffective. They must also be motivated and bring the attitude and vigilance that proper oversight demands.
Here is some guidance on selecting and managing people to act as effective human reviewers for AI in assessment.
How to select effective human overseers for AI-backed assessments
An excellent ACM research paper suggests several factors that help people act as effective AI overseers:
- Domain expertise. Overseers need expertise to be able to identify when the AI is wrong and to adjust it appropriately.
- Conscientiousness. This is a character trait of people who are self-disciplined and orderly, and is likely to help with effective oversight.
- Training. Training overseers, including on how to deal with automation bias and how to spot hallucinations, will make overseers more capable.
- Motivation. Motivated overseers are more likely to be vigilant and work hard on the subtle work of reviewing and improving AI suggestions.
- Avoid exhaustion. Tired people will be less effective overseers. Have enough overseers and schedule them effectively to encourage alertness.
In the assessment context, diversity among overseers is also crucial to ensure there is sufficient context for identifying cultural bias or assumptions that AI may reflect.
Why time pressures undermine AI oversight
The ACM paper also stresses that overseers shouldn't be under too much time pressure: people who feel rushed make less effective overseers.
Another article from consulting giant BCG puts this very well.
“GenAI is often employed to drive efficiency. … But thoroughly evaluating GenAI output takes time – in many cases more than the system’s designers envisioned when they set efficiency targets. Managers are often held to these targets, creating pressure on teams, intended or not, to keep the efficiencies coming. Concerned about the negative repercussions of slowing things down, people are likely to perform only cursory reviews of system outputs.”
Boston Consulting Group
Cursory review means bad review. It undermines the entire purpose of human oversight, providing the appearance of accountability without the substance. Given the impact of assessment outcomes on test takers' lives and on society as a whole, it's essential to give overseers the time to review properly.
Put a human oversight process with guidelines in place for AI-based assessments
As the BCG article goes on to suggest, “Guidelines are better than vibes”.
It's helpful to give overseers detailed guidance, with examples of edge cases and what to do about them, as well as how and where to escalate when they need to.
For example, if overseers are reviewing AI scoring, provide plenty of examples of scores at different levels, together with instructions on what to do if they disagree with the AI. Overseers work best when they clearly understand what is expected of them and have clear guidelines on what to do in each situation.
Just as with managing human graders, it's also important to review the performance of overseers: calibrate and compare their decisions against those of their peers and against what you expect, and intervene with additional guidance or training if some overseers are less effective than others.
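As a rough sketch of what such calibration could look like, the following Python compares each overseer's accept/reject decisions on a shared set of calibration items against reference decisions, and flags anyone whose agreement rate falls below a threshold. The function name, the sample data, and the 85% threshold are all hypothetical assumptions for illustration, not part of any product.

```python
def calibrate_overseers(decisions, expected, threshold=0.85):
    """Compare each overseer's accept/reject calls to reference decisions.

    `decisions` maps overseer name -> list of booleans (their calls on a
    shared set of calibration items); `expected` holds the reference calls
    for those same items. Returns the overseers needing follow-up.
    """
    needs_followup = []
    for overseer, calls in decisions.items():
        matches = sum(c == e for c, e in zip(calls, expected))
        agreement = matches / len(expected)
        print(f"{overseer}: {agreement:.0%} agreement with reference decisions")
        if agreement < threshold:
            needs_followup.append(overseer)
    return needs_followup

# Example run on two overseers reviewing the same five calibration items.
expected = [True, True, False, True, False]
decisions = {
    "Reviewer A": [True, True, False, True, False],  # matches the reference
    "Reviewer B": [True, True, True, True, True],    # rubber-stamps everything
}
print(calibrate_overseers(decisions, expected))
```

An overseer who agrees with everything the AI proposes, like "Reviewer B" above, shows up immediately as a candidate for retraining on automation bias.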
Choose AI tools from responsible assessment vendors
Much of the responsibility for AI that genuinely supports SMEs in assessment development lies with the vendor and the software itself. With that in mind, here are some of the key issues to consider:
- Human overseers need the right information and context. Clear, interpretable data and guidance help overseers make informed decisions.
- User interface matters. The interface should be understandable, allow drilling into details, and highlight areas of AI uncertainty where practical.
- Vendor responsibility is key. Well-designed AI tools require less human intervention. Purpose-built assessment AI with built-in guardrails reduces the burden on overseers.
- Design matters. Choose AI designed specifically for assessment, reflecting what “good” looks like in the assessment context.
- Oversight is not a checkbox. Effective human oversight requires planning, resources, and ongoing attention—not just token review.
- Combine qualified people with structured processes. Selecting trained overseers, giving them clear guidelines and adequate time, and embedding oversight from the start ensures AI efficiency while maintaining fair, valid, and trustworthy results.
To summarize, human oversight of AI in assessment is not a checkbox exercise. It requires thoughtful planning, appropriate resources, and ongoing attention. By using software from a responsible provider, selecting qualified overseers, and providing them with clear guidelines and adequate time, you can harness AI’s efficiency. By building oversight into your processes from the start, you maintain the human judgment essential for fair, valid, and trustworthy assessment outcomes.
Why Questionmark
Learnosity is a global leader in assessment solutions, delivering digital, dynamic experiences tailored to modern learning habits and lifestyles.
Questionmark is Learnosity's out-of-the-box product suite of assessment solutions built exclusively for certification and workforce needs. Our secure, scalable, and modern solutions, designed for skill-based and highly regulated industries, enable organizations to manage the entire assessment workflow, from question creation through delivery to reporting and analytics. And because we're powered by Learnosity, we offer a roadmap of AI product innovation to ensure our customers remain future-ready:
- Author Aide: An AI authoring tool designed to help SMEs create questions (including in bulk and from specific course or training material) 10X faster. Humans are central to the process and can review and edit questions as necessary.
- AI Scoring: An AI scoring tool designed to save SME time with grading and scoring, as well as delivering learner feedback faster than before for more efficient improvement. Humans can review scores and feedback to sense-check outputs.
- Item Bank Health Check (coming soon): This AI item bank health tool allows SMEs to review item banks at scale with AI and check whether human- or AI-authored questions contain bias or mistakes. The AI will then suggest improvements.
If you're interested in more information about the need for humans to be in the loop of AI-powered assessments, you can download the recent Association of Test Publishers whitepaper on the subject, which was supported by Learnosity.
For more information on how Questionmark powers AI-backed assessments, talk to us.