Assessment at an inflection point: What 2026 means for hiring and development

For the past decade, assessment has been judged on a familiar set of questions. Is it valid? Is it fair? Is it scalable? Is it defensible? Those questions still matter. What has changed is the context in which organisations are asking them.

Across the British Psychological Society’s Division of Occupational Psychology annual conference in Cardiff, the mood was pragmatic rather than ideological. The profession is not arguing about whether assessment will change; it is wrestling with how to change it without losing psychological rigour. The pressure to innovate is real, driven by candidate expectations, speed, cost, global scalability, and the uncomfortable new reality of GenAI-enabled cheating. At the same time, scrutiny is tightening around fairness, adverse impact, explainability, and legal defensibility. Assessment teams are being asked to move faster while also being more accountable.

Three interlocking themes stood out. AI is moving closer to judgement, neurodiversity is becoming an assessment design requirement rather than a bolt-on adjustment, and validity is being treated more realistically, as evidence that has to survive organisational reality, not just academic scrutiny.

AI in assessment: from automation to active rater

There will come a time when we need to accept that AI will sit somewhere in the assessment system. The question is where. The conference message was clear: the industry is moving beyond using AI for administration and towards using it as an assessment actor.

The shift is conceptual. Traditional digital assessment assumes a fixed scoring logic where items map to constructs and reliability is about instrument consistency. Large language models do not behave like that. They operate more like raters, reading evidence and making judgements. That is a different kind of measurement problem, with its own risks and failure modes.

The most useful discussions were not about abstract claims that “AI is biased”, but about where alignment between human and AI judgement holds up and where it breaks down. Behaviourally explicit evidence tends to produce stronger agreement; nuance, ambiguity, and inference produce weaker agreement. This mirrors what we already know about human behavioural ratings: raters diverge most when rubric boundaries are fuzzy. The unsettling implication is that we sometimes hold AI to standards we do not consistently apply to human assessors.
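To make that comparison concrete, here is a minimal sketch of how a team might quantify human-AI agreement, split by how behaviourally explicit the underlying evidence was. The candidates, the 1-to-5 scale, and the grouping are purely illustrative assumptions.

```python
# Illustrative only: comparing a human assessor and an AI rater on the same
# candidates, grouped by how behaviourally explicit the evidence was.
# Data, scale (1-5 ratings) and grouping are hypothetical.
from sklearn.metrics import cohen_kappa_score

ratings = [
    # (evidence_type, human_rating, ai_rating)
    ("explicit", 4, 4), ("explicit", 2, 2), ("explicit", 5, 4), ("explicit", 3, 3),
    ("ambiguous", 4, 2), ("ambiguous", 3, 5), ("ambiguous", 2, 3), ("ambiguous", 5, 3),
]

for evidence_type in ("explicit", "ambiguous"):
    human = [h for e, h, _ in ratings if e == evidence_type]
    ai = [a for e, _, a in ratings if e == evidence_type]
    # Quadratically weighted kappa penalises large disagreements more heavily,
    # which suits ordinal behavioural rating scales.
    kappa = cohen_kappa_score(human, ai, weights="quadratic")
    print(f"{evidence_type:9s} human-AI agreement (weighted kappa): {kappa:.2f}")
```

A weighted kappa is only one option; the broader point is to examine agreement within evidence types rather than relying on a single overall figure.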

For organisations, the message is neither to rush into AI scoring nor to rule it out on principle, but to govern it properly. Be explicit about what the model is allowed to judge, which constructs it rates, and what “good evidence” looks like. Recognise the limits of transcript-only evidence in roles where presence and interpersonal impact matter. Design oversight that genuinely reduces risk rather than offering a vague “human in the loop” reassurance.

This direction of travel is now being reinforced by regulation. Under the EU AI Act, AI used in employment decision making is classified as high risk, requiring clear accountability, human oversight, bias monitoring, and formal technical documentation. In practice, this means AI in assessment can no longer sit as an informal layer on top of existing processes; it has to operate within the same validation and governance framework as any other selection method.
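What that framework looks like in practice will vary, but the record-keeping element can be straightforward. The sketch below assumes a team keeps a structured record for each AI rater it deploys; the field names are illustrative, not a compliance template for the Act.

```python
# A minimal sketch of the kind of technical record an assessment team might
# keep for an AI rater operating under a governance framework. Field names
# are illustrative assumptions, not a legal compliance template.
from dataclasses import dataclass
from datetime import date

@dataclass
class AIRaterRecord:
    model_name: str
    model_version: str              # pin the version used for each cohort
    constructs_rated: list[str]     # what the model is allowed to judge
    evidence_sources: list[str]     # e.g. written exercise, interview transcript
    human_oversight_owner: str      # who is accountable for the standard
    bias_monitoring_cadence: str    # how often adverse impact is reviewed
    last_validation_date: date

record = AIRaterRecord(
    model_name="example-llm",
    model_version="2026-01",
    constructs_rated=["structured problem solving", "collaboration"],
    evidence_sources=["situational judgement responses"],
    human_oversight_owner="Head of Assessment",
    bias_monitoring_cadence="per hiring cohort",
    last_validation_date=date(2026, 1, 15),
)
```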

Three risks deserve attention

  1. Automation bias, which can lead assessors to defer to confident system outputs.
  2. Model drift and versioning, which can quietly undermine consistency across cohorts.
  3. Blurred accountability, when ownership of the standard is unclear.

Treated well, AI may act as a second rater that improves consistency or flags weak indicators; treated badly, it becomes an unaccountable decision maker.
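Drift in particular lends itself to simple, repeatable checks. The sketch below compares AI rating distributions across two hypothetical cohorts scored under the same rubric; the data and the alert threshold are assumptions, not standards.

```python
# Illustrative drift check: comparing AI rater score distributions across two
# hiring cohorts scored under the same rubric. Data and threshold are
# hypothetical; in practice you would also track model version alongside scores.
from statistics import mean, stdev

cohort_q1 = [3.2, 3.8, 2.9, 4.1, 3.5, 3.0, 3.7]   # AI ratings, earlier cohort
cohort_q2 = [3.9, 4.3, 4.0, 4.5, 3.8, 4.2, 4.1]   # AI ratings, later cohort

shift = mean(cohort_q2) - mean(cohort_q1)
# A simple standardised shift against the earlier cohort's spread
standardised_shift = shift / stdev(cohort_q1)

if abs(standardised_shift) > 0.5:          # hypothetical alert threshold
    print(f"Possible drift: mean shifted by {shift:+.2f} "
          f"({standardised_shift:+.2f} SD) between cohorts; review before use.")
else:
    print("No material shift detected between cohorts.")
```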

Neurodiversity: from adjustment to universal design

A second thread was the maturation of how neurodiversity is approached in assessment. The older frame focused on adjustments after disclosure. The modern frame is better: neuroinclusion is a design challenge that begins before anyone applies.

Many familiar formats contain hidden barriers. Ambiguity, heavy context-guessing, cognitive overload, and complex instructions do not only harm the candidate experience; they also distort measurement. If an assessment is taxing the wrong things, you have a validity problem as well as an inclusion problem.

The conference highlighted the psychological impact of assessment tools on neurodivergent people, particularly in personality and development contexts. When language is vague or judgemental, candidates may mask, feel threatened, or try to infer the “right” frame of reference. In that moment the tool is measuring trust and interpretation burden as much as the intended construct, reducing stability and comparability.

The direction of travel is towards universal design. Clearer scenarios, better scaffolding, explicit behavioural anchors, and lower ambiguity reduce cognitive noise for neurodivergent candidates and also reduce vulnerability to superficial faking. Inclusion and integrity can move together.

There is a subtler implication for assessment strategists. Neurodiversity is not simply about testing differently; it raises interpretation and fairness questions. Differences in score patterns between groups should prompt cautious, job-relevant validation rather than assumptions of deficit or advantage. Neuroinclusion becomes part of psychometric interpretation and design, not a separate DEI workstream.

Validity is widening, and getting more honest

The third theme will resonate with anyone involved in assessment, even if they do not use the term “validity”. Tools rarely win on predictive coefficients alone. They win on a blend of scientific credibility, candidate acceptance, operational feasibility, and internal defensibility.

The conference reflected this reality. Criterion-related validity still matters, but it is not the only lens. Evidence increasingly includes candidate experience, fairness and consequences, transparency, and practicality. Organisations often make a category error by treating validity as a single number and then wondering why stakeholders remain unconvinced. Stakeholders care about whether the process can be defended, aligns with the job, feels fair, and can be implemented without breaking the wider system.

A strong undercurrent concerned criterion design. When performance criteria are vague or overly aggregated, relationships with predictors will be inconsistent. When criteria are behaviourally specific and job-relevant, prediction becomes clearer. The implication is not that personality is a silver bullet, but that prediction depends on how success is defined.
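A toy illustration makes the point. The sketch below correlates the same hypothetical predictor scores against a coarse overall rating and against a behaviourally specific criterion; the numbers are invented, but the contrast is the one described above.

```python
# Illustrative only: the same predictor correlated against two criteria for the
# same (hypothetical) hires -- a coarse aggregated "overall performance" rating
# and a behaviourally specific criterion drawn from the job analysis.
from statistics import correlation   # Python 3.10+

predictor             = [52, 61, 47, 70, 58, 66, 43, 74]  # assessment scores
overall_rating        = [3, 4, 4, 3, 3, 5, 3, 4]          # coarse manager rating
behavioural_criterion = [48, 63, 45, 72, 55, 69, 40, 71]  # rated task behaviours

print("r with aggregated rating:     ",
      round(correlation(predictor, overall_rating), 2))
print("r with behavioural criterion: ",
      round(correlation(predictor, behavioural_criterion), 2))
```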

This connects to shifts in leadership assessment. Traditional predictors show modest relationships with outcomes, while context-specific capabilities like adaptability, emotional and cultural effectiveness, and digital leadership are gaining ground. The direction is away from static trait narratives and towards dynamic, competency-led models that take context seriously.

What this means in practice

The challenge is not to chase novelty but to build an assessment system that can evolve without becoming incoherent.

Five principles to follow:

  1. Treat assessment like an operating system, not a set of products. Integration failures are the real risk. Start with clear constructs, criteria, methods, and governance.
  2. Design criteria before tools. Without behaviourally defined success you cannot evaluate prediction or interpret results responsibly.
  3. If you use AI, govern it properly. Define what it rates, validate in context, monitor over time, and be explicit about accountability.
  4. Build neuroinclusion into design. Reduce unnecessary load, remove avoidable ambiguity, and make expectations explicit.
  5. Build a multi-lens validity case. Predictive evidence matters, but so do fairness, transparency, experience, and feasibility.

The larger point is optimistic. The conference did not signal a profession abandoning rigour. It signalled a profession learning to govern change responsibly. Organisations that thrive will be those that embrace innovation without outsourcing judgement, and that build assessment systems that are both scientifically credible and operationally sound.
