Validity through Accessibility
Kelly Webb-Davies · Lead Education AI Consultant · University of Oxford AI Competency Centre
Written assessments have long used academic language as a proxy for thinking. But what happens when that proxy breaks?
We want to assess thinking — the ability to understand, reason, argue, evaluate, and communicate. But we can't view thinking directly. We must infer it from language, and we've historically required that language to be "prestige academic English."
Generative AI has broken the link between thinking and writing. When a machine can produce polished text, a submitted essay no longer provides dependable evidence that the student did the thinking behind it. This is fundamentally a language problem, not a cheating problem.
AI detection tools are unreliable and discriminatory — they disproportionately flag non-native speakers. Honour declarations can't verify actual authorship. The "messy middle" of permitted-vs-unpermitted AI use is impossible to police. We need a structural solution, not a surveillance one.
VFWA separates ideation (thinking) from expression (writing) through a two-stage method for written assessments. Together, the two stages create a closed loop of verifiable evidence.
Stage 1: Students demonstrate their understanding in a secure, observed environment using an unseen prompt, in whatever language or modality works best for them.
Stage 2: Students refine their Stage 1 ideas into a formal written product at home, with unrestricted access to AI tools. The focus shifts to how they transform their ideas and justify their choices.
Stage 1 happens in a supervised, secure environment — like an exam hall or observed classroom session. Students respond to a previously unseen prompt that ensures they are thinking in real time, not reproducing pre-prepared AI output.
Crucially, this stage is ungraded. By removing marks from the initial expression, the goal is for students to feel safe to be creative, imperfect, and genuine — creating what researchers call epistemic safety.
Students complete Stage 1 under supervision — in a classroom or exam hall — ensuring the work is authentically theirs.
The stimulus is revealed only at the start of the session. This prevents pre-prepared generative AI output and ensures real-time reasoning.
Students choose how to express their ideas — handwritten notes, argument maps, voice recordings, or any combination. They use whichever language or modality best captures their thinking.
Stage 1 is not marked. This removes the pressure to perform in "perfect" academic English and encourages authentic, unfiltered thinking.
We don't assess the language — we assess the thinking. Stage 1 gives us a window into what the student truly knows, before any tool has the chance to polish or replace it, allowing them to express their thinking in their genuine authorial voice.
In Stage 2, students take their Stage 1 "anchor" home and develop it into a formal written product — an essay, report, memo, or any discipline-appropriate artefact.
AI tools are permitted and unrestricted. Students can use generative AI for language refinement, structural editing, accessibility support, and cognitive scaffolding. What matters is their evaluative judgement: how they select, adapt, and justify the transformations they make.
Students complete this stage in their own time and space, with access to all their usual tools, resources, and support networks.
Rather than banning AI tools, VFWA legitimises their use for refinement — just as we legitimise spellcheckers, grammar tools, and translation software.
The grade focuses on how the student transforms their raw ideas into a polished product — the decisions they make and the sources they integrate.
The final submission must be traceable back to Stage 1. This evidentiary link is what makes the whole model auditable.
The question is no longer "Did the student write this?" — it becomes "Does the student own the thinking behind this, and can they justify the choices they made?"
A conditional viva is not punitive. It is activated only when the evidentiary link between the Stage 1 anchor and the Stage 2 submission is weak or unclear.
The viva gives the student a chance to verbally demonstrate their understanding and explain the transformations they made — it is an opportunity, not an accusation.
💡 The viva replaces unreliable AI-detection tools with a human, dialogic process. It centres understanding rather than surveillance.
Every feature of VFWA exists for a reason. This table maps each design choice to its underlying rationale — so you can explain the why to colleagues and students.
| Feature | Stage | What It Does | Why It Matters |
|---|---|---|---|
| Unseen Anchor | 1 | Stimulus is revealed only at session start | Prevents pre-prepared AI output; ensures real-time reasoning |
| Observed Setting | 1 | Supervised environment | Establishes authorship without AI detection |
| Ungraded | 1 | Stage 1 carries no marks | Creates epistemic safety — students think without fear of linguistic penalty |
| Choice of Modality | 1 | Students may speak, write, draw, or mix | Removes language barriers; embraces neurodivergent and multilingual expression |
| AI Unrestricted | 2 | All digital tools permitted | Eliminates policing; reflects professional practice; supports accessibility |
| Evaluative Judgement | 2 | Assessment focuses on transformation decisions | Shifts from product reproduction to critical thinking |
| Evidentiary Link | 1 → 2 | Stage 2 traceable to Stage 1 | Creates a "closed loop" of verifiable evidence |
| Conditional Viva | Safeguard | Oral dialogue triggered by weak link | Human, non-punitive verification; replaces AI detection |
The model is adaptable to any discipline. Here are three examples showing how Stage 1 and Stage 2 connect in practice.
Law
Stage 1: Students receive an unseen fact scenario and write a handwritten analysis identifying key legal issues, relevant rights, and initial arguments.
Stage 2: Students expand their analysis into a 1,200-word legal memorandum, using AI to refine legal language and check citations, while maintaining the arguments anchored in Stage 1.
Science
Stage 1: Students receive an unseen dataset and record their initial interpretation: what the data shows, potential explanations, and any anomalies.
Stage 2: Students develop a formal results-and-discussion section, integrating literature and refining scientific English, but the interpretation must stem from Stage 1.
Literature
Stage 1: Students read an unseen passage and produce an initial close reading, annotating literary techniques, themes, and their personal response.
Stage 2: Students craft a formal critical essay building on their Stage 1 observations, using AI for language polishing and secondary-source integration.
VFWA reframes accessibility as a condition of validity. When language or writing barriers prevent students from demonstrating what they truly know, the assessment itself becomes invalid — not the student.
For many neurodivergent students — including those with ADHD and dyslexia — speaking keeps up with the speed of thought, while writing slows it down. VFWA lets students choose the modality that captures their thinking most faithfully.
Students who think in multiple languages shouldn't be penalised for "wrong" English. Stage 1 invites translanguaging — using all linguistic resources — so that great ideas aren't lost behind a language barrier.
By legitimising AI tools in Stage 2, VFWA provides built-in cognitive support (spell-checking, grammar assistance, structural scaffolding) that traditionally required expensive accommodations. Students can translate their Stage 1 thinking from their own language or mode into academic language, supported by AI.