
The working teacher's guide to IB English Paper 1 marking criteria 2026

By Steven Swanson, Founder of ClassLens

If you teach IB English A and you have been bracing for a Paper 1 rubric overhaul this May, you can put the brace down. Paper 1 did not change. Four criteria, five marks each, twenty marks per response. Same rubric at SL and at HL. Same rubric for Language A: Literature and for Language A: Language and Literature. The change everyone is talking about is the Paper 2 change. The two papers are running on different cycles, and Paper 1's marking criteria have been carried forward unchanged from the 2021 first-assessment guides into the May 2026 cycle.

That should be reassuring, and it mostly is. The complication is that almost every staffroom conversation about "the new IB English criteria" right now is about Paper 2. Three or four teachers in a department will read the Paper 2 summary, half-internalize it, and start applying the tighter Paper 2 logic to Paper 1 mocks, and the scores drift in ways that nobody can explain at the moderation meeting. The most common Paper 1 marking error I am hearing about this year is not a misreading of the descriptors. It is a quiet contamination from the Paper 2 change.

If you are teaching at a school in one of the affected Middle East countries (Bahrain, Iran, Iraq, Israel, Jordan, Kuwait, Lebanon, Oman, Palestine, Qatar, Saudi Arabia, or the UAE), your May 2026 looks different from the rest of the cohort. The IB announced on 30 March 2026 that the May 2026 examination session was cancelled in the UAE, and subsequently in Bahrain. Across the broader affected region, the Non-Exam Contingency Measure (NECM) and other flexibilities (transfer, defer, or withdraw with refund) are available on a country-by-country basis, as national authorities decide. Where NECM applies, your students' grades are being determined from externally assessed coursework and teacher-predicted grades rather than the timed paper, and your marking workload has shifted accordingly. The Paper 1 criteria still apply to the work you are reading; the timing and stakes are simply different. The rest of this post applies regardless of which path your school is on this cycle.

This post is the criterion-by-criterion grading guide I would want in hand before starting a Paper 1 mocks pile. It is not a vendor pitch. One section near the end describes ClassLens; one after that lists its limits. Skip both and the rest of the post still earns its keep.

What the rubric actually says

Paper 1 is the guided analysis paper. In Language A: Literature it is called the guided literary analysis. In Language A: Language and Literature it is the guided textual analysis. Different name, identical rubric. The current Subject Guides, first assessment 2021, carry the four criteria forward into May 2026 without revision.

The four criteria, the marks per criterion, and the Band 5 descriptors as printed in the IB Subject Guides (Literature Guide pp. 37-39 and Language and Literature Guide pp. 36-37) are:

Criterion A. Understanding and interpretation. 5 marks.

Band 5: "The response demonstrates a thorough and perceptive understanding of the literal meaning of the text. There is a convincing and insightful interpretation of larger implications and subtleties of the text. References to the text are well chosen and effectively support the candidate's ideas."

Criterion B. Analysis and evaluation. 5 marks.

Band 5: "The response demonstrates an insightful and convincing analysis of textual features and/or authorial choices. There is a very good evaluation of how such features and/or choices shape meaning."

Criterion C. Focus and organization. 5 marks.

Band 5: "The presentation of ideas is effectively organized and coherent. The analysis is well focused."

Criterion D. Language. 5 marks.

Band 5: "Language is very clear, effective, carefully chosen and precise, with a high degree of accuracy in grammar, vocabulary and sentence construction; register and style are effective and appropriate to the task."

Twenty marks per response. Each criterion is scored holistically as a best fit against the band descriptor, not by tallying sub-features inside it. The Subject Guides are explicit on this point and so is the examiner training.

SL versus HL: same rubric, different volume

SL students choose one of two unseen passages and write one response in one hour fifteen minutes. The response is scored against the four criteria above, out of 20 marks total. Weighting is 35 percent of the final grade.

HL students write two responses, one on each of two unseen passages, in two hours fifteen minutes. Both responses are required (no choice between them). Each response is scored independently against the same four criteria, out of 20 marks each. The HL paper total is 40 marks, summed across the two responses. Weighting is 35 percent of the final grade.
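Because both levels carry the same 35 percent weighting but HL doubles the raw marks, each raw HL mark is worth half as much of the final grade as an SL mark. The small Python sketch below does that arithmetic. It treats the weighting as a simple linear scale, which is a simplification: actual IB grades come from session-specific grade boundaries applied to weighted totals, and the sketch does not model those.

```python
# Simplified sketch: the share of the final grade that one raw Paper 1 mark represents.
# Assumes a linear scale from raw marks to weighted percentage points; real IB
# grading applies session-specific grade boundaries, which are not modeled here.

PAPER1_WEIGHT = 35.0  # percent of the final grade, SL and HL alike
MAX_MARKS = {"SL": 20, "HL": 40}  # one response at SL, two responses at HL


def weighted_points(raw_marks: int, level: str) -> float:
    """Convert a raw Paper 1 total into weighted percentage points (linear assumption)."""
    max_marks = MAX_MARKS[level]
    if not 0 <= raw_marks <= max_marks:
        raise ValueError(f"{level} Paper 1 total must be between 0 and {max_marks}")
    return raw_marks / max_marks * PAPER1_WEIGHT


if __name__ == "__main__":
    print("one SL mark:", weighted_points(1, "SL"))
    print("one HL mark:", weighted_points(1, "HL"))
    print("14/20 at SL:", weighted_points(14, "SL"))
    print("14 + 14 at HL:", weighted_points(28, "HL"))
```

On this simplified scale, a one-mark drop on a single HL criterion costs half as much of the final weighting as the same drop at SL. That comparison is useful when you explain HL mock totals to students.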

A note on HL passage choice and order. HL students do not pick between the passages, but they do pick the order in which to write the responses. The defensible move is to read both passages quickly before committing, then write first about the one with the clearer interpretive entry point. The trade-off is that the harder passage is then left for the second response, where the student is writing under fatigue on the response that was already going to be harder, and that combination often costs marks across Criteria C and D. For that reason some teachers coach the opposite (hardest first while the mind is fresh); both approaches have proponents. Whichever you coach, coach the choice explicitly so students aren't deciding it for the first time under exam pressure.

The IB confirms in the Language and Literature Guide, page 41, that "the assessment criteria for this paper are the same at HL and SL." There is no separate HL band descriptor and no HL bonus. The HL student is being asked to do the same quality of work twice, against the same standard.

This matters for how you mark HL papers. Each response is marked independently. A strong response 1 cannot rescue a weak response 2. In practice, the largest single source of lost marks at HL is uneven time management: students who pour ninety minutes into response 1 and write response 2 in forty-five minutes lose a full band on Criteria C and D in the second response even when their analytical work would have been at the same level. If you are noticing that your HL students' second responses are systematically a band below their first, the lesson is time-budgeting, not analysis.
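The time-budgeting lesson comes down to plain arithmetic on the 135-minute HL paper. The minimal sketch below assumes the student sets aside a fixed block at the start to read both passages and then splits the remaining time evenly. The 10-minute reserve is an illustrative coaching assumption, not an IB figure.

```python
# Sketch of an even HL Paper 1 time budget. The up-front reading reserve is a
# hypothetical coaching parameter, not an IB requirement.

HL_PAPER1_MINUTES = 135  # two hours fifteen minutes


def per_response_budget(total: int = HL_PAPER1_MINUTES, responses: int = 2,
                        reading_reserve: int = 10) -> float:
    """Minutes available for each response after the reading reserve is set aside."""
    if reading_reserve >= total:
        raise ValueError("reading reserve must leave time for writing")
    return (total - reading_reserve) / responses


def first_response_share(first: float, second: float) -> float:
    """Share of writing time spent on response 1 (0.5 means perfectly even)."""
    return first / (first + second)


if __name__ == "__main__":
    print("even split, 10-minute read:", per_response_budget())
    print("even split, no reserve:", per_response_budget(reading_reserve=0))
    print("90/45 split, share on response 1:", first_response_share(90, 45))
```

The 90/45 split described above puts two-thirds of the writing time into response 1. An even split with the assumed reserve gives each response about an hour.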

The guiding question is optional, not the scorecard

This is the single most common Paper 1 misunderstanding I hear from teachers new to the IB. Each unseen passage comes with one guiding question, and the IB Subject Guide is explicit that it is "not compulsory" to answer it. The guiding question is there to help the student find an analytical foothold under exam pressure. The four criteria reward the analytical response itself, not whether the student answered the guiding question. A student who ignores the guiding question entirely and writes a strong response that follows a different analytical thread can still earn Band 5 across all four criteria. A student who answers the guiding question diligently but descriptively earns Band 3 at most on Criterion B.

What this means in the marking pass: do not reward "the student answered the guiding question well." Reward the analytical work the student did on the text, against the four criteria. The guiding question is scaffolding, not a scorecard.

What examiners are flagging (and how teachers mis-apply Paper 1)

The IB's full Subject Reports live behind the Programme Resource Centre login, and I will not quote what I cannot link to. What I can do is summarize the consensus that has emerged from teacher-facing commentary on the May 2024 session, including examiner-authored InThinking pages and the wider test-prep secondary literature (LitLearn, Hack IB, Young Scholarz). Three patterns recur often enough that they should reshape how you mark.

Description as analysis (Criterion B inflation). This is the dominant Paper 1 error in teacher-predicted grades. A student writes "the author uses a metaphor of the city as a body, with veins for streets and a heart for the central square." That is a description, not an analysis. Analysis is "the author's metaphor of the city as a body recasts urban infrastructure as biological necessity, naturalizing inequality between the heart and the extremities." Identifying a device is Band 2 work. Evaluating how the device shapes meaning is Band 4 or 5. Teachers reward identification too generously because identification is easier to grade quickly. Examiners do not.

The practical fix is to read Criterion B with a single question in mind: did the student tell me what the feature does, or only what it is? "Is" is descriptive. "Does" is analytical. If the essay only tells you what the features are, the ceiling on B is 3, regardless of how many features the student noticed.

The five-paragraph template at Criterion C. Criterion C (Focus and organization) rewards "effectively organized and coherent" presentation of ideas. It does not reward the mechanical five-paragraph essay structure that produces an introduction, three body paragraphs, and a conclusion regardless of whether the text supports that shape. A response that hammers a poem into five paragraphs when the poem turns on a single volta is structurally fighting the text, not organizing analysis of it. Teachers reward the formula because it is recognizable. Examiners penalize it when the formula obscures the argument.

The practical fix is to ask whether the structure serves the analysis or constrains it. A four-paragraph response that follows the logic of the text earns higher Criterion C than a five-paragraph response that retrofits the text into a template.

Conflating guiding-question compliance with quality. I said this above. I am saying it again because it is the most consistent low-leverage error in teacher-predicted grades. The guiding question is not scored. A response that answers the guiding question well but stays descriptive earns Band 3 at most on Criterion B. A response that ignores the guiding question and produces sharp interpretive work can earn Band 5. Teachers who reward guiding-question compliance are marking the wrong artifact.

Best fit, not bottom up

Each criterion is marked as a best-fit holistic judgment against the band descriptor. The Subject Guides say so, the examiner training says so, and the moderators apply it that way. The implication is that you cannot mark Paper 1 by tallying sub-features and adding them up. You read the response, locate the band descriptor it most closely matches, and award the mark.

For experienced teachers coming from systems with detailed mark-by-feature rubrics (AP Lang, A-Level English, even the older Paper 1 rubric pre-2021), the best-fit instinct takes a few cycles to rebuild. The first sign that you are mismarking is that your scores cluster tightly in the middle (every essay at Band 3) because tallying produces averages. Best-fit marking should produce a wider spread, because the band descriptors are themselves discriminating.
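To test your own stack for this kind of mid-band clustering, a quick tally of how your marks are distributed on each criterion is enough. Below is a minimal Python sketch. The sample marks and the 60 percent "clustered" threshold are hypothetical choices for illustration, not an IB or moderation standard.

```python
# Sketch: check whether your marks on each criterion cluster at Band 3.
# The sample data and the 0.6 threshold are illustrative assumptions.
from collections import Counter
from statistics import pstdev

CRITERIA = ("A", "B", "C", "D")


def band_profile(marks: list[dict[str, int]], criterion: str) -> tuple[float, float]:
    """Return (share of responses at Band 3, population std dev) for one criterion."""
    scores = [m[criterion] for m in marks]
    if not scores:
        raise ValueError("need at least one marked response")
    share_at_3 = Counter(scores)[3] / len(scores)
    return share_at_3, pstdev(scores)


def flag_clustering(marks: list[dict[str, int]], threshold: float = 0.6) -> list[str]:
    """List the criteria where at least `threshold` of responses sit at Band 3."""
    return [c for c in CRITERIA if band_profile(marks, c)[0] >= threshold]


if __name__ == "__main__":
    stack = [  # hypothetical marks for a five-response stack
        {"A": 3, "B": 3, "C": 3, "D": 4},
        {"A": 4, "B": 3, "C": 3, "D": 3},
        {"A": 2, "B": 3, "C": 3, "D": 5},
        {"A": 5, "B": 4, "C": 3, "D": 2},
        {"A": 3, "B": 3, "C": 4, "D": 3},
    ]
    for c in CRITERIA:
        share, spread = band_profile(stack, c)
        print(f"Criterion {c}: {share:.0%} at Band 3, std dev {spread:.2f}")
    print("Clustered:", flag_clustering(stack))
```

A genuinely even cohort will also cluster, so a flag is a prompt to re-read a few of your Band 3 calls, not proof that you have been tallying.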

A useful internal check: after marking a response, ask yourself which Band 5 phrase the response did or did not deliver. If you cannot name a missing Band 5 quality, the response is probably at Band 5. If you can name it crisply, the response belongs below Band 5; ask the same question of the Band 4 descriptor to see whether it stops there.
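Applied one band at a time, this check becomes a top-down procedure: start at Band 5 and step down until you reach a descriptor that has no crisply nameable missing quality. In the Python sketch below, your judgment is represented by a hypothetical callback, `missing_quality(band)`. It returns the quality the response failed to deliver at that band, or `None` if nothing crisp comes to mind. This is one way to put the heuristic into practice, not an IB procedure.

```python
# Sketch of the "name the missing quality" check, run top-down through the bands.
# `missing_quality` stands in for the teacher's judgment and is hypothetical.
from typing import Callable, Optional

TOP_BAND = 5


def best_fit_band(missing_quality: Callable[[int], Optional[str]]) -> int:
    """Return the highest band whose descriptor has no crisply nameable missing quality."""
    for band in range(TOP_BAND, 0, -1):
        if missing_quality(band) is None:
            return band
    return 0  # the response does not reach the Band 1 descriptor


if __name__ == "__main__":
    # Illustrative teacher notes for a Criterion B response that identifies
    # devices but never says what they do.
    notes = {
        5: "no evaluation of how features shape meaning",
        4: "analysis stays at identification",
    }
    print("Criterion B:", best_fit_band(notes.get))
```

Writing the missing quality down, rather than only thinking it, also gives you the one-sentence band justification the anchor-sheet habit below relies on.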

The text types matter

Paper 1 passages are unseen, but the kind of unseen passage rotates predictably. Literature Paper 1 draws from four literary forms: prose fiction, prose non-fiction, drama, and poetry. Language and Literature Paper 1 draws from non-literary text types: op-eds, advertisements, speeches, infographics, blog posts, interview transcripts, public-information leaflets, and so on. The IB rotates deliberately, so a student who has only practiced two forms is exposed.

When you mark, hold the text type lightly in mind. A Criterion B analysis of an advertisement involves visual rhetoric and target-audience inference that an analysis of a poem does not. The rubric is the same; the surface features the student should be analyzing are not. A response that analyzes an advertisement using only the vocabulary appropriate to a poem (imagery, line breaks, metaphor) is missing half of what's there. The best-fit on Criterion B should reflect that miss.

The learner portfolio is the official preparation vehicle for Paper 1, named explicitly in the Subject Guides. Most teachers under-use it. If a student arrives at the Paper 1 exam having only ever practiced two of the four forms, that is a portfolio-coverage problem, not an exam-day problem. The portfolio is also the place where best-fit marking on practice passages should be modeled aloud so students internalize what each band actually rewards.

Three habits that hold up on Paper 1

These are the three habits I would commit to before the next Paper 1 mock cycle. None of them require a tool. All three are teacher-to-teacher recommendations.

Mark each response in a single pass. Read the response straight through, then make four best-fit decisions in order: A, B, C, D. Do not stop mid-essay to score a criterion. Stopping fragments your sense of the response as a whole and pushes you toward bottom-up marking. The best-fit judgment is a judgment about the whole response; you have to read the whole response before you make it.

Anchor the stack before you grade it. Pick three essays from the pile, ideally ones you suspect will fall at Band 2, Band 3, and Band 5. Mark those three slowly. Write one sentence per criterion explaining the band call. Tape the sheet to your desk and mark the rest of the stack against those anchors. Calibration drift across a stack of 50 is the largest avoidable source of unfair Paper 1 marking, and the anchor sheet is the cheapest fix.

For HL, mark response 1 and response 2 in separate sittings. Or at least separate them with a longer break than you take between SL essays. The fatigue accumulated reading response 1 carries into response 2 and pushes Band 4 work into Band 3 territory in your eye. Splitting the marking sittings restores the calibration for the second response. The IB marks them independently. You should too.

A fourth habit, optional and only for teachers marking very high volumes (80 plus responses per cycle): take a 90-second break between every essay, and a longer break after every batch of 10. Fatigue costs accuracy more on the tighter 5-mark Paper 1 bands than on wider rubrics.

Where ClassLens fits

ClassLens is an AI-assistive grading and teaching tool that drafts grades and feedback against a rubric the teacher configures. For a Paper 1 marking workflow, it slots in after the teacher has already hand-marked the first three or four responses in a stack to build a calibration anchor. Once the anchor is set, the rest of the stack can be uploaded. ClassLens drafts a band score for each criterion and a short justification per response. The teacher reviews each draft in the Batch Review Dashboard, edits anything they disagree with, and clicks "Return Checked" to release the reviewed grades to students in a single batch. Auto-return was permanently removed from the product on 2026-04-12. There is no mode in which an AI-generated grade reaches a student without teacher review.

For Paper 1 specifically, the teacher pastes the four IB criteria and their own interpretation of the band descriptors into the rubric configuration, picks a strictness level, and processes a batch. The Batch Review Dashboard also produces a Knowledge Gap Report at the class level, showing which criterion bands are strongest and weakest. On Paper 1, Criterion B (analysis and evaluation) is the typical weak point in a cohort regardless of how it's diagnosed; if it surfaces in the Knowledge Gap Report, you can drive the next mini-lesson from the description-versus-analysis pattern without compiling it by hand.

On data posture, AI inference runs on Google Cloud Vertex AI under the Google Cloud Data Processing Addendum with Zero Data Retention enrolled, no model training on submissions, and submissions processed transiently. ClassLens is Google OAuth verified including the restricted Drive scope, CASA Tier 2 complete, SOC 2 Type I attested by Percilchofe CPA LLC (License No. 1188), and a CISA Secure by Design Pledge signatory. International school IT directors will recognize this as a Google Cloud and Google Workspace stack with the standard no-training, no-retention posture. The free tier is 100 submissions per month, enough for a single section's Paper 1 mock cycle. Paid tiers are Pro at $10 per month (1,000 submissions) and Max at $20 per month (5,000 submissions). Sign in at classlens.com or install via the Google Workspace Marketplace.

Disclosure: I am the founder of ClassLens. I am also a working high school teacher, though not at an IB school. Take the prior paragraphs as a tool description, not a peer recommendation. The IB DP English context is not my classroom. If the description sounds useful, test the workflow against your own marked responses before trusting it on a live cycle.

What ClassLens won't do

The honest list. ClassLens drafts grades. It does not replace your judgment on Paper 1, and there are specific places where its draft is reliably worse than yours.

It does not reliably distinguish description from analysis. This is the single most important Criterion B judgment, and it is the one the AI gets wrong most often. The AI will see a student identifying three features and call it analysis. You will see the same response and notice that the student never told you what the features do. Criterion B is the criterion to override most carefully.

It does not handle the text-type-specific reading. A poem read with the analytical vocabulary appropriate to an advertisement, or an advertisement read with the vocabulary of a poem, looks "thorough" to the AI in a way it should not look thorough to you. Hold the text type in mind on every B override.

It does not catch the five-paragraph template problem on Criterion C. The AI rewards organized prose, and the five-paragraph essay is recognizably organized. The AI will give it Band 4 or 5 on C even when the structure is fighting the text. This is the second-most-common override.

It does not hold a stable rubric anchor across a long stack. After 40 or so responses, the AI's calibration starts to drift in either direction. Reset the rubric configuration between large batches, or hand-mark a fresh anchor essay every 30 to 40 responses to keep the draft scores honest.
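One way to make "keep the draft scores honest" concrete is to compare the AI draft with your own marks on each fresh anchor essay and reset when the gap grows. The minimal sketch below assumes you record both sets of criterion marks by hand. The function names and the 0.75-mark tolerance are hypothetical, and nothing here calls a ClassLens API.

```python
# Sketch: compare AI-drafted criterion marks with your own marks on anchor essays.
# The 0.75 tolerance is an illustrative assumption, not a ClassLens or IB setting;
# marks are entered by hand and no ClassLens API is involved.

CRITERIA = ("A", "B", "C", "D")
Marks = dict[str, int]


def mean_abs_gap(teacher: Marks, draft: Marks) -> float:
    """Average absolute teacher/draft difference across the four criteria."""
    return sum(abs(draft[c] - teacher[c]) for c in CRITERIA) / len(CRITERIA)


def signed_gap(teacher: Marks, draft: Marks) -> float:
    """Average of (draft - teacher): positive means the draft is more generous."""
    return sum(draft[c] - teacher[c] for c in CRITERIA) / len(CRITERIA)


def needs_reset(anchors: list[tuple[Marks, Marks]], tolerance: float = 0.75) -> bool:
    """True if the most recent anchor essay's absolute gap exceeds the tolerance."""
    if not anchors:
        return False
    teacher, draft = anchors[-1]
    return mean_abs_gap(teacher, draft) > tolerance


if __name__ == "__main__":
    anchors = [
        # (your marks, AI draft marks) for anchor essays early and late in a stack
        ({"A": 4, "B": 3, "C": 4, "D": 4}, {"A": 4, "B": 3, "C": 4, "D": 4}),
        ({"A": 3, "B": 2, "C": 3, "D": 4}, {"A": 4, "B": 4, "C": 4, "D": 4}),
    ]
    teacher, draft = anchors[-1]
    print("mean absolute gap:", mean_abs_gap(teacher, draft))
    print("signed gap:", signed_gap(teacher, draft))
    print("reset rubric / re-anchor:", needs_reset(anchors))
```

The signed gap shows which direction the drift runs. In the illustrative late anchor, most of the gap sits on Criterion B, the same description-versus-analysis weakness noted above. The right tolerance for your department is a judgment call. The point is to record the anchor comparison rather than eyeball it.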

It does not handle pedagogical context. If a student in your stack has been working all year on getting Criterion D from Band 2 to Band 3, the AI does not know that, and its margin comments will not reflect it. Your handwritten "the sentence variety has finally shifted, this is what we have been working on" is doing work the AI cannot.

It does not grade oral commentary. Individual Oral assessment is a conversation, not a text. ClassLens reads text. The IO sits outside what the tool is good for, and I would not try to use it there.

It does not handle high-stakes grading well, and I would not use it for the actual May submissions or NECM coursework that goes to the IB. ClassLens is for mocks, drafts, formative assignments, and the high-volume rubric work that does not require your final professional judgment. Anything that reaches the IB or a student transcript should be marked by you, with the AI draft as a second-opinion check at most.

The thing it does well is the mechanical, high-volume part of marking. The thing it does not do is the interpretive, contextual, or relational part. Teachers who try to use it for the second part get burned, and rightly.

Closing

Paper 1's rubric did not change for 2026. If you have been losing sleep over recalibration, the lost sleep was probably unnecessary. What did change is the staffroom noise level, the temptation to import the Paper 2 logic into Paper 1 marking, and the volume of online posts conflating the two. The fix for all of that is the rubric itself: the four criteria, the five-mark bands, the Band 5 descriptors quoted verbatim above. Print them. Tape them to your desk. Mark against them.

If you have a stack of past Paper 1 mocks in a Google Drive folder and you want to see what an AI-drafted grade looks like under the (unchanged) criteria before trusting it on a live cycle, that is a clean way to test the workflow. Sign in with any Google account at classlens.com, paste the four criteria and your interpretation of the bands into the rubric configuration, and grade up to 100 submissions per month free. The places where you disagree with the AI are the places where your professional judgment is most valuable, and also the places where you should not let an AI override you.

If you teach Paper 2 as well, the companion guide to the May 2026 Paper 2 criteria changes is here, and it walks through what actually did change on that paper, what it means for marking, and where the contamination risk into Paper 1 marking comes from.

Whether your cohort is sitting the timed paper this May or completing under NECM, the Paper 1 work is the same. Read the response. Make four best-fit calls. Move on.


Steven Swanson is a high school engineering teacher with 22 years of classroom experience and the creator of ClassLens, an AI-powered grading tool built for Google Classroom. Try it free at classlens.com.
