Standard Setting Non-Existent Exams

I wrote a blog about those #SQAresults, #covid, the Scottish education system, and tried not to panic about what’s going to happen next academic year …

The Scottish Qualifications Authority (SQA) released its exam results this week to huge uproar. This is a fascinating and horrifying glimpse into what might await us in higher education and the rest of the UK. Let’s talk about it.

The SQA Higher exams are sat in 5th year (15-16 years old) and 6th year (16-17 years old), and are typically the exams that get you into university. The maximum you can usually sit in a year is 5. I have 6 Highers, because I sat 5 in 5th Year and 2 in 6th Year (and failed my Higher Psychology – a discussion for another day). There are also Advanced Highers, which we won’t discuss here.

To get in to do Zoology at the University of Glasgow, you now need 5 A Highers at the end of 6th Year (it was easier back in my day!).

You sit a preliminary exam around Christmas time (the prelim) which is, to the best of my knowledge, set by the individual school based on what has been taught so far. The actual Higher paper, sat in May, is held at the same time and under the same conditions, with the same paper, across the country. This year, students could not sit their Highers, and so the SQA asked teachers to estimate what they thought students would get instead.

The majority of students who sit these exams are aged 15-18 years old. Over the past four years, 76.8% (+/-1.2%) of students aged 15-18 have achieved an A, B or C grade in their Highers. This year, teachers estimated that 88.9% of this age category would achieve an A, B, or C grade. 

The SQA had a problem. 

The teachers’ estimates would have meant a 12 percentage point rise in the share of students across the board who received an A-C Higher grade. Why were the teachers’ estimates so high? What was the SQA going to award students? Could the SQA use the students’ last exams, a prelim that wasn’t standardised across material or paper, to fairly discriminate between ‘excellent’ and ‘satisfactory’ students? What were they going to do?
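For anyone who wants to check the arithmetic, here is a minimal sketch using only the two pass rates quoted above. The variable names are just for illustration. It also shows why "percentage points" and "percent" are not the same thing: the gap is about 12 points, but relative to the usual pass rate it is a bigger jump.

```python
# Both figures are taken from the text above (A-C rate for ages 15-18).
historical_pass_rate = 76.8  # four-year average, in percent
estimated_pass_rate = 88.9   # 2020 teacher estimates, in percent

# Absolute gap, in percentage points.
point_rise = estimated_pass_rate - historical_pass_rate

# Relative gap: how much bigger the estimated rate is than the usual one.
relative_rise = point_rise / historical_pass_rate * 100

print(f"Rise: {point_rise:.1f} percentage points")
print(f"Relative increase: {relative_rise:.1f}%")
```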

Option A: Use the teachers’ estimates

Teachers were told to guess at what their students could do on their best possible day. This, I think, is a crucial mistake in the story, because best possible days are rare, and have big impacts on performance. I rarely ever had my best possible day on my exams (see my failed Higher Psychology). I expect teachers also felt very sorry for students, and I expect they wanted to support students through this. I would not be surprised if a few schools leant on their teachers to whisper “hey, we could do with some better results this year”. The result? The estimates were far out of line with normal exam results.

If the SQA awarded the estimated grade, they would devalue the exam and the accreditation. This is complicated because the SQA is also the first national examining body in the UK to release grades, thanks to Scotland’s early summer. Every year, we get stories about how grade inflation is making exams easier to pass, and their results harder to trust. These stories arise from creeps of 2 or 3 percentage points. 12 percentage points would have been a scandal. Scottish students would have found it difficult to use those grades to demonstrate their ability, and access to university may have been a challenge. The SQA may have feared that other examining bodies would take a different line, and that they would disadvantage Scottish students by being perceived as lenient. Who knows? Certainly the rest of the UK is watching Scotland right now.

Option B: Use the prelim grade

The next solution may have seemed logical – why not use the last exam the students sat? The one that they would have based any appeals on in a better year? (My Higher Psychology prelim was a C if I remember correctly. The appeal went nowhere). 

Exams are a pretty poor way of assessing students. The one thing we can agree on is that you can be broadly sure the right student is sitting in the right seat (ehhhhh), and that every student is seeing the same paper at the same time. At a national level, that requires a massive amount of coordination. It is a phenomenal amount of work to ensure that the Higher Psychology paper I sat in C201 in Park Mains in 2004 is the exact same paper that the other 2778 students were sitting. That when I left, the first moment I could, enough time had passed that I wasn’t likely to be texting my pal in Stornoway the answers. That me and the other 826 students who failed that paper were all fairly marked. It is an exercise in logistics that prelims, which are taken from past papers (in fact, I think I knew exactly which past papers were being used in my Psychology prelim) and are dictated at the level of the school, cannot match up to.

Again we come back to standards. If the students didn’t all sit the same exam, how can we be sure that these 2020 grades are the passport to the future our schooling system is built on?

Option C: Standard set

And so the SQA took a third road. If about 77% of students usually achieve an A-C grade, then we have no real reason to assume that, had this been a normal year, about 77% of students wouldn’t have achieved the same.

But therein lies the rub. The SQA did not take the average of everyone – it took the average of your school, perhaps hoping to smooth over that prelim issue a little. Unfortunately . . . exams are a really, really terrible way to assess students, and students in lower Scottish Index of Multiple Deprivation (SIMD) categories consistently perform worse. If you’re in the poorest 20% of the population, you are probably going to a school in a deprived area, with other poor students. Historically, your school will do poorly . . .
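To make the mechanism concrete, here is a toy sketch of what pulling estimates back to a school's history can do. This is not the SQA's actual method: the `moderate` function, the pass-quota rule, the two pass rates and the five students are all made up for illustration. The only idea borrowed from the text is that the cap comes from the school, not the student.

```python
def moderate(ranked_estimates, school_pass_rate):
    """Cap a school's A-C passes at its historical pass rate (toy rule).

    ranked_estimates: list of (student, estimated_pass) pairs, ordered
        best-first by the teacher. Hypothetical input format.
    school_pass_rate: the school's historical A-C rate, as a fraction.
    """
    quota = round(school_pass_rate * len(ranked_estimates))
    awarded = {}
    passes_given = 0
    for student, estimated_pass in ranked_estimates:
        # A student only keeps an estimated pass while the quota lasts.
        gets_pass = estimated_pass and passes_given < quota
        awarded[student] = gets_pass
        if gets_pass:
            passes_given += 1
    return awarded


# Identical teacher estimates at two imaginary schools: 4 of 5 estimated to pass.
teacher_estimates = [("s1", True), ("s2", True), ("s3", True),
                     ("s4", True), ("s5", False)]

high_history = moderate(teacher_estimates, 0.8)  # school that usually does well
low_history = moderate(teacher_estimates, 0.4)   # school that usually does poorly

downgraded = [s for s, est in teacher_estimates if est and not low_history[s]]
print("Passes kept at high-history school:", sum(high_history.values()))
print("Passes kept at low-history school:", sum(low_history.values()))
print("Downgraded at low-history school:", downgraded)
```

Same students, same estimates, different outcome, and the only thing that changed is the school's past results.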

And this is what the data shows. 

Most people’s scores were inflated above the usual. Most people’s scores were brought back in line with what their school would likely do. Some very bright students in poor areas have probably done very poorly. Some middling students in very good schools may have benefited. There has been a lot of anger about this.

And some more big picture observations

Model Answer: So what do we do?

The Scottish Greens have issued a ‘no detriment’ petition, which I have signed. This petition proposes that students should be awarded at least the grade they achieved in their prelim. But I actually don’t think this is a good answer either.

The Scottish Government have assured students normal appeals procedures will go ahead, taking prelims into account, but I know from personal experience this doesn’t always get you the result you want, and up the page we just said prelims weren’t standardised, so . . . what do we do?

These exams didn’t happen. Even if they had, they would have been as shit as they always are in terms of equity, diversity and inclusion. COVID will disproportionately affect students in deprived areas, so why are we trying to pretend that four or five letters beside someone’s name, plucked from the aether, can tell us anything about these students’ abilities?

If I was in charge of university admissions, or had the ear of parliament and the SQA, I’d be advocating for “NULL” in those fields. I’d be advocating for more holistic assessment of incoming students to uni, much like Multiple Mini Interviews in medicine and veterinary medicine, and I’d be advocating for Scotland to take the lead here, because we need to fix this issue. We could be Finland, but we playing.

Exams are shit at assessing anything but whether a student can sit an exam. I don’t set exams in my courses at uni for this very reason; instead, I set skills-based assessments wherever I can. I’m not perfect at this, and I could do better. I’ve recently had interesting conversations on Twitter about whether we in the UK have an overly aggressive quality assurance approach when it comes to exams, and flexibility in QA this year is something I was firm we had to raise in our 10 Simple Rules paper. But I do like the Scottish Credit and Qualifications Framework, and I like what it tries to standardise in terms of assessment throughout all levels of Scottish education.

I just don’t think we should pretend students have sat exams that they haven’t.

Covid fucking sucks. 

You can find my visualisation code over on GitHub.