Why we can’t blame tech for the exam result scandal

Jonathan Birch

26 Aug 2020

‘Like the sinking of the Titanic’. A contributor to the Times Educational Supplement did not hold back in their assessment of the UK Government’s management of this year’s academic results.

And who could blame them? The algorithm used by Ofqual to determine final grades – in lieu of students physically sitting the exams – was not only deeply flawed but regarded by many as purposely cruel.

And just as the ocean liner famously sank, so did the hearts of thousands when A-Level results day revealed that their final grades weren’t as they had hoped, simply because a grading model dreamt up in April said so.

A staggering 40% of results were downgraded, and those from poorer backgrounds were said to be most negatively affected by the model backed by the Secretary of State.

Whilst the UK Government is working to fix this disaster, the manner in which this has all happened raises questions about how we use algorithms and equations to make such important decisions for society.

Not the first time

A recent Wired article by Matt Burgess explores these questions in detail, and highlights how the A-Level results debacle isn’t actually the first major failure of this kind for the UK public sector. Earlier this month, the Home Office dropped a “racist” visa decision algorithm that graded people on their nationalities, and the current system deciding Universal Credit has also been judged by many to be unfit for purpose.

The difference between the exam results fiasco and the other examples referenced by Burgess is that the model used by Ofqual wasn’t powered by technology but by an equation: ‘P_kj = (1 - r_j)C_kj + r_j(C_kj + q_kj - p_kj)’. Those not already familiar with how this equation is structured could do worse than read the Guardian’s thorough breakdown of each section and why it was always doomed to fail.
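To make the equation quoted above a little more concrete, here is a minimal sketch in Python. The symbol meanings are an assumption based on how the model has generally been described in public breakdowns, since this article does not define them: C_kj is the historical share of students at school j who achieved grade k, q_kj and p_kj are the shares predicted from prior attainment for this year’s and previous years’ cohorts respectively, and r_j is the fraction of the school’s students with matched prior attainment data. The function name and the numbers are hypothetical and purely illustrative.

```python
def predicted_share(c_kj: float, q_kj: float, p_kj: float, r_j: float) -> float:
    """Evaluate P_kj = (1 - r_j) * C_kj + r_j * (C_kj + q_kj - p_kj).

    Symbol meanings are assumed for illustration (see the lead-in):
      c_kj: historical share of grade k at school j
      q_kj: share predicted from prior attainment, this year's cohort
      p_kj: share predicted from prior attainment, previous cohorts
      r_j:  fraction of students with matched prior attainment data
    """
    if not 0.0 <= r_j <= 1.0:
        raise ValueError("r_j must lie between 0 and 1")
    return (1 - r_j) * c_kj + r_j * (c_kj + q_kj - p_kj)


# Made-up numbers, not real Ofqual data.
share = predicted_share(c_kj=0.20, q_kj=0.15, p_kj=0.25, r_j=0.5)
print(f"Predicted share of grade k at school j: {share:.2f}")
```

Under that reading, setting r_j to 0 returns the school’s historical share unchanged, which shows how heavily a school’s past results can weigh on what its current students are awarded.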

What these models all have in common is that they have done a catastrophic job of making automated socio-economic choices based on historic data. But their failings do not stem from a decision to use tailored equations, artificial intelligence (AI) or machine learning. They come from the thinking applied to each of these tools.

Thinking ethically

Take AI, for example. Kathy Baxter, an Architect of Ethical AI Practice at Salesforce, has explained that we can only trust AI to make ethically sound decisions when the technology has been structured and used in a truly ethical way.

One way to do this is through diversity of thought. Organisations can reduce exclusion and bias in their AI models by looking at the inherent discrimination – accidental or otherwise – sitting within the teams building and applying them. The most obvious place to start is by assessing who is being hired and promoted based on gender, age, race, and social class.

Working with more diverse teams can help prevent thinking from becoming restricted and thereby limit discriminatory bias. But striking this kind of cultural balance is incredibly difficult when there is a major shortage of skilled AI workers to choose from. Tencent research found that there are only 300,000 skilled AI engineers globally, even though millions are needed. And a recent report from the World Economic Forum found that gender diversity alone is a huge issue within the AI field – only 22% of its global workforce is female.

Hiring challenges

The Telegraph’s Technology Editor Robin Pagnamenta points to these challenges as contributing factors to the government’s recent algorithmic and modelling woes. He stresses that only limited expertise exists within the government, which is forcing civil servants to rely on third-party providers who may have little vested interest in the consequences of their work.

But hiring that expertise in-house is difficult when salaries for skilled workers have skyrocketed due to the skills shortage. Building a diverse team of technically skilled experts is even more challenging for the same reasons, especially when the talent pool is so restricted.

In the case of this year’s academic results, the skillsets of the people responsible for pulling together the statistical model are decidedly less important. After all, AI and machine learning technologies were not part of the process that has caused so much stress and upheaval to young people across the UK.

But when Ofqual’s chief regulator Sally Collier announced yesterday that she was stepping down in light of the chaos, it was said that “the fault was not hers alone”, and that “ministers have questions to answer over the extent to which they scrutinised and challenged the methodology and reliability of the statistical model, particularly given the enormity of the task and the importance of getting it right”.

The exact skillsets might not be crucially important. But the range of minds involved will be. And how the government and Ofqual thought up this mess is a question we would all surely like an answer to.