R Statistics Analyst Portfolio · Jisc / HESA

Does graduating from a high-skill provider pay?

A rigorous quantile-regression analysis of 229 UK higher education providers, linking occupational outcomes from HESA Table 22 to salary distributions from Table 26 — to quantify the graduate skills premium and benchmark provider performance.

229 providers
UK higher education providers analysed
£92 per 1pp
Median salary uplift per 1 percentage point (pp) increase in graduate-level roles (Q50 quantile regression)
£19,266
Highest skills premium above national median (Furness College)

What gap are we trying to close?

UK policy-makers and prospective students lack a rigorous, provider-level evidence base linking occupational outcomes to graduate earnings. Raw salary tables exist, but the relationship between occupational mix and pay is buried in noise.

🎯 The Core Question

Does attending a provider whose graduates enter high-skill professional roles (SOC major groups 1–3) translate into a measurable salary premium — and does this premium vary across the earnings distribution?

📊 The Data Gap

HESA publishes Table 22 (occupational outcomes) and Table 26 (salary bands) separately. No published analysis joins them at provider level to quantify the skills premium: the salary uplift associated with higher rates of entry into professional occupations.

🏛️ Policy Relevance

Jisc and HESA need analyst-grade evidence to inform the Teaching Excellence Framework (TEF), Office for Students (OfS) Graduate Outcomes metrics, and widening participation benchmarking across 229 providers.

⚙️ Technical Challenge

Salary data is banded (not individual), counts are suppressed for small providers, and the relationship between occupational mix and earnings is heterogeneous across the distribution — requiring quantile regression, not OLS.

National Salary Range Across 229 Providers

£14,500
Min
£22,172
Median
£41,438
Max

£26,938 spread between the lowest- and highest-earning providers, equivalent to 186% of the lowest provider figure (£14,500)

The evidence base

Two HESA Graduate Outcomes Survey tables — joined at provider level for the first time — covering UK-domiciled first degree graduates in full-time paid employment.

Table 22 — Occupational Classification

↗ HESA Source
Field | Description | Type
provider | Higher education provider name (UKPRN-linked) | string
year | Academic year (2017/18 to 2022/23) | string
mode | Mode of former study: Full-Time / Part-Time | factor
soc_group | SOC 2020 major group (1–9 + Unknown) | factor
is_graduate_role | Derived: TRUE if SOC group 1, 2, or 3 | boolean
count | Number of graduates (suppressed if <5) | integer
pct_graduate_role | % of graduates in SOC 1–3 (derived) | numeric
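
The derived pct_graduate_role field can be sketched in a few lines of base R. The data frame below is a hypothetical long-format extract (one row per provider and SOC group) with made-up counts, and NA stands in for a suppressed cell; the actual column layout of the HESA export is not shown in this write-up, so treat the names as assumptions.

```r
# Hypothetical long-format extract of Table 22: one row per provider x SOC group.
# NA marks a suppressed count; 0 would be a genuine zero.
table22 <- data.frame(
  provider  = rep(c("Provider A", "Provider B"), each = 4),
  soc_group = rep(c(1, 2, 3, 5), times = 2),
  count     = c(40, 60, 50, 50, 10, NA, 20, 30),
  stringsAsFactors = FALSE
)

# Derived flag: TRUE for SOC major groups 1, 2 or 3.
table22$is_graduate_role <- table22$soc_group %in% 1:3

# Share of known counts in SOC 1-3 per provider; suppressed cells are skipped.
pct_by_provider <- sapply(split(table22, table22$provider), function(d) {
  known <- !is.na(d$count)
  100 * sum(d$count[known & d$is_graduate_role]) / sum(d$count[known])
})

pct_graduate_role <- data.frame(
  provider          = names(pct_by_provider),
  pct_graduate_role = unname(pct_by_provider)
)
print(pct_graduate_role)
```

Skipping suppressed cells in both numerator and denominator is one possible convention; it keeps the percentage defined but can bias it when the suppressed cell belongs to a large SOC group.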

Table 26 — Salary Bands

↗ HESA Source
Field | Description | Type
provider | Higher education provider name (UKPRN-linked) | string
year | Academic year (2017/18 to 2022/23) | string
skill_group | High Skilled / Medium Skilled / Low Skilled | factor
salary_band | 14 salary bands from <£15k to £51k+ | string
salary_midpoint | Band midpoint used as numeric proxy (£) | numeric
count | Number of graduates in band (0 = genuine zero) | integer
weighted_mean_salary | Σ(midpoint × count) ÷ Σ(count) per provider | numeric

Population Scope — Both Tables Aligned To:

🇬🇧
UK Permanent Address
🎓
First Degree Only
⏱️
Full-Time Employment
📍
Working in UK

Built in R on Google Colab Pro

Google Colab Pro was chosen over the free tier for its guaranteed high-RAM runtime (51GB), used for loading the full HESA tables, running bootstrap quantile regression (R=500), and serving a live Shiny dashboard via an ngrok tunnel.

📦

Data Ingestion

Raw HESA CSVs loaded with skip-row handling, column harmonisation via janitor, and strict suppression-aware filtering (NA ≠ 0).
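
As a rough illustration of this ingestion step, the sketch below reads a small inline CSV with two title rows and a "." suppression marker. The file contents, the number of skipped rows, and the column names are all hypothetical, and the header clean-up uses a base-R stand-in for what janitor does in the real pipeline.

```r
# Hypothetical raw export: two title rows above the header, "." marking suppression.
raw_lines <- c(
  "Title: Graduate outcomes (illustrative)",
  "Source: HESA",
  "UKPRN,HE Provider,Academic Year,Number",
  "10000001,Provider A,2017/18,25",
  "10000002,Provider B,2017/18,.",
  "10000003,Provider C,2017/18,0"
)

# skip = 2 drops the title rows; na.strings turns the suppression marker into NA.
raw <- read.csv(
  text = paste(raw_lines, collapse = "\n"),
  skip = 2,
  na.strings = ".",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Base-R stand-in for janitor-style cleaning: lower-case snake_case headers.
names(raw) <- gsub("[^a-z0-9]+", "_", tolower(names(raw)))

# Suppressed (NA) rows are dropped; the genuine zero is kept.
clean <- raw[!is.na(raw$number), ]
print(clean)
```

Declaring the suppression marker at read time means the count column arrives as a number with NA, so a suppressed cell can never be mistaken for a zero later in the pipeline.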

🔗

Provider-Level Join

Tables 22 and 26 joined on UKPRN + year + mode to create a unified module_b_fixed analytical dataset of 1,205 provider-year-mode records.
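
A minimal sketch of the join, assuming each table has already been summarised to one row per UKPRN, year and mode. The two small data frames and their values are hypothetical; only the key columns come from the description above.

```r
# Hypothetical provider-level summaries from each table (illustrative values).
t22 <- data.frame(
  ukprn = c(10000001, 10000002, 10000003),
  year  = "2017/18",
  mode  = "Full-Time",
  pct_graduate_role = c(75, 50, 90),
  stringsAsFactors = FALSE
)
t26 <- data.frame(
  ukprn = c(10000001, 10000002, 10000004),
  year  = "2017/18",
  mode  = "Full-Time",
  weighted_mean_salary = c(24500, 21000, 30000),
  stringsAsFactors = FALSE
)

# Inner join: keep only provider-year-mode keys present in both tables.
joined <- merge(t22, t26, by = c("ukprn", "year", "mode"))

# Keys that fail to match are worth inspecting before they are dropped.
unmatched_t22 <- setdiff(t22$ukprn, joined$ukprn)
unmatched_t26 <- setdiff(t26$ukprn, joined$ukprn)
print(joined)
```

Listing the unmatched keys from each side is a cheap check that the record count after the join is explained by coverage differences and not by mismatched key formats.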

📐

Quantile Regression

Three quantile regressions (Q25, Q50, Q75) via quantreg::rq() with bootstrap SE (R=500, seed=42) to capture heterogeneous salary effects across the distribution.

📊

Interactive Dashboard

Full 4-tab Shiny app with bslib, plotly, DT, and shinyWidgets — live at the ngrok URL, serving real-time filtered charts and league tables.

🏆

Skills Premium Index

Each provider's weighted mean salary benchmarked against the national median (£22,172) to produce a signed skills premium in £ — enabling a ranked league table of 229 providers.
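
The index itself is a subtraction followed by a sort. The sketch below uses five hypothetical providers and assumes the national benchmark is the median of provider-level mean salaries, which is how the figures in this write-up appear to be constructed.

```r
# Hypothetical provider-level mean salaries (pounds).
providers <- data.frame(
  provider = c("Provider A", "Provider B", "Provider C", "Provider D", "Provider E"),
  weighted_mean_salary = c(32700, 16500, 22172, 25000, 20000),
  stringsAsFactors = FALSE
)

# Benchmark: median of the provider-level means (assumption, computed from toy data).
national_median <- median(providers$weighted_mean_salary)

# Signed premium in pounds: positive = above the benchmark, negative = below.
providers$skills_premium <- providers$weighted_mean_salary - national_median

# League table, highest premium first.
league_table <- providers[order(-providers$skills_premium), ]
league_table$rank <- seq_len(nrow(league_table))
print(league_table)
```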

🌐

Public URL via ngrok

Shiny is served on port 3838 from a processx background process; the ngrok binary tunnels traffic to a public HTTPS URL without firewall configuration.

Analysis Pipeline

1

Load Raw CSV

Tables 22 & 26 from HESA

2

Clean & Align

Suppressed cells to NA, filter scope

3

Join on UKPRN

Provider-level merge

4

Compute Metrics

Weighted means, % SOC 1–3

5

Quantile Regression

Q25 / Q50 / Q75

6

Skills Premium

Provider vs national median

7

Shiny Dashboard

Interactive exploration

tidyverse: Data wrangling pipeline
quantreg: Quantile regression (rq)
shiny + bslib: Interactive dashboard
plotly: Interactive charts
DT: Sortable data tables
janitor: Column standardisation
glue: String interpolation
processx + ngrok: Dashboard deployment

What the data reveals

Six headline findings from the analysis of 229 providers, 1,205 provider-year-mode records, and three quantile regression models.

£22,172
National Median Salary
114
Providers Above Median
£92
Median Salary Uplift per 1pp SOC 1–3
229
Providers Analysed
Top 15 Providers by Skills Premium
Salary above national median (£22,172) — 2017/18 baseline
Finding 1: Furness College leads with a £19,266 premium, followed by The London Institute of Banking & Finance (+£18,703) and University College of Estate Management (+£17,059). Notably, these are specialist institutions, not Russell Group universities, which suggests that vocational-professional alignment may matter more for salary outcomes than institutional prestige.
Skills Premium vs % in Graduate-Level Roles
Each dot = one provider (n=229). Colour = above/below national median.
Finding 2: There is a positive but heterogeneous relationship between the percentage of graduates in SOC 1–3 and provider mean salary. The slope is steeper at the top of the distribution (Q75 = £172/pp) than at the bottom (Q25 = £80/pp), meaning that among higher-salary providers each additional percentage point in graduate-level roles is associated with a disproportionately larger salary difference.
Quantile Regression Results
Quantile | Estimate (£ per 1pp) | 95% CI | Significance
Q25 (lower quartile) | £80 | £58 – £102 | p < 0.001
Q50 (median) | £92 | £60 – £124 | p < 0.001
Q75 (upper quartile) | £172 | £142 – £203 | p < 0.001
Finding 3: All three quantile estimates are highly significant (p<0.001). The Q75 estimate (£172) is more than double the Q25 estimate (£80), confirming that the salary return to graduate-level employment is not uniform — it is largest for providers already in the upper salary tier.
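
To make the slopes concrete, the short sketch below scales the reported estimates and interval bounds to a hypothetical 10pp gap in SOC 1–3 share between two providers. The 10pp figure is an illustrative choice, and the result describes an association between providers, not the effect of changing one provider's occupational mix.

```r
# Point estimates and 95% CI bounds from the results table (pounds per 1pp).
qr_results <- data.frame(
  quantile = c("Q25", "Q50", "Q75"),
  estimate = c(80, 92, 172),
  ci_low   = c(58, 60, 142),
  ci_high  = c(102, 124, 203)
)

# Hypothetical comparison: two providers 10pp apart in SOC 1-3 share.
pp_gap <- 10
qr_results$gap_estimate <- qr_results$estimate * pp_gap
qr_results$gap_low      <- qr_results$ci_low * pp_gap
qr_results$gap_high     <- qr_results$ci_high * pp_gap

# Ratio of the upper-quartile slope to the lower-quartile slope.
slope_ratio <- qr_results$estimate[3] / qr_results$estimate[1]
print(qr_results)
print(round(slope_ratio, 2))
```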
Estimate by Quantile
Providers Furthest Below National Median
Skills deficit — salary below £22,172 national median
Finding 4: The Northern School of Art records the largest negative premium (−£7,672), followed by Askham Bryan College (−£6,005) and York College (−£5,672). Creative arts and agricultural colleges dominate the bottom tier; these are sectors where graduates commonly enter roles with low starting pay.
Insight 5 — Skill Group Salary Gradient
£20,291
Low Skilled
Graduates entering SOC groups 7–9
£22,256
Medium Skilled
Graduates entering SOC groups 4–6
£27,635
High Skilled
Graduates entering SOC groups 1–3
Finding 5: The salary gradient across skill groups is £7,344 from Low to High Skilled, a 36% premium. However, this masks substantial within-group variation: some providers in the Low Skilled group (e.g. Furness College) record mean salaries above the High Skilled average, reflecting sector-specific wage structures independent of occupational classification.
Insight 6 — 100% Graduate Role Providers: Mixed Outcomes
Pearson College
£32,700
100% SOC 1–3 · Premium +£10,528
Arts Educational
£32,250
100% SOC 1–3 · Premium +£10,078
Royal Academy of Music
£16,500
100% SOC 1–3 · Premium −£5,672
Finding 6: A 100% SOC 1–3 graduate rate does not guarantee a positive skills premium. The Royal Academy of Music places all graduates in professional roles yet records a −£5,672 deficit. This reflects arts sector wage norms: professional classification does not always translate to high pay. Quantile regression suits this kind of spread because it estimates separate slopes for lower- and higher-salary providers instead of a single average effect.

Publication-quality outputs

Interactive Plotly charts generated directly from the R analysis — embedded below for reference alongside the live Shiny dashboard.

Chart 01 — Earnings by Skill Group
Chart 02 — Skills Premium Scatter with Quantile Regression Lines
Chart 03 — SOC Occupational Breakdown by Provider

Statistical rigour & caveats

This analysis follows HESA official statistics publication standards — including suppression handling, population alignment, and transparent uncertainty quantification.

📐
Salary Estimation

  • Salary band midpoints used as numeric outcome (true salaries unavailable)
  • Open-ended upper band (£51,000+) assigned conservative midpoint of £57,000
  • Weighted mean = Σ(midpoint × count) ÷ Σ(count) per provider-year-mode group
  • Providers with fewer than 10 graduates excluded (unreliable estimates)
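
The salary estimation rules above can be sketched as follows. The band counts are hypothetical and the lower midpoints are illustrative; only the £57,000 value for the open-ended band and the 10-graduate threshold come from the notes above. The helper name summarise_provider is introduced here for illustration.

```r
# Hypothetical band counts for two providers. Midpoints are illustrative except
# the open-ended top band, which uses the 57,000 figure stated above.
bands <- data.frame(
  provider        = c("Provider A", "Provider A", "Provider A",
                      "Provider B", "Provider B"),
  salary_midpoint = c(22500, 28500, 57000, 19500, 22500),
  count           = c(10, 5, 5, 4, 4),
  stringsAsFactors = FALSE
)

min_graduates <- 10  # exclusion threshold stated above

# Weighted mean = sum(midpoint * count) / sum(count) for one provider's rows.
summarise_provider <- function(d) {
  data.frame(
    provider             = d$provider[1],
    n_graduates          = sum(d$count),
    weighted_mean_salary = sum(d$salary_midpoint * d$count) / sum(d$count)
  )
}

salary_summary <- do.call(rbind, lapply(split(bands, bands$provider), summarise_provider))

# Drop providers below the minimum graduate count.
salary_summary <- salary_summary[salary_summary$n_graduates >= min_graduates, ]
rownames(salary_summary) <- NULL
print(salary_summary)
```

In this toy data the top band holds a quarter of Provider A's graduates, so the choice of its midpoint moves the weighted mean noticeably; that sensitivity is the reason the open-ended band is called out in the caveats.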

🔒
Disclosure Control

  • HESA suppressed cells (marker ".") excluded from all calculations
  • Zero counts treated as genuine zeros, not suppressed values
  • No imputation applied — missing values excluded, not filled
  • Small provider results flagged; interpret with caution

📊
Quantile Regression

  • rq() function from quantreg package (Koenker, 2023)
  • Bootstrap standard errors: R=500 replications, seed=42
  • Quantiles: Q25 (lower), Q50 (median), Q75 (upper)
  • Predictor: % graduates in SOC major groups 1–3
  • Outcome: provider-level weighted mean salary
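
A minimal sketch of this specification, run on simulated data because the joined HESA dataset is not reproduced here. The simulated intercept, slope and noise structure are arbitrary assumptions, and the 95% intervals use a normal approximation around the bootstrap standard error, which may differ from how the reported intervals were derived.

```r
library(quantreg)

set.seed(42)  # seed taken from the methodology notes

# Simulated stand-in for the provider-level dataset (n = 229); not HESA data.
n <- 229
pct_graduate_role <- runif(n, min = 40, max = 100)
# Noise grows with the predictor so that slopes can differ across quantiles.
weighted_mean_salary <- 15000 + 90 * pct_graduate_role +
  pct_graduate_role * rnorm(n, mean = 0, sd = 20)
sim <- data.frame(pct_graduate_role, weighted_mean_salary)

# One linear quantile regression per tau, fitted in a single call.
taus <- c(0.25, 0.50, 0.75)
fit <- rq(weighted_mean_salary ~ pct_graduate_role, tau = taus, data = sim)

# Bootstrap standard errors with R = 500 replications for each quantile.
boot_summary <- summary(fit, se = "boot", R = 500)

# Collect slope, standard error and a normal-approximation 95% interval.
slopes <- do.call(rbind, lapply(boot_summary, function(s) {
  est <- s$coefficients["pct_graduate_role", "Value"]
  se  <- s$coefficients["pct_graduate_role", "Std. Error"]
  data.frame(tau = s$tau, estimate = est, std_error = se,
             ci_low = est - 1.96 * se, ci_high = est + 1.96 * se)
}))
print(slopes)
```

Each fitted line is linear in the predictor; what varies across quantiles is the slope, so the model describes how the association differs between lower- and higher-salary providers without introducing curvature.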

🔗
Population Alignment

  • Table 22 filtered to Undergraduate level to match Table 26
  • Both tables: UK permanent address, full-time paid employment
  • Table 22 covers 2017/18 only; Table 26 spans 2017/18–2022/23
  • Join performed on UKPRN (provider reference number) + year + mode

⚠️ Important Caveats

  • Salary midpoints approximate true earnings.
  • The open-ended upper band introduces uncertainty for high-earning providers.
  • Table 22 SOC classification reflects graduate destination 15 months after graduation, not lifetime outcomes.
  • Response rates vary by provider (see HESA Table 5).
  • Skills premium is descriptive, not causal: unobserved provider and student characteristics confound the relationship.
  • HESA data used under Creative Commons Attribution 4.0 International licence.