Accessibility Debt Is Technical Debt: A Framework

Every engineering organisation past its first eighteen months carries a register — formal or informal — of technical debt. The shape is familiar: a Jira label, a spreadsheet, a quarterly review with a VP of engineering, a severity column, a likelihood column, and a triage call that decides what gets paid down this quarter and what gets carried forward. The accounting is rough but it is real: leadership knows roughly how much debt the codebase carries, where it is concentrated, and what it costs to ignore another quarter. Accessibility debt — the accumulated set of WCAG failures, ARIA mis-implementations, keyboard traps, missing labels, contrast deficits, focus-order regressions, and inaccessible components shipped to production — is, in every meaningful sense, technical debt. It is documented in audit reports the same way bug debt is documented in error-monitoring tools. It compounds the same way: every new feature built on an inaccessible component multiplies the remediation cost. It carries interest in the form of class-action exposure, regulatory fines, and lost users. And yet most engineering organisations track it on a parallel ledger that never makes it to the technical-debt review.

This essay proposes folding accessibility debt into the engineering-debt accounting that already exists. Three concrete instruments do the work: a CVSS-inspired severity score that combines axe rule severity with component visit-rate and a user-impact tier; a remediation-cost estimator built from lines-of-code-touched and file-coverage; and a portfolio view that lets a VP of engineering see debt-by-component and debt-by-WCAG-pillar in the same dashboard that already shows their P1 bug backlog. The argument is not that accessibility belongs in engineering instead of in design or product — it lives across all three. The argument is that engineering leaders already have a competent triage framework for risks that compound silently, and the right move is to put accessibility under it rather than invent a parallel one that competes for attention.

The accounting frame

Treat the technical-debt ledger an engineering organisation already keeps as the model. In a healthy ledger, every debt item carries five attributes: a component (where in the codebase it lives), a severity score (how bad the consequence is if it is exploited or hit), a likelihood signal (how often the affected surface is actually touched in production), an estimated remediation cost (engineer-days, lines of code, files involved), and a portfolio bucket (security debt, performance debt, dependency debt, test debt). The ledger is reviewed quarterly. A burn-down chart tracks total debt over time. A small fraction of the engineering team’s capacity — typically 10 to 20 percent depending on the organisation’s maturity — is reserved for paying it down.

Accessibility findings, as they come out of an audit, do not naturally fit any of those columns. A typical audit report lists violations by WCAG success criterion (“1.1.1 Non-text content: missing alt”), severity by the scanner’s own classification (axe-core, for example, rates impact as critical / serious / moderate / minor), and a page or screenshot reference. It does not say which component the violation lives in. It does not say how often the affected page is actually visited. It does not estimate remediation cost. And it does not bucket by anything other than WCAG pillar, which is a taxonomy designed for compliance reporting, not for engineering triage. The first job of the framework is to translate audit findings into the same five-column shape the rest of the debt ledger uses, so the same review meeting can talk about both.
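The translation step can be sketched as a small record type plus a mapping function. This is a minimal illustration, not a prescribed schema: the names AuditFinding, LedgerItem and to_ledger_item are hypothetical, and the page-to-component and page-to-visit-rate lookups are assumed to come from the organisation's own routing and analytics data.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# WCAG success criteria are numbered by principle: 1.x, 2.x, 3.x, 4.x.
PILLAR_BY_PREFIX = {
    "1": "Perceivable",
    "2": "Operable",
    "3": "Understandable",
    "4": "Robust",
}


@dataclass(frozen=True)
class AuditFinding:
    """What a typical audit report provides (hypothetical shape)."""
    criterion: str  # WCAG success criterion, e.g. "1.1.1"
    impact: str     # scanner label, e.g. "serious"
    page: str       # page reference from the report


@dataclass(frozen=True)
class LedgerItem:
    """The five columns the debt ledger expects."""
    component: str
    severity_label: str
    visit_rate: float
    cost_days: Optional[float]  # filled in later by a cost estimator
    bucket: str


def to_ledger_item(
    finding: AuditFinding,
    page_to_component: Dict[str, str],
    page_visit_rate: Dict[str, float],
) -> LedgerItem:
    """Re-cast one audit finding into the ledger's five-column shape."""
    pillar = PILLAR_BY_PREFIX[finding.criterion.split(".")[0]]
    return LedgerItem(
        component=page_to_component.get(finding.page, "unmapped"),
        severity_label=finding.impact,
        visit_rate=page_visit_rate.get(finding.page, 0.0),
        cost_days=None,
        bucket="accessibility/" + pillar,
    )
```

Findings whose page cannot be mapped land in an "unmapped" component rather than being dropped, so the gap in the mapping shows up in the ledger itself.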

Severity multiplied by likelihood

The Common Vulnerability Scoring System (CVSS), the industry-standard severity score for security vulnerabilities, is built in its version 3.x form from three groups of metrics: base (intrinsic properties of the flaw), temporal (state of exploit and patch availability), and environmental (relevance to the specific deployment). Version 4.0 renames the temporal group to threat metrics but keeps the same layering. The base score combines an exploitability sub-score with an impact sub-score and produces a number from 0 to 10. The temporal and environmental scores adjust the base for the specific organisation’s context. The whole apparatus is designed so that a generic finding (“CVE-2024-XXXX, base score 7.4”) can be re-scored locally by a defender who knows what their own deployment actually exposes.

An accessibility severity score modelled on CVSS would carry a similar three-layer structure. The base layer is the axe-core impact rating (axe-core also underlies Lighthouse’s accessibility audits) for the rule that was violated: a “serious” violation on the rule “color-contrast” carries a base score in the 7-to-8 range; a “moderate” violation on “landmark-one-main” carries something in the 4-to-5 range. The base layer is the same whether the violation is on a marketing landing page or in a checkout flow. The environmental layer applies a multiplier for component visit-rate: a violation on the checkout page (which 100 percent of paying users hit) gets a multiplier of 1.0; a violation on a help-centre article that 4 percent of users visit gets a multiplier of 0.04. The visit-rate multiplier turns a generic finding into a finding scaled to the organisation’s actual traffic. The user-impact layer applies a tier multiplier for which assistive-technology users are blocked, with the tier number itself serving as the multiplier: a missing alt attribute on a decorative image blocks no one (tier 0); a missing label on a search input blocks every screen-reader user (tier 1); a keyboard trap blocks every keyboard-only user, including people who use switch input and voice control (tier 2, the broadest blast radius).

The combined severity score is the product base × visit-rate × impact-tier, capped at 10 so that it stays on a 0-to-10 scale. The result is that a “serious” axe finding (base 7) on a checkout page (visit-rate 1.0) blocking every screen-reader user (tier 1) scores 7.0, a P1. The same “serious” finding on a deprecated admin page (visit-rate 0.005) blocking the same audience scores about 0.04, a backlog item. A “moderate” axe finding (base 4) on the front-page hero (visit-rate 0.9) blocking every keyboard user (tier 2) scores 7.2, still a P1. The scoring captures the intuition that severity alone is not enough: a serious violation on a page nobody visits is less urgent than a moderate violation on the most-visited page in the product. CVSS made this same move for security roughly two decades ago. Accessibility deserves the same treatment.
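The scoring rule is small enough to write down directly. The sketch below is one possible reading, under stated assumptions: the base values for “serious” (7) and “moderate” (4) come from the examples above, while the values for “minor” and “critical” are placeholders chosen for illustration, and the 0-to-10 normalisation is implemented as a simple cap at 10. The function name severity_score is hypothetical.

```python
# Base scores per axe-core impact label. "serious" and "moderate" follow
# the worked examples in the text; "minor" and "critical" are assumed
# placeholder values, not part of the original framework.
BASE_SCORE = {"minor": 2.0, "moderate": 4.0, "serious": 7.0, "critical": 9.0}

# Tier 0: blocks no one. Tier 1: blocks screen-reader users.
# Tier 2: blocks every keyboard-only user.
VALID_TIERS = (0, 1, 2)


def severity_score(impact: str, visit_rate: float, impact_tier: int) -> float:
    """Return base x visit-rate x impact-tier, capped at 10."""
    if impact not in BASE_SCORE:
        raise ValueError(f"unknown impact label: {impact!r}")
    if not 0.0 <= visit_rate <= 1.0:
        raise ValueError("visit_rate must be a fraction between 0 and 1")
    if impact_tier not in VALID_TIERS:
        raise ValueError("impact_tier must be 0, 1 or 2")
    raw = BASE_SCORE[impact] * visit_rate * impact_tier
    # Assumption: the cap is what keeps the result on a 0-to-10 scale.
    return min(raw, 10.0)


# The three worked examples from the text:
checkout = severity_score("serious", 1.0, 1)     # checkout page
admin = severity_score("serious", 0.005, 1)      # deprecated admin page
hero = severity_score("moderate", 0.9, 2)        # front-page hero
```

Because tier 0 multiplies the score to zero, a finding that blocks no one never outranks one that does, whatever its traffic.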

The remediation-cost estimator

The other half of the triage decision is cost. A P1 severity score that costs 200 engineer-days to remediate gets prioritised differently from a P1 severity score that costs 0.5 engineer-days. Engineering leaders make this trade-off implicitly all day; the cost estimator gives them a number to argue about rather than a feeling. The estimator is built from two signals available from the codebase itself: lines of code touched per fix (LOC-touched), and file coverage — how many files would change if the fix is applied consistently.

A missing-label fix on a single input is a one-file, two-line change. A missing-label fix on a shared input component used in 47 places is still a two-line change in source — but the file coverage is 47, the QA surface is 47 screens, and the design-system review touches the entire form library. A keyboard-trap fix in a custom date-picker that lives only in one route is a small change. A keyboard-trap fix in a custom date-picker that has been copy-pasted into eight teams’ routes over the past three years is a large change, because the consistent fix requires either eight parallel patches or a consolidation onto a single shared component first. The estimator does not need to be precise. It needs to be in the right order of magnitude — one engineer-day, ten engineer-days, fifty engineer-days, two hundred engineer-days — so that the triage call can compare two remediations with different shapes.

A useful heuristic the framework borrows from refactoring-cost estimation: cost grows linearly with LOC-touched up to about 50 lines and approximately with the square root of file coverage beyond about 5 files. A change touching 5 lines across 1 file is one engineer-day; the same fix replicated across 25 files is roughly five engineer-days, not twenty-five, because the second through twenty-fifth applications amortise the diagnostic and review overhead. The square-root scaling matters: it is the reason a design-system-level fix is so much cheaper per call-site than a per-team patch, and it is the central economic argument for paying down accessibility debt at the component level rather than the page level.
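A rough version of this heuristic fits in a few lines. The sketch below is calibrated to the worked example (5 lines in 1 file is one engineer-day) and, as a simplifying assumption, applies the square-root scaling from the first file onward, which is what reproduces the five-engineer-day figure for 25 files. It does not model fixes larger than the roughly 50 lines the heuristic covers. The name estimate_engineer_days and the LINES_PER_DAY constant are hypothetical.

```python
import math

# Assumed calibration: 5 lines touched in a single file costs one
# engineer-day, as in the worked example in the text.
LINES_PER_DAY = 5.0


def estimate_engineer_days(loc_touched: int, files_touched: int) -> float:
    """Order-of-magnitude cost: linear in LOC, square root in file coverage."""
    if loc_touched < 1 or files_touched < 1:
        raise ValueError("loc_touched and files_touched must be at least 1")
    per_site_days = loc_touched / LINES_PER_DAY
    # Repeat applications amortise diagnosis and review, so the cost of
    # spreading the fix grows with sqrt(files) rather than with files.
    return per_site_days * math.sqrt(files_touched)


single_file = estimate_engineer_days(5, 1)    # one input, one file
shared_fix = estimate_engineer_days(5, 25)    # same fix across 25 files
```

The output is meant to be read as an order of magnitude for the triage call, not as a commitment.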

The portfolio view

Once every accessibility finding has a severity score and a cost estimate, the engineering organisation has a portfolio, exactly analogous to the security-vulnerability portfolio or the performance-regression portfolio that already lives in the engineering scorecard. The portfolio is sliced two ways. Debt-by-component sums severity across all findings that live in a given React or Vue component, surfacing the components that carry the most accessibility risk per engineer-day of refactor. Debt-by-pillar sums severity across the four WCAG pillars, which WCAG itself calls principles (Perceivable, Operable, Understandable, Robust), surfacing which class of failure is structurally underweighted in the team’s design and review practices.

The debt-by-component slice is the one that drives quarterly investment decisions. If 60 percent of total severity sits in fifteen components — which is typical — then a quarterly engineering investment of 20 engineer-days into those fifteen components retires roughly 60 percent of severity, and that retirement compounds across every page that uses those components. The debt-by-pillar slice is the one that drives process decisions: if 70 percent of severity sits under “Operable” (keyboard, focus, time-limit failures) the team’s design review is letting Operable issues through and the fix is a design-review checklist, not a remediation sprint. If 70 percent sits under “Perceivable” (alt text, captions, contrast, sensory characteristics) the gap is in content production and the fix is an authoring-tool guardrail, not a development sprint. The portfolio view turns audit findings into investment theses, which is the form engineering leaders actually fund.
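Both slices are the same aggregation run over a different key. As a minimal sketch, assuming each finding is a plain dictionary with "component", "pillar" and "severity" fields (hypothetical field names), the two views and their percentage shares can be computed like this:

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def slice_debt(findings: Iterable[Mapping], key: str) -> Dict[str, float]:
    """Sum severity per value of `key`, largest total first."""
    totals: Dict[str, float] = defaultdict(float)
    for finding in findings:
        totals[finding[key]] += finding["severity"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def shares(totals: Mapping[str, float]) -> Dict[str, float]:
    """Convert totals into fractions of overall severity."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: value / grand_total for name, value in totals.items()}


# Illustrative data only; the numbers are made up for the example.
findings = [
    {"component": "DataGrid", "pillar": "Operable", "severity": 6.0},
    {"component": "DataGrid", "pillar": "Perceivable", "severity": 2.0},
    {"component": "Hero", "pillar": "Operable", "severity": 2.0},
]

by_component = slice_debt(findings, "component")
by_pillar = slice_debt(findings, "pillar")
pillar_mix = shares(by_pillar)
```

In this toy data the Operable share is 80 percent, which under the reading above would point at design review rather than at a remediation sprint.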

Three industry-specific examples

The same accounting framework produces materially different prioritisation in different industries, because the visit-rate multiplier and the user-impact tier are sector-specific. Three short walk-throughs make the point.

Fintech consumer app

A consumer fintech (digital bank, neobroker, payments wallet) carries a small number of extraordinarily high-traffic flows (onboarding, balance check, transfer, transaction history) that 95 percent of monthly active users hit. It also carries a long tail of edge-case screens (joint-account governance, beneficiary nomination, tax-statement export) that fewer than 1 percent of users see. Under the severity score the visit-rate multiplier collapses the long tail almost entirely: a serious violation on a tax-statement export scores below 0.1 even with a tier-1 user-impact multiplier, because base 7 × a visit-rate under 0.01 × tier 1 comes to less than 0.07. The portfolio compresses to perhaps 30 components that produce 90 percent of total severity, all of them in the four core flows. Fintech engineering leaders typically have the budget to retire that compressed portfolio in two quarters of focused investment, and the regulatory backdrop (the EU AI Act on automated decision-making, plus the penalties member states must set under Article 30 of the European Accessibility Act, which covers consumer banking services) turns the investment into both a risk hedge and a competitive moat against incumbents whose flows still contain keyboard traps.

EdTech learning platform

An EdTech platform (K-12 or higher-ed) carries the opposite traffic shape: a long tail of content pages (every lesson, every assignment, every assessment) where the visit-rate per individual page is low but the cumulative footprint is enormous. The visit-rate multiplier does not collapse the portfolio the way it does in fintech. It also carries a user-impact tier amplification not present in fintech: students with disabilities are a federally protected population in the US under Section 504 and the IDEA, and in the EU public-sector education bodies fall within the scope of the Web Accessibility Directive (Directive (EU) 2016/2102), subject to member-state exemptions for schools. The result is that a moderate violation on a single lesson page (base 4, visit-rate 0.001, impact-tier 1) scores only 0.004 on its own, but it cannot simply be ignored, because the violation pattern repeats across roughly 8,000 lessons and the summed severity comes to about 32. EdTech debt is best attacked at the authoring-tool layer, because a single fix in the lesson-template component retires the violation across every page rendered from that template. The debt-by-component slice almost always points at three or four template components that anchor the entire content library.

SaaS B2B platform

A B2B SaaS platform (CRM, ERP, HR, devtool, observability) carries a third shape: high-density data-grid interfaces, long-tail admin screens, and integration-configuration flows that are visited by a small number of users repeatedly. Visit-rate per page can be misleading; the right denominator is session-time, not unique visits, because a power user spends six hours a day inside the data grid. Under a session-time-adjusted visit-rate the data grid scores far higher than the marketing-style screens, even when fewer than 10 percent of seats touch it. The user-impact tier is also amplified: enterprise procurement increasingly carries an accessibility-aware RFP requirement, which means a single tier-1 violation in the data grid can lose a six-figure contract in the procurement-questionnaire stage. SaaS engineering leaders typically conclude that the right pay-down strategy is component-by-component within the data-grid library, with each released version of the library carrying a measurable severity reduction the procurement team can quote on the next RFP.
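One way to make the session-time denominator concrete is to define a surface's visit-rate as its share of total time spent in the product. The text does not fix a formula, so the sketch below is an assumption: session_time_share is a hypothetical helper, and the per-surface totals are assumed to come from the product's own analytics.

```python
from typing import Dict, Mapping


def session_time_share(hours_by_surface: Mapping[str, float]) -> Dict[str, float]:
    """Return each surface's fraction of total session time.

    The result can stand in for the unique-visit rate when scoring
    findings in products dominated by a few heavy-use surfaces.
    """
    total = sum(hours_by_surface.values())
    if total <= 0:
        raise ValueError("total session time must be positive")
    return {surface: hours / total for surface, hours in hours_by_surface.items()}


# Illustrative numbers only: 8 power users spending 6 hours a day in the
# data grid, against 12 hours a day spread across settings screens.
daily_hours = {"data-grid": 8 * 6.0, "settings": 12.0}
adjusted_rate = session_time_share(daily_hours)
```

With these made-up figures the data grid carries 80 percent of session time, so a finding there is weighted far more heavily than its seat count alone would suggest.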

A sample quarterly burn-down dashboard

Engineering organisations that track technical debt seriously publish a quarterly burn-down chart inside the engineering all-hands deck: total debt at start of quarter, debt retired during the quarter, debt added during the quarter (new findings from audits, new violations introduced by new features), debt at end of quarter. The accessibility-debt dashboard mirrors this exactly. The headline metric is total weighted severity: each open finding is scored as base × visit-rate × impact-tier on the 0-to-10 scale, and the scores are summed into a single portfolio number. A useful secondary metric is severity-per-thousand-pageviews, which controls for product growth: a dashboard that shows weighted severity falling while pageviews grow is a sign that the team is paying debt down faster than it is being introduced.
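The quarterly arithmetic behind the headline and secondary metrics is simple enough to sketch. The function names and the sample figures below are illustrative assumptions, not values from any real dashboard.

```python
def end_of_quarter_debt(start: float, retired: float, added: float) -> float:
    """Debt at end of quarter = start - retired + added."""
    return start - retired + added


def severity_per_thousand_pageviews(total_severity: float, pageviews: int) -> float:
    """Normalise total weighted severity by traffic to control for growth."""
    if pageviews <= 0:
        raise ValueError("pageviews must be positive")
    return total_severity / (pageviews / 1000)


# Illustrative quarter: 400 points of weighted severity at the start,
# 120 retired, 40 added by new audits and newly shipped features.
closing = end_of_quarter_debt(start=400.0, retired=120.0, added=40.0)
normalised = severity_per_thousand_pageviews(closing, pageviews=2_000_000)
```

Tracking the added figure separately from the retired figure is what lets the chart distinguish a team that is paying debt down from one that is merely discovering less of it.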

The dashboard’s other panels follow directly from the portfolio slices. Top 10 components by debt, with current severity and engineer-day estimate, plus a “fixed this quarter” annotation on components that moved off the list. Debt by WCAG pillar, as a stacked bar showing the Perceivable/Operable/Understandable/Robust mix and how it has shifted across the last four quarters. Debt added this quarter, broken out by whether the addition came from a new audit finding (existing latent debt that was discovered) or from a new violation introduced in a feature shipped during the quarter — that second number is the one that tells leadership whether the team’s design review and shift-left tooling are working. Forecast burn-down, projecting current-quarter velocity forward to estimate when total severity reaches a target threshold (typically the score at which the largest open enforcement risks are mitigated and the next round of procurement questionnaires can be answered with no caveat).
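The forecast panel can be approximated with a straight-line projection. This is a deliberately naive sketch that assumes net velocity (severity retired minus severity added per quarter) stays constant; quarters_to_target is a hypothetical name, and a real forecast would likely want a range rather than a point estimate.

```python
import math
from typing import Optional


def quarters_to_target(
    current: float, target: float, net_retired_per_quarter: float
) -> Optional[int]:
    """Whole quarters until total severity reaches the target threshold.

    Returns 0 if the target is already met, and None if net velocity is
    zero or negative, meaning the target is never reached at this pace.
    """
    if current <= target:
        return 0
    if net_retired_per_quarter <= 0:
        return None
    return math.ceil((current - target) / net_retired_per_quarter)


# Illustrative: 320 points open, target of 100, net 80 retired per quarter.
forecast = quarters_to_target(current=320.0, target=100.0, net_retired_per_quarter=80.0)
```

Returning None for a non-positive velocity keeps the dashboard honest: a team that adds debt as fast as it retires it has no burn-down date to report.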

The dashboard is consciously boring. It looks like every other engineering dashboard a VP of engineering already reads — same axes, same conventions, same quarterly cadence. That is the point. Accessibility debt has historically lived outside the engineering scorecard because it lacked a representation engineering leaders could read at a glance. Putting it on the same dashboard, in the same shape, with the same severity-by-likelihood logic the rest of the engineering function already uses, removes the cognitive overhead of treating accessibility as a special case. It becomes one more category of engineering risk that gets measured, traded off, and burnt down on a schedule — which is what it has always been.

Final thoughts

The framework above does not change what counts as an accessibility failure. WCAG defines that. It does not change which users are affected, or what the law requires. The regulatory map already defines that. What it changes is the shape of the information passed from auditors to engineering leaders. Accessibility findings that arrive as PDF audit reports get re-cast as Jira tickets with severity scores, cost estimates, and component tags — the same shape every other engineering risk arrives in. Triage becomes possible. Burn-down becomes measurable. Quarterly investment becomes a number the VP of engineering can defend in the budget conversation.

There is a softer effect too. Engineering teams are good at maintaining things they can measure and bad at maintaining things they cannot. Accessibility has spent two decades sitting just outside the measurement boundary — described in WCAG language, audited in compliance language, but never folded into the engineering-debt language that drives quarterly decisions. The cost of that exclusion is visible in every audit report that lands on a director’s desk and produces a single all-hands sprint of frantic remediation followed by another twelve months of regression. The fix is not more audits. The fix is putting accessibility on the same ledger as the rest of the engineering work, with the same severity math, the same cost estimator, and the same quarterly cadence. Engineering leaders who do this stop being surprised by the next audit. The audit becomes confirmation of what the dashboard already showed.