APPLICATION · NO. 14 · JUNE 2, 2026

Policies as Administered

Public systems get specified from the policy as written. The policy as administered lives in the heads of caseworkers and COBOL maintainers, and the literature has known for twenty years that this is where the projects die.

NOAH ALEXANDER · LATENT VARIABLES

The requirement that was never written down

A state signs a vendor to replace a forty-year-old unemployment mainframe, and requirements lock in three months. Somewhere in the building, a caseworker keeps a printed grid of override codes that is the real eligibility logic for the exception-heavy fifth of cases, and a programmer near retirement is the only person who can safely change the rate calculation, hardcoded in three places nobody else reads. Neither fact will appear in the requirements document, because that document captures the policy as written, and what determines whether the system works on launch day is the policy as administered. That gap is not a footnote to public-sector failure; it is the mechanism, and the striking thing is how long the literature has said so plainly while the same projects keep failing the same way.

Start with the base rate. 18F's De-risking Government Technology field guide^[1] is built on a brutal number from the Standish data: only about 13 percent of government software projects over six million dollars succeed. The guide's diagnosis of why is not about technology but procurement discipline. Requirements get written without the people who operate the system, scope and date and budget all get fixed before anyone has touched the legacy code, and quality gets judged by document review instead of working software. The pitfalls that sink these projects, in their words, are only discoverable by actually working with the software code of a system. Paper cannot surface them, and paper is what requirements are made of.

[Figure: Where California's unemployment claims narrowed to one step. Source: California EDD Strike Team Detailed Assessment (Sept 2020): ~60% of claims auto-approved; of the ~40% needing manual review, 78% stalled on identity verification.]

The bill is public record. Rhode Island's UHIP benefits system ballooned to roughly 492 million dollars with two class-action settlements. California's CCMS court system was killed in 2012 after about 500 million dollars. On HealthCare.gov, GAO found CMS obligated about 840 million dollars^[2], and the launch nearly took down a presidency. And Michigan's MiDAS should haunt anyone tempted to automate around the frontline: it auto-adjudicated fraud with no human review, falsely accused tens of thousands of claimants^[3], and is estimated to cost 78 million dollars to replace. None of these was undone by a missing feature. Each was undone by a picture of the work that was wrong and a structure that kept the correction from reaching the spec writers.

The diagnosis the canon keeps reaching

What convinces me this is structural is that the people who have rescued these programs describe the same recovery move, arrived at independently. Jennifer Pahlka's central claim in Recoding America^[4] is the cleanest statement of the disease: the separation between policy and implementation is false, and the gap between policy as written and policy as administered is where programs die. Her sharpest line names the incentive exactly. Implementers are judged on process compliance, not outcomes, so deviating from procedure can get you fired while a bad outcome only produces an awkward hearing. Nowhere in a government requirements document, she points out, is there a requirement that the service actually work. That is not cynicism but an accurate description of what the organization rewards.

The EDD Strike Team is the proof of concept. Given forty-five days to diagnose California's unemployment backlog, Pahlka and Yolanda Richardson interviewed hundreds of staff below the leadership layer, followed claims end to end, and found that of those needing manual review, about 78 percent were stuck on a single step^[5], identity verification, which was not even meaningfully preventing fraud. That fact was known in fragments on the call floor and invisible in every status report. It reads as a story about a clever team; it is really a story about location. The decisive fact existed inside the building the whole time, and leadership did not have it because the channels that carry information up carry the safe version, which had averaged the identity stall into a generic queue number.
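The two percentages in the Strike Team assessment can be combined into a share of all claims. The sketch below is a worked calculation, assuming the roughly 40 percent manual-review share and the roughly 78 percent identity-verification share describe the same claim population; the function and variable names are illustrative and do not come from the assessment.

```python
def stalled_share_of_all_claims(manual_review_share: float,
                                stalled_share_of_manual: float) -> float:
    """Share of ALL claims stuck on one step, given two nested shares."""
    return manual_review_share * stalled_share_of_manual


# Approximate figures from the EDD Strike Team assessment, as cited in the text.
manual_review_share = 0.40   # claims that were not auto-approved
stalled_on_identity = 0.78   # of manual-review claims, stuck on identity verification

share = stalled_share_of_all_claims(manual_review_share, stalled_on_identity)
print(f"{share:.1%} of all claims")
```

Under that assumption, roughly three in ten of all claims sat behind a single step, which is the kind of concentration a generic queue number averages away.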

[Figure: The bill for public deliveries that went wrong. Sources: GAO-14-694 (HealthCare.gov ~$840M); "Automated Stategraft," Wisconsin Law Review (Michigan MiDAS ~$78M to replace); with Rhode Island UHIP and California CCMS as cited in the prose.]

“There was literally no dashboard. There was no place to find out whether the site was up or down, except for watching CNN.”

MIKEY DICKERSON, ON ARRIVING AT THE HEALTHCARE.GOV RESCUE

Mikey Dickerson reached the same floor from the engineering side and gave it a procedure. His HealthCare.gov rescue^[6] ran on twice-daily stand-ups under three posted rules: no finger-pointing, knowledge rather than rank determines who talks, focus only on what hurts in the next forty-eight hours. Those rules exist because the contractor staff already knew the truth and corporate and political rank were suppressing it. He could not even find who was responsible for keeping the site up: there were fifty-five contractors and no owner, and that ownership vacuum is itself the diagnostic. Dickerson and Pahlka describe one finding in two vocabularies: the knowledge is on the floor, and the org chart holds it there.

The demand side is the most under-appreciated piece of the canon. John Seddon's concept of failure demand^[7], demand caused by a failure to do something right for the citizen, commonly runs 50 to 80 percent of all incoming contact in UK local authorities and police forces. In a call center where four of every five calls exist only because something upstream failed, every dollar of added capacity buys more capacity to absorb the agency's own defects. Seddon has leaders themselves classify real calls as value or failure demand, so they cannot dismiss the data as someone else's. I would push his framing further: failure demand is not only a measure of waste, it is a distributed map of which letters and portal screens are broken, held by the agents who hear it live and recorded nowhere because wrap-up codes are chosen for speed.
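The classification exercise described above reduces to a simple tally. The following is a minimal sketch under stated assumptions: the call records, the cause labels, and both function names are hypothetical, and it is not a description of the Vanguard Method's own tooling. It computes the failure-demand share and counts which upstream artifact each failure call points at.

```python
from collections import Counter


def failure_demand_share(classified_calls):
    """Fraction of classified calls that were labelled 'failure'."""
    counts = Counter(label for _, label in classified_calls)
    total = counts["value"] + counts["failure"]
    if total == 0:
        raise ValueError("no classified calls")
    return counts["failure"] / total


def causes_of_failure(classified_calls):
    """Count failure-demand calls by the upstream artifact the caller cited."""
    return Counter(cause for cause, label in classified_calls if label == "failure")


# Hypothetical sample: (what the caller was asking about, classification).
sample = [
    ("new claim", "value"),
    ("notice letter unclear", "failure"),
    ("notice letter unclear", "failure"),
    ("notice letter unclear", "failure"),
    ("portal login loop", "failure"),
]

share = failure_demand_share(sample)
top = causes_of_failure(sample).most_common(1)
print(f"failure demand: {share:.0%}; most cited cause: {top[0][0]}")
```

The second function is the point of the exercise: the share measures waste, while the per-cause count is the map of broken letters and screens that a speed-optimized wrap-up code discards.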

[Figure: How much incoming contact is the agency's own failure. Source: John Seddon / Vanguard Method: failure demand commonly runs 50-80% of all incoming demand in UK local authorities and police forces.]

Pamela Herd and Donald Moynihan supply the part that decides whether a fix is even possible. Their administrative-burden framework^[8] decomposes every citizen-state interaction into learning, compliance, and psychological costs, and asks, for each burden, whether it traces to statute, regulation, an agency memo, or pure habit. That lineage decides whether the fix needs a legislative session or an afternoon, and it is not academic: administrative policy changes alone explained about 28.5 percent of the rise in SNAP participation between 2007 and 2011. But the lineage is almost never written down. It is institutional memory, carried by an analyst who half-remembers a consent order from the nineties, so an agency can spend a session asking the legislature to fix what a memo could have undone. Their reframing of burdens as policymaking by other means stops you treating attrition as an accident.

Why the structure produces the failure

Pull these threads together and they converge on a single mechanism, which is the thing I actually believe. The information that predicts whether a public delivery will work does not live in the data systems. It lives in the heads of the people doing the work, and it stays there because every channel built to carry it upward rewards the safe version. A caseworker who writes a workaround into a ticket invites discipline. A contractor engineer whose green status is a contract deliverable cannot file the red one. A manager who tells appointees the launch will fail has ended a career. So the grid stays off the record, the agent picks the fast wrap-up code, the supervisor rounds the floor up into a compliance number, and the number averages into a color that reaches the appointee as green. Each step is locally rational. The sum is a decision-maker structurally the last to know what the frontline knew first.
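The averaging step in that chain can be shown with a toy roll-up. This is an illustrative sketch only: the health scores, the color thresholds, and the function names are all assumptions, and no agency's actual reporting scheme is being described. It contrasts a mean-based roll-up with one that reports the worst workstream.

```python
from statistics import mean


def color(score: float) -> str:
    """Map a 0-1 health score to a status color (thresholds are assumptions)."""
    if score >= 0.8:
        return "green"
    if score >= 0.6:
        return "yellow"
    return "red"


def rollup_by_average(scores):
    """Status as it often climbs: the mean of all workstreams."""
    return color(mean(scores))


def rollup_by_worst(scores):
    """Status that preserves the floor's signal: the weakest workstream."""
    return color(min(scores))


# Hypothetical workstream scores; one workstream is failing badly.
scores = [0.95, 0.95, 0.95, 0.95, 0.30]

print("average roll-up:", rollup_by_average(scores))
print("worst-case roll-up:", rollup_by_worst(scores))
```

With these made-up numbers the averaged status is green while the worst-case status is red. Nobody in the chain has to misreport for the failing workstream to vanish; the aggregation rule does it.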

And the date makes it worse, because it is usually political before it is technical. A statute, a court order, a commitment given in testimony sets a launch day, and once the day is public it stops being a variable. The GAO High-Risk List criteria^[9], leadership commitment, capacity, a root-cause plan, monitoring, and demonstrated progress, are a fair x-ray of where these programs are weak, and the two they fail silently and earliest are capacity and monitoring, the ones nobody wants to confess. This is why Gary Klein's premortem^[10] is the sharpest instrument in the canon for this domain. Declaring the project already failed a year out and asking each person why raises correct identification of failure reasons by about 30 percent over asking what might go wrong, and it gives cover to the person who would never volunteer in a meeting that the launch is doomed. It routes around the incentive that produced the silence.

Taking the literature seriously means admitting where it stops. Not every failed delivery is a failure of information that better listening would have caught. Some are failures of structure or will: the thesis was wrong and the program should not have shipped, the capital for a real fallback was never appropriated, leadership would not make the no-go call because the date had been promised. No amount of asking the frontline fixes those. But the canon lets you tell the two apart early, and the post-mortems agree the first kind dominates: programs die, again and again, on knowledge that existed at the frontline and never reached the seat that decides.

The quiet scandal is that the recovery move is not exotic and not new. Pahlka's interviews below the leadership layer, Dickerson's knowledge-over-rank stand-ups, Seddon's leaders classifying real calls, Klein's prospective hindsight: every one is a version of the same act, getting a neutral party to ask a specific person about a specific recent case, somewhere the honest answer cannot be used against them. Agencies lack the picture not because it is unknowable, but because the only channels they built for asking are the ones the frontline learned long ago to answer with the safe version.

Which points at the kind of instrument this calls for. Not another status dashboard, which launders the floor's truth into a color before it climbs. Not another requirements workshop, which captures the policy as written by design. Something closer to what the canon has described for twenty years and few have run at scale before the contract is signed: a neutral, confidential conversation, anchored to a real recent case rather than an opinion on request, run with enough caseworkers and clerks and maintainers that no answer traces to one person. That is the instrument we are building at Latent Variables. The literature already told us where the administered policy sits; the problem was only ever reaching it before the requirements lock.

REFERENCES

1. 18F, De-risking Government Technology: Federal Field Guide (revised 2024). Base rate of ~13% success for government software projects over $6M, from Standish data. guides.18f.gov/derisking
2. GAO-14-694, Healthcare.gov: Ineffective Planning and Oversight Practices. CMS obligated ~$840M across 62 contracts through March 2014. www.gao.gov/products/gao-14-694
3. Wisconsin Law Review, "Automated Stategraft": Michigan's MiDAS auto-adjudicated fraud with no human review and falsely accused tens of thousands. wlr.law.wisc.edu/automated-stategraft-faulty-programming-and-improper-collections-in-michigans-unemployment-insurance-program
4. Jennifer Pahlka, Recoding America (2023): the policy/implementation split is false, and the gap between policy as written and as administered is where programs die. www.recodingamerica.us
5. California EDD Strike Team Detailed Assessment (Sept 2020): ~78% of manual-review claims stalled on identity verification; diagnosis by interviewing staff below leadership and following claims end to end. www.govops.ca.gov/wp-content/uploads/sites/11/2020/09/Assessment.pdf
6. TIME, "Obama's Trauma Team," on Mikey Dickerson and the HealthCare.gov rescue: no dashboard, 55 contractors with no owner, knowledge-over-rank stand-ups. time.com/10228/obamas-trauma-team
7. John Seddon, the Vanguard Method and failure demand: demand caused by a failure to do something right for the citizen, commonly 50-80% of incoming contact in UK local government. beyondcommandandcontrol.com/how-to-study/step-2-studying-demand
8. Pamela Herd and Donald Moynihan, Administrative Burden: Policymaking by Other Means (2018); JPART 2015. Administrative changes explained ~28.5% of SNAP participation growth 2007-2011. academic.oup.com/jpart/article-abstract/25/1/43/885957
9. GAO-21-119SP, High-Risk Series: five criteria for assessing high-risk programs, including leadership commitment, capacity, and monitoring. www.gao.gov/products/gao-21-119sp
10. Gary Klein, "Performing a Project Premortem," Harvard Business Review (2007). Prospective hindsight raises identification of failure reasons by ~30%. hbr.org/2007/09/performing-a-project-premortem
