Money Is Spent and the Program Is a Rumor
A funded program is read off a dashboard the people delivering it learned to feed. The implementation literature has measured the gap between paper and practice for two decades, and the numbers are worse than most renewals assume.
YASHRAJ PATEL · LATENT VARIABLES
The number nobody can explain
A district buys a curriculum, a health system rolls a care bundle, a funder ties a grant to model fidelity, a chain ships a new in-store routine to six hundred locations. The money is already spent. Then the dashboard says adoption is partial, or says it is fine while the outcome metric sits flat, and nobody at headquarters can explain the gap, with a renewal weeks out. My argument is narrow: the gap is not mysterious. The implementation-science literature has measured it for twenty years, and the headline finding is that the picture of delivery a leader prices a renewal against is reliably wrong in a knowable direction. The program on the slide and the program in the room are two different programs, and the distance between them is better quantified than most realize.
I lead with the base rates because they are not in dispute. A review of QI work finds that 33 to 70 percent of documented QI gains are not sustained[1] once the funded period ends, a pattern the field calls improvement evaporation, and the change-management literature has long put the failure rate of organizational change near 70 percent. Dean Fixsen and Karen Blase, synthesizing the implementation research at NIRN[2], anchor on a number I trust precisely because it is so unflattering: under passive diffusion it takes roughly seventeen years for about 14 percent of original research to reach practice. Nothing reaches the floor on its own, and most of what is bought is bought as if it will.
FAILURE OF STRUCTURE OR WILL
Outside the instrument
- ~33% of licenses actually used (Glimpse K12 complement)
- 73% of licenses at least activated (LearnPlatform complement)
- The committed spend is asked about, if at all, after the renewal
- The dashboard is built from the optimistic procurement layer
FAILURE OF INFORMATION
What the listening recovers
- 67% of purchased school software licenses go unused (Glimpse K12)
- Up to 90% unused in some districts (Glimpse K12)
- 27% of licenses are never even activated (LearnPlatform)
- Over $1 billion a year in K-12 edtech licensing fees wasted (LearnPlatform)
Glimpse K12 (2019) and LearnPlatform via EdWeek Market Brief (2019) on K-12 edtech license waste.
Education has the cleanest measurement of the gap, because RAND built an instrument to catch it. The American Instructional Resources Survey runs parallel surveys of teachers and principals about the same materials[3], and the perception gap it exposes is the point: principals report providing standards-aligned curricula at far higher rates than teachers report using them, and in 2019 only about 34 percent of math teachers reported using a fully aligned material at all. That is the methodological lesson. Ask the layer above and the layer below about the same artifact and measure the distance, because the distance is the finding, and the dashboard is built from the optimistic layer.
The waste numbers are the part that should embarrass the procurement side. Glimpse K12 found that roughly 67 percent of purchased school software licenses go unused[4], up to 90 percent in some districts, and LearnPlatform put over a billion dollars a year in K-12 edtech licensing fees as wasted[5], with about 27 percent of licenses never even activated. TNTP's Mirage put teacher development near 18,000 dollars per teacher per year[6], with no evidence that any identifiable kind or amount of it consistently improved teaching. The money is committed; whether it bought anything is asked, if at all, after the renewal.
Why the canon keeps reinventing the interview
What convinced me this is structural is that the methods built to diagnose it converge, from different disciplines, on the same move. None trusts the system of record; all of them ask a specific person about a specific recent delivery. That repetition is the tell.
Laura Damschroder's Consolidated Framework for Implementation Research[7] is the de facto standard for diagnosing why a funded program is not being used, and its operational core is not the five domains everyone cites. It is the difference-maker design: interview at multiple sites, deliberately including high and low performers, rate each construct per site, and find the constructs whose ratings separate the sites that adopted from those that did not. The guidance is emphatic on a point most survey-based evaluations get wrong, that the sample questions must never be run as a script or a checklist, because that kills the conversation that produces the determinant. The diagnosis lives in talk, not in items.
Carl May and Tracy Finch's Normalization Process Theory[8] sharpens what that talk should be about. NPT's claim is that a new practice becomes routine through the work people do to embed it, not their attitudes toward it, so the instruction is to ask about the work the program creates and displaces rather than belief in it. An engagement survey measures sentiment; NPT measures whether anyone can even distinguish the new thing from what they did before, the precondition for everything downstream and invisible to a satisfaction score.
“The sample questions must not be used as an interview script or checklist, because doing so undermines the conversational interviewing that surfaces determinants.”
Fixsen's competency-driver point names the single most expensive misdiagnosis in the field. Training alone does not change practice; coaching is what does, and coaching is exactly what supervision logs cannot see, because they record that a session occurred and not what it contained. Russell Glasgow's RE-AIM framework[9] adds the discipline I respect most operationally: pair every metric with a who-and-how question only an interview can answer. Reach is which people were missed and why; maintenance is what quietly stopped after the funding ended. The number says something moved; only the conversation says what.
On whether a local change is legitimate adaptation or quiet drift, Shannon Wiltsey Stirman's FRAME[10] is the instrument I trust, and its authors are honest about the catch: it works precisely where providers are reluctant to report that they removed or shortened a core element. So the elicitation cannot ask for confessions of noncompliance; it asks for recent concrete deliveries and reads the drift out of the story. TNTP's Opportunity Myth applied that logic at scale[11], collecting roughly 30,000 student work samples instead of asking adults what was taught, and found students saw grade-level material only 26 percent of the time and showed grade-level mastery 17 percent. That is the move the whole canon shares: stop asking what is delivered, collect what was actually given.
The dashboard is often fiction, and the staff know it
RAND American Instructional Resources Survey, 2019: parallel principal and teacher reporting on the same materials.
Here is the finding that should change how a renewal is priced, more than any single adoption percentage. In a large share of these programs, the compliance data the decision rests on is not a noisy measurement of reality. It is a separate artifact, produced to satisfy a report, and the people producing it know it is fiction.
The frontline practitioner knows which components get skipped on a heavy shift, while the field records only that the session happened, and admitting the omission feels like writing yourself up. The first-line supervisor knows that last week's supervision was crisis triage, not the coaching the fidelity plan assumes, and the log shows occurrence either way. The data coordinator knows which fields get batch-backfilled at the deadline; the pilot champion knows the unbudgeted hours of reminders and cleanup the success ran on, none of it traveling to the next site. The clinical pharmacist knows which alerts everyone clicks through unread, which matters because override rates run 90 to 96 percent in many studies while about half of overrides are clinically appropriate, so the log alone cannot tell a dangerous bypass from a sensible one. Each holds a piece the dashboard cannot represent, and each has a reason to keep it off the record.
That is why I distrust treating partial adoption as a motivation problem to be solved with more training. The frontline is usually not refusing the program. It is reconciling an impossible task list, running a workaround the official process quietly depends on, or substituting for a protocol element that cannot run as written and reporting the substitution in the model's own vocabulary to keep the funding. Those are different diagnoses with opposite fixes, and a compliance number cannot tell them apart.
Weeks, not months, and why that finally matters
Gale et al., rapid versus traditional qualitative analysis using CFIR: consistent conclusions, traditional approach 69 days longer.
The standard objection is time. Real qualitative evaluation, coding transcripts to constructs and rating determinants by site, has historically taken twelve to twenty-four months, and a renewal does not wait. That objection is now weaker, and the evidence is specific. Alison Hamilton's rapid qualitative analysis at the VA[12] replaces full transcription and line-by-line coding with structured summary templates that roll into a respondent-by-domain matrix, so cross-site patterns surface in days. A head-to-head study by Gale and colleagues[13] found that rapid analysis reached conclusions consistent with traditional CFIR coding while the traditional approach took 69 days longer. That is the proof that weeks-not-months diagnosis is methodologically defensible rather than a corner cut, and the closest existing analog to interviewing at scale that I know of.
The threads converge on one mechanism. The information that explains a stalled program rarely lives in the data systems. It lives in the heads of the people doing the work, and it stays there because every channel built to collect it punishes honesty. Tell the auditor what you really skip and you have documented your own noncompliance. So the field gets coded clean, the clean field averages into a compliance rate, and the rate reaches the board as a color while the floor runs a different program. The sum is a renewal priced against a delivery picture the people delivering it already know is false.
The grim joke is that the truth is not missing. Every framework above is, underneath its vocabulary, describing the same recovery move: get a neutral party to ask a specific person about a specific recent delivery, somewhere the honest answer cannot be used against them. None of it is exotic. Organizations lack the answer not because it is unknowable, but because the only channels they built to ask are the ones the floor learned to feed.
Which points at the kind of instrument this calls for. Not another compliance dashboard, which collects the safe number by design, and not a twelve-month academic evaluation that lands after the renewal is signed. Something closer to what the canon has described for two decades and few have run at scale: a neutral, confidential conversation anchored to a real recent delivery rather than a feeling on request, run for enough people that no answer traces to one person, and returned in weeks. That is the instrument we are building at Latent Variables. The literature already measured where the truth sits. The open problem was only ever reaching it before the renewal commits.
REFERENCES
- 1.Sustainability of quality-improvement initiatives: review finding 33 to 70 percent of gains not sustained (improvement evaporation). pmc.ncbi.nlm.nih.gov/articles/PMC7808047
- 2.Dean Fixsen, Karen Blase et al., NIRN, An Overview of the Active Implementation Frameworks; Module 3 on the 17-year / 14 percent gap and letting/helping/making it happen. implementation.fpg.unc.edu/wp-content/uploads/Active-Implementation-Overview-M1.pdf
- 3.RAND, American Instructional Resources Survey: parallel teacher and principal surveys; ~34 percent of math teachers used a fully aligned material in 2019. www.rand.org/pubs/research_reports/RRA134-1.html
- 4.Glimpse K12 analysis: roughly two-thirds of purchased school software licenses go unused, up to 90 percent in some districts. www.globenewswire.com/news-release/2019/05/15/1825260/0/en/Glimpse-K12-Analysis-of-School-Spending-Shows-that-Two-Thirds-of-Software-License-Purchases-Go-Unused.html
- 5.LearnPlatform via EdWeek Market Brief: more than $1 billion a year in K-12 edtech licensing fees wasted; ~27 percent of licenses never activated. marketbrief.edweek.org/education-market/more-than-1-billion-in-k-12-ed-tech-licensing-fees-go-to-waste/2019/11
- 6.TNTP, The Mirage (2015): ~$18,000 per teacher per year on development, ~$8 billion across the 50 largest districts, no consistent improvement signal. tntp.org/assets/documents/TNTP-Mirage_2015.pdf
- 7.Laura Damschroder et al., Consolidated Framework for Implementation Research; CFIR User Guide, Implementation Science 2025, and the CFIR interview guidance against script use. cfirguide.org/cfir-implementation-research
- 8.Carl May and Tracy Finch, Normalization Process Theory and the NPT Toolkit: ask about the work a practice creates and displaces, not attitudes toward it. www.normalizationprocess.org/npt-toolkit
- 9.Russell Glasgow, RE-AIM and RE-AIM QuEST: pair each metric (Reach, Effectiveness, Adoption, Implementation, Maintenance) with a who/how question. re-aim.org
- 10.Shannon Wiltsey Stirman et al., The FRAME, Implementation Science 2019: characterizing modifications, especially where providers are reluctant to report drift. pmc.ncbi.nlm.nih.gov/articles/PMC6554895
- 11.TNTP, The Opportunity Myth (2018): ~30,000 work samples; grade-level material seen 26 percent of the time, mastery 17 percent. opportunitymyth.tntp.org
- 12.Alison Hamilton, rapid qualitative analysis (VA QUERI): structured summary templates and respondent-by-domain matrices in place of full coding. implementationscience.biomedcentral.com/articles/10.1186/s13012-024-01397-1
- 13.Gale et al., rapid versus traditional qualitative analysis using CFIR: consistent conclusions, with the traditional approach taking 69 days longer. www.ncbi.nlm.nih.gov/pmc/articles/PMC8252308