Pilots Fail and the Floor Knew Why
Thirty to forty billion dollars of enterprise AI spend, and almost none of it shows up in the numbers. The reason is not the model. It is that the organization measures its own AI use by asking the people who have learned to lie about it.
NOAH ALEXANDER AND YASHRAJ PATEL · LATENT VARIABLES
Thirty billion dollars, no return
Start with the number that should have ended the conversation about whether AI is working in the enterprise. MIT's Project NANDA put roughly 95 percent of corporate generative-AI pilots at no measurable return on profit and loss[1], against thirty to forty billion dollars of spend. Read it slowly, because the framing it invites is wrong. The instinct in the room is that the models are not ready, that the tools fell short, that next year's release will close the gap. That is the comfortable reading, and the evidence does not support it. The same report found that purchased tools cross into real workflows about twice as often as internal builds, which is not a story about model capability at all. It is a story about whether the work was understood before the money moved. NANDA's own phrase for the divide is learning: the systems that paid off were the ones that fit how people actually worked, and the ones that did not were priced off a picture of the work that nobody checked.
Set the older base rate next to it and the pattern hardens. The MIT Sloan Management Review and BCG program has tracked this for years, and at its low point found only about one company in ten getting significant financial benefit from AI[2]. Two studies, roughly five years apart, different methods, same floor. The failure is not new and it is not improving on the schedule the vendors promised, which tells you the binding constraint is not the silicon. It is upstream, in the organization, in the gap between what leadership believes the work is and what the work actually is. And that gap has a specific, measurable cause that almost no readiness assessment is built to see.
MIT Project NANDA, "The GenAI Divide: State of AI in Business 2025."
The instrument is pointed at people who are hiding
Here is the finding we keep returning to, because it dissolves the usual diagnosis. In the 2025 KPMG and University of Melbourne global study, 57 percent of employees said they hide their use of AI and present AI-generated work as their own[3]. The sample is not a focus group. It is more than 48,000 people across 47 countries, the largest read of working attitudes toward AI that exists. More than half of the workforce is concealing the exact behavior that every adoption dashboard, every readiness survey, every rollout metric is trying to measure. The instrument the organization uses to assess its own AI maturity is pointed directly at the population that has decided, rationally, not to answer it honestly.
The concealment is not idle. Microsoft's 2024 Work Trend Index found that 78 percent of AI users bring their own tools to work[4], and that more than half are reluctant to admit using AI on their most important tasks for fear of looking replaceable. So the use is heavy, it is personal, it runs on accounts the company never provisioned, and it is densest on the work that matters most. McKinsey sized the resulting blindness: leaders estimate that 4 percent of employees use generative AI for at least 30 percent of their daily work, while 13 percent of employees say they actually do. The people setting the strategy underestimate its current state by a factor of about three, in the direction that guarantees they misprice the plan.
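The size of that gap is easier to feel as headcount. A minimal sketch of the arithmetic, using the two McKinsey percentages quoted above; the 10,000-person headcount is a hypothetical figure chosen only to make the numbers concrete:

```python
# Shares quoted above (McKinsey): employees using generative AI
# for at least 30 percent of their daily work.
leader_estimate = 0.04   # what leaders believe
employee_report = 0.13   # what employees say about themselves

# Hypothetical headcount, an assumption for illustration only.
headcount = 10_000

# How many times larger the self-reported share is than the leader estimate.
ratio = employee_report / leader_estimate

# Heavy users the leadership estimate does not account for.
unseen_heavy_users = round((employee_report - leader_estimate) * headcount)

print(f"Employees report {ratio:.2f}x the heavy use leaders assume")
print(f"Heavy users missing from the plan: {unseen_heavy_users} of {headcount}")
```

At that assumed size, the plan is written as if 400 people were heavy users when 1,300 say they are.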
- The maturity survey asks the abstract question: do you use AI?
- The dashboard records the safe answer from the half of the workforce concealing it
- 57% hide their AI use and present the output as their own (KPMG, 2025)
- 52% are reluctant to admit using AI on their most important tasks (Microsoft, 2024)
KPMG and University of Melbourne, "Trust, Attitudes and Use of AI" 2025 (48,000+ respondents, 47 countries); Microsoft and LinkedIn, 2024 Work Trend Index.
“You cannot survey this out of people. You have to remove the reasons to hide and watch what surfaces.”
The person who named this most precisely is Ethan Mollick, who calls the hidden users secret cyborgs[5], and his account of why they hide is the part that should reorganize how readiness gets assessed. He gives three reasons, and none of them is irrational. Policy fear: a ban does not stop the use, it pushes it onto personal phones where nobody can see it. Social penalty: the same work is judged worse once a colleague learns AI touched it, so the smart move is to never let them learn. Job-security fear: the employee who reveals they automated most of a task has just volunteered for the next round of cuts. Each fear is a correct read of the local incentives. Add them up and you get an organization where the most valuable information about its own AI capability is held by the people with the strongest reasons to keep it. You cannot ask your way past that with a survey, because the survey is the thing they are hiding from.
What the experts have always known about self-report
This is not a new failure of measurement; it is an old one wearing new clothes. Sam Ransbotham and the MIT Sloan-BCG group caught it cleanly[6]: ask workers in the open whether they use AI and roughly two thirds say barely at all, then walk them through concrete product examples and 43 percent admit regular use. The use did not change. The honesty did. Self-report of AI use is unreliable unless it is anchored to specific tools and specific recent tasks, and almost no readiness instrument anchors anything. It asks the abstract question and records the safe answer.
The cognitive-science tradition settled this point decades before anyone was hiding a chatbot. Gary Klein built the Critical Decision Method on the discovery that experts cannot state what they know when asked directly[7]; you have to take them back through one real, non-routine incident and probe the cues, the assessment, the options they considered and rejected. Laura Militello and Robert Hutton turned that into Applied Cognitive Task Analysis[8], whose probes are episodic by construction and in whose validation about 93 percent of the elicited content was judged cognitively relevant. The doctrine that runs through all of it, made explicit in Crandall, Klein, and Hoffman's standard text, is blunt: espoused descriptions of how people work are systematically misleading. People narrate the procedure, not the practice. So an organization that wants to know which tasks AI already handles well has to do the same thing Klein did with fireground commanders, which is the opposite of what a maturity questionnaire does.
And the task is the right unit, not the job, which is the second thing the canon settled early. Brynjolfsson, Mitchell, and Rock scored all 18,112 O*NET tasks across 950 occupations on a suitability-for-machine-learning rubric[9] and found that most occupations contain some highly suitable tasks while almost none is automatable end to end, so the value comes from re-bundling tasks inside a job rather than swapping a person for a machine. Jesuthasan and Boudreau built the standard redesign playbook on the same premise[10]: deconstruct the job into tasks, decide per task whether to substitute, augment, or create new work. A program that picks its AI targets off job descriptions instead of the real task composition is choosing the wrong unit before it spends a dollar, and the job description was stale anyway.
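To make the task-as-unit point concrete, here is a minimal sketch of a job deconstructed into tasks, each carrying its own substitute, augment, or create decision. The role, task names, hours, and decisions are all hypothetical, and the `Task` and `summarize` names are ours for illustration, not taken from Jesuthasan and Boudreau:

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SUBSTITUTE = "substitute"  # AI takes the task over
    AUGMENT = "augment"        # AI assists the person doing the task
    CREATE = "create"          # new work that exists because of the AI


@dataclass(frozen=True)
class Task:
    name: str
    hours_per_week: float
    decision: Decision


def summarize(tasks):
    """Total weekly hours per decision, so the job reads as a bundle of tasks."""
    hours = Counter()
    for task in tasks:
        hours[task.decision] += task.hours_per_week
    return dict(hours)


# Hypothetical underwriter role; every task and hour figure is made up.
underwriter = [
    Task("collect applicant documents", 10.0, Decision.SUBSTITUTE),
    Task("draft risk summary", 12.0, Decision.AUGMENT),
    Task("decide exception overrides", 14.0, Decision.AUGMENT),
    Task("review model-flagged cases", 4.0, Decision.CREATE),
]

for decision, hours in summarize(underwriter).items():
    print(f"{decision.value:<10} {hours:>5.1f} h/week")
```

A job-level label such as "automatable" or "not automatable" would erase exactly the mix this breakdown shows, which is why the inputs have to come from the real task composition and not the job description.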
The frontline stall, and the breach nobody logged
Two more numbers close the case, because they show the cost of the blindness landing in two different places. The first is adoption that dies in the middle of the organization. BCG's 2025 AI at Work survey put regular frontline use at about 51 percent against roughly 72 percent across all respondents[11]. The frontline trails the overall figure by about twenty points, and leaders and managers necessarily sit above it, which means any readiness read taken at the top is describing a different organization than the one doing the work. The executive who reports that adoption is healthy is reporting on the floor they can see, which is the one most like their own.
BCG, "AI at Work 2025: Momentum Builds, but Gaps Remain."
The second number is where the hidden use turns into a liability. IBM's 2025 Cost of a Data Breach report found that breaches involving shadow AI carried a premium of about 670,000 dollars[12], and that only 17 percent of organizations had any technical control on what employees upload to public AI tools. Read those two figures together. The use is hidden because disclosure is penalized, the upload is unblocked because the controls were never built, and the first time most firms learn the real extent of their shadow AI is from a breach invoice. The same concealment that makes the productivity invisible to the strategy makes the exposure invisible to compliance, right up until it is a finding rather than a question.
McKinsey, "Superagency in the Workplace" 2025 (leader vs employee heavy-use estimate); IBM, Cost of a Data Breach Report 2025 (upload controls); Microsoft and LinkedIn, 2024 Work Trend Index (BYOAI).
The information already exists
Put the pieces in one line and the diagnosis is hard to escape. Ninety-five percent of pilots return nothing, not because the models cannot do the work, but because the work was priced off a picture nobody verified. The picture is wrong because more than half the workforce hides the behavior the picture is supposed to capture. They hide it for reasons that are individually correct. The signal degrades further at every level it climbs, so the frontline knows one thing, the manager rounds it up, the dashboard averages it, and the board receives a color. The strategy is then set against the color. This is not a knowledge problem in the sense that the answer is unknown. The answer is known, in detail, by named people in the building. It is a reaching problem. Every channel the organization built to ask the question is one the floor learned long ago to lie into, because answering honestly carries a cost and lying carries none.
What makes this worse than an ordinary measurement gap is that the people who know most have the strongest reason to say least. The frontline worker who quietly automated a task will not raise a hand. The top underwriter whose override logic a model would have to encode cannot state it on a form, and would not if she could, because the form goes to the people deciding whether her seat survives. The knowledge that would price the program correctly is precisely the knowledge the program's own instruments repel.
Which points at the kind of instrument the problem actually requires, and the canon has been describing it the whole time without operationalizing it at scale. Not a maturity survey, which records the safe answer by design. Not a leadership self-assessment, which reads the one floor the leader already understands. Something closer to what Klein did with the fireground and Mollick prescribes for the secret cyborgs: a neutral, confidential conversation, anchored to a real recent week rather than an abstract question, run by a party with nothing to gain and reported only in aggregate so no answer traces to one person. Remove the reasons to hide and the use surfaces. That is the instrument we are building at Latent Variables. The base rates already told us where the truth sits and how much it is worth. The only open problem is reaching it before the next pilot is funded against a number no one checked.
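One way to read "reported only in aggregate" is a minimum group size below which nothing is published. The sketch below is an illustration under that assumption, not a description of the Latent Variables instrument; the threshold of five, the group names, and the `aggregate_usage` function are all hypothetical:

```python
from collections import defaultdict

# Assumed threshold; the article does not specify one.
MIN_GROUP_SIZE = 5


def aggregate_usage(responses, min_group_size=MIN_GROUP_SIZE):
    """Share of respondents reporting AI use, per group.

    responses: iterable of (group, used_ai) pairs, one per conversation.
    Groups with fewer than min_group_size respondents are left out of the
    report entirely, so a small team's answers cannot be traced to a person.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [respondents, users]
    for group, used_ai in responses:
        counts[group][0] += 1
        counts[group][1] += int(used_ai)

    report = {}
    for group, (respondents, users) in counts.items():
        if respondents >= min_group_size:
            report[group] = {
                "respondents": respondents,
                "share_using_ai": users / respondents,
            }
    return report


# Hypothetical responses: six people in claims, two in underwriting.
responses = (
    [("claims", True)] * 4
    + [("claims", False)] * 2
    + [("underwriting", True)] * 2
)

print(aggregate_usage(responses))
```

In this example the two-person underwriting group never appears in the output, while claims is reported as a share of six respondents. A real deployment would need more than a size threshold, since small groups can sometimes be recovered by subtracting published totals.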
REFERENCES
- 1. MIT Project NANDA, "The GenAI Divide: State of AI in Business 2025": ~95% of enterprise generative-AI pilots show no measurable P&L return on $30-40B of spend; purchased tools succeed roughly twice as often as internal builds. fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo
- 2. MIT Sloan Management Review and BCG, Artificial Intelligence and Business Strategy program (incl. the financial-benefits report finding ~10% of firms get significant financial benefit from AI). sloanreview.mit.edu/big-ideas/artificial-intelligence-business-strategy
- 3. Nicole Gillespie, Steve Lockey et al., "Trust, Attitudes and Use of Artificial Intelligence: A Global Study 2025," KPMG and University of Melbourne (48,000+ respondents, 47 countries): 57% hide AI use and present AI output as their own. kpmg.com/xx/en/our-insights/ai-and-technology/trust-attitudes-and-use-of-ai.html
- 4. Microsoft and LinkedIn, 2024 Work Trend Index Annual Report: 78% of AI users bring their own tools to work; 52% reluctant to admit using AI on their most important tasks. www.microsoft.com/en-us/worklab/work-trend-index/ai-at-work-is-here-now-comes-the-hard-part
- 5. Ethan Mollick, "Detecting the Secret Cyborgs" and "Making AI Work: Leadership, Lab, and Crowd," One Useful Thing; Co-Intelligence (Portfolio, 2024). www.oneusefulthing.org/p/detecting-the-secret-cyborgs
- 6. Sam Ransbotham et al., MIT Sloan Management Review and BCG annual AI survey: 66% of workers say they barely use AI until prompted with concrete examples, at which point 43% admit regular use. sloanreview.mit.edu/projects/the-cultural-benefits-of-artificial-intelligence-in-the-enterprise
- 7. Gary Klein, Critical Decision Method; Klein, Calderwood, Clinton-Cirocco, "Rapid Decision Making on the Fire Ground" (1986; 2010 reprint). www.gary-klein.com/cdm
- 8. Laura Militello and Robert Hutton, "Applied Cognitive Task Analysis (ACTA): a practitioner's toolkit," Ergonomics 41(11), 1998 (~93% of elicited content cognitively relevant). apps.dtic.mil/sti/tr/pdf/ADA335225.pdf
- 9. Erik Brynjolfsson, Tom Mitchell, Daniel Rock, "What Can Machines Learn, and What Does It Mean for Occupations and the Economy?" AEA Papers and Proceedings, 2018 (18,112 O*NET tasks across 950 occupations scored on an SML rubric). www.aeaweb.org/articles?id=10.1257/pandp.20181019
- 10. Ravin Jesuthasan and John Boudreau, Reinventing Jobs (HBR Press, 2018) and Work Without Jobs (MIT Press, 2022). mitpress.mit.edu/9780262545969/work-without-jobs
- 11. BCG, "AI at Work 2025: Momentum Builds, but Gaps Remain": frontline regular use ~51% vs ~72% overall. www.bcg.com/publications/2025/ai-at-work-momentum-builds-but-gaps-remain
- 12. IBM, Cost of a Data Breach Report 2025: shadow-AI-involved breaches carry a ~$670K premium; only 17% of organizations have technical controls on uploads to public AI tools. www.ibm.com/reports/data-breach