The demo took a day. Trusting it took six months.

The first time I had a language model write an insight summary from our data, the whole thing took an afternoon. I pointed it at a performance dataset, asked for a monthly narrative, and watched it produce something that would have taken an analyst most of a day. Fluent, confident, well structured, with a clear headline finding and sensible recommendations underneath.

It also stated a number that was not in the data.

Not a wildly wrong number. A plausible one, sitting in a well-written sentence, the kind of figure that sails through a meeting because it sounds like everything around it. I only caught it because I knew the dataset well enough to feel something was off, and went back to check. That was the moment the real project started, and it had very little to do with models.

As AI Lead for our analytics operation, I get asked fairly often which model we use, and the honest answer is that it matters less than people expect. We work with Vertex AI and Gemini, with Claude, with OpenAI models, each where it fits. The part nobody asks about is the layer underneath: what a model is allowed to see, how its output gets checked against source data, and what happens when the two disagree. That layer is the product. The model is a component.

Building it took months of decidedly unglamorous work. Every automated narrative now keeps a trail back to the queries that produced its numbers, because a summary you cannot trace is a rumour with good formatting. New use cases start in a sandbox where a wrong answer costs nothing, and they graduate only when we can show their failure modes, not just their highlights. Agents that touch production data run inside defined workflows with validation gates, rather than roaming freely because the demo looked clever. And a human owns every output that reaches a stakeholder. The model drafts; someone accountable signs.

I expected more resistance to all this than we got. What I underestimated was how much trust the guardrails would buy. Analysts stopped treating AI output as a threat or a toy and started treating it as a fast first draft they could stand behind, because they knew exactly how it was produced and where to check it. The speed showed up where speed belongs: exploration, first drafts, the tedious reshaping of data that used to eat afternoons. The checking stayed where checking belongs, close to the people accountable for being right.

What we did not get, and this is the entire point, is a confident paragraph walking unchecked into a leadership meeting.

Two years into this wave, I have come to a simple diagnosis for teams struggling with AI adoption. They are almost never short on models. They are short on the boring layer that makes model output safe to believe. The demo takes a day. The trust takes months, and it is the only part with lasting value.