AIOps in Investment Banking: The Challenges

Why AI implementations fail between the demo and the real world

The demo works. It always works. The model answers fluently, the use case has been chosen to play to the technology's strengths, and everyone leaves impressed. The project gets approved.

Six months later, the implementation is in trouble.

This is not a story about bad technology. It is about a structural gap that the industry has a strong incentive not to discuss — and that organisations have an equally strong incentive not to examine until it is too late.

The demo is a controlled environment

The data is clean. The query is well-formed. The edge cases have been quietly removed from the test set. The person asking the question already knows it has a good answer.

Production is none of those things. Production has malformed inputs, ambiguous queries, users who do not behave as the design assumed, and data that has drifted from the conditions the model was trained on. It has system integrations that were documented three years ago and have changed since, and business logic that exists only in the head of someone who left in 2022.

The model was not built for any of that.

The gap is organisational, not technical

The failure mode that follows is almost always diagnosed as a technical problem. But the underlying cause is typically a set of decisions made before the build began that treated the demo environment as a reasonable proxy for production. It is not. It never is.

What bridges the gap is governance: monitoring infrastructure to detect when model behaviour drifts from expectation; feedback loops to capture what the system gets wrong and why; retraining cadences to keep the model current as the underlying data changes; and escalation paths for edge cases the system cannot handle and should not attempt to. None of this is technically complex. All of it is organisationally demanding. Almost none of it appears in the original business case, because the business case was written by people who had seen the demo.
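To make the monitoring point concrete, here is a minimal sketch of one common drift check, the population stability index (PSI), which compares the distribution of a single numeric input at training time with the same input in production. The function name, the ten-bucket default and the sample data are illustrative assumptions, not part of any particular platform.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def bucket_shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the end buckets
            counts[idx] += 1
        # A small floor avoids log(0) when a bucket is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: the live feature has shifted upwards since training.
training_sample = [i / 100 for i in range(100)]
live_sample = [x + 0.5 for x in training_sample]
drift_score = population_stability_index(training_sample, live_sample)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that threshold is a convention. It should be agreed with whoever owns the model, not assumed.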

What gets cut and why

When implementation budgets come under pressure, governance goes first. It is invisible in the short term — cutting it does not break the build. The consequences arrive later, gradually, and are difficult to attribute to the decision that caused them.

This is not negligence. It is a rational response to misaligned incentives. The team that built the model has shipped. The people who will manage the consequences are often not the people who made the call.

The result is implementations that were technically sound, commercially approved, and operationally unready — running at degraded performance, failing quietly at the edges, and losing the confidence of the users they were meant to serve.

What a realistic implementation requires

Before a model goes near production, three questions need honest answers.

What does failure look like, and who will know when it happens? If there is no monitoring in place that would surface model degradation to someone with the authority to act on it, the implementation is not ready.
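As a rough illustration of what "someone will know" can mean in practice, the sketch below tracks a rolling error rate over human-reviewed outputs and calls a notification hook when the rate crosses an agreed limit. The class name, the window size, the limit and the notify callback are all hypothetical. In a real deployment the callback would page or email a named owner.

```python
from collections import deque
from typing import Callable


class DegradationMonitor:
    """Rolling error-rate check over the most recent reviewed outputs."""

    def __init__(self, window: int, max_error_rate: float, notify: Callable[[str], None]):
        self.outcomes = deque(maxlen=window)  # oldest results fall off automatically
        self.max_error_rate = max_error_rate
        self.notify = notify

    def record(self, was_correct: bool) -> None:
        self.outcomes.append(was_correct)
        if len(self.outcomes) < self.outcomes.maxlen:
            return  # not enough evidence yet to judge
        error_rate = 1 - sum(self.outcomes) / len(self.outcomes)
        if error_rate > self.max_error_rate:
            self.notify(
                f"error rate {error_rate:.0%} over last {len(self.outcomes)} reviewed outputs"
            )


# Hypothetical usage: alerts are collected in a list here instead of being sent to a person.
alerts = []
monitor = DegradationMonitor(window=4, max_error_rate=0.25, notify=alerts.append)
for reviewed_ok in [True, True, True, False, False]:
    monitor.record(reviewed_ok)
```

The code is the easy part. The check only works if reviewed outcomes are actually being fed back, and if the person notified has the authority to act.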

What are the edge cases, and what happens when the system encounters them? Every model has a boundary beyond which its outputs become unreliable. That boundary needs to be mapped, not discovered in production.
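One simple way to act on a mapped boundary is an explicit routing rule in front of the model's output. The sketch below assumes two things that may not hold in a given system: that the model exposes a usable confidence score, and that some upstream check can say whether an input falls inside the scope the model was validated on. All names and the 0.8 threshold are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    answer: str
    confidence: float  # assumed to be a reasonably calibrated score in [0, 1]


def route(prediction: Prediction, in_mapped_scope: bool, min_confidence: float = 0.8) -> str:
    """Decide whether an output can be used automatically or must go to a person."""
    if not in_mapped_scope:
        # The input is outside the boundary that was tested before go-live.
        return "escalate: input outside mapped boundary"
    if prediction.confidence < min_confidence:
        return "escalate: low confidence"
    return "auto"


# Hypothetical examples of the three outcomes.
routine = route(Prediction("approve", 0.95), in_mapped_scope=True)
uncertain = route(Prediction("approve", 0.55), in_mapped_scope=True)
unfamiliar = route(Prediction("approve", 0.99), in_mapped_scope=False)
```

The scope check comes first on purpose: a model can be highly confident about an input it was never validated on.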

What does the operating model look like at eighteen months? The conditions that existed when the model was trained will have changed. The data will have shifted. The business context will have evolved. A model without a retraining and review cadence has a shelf life, whether or not anyone has acknowledged it.
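A review cadence can be as plain as a dated check that runs on a schedule and flags when a model has gone too long without review. The sketch below is a minimal version of that idea. The 180-day cadence and the dates are placeholder assumptions. The right interval depends on how quickly the underlying data moves.

```python
from datetime import date, timedelta


def review_due(last_trained: date, today: date, cadence_days: int = 180) -> bool:
    """True once the model has gone at least `cadence_days` without retraining or review."""
    return today - last_trained >= timedelta(days=cadence_days)


# Hypothetical dates: a model trained on 1 January 2024.
trained_on = date(2024, 1, 1)
due_in_march = review_due(trained_on, date(2024, 3, 1))  # 60 days after training
due_in_july = review_due(trained_on, date(2024, 7, 1))   # 182 days after training
```

A check like this does not say whether the model is still good. It only ensures that someone is asked the question on a schedule.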

These are not questions that complicate a good project. They are questions that determine whether a project that looks good in a demo will still be performing in two years.

The demo will always be impressive

That is its job. The question worth asking — before the approval, before the build, before the budget is committed — is what the distance looks like between what just happened in that room and what will need to be true on the day this goes live.

That distance is where implementations are won or lost. It deserves more attention than it typically gets.