
Part 1 of 5
The Augmented Treasurer series
The explainability problem — and why task decomposition is the answer
One of the most persistent concerns around deploying AI in treasury is the "black box" problem: how can a CFO or treasurer trust cash flow or hedging decisions produced by a probabilistic model? Andreas Hafver, who leads emerging assurance technologies at DNV and contributes to NorwAI's trustworthy AI work, argues that framing the issue this way misses the point.
The key insight is to separate how a model works internally from what it actually produces. An LLM is probabilistic by nature, but that does not mean its outputs are untrustworthy. What matters is whether those outputs can be verified.
Andreas Hafver: You can have a probabilistic large language model that is internally probabilistic. But in the end, it produces some content, and then you could ask yourself: does the answer it gives, is it logical? Does it actually cite correct sources, or is it making things up?
His prescription is task decomposition: rather than asking an AI to handle an entire treasury workflow end-to-end, break the process into smaller, auditable steps. Each step becomes easier to check, easier to verify, and easier to trust.
Andreas Hafver: Instead of having like, you know, just asking ChatGPT or Claude to do a full task, you would probably chunk it up, make AI tools to help with smaller parts of those tasks. By chunking it up, it makes it more reproducible and reliable. For each step, a human can check: does it make any sense? Does it cite the correct sources?
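To make the chunking idea concrete, here is a minimal sketch in Python. It is not taken from the interview: the step names, the sample figures, the file name and the `reviewer` function are all hypothetical, and the reviewer is only a stand-in for a human check. Each step records what it produced and which sources it cited, and the run stops at the first step that fails review.

```python
from dataclasses import dataclass


@dataclass
class StepRecord:
    name: str
    output: object
    sources: list
    approved: bool


def run_workflow(steps, reviewer):
    """Run steps in order and stop at the first one the reviewer rejects."""
    records = []
    for name, fn, sources in steps:
        output = fn()
        approved = reviewer(name, output, sources)
        records.append(StepRecord(name, output, sources, approved))
        if not approved:
            break
    return records


# Hypothetical two-step workflow with illustrative EUR balances.
steps = [
    ("collect_balances",
     lambda: {"account_a": 1_200_000, "account_b": 800_000},
     ["bank_statement_2024-06-30.csv"]),
    ("net_position",
     lambda: 1_200_000 + 800_000,
     []),  # this step cites no source
]


def reviewer(name, output, sources):
    # Stand-in for a human check: reject any step that cites no source.
    return len(sources) > 0


records = run_workflow(steps, reviewer)
for record in records:
    print(record.name, "approved" if record.approved else "rejected")
```

In this sketch the second step is rejected because it cites no source, which is the kind of per-step question Hafver describes a human asking.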
This also changes the role of the AI itself. Hafver draws a sharp distinction between AI doing calculations and AI running a calculator, a distinction with significant implications for treasury:
Andreas Hafver: You don't want the AI to do calculations. But what you do want it to do is to be able to call the calculator — to put things into a spreadsheet, run an algorithm, and read out the results. You don't want the AI to do these things itself. You want it to rely on other tools.
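A rough illustration of "calling the calculator" follows, again as an assumption-laden sketch rather than anything described in the interview. The `simple_interest` function, the `TOOLS` registry and the JSON request format are hypothetical. The hard-coded `model_output` string stands in for what a language model might emit. The point is that the arithmetic is done by ordinary, testable code, and the model only chooses the tool and supplies the inputs.

```python
import json
from decimal import Decimal


def simple_interest(principal, annual_rate, days, basis=360):
    """Deterministic calculation: principal * rate * days / basis."""
    return Decimal(principal) * Decimal(annual_rate) * days / basis


# Only tools listed here can be called (hypothetical registry).
TOOLS = {"simple_interest": simple_interest}


def dispatch(tool_request_json):
    """Validate a tool request and run the named tool with its arguments."""
    request = json.loads(tool_request_json)
    name = request["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**request["arguments"])


# Stand-in for model output: the model proposes the call, it does not compute.
model_output = (
    '{"tool": "simple_interest", '
    '"arguments": {"principal": "1000000", "annual_rate": "0.04", "days": 90}}'
)
interest = dispatch(model_output)
print(interest)
```

Because the request is plain data, it can be logged and re-run later, so a reviewer can confirm that the figure in a report came from the calculator and not from the model.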
The mental model he returns to throughout the conversation is that of a supervised junior employee: capable, fast, and useful for tedious work — but not someone you leave unsupervised with a critical task.
Andreas Hafver: I see the LLMs more like the glue in between these different tools — like the junior employee you can give some tedious, repetitive tasks to help with. But you still need somebody more senior to have the overview and to approve things.
From auditing data to auditing the workflow
The second question put to Hafver — whether the shift from rules-based to inference-based AI requires auditing the AI agent itself, not just the data — drew a nuanced answer that has direct implications for treasury governance.
His view is that it is both: you must audit the data and the workflow, but you do not necessarily need to audit the opaque interior of the model itself. What matters is that the system follows a defined, verifiable process.
Andreas Hafver: You need to put them into a workflow — a workflow that follows some rules, that is compliant. You have certain procedures. You need to check that all of those are ticked off: that you actually looked up the right sources of information, that you actually did the calculation with a verified tool. You haven't just guessed the answer.
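One way to picture "checking that all of those are ticked off" is an audit over a run log. The sketch below is illustrative only: the required step names, the list of verified tools and the log format are assumptions, not a description of any real system.

```python
# Hypothetical procedure and tool whitelist for one workflow.
REQUIRED_STEPS = ("fetch_source_data", "run_calculation", "human_approval")
VERIFIED_TOOLS = {"tms_export", "pricing_engine"}


def audit(log):
    """Return a list of findings; an empty list means the run is compliant."""
    findings = []
    done = {entry["step"] for entry in log}
    for step in REQUIRED_STEPS:
        if step not in done:
            findings.append(f"missing step: {step}")
    for entry in log:
        tool = entry.get("tool")
        if tool is not None and tool not in VERIFIED_TOOLS:
            findings.append(f"unverified tool in {entry['step']}: {tool}")
    return findings


good_run = [
    {"step": "fetch_source_data", "tool": "tms_export"},
    {"step": "run_calculation", "tool": "pricing_engine"},
    {"step": "human_approval"},
]
bad_run = [
    {"step": "fetch_source_data", "tool": "tms_export"},
    {"step": "run_calculation", "tool": "llm_guess"},  # answer was guessed
]

print(audit(good_run))
print(audit(bad_run))
```

The audit never inspects the model itself. It only checks that the defined procedure was followed and that calculations went through a verified tool.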
Applied to treasury, this translates to an AI-augmented but rules-anchored operating model. The TMS and any verified calculation engines remain the trusted, auditable tools. The AI's role is to orchestrate the flow between them (collecting data, spinning up scenarios, formatting outputs) while a human oversees each stage.
Andreas Hafver: I think of it more like a universal adapter. You can combine all kinds of tools that you have, but it has been too time-consuming for a human to input the data, reformat it, put it into the next tool, take the result, import it into another tool to make the report. The LLM handles those boring side-steps in between.
For the specific case of Monte Carlo scenario analysis — a recurring gap in legacy treasury management systems — Hafver clarifies a subtle but important point:
Andreas Hafver: You don't want to replace the simulator with AI. What you want to do is let the AI run the simulator. And let the AI analyse the output of the simulator. But the simulator itself — you don't want to replace that with AI.
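The division of labour can be sketched as follows. This is a toy example under stated assumptions: the cash-flow model (independent, normally distributed daily net flows), the parameter values and the function names are all hypothetical and far simpler than a production simulator. What it shows is the separation. `simulate_cash` is plain, seeded, reproducible code. The orchestrating layer only passes parameters in and reads a summary out.

```python
import random
import statistics


def simulate_cash(start, mean_flow, flow_sd, days, n_paths, seed):
    """The trusted simulator: ordinary seeded code with no AI inside."""
    rng = random.Random(seed)
    endings = []
    for _ in range(n_paths):
        balance = start
        for _ in range(days):
            balance += rng.gauss(mean_flow, flow_sd)
        endings.append(balance)
    return endings


def summarise(endings):
    """Reduce simulated ending balances to a few reviewable figures."""
    ordered = sorted(endings)
    return {
        "median": statistics.median(ordered),
        "p5": ordered[int(0.05 * len(ordered))],
        "share_negative": sum(b < 0 for b in ordered) / len(ordered),
    }


# Parameters the orchestrating layer would supply (illustrative values).
params = {
    "start": 500_000.0,
    "mean_flow": -2_000.0,
    "flow_sd": 40_000.0,
    "days": 30,
    "n_paths": 1_000,
    "seed": 42,
}
endings = simulate_cash(**params)
summary = summarise(endings)
print(summary)
```

Because the seed is part of the recorded parameters, anyone reviewing the run can reproduce exactly the same paths and confirm that the summary matches the simulator's output.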
Human accountability in an AI-augmented treasury
The third and arguably most important question concerns legal and ethical responsibility. If AI begins handling real-time liquidity positions and hedging recommendations, does the treasurer become an auditor rather than a decision-maker?
Hafver is careful here — "this is legal stuff, and it's always difficult to give an answer" — but his position is clear: the human must remain in charge, and the human remains responsible.
Andreas Hafver: I think a human always needs to be in charge. The only times when you might consider otherwise is if the frequency of decisions is such that there's no time for a human to check. In treasury, unlike an autonomous car deciding in a split second, I would imagine there's often enough time for humans to go in and check.
The treasurer's role, in his framing, evolves rather than disappears. The treasurer moves from doing the tedious work to auditing whether the AI did it correctly: checking sources, verifying calculations, and confirming that the workflow was followed.
Andreas Hafver: They would still be responsible for the decisions. And they should be responsible for using AI in a responsible way. They shouldn't just ask the AI to do something and then do whatever it says without checking. They should check through the workflow: did it get the correct data? Did it run the simulation correctly?
He also introduces an elegant validation concept for treasury teams considering greater AI autonomy: the shadow AI. Before delegating decisions to an AI system, run it in parallel with the human decision-makers, without letting its output influence the live process, and compare its recommendations to theirs over time.
Andreas Hafver: You have some humans who make decisions, but in the background you run a shadow AI. You don't use the AI decision, but you record: what would it have decided? Over time, when you compare and see that in ninety-five percent of cases the AI agrees with the human, then you might, if the risk is low, say for example: that five percent is an acceptable business risk.
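The comparison itself is simple arithmetic. The sketch below assumes a hypothetical log of paired decisions (human, shadow AI) with made-up labels. The 95 percent threshold is taken from the example in the quote and is not a recommended value.

```python
def agreement_rate(decisions):
    """Share of cases where the shadow AI matched the human decision."""
    if not decisions:
        raise ValueError("no decisions recorded")
    matches = sum(1 for human, shadow in decisions if human == shadow)
    return matches / len(decisions)


def disagreements(decisions):
    """Cases worth reviewing by hand before granting any autonomy."""
    return [(human, shadow) for human, shadow in decisions if human != shadow]


# Hypothetical log: 19 matching decisions and 1 mismatch.
log = [("hedge", "hedge")] * 19 + [("hedge", "wait")]

rate = agreement_rate(log)
print(f"agreement: {rate:.0%}")
print("cases to review:", disagreements(log))
print("meets 95% threshold:", rate >= 0.95)
```

The rate alone does not settle the question. As the quote suggests, whether the remaining disagreements are an acceptable business risk depends on what is at stake in those cases, so listing them for review matters as much as the percentage.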
What's coming: smaller, local, and specialised models
Asked about the near-term direction of AI, Hafver moves away from the narrative of ever-larger general-purpose models and towards a more practically relevant trend for financial institutions: the rise of smaller, purpose-built models that can run locally.
Andreas Hafver: You also see a trend towards smaller models. And what I was saying about chunking up your problems, you may be able to get around it with smaller models because the task is very specialised. For many applications, you would like AI to run on your own computer, not in the cloud. That's possible with smaller models.
This has direct relevance for treasury, where data sovereignty and security are paramount. The parallel with the evolution of treasury management systems, from on-premise to cloud and possibly back toward hosted, is not lost on him.
Andreas Hafver: I think you will see a hybrid approach. Certain things can be done locally, certain things in the cloud. But finding smart ways of doing it, to avoid sending data unnecessarily, both because of security issues and because of energy waste and unnecessary delays, that's the direction.
The term he flags as worth watching for treasury professionals is not the familiar "agentic AI" but a less-discussed concept: Computer Augmented Generation (CAG), which captures precisely the orchestration model he has been describing throughout — AI as the connective layer between verified, specialised




