Why Payments Need Grounded AI, Not Just LLMs

This is part 1 of a series on building grounded AI for payment systems. This post sets up the problem. Part 2 covers the retrieval pattern, and part 3 covers the practical use cases in POS, SoftPOS, and e-commerce.

LLMs are impressive. In payments, impressive is not enough.

A generic model can explain what an authorization is. It can describe fraud detection. It can summarize 3-D Secure (3DS), chargebacks, or issuer decline codes. None of that means it understands your payment system.

It does not know the merchant configuration. It does not know which acquirer route a transaction took. It does not know whether a decline came from the issuer, the risk engine, the SoftPOS SDK, the device state, the 3DS flow, or a timeout somewhere in the middle.

And when it does not know, it may still answer confidently. I wrote about this failure mode in “Syntactic Fluency, Semantic Fragility”: the model’s fluency is constant, but its grip on the facts is not. That gap is tolerable in a brainstorming session. In payments, it is not.

Why a wrong explanation costs money

In most domains, a wrong answer from an AI assistant is an inconvenience. In payments, a wrong explanation propagates.

Tell a support agent that a decline was “a failed 3DS authentication” when the real cause was a SoftPOS device attestation failure, and you get a support case that goes nowhere, a merchant who retries the wrong fix, and possibly a chargeback or dispute built on a bad premise. If the explanation feeds a risk review or a compliance file, the cost compounds further.

Payments are a high-trust, regulated, latency-sensitive domain. The same decline code can mean different things depending on the card scheme, the issuer, the merchant category, the transaction channel, the terminal capability, and the authentication outcome. A model answering from generic training data has none of that context. It has vocabulary.

This matches what Nazar et al. observed when they tested vanilla LLMs on wireless environment perception in the ENWAR paper: general models produce plausible, superficial descriptions of a domain they only know through text, and the fix is retrieval of actual domain context before generation, not a bigger model. The same diagnosis applies to payments. The vocabulary is in the training data; the operational truth is not.

The wrong starting point

When teams start exploring LLMs for POS and e-commerce, the first idea on the whiteboard is usually some version of “let the AI approve or reject the payment.”

My position: that is the wrong starting point, and not because the models are weak.

Authorization runs in a window of two to three seconds, end to end, across multiple parties. A large model reasoning over logs and documentation does not fit in that budget. The components in that path are certified and deterministic for a reason: EMV processing, PIN handling, and cryptographic operations sit behind compliance boundaries (PCI MPoC for SoftPOS, among others) that a probabilistic text model has no business crossing. And an approve/reject decision made by a model that cannot show its evidence is an audit problem waiting to happen.

The safer and more useful starting point is the other direction:

Let AI help us understand what happened.

A transaction is a path, and every step leaves evidence

A payment transaction is not one event. It is a path through a system:

customer → checkout → authentication → risk → routing → issuer authorization → capture → settlement → reconciliation → dispute lifecycle

Every step leaves evidence. Transaction events. Issuer response codes. 3DS results. Fraud scores. Device telemetry. SoftPOS attestation results. SDK logs. Merchant configuration. Support history. Incident reports.
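One way to picture this is a timeline of evidence records, one per step of the path. The sketch below is a minimal illustration, not a real schema: the `Step` enum mirrors the path above from checkout onward, while `Evidence`, its fields, `first_failure`, and the sample records are all hypothetical names and data invented for this example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    """Steps of the transaction path, in the order listed above."""
    CHECKOUT = 1
    AUTHENTICATION = 2
    RISK = 3
    ROUTING = 4
    ISSUER_AUTHORIZATION = 5
    CAPTURE = 6
    SETTLEMENT = 7
    RECONCILIATION = 8
    DISPUTE = 9


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence left by one step (hypothetical schema)."""
    step: Step
    source: str   # system that produced the record, e.g. "risk_engine"
    ok: bool      # whether this step reported success
    detail: str   # human-readable summary of what was observed


def first_failure(timeline: list[Evidence]) -> Optional[Evidence]:
    """Return the failing record that sits earliest on the path, or None."""
    failures = [e for e in timeline if not e.ok]
    return min(failures, key=lambda e: e.step, default=None)


# Invented sample data: records arrive out of order from different systems.
timeline = [
    Evidence(Step.ISSUER_AUTHORIZATION, "gateway", False, "no issuer response"),
    Evidence(Step.CHECKOUT, "softpos_sdk", False, "device attestation failed"),
    Evidence(Step.RISK, "risk_engine", True, "score below threshold"),
]
culprit = first_failure(timeline)
print(culprit.step.name, culprit.source, culprit.detail)
```

Ordering the evidence by position on the path, rather than by arrival time or by source system, is the design choice that matters here: the gateway's missing issuer response is a symptom, and the earlier attestation failure is the more likely cause to investigate first.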

Today, that evidence is scattered across gateways, SDKs, risk engines, ticketing systems, and monitoring tools. When a merchant asks “why are my Tap to Pay transactions failing above 50 euros?”, someone has to walk that path by hand, system by system. That investigation work is where the time goes, and that is where the opportunity is.

Not a chatbot guessing from generic training. A grounded intelligence layer that retrieves the right evidence, connects the dots, and explains the situation in language a support agent, an engineer, a risk analyst, or a merchant can act on.

The deterministic payment engine keeps executing transactions. The AI layer makes the lifecycle around it explainable, searchable, and operationally useful. Explanation first; decisions stay where the certification is.

Where this series goes

All of this rests on one principle:

Before an LLM answers, it needs context.

Retrieval-Augmented Generation is the established pattern for delivering that context (Lewis et al. introduced it in 2020), and the ENWAR work shows how far it can go when the retrieved context is built from live multi-modal system signals rather than static documents. Part 2 translates that architecture into payments: turning transaction events, SoftPOS telemetry, fraud signals, and merchant configuration into a knowledge layer the model must answer from.
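As a small sketch of that principle: assemble the prompt from retrieved evidence, and do not call the model at all when nothing was retrieved. Everything here is an assumption made for illustration. `build_grounded_prompt`, the prompt wording, and the evidence strings are hypothetical, and the retrieval step itself (the subject of part 2) is stubbed out as a plain list.

```python
from typing import Optional


def build_grounded_prompt(question: str, evidence: list[str]) -> Optional[str]:
    """Build a prompt that restricts the model to the retrieved evidence.

    Returns None when there is no evidence, so the caller can escalate to a
    human instead of letting the model answer from generic training data.
    """
    if not evidence:
        return None
    numbered = "\n".join(
        f"[{i}] {item}" for i, item in enumerate(evidence, start=1)
    )
    return (
        "Answer using only the evidence below. Cite evidence numbers. "
        "If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Hypothetical evidence; a real system would obtain this from retrieval.
evidence = [
    "softpos_sdk: device attestation failed",
    "gateway: authorization request never sent",
]
prompt = build_grounded_prompt("Why was this transaction declined?", evidence)
print(prompt)
```

Returning `None` on empty evidence keeps the failure visible: the absence of context becomes an explicit outcome the caller must handle, not a gap the model fills with fluent guesswork.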

References

A. M. Nazar, A. Celik, M. Y. Selim, A. Abdallah, D. Qiao, A. M. Eltawil, “ENWAR: A RAG-empowered Multi-Modal LLM Framework for Wireless Environment Perception,” arXiv:2410.18104, 2024.
P. Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” NeurIPS 2020, arXiv:2005.11401.
PCI Security Standards Council, Mobile Payments on COTS (MPoC) Standard.