Beneath every AI system you use every day — LLMs, code generators, structured data pipelines — there is an asymmetry worth understanding: these models are extraordinary at form, and fragile on meaning.

Your favorite AI can compose a flawless sonnet, generate syntactically perfect ISO 8583 messages, and produce compilable C++ on the first attempt. Ask it whether that output actually makes business sense, and you may get a confident, well-structured, beautifully formatted hallucination.

That is not a bug. It is a structural property of how these systems work.

Three things worth understanding:

1️⃣ Syntax and semantics are two different problems — and models only truly solve one of them.

Syntax asks: is this artifact well-formed? Semantics asks the harder question: does it mean something valid in this context? LLMs predict the most plausible next token based on a probability distribution — not the next true one based on logical necessity. That distinction matters enormously. Models excel at local correctness — each field, each clause, each line of code in isolation. What they struggle with is global coherence — the relational constraints that span multiple fields, layers, or concepts. A Terraform configuration that passes all validation checks, yet exposes your production database to the internet. A prescription where every field is correct, yet the drug combination is lethal. The form is right. The meaning is broken.
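To make "locally valid, globally broken" concrete: here is a minimal Python sketch of a cross-field check in the spirit of the Terraform example. The config dict and rule names are hypothetical, not any real provider's schema — the point is that every field passes its own validation, and only a rule spanning two fields catches the problem.

```python
# Hypothetical firewall-rule config: every field is individually well-formed.
# Only a cross-field invariant reveals the semantic problem.
DB_PORTS = {5432, 3306, 1433}  # common database ports (illustrative)

def find_semantic_violations(rule: dict) -> list[str]:
    violations = []
    # Invariant spanning two fields: a database port must never be world-open.
    if rule["cidr"] == "0.0.0.0/0" and rule["port"] in DB_PORTS:
        violations.append("database port exposed to the internet")
    return violations

rule = {"protocol": "tcp", "port": 5432, "cidr": "0.0.0.0/0"}
print(find_semantic_violations(rule))  # → ['database port exposed to the internet']
```

A schema validator would approve `rule` without complaint; only the relational check fails it.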

2️⃣ The ISO 8583 thought experiment makes this concrete.

Consider a generated authorization request where DE 22 is set to 051 — chip read, physical card inserted — and DE 55 carries EMV ICC data confirming the chip interaction. But DE 25 is set to 08 — mail/phone order, card-not-present. Every field is correctly encoded. The MTI is valid. The PAN passes Luhn. The BCD encoding is flawless. And the message is semantically impossible. You cannot simultaneously read a physical chip and conduct a mail-order transaction. Any payment processor rejects it instantly. Any experienced payments engineer catches it in seconds. The model missed it because it knows what values are syntactically valid for each field in isolation — but does not understand the domain invariant that binds them together.
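The invariant the model missed is exactly the kind of thing a few lines of deterministic domain code can enforce. A sketch, using the field meanings described above — the flat dict layout is illustrative, not a real ISO 8583 parser:

```python
# Cross-field semantic check for the thought experiment above.
# DE22 first two digits = PAN entry mode ("05" = chip read);
# DE25 = POS condition code ("08" = mail/telephone order, card-not-present).
def check_entry_mode_invariant(msg: dict) -> list[str]:
    errors = []
    chip_read = msg.get("DE22", "")[:2] == "05"
    card_not_present = msg.get("DE25") == "08"
    if chip_read and card_not_present:
        errors.append("DE22 says chip read, but DE25 says card-not-present")
    if "DE55" in msg and card_not_present:
        errors.append("EMV ICC data (DE55) present on a card-not-present transaction")
    return errors

msg = {"MTI": "0100", "DE22": "051", "DE25": "08", "DE55": "9f2608..."}
for error in check_entry_mode_invariant(msg):
    print(error)
```

Each field still validates in isolation; the check only fires on the impossible combination.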

3️⃣ The gap is manageable — but only if you architect for it deliberately.

Five strategies that work today:

- Structured validation layers — use the model for generation, then pass output through domain-specific validators.
- Semantic guardrails in the prompt — explicitly state domain invariants the model can respect when told but will violate when not.
- RAG — ground the model in authoritative documentation.
- Human-in-the-loop — mandatory for any domain where semantic errors carry real consequences.
- Fine-tuning on validated domain corpora — shift the probability distribution toward correctness.

The model drafts. The domain expert validates. That division of labour is not a limitation. It is the correct architecture for where these systems actually are.
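The first strategy — generate, then validate — can be sketched as a small loop. `generate` stands in for any LLM call (hypothetical); the validators are ordinary deterministic domain code, like the cross-field checks above, and their errors are fed back into the next attempt:

```python
# Minimal "generate, then validate" loop. `generate` is any callable that
# turns a prompt into a draft; each validator returns a list of error strings.
def generate_with_validation(prompt, generate, validators, max_attempts=3):
    errors = []
    for _ in range(max_attempts):
        draft = generate(prompt)
        errors = [e for check in validators for e in check(draft)]
        if not errors:
            return draft  # syntactically AND semantically accepted
        # Feed the semantic errors back so the next draft can correct them.
        prompt = f"{prompt}\nFix these problems: {'; '.join(errors)}"
    raise ValueError(f"no valid output after {max_attempts} attempts: {errors}")

# Toy demo with a scripted "model" that gets it right on the second try.
drafts = iter(["bad draft", "good draft"])
result = generate_with_validation(
    "make a draft",
    generate=lambda prompt: next(drafts),
    validators=[lambda d: [] if d == "good draft" else ["semantically invalid"]],
)
print(result)  # → good draft
```

The model never has to be trusted on meaning: the validators are the arbiter, and the model is just a draft generator inside a deterministic loop.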

The engineering takeaway:

Trust the syntax. Verify the semantics. Always.

If AI can build the cathedral’s facade with perfect precision, our job is to make sure the foundation is not built on sand. As AI handles the burden of syntax, the engineer’s role is not diminished. It is sharpened.

Full breakdown on corebaseit.com: 🔗 https://corebaseit.com/posts/syntactic-fluency-semantic-fragility/


#AI #LLM #GenerativeAI #AIArchitecture #SoftwareEngineering #Payments #ISO8583 #PaymentSecurity #AIEngineering #PromptEngineering #Hallucination #Fintech #corebaseit