Architecture Decision Records (ADRs)

1. Define an Architecture Decision Record (ADR) and its primary purpose in an agile development environment.

An Architecture Decision Record (ADR) is a short document that captures an important architectural decision, along with its context and consequences.

Its primary purpose in an agile environment is to:

  • Preserve decision history as systems evolve quickly

  • Explain why decisions were made (not just what was built)

  • Help onboard new team members faster

  • Prevent re-debating the same issues repeatedly

  • Support incremental, evolutionary architecture

ADRs fit agile because they are lightweight, versioned, and written as decisions happen, not as heavy upfront documentation.

2. Structure — Five ADR Sections (Michael Nygard)

1. Title
Short description of the decision.
➡ Example: “Use gRPC for Internal Sensor Communication”

2. Status
Current state of the decision.
➡ Proposed, Accepted, Superseded, Deprecated

3. Context
Background, constraints, and problem statement.
➡ Why this decision was needed.

4. Decision
The actual choice made.
➡ Clear statement of what will be done.

5. Consequences
Results and impacts of the decision.
➡ Positive, negative, risks, trade-offs.

3. Significance — Two-way Door vs One-way Door

This analogy comes from decision theory and was popularized by Amazon; Jeff Bezos's shareholder letters describe reversible "two-way door" and irreversible "one-way door" decisions.

Two-way door decision

  • Easy to reverse

  • Low cost to change

  • Example: Switching a logging library

One-way door decision

  • Hard or very expensive to reverse

  • Long-term system impact

  • Example: Choosing database technology or communication protocol

How it helps decide whether to write an ADR:

  • One-way door → Always write ADR

  • Two-way door → ADR optional unless high impact or repeated debate risk

4. Anti-Patterns

Groundhog Day Anti-Pattern
Same architectural discussions happen repeatedly because decisions aren’t recorded.

Email-Driven Architecture Anti-Pattern
Important decisions are buried in email threads, Slack messages, or meetings.

How ADRs help

  • Provide a single source of truth

  • Make decisions searchable and version controlled

  • Reduce time wasted re-discussing settled topics

  • Preserve knowledge beyond individual team members

5. Rationale — Why “Why” > “How”

The First Law of Software Architecture says every decision involves trade-offs; the Second Law says that why a decision was made is more important than how it is implemented.

The Rationale is often more important because:

  • Implementation details change over time

  • Future teams need to understand constraints and trade-offs

  • Helps evaluate whether the decision is still valid later

  • Prevents repeating rejected approaches

If you know why, you can adapt the how safely.

6. Scenario — Smart Toaster Y-Statement

In the context of real-time communication between distributed internal sensors and the central control hub in a smart toaster,
we decided to use gRPC instead of REST,
to achieve low-latency, strongly typed, efficient binary communication with streaming support,
accepting increased complexity, tighter coupling through contracts, and reduced human readability of traffic.

7. Consequences — NoSQL for Breakfast Preferences

Positive Consequence 1
Flexible schema allows rapid feature evolution and new preference types without migrations.

Positive Consequence 2
Better horizontal scalability for large user preference datasets.

Negative Consequence
Reduced consistency guarantees and more complex querying/reporting compared to relational databases.

(Other valid negatives could include skill gaps, tooling maturity, or data duplication risks.)

8. Lifecycle — Replacing ADR #0005 with ADR #0012

You should:

  1. Create ADR #0012

    • Explain new legal/privacy context

    • State new decision

    • Reference ADR #0005 as superseded

  2. Update ADR #0005 Status

    • Change status to Superseded by ADR #0012

  3. Maintain History

    • Do NOT delete old ADRs

    • Keep traceability for audits and learning

This preserves decision lineage and compliance evidence.

9. Tooling — Ensuring Developer Compliance

Association Principle:
Architectural decisions should be directly connected to the code, tests, or artifacts they influence.

Method + Tool Example

Method:
Require ADR references in pull requests for architecture-impacting changes.

Tool / Technique:
Use GitHub Pull Request templates + CI checks

  • PR template includes: “Related ADR: ____”

  • CI validates ADR exists if certain files change (e.g., infra, data, API contracts); a minimal check is sketched below

Alternative tools:

  • Architecture linting rules

  • ADR repositories with markdown + automation

  • Tools like Backstage or ADR CLI workflows
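A minimal sketch of such a CI check, assuming a hypothetical repo layout (ADRs under docs/adr/, architecture-impacting paths under infra/, data/, api/) and a PR_BODY environment variable exported by the pipeline:

```python
#!/usr/bin/env python3
"""Fail the build when architecture-impacting files change without an ADR reference."""
import os
import re
import subprocess
import sys

ADR_DIR = "docs/adr"                     # assumed ADR location
SENSITIVE = ("infra/", "data/", "api/")  # assumed architecture-impacting paths

def changed_files(base: str = "origin/main") -> list[str]:
    # List files changed relative to the target branch.
    out = subprocess.check_output(["git", "diff", "--name-only", base, "HEAD"], text=True)
    return out.splitlines()

def main() -> int:
    if not any(f.startswith(SENSITIVE) for f in changed_files()):
        return 0  # no architecture-impacting change, nothing to enforce
    body = os.environ.get("PR_BODY", "")  # filled in from the PR template
    match = re.search(r"Related ADR:\s*(ADR-\d+)", body)
    if not match:
        print("Architecture-impacting change without a 'Related ADR:' reference.")
        return 1
    adr_path = os.path.join(ADR_DIR, f"{match.group(1)}.md")
    if not os.path.exists(adr_path):
        print(f"Referenced ADR not found: {adr_path}")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```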


Hospital Cancer-Patient Chatbot — Example ADR Set

ADR-0001 — Use Retrieval-Augmented Generation (RAG) Instead of Pure LLM Responses

Status: Accepted
Date: 2026-02-06

Context

The chatbot will answer cancer patient questions about:

  • Treatment pathways

  • Medication information

  • Appointment preparation

  • Hospital-specific procedures

Pure LLM responses risk hallucinating medical information and may not reflect hospital-approved guidance or Danish healthcare regulations.

Decision

We will implement a Retrieval-Augmented Generation (RAG) architecture (sketched after this list):

  • Hospital-approved knowledge base (clinical guidelines, patient leaflets, oncology protocols)

  • Vector search retrieval

  • LLM generates answers using retrieved context only
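A minimal sketch of the retrieve-then-generate flow; vector_store and llm are hypothetical clients, not a specific product API:

```python
def answer_patient_question(question: str, vector_store, llm) -> str:
    # Retrieve only hospital-approved passages from the curated knowledge base.
    passages = vector_store.search(question, top_k=5)
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer using ONLY the hospital-approved context below. If the context "
        "does not cover the question, say so and refer the patient to their "
        "care team.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)  # the model sees retrieved context, nothing else
```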

Consequences

Positive

  • Reduced hallucination risk

  • Answers aligned with hospital-approved content

  • Easier clinical review and auditing

Negative

  • Higher system complexity

  • Requires knowledge base curation and updates

  • Slight latency increase vs direct LLM calls

ADR-0002 — Host Patient Data Processing Inside EU / Denmark-Compliant Cloud

Status: Accepted

Context

The system processes:

  • Potentially identifiable patient conversations

  • Health-related data (GDPR special category data)

Danish and EU privacy regulations require strict control over health data processing and residency.

Decision

We will:

  • Host services in EU-based healthcare-compliant cloud infrastructure

  • Store patient data only in GDPR-compliant systems

  • Use data minimization and automatic redaction pipelines

Consequences

Positive

  • Regulatory compliance

  • Reduced legal risk

  • Increased patient trust

Negative

  • Fewer vendor choices

  • Potentially higher infrastructure cost

  • Some cutting-edge AI services may be unavailable

ADR-0003 — Use Human-in-the-Loop Escalation for Medical Advice Risk

Status: Accepted

Context

Cancer patients may ask:

  • Prognosis questions

  • Medication safety questions

  • Emergency symptom questions

The chatbot must not provide unsafe or unverified clinical guidance.

Decision

Implement a risk classification + escalation workflow (sketched after this list):

  • Low risk → automated chatbot response

  • Medium risk → chatbot response + disclaimer + recommend contacting clinic

  • High risk → block answer + escalate to nurse/oncology team queue
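A minimal routing sketch for the three tiers; the classifier, chatbot, and queue clients are hypothetical:

```python
def handle_question(question: str, classifier, chatbot, nurse_queue) -> str:
    risk = classifier.classify(question)  # assumed to return "low" | "medium" | "high"
    if risk == "high":
        nurse_queue.enqueue(question)     # block the automated answer, escalate
        return "A nurse from your oncology team will contact you shortly."
    answer = chatbot.answer(question)
    if risk == "medium":
        answer += "\n\nThis is general information; please contact your clinic to confirm."
    return answer
```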

Consequences

Positive

  • Improved patient safety

  • Aligns with medical governance

  • Reduces liability exposure

Negative

  • Requires clinical staffing integration

  • Operational overhead

  • Slower response for high-risk queries

ADR-0004 — Use Multilingual Support (Danish First, English Secondary)

Status: Accepted

Context

Primary users are Danish cancer patients, but:

  • Some patients speak English

  • Some staff will test or support in English

Decision

We will:

  • Optimize prompts and knowledge base primarily for Danish

  • Provide English fallback responses

  • Store knowledge base in Danish + translated versions where required

Consequences

Positive

  • Better patient accessibility

  • Higher answer quality for primary population

Negative

  • Translation maintenance overhead

  • Need bilingual clinical review

ADR-0005 — Log Conversations for Clinical Quality Review (With De-Identification)

Status: Proposed

Context

Hospital wants to:

  • Improve chatbot quality

  • Detect unsafe outputs

  • Train future models

But must protect patient privacy.

Decision

We will:

  • Store conversation logs after automatic de-identification (a redaction sketch follows this list)

  • Remove CPR numbers, names, addresses, phone numbers

  • Allow opt-out logging for patients
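A minimal redaction sketch. The patterns below are illustrative, not exhaustive; production de-identification would also need named-entity recognition for names and addresses, plus clinical review:

```python
import re

PATTERNS = {
    "CPR": re.compile(r"\b\d{6}-?\d{4}\b"),        # Danish CPR number: DDMMYY-XXXX
    "PHONE": re.compile(r"\b(?:\+45 ?)?\d{8}\b"),  # Danish 8-digit phone numbers
}

def deidentify(text: str) -> str:
    # Replace each match with a typed placeholder, e.g. "[CPR]".
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```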

Consequences

Positive

  • Enables continuous improvement

  • Supports incident investigation

  • Helps clinical audit

Negative

  • Requires strong anonymization pipeline

  • Residual re-identification risk

  • Additional compliance review overhead

ADR-0006 — Restrict Chatbot to Informational Support, Not Diagnosis

Status: Accepted

Context

LLMs can appear authoritative. Cancer diagnosis or treatment planning must remain clinician-led.

Decision

The chatbot will:

  • Provide educational and navigation support only

  • Never generate diagnosis statements

  • Include safety guardrails and prompt constraints

Consequences

Positive

  • Reduces clinical and legal risk

  • Clear scope for product

Negative

  • Some patients may expect deeper clinical answers

  • Requires careful UX messaging

ADR-0007 — Use Prompt Versioning + Model Version Tracking

Status: Accepted

Context

LLM behaviour changes over time due to:

  • Prompt updates

  • Model upgrades

  • Knowledge base updates

Healthcare requires traceability.

Decision

Every response will log (see the sketch after this list):

  • Model version

  • Prompt template version

  • Knowledge snapshot ID
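A minimal sketch of the per-response audit record; the field names are illustrative assumptions:

```python
import json
import time

def log_response(response_id: str, model_version: str,
                 prompt_template_version: str, knowledge_snapshot_id: str) -> str:
    record = {
        "response_id": response_id,
        "timestamp": time.time(),
        "model_version": model_version,                      # e.g. vendor model tag
        "prompt_template_version": prompt_template_version,  # versioned alongside code
        "knowledge_snapshot_id": knowledge_snapshot_id,      # immutable KB snapshot
    }
    return json.dumps(record)  # ship to the audit log sink
```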

Consequences

Positive

  • Enables audit trails

  • Supports incident root-cause analysis

  • Helps regulatory reporting

Negative

  • More logging storage cost

  • Additional engineering complexity

Mars / Arla Rufus Optimisation — Example ADR Set

ADR-0001 — Use LLM + Rules Hybrid for Rufus Content Optimisation

Status: Accepted
Date: 2026-02-06

Context

The tool must generate and optimise:

  • Product titles

  • Bullets

  • A+ content suggestions

  • Conversational answer snippets (Rufus Q&A style)

Pure LLM generation risks:

  • Brand compliance violations

  • Regulatory claim risks (especially dairy & nutrition claims in EU)

  • Inconsistent tone across brands (Mars vs Arla brand voices differ)

Decision

Use a hybrid system:

  • LLM generates candidate optimisations

  • Rules engine validates:

    • Brand voice rules

    • Nutrition / health claim compliance

    • Amazon category content policies

Consequences

Positive

  • Safer automated content generation

  • Maintains brand consistency

  • Reduces legal/compliance review load

Negative

  • Higher engineering complexity

  • Requires ongoing rule maintenance

  • Slightly slower generation pipeline

ADR-0002 — Centralised Brand Knowledge Graph for Product + Claim Relationships

Status: Accepted

Context

Mars and Arla manage:

  • Hundreds of SKUs

  • Complex claim relationships (e.g., protein content, sustainability claims, ingredient sourcing)

  • Frequent product refresh cycles

Simple flat product databases make optimisation logic brittle.

Decision

Implement a Brand Knowledge Graph storing:

  • SKU → Ingredients → Claims → Certifications → Regions → Campaigns

  • Queryable by optimisation engine and analytics

Consequences

Positive

  • Enables richer Rufus conversational answers

  • Supports cross-product optimisation insights

  • Improves data reuse across brands

Negative

  • Higher data modelling effort

  • Requires governance and stewardship

  • Harder initial setup vs relational tables

ADR-0003 — Use Retrieval-Augmented Generation (RAG) for Rufus Q&A Optimisation

Status: Accepted

Context

Rufus answers customer questions using Amazon catalog data plus contextual signals.
Optimisation requires aligning product content with likely customer questions.

Decision

Use RAG combining:

  • Amazon product catalog exports

  • Customer review themes

  • Search query datasets

  • Brand product documentation

LLM generates Rufus-aligned answer suggestions using retrieved evidence.

Consequences

Positive

  • Improves factual grounding

  • Aligns content to real customer language

  • Reduces hallucinated product claims

Negative

  • Requires vector index maintenance

  • Data freshness dependency

  • More infrastructure cost

ADR-0004 — Multi-Tenant Architecture With Brand Isolation

Status: Accepted

Context

The platform will serve multiple FMCG clients:

  • Mars

  • Arla

  • Potential future EU FMCG brands

Brand data and strategy must remain isolated.

Decision

Implement logical tenant isolation:

  • Separate brand vector indexes

  • Separate optimisation rule sets

  • Separate analytics dashboards

  • Shared core infrastructure

Consequences

Positive

  • Scales to additional clients

  • Protects commercial data

  • Enables brand-specific optimisation strategies

Negative

  • Increased deployment complexity

  • Higher operational monitoring overhead

ADR-0005 — Use EU Cloud Hosting With Data Export Controls

Status: Accepted

Context

Although product data is not personal data, tool usage may include:

  • Customer review text

  • Search behaviour datasets

  • Potential marketplace partner data

EU companies prefer EU-hosted processing.

Decision

Host analytics + optimisation services in EU cloud regions with:

  • Data export logging

  • Role-based access controls

  • Audit logging for client data usage

Consequences

Positive

  • Supports EU enterprise procurement requirements

  • Improves client trust

  • Easier compliance reviews

Negative

  • Some AI services may have delayed EU availability

  • Higher infra cost vs global-only hosting

ADR-0006 — Optimisation Recommendations Must Be Explainable

Status: Accepted

Context

Brand and category managers must trust optimisation outputs.
Black-box recommendations reduce adoption.

Decision

Every optimisation suggestion must include (see the sketch after this list):

  • Source signal (reviews, search queries, competitor analysis)

  • Confidence score

  • Expected impact metric (CTR, CVR, visibility probability)
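A minimal sketch of the explainability payload attached to each suggestion; the field names and metrics are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class OptimisationSuggestion:
    suggested_text: str
    source_signals: list[str]     # e.g. ["reviews", "search queries"]
    confidence: float             # 0.0 - 1.0, from the generation pipeline
    impact_metric: str            # e.g. "CTR", "CVR", "visibility probability"
    expected_impact_delta: float  # predicted relative change in the metric
```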

Consequences

Positive

  • Higher adoption by brand teams

  • Easier A/B test planning

  • Supports internal justification

Negative

  • Requires extra data processing

  • More complex UI and data pipelines

ADR-0007 — Continuous Learning via Closed-Loop Performance Feedback

Status: Proposed

Context

Optimisation value depends on learning from:

  • Content changes

  • Conversion performance

  • Rufus interaction trends

Decision

Implement feedback loop using:

  • Amazon performance metrics (where available)

  • Internal experimentation tracking

  • Content version → performance correlation

Consequences

Positive

  • Improves optimisation quality over time

  • Enables ROI measurement

Negative

  • Attribution complexity

  • Requires careful experiment design

Below is an example ADR set for adopting and operating Fin (Fin.ai) by Intercom as an enterprise AI customer service agent, structured as ADRs a team might realistically store in a repo for architecture governance, security review, or procurement sign-off.

Fin.ai / Intercom — Example ADR Set

ADR-0001 — Adopt Fin (Intercom) as Primary AI Customer Service Agent

Status: Accepted
Date: 2026-02-06

Context

The organisation requires an AI agent capable of:

  • Resolving complex customer queries

  • Operating across multiple channels

  • Integrating with existing helpdesk infrastructure

  • Operating with minimal engineering overhead

Internal LLM build options were evaluated but present:

  • Higher time-to-market

  • Ongoing MLOps overhead

  • Increased governance and safety burden

Decision

We will adopt Fin by Intercom as the primary AI Customer Service Agent, due to:

  • Average ~65% resolution rate

  • No-code configuration model

  • Rapid deployment (under an hour of setup)

  • Native helpdesk integration capability

  • Enterprise safety controls and certifications

Consequences

Positive

  • Fast deployment and time to value

  • Reduced internal AI engineering cost

  • Proven production reliability

  • Continuous vendor model improvement

Negative

  • Vendor lock-in risk

  • Limited low-level model customisation

  • Ongoing per-resolution operational cost

ADR-0002 — Multi-Channel Customer Interaction Strategy

Status: Accepted

Context

Customers contact support through:

  • Live chat

  • Email

  • SMS

  • Phone (voice assist workflows)

  • Social media messaging

Fragmented automation causes inconsistent customer experience and duplicated support workflows.

Decision

Fin will be deployed as a single AI resolution layer across:

  • Web chat

  • Email

  • Messaging channels

  • Voice-assist workflows (via integration layer)

Consequences

Positive

  • Unified customer experience

  • Shared training data across channels

  • Higher automation coverage

Negative

  • Channel-specific nuance may require custom training

  • Increased monitoring complexity across channels

ADR-0003 — Helpdesk Integration (No Migration Strategy)

Status: Accepted

Context

Current support stack includes:

  • Existing helpdesk ticketing workflows

  • Agent assignment rules

  • Reporting pipelines

  • CRM integrations

Full migration to a single vendor platform is not feasible short-term.

Decision

Fin will integrate with existing helpdesks (e.g., Intercom, Zendesk, Salesforce Service Cloud) using:

  • Native integration connectors

  • Existing routing and escalation rules

  • Existing reporting dashboards

Consequences

Positive

  • Minimal operational disruption

  • Faster rollout

  • Lower training requirements for support teams

Negative

  • Potential limitations vs full native platform features

  • Requires integration monitoring

ADR-0004 — Outcome-Based Pricing Model Adoption

Status: Accepted

Context

Traditional SaaS pricing models charge per seat or per conversation.
Fin pricing is charged per successful resolution (~$0.99 each).

Decision

We will adopt outcome-based pricing and align Fin usage with:

  • Resolution quality metrics

  • Automation rate targets

  • Cost-per-resolution thresholds

Consequences

Positive

  • Direct ROI measurement

  • Aligns cost to business value

  • Encourages optimisation of automation quality

Negative

  • Requires strong resolution definition governance

  • Requires cost forecasting models

ADR-0005 — Security and Identity Verification Using JWT

Status: Mandatory

Context

Customer identity verification is required to prevent:

  • Account takeover

  • Cross-user data leakage

  • Session spoofing

Legacy identity verification approaches are deprecated.

Decision

We will use JWT-based identity verification for Intercom Messenger (sketched after this list):

  • Signed user ID and email

  • Short token expiry

  • Token rotation via backend auth service
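A minimal token-issuance sketch using the PyJWT library; the claim names and TTL are illustrative assumptions to check against Intercom's current documentation:

```python
import time
import jwt  # PyJWT

SECRET = "load-from-secrets-manager"  # the Messenger identity verification secret

def issue_messenger_token(user_id: str, email: str, ttl_seconds: int = 900) -> str:
    payload = {
        "user_id": user_id,                     # signed identity claims
        "email": email,
        "exp": int(time.time()) + ttl_seconds,  # short expiry per this ADR
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")
```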

Consequences

Positive

  • Strong identity validation

  • Reduced impersonation risk

  • Industry standard implementation

Negative

  • Requires secure token issuance service

  • Requires key rotation governance

ADR-0006 — AI Governance and Responsible AI Compliance

Status: Accepted

Context

Customer-facing AI introduces risks:

  • Hallucinations

  • Brand reputation risk

  • Unsafe or misleading responses

  • Prompt injection or adversarial queries

Decision

We will align operations with:

  • Intercom AI security protections

  • AIUC-1 certification framework controls

  • Internal AI governance review process

  • Continuous output monitoring

Consequences

Positive

  • Demonstrable responsible AI posture

  • Strong enterprise procurement alignment

  • Reduced brand risk

Negative

  • Requires ongoing audit reviews

  • Requires internal AI risk reporting process

ADR-0007 — Data Residency and Privacy Compliance Strategy

Status: Accepted

Context

Operations span:

  • EEA

  • UK

  • USA

Must comply with:

  • GDPR

  • UK GDPR

  • CCPA (where applicable)

Decision

We will:

  • Use Intercom approved data hosting frameworks

  • Use legal transfer mechanisms (e.g., DPF or equivalent where applicable)

  • Provide transparency via privacy policy updates

  • Implement data minimisation in customer payloads

Consequences

Positive

  • Regulatory compliance support

  • Lower legal exposure

  • Easier enterprise sales approvals

Negative

  • Requires ongoing vendor compliance review

  • May limit certain data uses

ADR-0008 — Prohibited Sensitive Data Handling

Status: Mandatory

Context

Standard support tooling is not certified for:

  • PCI DSS payment storage

  • Certain regulated health data processing

  • Special category sensitive data

Decision

Fin will be configured to:

  • Avoid requesting or storing sensitive personal data

  • Redirect sensitive workflows to secure systems

  • Use detection rules to block restricted data categories

Consequences

Positive

  • Reduces compliance exposure

  • Protects customers and organisation

Negative

  • Requires fallback flows for sensitive workflows

  • May increase handoffs to human agents

ADR-0009 — Continuous Optimisation Using Fin Analytics and Testing

Status: Accepted

Context

AI performance degrades without:

  • Content updates

  • Training refresh

  • Behaviour monitoring

Decision

We will use Fin tooling to:

  • Analyse support trends

  • Identify content gaps

  • Run answer testing experiments

  • Optimise tone and localisation (45+ languages capability)

Consequences

Positive

  • Continuous automation improvement

  • Higher long-term resolution rates

Negative

  • Requires content operations ownership

  • Requires performance review cadence

Strategic ADR — Fin vs Build-Your-Own LLM Customer Support Platform

ADR-STRAT-0001 — Strategic Direction for AI Customer Service Platform

Status: Accepted (Example)
Owner: CTO / Chief Digital Officer
Decision Type: Strategic / Board-Level
Date: 2026-02-06

Context

The organisation is investing in AI-driven customer support automation to achieve:

  • Higher resolution rates

  • Lower cost per contact

  • 24/7 global support coverage

  • Scalable multilingual support

  • Improved customer experience

Two strategic options were evaluated:

Option A — Adopt Fin (Intercom Managed AI Agent)

A vendor-managed AI support agent providing:

  • No-code configuration

  • Multi-channel deployment

  • Outcome-based pricing

  • Enterprise-grade safety controls

  • Pre-built integrations

Option B — Build Proprietary LLM Support Platform

Internally built AI support platform including:

  • Custom LLM orchestration

  • Internal RAG pipelines

  • Internal safety guardrails

  • Custom evaluation frameworks

  • Full data and model control

Decision

The organisation will adopt Fin (Intercom) as the primary AI customer support platform for the next 24–36 months, while maintaining a strategic capability assessment for potential future internal AI platform development.

Strategic Drivers Behind Decision

1. Time to Value

Fin deployment: weeks
Internal build: 12–24 months to mature production capability

2. Risk Reduction

Vendor solution provides:

  • Mature production usage

  • Pre-built safety controls

  • Certified responsible AI controls

  • Enterprise-grade reliability

3. Talent and Capability Reality

Building requires:

  • LLM platform engineering

  • AI safety engineering

  • ML infrastructure (MLOps / LLMOps)

  • Continuous evaluation teams

This capability is currently limited internally.

4. Cost Predictability

Fin:

  • Variable cost aligned to outcomes

  • Lower upfront capital investment

Build:

  • High upfront platform cost

  • Ongoing model ops + infra + team cost

Alternatives Considered

Full Internal LLM Platform

Rejected for now due to:

  • High upfront cost

  • Talent acquisition risk

  • Longer time to business value

  • Higher regulatory and safety ownership burden

Hybrid (Internal LLM + Vendor Tools)

Deferred until internal AI maturity increases.

Consequences

Positive Consequences

1. Faster Business Impact

Immediate automation improvements and measurable ROI.

2. Lower Execution Risk

Vendor absorbs:

  • Model maintenance

  • Safety tuning

  • Threat mitigation

  • Scaling infrastructure

3. Reduced Regulatory Burden

Vendor certification and governance frameworks reduce internal compliance workload.

4. Predictable Cost Model

Outcome pricing aligns cost with customer value delivered.

Negative Consequences

1. Vendor Dependency

Risk of:

  • Pricing changes

  • Feature roadmap control outside organisation

  • Contract lock-in

Mitigation:

  • Data portability strategy

  • Periodic market re-evaluation

  • Exit architecture maintained

2. Reduced Differentiation

Competitors could use the same vendor platform.

Mitigation:

  • Differentiate through:

    • Proprietary knowledge

    • Customer workflows

    • Integration depth

    • Customer experience design

3. Long-Term Strategic Capability Gap

Risk of falling behind in internal AI capability.

Mitigation:

  • Maintain internal AI architecture review group

  • Run small internal experimentation programme

  • Reassess build vs buy every 12 months

Financial Trade-Off Summary

| Dimension | Fin | Build Internal |
|---|---|---|
| Upfront Cost | Low | Very High |
| Ongoing Cost | Variable | High Fixed + Variable |
| Time to Market | Fast | Slow |
| Talent Requirement | Low | Very High |
| Control | Medium | Full |
| Risk | Lower | Higher (initially) |

Risk Register (Strategic)

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Vendor pricing increase | Medium | Medium | Multi-year contract + exit plan |
| Vendor roadmap misalignment | Medium | Medium | Quarterly roadmap governance |
| Vendor outage risk | High | Low | Human fallback workflows |
| AI regulatory changes | High | Medium | Vendor compliance + internal review |

Exit Strategy

The architecture will ensure:

  • Conversation data export capability

  • Knowledge base portability

  • Integration abstraction layer

  • Internal evaluation capability retained

This enables future migration if:

  • Costs become unfavourable

  • Strategic differentiation requires internal AI

  • Regulatory requirements change

Strategic Review Cadence

Re-evaluate Build vs Buy when:

  • Internal AI capability reaches maturity

  • Support automation becomes core competitive differentiator

  • Vendor costs exceed internal TCO

  • Regulatory environment changes

Formal review interval: 12 months

Board-Level Y-Statement

In the context of scaling global customer support automation while managing cost, regulatory exposure, and delivery risk,
we decided to adopt Fin (Intercom) as the primary AI support agent,
to achieve rapid time-to-value, predictable operational cost, and enterprise-grade safety and reliability,
accepting vendor dependency and reduced low-level AI customisation in the short to medium term.


Below is a security ADR, aimed at a Board / Risk Committee audience, focused specifically on LLM threat modelling.
It sits alongside the strategic ADRs and would typically be owned jointly by Security, Platform, and AI Governance.

Security ADR — LLM Threat Model and Mitigation Strategy

ADR-SEC-0004 — LLM Threat Model (Prompt Injection, Data Exfiltration, Jailbreak Resistance)

Status: Accepted (Example)
Owner: CISO / Head of AI Security
Decision Type: Security Architecture / Enterprise Risk
Date: 2026-02-06

Context

The organisation is deploying LLM-powered customer-facing and internal AI systems (e.g., AI support agents, copilots, automation assistants).

LLMs introduce new attack classes not fully covered by traditional AppSec models:

  • Prompt Injection

  • Data Exfiltration via Model Context

  • Jailbreaks / Policy Evasion

  • Tool Abuse (if agentic capabilities exist)

  • Training Data Leakage

  • Cross-tenant Context Leakage

These risks can impact:

  • Customer data confidentiality

  • Regulatory compliance (GDPR, PCI, etc.)

  • Brand reputation

  • Operational trust in AI systems

Decision

We will implement a Layered LLM Security Model covering:

  1. Input Security Controls

  2. Context Isolation and Data Minimisation

  3. Output Validation and Policy Enforcement

  4. Runtime Behaviour Monitoring

  5. Vendor Security Assurance (where applicable)

  6. Continuous Red Teaming and Testing

Threat Model Scope

In Scope

  • External user prompts

  • Retrieved knowledge base content

  • Tool invocation payloads

  • LLM responses

  • System prompts and hidden instructions

  • Integration data (CRM, ticketing, internal APIs)

Out of Scope

  • Non-LLM traditional application attacks (handled by existing AppSec framework)

Primary Threat Classes and Controls

Threat Class 1 — Prompt Injection

Threat Description

Attackers attempt to override system instructions or force the model to:

  • Reveal hidden prompts

  • Ignore policy controls

  • Perform unintended actions

  • Access restricted data

Example attack patterns:

  • “Ignore previous instructions…”

  • “You are now in debug mode…”

  • Embedded instructions in uploaded documents or URLs

Controls

Technical Controls

  • Input classification and risk scoring

  • Context boundary enforcement

  • Instruction hierarchy (system > policy > user)

  • Prompt template hardening

  • Retrieval content sanitisation

Operational Controls

  • Red team prompt injection testing

  • Prompt change approval workflow

Residual Risk

Medium — cannot be fully eliminated, only reduced.

Threat Class 2 — Data Exfiltration

Threat Description

Attackers attempt to extract:

  • PII

  • Customer records

  • Internal knowledge base content

  • System prompts

  • Cross-user session data

Controls

Technical Controls

  • Strict tenant isolation

  • Data minimisation in prompt context

  • Retrieval filtering (need-to-know basis)

  • Output PII scanning

  • Token-level response filtering

Architecture Controls

  • No direct raw database access from LLM

  • Mediated tool access via policy layer

  • Zero-trust service access

Residual Risk

Low–Medium depending on data exposure surface.

Threat Class 3 — Jailbreaks

Threat Description

Attempts to bypass safety or policy constraints through:

  • Multi-step reasoning attacks

  • Role-play attacks

  • Encoding / obfuscation attacks

  • Language switching attacks

Controls

Technical Controls

  • Safety classifier layer before response release

  • Adversarial prompt detection

  • Multi-model verification for high-risk outputs

  • Refusal and safe completion patterns

Process Controls

  • Continuous jailbreak dataset testing

  • Vendor model update review

Residual Risk

Medium — evolves with new attack patterns.

Threat Class 4 — Tool / Agent Abuse (If Enabled)

Threat Description

If LLM can:

  • Call APIs

  • Trigger workflows

  • Access data systems

Attackers may attempt:

  • Privilege escalation

  • Fraudulent transactions

  • Data scraping

Controls

  • Tool permission scoping

  • Transaction confirmation gates

  • Policy engine approval layer

  • Rate limiting and anomaly detection

Residual Risk

Low with strict policy enforcement.

Security Architecture Pattern

Required Control Layers

  1. Input Security Layer

    • Injection detection

    • Content classification

  2. Context Security Layer

    • Data filtering

    • Tenant isolation

    • Least privilege retrieval

  3. Model Safety Layer

    • Guardrails

    • Safety tuning

    • Prompt templates

  4. Output Security Layer

    • PII detection

    • Policy validation

    • Response risk scoring

  5. Monitoring Layer

    • Behaviour anomaly detection

    • Security event logging

Governance Requirements

Mandatory

  • LLM threat modelling review per new deployment

  • Quarterly red team exercises

  • Security logging for all LLM requests

  • Incident response playbooks for AI incidents

Vendor Requirements (If Using External AI Providers)

Vendors must provide:

  • Security architecture transparency

  • Model safety documentation

  • Data handling guarantees

  • Incident notification SLAs

  • Independent certification where available

Consequences

Positive

  • Reduces probability of high-impact AI incidents

  • Supports regulatory and audit requirements

  • Builds customer trust in AI deployments

Negative

  • Increases development complexity

  • Adds latency via safety layers

  • Requires dedicated AI security capability

Strategic Risk Acceptance Statement

The organisation acknowledges that:

  • LLM systems cannot be made perfectly secure

  • Risk must be continuously managed and monitored

  • Security posture must evolve with threat landscape

Board-Level Y-Statement

In the context of deploying LLM-powered systems that process customer and enterprise data,
we decided to implement a layered LLM threat defence architecture,
to achieve protection against prompt injection, data exfiltration, jailbreak attacks, and tool abuse,
accepting increased engineering complexity, additional latency, and ongoing operational security investment.

Operations + Risk ADR — Human Escalation Policy for AI Agents

ADR-OPS-0010 — Mandatory Human Escalation Framework for AI Customer Interactions

Status: Accepted (Example)
Owner: Head of Customer Operations + AI Governance + Risk
Decision Type: Operational Safety / Customer Risk / AI Governance
Date: 2026-02-06

Context

The organisation is deploying AI agents (e.g., LLM-powered support assistants) to automate customer interactions across digital channels.

While AI systems can resolve a large percentage of queries, risks remain:

  • Incorrect or hallucinated answers

  • Sensitive or high-risk customer scenarios

  • Legal or regulatory exposure

  • Customer frustration during complex interactions

  • Situations requiring empathy or judgement

Regulators and enterprise customers increasingly expect human override capability for AI systems affecting customers.

Decision

We will implement a Mandatory Human Escalation Framework ensuring that AI interactions are transferred to human agents when defined risk, complexity, or confidence thresholds are met.

This framework will be enforced via:

  • Automated risk classification

  • Confidence-based escalation

  • User-requested escalation at any time

  • Mandatory escalation categories

Escalation Triggers

1. Customer-Initiated Escalation (Always Allowed)

Customers can request a human agent at any time using:

  • Natural language (“talk to a human”, “agent please”)

  • UI escalation buttons

  • Repeated failed resolution attempts

Policy:
Must never block or delay customer-requested escalation.

2. Risk-Based Automatic Escalation

Escalate when content includes:

Regulatory / Legal Risk

  • Complaints about compliance breaches

  • Legal threats or disputes

  • Regulatory reporting scenarios

Financial Risk

  • Billing disputes

  • Refund escalation requests above threshold

  • Fraud suspicion

Safety / Wellbeing Signals (If Relevant Domain)

  • Emotional distress

  • Medical or safety advice requests

  • Crisis language

3. AI Confidence / Quality Escalation

Escalate when (see the combined trigger sketch below):

  • Model confidence below defined threshold

  • Multiple failed resolution attempts

  • Conflicting retrieved information

  • Out-of-distribution queries

4. Sensitive Data or Account Risk

Escalate when:

  • Identity verification uncertainty

  • Account access anomalies

  • Sensitive data detected in conversation
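A minimal sketch combining the triggers above into one decision point; the thresholds, category labels, and classifier interface are illustrative assumptions:

```python
CONFIDENCE_THRESHOLD = 0.7
HIGH_RISK = {"legal", "financial", "safety", "sensitive_data", "identity"}

def should_escalate(message: str, confidence: float, failed_attempts: int,
                    risk_classifier) -> tuple[bool, str]:
    # 1. Customer-initiated: never block this path (keyword check is naive;
    #    production systems would use intent classification).
    if any(phrase in message.lower() for phrase in ("human", "agent please")):
        return True, "customer_requested"
    # 2 and 4. Risk-based and sensitive-data / account triggers.
    category = risk_classifier.classify(message)  # assumed to return a label
    if category in HIGH_RISK:
        return True, f"risk:{category}"
    # 3. Confidence / quality triggers.
    if confidence < CONFIDENCE_THRESHOLD or failed_attempts >= 2:
        return True, "low_confidence"
    return False, ""
```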

Escalation Levels

Level 1 — Assisted AI → Human

AI gathers context → Transfers to human agent
Goal: Reduce handling time

Level 2 — Immediate Human Handoff

AI stops responding → Transfers immediately
Used for high-risk categories

Level 3 — Specialist Escalation

Routing to:

  • Tier 2 support

  • Compliance team

  • Security team

  • Legal team

Operational Requirements

Conversation Context Transfer

When escalating, system must pass:

  • Conversation history

  • Customer metadata

  • AI confidence signals

  • Risk classification reason

Response Time Targets

Define SLA for:

  • Standard escalation

  • High-risk escalation

  • Safety-critical escalation

Customer Transparency

Customers must know:

  • They are interacting with AI

  • They can request human help anytime

Governance Requirements

Mandatory Monitoring

Track:

  • Escalation rate

  • False positive escalations

  • Missed escalation incidents

  • Customer satisfaction post-escalation

Audit Requirements

Log:

  • Escalation trigger reason

  • AI confidence score

  • Risk classification output

Review Cadence

Monthly review of:

  • Escalation thresholds

  • Missed risk incidents

  • Customer sentiment trends

Technology Requirements

Systems must support:

  • Real-time risk scoring

  • Confidence scoring

  • Intent classification

  • Escalation routing APIs

  • Agent queue prioritisation

Failure Mode Policy

If AI safety systems fail or degrade:

  • Default to increased human routing

  • Disable high-risk automation flows

  • Trigger incident response process

Consequences

Positive

  • Protects customers from unsafe AI decisions

  • Supports regulatory expectations

  • Maintains brand trust

  • Enables safe automation scaling

Negative

  • Increased human support workload

  • Requires monitoring and tuning

  • Potential increased cost during early rollout

Strategic Risk Acceptance Statement

The organisation acknowledges:

  • AI cannot safely handle 100% of customer scenarios

  • Human judgement remains critical for edge cases

  • Escalation is a safety feature, not a failure

Board-Level Y-Statement

In the context of deploying AI agents to automate customer interactions at scale,
we decided to implement a mandatory human escalation framework,
to achieve safe, trustworthy customer support with regulatory alignment,
accepting increased operational cost and system complexity.


Finance / Platform ADR — AI Cost Governance (Resolution Cost Guardrails)

ADR-FIN-0012 — Resolution Cost Guardrails for AI Customer Support Automation

Status: Accepted (Example)
Owner: CFO + CTO + Head of Customer Operations
Decision Type: Financial Governance / Platform Economics
Date: 2026-02-06

Context

The organisation is deploying AI customer support automation using outcome-based pricing (e.g., cost per successful resolution).

Key risks:

  • Uncontrolled cost growth due to increased automation usage

  • Low-quality AI resolutions driving hidden downstream cost (reopens, complaints, churn)

  • Lack of cost attribution by channel, product, or geography

  • Vendor pricing model dependency

  • Difficulty forecasting cost under usage spikes

Unlike seat-based SaaS pricing, resolution-based pricing is variable OPEX tied to behaviour and volume.

Decision

We will implement Resolution Cost Guardrails across four control layers:

  1. Cost Per Resolution Targets

  2. Volume and Spend Caps

  3. Quality-Adjusted Cost Monitoring

  4. Vendor and Model Usage Optimisation

These guardrails will be enforced through monitoring, alerting, and automated throttling where required.

Cost Governance Framework

Layer 1 — Cost Per Resolution Targets

Define acceptable cost bands:

| Tier | Cost Per Resolution |
|---|---|
| Target | ≤ Baseline Support Cost Equivalent |
| Warning | 10–20% above baseline |
| Critical | > 20% above baseline |

Baseline = historical human support cost per ticket.

Layer 2 — Volume and Spend Guardrails

Implement:

  • Monthly AI support budget allocation

  • Channel-level cost caps

  • Product-line cost attribution

  • Automatic alert thresholds at 70%, 85%, 95% spend

Optional:

  • Soft throttling during non-critical spikes

  • Human fallback routing if cost risk triggered

Layer 3 — Quality-Adjusted Cost Control

True cost must include:

  • Reopen rate cost

  • Escalation to human cost

  • Refund / compensation cost triggered by AI errors

  • Customer churn signals

Metric example:

True Resolution Cost =
  AI Resolution Cost
  + (Reopen Rate × Human Cost)
  + (Escalation Rate × Human Cost)
  + Error Remediation Cost
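The same computation in code, with a worked example; the figures are illustrative assumptions:

```python
def true_resolution_cost(ai_cost: float, reopen_rate: float, escalation_rate: float,
                         human_cost: float, remediation_cost: float) -> float:
    return (ai_cost
            + reopen_rate * human_cost      # reopened tickets land on humans
            + escalation_rate * human_cost  # escalations consume human time
            + remediation_cost)             # refunds/compensation from AI errors

# Example: $0.99 AI cost, 5% reopens, 10% escalations, $8 human cost,
# $0.20 average remediation -> 0.99 + 0.40 + 0.80 + 0.20 = $2.39
```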

Layer 4 — Vendor and Model Optimisation

Implement:

  • Channel-specific automation strategy

  • High-confidence automation only in high-volume low-risk flows

  • Continuous vendor pricing review

  • Multi-vendor optional architecture (long-term)

Mandatory Reporting

Executive Dashboard Metrics

  • Cost per AI resolution

  • True cost per resolution (quality adjusted)

  • Automation rate

  • Cost vs human baseline

  • Cost per channel

  • Cost per customer segment

Finance Reporting Cadence

  • Weekly operational cost review

  • Monthly Finance + Operations review

  • Quarterly Board-level cost trend review

Guardrail Enforcement Actions

Warning Level

Actions:

  • Increase monitoring

  • Review automation scope

  • Review resolution definition

Critical Level

Actions:

  • Pause expansion of automation coverage

  • Increase human routing

  • Trigger vendor cost review

  • Trigger architecture review

Technology Requirements

Systems must support:

  • Real-time cost telemetry

  • Resolution-level cost tagging

  • Channel attribution

  • Experiment tracking

  • Automated alerting

Forecasting Requirements

Finance models must include:

  • Seasonal volume spikes

  • Product launch impact

  • Customer growth scenarios

  • Vendor pricing change scenarios

Failure Mode Policy

If cost monitoring systems fail:

  • Default to conservative automation coverage

  • Freeze automation expansion

  • Notify Finance and Platform leadership

Consequences

Positive

  • Predictable AI operating cost

  • Prevents silent cost creep

  • Supports CFO confidence in AI scaling

  • Enables ROI-based automation expansion

Negative

  • Requires strong telemetry infrastructure

  • May temporarily slow automation rollout

  • Requires cross-functional cost ownership

Strategic Risk Acceptance Statement

The organisation acknowledges:

  • AI automation cost is variable by design

  • Perfect cost predictability is not possible

  • Cost governance must be continuous, not one-time

Board-Level Y-Statement

In the context of scaling AI-driven customer support using resolution-based pricing models,
we decided to implement resolution cost guardrails and quality-adjusted cost monitoring,
to achieve predictable AI operating cost and sustainable automation ROI,
accepting increased monitoring complexity and potential automation throttling during cost spikes.


Data + Governance ADR — AI Resolution Evaluation Framework

ADR-DATA-0015 — Standardised AI Resolution Measurement Framework

Status: Accepted (Example)
Owner: Head of Data + Customer Operations + AI Governance
Decision Type: Measurement Governance / Vendor Validation / Cost Accuracy
Date: 2026-02-06

Context

AI customer support vendors often report resolution metrics (e.g., “65% automated resolution rate”).
However, resolution definitions vary widely and can be misleading.

Common vendor metric risks:

  • Counting short conversations as resolved

  • Ignoring reopen events

  • Ignoring downstream human escalation

  • Ignoring customer dissatisfaction

  • Measuring only short-term interaction outcomes

Resolution metrics directly affect:

  • Cost calculations

  • Vendor performance evaluation

  • Automation expansion decisions

  • Executive reporting

Decision

We will implement a Multi-Dimensional Resolution Evaluation Framework consisting of:

  1. Strict Resolution Definition

  2. Time-Based Validation Windows

  3. Quality-Adjusted Resolution Metrics

  4. Customer Outcome Validation

  5. Vendor Metric Normalisation

Vendor-reported metrics will not be used directly without internal validation.

Core Measurement Definitions

Primary Metric — True AI Resolution Rate (TARR)

A case counts as resolved only if:

  • Customer issue is closed

  • No reopen within defined validation window

  • No forced human intervention required

  • Customer satisfaction meets threshold

Formula

True AI Resolution Rate =
Valid AI Resolutions
--------------------------------
Total Eligible AI Conversations

Resolution Validation Windows

Short-Term Window

24–48 hours after resolution
Detects immediate reopens or follow-up contact

Medium-Term Window

7 days after resolution
Detects unresolved issue recurrence

Long-Term (Optional)

30 days for high-risk or high-value flows

Quality-Adjusted Resolution Metrics

Adjusted Resolution Score (ARS)

Adjusts raw resolution rate using:

  • Reopen penalty

  • Escalation penalty

  • Refund / compensation penalty

  • Complaint penalty

Example

ARS =
Raw Resolution Rate
– (Reopen Rate × Weight)
– (Escalation Rate × Weight)
– (Complaint Rate × Weight)
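A minimal sketch of both metrics; the penalty weights are illustrative assumptions to be agreed between Data, Finance, and Operations:

```python
def tarr(valid_ai_resolutions: int, eligible_conversations: int) -> float:
    # True AI Resolution Rate: strict definition, validated cases only.
    return valid_ai_resolutions / eligible_conversations

def ars(raw_rate: float, reopen_rate: float, escalation_rate: float,
        complaint_rate: float, w_reopen: float = 1.0,
        w_escalation: float = 0.5, w_complaint: float = 2.0) -> float:
    # Adjusted Resolution Score: raw rate minus weighted quality penalties.
    return (raw_rate
            - reopen_rate * w_reopen
            - escalation_rate * w_escalation
            - complaint_rate * w_complaint)

# Example: a vendor-reported 65% rate with 6% reopens, 8% escalations, and
# 1% complaints adjusts to 0.65 - 0.06 - 0.04 - 0.02 = 0.53
```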

Customer Outcome Validation

Customer Satisfaction Threshold

Resolution only valid if:

  • CSAT ≥ defined threshold
    OR

  • No negative sentiment detected
    OR

  • No complaint generated

Customer Effort Validation

Optional metric:

  • Repeat contact required?

  • Multiple channels used for same issue?

Vendor Metric Normalisation

Vendor claims must be adjusted to internal definitions using:

  • Matching conversation eligibility criteria

  • Matching validation window

  • Matching escalation definitions

  • Matching customer outcome thresholds

Segmentation Requirements

All resolution metrics must be segmented by:

  • Channel (chat, email, voice, social)

  • Customer tier

  • Geography

  • Product or service line

  • Risk category

Resolution Exclusion Rules

Exclude from denominator:

  • Spam / bot interactions

  • Duplicate tickets

  • Non-support interactions

  • Misrouted conversations

Audit Requirements

Every AI resolution must log:

  • Confidence score

  • Risk classification

  • Knowledge sources used (if RAG)

  • Escalation eligibility status

  • Customer sentiment score

Governance Cadence

Weekly

Operational monitoring of resolution quality trends

Monthly

Vendor performance validation

Quarterly

Executive-level automation performance review

Failure Mode Policy

If measurement system fails:

  • Freeze automation expansion decisions

  • Fall back to conservative vendor metrics (flagged as unverified)

  • Trigger data pipeline incident review

Consequences

Positive

  • Prevents inflated automation performance reporting

  • Enables accurate cost modelling

  • Enables fair vendor comparison

  • Supports regulatory and audit review

Negative

  • Requires advanced data instrumentation

  • May reduce headline automation numbers initially

  • Requires cross-team metric alignment

Strategic Risk Acceptance Statement

The organisation acknowledges:

  • Resolution is a business outcome, not a conversation event

  • Vendor metrics will always differ from internal metrics

  • Measurement must evolve as AI capabilities evolve

Board-Level Y-Statement

In the context of evaluating AI customer support performance and vendor claims,
we decided to implement a strict multi-dimensional resolution evaluation framework,
to achieve accurate automation ROI measurement and customer outcome protection,
accepting increased data engineering and analytics complexity.

RedCloud — Extended Strategic + Platform ADR Set

ADR-007 — Open Commerce Network Architecture (Multi-Sided Trust Graph)

Status: Accepted
Owner: CTO + Chief Product Officer

Context

RedCloud is not just a marketplace — it is a trust network across:

  • Brands

  • Distributors

  • Micro-retailers

  • Financial partners

  • Logistics providers

Traditional marketplaces optimise transactions.
RedCloud must optimise trust-weighted trade flows.

Decision

The platform will model trade as a Trust Graph (sketched below):

Nodes:

  • Trading entities

  • Warehouses

  • SKUs

  • Payment identities

Edges:

  • Trade relationships

  • Payment reliability

  • Fulfilment reliability

  • Data quality contribution
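A minimal sketch of the trust graph primitives; the edge attributes mirror the lists above, and the scoring weights are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class TradeEdge:
    src: str                       # e.g. a distributor entity ID
    dst: str                       # e.g. a micro-retailer entity ID
    payment_reliability: float     # 0.0 - 1.0
    fulfilment_reliability: float  # 0.0 - 1.0
    data_quality: float            # 0.0 - 1.0

    def trust_score(self) -> float:
        return (0.5 * self.payment_reliability
                + 0.3 * self.fulfilment_reliability
                + 0.2 * self.data_quality)

@dataclass
class TrustGraph:
    edges: list[TradeEdge] = field(default_factory=list)

    def partner_scores(self, entity_id: str) -> dict[str, float]:
        # Trust-weighted view of everyone an entity trades with.
        return {e.dst: e.trust_score() for e in self.edges if e.src == entity_id}
```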

Rationale

In emerging markets:
Trust replaces formal infrastructure.

Trust must be:

  • Computable

  • Observable

  • Monetisable

Consequences

Positive:

  • Enables risk-aware recommendations

  • Enables financial scoring products

  • Enables fraud and anomaly detection

Negative:

  • Requires complex graph infrastructure

  • Requires strong entity resolution capability

ADR-008 — Offline-First Architecture for Emerging Market Connectivity

Status: Accepted

Context

Users operate in:

  • Intermittent connectivity environments

  • Low-cost Android devices

  • Variable power reliability

Decision

The Red101 mobile platform will be Offline-First (sketched below):

  • Local caching of product catalog

  • Local order queuing

  • Deferred sync

  • Conflict resolution engine
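A minimal sketch of local order queuing with deferred sync; the file-based queue and transport interface are illustrative assumptions, and server-side writes are assumed idempotent:

```python
import json
import time

class OfflineOrderQueue:
    def __init__(self, path: str = "orders.queue"):
        self.path = path

    def enqueue(self, order: dict) -> None:
        order["queued_at"] = time.time()  # timestamp feeds conflict resolution
        with open(self.path, "a") as f:
            f.write(json.dumps(order) + "\n")  # append survives app restarts

    def sync(self, transport) -> int:
        """Flush queued orders when connectivity returns; returns count sent."""
        try:
            with open(self.path) as f:
                lines = f.readlines()
        except FileNotFoundError:
            return 0
        for line in lines:
            transport.send(json.loads(line))  # safe to retry if idempotent
        open(self.path, "w").close()          # clear only after a full flush
        return len(lines)
```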

Rationale

If the system requires constant connectivity → it fails in core markets.

Consequences

Positive:

  • Higher adoption

  • More reliable order capture

  • Higher data completeness

Negative:

  • Sync conflict complexity

  • Requires distributed consistency model

ADR-009 — Payments Embedded Into Trade Flow (RedPay as Native Rail)

Status: Accepted

Context

Cash-heavy trade causes:

  • Theft risk

  • Settlement delay

  • Credit opacity

Decision

RedPay will be embedded into:

  • Order confirmation

  • Delivery verification

  • Trade credit scoring

Rationale

Payments data is the strongest trust signal in informal economies.

Consequences

Positive:

  • Enables credit scoring

  • Enables working capital products

  • Improves trade velocity

Negative:

  • Regulatory licensing complexity

  • Higher compliance overhead

ADR-010 — Market Intelligence Data Monetisation Boundary

Status: Accepted

Context

Trade data can be monetised via:

  • Demand indices

  • Price indices

  • Supply chain risk scoring

But must protect:

  • Individual retailer identity

  • Competitive distributor data

Decision

Market intelligence products will only use:

  • Aggregated

  • Anonymised

  • Statistically thresholded datasets

Rationale

Trust is the core product.
Data monetisation cannot erode ecosystem trust.

Consequences

Positive:

  • Enables financial partnerships

  • Enables data revenue streams

Negative:

  • Limits raw data monetisation speed

ADR-011 — Trust Score as First-Class Platform Primitive

Status: Accepted

Context

Trust cannot be an afterthought metric.
It must be embedded in:

  • Data

  • Recommendations

  • Payments

  • Identity

Decision

Every core entity will carry Trust attributes:

For Data:

  • Freshness

  • Coverage

  • Source reliability

For Entities:

  • Payment behaviour

  • Fulfilment reliability

  • Trade consistency

Rationale

In volatile markets, confidence > prediction accuracy.

Consequences

Positive:

  • Enables explainable AI

  • Enables safer automation

Negative:

  • Requires heavy metadata infrastructure

ADR-012 — RAG + Signal Verification Chain (Anti-Hallucination AI Architecture)

Status: Accepted

Context

Emerging market trade has high noise:

  • Missing transactions

  • Partial stock reporting

  • Informal substitutions

Decision

All AI assistants must use the following chain (sketched below):

Step 1 — Retrieve canonical data
Step 2 — Verify signal completeness
Step 3 — Generate answer
Step 4 — Attach Trust explanation
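A minimal sketch of the four-step chain; store, verifier, and llm are hypothetical clients, and the completeness threshold is an illustrative assumption:

```python
def answer_with_verification(query: str, store, verifier, llm) -> dict:
    signals = store.retrieve(query)                # Step 1: canonical data
    completeness = verifier.completeness(signals)  # Step 2: verify signal coverage
    if completeness < 0.6:                         # refuse rather than guess
        return {"answer": None,
                "trust": {"completeness": completeness,
                          "note": "insufficient trusted signals"}}
    answer = llm.generate(query, context=signals)  # Step 3: generate from signals
    return {"answer": answer,                      # Step 4: attach Trust explanation
            "trust": {"sources": [s.id for s in signals],
                      "completeness": completeness}}
```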

Rationale

LLM = Translator of trade signals, not predictor of truth.

Consequences

Positive:

  • Fully auditable AI

  • High regulatory readiness

Negative:

  • Higher latency vs raw generation

ADR-013 — Decision Acceleration vs Decision Automation Boundary

Status: Accepted

Context

Full automation is dangerous in:

  • Cash-constrained retailers

  • Volatile supply chains

Decision

System defaults to:
Decision Suggestion → Human Confirmation → Optional Auto Mode (High Trust Only)

Rationale

RedCloud builds confidence amplification, not blind automation.

Consequences

Positive:

  • Higher adoption

  • Lower catastrophic decision risk

Negative:

  • Slower automation scaling

ADR-014 — Trade Identity and Entity Resolution Strategy

Status: Accepted

Context

Emerging market trade includes:

  • Multiple names per shop

  • Shared phone numbers

  • Informal distributor networks

Decision

Implement probabilistic entity resolution using:

  • Transaction patterns

  • Device fingerprint

  • Payment behaviour

  • Geo clustering

Rationale

Identity = foundation of trust scoring.

Consequences

Positive:

  • Stronger fraud detection

  • Better credit scoring

Negative:

  • Requires continuous model retraining

ADR-015 — Heuristic → ML → Autonomous Progression Model

Status: Accepted

Context

ML without trustable signals reduces adoption.

Decision

AI product maturity ladder:

Phase 1 — Heuristic Insight
Phase 2 — ML Augmented Insight
Phase 3 — ML Recommendation
Phase 4 — Assisted Automation
Phase 5 — Conditional Automation

Rationale

Adoption follows trust curve, not model sophistication.

Consequences

Positive:

  • Faster market penetration

  • Easier explainability

Negative:

  • Slower theoretical AI capability growth

ADR-016 — Regional Model Localisation Strategy

Status: Accepted

Context

Trade behaviours differ drastically by region:

  • Credit norms

  • Inventory risk tolerance

  • Pricing volatility

Decision

Models trained and calibrated per:

  • Country

  • Region

  • Trade cluster

Rationale

Global models erase local signal nuance.

Consequences

Positive:

  • Higher prediction quality

  • Higher adoption

Negative:

  • Higher MLOps overhead

Board-Level Meta Y-Statement

In the context of digitising fragmented global FMCG trade in emerging markets,
we decided to build a trust-first Open Commerce platform combining canonical data, explainable AI, and embedded payments,
to achieve reliable, scalable, and monetisable digital trade infrastructure,
accepting higher platform complexity and slower short-term AI automation velocity.