Questions to ask before you can start building Multi-Agent Systems (MAS)

Stakeholders and Senior Management

What is the specific business domain or problem we are solving? Management must decide whether the system should be a "generalist" or a "Vertical Agent", that is, a highly specialised AI worker optimised for domain-specific tasks such as finance, healthcare, or compliance.

What is the budget for security and governance? The sources suggest that an enterprise-grade MAS often requires 20–30% of the budget to be allocated specifically to security and compliance.

What are the required compliance frameworks? Decision-makers must define which regulations and standards the agents must adhere to, such as GDPR, SOC 2, HIPAA, or ISO/IEC 42001.

What is the expected ROI? Management needs to evaluate whether the expected gains, such as reduced downtime (e.g., a reported 42% in manufacturing) or increased efficiency, justify the deployment costs.

Product Managers (PMs)

What is the evaluation dataset? PMs must define the specific questions users will ask to create an evaluation dataset, which is the cornerstone for designing and testing the system.

How should we partition the data logically? PMs must decide how to break down databases (e.g., separating Finance, HR, and Marketing) so that a router-based system can reduce the search space and improve accuracy.
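The routing idea above can be sketched in a few lines. This is a minimal illustration only: the partition names, the keyword sets, and the route function are hypothetical, and a production router would more likely use an LLM classifier or embedding similarity than keyword overlap.

```python
import re

# Hypothetical partitions and trigger keywords (assumptions for illustration).
PARTITIONS = {
    "finance": {"invoice", "budget", "revenue", "expense"},
    "hr": {"leave", "payroll", "onboarding", "benefits"},
    "marketing": {"campaign", "brand", "leads", "seo"},
}


def route(question: str) -> str:
    """Return the partition whose keywords overlap most with the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    scores = {name: len(words & keywords) for name, keywords in PARTITIONS.items()}
    best = max(scores, key=scores.get)
    # If no keyword matches, do not guess: signal that no partition was chosen.
    return best if scores[best] > 0 else "unrouted"
```

Only the selected partition is then searched, which is how the router shrinks the search space. The "unrouted" result leaves room for a fallback such as searching all partitions or asking the user to clarify.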

Who should have access to what information? PMs must define the Role-Based Access Control (RBAC) policies so that sensitive documents stored in a vector database are not retrievable by every user.
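One way to enforce such a policy is to attach the permitted roles to each document as metadata and filter retrieved candidates before they reach the agent. The sketch below assumes a hypothetical Document record and filter_by_role helper; real vector databases usually expose their own metadata filters, which would be preferable to post-filtering in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this document


def filter_by_role(candidates, user_roles):
    """Keep only the documents that at least one of the user's roles may read."""
    roles = set(user_roles)
    return [doc for doc in candidates if doc.allowed_roles & roles]


# Hypothetical retrieval results, as if returned by a similarity search.
docs = [
    Document("d1", "Salary bands 2024", frozenset({"hr"})),
    Document("d2", "Brand guidelines", frozenset({"hr", "marketing"})),
]
visible = filter_by_role(docs, ["marketing"])
```

Here a user holding only the marketing role sees d2 but not d1, and a user with no roles sees nothing.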

When should a human step in? PMs need to design Human-in-the-Loop workflows for high-stakes decisions, approvals, or when agents encounter decision conflicts.

Technical Architects and Engineers

Which orchestration pattern is most appropriate? Architects must choose among Centralised (best for control/governance), Decentralised (best for scale/fault tolerance), and Hierarchical architectures.

Which communication protocol will be used? The team must decide on standards such as Google’s Agent2Agent (A2A) Protocol for interoperability between agents or Anthropic’s Model Context Protocol (MCP) for connecting agents to tools and data sources.

How will we manage memory at scale? Engineers must design a system that balances short-term memory (for immediate tasks) and long-term memory (for persistent context) without overflowing the model's context window.
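The balance between the two kinds of memory can be illustrated with a bounded buffer for recent messages and a keyed store for persistent facts. The AgentMemory class and its method names are assumptions made for this sketch; a real system would typically count tokens rather than messages and back the long-term store with a database or vector index.

```python
from collections import deque


class AgentMemory:
    def __init__(self, short_term_size=3):
        # Short-term memory: only the most recent messages are kept.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term memory: persistent facts addressed by key.
        self.long_term = {}

    def observe(self, message):
        """Record a message; the oldest one is dropped when the buffer is full."""
        self.short_term.append(message)

    def remember(self, key, fact):
        """Store a fact that should outlive the current task."""
        self.long_term[key] = fact

    def build_context(self, keys=()):
        """Combine the requested long-term facts with the recent messages."""
        facts = [self.long_term[k] for k in keys if k in self.long_term]
        return facts + list(self.short_term)
```

Because only the requested facts and a fixed number of recent messages are included, the context passed to the model stays bounded however long the agent runs.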

How do we resolve decision conflicts? Technical teams need a protocol for when agents produce contradictory outputs, such as using a majority vote, a meta-agent reviewer, or human escalation.
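A majority vote with human escalation as the fallback might look like the sketch below. The resolve function, the min_agreement threshold, and the "accept"/"escalate" labels are illustrative assumptions, not a prescribed protocol; it also assumes agent outputs can be compared for exact equality, which free-text answers rarely satisfy without normalisation.

```python
from collections import Counter


def resolve(outputs, min_agreement=0.5):
    """Accept the majority answer, or escalate when agreement is too weak.

    Returns a (decision, answer) tuple; answer is None when escalating.
    """
    if not outputs:
        return ("escalate", None)
    winner, votes = Counter(outputs).most_common(1)[0]
    # Require a strict majority so that an even split is never auto-accepted.
    if votes / len(outputs) > min_agreement:
        return ("accept", winner)
    return ("escalate", None)
```

For example, resolve(["approve", "approve", "reject"]) accepts "approve", while a one-to-one split is escalated to a meta-agent reviewer or a human.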

What is the observability strategy? Since agents are non-deterministic, architects must implement distributed tracing (e.g., Jaeger) and centralised logging to monitor health and detect anomalies.
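The core idea of distributed tracing is that every step an agent takes is recorded as a span sharing one trace ID, with a parent ID linking it to the step that triggered it. The sketch below shows that idea using only the standard library; the span helper, the in-memory SPANS list, and the agent names are hypothetical, and a real deployment would use a tracing SDK that exports to a backend such as Jaeger.

```python
import time
import uuid
from contextlib import contextmanager

SPANS = []  # in-memory sink standing in for a tracing backend


@contextmanager
def span(name, trace_id, parent_id=None):
    """Record the duration of a unit of work and how it relates to its parent."""
    span_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


# One request handled by an orchestrator that delegates to a sub-agent.
trace_id = uuid.uuid4().hex
with span("orchestrator.handle_request", trace_id) as root_id:
    with span("finance_agent.answer", trace_id, parent_id=root_id):
        pass  # the agent's actual work would run here
```

Spans are appended when they finish, so the inner span is recorded before the outer one. The shared trace ID is what lets a single request be followed across agents.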

Security and Compliance Officers

How do we defend against novel threat vectors? Officers must plan mitigations for prompt injection, memory poisoning, and model poisoning.

How do we ensure PII protection? The system must have built-in mechanisms for PII detection and masking before data is processed by the LLM or stored in logs.
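A masking step placed in front of the LLM and the logger can be sketched with regular expressions. The patterns and the mask_pii function below are simplified assumptions covering only email addresses and phone-like numbers; production systems generally combine such rules with a dedicated PII detection service, since regexes alone miss names, addresses, and many identifier formats.

```python
import re

# Simplified patterns (assumptions): they will miss some valid formats
# and may match some non-PII digit sequences.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text):
    """Replace detected emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


masked = mask_pii("Contact jane.doe@example.com or +1 555-123-4567.")
```

The same function should be applied both to prompts before they are sent to the model and to any text written to logs, so that raw values are never persisted.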