Questions to ask before building Multi-Agent Systems (MAS)
Stakeholders and Senior Management
• What is the specific business domain or problem we are solving? Management must decide whether the system should be a "generalist" or a "Vertical Agent": a highly specialised AI worker optimised for domain-specific tasks such as finance, healthcare, or compliance.
• What is the budget for security and governance? The sources suggest that enterprise-grade MAS deployments often require 20–30% of the budget to be allocated specifically to security and compliance.
• What are the required compliance frameworks? Decision-makers must define which regulatory and assurance standards the agents must adhere to, such as GDPR, SOC 2, HIPAA, or ISO/IEC 42001.
• What is the expected ROI? Management needs to evaluate whether the potential gains, such as reduced downtime (e.g., a reported 42% in manufacturing) or increased efficiency, justify the deployment costs.
Product Managers (PMs)
• What is the evaluation dataset? PMs must define the specific questions users will ask to create an evaluation dataset, which is the cornerstone for designing and testing the system.
• How should we partition the data logically? PMs must decide how to break down databases (e.g., separating Finance, HR, and Marketing) so that a router-based system can reduce the search space and improve accuracy.
• Who should have access to what information? PMs must define the Role-Based Access Control (RBAC) policies so that sensitive documents stored in a vector database are not retrievable by every user.
• When should a human step in? PMs need to design Human-in-the-Loop workflows for high-stakes decisions, approvals, or when agents encounter decision conflicts.
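The partitioning and access questions above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference design: the partition names, routing keywords, roles, and the in-memory document list are all hypothetical, and a keyword overlap stands in for whatever classifier or LLM router a real system would use.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical RBAC policy: which roles may read which partitions.
ROLE_ACCESS = {
    "finance_analyst": {"finance"},
    "hr_manager": {"hr"},
    "executive": {"finance", "hr", "marketing"},
}

# Hypothetical routing keywords; a real router would likely be a classifier.
ROUTE_KEYWORDS = {
    "finance": {"invoice", "budget", "revenue"},
    "hr": {"salary", "leave", "hiring"},
    "marketing": {"campaign", "brand", "leads"},
}


@dataclass(frozen=True)
class Document:
    partition: str
    text: str


# Stand-in for a partitioned vector database.
DOCUMENTS = [
    Document("finance", "Q3 budget forecast"),
    Document("hr", "Leave policy for contractors"),
    Document("marketing", "Spring campaign brief"),
]


def route(query: str) -> Optional[str]:
    """Pick the partition whose keywords overlap the query the most."""
    words = set(query.lower().split())
    scores = {name: len(words & kws) for name, kws in ROUTE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def retrieve(query: str, role: str) -> List[Document]:
    """Search only the routed partition, and only if the role may read it."""
    partition = route(query)
    if partition is None or partition not in ROLE_ACCESS.get(role, set()):
        return []
    return [doc for doc in DOCUMENTS if doc.partition == partition]
```

Routing first shrinks the search space to one partition; checking the role before any retrieval means a denied user never sees results from a partition they cannot read.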
Technical Architects and Engineers
• Which orchestration pattern is most appropriate? Architects must choose among Centralised (best for control/governance), Decentralised (best for scale/fault tolerance), and Hierarchical architectures.
• Which communication protocol will be used? The team must decide on standards such as Google’s Agent2Agent (A2A) Protocol for interoperability between agents or Anthropic’s Model Context Protocol (MCP) for connecting agents to tools and data sources.
• How will we manage memory at scale? Engineers must design a system that balances short-term memory (for immediate tasks) and long-term memory (for persistent context) without overflowing the model's context window.
• How do we resolve decision conflicts? Technical teams need a protocol for when agents produce contradictory outputs, such as using a majority vote, a meta-agent reviewer, or human escalation.
• What is the observability strategy? Because agent behaviour is non-deterministic, architects must implement distributed tracing (e.g., with Jaeger) and centralised logging to monitor health and detect anomalies.
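As one illustration of the conflict-resolution question above, the sketch below combines two of the listed options: a majority vote, with escalation to a human when no answer wins a strict majority. The function name `resolve`, the `quorum` parameter, and the returned method labels are assumptions made for this example; a meta-agent reviewer could be slotted in as an intermediate step before escalation.

```python
from collections import Counter
from typing import List, Tuple


def resolve(outputs: List[str], quorum: float = 0.5) -> Tuple[str, str]:
    """Return (decision, method) for a list of agent outputs.

    An answer wins only if its share of votes is strictly above `quorum`;
    otherwise the conflict is handed to a human reviewer.
    """
    if not outputs:
        return ("", "escalate_to_human")
    answer, votes = Counter(outputs).most_common(1)[0]
    if votes / len(outputs) > quorum:
        return (answer, "majority_vote")
    return ("", "escalate_to_human")
```

For example, `resolve(["approve", "approve", "reject"])` settles on "approve" by majority vote, while a two-way tie such as `resolve(["approve", "reject"])` escalates. Comparing raw strings only works when agents emit normalised decisions; free-text outputs would need to be mapped to a fixed label set first.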
Security and Compliance Officers
• How do we defend against novel threat vectors? Officers must plan mitigations for prompt injection, memory poisoning, and model poisoning.
• How do we ensure PII protection? The system must have built-in mechanisms for PII detection and masking before data is processed by the LLM or stored in logs.
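The PII question above can be made concrete with a small masking step that runs before text reaches the LLM or the logs. This is a deliberately simple sketch: the two regular expressions (email addresses and US-style Social Security numbers) and the `mask_pii` helper are assumptions for illustration, and pattern matching alone will miss many kinds of PII that a production detector would need to cover.

```python
import re

# Illustrative patterns only; real deployments need far broader coverage.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def mask_pii(text: str) -> str:
    """Replace each detected PII span with a bracketed type label."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


def safe_log_line(message: str) -> str:
    """Mask a message before it is written to logs or sent to a model."""
    return mask_pii(message)
```

Calling `mask_pii("Contact jane.doe@example.com, SSN 123-45-6789.")` returns `"Contact [EMAIL], SSN [SSN]."`. Applying the same function at both the prompt boundary and the logging boundary keeps the two paths consistent.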