AI Evaluation Metrics - Latency
Definition:
Latency is the time the AI system takes to respond to a user’s query or request.
Guide for Compliance Team and Engineers:
Purpose:
Ensure timely responses to maintain a good user experience and, for pharmacies, enable quick access to critical health information.
For Compliance Team:
Set Latency Targets: Define maximum acceptable response times (e.g., ≤ 2 seconds for wellness; ≤ 1 second for pharmacies).
Monitor User Impact: Collect feedback on response times and its effect on satisfaction or safety.
Compliance Checks: Ensure latency requirements meet regulatory guidelines for urgent health advice, where applicable.
For Engineers:
Performance Optimization: Optimize model size, inference code, and infrastructure to reduce response times.
Scalable Architecture: Use scalable backend infrastructure (e.g., auto-scaling servers, CDN) to handle peak loads.
Caching Strategies: Cache common queries or partial results to speed up frequent requests.
Latency Monitoring: Implement real-time monitoring with alerts for latency spikes.
Fallback Procedures: Design graceful degradation or “please wait” messages if processing exceeds limits.