AI Evaluation Metrics - Latency

Definition:
Latency is the elapsed time between a user submitting a query or request and the AI system returning its response.

Guide for Compliance Team and Engineers:

Purpose:
Ensure timely responses to maintain a good user experience and, for pharmacies, enable quick access to critical health information.

For Compliance Team:

  • Set Latency Targets: Define maximum acceptable response times for each use case (e.g., ≤ 2 seconds for wellness queries; ≤ 1 second for pharmacy queries).

  • Monitor User Impact: Collect feedback on response times and their effect on user satisfaction or safety.

  • Compliance Checks: Ensure latency requirements meet regulatory guidelines for urgent health advice, where applicable.
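
As an illustration of how a latency target could be checked against recorded response times, here is a minimal Python sketch. The 2-second and 1-second limits come from the examples above. Judging compliance at the 95th percentile, and the names LATENCY_TARGETS_S, percentile, and meets_target, are assumptions made for this example, not part of any stated policy.

```python
import math

# Limits in seconds, taken from the example targets in this guide.
# Evaluating them at the 95th percentile is an assumption for illustration.
LATENCY_TARGETS_S = {"wellness": 2.0, "pharmacy": 1.0}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based nearest rank, clamped so that pct=0 returns the minimum.
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def meets_target(domain, samples, pct=95):
    """Return (passed, observed_latency) for the given domain's target."""
    observed = percentile(samples, pct)
    return observed <= LATENCY_TARGETS_S[domain], observed
```

A compliance review could call meets_target("pharmacy", recorded_latencies) on a day's measurements and flag any domain where the first returned value is False. A percentile is used rather than the mean so that a small number of very slow responses is not hidden by many fast ones.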

For Engineers:

  • Performance Optimization: Optimize model size, inference code, and infrastructure to reduce response times.

  • Scalable Architecture: Use scalable backend infrastructure (e.g., auto-scaling servers, CDN) to handle peak loads.

  • Caching Strategies: Cache common queries or partial results to speed up frequent requests.

  • Latency Monitoring: Implement real-time monitoring with alerts for latency spikes.

  • Fallback Procedures: Design graceful degradation (e.g., a “please wait” message) for requests whose processing exceeds the latency limit.
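
The monitoring and fallback items above can be sketched together in Python using only the standard library. This is one possible approach under stated assumptions, not a prescribed implementation: answer_with_fallback, model_fn, limit_s, and the fallback wording are hypothetical names chosen for this example, and a production system would more likely send measurements to a dedicated monitoring service than to a log.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

logger = logging.getLogger("latency")

# Shared worker pool; the size is an arbitrary choice for this sketch.
_executor = ThreadPoolExecutor(max_workers=4)

DEFAULT_FALLBACK = "Please wait, we are still working on your request."


def answer_with_fallback(model_fn, query, limit_s, fallback=DEFAULT_FALLBACK):
    """Run model_fn(query), returning a fallback message if it exceeds limit_s.

    Returns (reply, elapsed_seconds, timed_out).
    """
    start = time.perf_counter()
    future = _executor.submit(model_fn, query)
    try:
        reply = future.result(timeout=limit_s)
        timed_out = False
    except FutureTimeout:
        # The worker thread keeps running in the background; only the
        # caller stops waiting and receives the fallback message.
        reply = fallback
        timed_out = True
    elapsed = time.perf_counter() - start
    if timed_out:
        # Stand-in for a real alerting hook.
        logger.warning("latency limit of %.2fs exceeded", limit_s)
    return reply, elapsed, timed_out
```

For example, answer_with_fallback(my_model, "opening hours?", limit_s=1.0) returns the model's reply when it arrives within one second and the fallback message otherwise. The elapsed value can be recorded for every request so that latency spikes are visible over time.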