AI Evaluation Metrics - Recall (Sensitivity)
Definition:
Recall measures the model's ability to identify all relevant positive cases: the proportion of true instances the AI actually captures (e.g., symptoms, relevant health concerns). Formally, recall = true positives / (true positives + false negatives).
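As an illustration, the definition above can be computed directly from binary labels. This is a minimal sketch; the function name and the 0/1 label encoding are assumptions for the example, not part of any specific pipeline:

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) over binary labels (1 = positive case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Example: 3 actual positives, the model catches 2 of them.
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(recall(y_true, y_pred))  # 2 / 3 ≈ 0.667
```

Note that the false positive on the last example does not affect recall at all; that is precisely why recall must be balanced against precision, as discussed below.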
Guide for Compliance Team and Engineers:
Purpose:
Ensure the chatbot does not miss important health symptoms or relevant user intents, which could lead to underdiagnosis or failure to advise appropriately.
For Compliance Team:
Define Recall Targets: Set recall thresholds (e.g., ≥ 80% for general wellness guidance, ≥ 95% for pharmacy contexts) based on risk tolerance and regulatory expectations.
Critical Case Oversight: Monitor cases where AI misses important symptoms or advice, especially in pharmacy contexts with higher risk.
Incident Review: Establish procedures to review and escalate missed cases flagged by users or healthcare professionals.
Compliance Documentation: Include recall performance data in compliance reports and maintain evidence of ongoing improvements.
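The per-context targets above can be enforced with a simple compliance gate. This is a sketch only; the context names and threshold values mirror the examples given earlier and are illustrative, not prescribed:

```python
# Illustrative per-context recall targets (assumed values, not policy).
RECALL_TARGETS = {"wellness": 0.80, "pharmacy": 0.95}

def check_recall_compliance(measured: dict) -> dict:
    """Return pass/fail per context against its recall target.

    Contexts missing from `measured` default to 0.0 and therefore fail,
    so an unreported context cannot silently pass.
    """
    return {ctx: measured.get(ctx, 0.0) >= target
            for ctx, target in RECALL_TARGETS.items()}

result = check_recall_compliance({"wellness": 0.83, "pharmacy": 0.91})
print(result)  # wellness passes (0.83 >= 0.80); pharmacy fails (0.91 < 0.95)
```

A report like this, generated per release, is one way to keep the evidence trail that the documentation requirement above calls for.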
For Engineers:
Data Diversity: Train on diverse datasets covering a broad range of symptoms and user inputs to maximize recall.
Balanced Optimization: Carefully balance recall with precision to avoid excessive false positives while minimizing misses.
Test with Edge Cases: Create and maintain test cases with rare or complex symptoms to stress-test recall.
Alerting: Set up alerting for sudden drops in recall metrics in production environments.
Continuous Improvement: Use user feedback and flagged missed cases to retrain and improve the model iteratively.
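The alerting step above can be sketched as a drop detector over a rolling window of positive cases in production. The class name, window size, baseline, and drop tolerance are assumed values for illustration:

```python
from collections import deque

class RecallDropAlert:
    """Fire an alert when rolling recall falls more than `max_drop`
    below `baseline`. All parameter values here are illustrative."""

    def __init__(self, baseline: float, max_drop: float = 0.05, window: int = 100):
        self.baseline = baseline
        self.max_drop = max_drop
        self.hits = deque(maxlen=window)  # 1 = positive case caught, 0 = missed

    def record(self, caught: bool) -> bool:
        """Record one ground-truth positive case; return True if an alert fires."""
        self.hits.append(1 if caught else 0)
        rolling = sum(self.hits) / len(self.hits)
        # Only alert once the window is full, to avoid noisy early readings.
        return len(self.hits) == self.hits.maxlen and rolling < self.baseline - self.max_drop

monitor = RecallDropAlert(baseline=0.95, max_drop=0.05, window=20)
alerts = [monitor.record(i >= 3) for i in range(20)]  # 3 of 20 positives missed
print(alerts[-1])  # True: rolling recall 0.85 < 0.95 - 0.05
```

The misses that trigger such an alert are exactly the flagged cases that the continuous-improvement loop above would feed back into retraining.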