A new study of the model used to distribute resources and care to over five million US veterans reveals that the algorithm became less accurate over time at identifying high-risk patients and created more than 18,000 "false alarms" that drained resources during the COVID-19 pandemic.
The researchers add that the US Department of Veterans Affairs (VA) experience should be a warning to health systems rapidly adopting artificial intelligence (AI)-informed models: keep a sharp focus on "model drift," which can upend resource allocation and decision-making.
Researchers analyzed performance data from the VA's nationally deployed population health model, the Care Assessment Needs (CAN) algorithm, which predicts 90-day hospitalization and mortality risk for over 5 million veterans annually.
The study, published in JAMA Health Forum, tracked the algorithm's performance from 2016 to 2021, covering both pre-pandemic operations and the COVID-19 disruption period.
The findings reveal significant performance deterioration. Over the five-year study period, the algorithm's ability to correctly identify high-risk patients declined by 4.0%, while its overall performance score dropped by 4.6%. The model's false alarm rate also rose by 0.34%, translating into approximately 18,300 additional patients incorrectly flagged as high-risk when they were actually low-risk (a rough check of this arithmetic appears below).
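The scale of that false-alarm figure follows from applying the rate change to the screened population. A back-of-the-envelope check in Python, where the roughly 5.4 million denominator is an assumption inferred from the reported numbers, not a figure stated in the study:

```python
# Back-of-the-envelope check of the ~18,300 figure, assuming the 0.34%
# false-alarm increase applies across a screened population of roughly
# 5.4 million veterans (the exact denominator is an assumption here).
population = 5_400_000
false_alarm_rate_increase = 0.0034  # 0.34%, as reported

additional_false_alarms = population * false_alarm_rate_increase
print(f"{additional_false_alarms:,.0f} additional low-risk patients flagged")
# -> 18,360, consistent with the ~18,300 reported in the study
```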
Most concerning was the acceleration of performance decline during the pandemic period. The algorithm's positive predictive value (the share of patients flagged as high-risk who actually went on to be hospitalized or die) deteriorated markedly between 2019 and 2020, coinciding with system-wide changes in healthcare delivery patterns.
Model Drift and AI Warnings
The research identified two primary drivers of algorithm drift.
First, fewer veterans were actually being hospitalized or dying: the combined event rate fell from 3.8% to 3.0% over the study period. A falling event rate by itself drags down a model's precision, as the sketch below illustrates.
Second, the data that the algorithm relies on to make predictions changed significantly in 19 important areas, especially in patient demographics, how veterans used healthcare services, and lab test results.
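Why a falling event rate alone degrades precision is easy to see with a small worked example. The sketch below is purely illustrative and not drawn from the study: it holds a hypothetical model's sensitivity and specificity fixed and shows positive predictive value falling as outcome prevalence drops from 3.8% to 3.0%.

```python
# Illustrative only: fixed hypothetical sensitivity/specificity, showing how
# PPV falls when outcome prevalence drops, with no change to the model itself.
def ppv(prevalence, sensitivity=0.60, specificity=0.90):
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

for p in (0.038, 0.030):
    print(f"prevalence {p:.1%}: PPV = {ppv(p):.1%}")
# prevalence 3.8%: PPV = 19.2%
# prevalence 3.0%: PPV = 15.7%
```

In other words, even a model whose internal behavior is unchanged will flag proportionally more false positives when the outcome it predicts becomes rarer.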
The pandemic's impact was particularly pronounced. "Covariate shifts were observed in 19 covariates, with demographic characteristics, health care utilization, and laboratory covariates exhibiting the largest shifts," the authors wrote. These shifts reflected COVID-19's disruption of normal healthcare patterns, including changes in telehealth usage, hospital admissions, and routine care delivery.
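The paper reports the shifts but not the method used to quantify them; one common measure for this kind of covariate shift is the standardized mean difference between a baseline cohort and a current one. A minimal sketch with entirely hypothetical data, where the telehealth scenario and rates are assumptions for illustration:

```python
import numpy as np

def standardized_mean_difference(baseline, current):
    """A common covariate-shift measure: the difference in means scaled by
    the pooled standard deviation. |SMD| > 0.1 is a frequently used flag."""
    pooled_sd = np.sqrt((baseline.var(ddof=1) + current.var(ddof=1)) / 2)
    return (current.mean() - baseline.mean()) / pooled_sd

# Hypothetical example: telehealth visits per patient, pre- vs mid-pandemic.
rng = np.random.default_rng(0)
pre_pandemic = rng.poisson(lam=0.5, size=10_000)  # assumed baseline rate
pandemic = rng.poisson(lam=2.0, size=10_000)      # assumed shifted rate
print(f"SMD = {standardized_mean_difference(pre_pandemic, pandemic):.2f}")
```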
The authors say the study demonstrates how algorithm drift undermines the reliability of quality measurement, a risk that grows as health systems adopt more AI-informed models.
"Close surveillance of clinical risk algorithms and quality metrics derived from algorithm-generated risk scores could mitigate suboptimal resource allocation or decision-making," the research says.
The accompanying commentary emphasizes broader policy implications, noting that model drift represents a key challenge for healthcare AI deployment. The authors recommend establishing clear governance frameworks, including minimum performance monitoring standards and regular auditing requirements.
Organizations should implement robust monitoring frameworks that track both statistical performance metrics and operational outcomes. The study suggests different mitigation strategies based on drift characteristics: recalibration for gradual changes versus complete retraining for substantial shifts in underlying relationships.
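That distinction maps onto familiar techniques. For gradual drift in outcome rates, the mapping from risk score to probability can be recalibrated without touching the underlying model, for example with a logistic (Platt-style) recalibration; wholesale shifts in the covariate relationships generally call for retraining on recent data. A minimal recalibration sketch using scikit-learn, with all data hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical recent cohort: raw risk scores and observed binary outcomes.
rng = np.random.default_rng(1)
scores = rng.uniform(0, 1, size=5_000)
# Assumed: true event probability is lower than the raw score implies,
# mimicking a model that overpredicts after the event rate falls.
outcomes = rng.binomial(1, 0.06 * scores)

# Platt-style recalibration: fit a logistic model mapping old scores to
# observed outcomes; the original risk model itself is left unchanged.
recalibrator = LogisticRegression()
recalibrator.fit(scores.reshape(-1, 1), outcomes)
recalibrated = recalibrator.predict_proba(scores.reshape(-1, 1))[:, 1]

print(f"mean raw score: {scores.mean():.3f}, "
      f"mean recalibrated risk: {recalibrated.mean():.3f}, "
      f"observed event rate: {outcomes.mean():.3f}")
```

The appeal of recalibration is operational: it preserves the model's ranking of patients while correcting the absolute risk estimates, which is often sufficient when the drift comes from a changing event rate rather than changing relationships in the data.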