The Digester

AI reading of H&E slides is often biased by linked biomarkers, grade and mutation burden

Mar 3rd 2026

A multicohort study of 8,221 cancer patients shows that deep learning models predicting molecular biomarkers from routine H&E whole-slide images frequently learn composite signals tied to other mutations, tumor grade, or mutation burden. The result is biased subgroup performance and limited generalizability, leading the authors to call for bias-aware evaluation and causal methods.

  • Models can show high overall accuracy but their AUROC often drops substantially within subgroups defined by co-occurring or mutually exclusive biomarkers.
  • Interdependencies among biomarkers, histological grade and tumor mutational burden act as confounders that models exploit as proxies for the target biomarker.
  • Association patterns between biomarkers and confounders vary across datasets, so external validation can be misleading about real-world generalizability.
  • For several targets, a simple classifier based on pathologist-assigned grade approaches the performance of deep learning models, limiting their added clinical value.
  • The authors propose bias-aware evaluation using stratified metrics, permutation testing and comparisons with clinical baselines to detect reliance on confounders.
  • Until dependency-aware, causal approaches and richer datasets are developed, WSI-based predictors are best used for triage, screening or research support rather than replacing molecular tests.
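The stratified-metrics and permutation-testing idea can be sketched in code. The study's exact protocol is not reproduced here; this is a minimal illustration under one plausible design: compute AUROC within confounder-defined subgroups, and run a permutation test that shuffles labels only *within* subgroups, so a model that relies purely on the confounder (e.g. grade) retains its score under permutation and fails to reach significance. All function names and the toy data are hypothetical.

```python
import random

def auroc(labels, scores):
    # AUROC as the probability that a random positive outranks a
    # random negative (ties count 0.5).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_auroc(labels, scores, groups):
    # AUROC within each subgroup defined by a confounder
    # (e.g. tumor grade, or a co-occurring biomarker).
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        out[g] = auroc([labels[i] for i in idx], [scores[i] for i in idx])
    return out

def within_group_permutation_p(labels, scores, groups, n_perm=1000, seed=0):
    # Shuffle labels within each subgroup: this preserves the
    # confounder-label association, so any AUROC the model earns
    # beyond confounder membership is destroyed. A high p-value
    # suggests the model adds little over the confounder alone.
    rng = random.Random(seed)
    observed = auroc(labels, scores)
    hits = 0
    for _ in range(n_perm):
        perm = list(labels)
        for g in set(groups):
            idx = [i for i, gg in enumerate(groups) if gg == g]
            vals = [perm[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                perm[i] = v
        if auroc(perm, scores) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A model with a high overall AUROC but near-chance stratified AUROCs and a non-significant within-group permutation p-value is the signature of confounder reliance the authors describe.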

Sources

nature.com