The Digester

Evo 2: Open-source AI trained on trillions of DNA bases reads complex genomes

Mar 5th 2026

Researchers released Evo 2, an open-source genome model trained on an 8.8 trillion-base dataset. The model recognizes regulatory DNA, splice sites, and protein features, and it predicts mutation effects across bacteria, archaea, and eukaryotes.

  • Evo 2 uses the StripedHyena 2 architecture and was trained in two stages, one with short 8,000-base contexts and one with long 1,000,000-base contexts.
  • Two models were released: a 7 billion parameter model trained on 2.4 trillion bases and a 40 billion parameter model trained on the full 8.8 trillion-base OpenGenome2 dataset.
  • The OpenGenome2 dataset, the code, and the model parameters are all public; viruses that infect eukaryotes were excluded to reduce misuse risk.
  • Evo 2 learned to identify protein coding regions, intron boundaries, regulatory sequences, protein structural motifs, mobile genetic elements, and species-specific genetic codes.
  • In zero-shot tests, Evo 2 detected the impact and severity of single-base mutations, and after additional training it improved BRCA2 variant evaluation.
  • Design tests were mixed: Evo 2 produced gene-like sequences and cell-type-specific regulatory DNA, but biological validation was limited, with 17 percent of designed regulatory sequences showing twofold activity differences.
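
For readers wondering what zero-shot mutation scoring can look like in practice, here is a minimal sketch. This digest does not describe how Evo 2 computes its scores, so the approach below is an assumption: a common technique for sequence models is to compare the log-likelihood of the mutated sequence with that of the reference, treating a large drop as a sign of a disruptive change. The tiny first-order Markov model is a hypothetical stand-in and is not the Evo 2 API; all function names here are illustrative.

```python
import math

BASES = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Fit P(next base | previous base) with additive smoothing.

    Hypothetical stand-in for a real genome language model.
    """
    counts = {p: {n: pseudocount for n in BASES} for p in BASES}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        p: {n: c / sum(row.values()) for n, c in row.items()}
        for p, row in counts.items()
    }


def log_likelihood(model, seq):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))


def mutate(seq, pos, alt):
    """Return seq with the base at 0-based index pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def variant_score(model, ref, pos, alt):
    """Log-likelihood of the mutant minus that of the reference.

    Negative values mean the model finds the mutant less plausible.
    """
    return log_likelihood(model, mutate(ref, pos, alt)) - log_likelihood(model, ref)


if __name__ == "__main__":
    # Toy training data: a perfectly repetitive "genome".
    model = train_markov(["ACGT" * 20])
    ref = "ACGTACGTACGT"
    for alt in BASES:
        print(alt, round(variant_score(model, ref, 5, alt), 3))
```

No labeled examples of harmful or benign variants are used anywhere in the sketch, which is what makes this style of scoring zero-shot: the signal comes entirely from what the model learned about typical sequence.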