Let’s talk about the language we are made of. Your DNA is written in a four-letter alphabet: A, T, C, and G. Researchers at the University of Oregon have found a way to use artificial intelligence to read this genetic code, in much the same way a chatbot reads text.
Reading DNA With AI


Andrew Kern and his lab turned to AI to make sense of mutations in DNA. They took an older machine learning architecture, GPT-2, and trained it on simulations of genetic evolution in a range of organisms, including bacteria, rodents, mosquitoes, and primates.
“We can’t repeat evolution, so one of the key workflows we have is developing simulations,” said Kevin Korfmann, lead author of the study. “The simulations mimic evolutionary processes, and then we use the outcomes as training data for our deep learning models.”
The tool looks at mutations in the genetic code to trace genes back to their last common ancestor.
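To make the simulate-then-train idea concrete, here is a minimal sketch in Python. It is not the lab's pipeline or model: it assumes a toy neutral model in which two sampled lineages share a common ancestor after an exponentially distributed time averaging 2 × n_e generations, with mutations then accumulating at random along both branches. The function names and parameter values (n_e, mu, seq_len) are hypothetical, chosen only for illustration. Each simulated example pairs an observation (a mutation count) with the hidden answer (the time to the common ancestor), which is the kind of labeled data a deep learning model could be trained on. A simple classical estimate is included for comparison.

```python
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) count by counting unit-rate arrivals in [0, lam)."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_pair(n_e, mu, seq_len, rng):
    """Simulate one pair of sequences under a toy neutral model (an assumption).

    Returns (n_mutations, tmrca): the observed mutation count and the hidden
    time, in generations, back to the pair's most recent common ancestor.
    """
    # Two lineages coalesce after an exponential time with mean 2 * n_e.
    tmrca = rng.expovariate(1.0 / (2 * n_e))
    # Mutations fall on both branches, so the expected count is 2 * mu * L * T.
    n_mut = poisson(2 * mu * seq_len * tmrca, rng)
    return n_mut, tmrca


def make_training_set(n_examples, n_e=1_000, mu=1e-8, seq_len=1_000_000, seed=0):
    """Build (observation, label) pairs from repeated simulations."""
    rng = random.Random(seed)
    return [simulate_pair(n_e, mu, seq_len, rng) for _ in range(n_examples)]


def classical_tmrca_estimate(n_mut, mu, seq_len):
    """Moment estimate of the ancestor time from the mutation count alone."""
    return n_mut / (2 * mu * seq_len)


if __name__ == "__main__":
    data = make_training_set(2_000)
    mean_true = sum(t for _, t in data) / len(data)
    mean_est = sum(classical_tmrca_estimate(m, 1e-8, 1_000_000) for m, _ in data) / len(data)
    print(f"mean simulated ancestor time: {mean_true:.0f} generations")
    print(f"mean classical estimate:      {mean_est:.0f} generations")
```

In this toy setup the classical estimate is a one-line formula. The slow step in real analyses comes from doing far richer versions of that inference across whole chromosomes and many samples, which is the work a trained model can replace with pattern recognition.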
“Advances in generative AI and the architectures behind them are potentially useful to a number of fields outside a chatbot,” said Andrew Kern, an Evergreen professor of biology.
In tests, the AI model performed just as well as classical statistical methods. Where those classical methods can take hours or even days to analyze a single mosquito chromosome, the new tool does it in minutes.
“Compared to classical inferential approaches, the AI tool doesn’t have to reason about every mutation individually,” Korfmann added. “It just reads the patterns because all of the expensive statistical work was done up front, during training, which sidesteps the bottleneck.”
“You never really know what’s going to work when you’re essentially borrowing techniques from a totally different world and applying them to a new problem,” Kern said. “But this was a case where things worked really well.”
Impacts on Disease Control
This kind of tool can help scientists figure out when species developed certain traits or when disease-resistance genes emerged. Take malaria, for example. For years, scientists used insecticides to control mosquito populations until the mosquitoes started developing resistance.
Kern explained, “Insecticide resistance is being observed in all of these mosquito populations today.”
“A major challenge in preventing the spread of malaria has been understanding the evolution of insecticide resistance,” Kern added. “Now, we can go in with our AI model, ask how long ago these resistance genes arose in the population, and learn about the evolutionary history of this critical carrier of malaria.”
The model also works with incomplete DNA datasets, a common obstacle for researchers. Looking ahead, Kern and Korfmann want to use machine learning to build full genealogical trees across multiple lineages.
“There’s so much going on in the machine learning field that we haven’t applied yet in our field,” Korfmann noted. “There’s tons of translational work to do to get these novel algorithms working in biology.”


