DNA sequencing has generated an abundance of critical genetic information. Specifically, next-generation sequencing (NGS) has contributed to novel medical research, as shown in its role in the rapid global monitoring of the SARS-CoV-2 genome. However, the massive data volume in databases has made it incredibly difficult to search through it efficiently.
Until now, scientists required massive computing power to download and sift through entire datasets. As a result, making large-scale genetic comparisons became slow and costly.
A Search Engine For DNA and RNA

Computer scientists at ETH Zurich solved this data roadblock with a digital tool called MetaGraph. According to researchers, it functions like a DNA search engine. Rather than relying on incomplete metadata or time-consuming downloads, the tool does a full-text search on raw DNA or RNA sequences.
As one of the data scientists, Professor Gunnar Rätsch, simply said, “It’s a kind of Google for DNA.” Researchers can now enter a sequence of interest and instantly learn where it has appeared.
MetaGraph’s data management is key to its speed. Researchers say the tool indexes the data and compresses it using complex mathematical graphs, similar to a book summary that outlines the major plot points with limited words. Compressing the data allows entire public sequences to fit onto a few computer hard drives.
Dr. André Kahles, a member of the Biomedical Informatics Group at ETH Zurich, said, “We are pushing the limits of what is possible in order to keep the data sets as compact as possible without losing necessary information.”
According to ETH Zurich researchers, efficiency directly translates to cost savings. They say larger queries cost around $0.74 per megabase, which is a unit of measurement to measure the length of DNA or the genome. Because it’s also scalable, the MetaGraph requires significantly less additional computing power as searchable data continuously grows.
Biomedical research’s impact is significant. For example, researchers believe Metagraph is poised to accelerate research into antibiotic resistance by quickly identifying resistance genes or searching databases for useful viruses that target and destroy bacteria. It’s also a potent tool for tracking minimally researched pathogens.
Researchers say MetaGraph is already available as an open-source platform. According to Professor Rätsch, the team aims to index the remaining half of the world’s sequence data by the end of the year. As a result, this would make the entire global genetic library immediately searchable.