WID Scientist Gives ‘Tree of Life’ New Computational Roots

Ever since Charles Darwin proposed the “Tree of Life” to explain the theory of evolution and organisms’ relatedness, images of tree branches have been iconic in depicting the uncanny similarities across species.

More recently, researchers have constructed DNA “trees” to represent how genes, and the cellular instructions they create, are related across organisms. Still, scientists have realized that the biological processes controlling which genes are activated, and when, are just as important. Drawing meaning from the complicated “networks” that control genes has been challenging because they are difficult to measure.

Today, Sushmita Roy, WID Systems Biology researcher and an assistant professor of biostatistics and medical informatics at UW–Madison, has strengthened the arboreal metaphor with a new algorithm that helps biologists compare the activity levels of thousands of genes across species.

Suitably named “Arboretum,” the algorithm takes what researchers know about how individual genes are related based on their DNA sequence and finds modules (groups of genes associated with a specific behavior or trait) that can be mapped across species. So far, the method shows promise in reconstructing the evolutionary origins of organisms’ specific traits.

Developed during Roy’s time as a postdoctoral scientist at the Broad Institute, the algorithm proved effective in tracking the evolution of heat stress responses across species of yeast. The work recently appeared as a paper in the journal Genome Research.

Most research until now has focused on the coding part of the genome, or the genes producing proteins that carry out specific actions within cells. But scientists like Roy are looking at the part of the genome that is not typically involved in making proteins and instead controls when and where they should be made.

By analogy with the dark matter of the universe, this regulatory genome is often called the “dark matter” of genetics.

“Only very recently, maybe in the past five or 10 years, we’ve realized that this ‘dark matter’ is all fundamentally important in controlling a lot of functionality in the genome,” Roy says. “Before, we were only focusing on genes that make proteins. Only now are we realizing that there is a lot of important information that is stored in the non-coding parts.”

Roy says her work draws inspiration from the greater mysteries of biological networks: how the delicate interplay among a cell’s molecules shapes a species’ interaction with its environment and its ability to adapt and survive.

“You can never really measure the network, but what you can measure is the output of the network — in this case gene expression,” she says. “We’re reverse engineering these hidden networks by searching for the best network that could have produced these outputs.” Roy’s next challenge is to transition from “modules” to “networks.”

With Arboretum now in action, Roy is teaming up with WID and Morgridge Institute for Research collaborators in the Discovery building to examine properties in other biological networks. Current projects examine patterns that could help scientists understand why some cells turn into stem cells or change into cancerous cells.

The next pivotal step, Roy says, would be the ability to tweak networks and test the outcomes, shedding light on the complex, “black box” nature of biological networks.

Marianne English