Tools for Discovery is a monthly profile series that inspects the computer programs, gadgets and methods behind WID’s ideas and discoveries.
What do you work on at WID?
I am interested in understanding information processing in cells and the regulatory networks that control it. Living cells are complex and function via an intricate network of many molecular entities (such as genes, proteins and metabolites). These networks are central to information processing in cells.
But what do I mean by information processing? All cells must know what to make and when, and cells do this by processing extracellular and environmental cues to make decisions such as what genes need to be activated, what proteins need to be made and so on. All these decisions then determine the overall state of a cell: for example, a “normal” versus a “disease” state. Beyond examining networks from cells in an individual species, I am interested in design principles of regulatory networks that govern network architecture in living cells. To get at this question, we need to be able to identify and compare networks across multiple organisms.
I am interested in three high-level questions: what the networks are in a specific condition, how a network actually changes the dynamics of other networks, and how these networks affect the overall behavior of an organism. These are very difficult problems because the networks are hard to measure and must be “reverse engineered.” To address these questions, I adapt methods from statistical machine learning to reconstruct and analyze networks in living cells. For instance, we’ve looked at the networks activated in yeast stress response, during cellular differentiation and also across evolutionary time scales.
Tools for analysis?
Most of my work entails developing, implementing, and applying statistical and machine learning methods on genomic datasets. These analyses are driven by key biological questions my collaborators and I are interested in.
Most of my serious code development happens in C++, and I use MATLAB to test existing implementations of algorithms or visualize matrices. To do high-throughput computing, I use HTCondor. To visualize results, usually from a network or a matrix, I have scripts in MATLAB and awk scripts for Cytoscape, which is powerful network visualization software.
“A picture speaks a thousand words, and the ability to combine art and science to communicate ideas can be very effective.”
— Sushmita Roy
Tools for writing?
I use TeXShop to write technical papers. Sometimes, for documents that do not have a lot of equations, I use Word. To manage references I use CiteULike, where I keep my online citation references, and I have recently started using Papers. I keep going back and forth between Papers and CiteULike, which is a terrible idea because the two libraries are a bit out of sync. I think CiteULike is good for scouring the web for articles of interest, and Papers is good when you are working on a manuscript and need an easy way to create a bibliography. I also use Google Docs to share ideas and track progress on projects I’m working on with multiple people. We have started to experiment with Elog, and I was quite pleased with the initial reports a postdoc in my lab created. I think this might be what we will use in the long run.
Tools for collaboration?
Dropbox, Skype, caffeine, Google Docs, Elogs that enable shared editing. I have a blog called “Just Compiling,” and I’d like people to actually use that as a way to share thoughts. We also have shared lab meetings, group meetings and group discussions with [other WID colleagues] Rupa Sridharan’s and Kris Saha’s groups. We also have theme-wide group meetings, where we talk about our research in formal and informal ways. To share results with collaborators, I usually upload them to my website.
Your ultimate tool for discovery?
If I had to pick one, I think a tool that would enable me to make targeted perturbations — or small changes — in living systems and measure overall system behavior would be very useful. These changes would come from a predictive model that prioritizes each experimental change based on its expected outcome. Such predictions could be tested and provide real-time feedback on whether the model is working well. A tool that delivers such targeted perturbations would really close the gap between the experimental and computational sciences in understanding living systems. It would really accelerate scientific discovery.
If I had to pick a second tool, I would like an “e-artist” that takes a concept described as a hand sketch or even in words and makes an illustration out of it. A picture speaks a thousand words, and the ability to combine art and science to communicate ideas can be very effective.