David Page remembers the stress he was under as a young assistant professor trying to juggle various responsibilities like establishing a new research program and planning lectures. This continuous stress landed him in the hospital with atrial fibrillation, a form of cardiac arrhythmia or irregular heartbeat; he was only in his early 30s.
“It sure would have been nice to know that was coming because I could have taken a week’s vacation, or cut out caffeine for a while and then I probably wouldn’t have had it,” says Page, who is a Professor of Biostatistics and Medical Informatics at UW-Madison, and a Discovery Fellow. “That I didn’t have any warning inspires me to try and let people know what’s ahead, medically speaking, and make a difference in their lives.”
To predict who may be at risk for medical conditions such as heart attacks, Page uses electronic health record (EHR) data to ‘train’ what are called supervised machine learning algorithms.
“You could define learning as improving your performance in a task over time through experience,” says Page. A supervised learning algorithm is first trained on a dataset in which the outcome for each example is already known. For example, think of a spreadsheet where each patient’s data are in a row, with variables like weight and blood pressure in the columns. Using existing information about which patients have previously suffered heart attacks, a supervised machine learning algorithm can be trained to predict which new patients are at risk for heart attacks in the future.
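The spreadsheet picture above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the patient rows are invented toy numbers, not real EHR data, and the k-nearest-neighbors method is a stand-in chosen for brevity, not the algorithm Page's group uses. The names `TRAINING_ROWS`, `column_stats`, `scale`, and `predict` are hypothetical.

```python
import math

# Invented toy "spreadsheet": one row per patient (NOT real patient data).
# Columns: weight (kg), systolic blood pressure (mmHg), had_heart_attack (0/1).
TRAINING_ROWS = [
    (62.0, 112.0, 0),
    (70.0, 118.0, 0),
    (75.0, 125.0, 0),
    (88.0, 150.0, 1),
    (95.0, 160.0, 1),
    (102.0, 168.0, 1),
]


def column_stats(rows):
    """Return a (mean, standard deviation) pair for each feature column."""
    stats = []
    n_features = len(rows[0]) - 1  # last column is the known outcome
    for j in range(n_features):
        values = [row[j] for row in rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats.append((mean, std or 1.0))  # avoid dividing by zero
    return stats


def scale(features, stats):
    """Put every feature on a comparable scale before measuring distance."""
    return [(x - mean) / std for x, (mean, std) in zip(features, stats)]


def predict(new_patient, rows, k=3):
    """Predict 1 (at risk) or 0 by majority vote of the k most similar patients."""
    stats = column_stats(rows)
    target = scale(new_patient, stats)
    scored = []
    for row in rows:
        distance = math.dist(target, scale(row[:-1], stats))
        scored.append((distance, row[-1]))
    scored.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in scored[:k]]
    return 1 if sum(nearest_labels) * 2 > k else 0


if __name__ == "__main__":
    print(predict((98.0, 165.0), TRAINING_ROWS))
    print(predict((65.0, 115.0), TRAINING_ROWS))
```

The known outcomes in the last column are what make this "supervised": the prediction for a new patient comes entirely from labeled examples seen earlier.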
Of course, EHRs are more complex than a single spreadsheet. Most of the data in the real world are in relational databases; this means, for example, that real patient data are distributed over many tables, or spreadsheets, based on categories such as diagnoses, lab results, and prescriptions. Every time a patient comes in to see a doctor there are one or more entries made in several of the tables for that patient.
“But almost all machine learning algorithms assume that the data are in a single table,” says Page, “and there is this big mismatch between what machine learning algorithms are designed for and what a lot of the world’s real data look like.” So Page has been working on algorithms that can handle the more complex relational databases. These algorithms are now being tested in several biomedical areas such as clinical trials and toxicity studies.
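To see the mismatch Page describes, consider the usual workaround of collapsing several tables into one row per patient. The sketch below is an assumption-laden illustration, not Page's relational approach: the tables, column names, and the `flatten` helper are all hypothetical, and the records are invented.

```python
from collections import defaultdict

# Invented toy relational data (NOT real patient records). Each visit can add
# rows to several tables, so one patient may appear many times in each.
PATIENTS = [
    {"patient_id": 1, "birth_year": 1950},
    {"patient_id": 2, "birth_year": 1982},
]
DIAGNOSES = [
    {"patient_id": 1, "code": "hypertension"},
    {"patient_id": 1, "code": "atrial_fibrillation"},
    {"patient_id": 2, "code": "hypertension"},
]
LAB_RESULTS = [
    {"patient_id": 1, "test": "ldl", "value": 160.0},
    {"patient_id": 1, "test": "ldl", "value": 150.0},
    {"patient_id": 2, "test": "ldl", "value": 95.0},
]
PRESCRIPTIONS = [
    {"patient_id": 1, "drug": "aspirin"},
]


def flatten(patients, diagnoses, lab_results, prescriptions):
    """Collapse the four tables into one summary row per patient."""
    diagnosis_counts = defaultdict(int)
    for entry in diagnoses:
        diagnosis_counts[entry["patient_id"]] += 1

    ldl_values = defaultdict(list)
    for entry in lab_results:
        if entry["test"] == "ldl":
            ldl_values[entry["patient_id"]].append(entry["value"])

    prescription_counts = defaultdict(int)
    for entry in prescriptions:
        prescription_counts[entry["patient_id"]] += 1

    rows = []
    for patient in patients:
        pid = patient["patient_id"]
        values = ldl_values.get(pid, [])
        rows.append({
            "patient_id": pid,
            "birth_year": patient["birth_year"],
            "n_diagnoses": diagnosis_counts.get(pid, 0),
            # Averaging discards how many tests were taken and in what order.
            "mean_ldl": sum(values) / len(values) if values else None,
            "n_prescriptions": prescription_counts.get(pid, 0),
        })
    return rows


if __name__ == "__main__":
    for row in flatten(PATIENTS, DIAGNOSES, LAB_RESULTS, PRESCRIPTIONS):
        print(row)
```

The single table that results is something a standard algorithm can consume, but the counts and averages throw away detail such as which diagnosis came before which prescription. That lost structure is one reason to want algorithms that work on the relational data directly.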
In fact, Page is collaborating with a number of researchers at UW-Madison, such as James Thomson, Bill Murphy, and Michael Schwartz, to develop a machine learning algorithm that can predict whether certain chemicals are potential developmental neurotoxins.
Page has been training a machine learning algorithm by exposing a laboratory-engineered neural tissue system – that somewhat mimics the developing brain – to known neurotoxins and non-toxins, and monitoring changes at the molecular level. The idea is to “use this computer model to predict if a new, previously untested compound is neurotoxic or not,” says Page. He envisages such an algorithm being used by various industries, such as pharmaceutical companies, and also by environmental agencies like the Department of Natural Resources or the Environmental Protection Agency.
Looking ahead, the goal is to develop accurate predictive models that are also causally faithful, says Page. “What that means in the context of EHR data, for example, is while I want to predict whether I will have a heart attack next month, I also want to be able to predict how taking an aspirin every day would change the risk of heart attack.”
Recent developments in high-throughput techniques have transformed the field of machine learning, according to Page, bringing personalized, causally faithful algorithms within reach. “In the next five years I think we are going to reach the point where everyone can have their whole genome sequenced, if they want, and have it be part of their medical record,” he says. That would open up new opportunities to train and use machine learning algorithms, possibly to further the nascent field of precision medicine.
For someone whose research is focused on predicting the future, Page is thrilled with the transdisciplinary direction in which his work is headed. “Anytime anybody has an interesting biological problem where machine learning might be useful, I get excited and want to work on it!” he says. At UW-Madison, Page has the opportunity to work with collaborators from all over campus, across the state with organizations like the Marshfield Clinic, and around the world.
Page thinks becoming a Discovery Fellow at the Wisconsin Institute for Discovery will provide increased opportunities for him to work with an interdisciplinary group of researchers. “I look forward to increasing collaborations with other researchers from different fields, while maintaining the relationships I already have,” he says.