Michael Gao
Data Can Help Us Solve Healthcare Challenges…
As a core member of DIHI’s Data Science team, I have been both fortunate and unfortunate enough to work with electronic health record (EHR) data. Fortunate in that the potential for EHR data to revolutionize the way healthcare is delivered and optimized is truly exciting. Unfortunate in that, in their current state, EHRs are messy, and extracting value from them takes enormous effort. But the rewards for these efforts have the potential to meaningfully address today’s patient care challenges. Specifically, our team has coupled EHR data with machine learning models to predict readmissions, recommend palliative care consults, and detect sepsis in the hospital before it occurs. And this is just the tip of the iceberg; we are working to expand our capabilities every year.
At their core, each of these machine learning models works by finding patterns in the data. The machine learning algorithms we employ are able to sift through a vast quantity of Duke patient data to find signs that may help predict future events. Although these methods are immensely powerful, they are inherently limited by the quality of the data. The saying “garbage in, garbage out” holds true in this setting. If you develop models with messy data, you’ll get messy results. This means that the integrity of our EHR data and the way that we structure it directly affects the impact that our sophisticated technological approaches have on informing clinical care.
…But Only If We’re Willing to Work With It
In addition to the problems of missingness (the manner in which data are missing from a population sample), lack of structure, convoluted relationships, and lack of standard entry mechanisms, the EHR is a dynamic and living system. Every week, data gets entered into our EHR that does not conform to anything that we have seen previously. Perhaps a new medication has just been put into practice, or maybe a new lab test has been ordered for the first time. However, due to the unstructured nature of the data, even a new dosage for an existing medication or a new name for the same laboratory test can look different once it gets entered into the record. How, then, are we supposed to develop algorithms that are robust to these changes? The answer, we believe, lies in monitoring the data.
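One simple form of the monitoring described above is to compare each new batch of entries against the vocabulary seen during model development and surface anything unfamiliar. The sketch below illustrates the idea; the variable names and medication strings are made-up examples, not our actual pipeline or Duke’s data.

```python
# Sketch: flag EHR strings never seen during model development.
# All names and values here are illustrative, not real pipeline data.

known_medication_names = {
    "vancomycin 1 g IV",
    "vancomycin 1.5 g IV",
    "piperacillin-tazobactam 3.375 g IV",
}

weekly_batch = [
    "vancomycin 1 g IV",
    "vancomycin 1.25 g IV",   # a new dosage arrives as a brand-new string
    "pip/tazo 3.375 g IV",    # a new abbreviation for an existing drug
]

# Set difference: anything in this week's batch the model has never seen.
unseen = sorted(set(weekly_batch) - known_medication_names)
print(unseen)  # -> ['pip/tazo 3.375 g IV', 'vancomycin 1.25 g IV']
```

Each flagged string can then be reviewed and either mapped onto an existing concept or treated as genuinely new, rather than silently confusing the model.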
To use a concrete example, let’s say that one data element used in one of our models is a blood culture. The first step in creating the machine learning model is to combine all of the different ways that blood cultures are represented in the system, so that the next time the model sees data on a blood culture, it knows how to identify it. But what if the name of a blood culture changes in the system? In fact, retrospective analysis tells us that this exact scenario has happened, and if it happened while our sepsis model was in use, it could drastically affect performance.
Learning and Progression
Here at DIHI, we believe that in order to bring cutting-edge technology research into healthcare, we have to borrow best practices from other industries. The problem of monitoring data is not new; many other industries have been tackling it for years. One industry we have drawn inspiration from is quantitative trading. At its core, quantitative trading is all about trying to predict changes in time series data, such as a company’s stock price over time. If you picture the quantity of laboratory tests and medications ordered across all of Duke Health on a daily, weekly, and monthly basis, the resulting graphs look remarkably like stock price charts. There are some days where we might order more of one medication over another, and intra-weekly trends are easily noticeable. In addition, we might see trends that operate over larger time scales; maybe the institution decides to consolidate laboratory providers, and the count of laboratory values carrying that provider’s names spikes suddenly. Using algorithms from quantitative trading, signal processing, and related fields, our goal is to detect these changes as they occur and to take appropriate action. If we can catch these anomalies when they occur, and get to the root of the problem, we can make sure that our models are robust to the inherent dynamics of our EHR.
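One of the simpler change-detection techniques borrowed from time-series analysis is a rolling z-score: compare each day’s count against the mean and spread of the preceding days and alert when it deviates sharply. This is a minimal sketch with made-up daily counts (a lab renaming causes orders under the old name to collapse), not our production monitoring system.

```python
# Sketch: flag sudden shifts in daily order counts with a rolling z-score,
# the same kind of change detection applied to financial time series.
# The counts below are invented for illustration.

import statistics

def rolling_zscore_alerts(counts, window=7, threshold=3.0):
    """Yield (day_index, count, z) for days deviating strongly from recent history."""
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            continue  # no variation in the window; z-score undefined
        z = (counts[i] - mean) / stdev
        if abs(z) > threshold:
            yield i, counts[i], z

# Stable ordering volume, then a lab renaming makes the old name's count collapse.
daily_lab_orders = [52, 48, 50, 51, 49, 53, 50, 52, 49, 3, 2, 1]

for day, count, z in rolling_zscore_alerts(daily_lab_orders):
    print(f"day {day}: count={count}, z={z:.1f}")
```

An alert like this does not tell us *why* the count collapsed; it tells us where to look, which is exactly the "get to the root of the problem" step described above.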
As with everything we do, we don’t want to just stop there. Our data science team is constantly thinking of new ways to make sure that we are being responsible with new technologies that are making their way into the care delivery setting. Monitoring trends in data is just one of the many methods we employ to ensure the success of our pilot programs, many of which involve making predictions that can ultimately save the lives of our patients. The mission of the Duke Institute for Health Innovation has always been to catalyze change within Duke Health. I believe that the problems our team has tackled and continues to tackle will ultimately lay the groundwork for a health system that can leverage the promise of machine learning, artificial intelligence, and all manner of cutting-edge technology to deliver higher-quality, more efficient care to our patients.