Lauren Chaparro and I were honored to be among the speakers at Strata RX 2012, O’Reilly’s conference on the use of big data in health care. Our talk was called “Doing Big Data All By Yourself: Interactive Data Driven Decision Making by Non-Programmers”.
I gave the first half of the talk, delving into the stark realities of big data implementations. The second half of the talk featured Lauren (a Palantir Forward Deployed Engineer on the health applications team) giving a live demonstration of workflows in a system that we recently implemented on top of Palantir Gotham, one of our data fusion platforms.
There is a lot of excitement in the marketplace around the availability and capability of big data processing technologies. But most of these technologies are building blocks, not complete solutions. Before health care providers can realize the promise of big data, they need to overcome at least three major challenges associated with implementing these technologies:
- Systems Engineering. The integration work required to deploy this technology at scale is non-trivial.
- Data Science. Someone needs to design the statistical analysis that can derive actionable insights from the raw data.
- User Experience Design. Big data implementations require a user interface that makes it easy for subject matter experts to ask questions of the data.
In the health care space, the end users are clinicians and researchers. Many big data implementations, however, feature interfaces best suited to data scientists and programmers. As a result, this technology is mostly used for aggregation and static dashboarding. This is a good start, but it falls short of the goal of putting the power to learn from big data into the hands of clinicians.
The punchline here is that the scarce resource in the big data domain (regardless of vertical) is talent – the talent to (a) do the complex systems engineering and data science necessary to derive insights from data and (b) build the last mile of familiar, expressive, and interactive interfaces needed to truly take advantage of all that the data has to offer.
The second half of the talk focused on work we did in association with the Center for Public Integrity. We put together a Palantir Gotham instance that integrated anonymized data from Medicare and various other data sources to show the potential of a fully integrated, interactive system.
The datasets involved were:
- Medicare data representing 100 million claims, 1 billion medical procedures, 30 million individual beneficiaries, and 700,000 physicians.
- Data from the National Plan and Provider Enumeration System (NPPES), used to standardize identifiers across payers and providers (see the sketch after this list).
- Data from the Dartmouth Atlas Project – a well-curated collection of hospital-specific performance data.
- Data from PubMed, representing 22 million biomedical journal articles.
- Data from the Department of Health and Human Services Office of Inspector General, composed of entities excluded from participation as Medicare providers due to past fraudulent behavior.
- Data from the US Census showing demographic trends across the country.
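To make the identifier-standardization step concrete, here is a minimal sketch in Python of the kind of NPI crosswalk the NPPES data enables. All field names and values here are hypothetical, and the actual integration was done inside Palantir Gotham, not with standalone code like this:

```python
# Minimal sketch of NPI-based identifier standardization. Field names
# and values are hypothetical; the real integration ran inside the platform.

# Crosswalk from payer-specific provider IDs to canonical NPIs, as could
# be derived from an NPPES extract.
NPI_CROSSWALK = {
    ("payer_a", "A-10042"): "1234567890",
    ("payer_b", "77-3310"): "1234567890",  # same physician, different payer ID
}

claims = [
    {"claim_id": "c1", "payer": "payer_a", "provider_id": "A-10042", "amount": 412.50},
    {"claim_id": "c2", "payer": "payer_b", "provider_id": "77-3310", "amount": 88.00},
]

def standardize(claim):
    """Attach the canonical NPI so claims from different payers resolve
    to the same provider entity."""
    key = (claim["payer"], claim["provider_id"])
    return {**claim, "npi": NPI_CROSSWALK.get(key)}  # None if unmatched

standardized = [standardize(c) for c in claims]
# Both claims now carry NPI "1234567890", so per-provider aggregation
# (billing totals, procedure counts) works across payers.
```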
Since patient privacy concerns are paramount with this sort of data, Lauren used simulated rather than real data to give a live demonstration of how different subject matter experts could perform a number of different workflows inside a single system:
- Policy makers can explore answers to high-level questions around the supply and demand of hospice care given the reality of an aging population.
- Investigators can detect patterns of fraud in billing for hospice care and find individual bad actors (a toy version of one such pattern is sketched after this list).
- Doctors can look into the optimal treatment options and providers for a given patient.
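As a toy illustration of the fraud-pattern workflow above, the sketch below flags providers whose hospice billing per beneficiary far exceeds the median. The data, the threshold, and the median-based rule are all illustrative assumptions on my part, not the analysis used in the actual deployment:

```python
# Toy fraud-pattern sketch: flag providers whose hospice billing per
# beneficiary is far above the median. Data and threshold are invented.
from statistics import median

# Hypothetical per-provider totals: (npi, total_billed, beneficiary_count)
providers = [
    ("1234567890", 1_200_000, 400),
    ("2345678901",   950_000, 310),
    ("3456789012", 4_800_000, 350),  # bills far more per beneficiary
    ("4567890123", 1_050_000, 360),
]

rates = {npi: billed / count for npi, billed, count in providers}
typical = median(rates.values())

# Flag anyone billing more than 3x the typical per-beneficiary rate.
flagged = {npi: rate for npi, rate in rates.items() if rate > 3 * typical}
print(flagged)  # {'3456789012': 13714.28...} -- about 4.5x the median
```

A real investigation would of course look at many such signals together; the point of the demo was that an investigator can pursue this kind of question interactively, without writing code at all.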
We believe answering questions like those addressed in this demo is key to curbing waste, fraud, and abuse in our nation’s health care system and improving health care delivery. Through the integration of a variety of datasets at massive scale, our software can empower insurers, policy-makers, and physicians to pursue these kinds of hypotheses and derive actionable insights today, without turning to data scientists.