From Hours to Milliseconds
After Influence Health moved from SQL to Elasticsearch, queries that used to take hours now run in 7-8 milliseconds.
Meeting HIPAA Requirements
The ability to encrypt data in transit with the security features of X-Pack allowed Influence Health to confidently and easily meet HIPAA-related security and privacy requirements.
Ensuring Success with Elastic Support
Influence Health leverages its relationship with Elastic support to accelerate product development and keep its Elasticsearch cluster optimized.
Taking the Burden off Key Engineers
By moving off SQL, Influence Health's data services team can focus on supporting the company's infrastructure instead of serving as human-to-SQL translators.
Company Overview
Influence Health's consumer engagement platform helps healthcare organizations match current and potential patients with the specific services they need most, and reach out to them through targeted email, mobile, social, search, and direct mail campaigns. This helps providers prevent readmissions for treatable health issues, and even see patients before they get sick.
As of 2016, 250 clients in 46 states and multiple Canadian provinces use Influence Health's products, representing 1,100 hospitals that manage more than 80 million patient records.
Powering the Search for Better Healthcare
With a tagline of “Changing Healthcare. Changing Lives,” Influence Health wants to help individuals and families be wiser, healthier, and stronger by making healthcare more efficient and, in turn, more cost effective. They do this through their consumer engagement platform, which helps healthcare organizations make the most of their data by matching current and potential patients with the services they need most and getting them to hospitals and doctors at the right time.
When their SQL database prevented customers from quickly finding patients based on specific criteria, they turned to Elasticsearch, transforming a two-week process built on custom, handcrafted SQL queries into Elasticsearch queries that run in 7-8 milliseconds. Through their Elastic subscription, they also keep sensitive patient data secure and meet HIPAA requirements with X-Pack, and they rely on the Elastic support team to optimize their project.
Influence Health's Journey with Elastic
The Who: Nathan Stott Joins Influence Health
A self-described “database geek,” Nathan Stott, now Vice President of Architecture and Operations, joined Influence Health in 2014 after consulting for the company for a few years. Part of what drew him there was the technology transition the company was going through.
“Influence Health was starting to make some bold moves, innovating very quickly,” Nathan recalls. “They had just moved all of their code from TFS to GitHub. They started using Node.js to supplement their C# infrastructure. I could see that things were changing. It was the type of place I wanted to be.”
The What: Encountering SQL Slowness
The first project Nathan worked on at Influence Health was revamping the company's marketing segmentation tool, called Audience Insights, which allows Influence Health's customers to create lists of current and potential patients based on specific criteria in order to run targeted campaigns on services they may need.
When you're building these lists, it's a process. You can have up to 40 facets that you go through, which include a variety of socio-demographic factors – encounters patients have had in a hospital, medical codes that represent different diseases, age, gender, income, location.
At the time, Influence Health didn't have an easy way for its customers to identify subsets of their patients who might need specific healthcare services. Their consumer engagement platform was built on a SQL database, so every time a customer wanted to build a list for a campaign, Influence Health's data services team had to do an offline list pull by writing a custom SQL query, which took hours to handcraft and another hour or two to run. They then sent pivot tables to customers for approval, which could lead to further revisions of the query. The whole process could take up to two weeks.
Revamping Audience Insights would not only free up Influence Health's data services team to spend time on their main focus, continuing to develop the product, but also tie into the company's overarching goal of making healthcare more efficient and cost effective. By enabling customers to easily identify patients who need specific services, it helps them get people in for preventive care and use hospital resources more efficiently by preventing readmissions, among other goals.
Influence Health started researching SQL alternatives. They looked into MongoDB, but its approach to sharding and indexing did not provide the real-time aggregation they needed; queries took up to 8 seconds.
Based on his past experience, Nathan thought Lucene would be a better fit and began exploring the current state of its ecosystem. That's when he found Elasticsearch. Impressed by how much its RESTful APIs simplified things, he immediately built a proof of concept (POC) to compare it with MongoDB. Nathan found that queries were much easier to write in Elasticsearch, and they performed a lot better.
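To illustrate why faceted list building maps naturally onto Elasticsearch, here is a minimal sketch. It assumes each patient is one document with hypothetical fields such as `gender`, `age`, `income`, `state`, and `diagnosis_codes`; the article does not describe Influence Health's actual mapping. Each facet a user selects becomes one clause in a `bool` query's `filter` list, playing the role that a handcrafted SQL `WHERE` clause used to play.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_segment_query(
    gender: Optional[str] = None,
    age_range: Optional[Tuple[int, int]] = None,
    min_income: Optional[int] = None,
    states: Optional[List[str]] = None,
    diagnosis_codes: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Turn facet selections into an Elasticsearch search body.

    All field names are hypothetical placeholders for this sketch.
    """
    filters: List[Dict[str, Any]] = []
    if gender is not None:
        filters.append({"term": {"gender": gender}})
    if age_range is not None:
        low, high = age_range
        filters.append({"range": {"age": {"gte": low, "lte": high}}})
    if min_income is not None:
        filters.append({"range": {"income": {"gte": min_income}}})
    if states:
        filters.append({"terms": {"state": states}})
    if diagnosis_codes:
        filters.append({"terms": {"diagnosis_codes": diagnosis_codes}})
    # Filter context: every clause must match, none affects scoring,
    # and Elasticsearch can cache frequently used filters.
    # size=0 asks only for the hit count (and any aggregations), not documents.
    return {"size": 0, "query": {"bool": {"filter": filters}}}


# Women aged 40-65 with an ICD-10 code for type 2 diabetes (E11.9).
body = build_segment_query(gender="F", age_range=(40, 65), diagnosis_codes=["E11.9"])
```

The resulting body could be sent to the `_search` endpoint with any HTTP client. Note that recent Elasticsearch versions report exact hit totals above 10,000 only when `track_total_hits` is set in the request.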
We evaluated lots of different technologies, such as MongoDB...But ultimately, Mongo didn't support the real-time queries our business required. Elasticsearch gave us a lot of capabilities out of the box to get the speed we needed.
The Why: Meeting HIPAA and Moving Fast
After the POC showed that Elasticsearch was the right choice, the project moved into development – and more of Influence Health's engineers started getting familiar with Elasticsearch. They also began to explore how to meet HIPAA-related security and privacy requirements.
These factors led Nathan to contact Elastic to learn what Elastic subscriptions include. After learning about X-Pack's ability to encrypt data in transit, and about the deep expertise and partnership the Elastic Customer Care team provides beyond traditional break-fix-only support, Influence Health knew an Elastic subscription would give them the product functionality they needed to meet HIPAA-related security requirements and help them accelerate development as the project moved into production.
As we continued to bring more people up to speed on Elasticsearch, we definitely thought the support would be useful...And when you're dealing with sensitive patient data in the healthcare industry, you really would prefer to use the most mature, complete offering from the company behind the product – that's why we went with X-Pack.
The How: A Search Engine for Persons
Influence Health uses Elasticsearch for population segmentation. They currently host their Elasticsearch cluster on Azure, partitioned by client so that clients cannot see or expose each other's data.
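The article doesn't say how the per-client partitioning is implemented. One common Elasticsearch pattern, shown here purely as an assumption, is a filtered alias per client: searches through the alias only ever see that client's documents. The sketch builds a request body for the `_aliases` API; the index name, the `client_id` field, and the `patients-<id>` alias naming are hypothetical.

```python
from typing import Any, Dict


def client_alias_body(index: str, client_id: str) -> Dict[str, Any]:
    """Build an _aliases request body exposing only one client's documents.

    The client_id field and alias naming scheme are hypothetical.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"patients-{client_id}",
                    # Every search through this alias is restricted to this client.
                    "filter": {"term": {"client_id": client_id}},
                    # Routing sends the client's documents to, and its searches
                    # against, the same shard instead of fanning out to all shards.
                    "routing": client_id,
                }
            }
        ]
    }


body = client_alias_body("patients", "acme-health")
```

A separate index per client is the other obvious design; it gives stronger isolation at the cost of more shards to manage, so the right choice depends on client count and size.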
They initially gather patient data from customers in the form of flat files or HL7 messages (HL7 is a set of international standards for exchanging clinical and administrative healthcare data). They also enrich the patient data they receive from customers with third-party consumer and demographic data.
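For readers unfamiliar with HL7, a minimal sketch of reading demographics from an HL7 v2 message follows. HL7 v2 messages are plain text: segments are separated by carriage returns, fields by `|`, and components by `^`. In a `PID` (patient identification) segment, splitting on `|` puts field PID-*n* at index *n*. The sample message is invented for illustration, and this is not a description of Influence Health's ingestion code; production systems normally use a full HL7 library that also handles escapes and repetitions.

```python
from typing import Dict, List


def parse_hl7_segments(message: str) -> Dict[str, List[List[str]]]:
    """Split an HL7 v2 message into segments keyed by segment ID."""
    segments: Dict[str, List[List[str]]] = {}
    # HL7 v2 separates segments with \r; tolerate \n as well.
    for raw in message.replace("\n", "\r").split("\r"):
        if not raw:
            continue
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def patient_from_pid(pid: List[str]) -> Dict[str, str]:
    """Extract a few demographics from a PID segment (index n == PID-n)."""
    pid = pid + [""] * (9 - len(pid))  # pad short segments so lookups don't fail
    family, given = (pid[5].split("^") + ["", ""])[:2]
    return {
        "patient_id": pid[3].split("^")[0],  # PID-3, first component: ID number
        "family_name": family,               # PID-5.1
        "given_name": given,                 # PID-5.2
        "birth_date": pid[7],                # PID-7: YYYYMMDD
        "sex": pid[8],                       # PID-8
    }


# Invented sample: an ADT^A01 (admission) message with MSH and PID segments.
sample = (
    "MSH|^~\\&|SENDER|HOSP|RECEIVER|IH|20160101120000||ADT^A01|MSG0001|P|2.3\r"
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19700215|F"
)
patient = patient_from_pid(parse_hl7_segments(sample)["PID"][0])
```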
Next, everything is ingested via Spark into their Cassandra database. As new patient records come in, they are cleansed by matching them against what Influence Health already knows about those patients from previous data dumps, and the resulting deltas are written to a change capture feed.
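The match-then-emit-deltas step can be sketched as follows. This assumes records are matched on a single hypothetical `patient_id` key; real patient matching is usually fuzzier (probabilistic record linkage across names, dates of birth, and addresses), and the article doesn't describe Influence Health's matching rules. New patients produce an `insert` event, changed patients produce an `update` carrying only the changed fields, and unchanged records produce nothing.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def compute_deltas(
    known: Dict[str, Record], incoming: List[Record], key: str = "patient_id"
) -> List[Tuple[str, Record]]:
    """Compare incoming records with known ones and emit change events.

    Exact matching on one key is a simplifying assumption for this sketch.
    """
    changes: List[Tuple[str, Record]] = []
    for record in incoming:
        previous = known.get(record[key])
        if previous is None:
            changes.append(("insert", record))
            continue
        # Keep only the fields whose values differ from what we already had.
        changed = {f: v for f, v in record.items() if previous.get(f) != v}
        if changed:
            changed[key] = record[key]
            changes.append(("update", changed))
    return changes


known = {
    "p1": {"patient_id": "p1", "zip": "37201", "age": 50},
    "p3": {"patient_id": "p3", "zip": "10001", "age": 41},
}
incoming = [
    {"patient_id": "p1", "zip": "37203", "age": 50},  # moved
    {"patient_id": "p2", "zip": "37201", "age": 30},  # new patient
    {"patient_id": "p3", "zip": "10001", "age": 41},  # unchanged
]
feed = compute_deltas(known, incoming)
```

Emitting only deltas keeps the downstream sync to Elasticsearch proportional to what changed rather than to the full patient population.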
They then run a scheduled sync of this data from Cassandra to their Elasticsearch cluster via ES-Hadoop using its native integration with Apache Spark – which they've found to be quite quick. For example, for a customer that has 3 million patients with 10 million clinical encounters, it takes about 15 minutes to index the data from Cassandra into Elasticsearch.
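To put that figure in perspective, here is the average indexing rate it implies. The article doesn't say how documents are modeled, so this assumes one document per patient and one per encounter, giving 13 million documents in 900 seconds.

```python
def docs_per_second(patients: int, encounters: int, minutes: float) -> float:
    """Average indexing rate, assuming one document per patient and per encounter."""
    return (patients + encounters) / (minutes * 60)


rate = docs_per_second(3_000_000, 10_000_000, 15)
print(f"{rate:,.0f} documents/second")  # 14,444 documents/second
```

If encounters were instead nested inside patient documents, the same run would index 3 million top-level documents, or roughly 3,333 per second.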
In order to build specific patient lists for their customers, Influence Health heavily utilizes the aggregations features of Elasticsearch – especially the cardinality aggregation and the scripted metric aggregation, the latter of which allows them to do grouped counts for household-level metrics.
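A sketch of how the two aggregations might be combined in one request follows. It assumes a hypothetical keyword field `household_id` on each patient document. The `cardinality` aggregation gives an approximate count of distinct households (it uses HyperLogLog++, with `precision_threshold` trading memory for accuracy). The `scripted_metric` aggregation computes something `cardinality` cannot: how many households have 1, 2, 3, ... matching members. The Painless scripts use the syntax of Elasticsearch 5.0 and later (the 2016-era cluster would have used a different scripting language) and are untested against a live cluster. The Python function below runs the same init/map/combine/reduce logic locally so the flow can be checked.

```python
from collections import Counter
from typing import Any, Dict, List

# Painless scripts (ES 5.0+ syntax); an untested sketch, not production code.
INIT = "state.sizes = new HashMap();"
MAP = (
    "def h = doc['household_id'].value;"
    " state.sizes.put(h, state.sizes.containsKey(h) ? state.sizes.get(h) + 1 : 1);"
)
COMBINE = "return state.sizes;"
REDUCE = (
    "Map merged = new HashMap();"
    " for (def s : states) { for (def e : s.entrySet()) { def k = e.getKey();"
    " merged.put(k, merged.containsKey(k) ? merged.get(k) + e.getValue() : e.getValue()); } }"
    " Map hist = new HashMap();"
    " for (def n : merged.values()) { String b = '' + n;"
    " hist.put(b, hist.containsKey(b) ? hist.get(b) + 1 : 1); }"
    " return hist;"
)


def household_aggs(segment_query: Dict[str, Any]) -> Dict[str, Any]:
    """Search body: segment filter plus household-level aggregations."""
    return {
        "size": 0,
        "query": segment_query,
        "aggs": {
            "unique_households": {
                "cardinality": {"field": "household_id", "precision_threshold": 10000}
            },
            "members_per_household": {
                "scripted_metric": {
                    "init_script": INIT,
                    "map_script": MAP,
                    "combine_script": COMBINE,
                    "reduce_script": REDUCE,
                }
            },
        },
    }


def emulate_scripted_metric(shards: List[List[Dict[str, Any]]]) -> Dict[str, int]:
    """Run the same init/map/combine/reduce steps locally over in-memory 'shards'."""
    states = []
    for shard_docs in shards:
        sizes: Counter = Counter()           # init: one state per shard
        for doc in shard_docs:               # map: once per matching document
            sizes[doc["household_id"]] += 1
        states.append(sizes)                 # combine: return the shard's state
    merged: Counter = Counter()              # reduce: merge shard states first,
    for state in states:                     # because one household can span shards
        merged.update(state)
    histogram = Counter(merged.values())     # household size -> number of households
    return {str(size): count for size, count in sorted(histogram.items())}


shards = [
    [{"household_id": "h1"}, {"household_id": "h2"}],
    [{"household_id": "h1"}, {"household_id": "h3"}, {"household_id": "h3"}],
]
result = emulate_scripted_metric(shards)  # {'1': 1, '2': 2}
```

Merging per-shard maps before counting is the key design point: counting per shard and summing would report household `h1` as two single-member households.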
X-Pack is also a vital part of Influence Health's infrastructure. Under HIPAA rules, all Protected Health Information (PHI) must be encrypted at rest and in transit. Encryption at rest is handled by Transparent Data Encryption of the Elasticsearch nodes' hard drives. The security features of X-Pack provide the required encryption in transit, ensuring that data is never sent in clear text between nodes.
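A deployment pipeline could guard the in-transit requirement with a simple preflight check. The sketch below inspects a flattened `elasticsearch.yml` settings dictionary and reports any TLS-related setting that is missing or not `true`. The setting names are those used by current Elasticsearch versions and may differ from the X-Pack release Influence Health ran; the check is an illustrative assumption, and it does not validate certificates or keystores.

```python
from typing import Dict, List

# Setting names from current Elasticsearch versions; older X-Pack releases may differ.
REQUIRED_TRUE = [
    "xpack.security.enabled",
    "xpack.security.transport.ssl.enabled",  # node-to-node traffic
    "xpack.security.http.ssl.enabled",       # client-to-cluster REST traffic
]


def missing_tls_settings(settings: Dict[str, object]) -> List[str]:
    """Return required settings that are absent or not set to true."""
    return [
        name
        for name in REQUIRED_TRUE
        if str(settings.get(name, "false")).lower() != "true"
    ]


node_settings = {
    "xpack.security.enabled": True,
    "xpack.security.transport.ssl.enabled": "true",
}
problems = missing_tls_settings(node_settings)  # ['xpack.security.http.ssl.enabled']
```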
The Results: No More Human-to-SQL Translators
Since implementing Elasticsearch, the data services team at Influence Health no longer needs to be involved in turning around list requests for clients – the client services team can create them themselves. This transforms a process that used to take two weeks into Elasticsearch queries that run in 7-8 milliseconds, allowing a list to be put together in 10 minutes or less.
Our data services team has a lot more time to work on our product instead of being human-to-SQL translators.
The Elastic support team has also been there every step of the way to help guide Influence Health on its Elastic journey – everything from query optimization, planning and navigating upgrades, knocking out break-fix issues, resolving high-severity problems, and offering sound advice on any topic they face together.
We've engaged with the Elastic support team quite a bit: when we upgraded from Elasticsearch 1.5.2 to 2.0, when some queries were running slowly and we asked for ideas on how to improve them, and even just to ask random questions when other colleagues weren't available...No matter the question, big or small, the Elastic support team provides useful and quick responses. It's been great.
The Influence Health Cluster
- Clusters: 1
- Indexes: 40
- Nodes: 10
- Query Rate: 80 per second
- Hosting Environment: Microsoft Azure
- Replicas: 2
- Documents: more than 1 billion
- Time-based Indices: 40
- Total Data Size: 5 TB
- Node Specifications: SSD, 64 GB RAM, 8 CPUs
- Daily Ingest Rate: 250,000 documents