Do Healthcare Providers Need a Data Warehouse?

By Bruce Johnson

Most healthcare providers are actively building, installing, or enhancing data capture systems.  Many are focused on developing, implementing, or revising an Electronic Medical Record (EMR) that can be a composite repository of much of the data accumulated for any individual patient.  At the same time, most organizations are starving for access to information that they can use to improve their business, processes, or techniques.

The common technical solution across all other industries for providing data for reporting and analytics is a data warehouse.  Most enterprise data warehouses are internally developed within organizations.  Components or specific solutions can be developed in-house or bought as canned products from vendors.  With the need to provide access to data for analysis purposes, many healthcare institutions are looking at technology and solutions to determine how best to fill this gap.  Many vendors are preaching alternative technologies to help address this issue.  Is there an obvious answer?  Does healthcare need a data warehouse?

In order to answer that, we’ll need to start by taking a look at some of the analytical needs found in healthcare.


Analytical Needs

Healthcare is faced with a much broader need to analyze data than any other industry I have encountered.  Here are a few of the most visible needs: 

  1. Complex operations:  Healthcare providers' business and financial operations are much more complex than those of a standard product or services company.  The breadth of this data is incredible.  In order to process billing adequately and satisfy requests for payments from a Payor, healthcare has an incredible volume of information to capture and tie to the necessary standards.  How well they do this can have a significant impact on their financial return.
  2. External reporting requirements:  This is a significant challenge today as many organizations do not have access to the appropriate data to produce these reports in an automated fashion.  Oftentimes this results in resources being assigned to gathering and sorting data to produce simple reports at great financial expense.  It should be noted that the volume of external reports required today already seems significant, yet the growth of external reporting has a long way to go.
  3. The need for a proactive business environment:  A large healthcare organization in my home state recently reported financials that had them greatly worried.  They found that the number of patients with no insurance had gone up significantly and that the number with only Medicare had also risen sharply.  This had a great impact on their financial numbers.  Unfortunately, without dashboards and scorecards that track this information and keep executives aware of their status, organizations are limited to figuring out where they stand at the end of the year.  This puts them in a reactive mode, which in turn makes budgeting and planning extremely complex.  It also puts any company into a state where immediate cost concerns keep it from building the types of solutions that will have great long-term value.  It is something of a catch-22.
  4. Operations management is greatly enhanced by the ability of any organization to have access to performance criteria.  Having measurements to watch enables a business to see how it is doing, make adjustments, and continue process improvement.  Whether this relates to patient flows, nursing staffing, or management of patient meals, efficiency and effectiveness affect profitability and patient satisfaction.
  5. Medical and clinical research are a critical part of continuing to improve the practice of healthcare, the overall health of our population, and addressing the escalating costs of providing healthcare. 


Having this knowledge of how the data would be used, let’s take a look at the role that the EMR plays.


Role of the EMR

The EMR is one large system that is a capture and consolidation repository of patient data.  While the promise of the EMR is that all patient data is contained there, too often that isn’t realized.  Most organizations capture some percentage of their data via their EMR.  Additionally, some of their other data capture systems, such as labs and registration, flow into the EMR so that it can house that data and provide a consolidated view of a patient to a provider.  Thus, the EMR requires the ability to capture, view, and process individual transactions with crucial system response time.  It wasn’t designed for reporting and analytics.  This means that we can’t have resources querying or running reports against the operational EMR while the doctor is using the EMR with a patient.  A complex or long-running query could negatively affect the time it takes to capture or access patient data while the patient is in the office with a provider (remember, this is the number one reason a data warehouse exists to begin with).

Since the EMR is not an option for providing analytics to healthcare, what other possibilities outside of a data warehouse could quench this thirst for information?



If you were to just brainstorm what might work without thoroughly thinking through the impacts of every solution, there would be countless technical solutions that could provide this type of access.  Most of those would have some challenges that would inhibit success.  Some generally accepted alternatives include:

  1. Direct Reporting:  The most common method of providing access to data is to write reports individually against a specific system.  This gives you the ability to report on what is happening for the data in that system.  In healthcare this is a great challenge because each report only covers data from that one system, which is typically a fraction of the data accumulated.  Thus it is very limited in value and in the amount of analytics you have access to.  If that system is mission critical, then reporting is restricted to off hours to limit the effect on the users of the system.
  2. Canned enterprise data warehouse solutions exist across most industries.  I know of several vendors that want to produce and provide this for healthcare.  Yet I do not know of one that has made it both implementable and valuable.  One of the biggest challenges with this is related to the uniqueness in healthcare terminology.  All of the canned solutions I know of are focused on abstract terminology.  This enables them to show you how the data can all fit.  That creates a much more significant challenge of helping anyone to figure out how to put the data in and how to get it out.  Until healthcare comes up with an agreed upon vocabulary and terminology, a canned solution will not be possible.
  3. Canned departmental or specific usage solutions exist for many aspects of healthcare analytics.  For those that have built an enterprise data warehouse, it is reasonable to map data from your warehouse to one of these solutions and get immediate value.  For those that have not consolidated their data into an enterprise class warehouse, many of these solutions are greatly complicated by the need to gather, cleanse, and govern their data.  Thus, the value of these solutions is negatively impacted and the cost to build goes up significantly.  As various medical workgroups build out terminologies and more healthcare providers build large scale data warehouses, I see a tremendous future in this area.
  4. Federation is a concept that many vendors are now promoting to provide access to all data no matter where it resides.  In theory, federation allows you to connect databases across many platforms and technology layers and query them as if they were one.  The promise of this technology sounds great.  However, in relation to analytics, the theory of the solution is an exercise in futility.  Remember that, as discussed earlier, the original purpose of a data warehouse was to alleviate the burden of large scale queries on systems that have to capture and process individual data in a mission critical fashion.  The concept of federation does not address that issue; in fact, it complicates it further.  However, some organizations may have 3 or 4 data warehouses that are all based on the same model, vocabulary, and/or terminology.  In that case, federation would be a great way to connect data across those solutions.
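
To make the contrast between these alternatives and a warehouse concrete, here is a minimal sketch in Python.  It is an illustration under stated assumptions, not a recommended design: the two in-memory SQLite databases stand in for hypothetical registration and billing systems, and the table names, column names, and sample rows are all invented for the example.  The point it tries to show is that the source systems are read once during an off-hours extract, and the analytical query then runs only against the separate warehouse copy.

```python
import sqlite3

# Hypothetical source systems (illustrative names and data, not from any real product).
registration = sqlite3.connect(":memory:")
registration.execute("CREATE TABLE patients (patient_id INTEGER, payor TEXT)")
registration.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, "Medicare"), (2, "Uninsured"), (3, "Commercial"), (4, "Medicare")],
)

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE charges (patient_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO charges VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (3, 200.0), (4, 50.0), (1, 30.0)],
)


def load_warehouse(registration_db, billing_db):
    """Off-hours extract: read each source once and copy the rows into a separate store."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE patients (patient_id INTEGER, payor TEXT)")
    warehouse.execute("CREATE TABLE charges (patient_id INTEGER, amount REAL)")
    warehouse.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        registration_db.execute("SELECT patient_id, payor FROM patients").fetchall(),
    )
    warehouse.executemany(
        "INSERT INTO charges VALUES (?, ?)",
        billing_db.execute("SELECT patient_id, amount FROM charges").fetchall(),
    )
    return warehouse


def charges_by_payor(warehouse):
    """Analytical query that touches only the warehouse, never the source systems."""
    rows = warehouse.execute(
        "SELECT p.payor, SUM(c.amount) "
        "FROM charges c JOIN patients p ON p.patient_id = c.patient_id "
        "GROUP BY p.payor ORDER BY p.payor"
    ).fetchall()
    return dict(rows)


warehouse = load_warehouse(registration, billing)
print(charges_by_payor(warehouse))
```

A direct report or a federated query would instead run the join against the registration and billing systems themselves every time someone asked the question, which is the load the warehouse is meant to take away.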



Over the years other industries have found that the companies that can gather and provide access to comprehensive information in a timely manner are the ones best able to set themselves apart from their competitors.  It is well recognized that healthcare providers are on the cusp of a changing landscape that will challenge them in many ways, such as:

  • Shortages of knowledge workers
  • Increasing Medicare patients, which leads to decreasing profits
  • A competitive landscape with new providers, techniques, and methods for care
  • Costs that are continuing to escalate


In the face of all of this, they are in dire need of access to information about their business that will:

  • Allow them to keep their finger on the pulse of their financial health
  • Satisfy requirements for producing reporting to external governing bodies
  • Improve their own internal process and operations
  • Enhance healthcare knowledge and advance discovery
  • Provide a vehicle to enable Medical Research to have appropriate access to clear, comprehensive data


Given that the challenges in front of healthcare providers are different from those in most other industries, the question isn’t whether or not healthcare providers need to have a data warehouse.  It is how to go about it.  How many should we have, where can we get standardized solutions, and how do we go about trying to build one?  This is where healthcare is actually similar to other industries.  It is much more difficult than it would appear:  there are many variables to sort through, few examples of success, and, because each organization is unique, no one-size-fits-all solution.  The key is coming up with a sound approach and an appropriate plan, then leveraging common tools, standards, technologies, and expert resources.

About the Author

Bruce has over 20 years of IT experience focused on data / application architecture, and IT management, mostly relating to Data Warehousing. His work spans the industries of healthcare, finance, travel, transportation, retailing, and other areas working formally as an IT architect, manager/director, and consultant. Bruce has successfully engaged business leadership in understanding the value of enterprise data management and establishing the backing and funding to build enterprise data architecture programs for large companies. He has taught classes to business and IT resources ranging from data modeling and ETL architecture to specific BI/ETL tools and subjects like “getting business value from BI tools”. He enjoys speaking at conferences and seminars on data delivery and data architectures. Bruce D. Johnson is the Managing director of Data Architecture, Strategy, and Governance for Recombinant Data (a healthcare solutions provider) and can be reached at
