Getting Access To Healthcare Source Data

By Bruce Johnson

One commonly overlooked challenge to developing any data warehouse is actually getting access to the source systems and data that you plan on integrating into a data warehouse environment.  Whether it is just getting permission to get the data, or working with source system resources to actually acquire the data, this can be a significant challenge.  Over many years of working in a variety of industries, I have not seen an industry where this is more complex than it is in healthcare.


The Root Of The Problem

Healthcare has many system and data variables that are not present, or at least not found at the same magnitude, in other industries.  The vast number of systems, the unique nature of the data, the prevalence of unstructured data, the breadth of technologies, and the number of proprietary data platforms are all commonplace in most healthcare organizations.  Combine this with the strong need for security and protection of patient privacy, and any request for access to data can be met with intense scrutiny.

Healthcare is also overburdened with demand for access to data from many different sources for many different reasons.  In addition to the financial and operational analysis needs that most businesses face, clinical analysis, decision support, and medical research all frequently require systems to provide data extracts.  Because demand for this information is so high, most of these needs are never met, and only the most urgent are satisfied in a timely manner.  In response, information systems departments have developed processes and staffed resources to prioritize requests and satisfy them, typically one at a time.  Thus, in organizations without an integrated data warehouse, requests for data from many systems rarely get satisfied.

Given that background, it is easy to understand why any request for data is met with sincere skepticism and concern about what will be done with the data.  The resources that support these systems are used to feeling responsible for the data in their system and to acting as gatekeepers for anyone who wants to use it.  It is also important to understand that they likely see your request as being like any other, one where the source system resources will have to do all of the work to understand your needs and provide a specific extract.  That satisfies one request, but does little or nothing to help integrate data from many systems to serve many needs, as a data warehouse should.


Have A Sound Approach

At the least, getting access to source system data will be littered with roadblocks, but it could also prove to be a show stopper that shuts down your data warehousing projects.  The most important thing you can do to mitigate this is to have a clear recognition of the challenges ahead and have a sound approach that is communicated/shared across the organization.  Here are some of the most important factors that you should consider in your approach to getting access.

  • Change the perception that application developers own the data in their systems and are the gatekeepers for anyone who wants to use it.  While it is very important that these resources are accountable for the quality of data in their system, it is the institution as a whole that owns the data.  Once the business recognizes that an integrated data warehouse can provide access to many resources faster, more easily, and more cost effectively, the effort becomes possible (and that recognition is just the first step).  This starts with leadership and must be driven down to all levels in a consistent fashion.
  • Minimize impact on source system resources.  Use source system SMEs only as needed to understand source structures/schemas.  It can actually be counterproductive to have your source system experts work on your ETL.  Since you are pulling data from many disparate sources, having each source system team build its own ETL means the employees of every source system need to learn the tools and techniques for building ETL, and even then they will each do it in their own style.  It also means cleansing and transformation rules will not be applied consistently, so one of your primary objectives in integrating the data will not be realized.
  • Minimize impact on source systems and their SLAs.  Don’t do all the work when you extract the data.  Just get a copy of the data and do all of your cleansing, staging, transformation, and integration work on the data warehouse servers.  This can help alleviate response time issues in source systems, demands on source system servers, contention with outages/downtime, etc.  In general, ETL work is much more effective if you stage the data first, getting it all onto one platform before you apply consistent rules and integrate.  This in and of itself creates the basis for writing less code that is more effective, consistent, and maintainable.
  • Have your security clearly defined.  When talking with IS managers from the areas where you will be trying to acquire data, it will really help them support you if they understand the business projects, what you are doing, how you are doing it, and the security/protection you have in place.  When you can show them how their data is protected as it moves to your data warehouse and how security in your warehouse and through your front-end access applications is controlled and audited, they will be much more receptive to your requests.
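
To make the "copy first, transform later" point concrete, here is a minimal sketch of that pattern in Python.  It is an illustration under stated assumptions, not a recommended implementation: in-memory SQLite databases stand in for the real source systems and the warehouse, and every table, column, and function name (patients, stg_lab, stg_billing, normalize_sex, and so on) is hypothetical.  The only work done against each source is a single plain SELECT; the one shared cleansing rule runs on the warehouse side, so it is applied the same way to every source.

```python
import sqlite3


def extract_raw(source_conn, table):
    """Pull an unmodified copy of one source table with a single plain SELECT."""
    cur = source_conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def load_staging(warehouse_conn, staging_table, columns, rows):
    """Land the raw rows in a warehouse staging table without changing them."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    warehouse_conn.execute(
        f"CREATE TABLE IF NOT EXISTS {staging_table} ({col_list})"
    )
    warehouse_conn.executemany(
        f"INSERT INTO {staging_table} ({col_list}) VALUES ({placeholders})", rows
    )
    warehouse_conn.commit()


def normalize_sex(value):
    """Hypothetical shared cleansing rule, applied only on the warehouse side."""
    if value is None:
        return "U"
    mapping = {"m": "M", "male": "M", "f": "F", "female": "F"}
    return mapping.get(str(value).strip().lower(), "U")


def integrate_patients(warehouse_conn, staging_tables):
    """Apply the same rule to every staged source and load one integrated table."""
    warehouse_conn.execute(
        "CREATE TABLE IF NOT EXISTS patient "
        "(source TEXT, patient_id TEXT, sex TEXT)"
    )
    for staging_table in staging_tables:
        rows = warehouse_conn.execute(
            f"SELECT patient_id, sex FROM {staging_table}"
        ).fetchall()
        cleaned = [
            (staging_table, str(pid), normalize_sex(sex)) for pid, sex in rows
        ]
        warehouse_conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", cleaned)
    warehouse_conn.commit()


def demo():
    # Two made-up source systems that code the same attribute differently.
    lab = sqlite3.connect(":memory:")
    lab.execute("CREATE TABLE patients (patient_id, sex)")
    lab.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "male"), (2, "F")])

    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE patients (patient_id, sex)")
    billing.executemany(
        "INSERT INTO patients VALUES (?, ?)", [("A9", " m "), ("B7", None)]
    )

    warehouse = sqlite3.connect(":memory:")
    for staging_name, conn in (("stg_lab", lab), ("stg_billing", billing)):
        columns, rows = extract_raw(conn, "patients")
        load_staging(warehouse, staging_name, columns, rows)

    integrate_patients(warehouse, ["stg_lab", "stg_billing"])
    return warehouse.execute(
        "SELECT source, patient_id, sex FROM patient ORDER BY source, patient_id"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because the cleansing rule lives in one place on the warehouse side, adding another source in this sketch means adding one more extract and staging step rather than asking another source system team to write and maintain its own version of the rule.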


Proactively Get Buy-In

Even with the right approach, great planning, and strong resources, you will not breeze through getting access to your data sources.  One of the best ways to mitigate this eventual roadblock is to have leadership, and particularly the executive sponsor(s) of your effort, aware of this dilemma and engaged in helping to mitigate it.  Any executive leader who is sponsoring an effort will want to know that you see a roadblock coming and have an approach to making sure it doesn’t slow you down, but that you need their help and support.  That is what they are there for.  Here are a few tips for how to help them help you:

  • Provide background – Help them understand how systems developers are responsible and diligent in securing their systems.  Then help them understand how most systems deal with requests for data: show them the number of systems you have to get data from and why you can’t follow the existing process for data access for every request.  When you are justifying a data warehouse project, you can accomplish this task by educating them before they even approve the effort and letting them know that if they can’t resolve this issue, creating an integrated data warehouse is not a project that will be successful.
  • Suggest a communication method/mechanism that can be delivered to groups by your sponsor.  The goal is to finish with senior management being aware of the business needs, value, and challenges to the effort you are undertaking and then making it clear that you need their help in making sure this roadblock doesn’t slow this project down.
  • Help executive leadership to see that data is an asset and that it belongs to the institution.  As this gets accepted at a senior leadership level, the planning, monitoring, and addressing of related issues provides a forum to make these challenges more obvious to executives and thus put responses in place to reduce them.
  • Prepare a presentation for the team to use when they are going to get data from a system or area – include some of these key slides:
    1. Some of the same slides that executive leaders gave for background information
    2. Who you are and why you are there
    3. Project goals, timelines, and key resources they will be interacting with
    4. Security you have built into your data integration process, into the data warehouse, and into any tools that access data within it
    5. How can they help?
  • Lastly, make sure your team understands that when they run into issues getting access to systems, they shouldn’t struggle too long in the mud before they make leadership aware of the issue and ask for help.

One other key consideration is getting support and interaction from your audit and compliance areas.  While this is something most IT resources shy away from, in this type of effort having that endorsement and backing will be a huge plus.  Make sure these resources are added to steering committees and prioritization groups so that they can help you address any questions or concerns.  That being said, you are obviously going to have to convince them that your approach is sound, secure, and effective.  You need to encourage them to understand the approach and architecture you are building.  You also need to help them understand all of the challenges you will face and how they can help you and the organization by being key, active members of the leadership of this effort.  If your compliance and security officers support your effort, it will be much harder for anyone to raise security and privacy objections that could shut your project down.  If you aren’t comfortable with this approach, consider getting some help in engaging and convincing those key resources.



Healthcare is challenged with large numbers of systems, many data complexities, privacy/security demands, and technology platforms of every possible combination.  As most organizations struggle to implement EMRs, new systems, and changes to the plethora of existing systems, demand for access to data is escalating at an extremely rapid pace.  Invariably, getting access to the source system data you need to build your warehouse will be more challenging than anyone could have expected.  In planning for your data warehousing effort, don’t forget to include the extra time it will take you to gain access to the critical systems that house the data you need.  Also, educate your sponsors and leadership about this challenge right away so they can help you educate those you will be contacting as you request a copy of their data.

About the Author

Bruce has over 20 years of IT experience focused on data / application architecture, and IT management, mostly relating to Data Warehousing. His work spans the industries of healthcare, finance, travel, transportation, retailing, and other areas working formally as an IT architect, manager/director, and consultant. Bruce has successfully engaged business leadership in understanding the value of enterprise data management and establishing the backing and funding to build enterprise data architecture programs for large companies. He has taught classes to business and IT resources ranging from data modeling and ETL architecture to specific BI/ETL tools and subjects like “getting business value from BI tools”. He enjoys speaking at conferences and seminars on data delivery and data architectures. Bruce D. Johnson is the Managing director of Data Architecture, Strategy, and Governance for Recombinant Data (a healthcare solutions provider) and can be reached at
