Separate Data From Views
It seems like there are endless levels of planning and challenges that are encountered in data warehousing/business intelligence efforts. One specific challenge that has caused many a project to fall into an “analysis paralysis” trap is the detailed definition of all requirements and thus metrics. Requirements definition typically means different things to each person. Yet, requirements are either loosely defined, or buttoned down to the extent they are not flexible.
You can easily spend months to a year trying to come up with all of the calculations a business “MIGHT” need. Ideally, you want your requirements to be clear, manageable, and good enough to proceed. In data warehouse projects, you might want to consider separating out requirements of data needs from those of usage needs. One way to look at this is the difference between data and views. When you define and build your solution, think about pulling all of the data related to the data topic that you are working with. I think of it like this: if you are gathering data about an order, gather all root data about that order. If you accomplish this successfully, any views or calculations that you want to apply to that data are possible. If you only pull what the business asks for, you will continue to have to go back to the well for every request. The more concerning aspect to this is that you will likely have to redesign as you add information, because it wasn’t planned for in your original design.
Let’s Redefine A View
Let’s get more in depth on views. If data is the actual information at its root level, then we could think of a view as merely one way to visualize data. For example:
- Derivatives or Calculations – These are simply calculations against data or combinations of fields that result in a new way to look at data.
- Age is a good derivative – It is the calculation of today’s date minus the birthdate. Age at time of clinical procedure is merely taking the date of the procedure and subtracting the birthdate. When thought of this way, you can quickly understand that if you store birthdate in your database (or birth year if that is all you can get), the ability to tie it to any date attached to a person’s records is feasible. Whether or not you precalculate any of these dates and store them in the database or not is almost irrelevant. For performance purposes, once you have the solution up and running, you can make those changes as needed without significant difficulty. If you choose to only store the calculated field, then if the user ever wants to know how long it was between when the procedure was requested and when it was actually performed might require a second project to redesign and include this new information. It may keep an IT employee with work in the short term, but a frustrated user will more typically understand the poor design is why they have to continue to wait for what they want.
- How about time of surgery – If you have a start time and an end time, it is very easy to calculate the total time. The hardest part here is not the calculation, but the definition. Is it the time the person was wheeled into the OR or is it the time where the anesthesia was administered, etc… The beauty of capturing all root data is that you can meet the inevitable challenge. Ultimately, if you want to satisfy many different types of users, you must store all of the appropriate times and then create the calculations that define what any specific user is trying to measure.
- Net sales – Typically this is some sort of calculation, perhaps it is gross sales revenue minus commissions. Gross margin would be a further calculation in that you would probably take out SG&A expenses. Then Net Margin, would subtract any additional expenses.
- Classifications or Groupings – This type of view would represent grouping of information. In healthcare we deal with a concept of an “Episode of Care”. Unfortunately, what an episode of care is differs on every level – institution to institution, how an EMR views an episode of care, potentially across medical practice areas, and most certainly from a billing vs clinical care perspective. Ultimately, if we store the root data about the visits, diagnosis, and other key data, it is theoretically possible for an episode of care to be able to be looked at in different ways depending on what it is the user is trying to measure. While this may sound utopian, it is ultimately only dependent on how well the data is captured and more importantly how well the data is initially defined. Another classification theme might be a territory distribution. If you have the details of where your orders come from and you want to break that out any number of ways geographically, ultimately you will be much more successful than only seeing it one way. Which way is right? There likely is not only one right way.
- Timescales – It can be invaluable to see your information spread across time. This could be used to reference the events of a day (flows of customers through your facilities – by hour, day of the week, etc…). It could also be used to reference customer flow or volumes over the course of the year. One healthcare quality metric that is not easily measured is the number of quality errors that occur the day after the Super Bowl or the day after the Interns start. While these may sound trivial, if we had simple access to this information, it would be possible that we could identify some specific areas of business that we need to shore up.
Where Is Your Data Maturity?
Most Business Intelligence or Data Warehousing efforts put a tremendous amount of focus into the specifics of business derivatives and views. In industries where the data is well captured and understood, this is a great approach. In those cases where your corporate data is in some disarray, where you lack the enterprise definitions or terminology, or where much of your data isn’t captured, that same approach can be fruitless. Solutions that don’t contain the data the business wants and need to be rearchitected are rarely the cause of any tool or technology, but directly related to the capture and storage of the root data. Generally BI should be thought of as putting the pretty face on the front of a well architected, terminology consistent, sound data warehouse. This is how you are able to realize the benefits of building 60 – 120 day solutions that all of the experts preach is standard to build by. If those same experts are challenged by the fact that your “data house is not in order”, even they should realize delivering those same solutions in that same time is not feasible or in the best case only meets a couple needs and doesn’t align with an enterprise solution that meets many needs.
If you take the time to capture that root data and clearly define it (storing meta data for users to leverage), then the fun part of data warehousing – business intelligence begins. Yes, actually working with the business to analyze their data and come up with new derivatives, calculations, classifications, timescales, and views is downright fun. As the business better understands its identity and analyzing results becomes a way of life, both business and the corresponding IT resources quickly grow. The complexity of the tools grows, the business strategy gets realigned, and pockets of expertise are in open view.
In an industry where the data is barely leveraged for analysis today, it is nearly impossible to come up with all of the views of data that will benefit the company at each turn. If you try to predict this or try to ask the business what they want, you may get lucky and satisfy a need or two. If you capture the data at its root level, you can work side by side with the business to test out new metrics and views. While this sounds like you are skirting requirements, it actually is a paradigm shift. Now you define the high level and root data requirements up front, and define the detailed calculations and classifications once you have the root data in place and tested.
With the power and beauty of the BI tools available in the market place today, it is much less important which tool you use, but much more important how well you place you root data. Tools for reporting, analysis, mining, visualization, and other methods of exploration are all relatively easy to use in calculations/derivatives/views.
About the Author
Bruce has over 20 years of IT experience focused on data / application architecture, and IT management, mostly relating to Data Warehousing. His work spans the industries of healthcare, finance, travel, transportation, retailing, and other areas working formally as an IT architect, manager/director, and consultant. Bruce has successfully engaged business leadership in understanding the value of enterprise data management and establishing the backing and funding to build enterprise data architecture programs for large companies. He has taught classes to business and IT resources ranging from data modeling and ETL architecture to specific BI/ETL tools and subjects like “getting business value from BI tools”. He enjoys speaking at conferences and seminars on data delivery and data architectures. Bruce D. Johnson is the Managing director of Data Architecture, Strategy, and Governance for Recombinant Data (a healthcare solutions provider) and can be reached at firstname.lastname@example.org