Comparing Warehousing Toolsets
Are you looking at purchasing or selecting a data warehouse-related toolset (BI, ETL, data profiling, etc.)? Organizations often burn a great deal of time and energy trying to figure out which tools, software, and hardware they want to run their data warehouse and analytics on. Most of these tools and components can be quite expensive to build and support, so I understand the concern about spending the money wisely.
Whether you are building a new environment, trying to consolidate onto a single tool, or re-evaluating your existing solution, the task always seems to take much longer than it should. There are many platforms and toolsets to address, with a plethora of vendors in each area. So how can you choose these tools without taking months or years? Here are a few thoughts that should help you navigate this landscape.
Existing Methods For Tool Evaluation
There are many project templates, spreadsheets, and classes/workshops that are targeted at helping you pick the right tools for your organization. Many consultants charge a significant fee for helping to lead an organization through tool selection. Here are a few of the more common, mostly unsuccessful, approaches I have seen:
Overkill – Designate a committee of resources from many areas, most with little to no experience with the tool to be chosen, and let them evaluate all of the vendors. They will need to come up with a gigantic spreadsheet listing all of the criteria on which they are evaluating the vendors (often hundreds of items to weed through). Then, once they have spent months or more whittling the contenders down, there will have to be a three-month bake-off of the tools in an actual small-sample solution. All in all, you could be looking at a year or more. The more people involved, the harder it is to reach consensus. The fewer experienced architects involved, the less likely the committee is to pick the right tool for your need.
Our Standards – We buy from whichever vendor we like working with, or whichever one our executive leadership tells us to work with. This is referred to as our “corporate standard”. The truth is that someone has already picked the solution for reasons other than whether it will fit or actually perform as needed.
Just pick one – Tools are unimportant, so just pick one. While this gets the decision made quickly, the right tool is critical to meeting the need at hand. You can use a hammer to pound in screws, but don’t be surprised if that approach creates many complications.
Demand only one – Due to overriding cost pressures and shrinking budgets, many leaders identify an IT point person and charge them with selecting one tool that will replace the variety of tools and environments in use today. The thought is that forcing all solutions down to one vendor or tool will allow them to eliminate all of the other tools and drive their costs down significantly. The reality is that you will spend money and time on this only to cause disarray in your current solution and, in all likelihood, have to revert to other tools because of the problems and complications (if your organization has a culture that allows you to admit mistakes). Having the right tools for the right usage will speed development, increase usage, and ultimately lead to better business improvement. Typically, only very small organizations can do that with one vendor or one tool. However, I promise you the vendor will tell you they can do it all.
The Root Challenge
Picking the proper tool suite to cover all areas of your data warehouse environment is only as complex as you make it. In theory, you could spend significant amounts of time and resources doing bake-offs of the technologies in all key areas of your warehouse environment, and you would be happy knowing that you picked the right tools for your needs. Unfortunately, years would have passed, the business need would have moved on, and you would have wasted cost and time on something that can be done very quickly.
There are many layers to choose from: the DBMS, ETL tools, BI tools, data profiling and quality assurance tools, hardware, replication tools, and so on. Each of these areas is tied to needs, usage, and environments that can impact the others. Evaluating each area in isolation does not account for these dependencies. Taking an approach that leverages experience and knowledge of the marketplaces for the various tools would significantly expedite the process.
The More The Merrier?
A past sponsor of mine, a great organizational visionary, is a sailor in his spare time. We would talk about how building a strong solution is a matter of expertise. If he were to build a boat, would he hire 50 people who had never sailed and have them kick the options around until they somehow came up with a design and built it, or would he hire the two sailors who had sailed all their lives and hand-built many boats? The answer is simple. There are many blueprints for boats, but without a seasoned builder, a blueprint is nearly impossible for a complete novice to follow. The question you have to ask is: would you even get on the boat built by the novices, knowing you may be risking your life? Building data warehouses, especially larger ones, poses the same problem for most application developers, albeit with less of a personal stake in the final success.
I was recently replacing the siding on a house I own, working with a good friend who is a carpenter by trade and one of his associates. I pride myself on being a very hard worker who likes to move as fast as I can and still get the job done right. Watching a couple of seasoned siders, I quickly found myself playing the gofer role. I was grunt labor and would “go for” anything that was needed to keep them equipped to get the job done. Staying out of the way was the best way I could make sure the project went as quickly as possible. When picking these types of tools, it is best to work with experts who have your best interests in mind and who have references that will back up their ability to assimilate and digest information quickly and ultimately deliver an environment that is successful and cost-effective.
If an organization approaches this in a truly pragmatic manner, picking any of these tools is a very simple process that can and should be done in a matter of days or weeks by a couple of experienced people, not over months or years by committees of inexperienced resources. Almost every DBMS, BI tool, ETL tool, and hardware platform has success stories that you can cite as reasons to use it in your organization. That isn’t to say that every tool is right for your organization. Most of these vendors claim they can do everything when in reality they can’t.
Data warehouses can be significant business advantages when used properly, but when they fail they can inhibit the crucial growth your organization requires to compete in its marketplace. Unfortunately, when resources who haven’t worked with these tools and solutions make the tool decisions for their efforts, they are typically misguided, oversold by the vendor, or tied to some questionable relationship with a vendor. That is why you often see organizations that have good tools setting up projects to reset their strategy. Perhaps the strategy wasn’t so good to begin with.
Picking the tools and technologies for your warehouse environment is a crucial step, but it shouldn’t be a lengthy one. Strong, experienced data warehousing architects who have worked with many tools can quickly understand your organization’s resources, expertise, needs, and existing investments, and help you select all of the components of your environment without slowing down your projects or stopping them altogether. Defining your goals and objectives and then hiring a seasoned consultant to help you through this is a great approach. Just make sure the consultant does not drag out the process to line their pockets, so that you can get to the business of building your analytics instead of wasting time perfecting the technical environment that could eventually run them.
About the Author
Bruce has over 20 years of IT experience focused on data/application architecture and IT management, mostly relating to data warehousing. His work spans healthcare, finance, travel, transportation, retailing, and other industries, working formally as an IT architect, manager/director, and consultant. Bruce has successfully engaged business leadership in understanding the value of enterprise data management and in establishing the backing and funding to build enterprise data architecture programs for large companies. He has taught classes to business and IT resources ranging from data modeling and ETL architecture to specific BI/ETL tools and subjects like “getting business value from BI tools”. He enjoys speaking at conferences and seminars on data delivery and data architectures. Bruce D. Johnson is the Managing Director of Data Architecture, Strategy, and Governance for Recombinant Data (a healthcare solutions provider) and can be reached at firstname.lastname@example.org