Staffing A Data Warehouse – Part III
Last month we went through Part II of a three-part series designed to outline criteria for staffing a data warehouse program within a corporation. Part I focused on breaking down the various roles, thoughts on how to organize those resources, and considerations for handling support and development. Part II dealt with key concepts for building a right-sized organization targeted towards the unique needs of your organization.
In Part III, we will go through some important best practices, critical success factors, and basic things you can do that may help you get up and running more easily and efficiently. These topics could be grouped in any number of ways. To cover the most common factors, I will organize the best practices and tips by the size of the development staff and limit each list to four or five points.
For organizations with sizable analytics needs (over 20 total staff working on analytics efforts across the organization):
1. Centralize your data warehousing resources. The skills required to design and develop data warehousing solutions are not inherent in programmers. These are unique skills that programmers and analysts must learn in order to be effective. If you allow anyone in your company without the proper skill set to develop these solutions, you risk having developers who spend one year on what should be a three-month solution. Then those resources, who haven’t touched the tools or technologies for six months, face a later problem: they can’t even remember how to navigate the tools and language. They will also likely lack knowledge of best practices around development processes and techniques. I have seen this approach of letting everyone use these tools fail many times and succeed only a few times. Don’t try to outsmart everyone else; go with what works.
2. Separate resources by area of focus: data vs. usage. Large organizations have the challenge of serving many users. That usually means many types of access tools and broad collections of data from many areas. These tools and technologies take years to learn. By separating skill sets, you will build the experience in your team a component at a time. You can also build expertise within the team, rather than use generalists. The result is well-designed, consistent development, support, and maintenance, with processes that are refined and followed.
3. You need an architecture level resource or two with an overall understanding of the various technical layers of your solution. When developing your solution, this person must have prior successful data warehouse architecture experience similar to what you are trying to accomplish. Since that skill can be really difficult to find, I often recommend hiring an experienced consultant with several examples of successful architecture (ideally similar to your needs) and then having that individual work side by side with a senior level person on your team that has strong design skills for knowledge transfer. This resource will focus on component layers, approaches, techniques, tools, and infrastructure.
4. Build a small team of database experts. They will handle system and application related database design, development, support, and auditing/monitoring. These resources are crucial to effective organizations. While Extract, Transform, Load (ETL) developers are challenged to deliver programs, the database administrators (DBAs) help ensure that all programs and processes are efficient and fit within the overall series of jobs and usage to be as effective as possible.
For organizations with smaller analytics needs (under 20 total staff working on analytics efforts across the organization):
1. Develop up to a quarter of your staff to be specialists who understand most aspects of a technology – like Business Intelligence (BI) or ETL. While you may only have one person who is focused on ETL all the time, that focus will give them the ability to learn some of the more complex design techniques and improve your overall team skills and progress. Make sure to pick someone who is really strong at design and has good teaching / mentoring skills.
2. Hire at least one resource with significant prior data warehousing experience or use a consultant. With a smaller team, it is even more critical that you have someone that can help you get up and running with tools and processes effectively and in a reasonable timeframe.
3. Get training for your technical resources on processes, professional skills, and techniques. Keep formal training on tool syntax limited, and let them learn the rest through hands-on work and knowledge bases.
4. Don’t ignore the business analyst position. It is common for smaller organizations to think that they can just have technical resources meet with the business users and build exactly what they ask for. A business analyst will help outline the right solutions that will provide much more value at less cost for your users.
Critical Success Factors
1. Process training is much more important than tool training. Developers need language or tool syntax in order to deliver their solutions, but it is only fractionally as important as setting up the right processes and techniques. I always recommend that you spend at least three times as much energy on process training as on syntax training. A tool or language is only that. Once a developer knows how to code, switching languages is as easy as learning syntax. However, if the best Java developer in your corporation is now expected to excel in the design of data integration without significant training and guidance, they will likely have several failures or very poor designs before they ever start building effective solutions.
2. Spend a little extra time up front and define an architecture and roadmap for where your business needs require you to go. Data warehouse architectures DO NOT evolve. I often hear management level resources talk about a data warehousing approach where they will initially design it one way and then as it grows it will evolve into another solution. Imagine a star schema data mart meant for a pilot project turning into a 3rd normal form atomic data warehouse after 5 years. You can define an incremental approach that will allow you to build a solution a component at a time, but architectures just don’t evolve.
3. Make sure your training and travel budget is two to three times what you would normally budget for IT staff. Since you are building an organization, it is imperative that you recognize the difference in skill sets and the training required to develop them into an effective data warehousing organization.
4. Buy infrastructure periodically, not constantly. Right-sizing your infrastructure will help you avoid constant infrastructure projects that conflict with and constrain the projects where you are trying to add data or usage to your warehouse solution. Growth in warehouses is inevitable (if you have any level of success). Thus, if you can make it a practice to buy 100 – 300% more processing and storage power than you think you need, you will enable additional growth without every project having a separate dependency on adding infrastructure. There are also significant cost savings in building this up periodically.
5. Operations – separate out infrastructure support resources consistently with how you leverage these resources across your applications. Every organization has an approach for infrastructure and it is more important that you follow that than break out with something new.
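The headroom guidance in point 4 above comes down to simple arithmetic. The sketch below is illustrative only: the function name `purchase_range` and the 10 TB estimate are assumptions made for the example, not figures from any real project.

```python
def purchase_range(estimated_need, min_headroom=1.0, max_headroom=3.0):
    """Return the (low, high) capacity to buy for an estimated need.

    Headroom is a fraction of the estimate: 1.0 means 100% more than
    the estimate and 3.0 means 300% more.
    """
    if estimated_need < 0:
        raise ValueError("estimated_need must be non-negative")
    if min_headroom > max_headroom:
        raise ValueError("min_headroom must not exceed max_headroom")
    low = estimated_need * (1 + min_headroom)
    high = estimated_need * (1 + max_headroom)
    return low, high


# Hypothetical example: the team estimates it needs 10 TB of storage.
low_tb, high_tb = purchase_range(10)
print(f"Buy between {low_tb:g} TB and {high_tb:g} TB")
```

In other words, buying 100 – 300% more than the estimate means purchasing two to four times the capacity you think you need today.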
Tips and Tricks
For sizable analytics needs:
1. Encourage resources to switch between areas of expertise every other year. This helps to build your next Senior Analysts, Architects, and Team Leads while strengthening the overall team. Deal with any performance issues first: resources with unresolved performance problems should be held back from moving, or they will continually slow down all efforts.
2. Conduct formal walkthroughs of all ETL designs, and later of code, with DBAs, Data Modelers, and Quality Resources. This will help catch problems in architecture or design before they get to the users. Too often the DBAs and Data Modelers are not invited to these meetings, but their specific expertise can be critical to the process.
3. Build success before you consider outsourcing. Outsourcing the right components at the right time can be a key part of your strategy. I have found that developing proven processes and solution design templates greatly enhances the probability of success when dealing with outsourcing.
4. Focus ETL and BI tool support within those groups. I have seen organizations attempt to put responsibility for tool configuration into the hands of operations. Since those resources do not work directly with the customer, their priorities and experiences are not business driven. They are typically also not in-depth users of the tools. This results in restrictions or controls that limit the effectiveness of the tools and thus your resources.
For smaller analytics needs:
1. Don’t go overboard with tools – have only one or two options for ETL and only one or two options for BI. If every user gets to pick their own tool, it becomes a practice of resources constantly learning new tools, but never getting good at any of them.
2. Utilize user groups for seeing examples of what other organizations have done and getting contacts that can help with specific technical challenges. Almost all of these user groups are free and only meet for a half day to a full day once a quarter. This is a significant potential source of building experience in your team and might even lead to connecting with resources that can be relied on for a call or two that will help you get past a design or development challenge.
3. Encourage resources to build on existing work. This is especially pertinent in ETL work. Good developers love to write complex code. Unfortunately, many of them need to be coached to leverage work that has already completed 95% of what they need for the next job. Leveraging this could lead to producing much more work with less effort.
4. Stay away from the big vendors. For smaller, more focused organizations, the big vendors will generally tax your funds and define approaches that are way beyond fitting the requirements of your organization. Very few smaller analytics needs require the use of the top of the line hardware/software that is invariably recommended. Big vendors also tend to bring in a small army of consultants to deliver to your needs and don’t do a good job of training your employees.
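As a rough illustration of point 3 above (building on existing ETL work), the sketch below shows one shared extract-and-transform routine reused by two jobs that differ only in configuration. The function `load_rows`, the column names, and the sample data are all hypothetical, and a real ETL tool would typically express the same idea through its own templates or reusable components.

```python
import csv
import io


def load_rows(source, column_map, transforms=None):
    """Generic extract-and-transform step shared by many jobs.

    source     -- any file-like object containing CSV text
    column_map -- {source_column: target_column}
    transforms -- optional {target_column: function} applied per value
    """
    transforms = transforms or {}
    rows = []
    for record in csv.DictReader(source):
        row = {}
        for src, dst in column_map.items():
            value = record[src]
            convert = transforms.get(dst)
            row[dst] = convert(value) if convert else value
        rows.append(row)
    return rows


# Two hypothetical jobs that differ only in configuration, not in code.
customers = load_rows(
    io.StringIO("cust_id,cust_nm\n1,Acme\n"),
    {"cust_id": "customer_id", "cust_nm": "customer_name"},
    {"customer_id": int},
)
orders = load_rows(
    io.StringIO("ord_id,amt\n7,19.50\n"),
    {"ord_id": "order_id", "amt": "amount"},
    {"order_id": int, "amount": float},
)
print(customers)
print(orders)
```

The second job required no new logic, only a new column mapping, which is the kind of reuse developers often need to be coached toward.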
In the same way that data warehouses can be architected in unlimited designs and approaches, an organization can set up and staff a department to build and maintain these solutions any way it wants. With the options and factors involved in defining a successful organization, it is only feasible to outline some advice and best practices. While unlimited options are a luxury, they are also a curse. What is right for your organization? Leveraging some of the best practices, critical success factors, and tips above may help you improve your effectiveness or speed your organizational development. Copying another organization is only good advice if they need and are doing the same thing the same way that you are (and obviously only if they are really effective at what they do).
About the Author
Bruce has over 20 years of IT experience focused on data / application architecture and IT management, mostly relating to Data Warehousing. His work spans the industries of healthcare, finance, travel, transportation, retailing, and other areas, working formally as an IT architect, manager/director, and consultant. Bruce has successfully engaged business leadership in understanding the value of enterprise data management and establishing the backing and funding to build enterprise data architecture programs for large companies. He has taught classes to business and IT resources ranging from data modeling and ETL architecture to specific BI/ETL tools and subjects like “getting business value from BI tools”. He enjoys speaking at conferences and seminars on data delivery and data architectures. Bruce D. Johnson is the Managing Director of Data Architecture, Strategy, and Governance for Recombinant Data (a healthcare solutions provider) and can be reached at email@example.com