Beware of Scrum Fanatics On DW/BI Projects
By Larissa Moss
You may be wondering what in the world are “Scrum Fanatics?” And why should you beware of them? And what is Scrum anyway? Scrum is an agile software development method. What’s so bad about that? Nothing, as long as you use it for software development projects and not for business integration projects. Isn’t a data warehouse a software development project? No, it isn’t. It’s a business integration project. Scrum, or any other feature-driven software development method, is not the right methodology for data-driven business integration projects.
Those of you who have read my previous articles know that I am a big fan of agile methodologies as opposed to traditional waterfall methodologies, especially for DW and BI projects. In fact, I have been evangelizing an agile approach for DW/BI in my courses and conference presentations for almost five years. Imagine my surprise when I recently started to get attacked and ostracized by conference attendees for using the word “agile” with my DW-specific agile Extreme Scoping™ methodology because my methodology isn’t Scrum. Since when did the word “agile” become synonymous with Scrum? Last time I checked my dictionaries, agile meant “nimble, quick-moving, and able to move quickly and easily.” Nowhere did it say that agile meant Scrum.
Where Did Scrum Come From?
Scrum is a term borrowed from the British game Rugby. Ken Schwaber is the author of Scrum. Kent Beck is the author of eXtreme Programming (XP). Kent Beck and other agile methodology gurus like Jim Highsmith, Alistair Cockburn, and Sanjiv Augustine are members of the Cutter Consortium, a Boston Think Tank, as am I. I have met many of them at the Cutter Consortium over the last decade. All of them are either project managers or senior developers, each with decades of experience developing stand-alone operational systems – most written with object-oriented code. Individually, and collectively, they have compiled volumes of evidence why waterfall methodologies do not work and why those methodologies should be abandoned. They have applied and refined agile techniques since the mid-1990s and officially created and published the Agile Manifesto in the 2000-2001 time frame. Most agile authors and practitioners are members of the Agile Alliance (www.agilealliance.com). None of the published authors are DW or BI practitioners and none of the popular agile software development methodologies like Scrum and XP were developed with DW or BI in mind. Writing software to create stand-alone operational systems does not require data integration efforts like data standardization, enterprise data modeling, business rules ratification by major business stakeholders, coordinated ETL data staging, common meta data, collectively architected (designed) databases, and so on. Therefore, it must be understood that all popular agile software development methodologies are focused on writing and delivering quality software (code) as quickly as possible without significant regard or focus on data.
What Are The Main Components Of Agile Methodologies?
There are many wonderful components to agile methodologies. I will mention but a few. The Agile Manifesto emphasizes people over processes and demands that the developers have full authority over their workflow and will not tolerate any interference from anyone outside the team, including the users and their own management. The Manifesto prefers customer collaboration over contract negotiation and expects system ownership and ongoing participation of the users. It further believes that nobody should work alone, and that all tasks and deliverables must be co-owned and co-developed. Product owners are the users who are solely responsible for writing user stories (broken-down and simplified requirements) and prioritizing and selecting the sequence in which their user stories will be developed. Agile also places more importance on working software than comprehensive documentation and makes frequent software deliverables its primary goal.
The Agile Manifesto also values responding to change over following a plan. In order to manage projects without tracking task activities, all Sprints (software releases) must be time-boxed into weeks or months (10 days in XP and 29 days in Scrum) – not years. Furthermore, software is released in predictable increments (i.e., rhythm or cadence). Nothing is cast in concrete. Therefore, everything can change, including the original requirements. But once a set of features (requirements) is selected, the scope is frozen until delivered even if the user notices an error. In that case, the project is cancelled, re-estimated, re-packaged, and re-started. Quality comes before quantity and is applied throughout the software development process by having two developers responsible for each line of code (pair programming). The developers keep the code simple. They build and use reusable code modules, write test cases before writing code, and produce production-worthy software with each Sprint. Since systems evolve one Sprint at a time, all code can be “refactored” (i.e., refined, redesigned, simplified, improved) with each release. Projects are not tracked by tasks on Gantt charts but with burn-down charts. Effort, converted into points based on relative complexity, is plotted on the Y axis and time (10 days for XP or 29 days for Scrum) is plotted on the X axis. When it becomes evident that the trajectory of current (used up) effort in time will miss the deadline, the project is immediately re-scoped.
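The burn-down check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample numbers are hypothetical, and the straight-line projection of the trajectory is a simplification chosen for the sketch, not something Scrum or XP prescribes.

```python
def projected_finish_day(total_points, completed_by_day):
    """Project the day on which all effort points would be burned down.

    completed_by_day holds the cumulative points completed at the end of
    each elapsed day. Assumes (for illustration only) that the team keeps
    working at its average pace so far. Returns None when there is no
    progress to project from.
    """
    if not completed_by_day or completed_by_day[-1] <= 0:
        return None
    days_elapsed = len(completed_by_day)
    velocity = completed_by_day[-1] / days_elapsed  # points per day
    return total_points / velocity


def needs_rescoping(total_points, completed_by_day, sprint_days):
    """Return True when the projected trajectory misses the time-box."""
    if not completed_by_day:
        return False  # Sprint has not started; nothing to project yet
    finish = projected_finish_day(total_points, completed_by_day)
    return finish is None or finish > sprint_days


# Hypothetical 10-day Sprint with 40 effort points speculated.
remaining = [40 - done for done in [3, 6, 8, 10]]  # Y-axis values, days 1-4
print(remaining)                                   # [37, 34, 32, 30]
print(needs_rescoping(40, [3, 6, 8, 10], 10))      # True: projected day 16
```

In this made-up case the team has burned 10 of 40 points in 4 days, so the projected finish falls on day 16 of a 10-day time-box, and the scope would have to be adjusted.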
How Is Agile Different From Waterfall And Spiral?
Traditional waterfall system development life cycle (SDLC) methodologies were created to support a traditional stovepipe development approach. The activities and tasks in all phases are geared toward developing stand-alone (stovepipe) systems. Unlike DW projects, stovepipe systems do not require enterprise activities and do not involve other users from other departments during the SDLC. Each phase must be completed before the next phase can start. The majority of development time is spent on paper, creating a requirements document, external design models, internal design specifications, and so on. Each system is believed to have a beginning and an end. The projects are launched because of some specific well-defined business needs, and they end when the systems are put into production with full functionality as originally requested. After implementation, the stovepipe systems get maintained and occasionally enhanced by maintenance technicians other than the original developers. Even with stovepipe systems, this type of methodology has been a problem because each system is different, each project team is different, and each set of users is different. Estimates based on other projects are highly unreliable because of all the differences. There are too many layers of analysts and designers between end users and developers, and requirements often get mis-communicated every time a piece of work is handed off to the next person. Users don’t see their system until acceptance testing, at which time they frequently notice errors and omissions that have to be corrected with future enhancements.
Spiral methodologies do not presume to build stand-alone systems with a distinct beginning and end. Instead they support building large systems iteratively, one chunk at a time. This method is popular with data warehousing where we build the DW one BI application at a time. The DW is assumed to be an evolving, standardized, and integrated collection of databases and applications, which provide the business community easy access to their business data. This type of development has an enterprise perspective, which needs an enterprise infrastructure with technical and non-technical components. Some of the non-technical components include enterprise data modeling, data standardization among vertical lines of business, data reusability, data governance, data resources, data policies, and enforcement of those policies. That means that spiral DW methodologies have additional tasks that need to be performed and some of these tasks involve stakeholders other than the primary user of the BI application being developed. But with the exception of developing the system (or the DW) in iterations, spiral methodologies still follow the waterfall approach within each iteration. While the benefit of spiral methodologies over waterfall methodologies is the recognition that Big-Bang development of a huge and complicated scope is not doable, their drawback is that during each iteration the tasks are still performed in a prescribed sequence (planning, requirements, analysis, design, testing, implementation, deployment) with the majority of effort being documented on paper before starting to code.
Agile methodologies do not recognize a service request for a new system to be the final set of requirements that the team has to deliver on a committed date, which was calculated based on their estimates (best guess) of how long it will take to turn the requirements into working software. Instead, the agile project teams view the service request as a “vision” for a system that may or may not end up looking exactly the same when it is finally delivered. With the participation of the user, the developers dissect the requirements into desired “features” (documented as user stories), which are put on a product backlog. The users control the product backlog. They can add features or remove features. They are also responsible for prioritizing the features and selecting a few for the first (or next) release. Rather than come up with estimates that are cast in concrete, the team “speculates” how long it might take to turn a few selected features into working software based on what is known to the team at that point in time. Progress for developing the requested features is measured by the number of features delivered and not by the number of tasks performed. Any task deemed necessary can be performed anytime by anybody during “exploration” (prototyping) or Sprint (10 or 29 days in duration). Progress on a Sprint is tracked through a burn-down chart, which shows the total number of effort-points (Y axis) plotted on a timeline (X axis). The chart reflects the effort speculated versus the effort spent. If the trajectory of the burn-down points indicates that the deadline cannot be met, the scope is immediately adjusted or the project is cancelled. The deliverable of a Sprint is production-worthy software for the selected features. In other words, it is a partially functioning system. Before the next Sprint is started, the team takes one day to review lessons learned and to plan the next release.
There is unanimous agreement among agile authors, experts, and practitioners that agile software development methodologies work for small stand-alone projects with self-motivated developers and a participating user. However, there is considerable disagreement among the same experts whether agile can work for any and all types of projects. What about extremely large projects with dozens of people on the team? Or projects that are so complex or so regulated that they cannot be dissected into multiple smaller releases? What about highly interdependent projects or highly interdependent resources? What if you are a consultant or government contractor who has to bid on the entire service request or RFP (request for proposal) in order to be awarded the contract? What if your project management office (PMO) insists on following their procedures or if you have to meet stringent compliance requirements? What if the business people won’t accept “speculations” but insist on estimates cast-in-concrete and refuse to participate in project activities? What if you have mediocre developers that need a lot of coaching and motivating because they cannot or don’t like to think for themselves? The jury is still out on all these types of projects.
Can Agile Be Used For BI?
That brings us to the next question: Can agile be used for BI? Well, that depends on what you call BI. There is a growing number of companies that profess to be using agile methodologies on BI projects, in particular Scrum and XP. My research shows that those companies restrict their development effort to writing code for stand-alone BI applications, very similar to the stand-alone OO applications the rest of the agile practitioners are developing. In other words, the BI teams don’t deal with the data – or at least, not very effectively. Many companies have gone so far as to separate the DW team from the BI team and have the two teams report to different managers, which not only disrupts the cohesion of the effort but creates unfair competition and ill feelings between the two teams.
Some BI teams wait for the data to be ready in the DW before they develop selected BI features – often using XP because that type of front-end effort can often be accomplished within days or weeks. Other BI teams cannot wait for the data to be ready in the DW – or they don’t even have a DW – and they develop their silo BI solutions by going directly against the operational source systems. I find that most of these types of projects use a form of Scrum, which gives them a little more time to deal with data issues in their ETL sourcing process. Both of these approaches focus on writing code (BI application and minimum necessary ETL code) and not on standardizing, integrating, cleansing, and documenting the data. In fact, I often hear complaints that the data is the “problem” on these agile BI projects. Clearly these companies don’t understand that data management (content management) is the most important aspect of BI and not just data delivery (BI functionality).
Can Agile Be Used For DW?
If your definition of BI includes building the necessary ETL process, loading the DW data necessary to support the BI application, and having that effort be part of delivering the BI features, and if you want to apply an agile method to building the entire end-to-end solution (including data standardization, enterprise data modeling, and DW ETL), then using the published agile methods like Scrum or XP will not work because those methods were never designed for data-centric business integration solutions! They were designed for code-centric operational silo systems.
Using the proverbial 80/20 rule, 80% of our combined DW and BI development effort is on the back-end data management side and only 20% of our effort goes into building the front-end BI application or the ETL code (mostly meta data for an ETL tool). To make things more complicated, while BI applications are separate and independent pieces of software, a collectively architected ETL process is not, which makes the ETL architecture as well as the ETL software (code) extremely complex, not to mention the collectively architected DW databases. By their own admission, agile practitioners concede that the more complex the system architecture and the software are, the more “thinking” (architecting) has to be done before coding. On DW projects, the “thinking” alone can sometimes take weeks to avoid omissions and errors in the collective architectures, which later could result in rework measured in months! This is the primary reason why BI teams who want to use Scrum or XP separate their front-end BI activities from the complex and highly interdependent back-end DW activities. In my opinion, that is the wrong solution. The right solution is to find a compromise that will include back-end DW activities in the agile process.
Developing the front-end BI application is only one of 16 DW development steps! Most of the other 15 development steps have activities and tasks that require an enterprise perspective, such as data standardization, data integration, enterprise data modeling, business rules ratification, coordinated ETL data staging, common meta data, collectively architected (designed) databases, and so on. Only five development steps have a purely narrow project focus: project planning, requirements definition, prototyping, BI application development, and implementation. And for some projects, the steps of prototyping and BI application development may also require an enterprise perspective because of some shared functionality or common reporting. Neither Scrum nor XP takes any of these additional DW-specific complexities and interdependencies into account.
All the agile BI practitioners that I have met so far realize that they cannot include all the other 15 DW development steps in their 2-week XP or 4-week Scrum software delivery cycles (cadence). But separating the BI application from the related DW activities and flying solo – as they do – is not the answer. Remember that BI is not true BI if you don’t address both data management (through a DW) and data delivery (through BI applications). Hence, a BI application by itself is not a comprehensive BI solution.
Instead, the functional requirements of the BI application should be used to prioritize the sequence of software releases. But the scope of each software release is completely determined by the effort required to populate the data in the collectively architected DW databases for the selected BI application features. Together, the prioritized features of the BI application and the effort to prepare the data to support those features should make up the scope of the software releases. Therefore, release plans must be based on speculation of data effort and not on speculation of how many features can be coded. Remember that data effort includes data profiling, data modeling, data standardization, data integration, data dispute resolutions (involving users from different departments), meta data capture, meta data storage, data lineage, ETL architecture, and database architectures. All of these activities must be included in scoping agile DW/BI projects. Since data drives DW/BI projects and since data management is the larger effort, the number of features that can be delivered with each software release using an agile DW/BI approach is often extremely small. Hence the name of my DW/BI-specific agile methodology, Extreme Scoping™.
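To make the scoping argument concrete, here is a minimal Python sketch of a release plan in which features are taken in the users' priority order and each feature is charged for its data effort as well as its coding effort. It is not the Extreme Scoping™ planning process itself. The `Feature` fields, the `scope_release` function, the effort figures, and the 20-day capacity are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    priority: int       # 1 = highest, as set by the users
    app_effort: float   # speculated days to code the BI front end
    data_effort: float  # speculated days for profiling, modeling, ETL, etc.


def scope_release(features, capacity_days):
    """Select features in priority order until the release is full.

    Each feature costs its data effort plus its application effort.
    Selection stops at the first feature that does not fit, so a
    lower-priority feature never jumps ahead of a higher-priority one.
    """
    selected, used = [], 0.0
    for feature in sorted(features, key=lambda f: f.priority):
        cost = feature.app_effort + feature.data_effort
        if used + cost > capacity_days:
            break
        selected.append(feature.name)
        used += cost
    return selected, used


# Hypothetical backlog: coding effort alone (2 + 3 + 1 = 6 days) would fit
# all three features into 20 days, but the data effort does not.
backlog = [
    Feature("Sales by region", priority=1, app_effort=2, data_effort=8),
    Feature("Customer profitability", priority=2, app_effort=3, data_effort=12),
    Feature("Returns trend", priority=3, app_effort=1, data_effort=4),
]
print(scope_release(backlog, capacity_days=20))
```

With these invented numbers only the first feature fits, which mirrors the point that data effort, not coding effort, determines how few features a release can carry.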
In the July 2007 issue of EIMInsight Magazine, Volume 1, Issue 5, I briefly explained how my agile methodology Extreme Scoping™ worked. My agile project planning process includes all the business and data integration activities that are so vital to DW/BI projects. Extreme Scoping™ uses all of the agile principles that can be used on data integration projects and discards those that don’t apply. It does not seek to replace the popular agile software development methodologies like Scrum or XP; instead, it provides the necessary agile DW/BI umbrella for them.
Scrum fanatics who don’t understand the difference between software development projects and business integration projects try to make every project look like a software development project. That endangers the business investment for a DW enterprise decision support solution. I appeal to the Scrum fanatics who may be reading this article: Just because you have a hammer, don’t assume that every project is a nail! It is not. Most of the work required to produce a DW with trustworthy information about the enterprise has nothing to do with software development (writing code). And I caution DW managers to beware of Scrum fanatics who may try to force the rigid Scrum (or XP) software development rules onto your DW teams. Your DW project involves far more activities than just writing code and you are right in being dubious about fitting all those other activities into a rigid 10-day or 29-day delivery cycle. You can’t.
I sincerely hope that this is not the beginning of another DW war – this time a war of methodologies (the last war was a war of architectures). There is a place for developers to use their favorite Scrum and XP methods on DW/BI projects, but only for the comparatively small software development efforts of writing the front-end BI application or the back-end ETL metadata for the ETL tool. Use Scrum and XP for what they were intended for – writing software! Manage the rest of the DW/BI project in an agile manner with Extreme Scoping™ (soon to be published as my 5th book on managing DW/BI projects).
About the Author
Larissa Moss is president of Method Focus Inc., and a senior consultant for the BI Practice at the Cutter Consortium. She has 27 years of IT experience, focused on information management. She frequently speaks at conferences worldwide on the topics of data warehousing, business intelligence, master data management, project management, development methodologies, enterprise architecture, data integration, and information quality. She is widely published and has co-authored the books Data Warehouse Project Management, Impossible Data Warehouse Situations, Business Intelligence Roadmap, and Data Strategy. Her present and past associations include Friends of NCR-Teradata, the IBM Gold Group, the Cutter Consortium, DAMA Los Angeles Chapter, the Relational Institute, and Codd & Date Consulting Group. She was a part-time faculty member at the Extended University of California Polytechnic University Pomona, and has been lecturing for TDWI, the Cutter Consortium, MIS Training Institute, Digital Consulting Inc. and Professional Education Strategies Group, Inc. She can be reached at firstname.lastname@example.org