Non-Technical Infrastructure for BI Applications
This article is excerpted from a book tentatively titled BI Roadmap: The Complete Lifecycle for Decision Support Applications (Addison-Wesley, 2003). All material is copyright Larissa T. Moss, Shaku Atre, Addison-Wesley.
An enterprise infrastructure is to business intelligence (BI) applications what a transportation infrastructure is to automobile owners. In order to safely and comfortably travel with an automobile, there must be a physical infrastructure, such as roads, bridges, traffic lights, and traffic signs, as well as non-physical infrastructure, such as standardized traffic rules and their interpretation. For example, without the universal interpretation of the rules that “Green means Go” and “Red means Stop,” traffic lights would be of no use. Similarly, an enterprise infrastructure consists of two major components:
- Technical Infrastructure, such as hardware, middleware, and DBMS;
- Non-Technical Infrastructure, such as standards, meta data, business rules and policies.
Most organizations place a lot of emphasis on establishing their technical infrastructure but they completely neglect their non-technical infrastructure, presumably without realizing that it too is a critical success factor for an integrated BI decision support environment. Without a non-technical infrastructure, BI applications would only contribute to the existing chaos of stovepipe applications and databases. Here is why:
Since the early days of IT, the mental model for providing an automated solution to a business problem has been to divide and conquer. Divide a large problem into smaller “digestible” pieces; i.e., prioritize and separate the deliverables. Conquer the problem by working on each piece individually until it is solved; i.e., build each deliverable separately.
This approach works very well for reducing risk by breaking a complex problem into small manageable chunks. However, this approach also has a severe drawback when applied without a non-technical infrastructure. Namely, it produces stovepipe systems (automation silos). The effects of stovepipe systems are lost business knowledge and lost cross-organizational business view, which severely impact business analytics and data mining.
Lost Business Knowledge
Most businesses are very complex, and as organizations mature, their business complexity increases. As business complexity is broken apart into smaller and less complex components, the interrelationships among those individual components are lost. Much of the business intelligence is contained in these lost interrelationships, and that is a problem for BI applications. Most BI applications, and especially data mining applications, expect to find “golden nuggets” of business wisdom embedded in these complex interrelationships.
Lost Cross-Organizational View
Although business managers can answer most questions about their own business functions, when asked a question spanning two or three lines of business (where complex interrelationships have been lost), they must scramble for weeks to piece together the answer. Fundamental business questions, such as “Do we know what general classes of customers we have?” or “Can we forecast the long-term buying habits of our Generation X customers?” are presenting multimillion-dollar problems to large organizations. The answers to these and many other questions do exist in the real business world; we have simply neglected to design our systems in a cross-functional manner that would allow us to find these answers quickly.
Need for Non-Technical Infrastructure
An infrastructure needs to be put into place to prevent the BI decision support environment from becoming as fragmented as the operational and traditional decision support environments, from which cross-organizational questions cannot be answered. Creating this infrastructure involves cross-organizational activities, such as:
- Extensive business analysis involving business representatives from many lines of business, defining or redefining the lost business knowledge and the lost interrelationships among business functions and business data.
- Resolving age-old disputes on data definitions and domains (valid data contents).
- Standardizing data names and data values to reflect true business rules and business policies.
- Getting agreement from the business representatives on these business rules and business policies in the first place.
- Creating a regular forum of business representatives for ongoing maintenance and review of standards, business rules, and business policies.
- Creating (over time) one consolidated, non-redundant data architecture for the entire enterprise to reflect the complex reality of the business; that is, creating an enterprise data model that documents the data inventory of an organization.
- Using an enterprise data model as the primary vehicle for mapping the inventory of operational data to the inventory of BI data, including transformation and navigation paths.
- Creating a meta data repository with non-redundant business meta data and technical meta data.
- Creating and managing one expanding central staging area (per load periodicity) for the extract/transform/load (ETL) processes, rather than allowing independent ETL processes (not reconciled and with inconsistent transformations) for each data mart solution.
Enterprise infrastructure activities, technical as well as non-technical, are strategic cross-organizational architecture activities, which must be managed and coordinated by an enterprise architecture group, as illustrated in Figure 1.
Figure 1 Enterprise architecture group
An enterprise architecture consists of a set of pictorial representations (models) of the business (e.g., business functions, business processes, business data) and their supporting meta data, such as standard definitions, business rules, and policies. An architecture describes a set of business actions performed on any real-world object in the course of conducting business. In other words, it describes the actual business the organization is engaged in. Every active organization has an enterprise architecture by default, even if it is not documented. When the architecture is not documented, the business actions and business objects of the organization are most likely not consistently understood by everyone in the organization. The goal of documenting the architecture is to avoid abusing, misusing, or redundantly recreating unique processes or data about business objects, and thereby losing sight of the cross-organizational picture.
A fully documented enterprise architecture includes at least five architectural components, as shown in Figure 2.
Figure 2 Enterprise Architecture Components
Business Function Model
This model depicts the hierarchical decomposition of an organization’s nature of business; it shows what the organization does. This model is instrumental for organizing or reorganizing a company into its lines of business. It is common for one vertical line of business to support a major business function. Two examples are the Loan Origination division and the Loan Servicing division of a mortgage lending institution.
Business Process Model
This model depicts the processes implemented for the business functions; it shows how the organization performs its business functions. This model is essential for business process reengineering as well as for business process improvement initiatives, which often result from BI projects. An example is Loan Payment Processing, a process performed by a loan servicing division of a mortgage lending institution.
Business Data Model
This model, which is commonly called the enterprise data model or enterprise information architecture, depicts the data objects, the relationships connecting these objects based on actual business activities, the data elements stored about these objects, and the business rules governing these objects; it shows what data describes the organization. Unique data objects and unique data elements appear in the real world only once, and therefore they are documented in the business data model only once, regardless of how many hundreds of physical files and databases they are redundantly stored in. There is only one business data model for an organization. This model and the meta data repository are the two most important non-technical infrastructure components for an evolving BI decision support environment.
Application Inventory
An application inventory is an accounting of the physical implementation components of business functions, business processes, and business data (objects as well as data elements). It shows where the architectural pieces reside in the technical architecture (e.g., programs, databases, tables, columns). An example is: “Customer Name resides in the Customer table, which is in the CRM01 database and is updated by the CRMDLY09 program.” Organizations should always identify, catalog, and document their applications as well as the business rules about their business data as part of the development work on every project, but they seldom do. Such inventories are paramount for performing impact analysis. Case in point: the colossal effort of Y2K impact analysis without such an inventory!
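As a rough illustration of why such an inventory helps with impact analysis, the sketch below stores inventory entries as simple records and looks up every physical location of a data element. The first entry mirrors the Customer Name example above; the column name `CUST_NM`, the second entry (`BILL02`, `Invoice`, `BILMTH03`), and the record layout itself are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    """One physical location of a business data element (hypothetical layout)."""
    data_element: str   # business name, e.g. "Customer Name"
    database: str
    table: str
    column: str
    updated_by: tuple   # names of the programs that update this column


# The first entry follows the example in the text; the column name and the
# whole second entry are invented for illustration.
INVENTORY = [
    InventoryEntry("Customer Name", "CRM01", "Customer", "CUST_NM", ("CRMDLY09",)),
    InventoryEntry("Customer Name", "BILL02", "Invoice", "CUST_NM", ("BILMTH03",)),
]


def impact_of_change(data_element, inventory):
    """Return every physical column and program touched by a data element."""
    hits = [e for e in inventory if e.data_element == data_element]
    return {
        "columns": sorted(f"{e.database}.{e.table}.{e.column}" for e in hits),
        "programs": sorted({p for e in hits for p in e.updated_by}),
    }


report = impact_of_change("Customer Name", INVENTORY)
print(report["columns"])
print(report["programs"])
```

With an inventory like this in place, the question "what is affected if Customer Name changes?" becomes a lookup rather than a search through source code.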
Meta Data Repository
Although “a picture is worth a thousand words,” business models without words are not worth much. The descriptive details of the models are called meta data. Business meta data is collected during business analysis, and technical meta data is collected during design and construction. These two types of meta data are mapped to each other and are made available to the business community of the BI decision support environment. Meta data is an essential BI navigation tool. Some examples of meta data components are column name, column domain (allowable values), table name, program name, report name, report description, data owner, data definition, data quality metrics, and transformation logic.
Organizations must also establish architectural standards for their BI decision support environment in the same way they set up standards for their Web site. It would never occur to anyone to build a Web site where each Web page had a different look and feel. In the same vein, it should never occur to anyone to build a BI environment where each BI application had a different look and feel. Therefore, all BI applications must adhere to the same enterprise standards, which are itemized below.
A BI development effort must be guided by a methodology, which includes a set of all major activities and tasks that are appropriate for BI projects. However, not every BI project will have to perform every single activity in every development step. Some projects may justifiably skip steps, combine activities from different steps into one, or skip activities within a step. Even so, no BI project should be developed ad hoc. Organizations should have some guidelines for their project teams listing the minimum number of activities required (work breakdown structure), the mandatory deliverables, signoff requirements, and workflow dependencies, in order to control the risk of the projects.
Data Naming and Abbreviations
Similar to Web sites, some standards should be applied when creating BI applications (e.g., naming and abbreviation standards). Proven standards can be applied, such as the “Of Language” and the convention of composing names from prime words, qualifiers (modifiers), and class words, or new company-specific standards can be created. The Data Administration group is usually trained in the various industry-standard naming conventions.
Abbreviations are part of naming standards, but they only apply to physical names (e.g., column names, table names, program names), not to business names. A standard enterprise-wide abbreviations list should be published, and it should include industry-specific and company-specific acronyms. Every BI project team should be expected to use these abbreviations and acronyms.
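The following minimal sketch shows one way the two ideas above could fit together: a business name is composed from a prime word, qualifiers, and a class word, and a physical name is derived from it using a published abbreviations list. The word order, the abbreviations, the approved class words, and the function names are all assumptions chosen for illustration; an organization would substitute its own standards.

```python
# Hypothetical enterprise abbreviations list; a real one would be published
# by the Data Administration group.
ABBREVIATIONS = {
    "customer": "CUST",
    "last": "LST",
    "name": "NM",
    "account": "ACCT",
    "balance": "BAL",
    "amount": "AMT",
}

# Hypothetical set of approved class words.
CLASS_WORDS = {"name", "amount", "date", "code", "number"}


def business_name(prime, qualifiers, class_word):
    """Compose an unabbreviated business name: prime, qualifiers, class word."""
    if class_word.lower() not in CLASS_WORDS:
        raise ValueError(f"'{class_word}' is not an approved class word")
    return " ".join(w.capitalize() for w in [prime, *qualifiers, class_word])


def physical_name(name):
    """Derive a physical (column-style) name using only standard abbreviations."""
    parts = []
    for word in name.lower().split():
        if word not in ABBREVIATIONS:
            raise KeyError(f"no standard abbreviation for '{word}'")
        parts.append(ABBREVIATIONS[word])
    return "_".join(parts)


full = business_name("customer", ["last"], "name")
print(full)                 # Customer Last Name
print(physical_name(full))  # CUST_LST_NM
```

Raising an error for a word that has no standard abbreviation, rather than inventing one on the spot, is what keeps physical names consistent across project teams.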
Meta Data Capture
Meta data is a world unto itself. Large amounts of descriptive information can be collected about business functions, business processes, business data objects, business data elements, business rules, data quality, and other architectural components. There should be standards or guidelines for who captures which meta data components, how, when, and where. The meta data repository will then have to be set up in such a way that it will support the standards for meta data capture and usage.
Logical Data Modeling
Logical data modeling is a business analysis technique (not to be confused with logical database design). Every business activity or business function uses or manipulates business data in some fashion. A logical data model documents those logical data relationships irrespective of how the functions or the data are implemented in the physical databases and applications.
Project-specific logical data models should be merged into one cohesive cross-organizational enterprise data model. This activity is the primary job description for data administration, which is part of the enterprise architecture group. The enterprise data model is the baseline business data architecture into which physical systems (operational or decision support, including BI applications) are mapped. Standards should be established for creating the logical data models for BI projects and for merging them into the enterprise data model.
BI information can only be as good as the raw data it is based on. Most organizations have a lot of dirty data — too much to cleanse it all. Guidelines must be established to triage (categorize and prioritize) dirty data for cleansing. In addition, standards must be created that define acceptable quality thresholds and that specify how to measure data quality during BI database loads. Instructions for error handling and suspending dirty data records should also be part of the standards for BI projects.
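To make the idea of measuring quality during a load concrete, here is a minimal sketch that separates clean records from suspended ones and compares the clean ratio against a threshold. The cleanliness rule, the record layout, and the 0.98 threshold used in the example are assumptions for illustration; a real standard would define these per data category.

```python
def load_with_quality_check(records, is_clean, threshold):
    """Split records into loadable and suspended sets and measure quality.

    `is_clean` is a caller-supplied rule; `threshold` is the minimum
    acceptable share of clean records (e.g. 0.98).
    """
    clean, suspended = [], []
    for rec in records:
        (clean if is_clean(rec) else suspended).append(rec)
    # An empty load is treated as meeting the threshold (an assumption).
    quality = len(clean) / len(records) if records else 1.0
    return {
        "clean": clean,
        "suspended": suspended,
        "quality": quality,
        "meets_threshold": quality >= threshold,
    }


def amount_is_valid(rec):
    """Hypothetical rule: an amount must be present and non-negative."""
    return rec.get("amount") is not None and rec["amount"] >= 0


batch = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": None},    # missing value: suspended
    {"id": 3, "amount": 25.5},
    {"id": 4, "amount": 7.0},
]

result = load_with_quality_check(batch, amount_is_valid, threshold=0.98)
print(result["quality"], result["meets_threshold"])
print([r["id"] for r in result["suspended"]])
```

Suspended records are kept rather than discarded so that they can be corrected and reprocessed according to the error-handling instructions.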
Testing standards should specify what types of tests should be performed and who should participate in the various types of tests. Guidelines should describe the types of test cases that are required at a minimum, how much regression testing to perform, and under what circumstances. A brief description of a test plan, maybe even a template, as well as instructions for how to organize and manage the various testing activities should be included.
Since a BI decision support environment will have multiple BI databases and multiple BI applications, and since BI applications are not standalone systems (silos), their development must be coordinated and reconciled so that consistency across the BI decision support environment can be guaranteed. That includes having one central staging area with extensive reconciliation programming for every input-process-output module, regardless of whether it is written in native code or produced by an ETL tool.
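One common form of reconciliation for an input-process-output module is to tie out record counts and amount totals: everything that came in must be accounted for as either loaded or rejected. The sketch below assumes that approach; the function name, the `amount` field, and the sample rows are hypothetical, and a real ETL module would likely reconcile more than one measure.

```python
from decimal import Decimal


def reconcile(input_rows, output_rows, rejected_rows, amount_key="amount"):
    """Check that input equals output plus rejects, by count and by amount."""

    def total(rows):
        # Decimal avoids floating-point drift when summing monetary amounts.
        return sum((Decimal(str(r[amount_key])) for r in rows), Decimal("0"))

    return {
        "count_ok": len(input_rows) == len(output_rows) + len(rejected_rows),
        "amount_ok": total(input_rows) == total(output_rows) + total(rejected_rows),
    }


source = [{"amount": "10.50"}, {"amount": "4.25"}, {"amount": "1.00"}]
loaded = source[:2]
rejected = source[2:]

print(reconcile(source, loaded, rejected))  # both checks pass
print(reconcile(source, loaded, []))        # a record went missing
```

Running the same checks after every module, whether hand-written or tool-generated, is what allows totals to be traced consistently across the staging area.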
BI data is derived from operational data. Therefore, security guidelines applicable to the operational data are also applicable to the BI data. If data is summarized and drilling down to the detail is not enabled, some of the security features can be relaxed. But rather than allowing each project team to make up the rules as they go along, the owners of the data should establish security standards, which should be used to guide the project teams on what type of security measures are mandatory for what type of data exposure. These standards should include guidelines for categorizing security risks. Categories include data sensitivity, application security, network security, and security against intrusions, hacks, viruses, and other nuisances on the Web.
Service Level Agreements (SLA)
Organizations function according to explicit or implicit business principles. They are explicit if stated in mission or vision statements, implicit if they are just “understood” by the staff. For example, if an organization rewards project managers for meeting deadlines even though their applications are full of errors, while it punishes project managers for missing deadlines even though their applications are flawless, the implicit principle is “speed before quality.” Service level agreements ordinarily support the explicit as well as the implicit business principles. Therefore, SLA standards should state the business principles and outline the minimum acceptable SLA measures to support those principles. For example, all projects must meet a 98% data quality threshold for all financial data. SLA measures can also be applied to query response time, timeliness, availability, or level of ongoing support.
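As a hedged illustration of "minimum acceptable SLA measures," the sketch below records each measure with its limit and direction and reports which ones a project fails to meet. Only the 98% financial data quality figure comes from the text; the availability and query response limits, the measure names, and the class itself are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaMeasure:
    """A single SLA measure with its acceptable limit (hypothetical model)."""
    name: str
    limit: float
    higher_is_better: bool = True

    def is_met(self, observed):
        if self.higher_is_better:
            return observed >= self.limit
        return observed <= self.limit


# Only the 0.98 data quality threshold is taken from the text; the other
# two limits are invented for illustration.
MINIMUM_SLA = [
    SlaMeasure("financial_data_quality", 0.98),
    SlaMeasure("availability", 0.99),
    SlaMeasure("query_response_seconds", 30.0, higher_is_better=False),
]


def violations(observed, measures=MINIMUM_SLA):
    """Return the names of measures that are missing or not met."""
    return [
        m.name
        for m in measures
        if m.name not in observed or not m.is_met(observed[m.name])
    ]


project = {
    "financial_data_quality": 0.985,
    "availability": 0.97,
    "query_response_seconds": 12.0,
}
print(violations(project))  # ['availability']
```

Treating a missing measure as a violation reflects the idea that the standard defines a minimum set every project must report against.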
Policies and Procedures
Standards and guidelines should also cover the policies and procedures of an organization, such as operating procedures, project change control procedures, issues management procedures, and dispute resolution procedures. Additional topics, such as the communication processes, estimating guidelines, roles and responsibilities, and standardized document format, should also be part of the policies and procedures. It is important to remember that the purpose of having policies and procedures, along with standards and guidelines, is to help streamline and standardize the BI decision support environment. In other words, policies, procedures, standards, and guidelines must be value-added for the organization as a whole, or they should not exist.
Business intelligence is all about an enterprise architecture solution to the decision support chaos that exists today. It is a cross-organizational initiative. Therefore, a non-technical infrastructure is of critical importance. Absence of a non-technical infrastructure will lead to stovepipe development and will add to the spaghetti chart of systems with more data marts and more silo BI applications, which are neither integrated nor reconciled. As a result, organizations continue to lose the opportunity and advantage to enhance their business decisions and to compete.