A BRIEF HISTORY OF DATA WAREHOUSING FROM THE VENDOR’S PERSPECTIVE (PART II)
By Bill Inmon
ETL PRODUCTS
One industry that did start as a series of new products with the advent of data warehousing was the ETL industry. In the very beginning, all code for placing data in a data warehouse was written manually. This was a very tedious exercise. A better approach was to generate the code automatically. Thus the ETL industry was formed.
In the very early days, before there were data warehouses, there were Carleton and ETI. Carleton long ago merged with Oracle. Carleton and ETI were data movement products that were capable of reading data from old legacy systems. Then came Prism Solutions. Prism Solutions was a pure ETL company. Prism Solutions, through a series of mergers, is now a part of Ascential, which is a part of IBM.
The early ETL processors looked at data a record at a time, using code that was specifically designed for the transformation. Then came Informatica with a new approach. Instead of generating code, the Informatica product was parametrically controlled.
There were both advantages and disadvantages to the generated code versus the parametrically controlled approach. Some of those advantages and disadvantages are:
- generated code – advantages – executes very efficiently, handles any form of logic, does not require any changes to source systems
- generated code – disadvantages – is code based
- parametrically controlled – advantages – can be generated quickly
- parametrically controlled – disadvantages – requires data to be flattened, operates inefficiently, cannot be applied to complex transformations, and so forth.
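The difference between the two approaches can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a description of how any vendor's product actually worked: the field names, the sample records, the `OPERATIONS` table, and the `MAPPING` layout are all hypothetical. The first function stands in for transformation-specific code (whether hand-written or emitted by a generator); the second is a generic engine whose behavior is controlled entirely by parameters.

```python
# Hypothetical legacy records; the field names are invented for illustration.
LEGACY_ROWS = [
    {"CUST_NM": "  ada lovelace ", "BAL_CENTS": "12050"},
    {"CUST_NM": "grace hopper", "BAL_CENTS": "990"},
]


# Approach 1: code written (or generated) for this one transformation.
# It can contain any logic at all, but every change means changing code.
def transform_customer(row):
    return {
        "customer_name": row["CUST_NM"].strip().title(),
        "balance": int(row["BAL_CENTS"]) / 100,
    }


# Approach 2: a generic engine controlled by parameters.
# The engine only knows a fixed set of named operations (an assumption of
# this sketch), so it is quick to configure but limited to what they express.
OPERATIONS = {
    "clean_name": lambda value: value.strip().title(),
    "cents_to_units": lambda value: int(value) / 100,
}

# Each parameter row: (source field, target field, operation name).
MAPPING = [
    ("CUST_NM", "customer_name", "clean_name"),
    ("BAL_CENTS", "balance", "cents_to_units"),
]


def run_mapping(row, mapping):
    # Apply the named operation to each flat source field, one field at a time.
    return {
        target: OPERATIONS[op](row[source])
        for source, target, op in mapping
    }


coded = [transform_customer(row) for row in LEGACY_ROWS]
driven = [run_mapping(row, MAPPING) for row in LEGACY_ROWS]
```

In this sketch both approaches produce the same output for a simple, flat record. The trade-off described in the list shows up once the logic goes beyond what the fixed set of operations can express: the coded version simply gains more code, while the parameter-driven version needs either a new operation or flatter input data.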
ANALYTICAL PRODUCTS
Analytical products have been around for a long time, easily predating data warehousing. SAS Institute had been around for 20 years before data warehousing came along. SAS primarily services the statistical analytical marketplace. SAS adapted very nicely and naturally to data warehousing.
Other products arose to do different styles of analytical processing. In the very early days there were what were called EIS (executive information system) products. EIS purported to serve the executive community. (When approached with the concept of a data warehouse as a foundation on which to base EIS technology, the president of one EIS firm said, “I see no relation to EIS and data warehousing. I don’t see why EIS needs data warehousing.”) EIS, in the form that it took in the early days of data warehousing, no longer exists today.
Today the major analytical products are Business Objects, Cognos, Crystal Reports, and MicroStrategy. These products service the needs for analysis of data coming from the data warehouse. In the beginning, these products encouraged customers not to build a data warehouse. They encouraged customers to take data directly from the legacy applications and bypass a data warehouse. The thought was that it would take too long to build a data warehouse and the vendor wanted to make a quick sale. Therefore, the customer was told that they did not need a data warehouse.
Today there are so many data warehouses built that the analytical vendor no longer has to tell the customer that they do not need a data warehouse.
Another vendor that started down the data warehouse path was Hyperion. Then, several years back, Hyperion announced that they were an application company, not a data warehouse analytical company. Now they have decided that data warehouses are not such bad things and are aggressively supporting data warehouses once again.
METADATA PRODUCTS
Although the early data warehouses did not officially acknowledge metadata as part of the infrastructure, metadata has always been an unofficial and absolutely necessary part of the data warehouse infrastructure.
There were several products on the marketplace when data warehousing first started. They were Reltech and Brownstone, two metadata repository companies. In addition there was Rochade. Reltech and Brownstone merged together into Platinum, then Platinum was purchased by Computer Associates.
The repository that was in place when data warehousing came along was hardly ideal for data warehousing. In order to make the repositories work, much customization and much modification of the basic repository product needed to be done. In addition, the repositories were passive, not active, and to make the repositories work in the most effective manner there needed to be an active repository.
Related to the need to gather and manage metadata was the need for data modeling. The leader in data modeling tools was Erwin. Erwin was sold to Computer Associates. Like most of the products in the day when data warehousing began, Erwin was not designed specifically for data warehousing needs.
One product worth mentioning is DAG – Data Advantage Group. DAG works to solve some of the problems of ossified repositories and other stores of metadata.
THE GENESIS
The preceding description of the products that were in the marketplace and that were used for the data warehousing market space only describes a moment in time. Today things are different. There are several factors which shaped the world of yesterday into becoming the world of today.
Some of the more interesting of those factors are:
- the number of organizations that are doing data warehousing. Today literally everyone, in one form or another, is doing data warehousing. The marketplace is huge.
- the fact that vendors start out with a low price tag, and over time, the deal size goes up, way up. The result of this price increase is that the cost of data warehousing has gone ballistic. The effect is that the smaller organizations that need data warehousing are left in the dust. They just cannot afford the price tag of the “standard” industry solutions that we have today. This opens up great opportunity for vendors who can start with a smaller price tag and exploit that part of the market space that has been vacated by the successful companies.
- the fact that there are no complete solutions in the data warehouse marketplace. Everything is a point solution. However, there is one exception to this rule. That exception is SAP. Only SAP has approached the data warehouse marketplace holistically. First there was R/3. Then there were Infocubes. Then there was SAP BW. Now there is a complete infrastructure, and only SAP has a true data warehousing solution.
- the definition of data warehousing is changing. Now there is DW 2.0, the definition of the next generation of data warehousing. (See www.inmoncif.com for details.) Data warehousing is going places where there has been no movement before.
With that in mind, some of the new products and companies to look out for are:
- Dataupia. Dataupia is a specialized data warehouse that has the potential for greatly lowering the cost of the data warehouse. The really great thing about Dataupia is that it can be deployed incrementally. You can take some of your data warehouse and deploy it under Dataupia. Or, if you want, you can take all of your data warehouse and deploy it under Dataupia. Furthermore, you can deploy Dataupia while still doing your basic processing under Oracle, DB2, et al.
- Talend. For the open source advocates, there is Talend as a tool for ETL. The really great thing about Talend is that it opens up the ETL marketspace to customers who need it but cannot afford a “standard solution”.
- Inmon Data Systems Foundation software. IDS Foundation is an ETL tool for unstructured data. Suppose you want unstructured data as part of your data warehouse. Now there is a tool for unstructured data that can access, integrate, and structure your textual data so that it is fit to be placed in your data warehouse, ready for analytical processing.
- SAP’s BIA. SAP has long been known as a software vendor. But, now in collaboration with IBM and Intel, SAP has hardware called the Business Intelligence Accelerator. You just plug in your SAP BIA and magically analytical performance gets better – much better.
- HP’s NeoView. Now there is even more competition in the world of MPP technology. In a marketplace where existing vendors price their products very high, competition is welcomed by the consumer.
- Kalido. Kalido provides a unique capability: the ability to change technology as requirements change. With Kalido you don’t get stuck in the mud of having a data warehouse defined to solve yesterday’s requirements rather than today’s new requirements.
IN SUMMARY
The data warehouse marketplace has matured greatly. There has been – predictably – consolidation. In addition there has been an advancement of new products and concepts.
There has been a troubling trend, and that trend is that with each new sale, the traditional data warehouse vendors raise their prices. The net result is that there is a whole class of customers who need products but whose price point is below the point at which products are currently priced.
In addition, there is a general solution that is now appearing, and that solution is SAP, with a full suite of integrated products. Only SAP has a full solution.
About the Author
Bill Inmon, the father of the data warehouse concept, the corporate information factory, and the government information factory, has written 47 books on data warehouse, data base, and information technology management. His publishers include John Wiley, Prentice-Hall, and QED. His books have been translated into nine languages. More than thirty of his books have been book club selections. In addition, Bill has written over 750 articles for trade journals such as Data Management Review, Byte, Datamation, ComputerWorld, and many others. Currently Bill has a newsletter with b-eye-network.com that reaches 55,000 people. Bill founded Inmon Data Systems, a company that reads and manages unstructured data - emails, telephone transcripts, documents - and processes them for inclusion into a structured data warehouse. In addition, IDS creates visualizations for unstructured data. IDS has technology that crosses the bridge between unstructured data and structured data, currently protected by seven patents. Bill can be reached at [email protected]