The Semantic Web and EIM: Part 2 – Information Architecture
By Pete Stiglich
This is the second in a series of articles on the Semantic Web and EIM (see the first article in the series). In this article I will discuss how the Semantic Web and semantic technologies can tie into your Information Architecture for a more holistic view of the enterprise, and for richer understanding and analysis of data resources.
Semantic models, or ontologies, can be a key change agent for increasing enterprise integration. Heterogeneous data resources can be described using the RDF, RDFS, and OWL standards, and inferencing can uncover more facts about our data or architecture, which improves knowledge discovery.
Data Architects need to understand what the Semantic Web is (per the W3C, it is a web of data, not just a web of documents), what semantic initiatives are underway in the enterprise, and what ontologies are being used or developed. Indeed, Enterprise Data Architects should be intimately involved in ontology development or adoption.
Overview of Information Architecture
In an excellent article by Mark Mosley on “Challenging EIM Issues Ahead”, key components of Information Architecture are documented. I will list these here in brief:
- Enterprise Data Model
- Information Value Chain Analysis (i.e., CRUD matrices)
- Information Supply Chain Analysis (horizontal data lineage)
- Reference data sets – e.g. standard code values
- Meta data architecture
- Data Integration Architecture, including Master Data Management (MDM) architecture
- DW / BI Architecture
- Semantic models aka ontologies (e.g., ontologies in RDF/OWL)
I would also add Data Rationalization analysis to this list. Data Rationalization is vertical data lineage – the lineage between model objects across model levels (e.g., conceptual, logical, physical), and it complements Information Supply Chain Analysis.
Semantic Models and the Enterprise Data Model
While our semantic models/ontologies can tie in with many of the other components of Information Architecture, they are most closely related to the Enterprise Data Model (EDM).
An Enterprise Data Model typically comprises three models: the Subject Area Model (SAM), the Enterprise Conceptual Data Model (ECDM), and the Enterprise Logical Data Model (ELDM). The ECDM in particular should be closely aligned with enterprise semantic models and vice versa. The ECDM identifies and defines the key business objects/data entities and the relationships between them (which can express many business rules). It is only natural that the ECDM inform our semantic models/ontologies, given the knowledge it captures about our enterprise. The ECDM could itself be considered a semantic model, given the semantic clarity which must be achieved through the enterprise collaboration required to arrive at common names and definitions for conceptual data entities. The ECDM is a business model, and it is neutral with respect to technology, application, business process, and business unit. It very often allows the enterprise to understand itself in a new way, and it must be readily understandable by humans, as it is a key (though too often underutilized) means of achieving alignment between business and IT.
Semantic models or ontologies expressed in a paradigm such as RDF/OWL, on the other hand, are meant to enable computers to understand data and meta data in a new way, to facilitate knowledge discovery (through inferencing) and increased machine actionability. There can and should be much overlap and integration between the ECDM and ontologies, as applicable for the domains and depth covered by the ECDM. An ECDM will encompass many domains (or subject areas) over time. Ontologies are “taxonomies and thesauri about a domain”, so a product ontology would probably have taxonomies for product types, categories, and other hierarchical classifications. For example, a car would be categorized by make and model, and probably by other taxonomies. These taxon entities would be represented in the ECDM, though the ECDM would not have the actual taxon values (except perhaps as informal subtypes to enable more rapid comprehension of entity meaning, in which case the number of subtypes would be very limited). An important distinction between a data model and an ontology is that an ontology can contain both classes and instances.
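To make the class/instance distinction concrete, here is a minimal sketch in Python that holds class-level and instance-level statements in the same set of triples. It uses plain tuples rather than a real RDF library, and every `ex:` name, including the make and model values, is a hypothetical illustration rather than part of any actual ontology.

```python
# Minimal sketch: one set of (subject, predicate, object) triples holding
# both classes and instances. All ex: names are hypothetical examples.
RDF_TYPE = "rdf:type"
RDFS_CLASS = "rdfs:Class"

triples = {
    # Class-level statements: the taxon entities an ECDM would also capture.
    ("ex:Make", RDF_TYPE, RDFS_CLASS),
    ("ex:Model", RDF_TYPE, RDFS_CLASS),
    # Instance-level statements: taxon values an ECDM would normally omit.
    ("ex:Toyota", RDF_TYPE, "ex:Make"),
    ("ex:Corolla", RDF_TYPE, "ex:Model"),
    ("ex:Corolla", "ex:hasMake", "ex:Toyota"),
}


def classes(graph):
    """Return every subject that is declared to be a class."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == RDFS_CLASS)


def instances_of(graph, cls):
    """Return every subject that is declared to be an instance of cls."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)


print(classes(triples))
print(instances_of(triples, "ex:Make"))
```

In a data model, only the first two statements would normally appear (as entities); the remaining three are the kind of instance data that an ontology can carry alongside its classes.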
The development of ontologies in RDF/OWL is not for the faint of heart. It is a relatively new paradigm that has similarities with more traditional data modeling (e.g., E/R), though with very real differences. For example, cardinality is a concept common to both data modeling and OWL for describing the maximum or minimum number of occurrences in a relationship. In OWL, however, cardinality may also be used to make inferences about the data. An important distinction to keep in mind is that data modeling (at least in the Conceptual Data Modeling sense) is all about knowledge description, whereas semantic models are about knowledge discovery, where a predetermined data structure might not be enforced. Given the AAA slogan applicable to the Semantic Web (Anyone can say Anything about Any topic), our ontologies can be used to piece together disparate pieces of information from many sources across the internet or intranet and to infer new knowledge from these bits and pieces.
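The following Python sketch illustrates how a cardinality restriction can drive an inference rather than act as a constraint. It assumes a property with a maximum cardinality of 1 and two sources that each assert a different value for it. Instead of rejecting the second value, the sketch concludes that the two values denote the same individual, which is roughly what an OWL reasoner would do unless the individuals were explicitly declared different. The function name, the prefixes, and the identifiers are all hypothetical, and this is a simplification, not a substitute for a real inferencing engine.

```python
def infer_same_as(triples, max_one_properties):
    """Infer owl:sameAs pairs from properties limited to one value per subject."""
    values = {}
    for subject, prop, obj in triples:
        if prop in max_one_properties:
            values.setdefault((subject, prop), set()).add(obj)

    inferred = set()
    for objs in values.values():
        ordered = sorted(objs)
        # Every pair of distinct values for the same subject must be equal.
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                inferred.add((first, "owl:sameAs", second))
    return inferred


# Two hypothetical sources say something about the same order (AAA in action).
crm_triples = [("ex:order42", "ex:placedBy", "crm:cust17")]
erp_triples = [("ex:order42", "ex:placedBy", "erp:C0017")]
merged = crm_triples + erp_triples

# Assumption: ex:placedBy has a maximum cardinality of 1.
same_as = infer_same_as(merged, {"ex:placedBy"})
print(same_as)
```

A relational database with the same rule would typically raise a constraint violation on the second row; here the same rule yields a new fact linking the two customer identifiers.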
Aligning The ECDM And Enterprise Ontologies
Given the classes (entities) and relationships which already exist in the ECDM, can the ECDM be converted into a semantic model? The short answer is yes. The Ontology Definition Metamodel (ODM) is a standard from the Object Management Group (OMG) for “model driven ontology development”, using UML as a way to visually express and develop ontologies. Adoption and usage of ontologies (whether internally or externally developed) in the enterprise should be given consideration by a Data Governance Board, or at least by a Data Stewardship Coordinating Committee. However, if you want to see a lot of eyes glazing over, give an RDF/OWL document to these business people to review. With ODM, ontologies can instead be expressed as UML class diagrams, which can make them easier to understand. Of course, not all Data Stewards are going to be able to interpret a UML model, but at least it makes the ontology easier to visualize.
Returning from this tangent, ODM includes RDF, OWL, ER and other metamodels, plus mappings which enable an ER model to be translated into OWL and vice versa. Figure 1 below is the ODM ER metamodel and Figure 2 below is the ODM ER to OWL mapping:
Figure 1 – ODM ER Metamodel
Figure 2 – ODM ER Model to OWL Model mapping
Unfortunately, only a limited number of tools currently support ODM, but it can still serve as a valuable resource for translating your ECDM into an RDF/OWL ontology. I highly recommend reviewing the RDF and OWL metamodels in order to better understand these technologies. For example, from the OWL Class metamodel below we can understand that an OWL Class is a subtype of RDFS Class.
Figure 3 – ODM OWL Class metamodel
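The general idea of translating an ER model into OWL can be sketched in a few lines of Python. The mapping used here (entity to `owl:Class`, attribute to `owl:DatatypeProperty`, relationship to `owl:ObjectProperty` with a domain and range) is a commonly used simplification and is an assumption on my part; it is not the normative ODM mapping, which covers considerably more (keys, cardinalities, roles, and so on). The `Entity` and `Relationship` classes and the `ex:` prefix are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    name: str
    source: str  # name of the entity the relationship starts from
    target: str  # name of the entity the relationship points to


def er_to_triples(entities, relationships, prefix="ex:"):
    """Translate a tiny ER model into OWL-style triples (simplified mapping)."""
    triples = []
    for entity in entities:
        # Each entity becomes an OWL class.
        triples.append((prefix + entity.name, "rdf:type", "owl:Class"))
        for attribute in entity.attributes:
            # Each attribute becomes a datatype property on that class.
            triples.append((prefix + attribute, "rdf:type", "owl:DatatypeProperty"))
            triples.append((prefix + attribute, "rdfs:domain", prefix + entity.name))
    for rel in relationships:
        # Each relationship becomes an object property between two classes.
        prop = prefix + rel.name
        triples.append((prop, "rdf:type", "owl:ObjectProperty"))
        triples.append((prop, "rdfs:domain", prefix + rel.source))
        triples.append((prop, "rdfs:range", prefix + rel.target))
    return triples


ecdm_entities = [Entity("Customer", ["customerName"]), Entity("Order")]
ecdm_relationships = [Relationship("places", "Customer", "Order")]

owl_triples = er_to_triples(ecdm_entities, ecdm_relationships)
for triple in owl_triples:
    print(triple)
```

Even a toy translation like this makes the review question for Data Stewards tangible: each conceptual entity and relationship they approved in the ECDM should be traceable to a class or property in the resulting ontology.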
Other Aspects Of Information Architecture And Ontologies
While the ECDM has the most logical connection with enterprise ontologies, other aspects of Information Architecture can be incorporated into enterprise ontologies as well. For example, incorporating an Information Value Chain (IVC) Analysis into your enterprise ontology will provide for a much richer ontology as the IVC can serve as a bridge between the information domain and business domains such as functional area, business process, or organizational unit.
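As a rough illustration of how an IVC analysis could feed an ontology, the Python sketch below turns a small CRUD matrix into triples that link business processes to data entities. The processes, entities, and `ex:` predicates are hypothetical examples, and a real ontology would define these properties formally rather than as bare strings.

```python
# Hypothetical CRUD matrix: (business process, data entity) -> operations.
crud_matrix = {
    ("OrderEntry", "Customer"): "R",
    ("OrderEntry", "Order"): "CRU",
    ("Billing", "Order"): "R",
    ("Billing", "Invoice"): "CRUD",
}

# Assumed predicate names, one per CRUD letter.
VERBS = {"C": "ex:creates", "R": "ex:reads", "U": "ex:updates", "D": "ex:deletes"}


def crud_to_triples(matrix):
    """Expand each CRUD cell into one triple per operation letter."""
    return sorted(
        (f"ex:{process}", VERBS[letter], f"ex:{entity}")
        for (process, entity), operations in matrix.items()
        for letter in operations
    )


def processes_touching(triples, entity):
    """Return the business processes that act on the given entity."""
    return sorted({s for s, p, o in triples if o == entity})


ivc_triples = crud_to_triples(crud_matrix)
print(len(ivc_triples))
print(processes_touching(ivc_triples, "ex:Order"))
```

Once the matrix is in triple form, it can be combined with the classes derived from the ECDM, so that a query about a data entity can also reach the processes or organizational units that depend on it.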
Incorporating various information architectures (models) such as DW/BI, MDM, and Meta Data, and tying objects from these to implementation artifacts, can enable rich knowledge discovery about your EIM program performance.
Of course, translating Information Architecture components into enterprise ontologies must be supported by clear business drivers. Unless your organization is very experienced with RDF/OWL, you will want to develop your enterprise ontologies in relatively small iterations. I say relatively because even a modest number of asserted triples in your ontology can result in a large number of inferred triples, and you will want to spend adequate time testing to ensure that the inferred triples created by your inferencing engine make sense.
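To give a feel for how quickly inferred triples accumulate, the Python sketch below applies just two RDFS-style rules (subclass transitivity and type inheritance) to a handful of asserted triples and counts the result. It is a naive fixpoint loop written for illustration, not how a production inferencing engine works, and the class hierarchy and instance names are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(asserted):
    """Apply subclass transitivity and type inheritance until nothing new appears."""
    graph = set(asserted)
    while True:
        new = set()
        for s, p, o in graph:
            if p not in (SUBCLASS, TYPE):
                continue
            for s2, p2, o2 in graph:
                if p2 == SUBCLASS and s2 == o:
                    # If p is subClassOf: A sub B, B sub C  =>  A sub C.
                    # If p is type:       x type B, B sub C =>  x type C.
                    new.add((s, p, o2))
        if new <= graph:
            return graph
        graph |= new


asserted = {
    ("ex:Sedan", SUBCLASS, "ex:Car"),
    ("ex:Car", SUBCLASS, "ex:Vehicle"),
    ("ex:Vehicle", SUBCLASS, "ex:Product"),
    ("ex:item1", TYPE, "ex:Sedan"),
    ("ex:item2", TYPE, "ex:Sedan"),
}

closure = rdfs_closure(asserted)
inferred = closure - asserted
print(len(asserted), "asserted,", len(inferred), "inferred")
for triple in sorted(inferred):
    print(triple)
```

Here five asserted triples yield nine inferred ones with only two rules in play. Listing the inferred triples after each iteration, as the last loop does, is one simple way to review whether they make sense before the ontology grows further.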
Semantic models or ontologies typically fall under Information Architecture in EIM programs; they should not just be something that web developers put together. After all, the Semantic Web is a “web of data”. The ECDM is most closely related to ontologies, as it identifies key business concepts and the relationships between them. The ECDM should be converted into RDF/OWL because it should mirror the business and should be reviewed and approved by the business. Ontologies in RDF/OWL format enable computers to understand our data resources and derive new facts by inferencing, but they are more difficult for humans to read. Incorporating other aspects of Information Architecture into enterprise ontologies can enrich them and enable greater knowledge discovery about the business and our EIM programs.
About the Author
Pete Stiglich is a Principal Consultant/Trainer with EWSolutions with nearly 25 years of IT experience in the fields of data modeling, Data Warehousing (DW), Business Intelligence (BI), Meta Data management, Enterprise Architecture (EA), Data Governance, Data Integration, Customer Relationship Management (CRM), Customer Data Integration (CDI), Master Data Management (MDM), Database Design and Administration, Data Quality, and Project Management. Pete has taught courses on Managed Meta Data Environments (MME), Data Modeling, Dimensional Data Modeling, Conceptual Data Modeling, ER/Studio, and SQL. Pete has presented at the 2008 MIT Information Quality conference, 2007 and 2008 Marco Masters Series, at DAMA at the international and local level, and at the 2007 IADQ Conference. Pete’s articles on Information Architecture have been published in Real World Decision Support, DMForum, InfoAdvisors, and the Information and Data Quality Newsletter. Pete is a listed expert for SearchDataManagement on the topics of data modeling and data warehousing. Pete is an industry thought leader in the field of Conceptual Data Modeling. He can be reached at firstname.lastname@example.org