Sunday, November 10, 2013

Understanding Data Architecture

Earlier this week someone asked me what at first sounded like a very straightforward question: "What is Data Architecture?" - or more precisely, what does it mean to you? I'm not usually at a loss for words when it comes to expounding upon IT Architecture related topics - but it occurred to me at that moment that my previous understanding of what Data Architecture really represents has been a little flawed, or perhaps just outdated. So I gave a somewhat convoluted and circumspect answer.

Where does Architecture fit within this picture?

The nature of what's occurring in the Data domain within IT is itself changing - very quickly and somewhat radically. The rise of Big Data and the proliferation of User Driven discovery tools represent quite a departure from the previous, more deterministic view of how data ought to be organized, processed and harvested. So how does all of this affect Data Architecture as a practice within IT (or more specifically within IT Architecture)?

But before we dive into the implications of the current revolution and its subsequent democratizing of data, we need to step back and look again at the more traditional definitions of what Data Architecture represents. I'll start with a high level summary view:

Traditional Data Architecture can be divided into two main focus areas: 1 - the structure of the data itself and 2 - the systems view of whatever components are used to exploit that data. Data in itself is the semantic representation or shorthand of the processes, functions or activities that an organization is involved with. Data has traditionally been subdivided (at least for the past several decades) into two categories: transactional (OLTP) and knowledge-based or analytic (OLAP).
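To make the transactional vs. analytic split a little more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The `orders` table, its columns and the sample rows are all invented for illustration; a real OLTP system and a real OLAP system would usually be separate platforms, so treat this only as a picture of the two access patterns, not as how either is actually built.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " region TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)


def place_order(customer, region, amount):
    """Transactional (OLTP-style) access: write one small row atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid


def sales_by_region():
    """Analytic (OLAP-style) access: scan many rows and aggregate them."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


place_order("Acme", "East", 120.0)
place_order("Globex", "West", 75.5)
place_order("Initech", "East", 30.0)

totals = sales_by_region()
print(totals)
```

The same rows serve both purposes here, which is part of why the boundary between the two categories gets blurry in practice.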
Now we'll move to a traditional summary definition of Data Architecture practice:

Data Architecture is the practice of managing both the design of data and the design of the systems which house or exploit that data. As such, this practice area revolves around management of data models and architecture models. Unfortunately, the application of Governance within this practice is sporadic, and when it does occur it is often split into two views: governance of the data (models) and governance of systems (patterns and configurations).
So, that seems to be fairly comprehensive; but is it? Where does Business Intelligence fit in - is it part of data management or system management - is it purely knowledge focused or does it also include transactional data? For that matter, do Data Warehouses only concern themselves with analytic data or can they be used to pass through transactional data to other consumers? And isn't Big Data both transactional and analytic in nature? And BTW - how do you model Big Data solutions from either a systems or a data modeling standpoint? Now we begin to see how things can get confusing.

We also need to take into consideration that there has been an attempt to standardize some of this from an industry perspective - it's referred to as the Data Management Body of Knowledge or DMBOK. I think in some ways it's been successful in laying out an industry taxonomy (much like ITIL did) but not as successful in linking that back into the practice of Data Architecture. The following diagram represents an attempt to map the two together...

There isn't a one-to-one mapping between DMBOK and data architecture practice, but it's close
One of the areas where the DMBOK has fallen short is Big Data; my guess is that they will need to rethink their framework once again relatively soon to accommodate what's happening in the real world. In the diagram above, we have a somewhat idealized view in that we've targeted a unified governance approach for both data modeling and data systems.

Let's take a moment and discuss the challenges presented by the advent of new Big Data and BI technology. We'll start with BI - let's say your organization is using Oracle's BI suite - Oracle Business Intelligence Enterprise Edition (OBIEE). Within OBIEE you have a more or less semantic / metadata management tool called the Common Enterprise Information Model (CEIM). It produces a file (or files) that maps out the business functionality of all the reports or dashboards associated with the solution. Where does that fit from an architecture standpoint? It has a modeling-like interface but it isn't a 3rd normal form model or even a dimensional model. It represents a proprietary Oracle approach (both as an interface and as a modeling approach). It allows you to track dimensions, data hierarchies and data structures - so it is a viable architecture management tool for BI (at least for OBIEE instantiations). But some traditional Data Architecture groups would not view this as something the architects would manage - it might be handed off to OBIEE administrators. This situation is not unique to Oracle, of course; it applies to IBM / Cognos and other BI tools as well, and there's a whole new class of tools that are completely driven by end users (rather than structured in advance by an IT group).

Now let's look at Big Data. Many of the Big Data tools require command line interface management and programming in order to create or change core data structures. There is no standard modeling approach for Big Data, as it encompasses at least 5 different major approaches (as different, say, as 3NF is from Dimensional). How does an architecture group manage this? Right now, in most cases it's not managed as data architecture but more as data systems architecture. The problem here is obvious; just as organizations have finally gained some insight into the data they own or manage - a giant new elephant has entered the room. How is that new capability going to impact the rest of the enterprise - how can it be managed effectively?
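As a rough illustration of why a single modeling standard is hard to come by, the sketch below holds the same small fact in four different shapes using plain Python structures. The post does not name the five approaches, so the three non-relational shapes shown here (key-value, document, wide-column) are my own choice of examples, and every name and value is hypothetical.

```python
import json

# One hypothetical fact: order 10 was placed by customer "Acme" for 120.0.

# Relational (3NF-style): facts split across tables and joined by keys.
customers = {1: {"name": "Acme"}}
orders = {10: {"customer_id": 1, "amount": 120.0}}

# Key-value: an opaque value fetched by a single key.
kv_store = {"order:10": json.dumps({"customer": "Acme", "amount": 120.0})}

# Document: one nested, self-describing record.
doc_store = [{"_id": 10, "customer": {"name": "Acme"}, "amount": 120.0}]

# Wide-column: row key -> column family -> columns.
wide_store = {
    "order#10": {"customer": {"name": "Acme"}, "billing": {"amount": 120.0}}
}


def customer_relational(order_id):
    """Follow the foreign key from the order to the customer table."""
    return customers[orders[order_id]["customer_id"]]["name"]


def customer_key_value(order_id):
    """Fetch the blob by key, then decode it in application code."""
    return json.loads(kv_store[f"order:{order_id}"])["customer"]


def customer_document(order_id):
    """Find the matching document and read the nested field."""
    doc = next(d for d in doc_store if d["_id"] == order_id)
    return doc["customer"]["name"]


def customer_wide_column(order_id):
    """Address the value by row key, column family and column."""
    return wide_store[f"order#{order_id}"]["customer"]["name"]


names = [
    customer_relational(10),
    customer_key_value(10),
    customer_document(10),
    customer_wide_column(10),
]
print(names)
```

Each lookup returns the same answer, but the "model" lives in a different place every time: in table keys, in application code, in the document itself, or in the row and column layout.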

Back to the original question - what is Data Architecture? I'd like to suggest that the practice of Data Architecture is more than the sum of its traditional activities. Data Architecture is the practice of understanding, managing and properly exploiting data in the context of the problems any given organization has to solve. It is not limited by prior classifications or practice but has a consistent mandate to represent, and hopefully govern in some fashion, data as an asset (internally or shared collaboratively). Data Architecture as we know it is going to change quite a bit in the next two years and that's a very good thing.

Copyright 2013, Stephen Lahanas


