Understanding Data Warehouse Challenges ~ Technovation Talks

Friday, November 2, 2012

Understanding Data Warehouse Challenges

12:38 PM BI, Big Data, Business Intelligence, Data, Data Warehouse, DBMS, IT, Semantech Inc., Stephen Lahanas, Technovation Talks, Understanding Data Warehouse Challenges

Last week we talked about some of the issues arising from the emerging field of Big Data. Today we're going to explore a more mature data solution and point out some of the challenges it has faced, some of which were instrumental in pushing the IT industry towards Big Data.

The Data Warehouse concept is built atop the notion that all data related to the enterprise can be captured and centrally or holistically managed. This is a powerful idea, yet there is more than one way to achieve that goal. The traditional view of the Enterprise Data Warehouse (EDW) attacked the problem from a very DBMS-centric perspective. This is primarily why EDW projects become so expensive, difficult and ultimately hard to adopt. The typical EDW approach attempted to gather all of the data related to the enterprise and place it into one massive repository structure. Whether this approach was attempted in chunks or as a “Big Bang” assault made little difference in the long run, as the byproducts of the practice were the same. Those byproducts included:

A more bureaucratic management approach to the data layer in general.
An added degree of separation between the data owners and the data developers.
A certain level of inflexibility with regard to how data was updated, corrected or otherwise transformed.
An added degree of separation between database developers and data exploitation developers.
An added degree of separation between database developers and application developers.
An inability to quickly respond to major changes in the business.
Dependence upon a sub-set of industry experts and equipment that is more expensive than the industry norm.
A higher cost associated with scalability in general.

It is worth examining some of the core concepts associated with Data Warehousing a little more closely to help understand why these outcomes tend to result from the traditional EDW approach.

DBMS Focus – At the time when EDWs became popular, other areas of data architecture were only just beginning to blossom. Today’s Business Intelligence platforms represent much more than mere reporting engines. Metadata management was only just beginning to be understood in the mid-1990s, and focus on Semantic technologies was virtually non-existent. The world according to DBMS in 1995 had a relational management system in the middle, with ETL feeding into it and reports coming out. This might be thought of as a three-layer, stove-piped database systems view of the data architecture.
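
To make the three-layer view concrete, here is a minimal sketch of ETL feeding a relational store with a fixed report coming out the other side. It uses Python's built-in sqlite3 module purely as a stand-in for the warehouse DBMS; the `sales` table, the sample rows and the function names are all assumptions made for illustration, not anything taken from a real EDW.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"region": " east ", "amount": "100.50"},
    {"region": "WEST", "amount": "20"},
    {"region": "East", "amount": "9.50"},
]


def extract():
    """Layer 1 (E): pull raw rows from the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Layer 1 (T): clean and conform rows before they reach the warehouse."""
    return [(r["region"].strip().lower(), float(r["amount"])) for r in rows]


def load(conn, rows):
    """Layer 2: store the conformed rows in the central relational schema."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def report(conn):
    """Layer 3: a fixed report query over the warehouse table."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database standing in for the DBMS
load(conn, transform(extract()))
totals = report(conn)
print(totals)
```

Note how each layer only knows about its immediate neighbor: the report cannot see the raw source rows, and any change to the cleaning rules means reloading the table. That is roughly the stove-piping described above.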

The Enterprise Single Instance – While consolidating like capabilities into marts or stores or some other ‘functional single instance’ approach has achieved quite a bit of success over the past two decades, attempting to manage all data in one structure has proven much more difficult. This is why the notion of Massively Parallel Processing (MPP) was needed to make it viable back in the 1990s. MPP in the context of proprietary hardware was expensive, though, and perhaps failed to recognize the power of networked processors on inexpensive hardware (i.e. the Google scalability model). The other key consideration here was the added steps that were needed in order to make such a system perform within reasonable parameters. So, the single instance enterprise faced and still faces major hurdles in terms of costs, manageability and performance.
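
The idea of spreading work across many inexpensive processors can be sketched roughly as follows. This is only an illustration under loud assumptions: the records are made up, and a local thread pool stands in for separate networked machines, so it shows the partition / compute locally / merge pattern rather than any real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records; in practice each partition would live on a different node.
RECORDS = [
    ("east", 5), ("west", 3), ("east", 2),
    ("north", 7), ("west", 1), ("north", 4),
]


def partition(records, n):
    """Split the records into n partitions, round-robin."""
    return [records[i::n] for i in range(n)]


def local_totals(part):
    """Work done independently on one node: sum amounts per key."""
    totals = Counter()
    for key, amount in part:
        totals[key] += amount
    return totals


def merge(partials):
    """Combine the per-node partial results into one answer."""
    combined = Counter()
    for p in partials:
        combined.update(p)  # Counter.update adds counts rather than replacing them
    return dict(combined)


# Threads stand in for cheap networked machines (an assumption of this sketch).
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(local_totals, partition(RECORDS, 3)))

result = merge(partials)
print(result)
```

Because each partition is processed without reference to the others, adding capacity in this model means adding another worker rather than buying a larger specialized machine.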

Data is no longer confined within the context of single systems

EDW Fallacies
If we were to directly challenge the core EDW assumptions and illustrate the fallacies associated with the philosophy, our list would resemble the following:

The Business will remain static over a relatively long period of time.
The Enterprise will remain static over a relatively long period of time.
That source data and data exploitation should not be managed synergistically, in other words that Decision Support or Business Intelligence solutions built on top of EDW source data should be viewed as separate, albeit related efforts.
That the data layer and the application layer can or should be viewed or designed separately.
That computer Hardware would not catch up to the processing load – i.e. that the data layer would always require specialized Massively Parallel Processing (MPP) in order to manage very large quantities of data. Furthermore, this assumption also implied the data would remain in a single instance data source. So instead of parallel processors deployed in specialized equipment, Big Data now uses the cheapest processors / equipment possible in a commodity approach, with data spread out in sets of distributed file systems. In fact this has been taken even further: this week the US Government announced the completion of the world's most powerful supercomputer, which achieved all of its latest gains by using commodity hardware (in this case GPUs, the video game processors widely available on the market).
That network architecture, data architecture, application / SOA architecture and enterprise architecture are separate.
That the Internet (Cloud) would not represent a viable mechanism for connecting to distributed data sources.
That unstructured data was not as valid as structured data (mainly because no mechanism existed to incorporate it into the traditional database management approaches).
That most major transformations need to occur before data is placed into the primary storage / management entity (i.e. DBMS, warehouse).
That there is a single version of the truth, period. This is perhaps the biggest fallacy behind all data warehouse, MDM and Governance solutions. Data can be managed, but it is dynamic and always will be. Viewing data as incontrovertible, orthodox truth immediately eliminates much of the value that data otherwise provides. Situations change, and every stakeholder views the whole from their own unique perspective. Yet, there can still be order in a relativistic environment (much as there is in the real world). This doesn't mean data cannot be standardized or managed - it merely takes into consideration the inevitable evolution that will occur.
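
The last two fallacies in the list can be illustrated with a small sketch: keep the records as captured and apply each stakeholder's transformation when the data is read, rather than forcing one transformation before storage. Everything here is hypothetical, including the order records, the `sales_view` and `finance_view` rules and the `VIEWS` registry; it is one possible reading of the point, not a prescribed design.

```python
# Raw records are stored as captured, with no single canonical transformation.
RAW_ORDERS = [
    {"id": 1, "amount": 120.0, "status": "shipped", "returned": False},
    {"id": 2, "amount": 80.0, "status": "shipped", "returned": True},
    {"id": 3, "amount": 50.0, "status": "pending", "returned": False},
]


def sales_view(orders):
    """Hypothetical Sales rule: every booked order counts as revenue."""
    return sum(o["amount"] for o in orders)


def finance_view(orders):
    """Hypothetical Finance rule: only shipped, non-returned orders count."""
    return sum(
        o["amount"]
        for o in orders
        if o["status"] == "shipped" and not o["returned"]
    )


# Each stakeholder's perspective is a managed, named view over the same data.
VIEWS = {"sales": sales_view, "finance": finance_view}

revenue = {name: view(RAW_ORDERS) for name, view in VIEWS.items()}
print(revenue)
```

Both figures are "true" for the stakeholder who asked, and both trace back to the same managed records. When a definition changes, only the view changes; the stored data does not have to be reloaded.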
