Monday, October 29, 2012

The Trouble with "Big Data"

How can there be trouble with one of the two biggest trends in IT, you ask? Well, perhaps from a hype and marketing perspective there isn't any trouble yet. But from an expectations perspective, the trouble began nearly two years ago and has only gotten worse. And it starts with the name: it sounds simple, but is it?
Can you define "Big Data," and if so, would your definition match an industry-standard expectation?

What is Big Data, anyway? Well, this is where the trouble begins - it means something different to a fairly diverse set of interests. For some, Big Data implies use of a parallel processing paradigm (which BTW has been used for more than a decade in Data Warehousing as well), use of commodity hardware and a clever algorithm created by Google about a decade ago to help index the web. Much of this is now combined with the use of "Hadoop," although use of Hadoop doesn't always imply that companies will follow the same hardware path as some of the giants who pioneered the paradigm. In fact, more often than not the real market for commodity hardware is moving to the Cloud. But wait, aren't we talking about Big Data? What's the relationship between Big Data and the other biggest trend in IT today, Cloud Computing? Are they really two separate trends or variations of the same trend? The answer to that question is - who knows.

There are some other problems with Big Data; let's review them:
  1. It seems to encompass a wide range of emerging technologies, such as storage, parallel processing, cloud technology, high performance discovery, new DBMS paradigms and more. 
  2. The Use Cases for Big Data tend to blur into the same set of Use Cases for most enterprise data related functions. This wasn't always the case - Google's original exploitation of its technology was fairly narrow and unique to its business model / mission. It still isn't entirely clear how smaller enterprises will harness the newer Big Data capabilities - that clarity is vital - especially with regard to how to integrate them within the existing ecosystem.
  3. There is no universally accepted definition for what it represents, but just as important, there is no recommended solution approach or set of approaches or even a recommended solution methodology. The largest IT trade group dedicated to Data Management, DAMA, has barely scratched the surface as to how to integrate Big Data within the larger set of Data Management activities. Or should we assume that Big Data will somehow eventually swallow all of the rest of what we were viewing as Data Management?
Let's step back in time for a moment. Back in early 1999, I attended a technology conference in Washington D.C. that was convened to assess emerging technology trends for the next decade and beyond. One of the most interesting discussions that occurred during their main panel revolved around a question on how much bandwidth or data would be utilized in coming years. Recall that in 1999, having a Terabyte of memory in a DBMS was a big deal and few if any had DSL-like speeds for Internet access. The majority of the panel did not see any explosive growth happening in the foreseeable future. I disagreed - I countered that the demand had already been pent up and that a torrent of Digital content and communication would explode as soon as the hardware prices and bandwidth allowed. It's this exponential growth in data that the proponents of Big Data expound upon a lot these days (supposedly two thirds of all data ever created was generated in the last two years).

Well, guess what - that exponential data growth was merely a drop in the bucket compared to what's coming. And if that is truly the case, then we have to ask ourselves what this really means. Sure, we needed more affordable hardware and more affordable software to handle volume; we needed better algorithms and architecture to handle performance. The thing is though, we still haven't defined what this all means in relation to how we manage the enterprise. We still have, and continue to support, all sorts of legacy architectures and approaches - and now we've been handed a whole new set of challenges. But those challenges aren't just focused on bigger, cheaper, faster - we also have to deal with smarter, integrated and targeted. And we also have to become a bit more visionary when it comes to imagining what we can and should do with emerging worlds of data. That may take us right back to another set of technologies that have been emerging over the past decade right alongside Big Data - Semantic Technology.
So, let's ask ourselves again:
  1. Is Big Data about handling larger volumes of data faster?
  2. Is Big Data about making Data Management more efficient / less expensive?
  3. Is Big Data about harnessing the Cloud and Storage?
  4. Is Big Data about expanding Data Discovery to cover ever-increasing sets of data?
  5. Is it all of the above and / or something else?
Perhaps it's time for some more, or better, definition. Without better definition it will likely be difficult to understand what ROI you're shooting for or whether your organization is actually achieving it.

Copyright 2012, Semantech Inc. All Rights Reserved

