You Probably Don't Have Big Data
Unless you're dealing with petabytes and exabytes, you probably just have "large data"
One of the hottest business buzzwords of 2012 was the phrase “Big Data”. I distinctly remember hearing the term for the first time in May or June of this year. Even though I didn’t fully understand how the phrase was being used, I knew pretty much instantly that it was one of those buzz phrases that were going to have a long life. I made a note to Google the term later that day to make sure I understood the new jargon.
What I learned from that short research session was that while big data is a growing phenomenon and a fascinating new business challenge, it is still relatively rare. My guess is that 99% of businesses will never have a “BIG” data set. Much of the buzz about big data is coming from folks who haven't had a solid foundational primer on the subject. Chances are, you probably don’t have big data.
What is Big Data?
Let me admit up front that once a term like this has entered the public lexicon, there probably can be no single authoritative definition. I think “Big Data” can be considered a fuzzy term that means different things to different people. I did a fresh round of research before writing this post, and I couldn’t find an originating instance of the term. No single individual seems to be credited with a first utterance. Looking at Google Trends, it appears the term started a slow creep towards popularity as early as 2011, but didn’t vault into the public consciousness until the middle of 2012. Query volume seems to have peaked by the end of 2012. Perhaps the term will mercifully fade as we enter 2013.
But I think most of us can live with settling for Wikipedia as a proxy for a common definition. Wikipedia now has a pretty comprehensive listing for Big Data. Here are a few key excerpts from the Wikipedia entry:
“Big data usually includes data sets with sizes beyond the ability of commonly-used software tools to capture, curate, manage, and process the data within a tolerable elapsed time”
Big Data, Wikipedia
“As of 2012, limits on the size of data sets that are feasible to process in a reasonable amount of time were on the order of exabytes of data.”
Big Data, Wikipedia
“Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data…”
Big Data, Wikipedia
As a refresher, here’s a quick reference chart providing a hierarchy of the multiples of bytes.
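For readers who prefer code to a chart, the same hierarchy can be sketched in a few lines of Python. This is only an illustration: it assumes decimal (SI) multiples, where each unit is 1,000 times the one before it, and the names `UNITS`, `bytes_in`, and `largest_unit` are made up for this example.

```python
# Decimal (SI) byte multiples: each step is 1,000 times the previous one.
# Assumption: SI units (powers of 1,000), not binary units (powers of 1,024).
UNITS = [
    "bytes",
    "kilobytes",
    "megabytes",
    "gigabytes",
    "terabytes",
    "petabytes",
    "exabytes",
]


def bytes_in(unit: str) -> int:
    """Return how many bytes make up one of the given unit."""
    return 1000 ** UNITS.index(unit)


def largest_unit(num_bytes: int) -> str:
    """Return the largest unit that num_bytes amounts to at least one of."""
    unit = UNITS[0]
    for candidate in UNITS:
        if num_bytes >= bytes_in(candidate):
            unit = candidate
    return unit


if __name__ == "__main__":
    # Print the hierarchy, one unit per line.
    for unit in UNITS:
        print(f"1 {unit[:-1]:<9} = {bytes_in(unit):>26,} bytes")
```

Calling `largest_unit` with the combined size of your own databases and documents gives a rough sense of which range you are really in.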
In my experience, most businesses are still operating in the gigabyte range. If you add up every database, every document, every digital asset that your company possesses, the odds are low that you’ve crossed into the terabyte range, let alone petabytes or exabytes.
I think that one of the reasons “Big Data” caught on so quickly is that there is a general sense that data sets are proliferating rapidly. The expectations of managers and business leaders are also rising. We all want more insights and better intelligence from the data around us. While that feeling may not justify use of the term “big data”, it does represent an objective reality that most of us can see.
So, how should we describe these challenges? Somehow phrases like “large data” and “data proliferation” just don’t have the sex appeal of “big data”. This is the vocabulary I try to use in a personal quest for accuracy, but I expect it to be a lonely one.
Why Should You Care?
Here at MicroPact, data is at the center of everything we do. Essentially we provide applications that manage and track data sets, even big ones. Our application design process starts with a data model. Once a data model is established, we apply rules, process, workflow and add feature sets like document management and reporting. We believe this is the fastest and most effective means of building really useful business solutions.
As you seek new ways to manage and leverage the growing data sets in your organization, I simply urge you to proceed with caution when you encounter people who throw that “big data” phrase around casually. They may be trying to sell you way more processing power than you need, or they simply don’t know what they’re talking about at all.