What is Big Data?

Big Data is not just about quantifying the GBs or TBs of data being processed; whether data is big is relative to your needs and challenges. Big Data is about figuring out how to listen to all of your data (whether you currently are or not) and asking, “What’s the story?”

I was cleaning out my inbox last week, and was struck by the sheer volume of email marketing I receive containing the term ‘Big Data’. While volume is one of the proverbial 3 V’s of Big Data (more on this in a minute), I don’t believe they were referring specifically to the Big Data-related contents of my inbox. That said, email is certainly a target for Big Data analysis, but I’m getting ahead of myself.

In the interest of extracting some value out of the otherwise mundane chore of cleaning out my inbox, I’ll use this as a chance to share the key points I’ve gleaned from this pile of email marketing, tempered with my personal experiences, and infused with the perspective of other industry experts. But the question remains: what is Big Data?

So What is Big Data?

You’ll see many definitions of Big Data, and each is correct in its own way. Oftentimes, though, I find these definitions lack the comprehensiveness I want. Let’s build one up modularly, beginning with the proverbial 3 V’s of Big Data.

Volume

If it’s more data than you can handle, consider it “Big”.

Obviously we’re talking about the amount of data. The volume of data in existence continues to grow, and that growth is accelerating as data are generated by more sources, as previously unmeasured devices become instrumented, and as the flood of tweets, likes, posts, reviews, pins, tags, clicks, check-ins, downloads, etc. continues unabated.

The precise amount of data as measured in terabytes or petabytes is less important to the definition than is the notion of the volume of data with potential utility being much larger than your current capabilities.

A strand of DNA! Talk about Big Data.

Velocity

If your data is faster than you can currently handle, consider it “Big”.

Along with the growth in the number of data-creating sources, there have been significant increases in the speed at which data is being created. Much of the data created today (most of which could provide utility to the business if harnessed properly) comes from improvements in instrumentation, e.g., website clickstream data, mobile GPS data, machine sensor data, RFID tag data, network device logs, etc.

IBM used a term in one of their eBooks that I happen to like: “data exhaust”. The authors describe it by saying these types of data are, “generated in huge amounts (often terabytes per day) but typically isn’t tapped for business insight.”

This analogy resonated with me because of the car I happened to drive in high school (1985 Dodge Omni GLH-T, watch here, starting at 0:56). This odd-looking car’s rapid acceleration was accomplished via a turbocharger. For those unfamiliar, a turbocharger harnesses the engine’s exhaust gases to spin a turbine, which in turn forces more air into the combustion chamber; more air means more fuel can be burned, so each stroke of the piston generates more power.

Bringing this back to the data world, if you are able to utilize your “data exhaust” through Big Data techniques (consider this the turbocharger), it is possible to accelerate your business. As with Volume, consider this term relative to your current capabilities.

Variety

In relative terms, if your data is broader than you can handle, consider it “Big”.

The last V speaks to the growing diversity in the types and structure (or lack of structure) of data available today.

We’re no longer talking only about data stored in a data warehouse; there is also opportunity trapped within our growing volumes of semi-structured data (like JSON or XML) and unstructured data (like email, other text content, video, sound, images, etc.). Big Data techniques can be used to find the needles (insights) in the haystack (data) by combining data of various structures to yield the insights that will drive the best next action for the business.
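To make "combining data of various structures" a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the sample orders, clicks and emails, the field names, and the helper build_profiles are hypothetical, and a real Big Data pipeline would use far more capable tooling than a keyword match. The point is only to show structured (CSV), semi-structured (JSON) and unstructured (email text) data being joined on a shared customer ID.

```python
import csv
import io
import json

# Hypothetical sample data, invented purely for illustration.
ORDERS_CSV = "customer_id,order_total\n101,250.00\n102,75.50\n"  # structured
CLICKS_JSON = json.dumps([  # semi-structured
    {"customer_id": 101, "page": "/pricing"},
    {"customer_id": 101, "page": "/support"},
    {"customer_id": 102, "page": "/pricing"},
])
EMAILS = {  # unstructured free text
    101: "I would like to cancel my subscription.",
    102: "Thanks, great service!",
}


def build_profiles(orders_csv, clicks_json, emails):
    """Merge three differently structured sources into one record per customer."""
    profiles = {}
    # Structured: tabular rows with a fixed set of columns.
    for row in csv.DictReader(io.StringIO(orders_csv)):
        profiles[int(row["customer_id"])] = {
            "order_total": float(row["order_total"]),
            "pages": [],
            "mentions_cancel": False,
        }
    # Semi-structured: JSON events attached to the matching customer.
    for click in json.loads(clicks_json):
        profile = profiles.get(click["customer_id"])
        if profile is not None:
            profile["pages"].append(click["page"])
    # Unstructured: a naive keyword check stands in for real text analysis.
    for customer_id, text in emails.items():
        if customer_id in profiles:
            profiles[customer_id]["mentions_cancel"] = "cancel" in text.lower()
    return profiles


profiles = build_profiles(ORDERS_CSV, CLICKS_JSON, EMAILS)

# The "needle": customers who visited support AND wrote about cancelling.
at_risk = [
    customer_id
    for customer_id, profile in profiles.items()
    if profile["mentions_cancel"] and "/support" in profile["pages"]
]
print(at_risk)  # prints [101]
```

No single source flags customer 101 on its own; the signal only appears once the three sources are read together, which is the idea behind the needle-in-the-haystack comparison.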

Other V’s

While the original incarnation only included the 3 V’s of Volume, Velocity and Variety, many folks have infused their own perspective into the definition by adding other V’s to the mix:

  • Veracity: IBM uses this fourth V to capture the degree to which data can be trusted.
  • Variability: Forrester offered up this V to reference the variability in the meaning of the data.
  • Visibility: This refers to the core business fundamentals that should remain the focus of Big Data initiatives.
  • Verification: This term refers to the manner in which Veracity is delivered; in other words, verifying the data will yield trustworthy data.
  • Value: Tibco, Oracle, and others talk about the fourth V as the Value of the output delivered using Big Data.

It’s Relative…

As I think about the definition of Big Data, I find it is less about being precise in the volume of data being handled, the speed at which data are being brought into the enterprise, or the breadth of information types available for consumption.

It is more about appreciating all of the elements I listed above, along with a host of others (Gartner now officially references 12 dimensions, grouped into Quantification, Qualification and Management & Control of Data), and then assessing them in the context of your current reality.

Said another way, Big Data is thinking about all of the potential sources of a business’s data and the potential utility they could provide relative to where you are today. I’d call this your Big Data “Viewpoint”. (Sorry, couldn’t resist the opportunity for additional alliteration.)
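For readers who think in code, the "it's relative" idea can be sketched as a simple comparison between the data available to you and what you can currently handle. This is only an illustration under my own assumptions: the DataProfile class, the units (terabytes, gigabytes per day, number of formats) and the function big_dimensions are hypothetical and not taken from any vendor's definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataProfile:
    """Rough measurements along the 3 V's; the units are illustrative assumptions."""
    volume_tb: float            # Volume: total data with potential utility, in TB
    velocity_gb_per_day: float  # Velocity: rate at which new data arrives
    variety_formats: int        # Variety: number of distinct formats/structures


def big_dimensions(available: DataProfile, capacity: DataProfile) -> List[str]:
    """Return the V's where the available data exceeds current capabilities."""
    checks = {
        "Volume": available.volume_tb > capacity.volume_tb,
        "Velocity": available.velocity_gb_per_day > capacity.velocity_gb_per_day,
        "Variety": available.variety_formats > capacity.variety_formats,
    }
    return [name for name, exceeded in checks.items() if exceeded]


# Hypothetical business: modest volume, but fast-arriving and varied data.
available = DataProfile(volume_tb=5, velocity_gb_per_day=40, variety_formats=6)
capacity = DataProfile(volume_tb=20, velocity_gb_per_day=10, variety_formats=3)

print(big_dimensions(available, capacity))  # prints ['Velocity', 'Variety']
```

Note that this hypothetical business has only 5 TB of data, well within its capacity, yet its data is still "Big" on two of the three dimensions. The absolute numbers matter less than the gap between the data and the capability.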

So What’s the Story?

Summing it up, whether it is stationary in a data warehouse or in motion in the form of a clickstream, your data can tell you a story. Big Data is about figuring out how to listen to all of your data (whether you currently are or not) and asking, “What’s the story?”

In future posts in this Big Data blog series, my colleagues and I will delve into many of the potential uses of Big Data. We will try to dispel some common myths, and we will offer our perspective on how to get your Big Data initiatives underway.

Now, back to my inbox.
