Big data is just that – big, and a big opportunity as well. It may be the first time that the IT industry has named a new concept crisply, aptly and lucidly. But, as with all things IT, it still needs explaining.
For instance, what was missing in data-warehousing and business intelligence (BI) that is now possible with Big Data?
Wikipedia, as always, puts it best: big data consists of data sets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualization. The trend continues because working with larger and larger data sets allows analysts to “spot business trends, prevent diseases, combat crime.”
Fifteen years ago, when we were trying to understand what BI could do for us, this was the most quoted example: “Arranging beer cans next to baby diaper stands during weekends in a supermarket helped increase the sales of both!”
How could that be? The two items seem so unrelated. BI tools helped the retailer discover that on weekends, fathers of young children stopped by to pick up diapers, and that these same shoppers picked up beer for the weekend as well.
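To make the beer-and-diapers idea concrete, here is a minimal sketch of the kind of co-occurrence check a BI tool might run. The baskets are made up for illustration, and the `pair_lift` helper is a hypothetical name, not taken from any particular BI product. It computes the "lift" of each pair of items: how much more often the two are bought together than chance alone would suggest.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weekend baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "chips"},
    {"diapers", "milk"},
]


def pair_lift(baskets):
    """Return {(a, b): lift} for every pair of items bought together at least once."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each pair (items sorted so keys are stable).
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    lifts = {}
    for (a, b), together in pair_counts.items():
        # lift = P(a and b) / (P(a) * P(b)); a value above 1 means the pair
        # shows up together more often than independent purchases would.
        lifts[(a, b)] = (together / n) / ((item_counts[a] / n) * (item_counts[b] / n))
    return lifts


lifts = pair_lift(baskets)
print("beer + diapers lift:", round(lifts[("beer", "diapers")], 2))
```

With these toy baskets, beer and diapers come out with a lift of about 1.5, while bread and milk sit at about 1.0, which is no better than chance. A real retailer would run the same idea over millions of receipts, and a high lift alone would not prove that shelving the items together causes the extra sales.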
Now, look at what Big Data can do in today's context. In their widely quoted book “Freakonomics”, Steven Levitt and Stephen Dubner explain why crime in a certain state in the US went down at a point in time. It would be easy to credit policy, better policing, or improving social values for this change. Instead, the authors point to an unlikely event that happened a decade and a half earlier: abortion was made legal in that state at the time. Their inference is that, in a majority of cases, unwanted children are born out of wedlock to young parents in the lower strata of society who can ill afford to raise them. Such children would typically grow up to be young adults with fewer opportunities than life should offer them, which pushes many of them into a life of crime.
Look at the data needed to test this idea: the demographics not only of today's young adults but also of those who were teenagers about 15 years before the study, and crime rates in the state across all these years. On top of that come a study of the police force and a study of every policy change over the same period. These last two are needed just to eliminate them as causes.
This is where Big Data comes in. And, interestingly, this is where Cloud Computing comes in too.
Where are you going to store such data? How will you process it? Whether you are a single researcher, an academic institution or a private market research organisation, few can justify investing in IT storage and computing at that scale. Keep in mind that a single data set can contain anything from a few dozen terabytes to many petabytes of data, this being the current range of big data sizes. Cloud Computing, which allows you to buy compute power and storage space on a per-use basis, is the panacea.
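The per-use argument can be sketched with a little arithmetic. Every figure below is hypothetical and chosen only to show the shape of the comparison, not to reflect real hardware or cloud prices; `cumulative_costs` is an illustrative helper name.

```python
def cumulative_costs(months, upfront, monthly_upkeep, hourly_rate, hours_per_month):
    """Return (owned, rented) cumulative cost after `months` months."""
    # Owning: pay for the hardware on day one, then upkeep every month.
    owned = upfront + monthly_upkeep * months
    # Renting: pay only for the hours actually used.
    rented = hourly_rate * hours_per_month * months
    return owned, rented


# All figures are hypothetical assumptions for illustration.
UPFRONT = 120_000        # buying an in-house cluster
MONTHLY_UPKEEP = 1_000   # power, space and staff for that cluster
HOURLY_RATE = 10         # renting an equivalent cluster per hour
HOURS_PER_MONTH = 200    # hours of analysis actually run each month

for months in (3, 12, 36):
    owned, rented = cumulative_costs(
        months, UPFRONT, MONTHLY_UPKEEP, HOURLY_RATE, HOURS_PER_MONTH
    )
    print(f"after {months:>2} months: owned={owned:>7}, rented={rented:>6}")
```

Under these assumed numbers, a project abandoned after three months has cost 6,000 on a per-use basis against 123,000 for owned hardware. Ownership only catches up after many years of steady use, and only if the workload stays that steady.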
Now, more than ever, analysis needs to draw on data points beyond the obvious datasets such as sales figures or crime reports.
Earlier, we thought of data only in terms of size. But now Volume (or size) is only one of the Four Vs. The rate at which data changes, or is updated, is its Velocity. The same kind of data can vary from time to time, which gives us Variability. And, of course, data comes from many different kinds of data sets, which gives rise to Variety.
A very common input to analysis today is social behaviour, such as Facebook ‘likes’, status messages and tweets. These datasets are huge, and the velocity at which they grow makes them difficult to store and analyse. The whole idea of big data is to use all the possible dimensions for analysis.
And that gives us the answer to our first question: Big Data is big NOW because of what Cloud Computing offers. A state's meteorology department could have gone into big data with its own investments, and could well have done so three years ago rather than now. But a researcher with a small budget, or a start-up aiming to offer analytic services, need not sink all that money before discovering whether the idea is failing. As always, the theme for cloud computing remains the same: if you are failing, it helps you fail fast. And if you succeed, it is because of the cloud that you succeed!