Big Data: a little introduction
I have been collecting information about Big Data for some time and introducing some notions about the subject in some of my courses, but today, while preparing a conference talk, I realized that it was a topic we had not yet mentioned on this page, despite being one of the most current trends in the industry.
By Big Data we mean exactly what its name implies: the treatment and analysis of huge data repositories, so disproportionately large that it is impossible to handle them with conventional database and analytical tools. The trend arises in an environment that will not sound strange at all: the proliferation of web pages, image and video applications, social networks, mobile devices, apps, sensors, the internet of things, etc., capable of generating, according to IBM, more than 2.5 quintillion bytes per day, to the point that 90% of the world's data has been created during the last two years. We are talking about an environment that is relevant in many respects, from the analysis of natural phenomena such as climate or seismographic data, to fields such as health, security or, of course, business. And it is precisely in business, where companies carry out their activity, that an interest is emerging that makes Big Data something like "the next buzzword", the word that we will surely hear coming from everywhere: technology vendors, tool makers, consultants, etc. At a time when most managers have never sat in front of a simple Google Analytics page, and are powerfully surprised when they see what it is capable of doing, along comes a panorama of tools designed to make sense of things that are immensely larger and more complex. Be afraid, very afraid.
What exactly is behind the buzzword? Basically, the evidence that current analysis tools are not up to the task of converting the data being generated into useful information for business management. If your company does not have a problem with data analytics, it is simply because it is not where it needs to be, or does not know how to obtain information from its environment: as soon as we add to traditional operations and transactions things such as an increasingly intense bi-directional interaction with clients and the web analytics activity generated by social networks of all kinds, we find a scenario in which not being there means a major disadvantage with respect to those who are. Quite simply, operating in the environment with the greatest data generation capacity in history requires adapting tools and processes: unstructured, unconventional databases that can reach petabytes, exabytes, or zettabytes, and that require specific treatment because of their storage, processing, or visualization needs.
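To get a feel for those orders of magnitude, here is a minimal sketch in Python that expresses a byte count in the largest convenient unit. It assumes decimal (SI) units, where each step is a factor of 1,000; the function name `humanize` and the unit list are illustrative choices, not part of any particular tool.

```python
# Illustrative helper: express a byte count in the largest convenient unit.
# Assumption: decimal (SI) units, where each step is a factor of 1,000.
UNITS = [
    "bytes", "kilobytes", "megabytes", "gigabytes",
    "terabytes", "petabytes", "exabytes", "zettabytes",
]

def humanize(num_bytes):
    """Return (value, unit) with value below 1,000 where possible."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that gives a value under 1,000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return value, unit
        value /= 1000

# The IBM figure quoted earlier: 2.5 quintillion bytes per day.
daily_value, daily_unit = humanize(2.5e18)
print(f"Per day: {daily_value:g} {daily_unit}")

# The same rate sustained over a year of 365 days.
yearly_value, yearly_unit = humanize(2.5e18 * 365)
print(f"Per year: {yearly_value:g} {yearly_unit}")
```

Under these assumptions, the daily figure comes out as a few exabytes, and a year at that rate approaches a zettabyte.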
Big Data was, for example, the star of the latest Oracle OpenWorld: the position adopted there is to offer huge machines with huge capabilities, massively parallel processing, unlimited visual analysis, heterogeneous data processing, etc. Developments such as Exadata and acquisitions such as Endeca support an offering based on thinking big, which some have not hesitated to question: in contrast to this approach, the reality is that some of the companies most focused on the issue, such as Google, Yahoo! or Facebook, and almost all startups, do not use Oracle tools and opt instead for an approach based on distributed, cloud and open source technologies. Open source means Hadoop, a hugely popular framework in this field that allows applications to work with huge data repositories and thousands of nodes, originally created by Doug Cutting (who named it after his son's toy elephant) and inspired by Google tools such as MapReduce or the Google File System. It also means NoSQL, the non-relational database systems needed to host and process the enormous complexity of the data of all kinds being generated, which in many cases do not follow the logic of the ACID guarantees (atomicity, consistency, isolation, durability) characteristic of conventional databases.
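To give an idea of the MapReduce model mentioned above, here is a minimal single-machine sketch in Python of the classic word count example. It is only an illustration of the map, shuffle and reduce steps under simplifying assumptions: it does not use Hadoop or any of its APIs, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. In a real cluster, each phase would be spread across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # In a distributed framework, each document (or block of a file)
    # would be mapped on a different node; here everything runs locally.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data big tools", "data everywhere"])
print(counts)
```

The appeal of the model is that the map and reduce functions know nothing about where the data lives, which is what lets a framework distribute the work over thousands of nodes.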
Looking ahead: a landscape of growing adoption, and many, many questions. Implications for users and their privacy, and for companies and the reliability or real potential of the results obtained: as MIT Technology Review says, great responsibilities. For the moment, one thing is for sure about Big Data: prepare your ears to hear the term.