What is big data, how is big data used and why is it essential for digital transformation and today’s data-driven business where actionable data and analytics matter most amidst rapidly growing volumes of mainly unstructured data across ample use cases, business processes, business functions and industries?
Big Data in a way just means “all data” (in the context of your organization and its ecosystem). And there is quite some data nowadays. The sheer volume of data we can tap into is dazzling and, looking at the growth rates of the digital data universe, it just makes you dizzy.
With the Internet of Things (IoT) and digital transformation having an impact across all verticals it goes even faster. More importantly: data has become a business asset beyond belief. So, better treat it well.
Originally, Big Data mainly was used as a term to refer to the size and complexity of data sets, as well as to the different forms of processing, analyzing and so forth that were needed to deal with those larger and more complex data sets and unlock their value.
Big Data is a term used to describe the large amount of data in the networked, digitized, sensor-laden, information-driven world (NIST)
Most people used to look at the pure volume and variety perspective: more data, more types of data, more sources of data and more diverse forms of data. The term today is also de facto used to refer to data analytics, data visualization, etc.
But data as such is meaningless, as is volume. What really matters is meaning, actionable data, actionable information, actionable intelligence, a goal and…the action to get there and move from data to decisions and…actions, thanks to Big Data analytics (BDA) and, how else could it be, artificial intelligence.
From volume to more volume but mainly to value
It’s easy to see why we are fascinated with volume and variety if you realize how much data there really is (the numbers change all the time, it truly is exponential) and in how many ways, formats and shapes it comes, from a variety of sources.
Consider the data on the Web, transaction logs, social data and the data which gets extracted from gazillions of digitized documents. Consider several other types of unstructured data such as email and text messages, data generated across numerous applications (ERP, CRM, supply chain management systems, anything in the broadest scope of suppliers and business process systems, vertical applications such as building management systems, etc.), geolocation data and, increasingly, data from sensors and other data-generating devices and components in the realm of IoT and mainly its industrial variant, Industrial IoT (and Industry 4.0, a very data-intensive framework).
Regardless of when you read this: if you think the volumes of data out there and in your organization’s ecosystem are about to slow down, think again. Big Data and the Internet of Things, along with the artificial intelligence needed to make sense of all that data, have only started to show a glimpse of their tremendous impact. For most technologies and applications, whether it concerns digital twins, predictive maintenance or even IoT as such (and related technologies enabling some of these applications; think AR and VR), it is still relatively early days.
The information opportunity of Big Data
So, the term has a technology and processing background in an increasingly digital and unstructured information age where ever larger data sets became available and ever more data sources were added, leading to a real data chaos.
However, just as information chaos is about information opportunity, Big Data chaos is also about opportunity and purpose. On top of that, the beauty of Big Data is that it doesn’t strictly follow the classic rules of data and information processes and even perfectly dumb data can lead to great results as Greg Satell explains on Forbes.
The mentioned increase of large and complex data sets also required a different approach in the ‘fast’ context of a real-time economy where rapid access to complex data and information matters more than ever. Just think about information-sensing devices that steer real-time actions, for instance. Or the increasing expectations of people in terms of fast and accurate information/feedback when seeking it for one purpose or another. Indeed, customer experience optimization, customer service and so on are also key goals of many big data projects.
To big insights and big decisions
Amid all these evolutions, the definition of the term Big Data, really an umbrella term, has been evolving, moving away from its original definition in the sense of controlling data volume, velocity and variety, as described in this 2001 META Group / Gartner document.
The renewed attention for Big Data in recent years was caused by a combination of open source technologies to store and manipulate data and the increasing volume of data. Add to that the various other 3rd platform technologies (of which Big Data, in fact Big Data Analytics or BDA, is part), such as cloud computing and mobile, and additional ‘accelerators’ such as IoT, and it becomes clear why Big Data gained far more than just some renewed attention and led to a broadening Big Data ecosystem as depicted below.
Today, and certainly here, we look at the business, intelligence, decision and value/opportunity perspective. From volume to value (what data do we need to create which benefit) and from chaos to mining and meaning, putting the emphasis on data analytics, insights and action.
A key question in that (predominantly unstructured) data chaos is which data we need to achieve one or more possible actions. The creation of value from data is a holistic process, driven by desired outcomes.
Big data is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making (Gartner)
With the Internet of Things happening and the ongoing digitization in many areas of society, science and business, the collection, processing and analysis of data sets and the RIGHT data is a challenge and opportunity for many years to come.
As such, the term Big Data is pretty meaningless or, better, as mentioned, it’s used as an umbrella term. And as is the case with most “trending” umbrella terms, there is quite some confusion. Analyzing data sets and turning data into intelligence and relevant action is key.
Big Data: a consequence and a catalyst
While Big Data is often misunderstood from a business perspective (again, it’s about using the ‘right data’ at the right time for the right reasons) and there are debates regarding the use of specific data by organizations, it’s clear that Big Data is a logical consequence of a digital age.
At the same time it’s a catalyst in several areas of digital business and society. Just one example: Big Data is one of the key drivers in information management evolutions and of course it plays a role in many digital transformation projects and opportunities.
The importance of Big Data and more importantly, the intelligence, analytics, interpretation, combination and value smart organizations derive from a ‘right data’ and ‘relevance’ perspective will be driving the ways organizations work and impact recruitment and skills priorities. The winners will understand the Value instead of just the technology and that requires data analysts but also executives and practitioners in many functions that need to acquire an analytical, let alone digital, mindset. A huge challenge, certainly in domains such as marketing and management.
The Vs of Big Data: Adding Value
On top of the traditional three big data ‘V’s’ IBM decided to add a fourth one as you can see in the illustration above.
Why not? In the end value is what we seek. And, sure, there is also value in data and information. It’s perhaps not as obvious as volume and so forth. Others added even more ‘V’s’. So you may see different variations on the same theme, depending on the emphasis of whoever added another V.
Volume strictly refers to the size of the dataset (with extensive datasets as one of the original characteristics). However, you’ll often notice that it is used to refer to the mentioned growth of data volumes, in the sense of all the data that’s being created, replicated, etc. (also see below: datasphere). Here we mainly talk about the sheer volume of data and information that gets created and about the infrastructure, processing and management of big data, be it in a selective way.
Velocity refers to the rate of data flow. It is about fast capture, processing, understanding, analysis and action, and about the speed and mechanisms with which large amounts of data can be processed for increasingly near-real-time or real-time outcomes, often leading to the need for fast data.
On top of the data produced in a broad digital context, regardless of business function, societal area or systems, there is a huge increase in data created on more specific levels. Variety is about the many types of data, being structured, unstructured and everything in between (semi-structured).
Veracity has everything to do with accuracy, which from a decision and intelligence viewpoint becomes certainty and the degree to which we can trust the data to do what we need/want to do. Indeed, it’s about good old GIGO (garbage in, garbage out). Or as NIST puts it: Veracity refers to the completeness and accuracy of the data and relates to the vernacular “garbage-in, garbage-out” description for data quality issues in existence for a long time.
As said we add value to that as it’s about the goal, the outcome, the prioritization and the overall value and relevance created in Big Data applications, whereby the value lies in the eye of the beholder and the stakeholder and never or rarely in the volume dimension. Per NIST, value refers to the inherent wealth, economic and social, embedded in any dataset. As long as you don’t call it the new oil.
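To make the veracity dimension a bit more tangible, here is a minimal sketch of how the two aspects NIST mentions, completeness and accuracy, could be scored on a small batch of records. Everything in it is an assumption for illustration only: the field names (sensor_id, temperature_c), the plausible temperature range and the two scoring functions are hypothetical and do not come from NIST or from any specific tool.

```python
from typing import Any, Callable

Record = dict[str, Any]


def completeness(records: list[Record], required_fields: list[str]) -> float:
    """Share of required field slots that are actually filled."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def accuracy(records: list[Record],
             validators: dict[str, Callable[[Any], bool]]) -> float:
    """Share of present values that pass their (hypothetical) validity rule."""
    checked = 0
    passed = 0
    for record in records:
        for field, is_valid in validators.items():
            value = record.get(field)
            if value in (None, ""):
                continue  # missing values count against completeness, not accuracy
            checked += 1
            if is_valid(value):
                passed += 1
    return passed / checked if checked else 0.0


# Hypothetical sensor readings: one missing value, one missing id, one implausible value.
records = [
    {"sensor_id": "A1", "temperature_c": 21.5},
    {"sensor_id": "A2", "temperature_c": None},
    {"sensor_id": "", "temperature_c": 999.0},
    {"sensor_id": "A4", "temperature_c": 19.0},
]
required = ["sensor_id", "temperature_c"]
# Assumed plausibility rule; a real rule depends on the sensor and the use case.
validators = {"temperature_c": lambda v: -50.0 <= v <= 60.0}

print(f"completeness: {completeness(records, required):.2f}")  # 0.75
print(f"accuracy:     {accuracy(records, validators):.2f}")    # 0.67
```

Scores like these say nothing about value yet; they only indicate how far the data can be trusted before it is used for analysis and decisions.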
Moving to high-value data and use cases
As mentioned a few times, organizations have been focusing (far too) long on the volume dimension of ever more – big – data. This isn’t too much of a surprise of course.
Volumes were and are staggering and getting all that data into data lakes hasn’t been easy and still isn’t (more about data lakes below, for now see it as an environment where lots of data are gathered and can be analyzed). At a certain point in time we even started talking about data swamps instead of data lakes. You can imagine what that means: plenty of data coming in from plenty of (ever more) sources and systems, leading to muddy waters (not the artist).
Having lots of data is one thing, having high-quality data is another and leveraging high-value data for high-value goals (what comes out of the water so to speak) is again another ballgame.
Big data is pouring in from across the extended enterprise, the Internet, and third-party data sources. The staggering volume and diversity of the information mandates the use of frameworks for big data processing (Qubole)
Fortunately, organizations started leveraging Big Data in smarter and more meaningful ways. Although data lakes continue to grow (to be sure, do note that Big Data and data science aren’t just about lakes; data warehouses and so on matter too), there is a shift in Big Data processing towards cloud and high-value data use cases.
This is happening in many areas. According to Qubole’s 2018 Big Data Trends and Challenges Report, Big Data is being used across a wide and growing spectrum of departments and functions. The business processes receiving most value from big data (in descending order of importance based upon the percentage of respondents in the survey for the report) include customer service, IT planning, sales, finance, resource planning, IT issue response, marketing, HR and workplace, and supply chain.
In other words: pretty much all business processes. As mentioned in an article on some takeaways from the report, the shift to the cloud leads to an expansion of machine learning programs (machine learning or ML is a field of artificial intelligence) in which enhancing cybersecurity, customer experience optimization and predictive maintenance, a top Industry 4.0 use case, stick out.
More departments, more functions, more use cases, more goals and hopefully/especially more focus on creating value and smart actions and decisions: in the end it’s what Big Data (analytics) and, let’s face it, most digital transformation projects and enabling technologies such as artificial intelligence, IoT and so on are all about.
The global datasphere
A comprehensive overview of the growth of the global datasphere is offered each year by research firm IDC.
In Data Age 2025, the company forecasts that by 2025 the global datasphere will have grown to 175 zettabytes of data created, captured, replicated etc. per year. Here the data generated by ever more IoT devices are included. They are expected to create over 90 zettabytes in 2025.
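To put these two forecast figures in proportion, the short calculation below derives the share of the 2025 datasphere that would come from IoT devices. It is only a back-of-the-envelope sketch: it treats "over 90 zettabytes" as exactly 90, so the result is a lower bound, and it assumes the decimal definition of a zettabyte (10^21 bytes). The variable and function names are made up for this example.

```python
# Forecast figures as quoted in the text (IDC, Data Age 2025).
DATASPHERE_2025_ZB = 175   # total datasphere in zettabytes
IOT_2025_ZB = 90           # "over 90" in the text, so this is a lower bound

BYTES_PER_ZB = 10**21      # decimal (SI) zettabyte, an assumption
BYTES_PER_TB = 10**12      # decimal (SI) terabyte, an assumption


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


iot_share = share(IOT_2025_ZB, DATASPHERE_2025_ZB)
datasphere_tb = DATASPHERE_2025_ZB * BYTES_PER_ZB // BYTES_PER_TB

print(f"IoT share of the 2025 datasphere: at least {iot_share:.1%}")  # 51.4%
print(f"175 ZB expressed in terabytes: {datasphere_tb:,}")  # 175,000,000,000
```

In other words, if both forecasts hold, more than half of the data in the 2025 datasphere would be generated by IoT devices.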
The continuous growth of the datasphere and big data has an important impact on how data gets analyzed whereby the edge (edge computing) plays an increasing role and public cloud becomes the core.
Where do organizations focus their Big Data efforts?
Obviously analytics are key. However, which Big Data sources are used to analyze and derive insights?
In 2012, IBM and the Saïd Business School at the University of Oxford found that most Big Data projects at that time were focusing on the analysis of internal data to extract insights. Among the internal data sources, the majority (88 percent) concerned the analysis of transactional data, 73 percent log data and 57 percent emails.
Fewer businesses were busy looking at external big data, from outside their firewalls, which are mainly unstructured (as are most internal sources) and offer ample opportunities to gain insights too (e.g. sentiment analysis).
By now this picture has probably changed, and of course it also depends on the goal and the type of industry/application. With the network perimeters fading, the ongoing development of initiatives in areas such as the Internet of Things and increasing BDA maturity, we would like to see a detailed update indeed.
Big data used to mean data that a single machine was unable to handle. Now big data has become a buzzword to mean anything related to data analytics or visualization (Ryan Swanstrom)
More about Big Data and its evolutions and applications
Smart data: beyond the volume and towards the reality
With increasing volumes of mainly unstructured data comes a challenge of noise within the sheer volume aspect.
In order to achieve business outcomes and practical outcomes to improve business, serve customers better, enhance marketing optimization or respond to any kind of business challenge that can be improved using data, we need smart data whereby the focus shifts from volume to value.
Fast data: speed and agility for responsiveness
In order to react and pro-act, speed is of the utmost importance.
However, how do you move from the – mainly unstructured – data avalanche that big data really is to the speed you need in a real-time economy? Fast data is one of the answers in times when customer-adaptiveness is key to maintain relevance.
Big data analytics: making smart decisions and predictions
As anyone who has ever worked with data knows, even from before we started talking about big data, analytics are what matters.
Without analytics there is no action or outcome. While smart data are all about value, they go hand in hand with big data analytics. In fact, big data analytics, and more specifically predictive analytics, was the first technology to reach the plateau of productivity in Gartner’s Big Data hype cycle.
Unstructured data: adding meaning and value
The largest and fastest growing form of information in the Big Data landscape is what we call unstructured data or unstructured information. Coming from a variety of sources it adds to the vast and increasingly diverse data and information universe.
To turn the vast opportunities in unstructured data and information (ranging from text files and social data to the body text of an email) into value, meaning and context need to be derived. This is what cognitive computing enables: seeing patterns, extracting meaning and adding a “why” to the “how” of Big Data.
What makes (Big) data actionable?
Without intelligence, meaning and purpose data can’t be made actionable in the context of Big Data with ever more data/information sources, formats and types.
Moreover, there are several aspects of data which are needed in order to make it actionable at all. Whether it concerns Big Data or any other type of data, actionable data for starters is accurate: the data elements are correct, legible and valid. A second aspect is accessibility, which comes with several modalities as well. Other dimensions include liquidity, quality and organization.
Big data in customer service
Today’s customers expect good customer experience and data management plays a big role in it.
Making sense of data from a customer service and customer experience perspective requires an integrated and omni-channel approach whereby the sheer volume of information and data sources regarding customers, interactions and transactions needs to be turned into something that makes sense for the customer, who expects consistent and seamless experiences, among others from a service perspective.
Solving the Big Data challenge with artificial intelligence
Roland Simonis explains how artificial intelligence is used for Intelligent Document Recognition and the unstructured information and big data challenges.
Among the AI methods he covers are semantic understanding and statistical clustering, along with the application of the AI model to incoming information for classification, recognition, routing and, last but not least, the self-learning mechanism.
Data lakes for Big Data Analytics
Traditional methods of dealing with ever growing volumes and variety of data in the Big Data context no longer sufficed. That’s where data lakes came in.
Data lakes are repositories where organizations strategically gather and store all the data they need to analyze in order to reach a specific goal. Neither the nature and format of the data nor the data source matters in this regard: semi-structured, structured, unstructured, anything goes. The data lake is what organizations need for BDA in a mixed environment of data, with Hadoop as a well-known solution in this space. However, there are challenges to this model as well, and data lakes as we know them are not a universal answer for all analytics needs.
Big Data: order from chaos
While, as mentioned, the predictions have often changed by the time they are published, below is a rather nice infographic from the people at Visual Capitalist which, on top of data, also shows some cases of how it gets used in real life.
Check out the ‘creating order from chaos’ infographic below or see it on Visual Capitalist for a wider version.
Top image: Shutterstock – Copyright: Melpomene – All other images are the property of their respective mentioned owners.