Big data can mean many things to many people, so before we dive into the analysis I thought it pertinent to define big data and blockchain as they relate to this report.

According to Merriam-Webster, the definition of big data is:

An accumulation of data that is too large and complex for processing by traditional database management tools

According to Merriam-Webster, the definition of blockchain is:

A digital database containing information (such as records of financial transactions) that can be simultaneously used and shared within a large decentralized, publicly accessible network

 

How Big Is Big Data?

Big data assessments typically look at three criteria, Volume, Velocity and Variety, to decide whether a need requires a big data approach.

Volume: How much data there is (usually measured in terabytes and petabytes).

Velocity: How quickly it is coming into a datacenter, arriving in a database and/or being accessed.

Variety: What format(s) and data types are included.

There are no hard-and-fast rules regarding when big data kicks in. It is more a case of assessing needs and coming up with suitable long-term solutions that both meet the client's data needs now and allow for future growth and diversity.

It’s Too Much Captain, the Engines Cannot Take It

Most people who have watched a Star Trek, Star Wars or space adventure movie of any sort will recognize the moment when a ship has been attacked and is operating on limited capacity. The chief engineer is frantically trying to summon enough power from the engines to escape or make a hyper jump.

But they are struggling, and it seems like all hope is lost. Then, at the last minute, through some hitherto unknown energy source or genius technical solution, they find a way out with seconds to spare.

What has this got to do with data, I hear you thinking? Actually, more than you may think. Just as the engineer needs good, accurate data to get the ship out of danger, companies need their data to drive their path forward effectively and help them get to where they want to go.

How many companies can relate to a panic situation where they suddenly need data and cannot find it, or where a key decision needs to be made but no one can confirm the accuracy of the data? The consequences may be different, but the scenario of needing to get out of a bad situation quickly is not so far-fetched.


Big Data, Small World

Once upon a time, companies focused on their own internal structured data for running and growing their business. Now, with the advent and uptake of social media platforms and internet search engines, many companies are also pulling unstructured data from outside their organization, especially when trying to understand the big picture and optimize their business models.

The amount of information in the public domain is vast, but it can also be contradictory, duplicative and even untrue, so using that information requires a degree of diligence and verification.

Blockchain as an Enabler

This is where Blockchain can help, by providing an infrastructure where data can be stored with suitable controls applied to ensure it is verifiable, traceable and accessible to the right people at the right time. The client still needs to identify its data sources and which data it needs from them, but the consensus mechanism in Blockchain ensures that a process is in place for maintaining that data. Moreover, the ability to minimize duplication and establish a single source of truth within Blockchain can help stop ‘Big Data’ becoming ‘Even Bigger Data’.

Blockchain can handle a mix of structured and unstructured data well, and the ability to layer in smart contracts for specific needs enables many different data formats and approaches to be built in.
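To make the ideas of verifiable, traceable storage and duplicate control a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a real blockchain: there is no network and no consensus mechanism, and the names ToyLedger, fingerprint, add and verify are hypothetical. It simply shows how chaining hashes makes tampering detectable, how recording a source keeps each record traceable, and how a content hash can stop the same record being stored twice, whether the payload is a structured record or free text.

```python
import hashlib
import json


def fingerprint(payload):
    """Return a SHA-256 hex digest of a JSON-serializable payload."""
    # Sorting keys gives the same digest regardless of dictionary key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToyLedger:
    """Hypothetical append-only ledger (single process, no consensus)."""

    GENESIS_HASH = "0" * 64

    def __init__(self):
        self.blocks = []
        self._seen = set()  # content hashes already stored

    def add(self, payload, source):
        """Append a record; return False if the same payload is already stored."""
        content_hash = fingerprint(payload)
        if content_hash in self._seen:
            return False  # duplicate content, nothing is stored
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else self.GENESIS_HASH
        block = {
            "index": len(self.blocks),
            "source": source,  # where the data came from (traceability)
            "payload": payload,
            "content_hash": content_hash,
            "prev_hash": prev_hash,  # link to the previous block
        }
        # The block hash covers every field above, including the link.
        block["block_hash"] = fingerprint(block)
        self.blocks.append(block)
        self._seen.add(content_hash)
        return True

    def verify(self):
        """Recompute every hash and link; return True only if nothing changed."""
        prev_hash = self.GENESIS_HASH
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "block_hash"}
            if block["prev_hash"] != prev_hash:
                return False
            if block["content_hash"] != fingerprint(block["payload"]):
                return False
            if block["block_hash"] != fingerprint(body):
                return False
            prev_hash = block["block_hash"]
        return True


if __name__ == "__main__":
    ledger = ToyLedger()
    print(ledger.add({"customer": "A-100", "spend": 250}, source="crm"))
    print(ledger.add("Great service, would buy again", source="social"))
    # Same record with keys in a different order: treated as a duplicate.
    print(ledger.add({"spend": 250, "customer": "A-100"}, source="import"))
    print(ledger.verify())
```

In this sketch, changing any stored payload after the fact causes verify() to return False, which is the single-machine analogue of the integrity checks discussed above. A production system would add distributed consensus and access controls, neither of which is modeled here.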


Good vs. Evil

Data is good. Most people and companies can buy into that!

But, like many simple statements, when you dig a little deeper and start to ask some questions, the holes begin to appear and we start to realize that within the data world, just like in life, there is good and evil.

To help clarify why this may be so, the following table peels back the data layers a little by assessing some key data characteristics.

Disclaimer: Before you all start questioning this table or debating what is in the wrong category or missing, I should clarify that this is all highly subjective and dependent upon the company, the data and the situation.

| Data Characteristic | Good | Evil | Comments |
| --- | --- | --- | --- |
| Amount | Too much | Not enough | Neither is ideal but at least with too much you have the data somewhere if you can optimize it |
| Access | Regulated | Open to all | Most systems allow for both and most companies operate a hybrid model but security is more important than ever now |
| Integrity | Verified | No checks | Data must be accurate and trusted but this should not slow down the dataflow unnecessarily |
| Syntax | Structured | Unstructured | Unstructured data is harder to compile and analyze but can provide richer output and open up new communication and marketing channels |
| Location | Cloud | In house | Most companies are moving to the cloud as it is more scalable and reduces in house cost, technology dependency and responsibility |
| Variety | Diverse insights | Can be hard to assimilate | The availability of data from many sources via many mediums enables richer analysis and focus |
| Refresh rate | Real time | Scheduled | The rate of change dictates the refresh needs but generally the quicker the better |

From a Blockchain-centric perspective, it depends which of the data characteristics shown in the table are key. If the focus is data integrity via traceability, consensus, verification, transparency or a single source of truth, Blockchain really does have the edge over other database technologies.

But if the focus is on the amount of data and the speed of loading and access, the choice is less clear. Because Blockchain is still relatively in its infancy, it is still being optimized in terms of scale and speed.

The actual database model also comes into play. Remember that Blockchain typically works best in a decentralized database model, whereas a lot of SaaS systems and tools are designed for a centralized approach.

Many are predicting that Blockchain will come of age in the big data space as it matures.


The World is Changing and Data is Growing, Blockchain Can Help

As the global economy becomes ever more accessible and more companies partner with and acquire others to expand their client base, the need for compatible data that can be easily integrated increases.

The evolution of virtual working and ever-improving communication tools means location becomes less of an issue for many companies. From a data perspective, though, this can increase the challenges, especially around security and availability on a global basis. In addition, as regulations develop and mature, companies need to adjust their data strategies accordingly to ensure they stay compliant.

Big data just keeps getting bigger, but as Blockchain keeps getting smarter, the gap between the two is narrowing and the opportunity to partner to mutual benefit is becoming clearer.