Why data will be so valuable in the future

‘Data is the new oil’ is an attempt to convey how valuable data is. The analogy works insofar as data, like oil, isn’t valuable in its raw form: it has to be processed to become something more valuable, and it is valuable because so much can be done with it. But there the analogy breaks down. Oil is a finite resource, and once refined it cannot be turned back into crude. Data is generated constantly and continuously, and using it does not use it up. The same data can be reused repeatedly in different ways.

In many ways data is unique and non-analogous. Data is both what computers process and what they use to run the processing; nothing else in the world uses fundamentally the same thing to do the processing and to be processed. Data is also an abstraction of the real world, which means every piece of data is interrelated with every other. Nothing else in the world is as connectable.

Conceptually, all data is connected. Someone buys a desk. The measurements of the desk are on a spec sheet used by the manufacturer, along with the safety standards the desk meets and the materials it’s made of. The manufacturer holds data on the employees who built the desk, including how much they were paid and when. Each employee’s bank also has data on how much and when they were paid, and on what they spent their money on. The retailer they spent their money with knows some of what they did with it, including that they bought a desk. But because all these datasets are independent, no one person or system sees how they connect.

As more data becomes interconnected, its usefulness increases exponentially. But to achieve that interconnectedness and make datasets useful, collection, storage and processing have to be decoupled from each other. When competitive advantage comes from collecting, storing and processing smaller, specific datasets that organisations use to draw insights relevant only to themselves, interconnection is prevented. If data collected from numerous sources is instead stored in a way that is equally available to everyone, then competitive advantage can only come from processing. Organisations with the capability to draw insights from the analysis of huge aggregated datasets win out, but an intermediary is required to store the data and prevent monopolisation.

Data Trusts would work like a bank, but for data rather than money. Just as no organisation keeps the money it makes, nor would it keep the data it collects. Industry standards would standardise data collection, and laws would make it illegal for organisations to store data themselves. Data would be held by these Data Trusts and made available only to those that contribute their own data. Anonymised data would be accessible in real time for processing, so organisations could draw insights and make decisions that take account of an unimaginably large number of data points.

Data Trusts would specialise in particular types of data (retail, health, manufacturing, etc.), creating a further layer of anonymisation and aggregation for organisations wishing to correlate datasets. Interesting new commercial models would develop around the cost of accessing data, taking account of increasing-returns mechanisms and the decay of relevance.

Data has a half-life 

Every piece of data that exists about a person, their behaviour, and any prediction has a half-life. Relevance decays over time.

A name, for example, might have a half-life of about fifty years. In a hundred years I’ll be dead, and my name will be only half as relevant as it was fifty years earlier, when I was alive. My search history could have a half-life of between two weeks and two minutes. If I’m trying to find my nearest petrol station, the results are probably most relevant between now and when I put some fuel in my car; ten minutes later their relevance has halved and they are only useful for aggregating with other behavioural data. A year’s worth of transactional data about what I’ve bought in each of the past fifty-two weeks might have a half-life of five years if my purchase habits stay the same, but as those habits are likely to change over time, the dataset would also change, with certain points decaying faster than others if I stopped buying certain items.

Understanding how each piece of data has its own half-life, and how relevance decays over time based on that half-life, can help companies provide better personalisation, and could be a means of deciding when data should be deleted to conform with evolving data protection regulations.
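The decay described above can be modelled as simple exponential decay, where relevance halves with every elapsed half-life. The sketch below is a minimal illustration of that idea; the function names, example half-lives and deletion threshold are assumptions chosen for the example, not fixed values from the text.

```python
def relevance(age, half_life, initial=1.0):
    """Relevance of a data point after `age` time units have passed,
    given a half-life in the same units (exponential decay)."""
    return initial * 0.5 ** (age / half_life)

def should_delete(age, half_life, threshold=0.01):
    """Flag data for deletion once its relevance falls below a
    (hypothetical) business or regulatory threshold."""
    return relevance(age, half_life) < threshold

# Illustrative half-lives from the text: a search query (~10 minutes),
# a year of purchase data (~5 years), a name (~50 years).
print(relevance(10, 10))       # search query, 10 minutes old -> 0.5
print(should_delete(100, 10))  # 10 half-lives elapsed -> True
print(should_delete(25, 50))   # a name, still highly relevant -> False
```

The only requirement is that `age` and `half_life` use the same time unit, so the same function covers data that decays in minutes and data that decays over decades.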