The diminishing value of a data set
Source: Dilbert (worth observing that this strip dates from 1993!)
Radar and sonar are incredible inventions because they allow us to perceive what cannot be seen with the naked eye. As technology has advanced and big data analysis has emerged, we have gone from a simple echo to high-quality resolution. However, the peak value of Radar is that it tells you something is there, which requires low resolution and very little data. As Radar resolution has improved, we can also get direction and speed, which takes a little more time. This new information definitely adds value to any reactive decision that has to be made. Identifying what the object actually is, through increased resolution, has incremental value, but not as much as knowing that it is there, which direction it is heading and at what speed. Such information can lead to a better decision, but suddenly there is an economic trade-off between the extra cost and the incremental improvement in outcome. Knowing the species of a bird or the manufacturer of a plane adds cost and size to the data set, but it adds no value in terms of the decision that has to be made.
Today, data scientists, enthusiasts and idealists are asking you to store vast quantities of data from your customers and ecosystem in the hope that, at some point, algorithms and AI technology will give you new, rich and deep insights: the equivalent, for Radar, of identifying the type of rivet used to build an inbound missile. Even when it is technically possible, we have to ask: at what point does extra resolution stop adding value?
“The more data you collect, the better the decisions you will be able to make in the future” is a lie that is driving boards to take on costs for data, and data risk, beyond any calculable return. “Collect all the data and store it, as you don’t know what we might discover” is hope and choice; it is not a rational business decision. The hope is wrapped up in a power pitch: our competitive advantage will be the discovery of, and ability to perform, magical acts of mind manipulation that will supercharge sales by controlling customers. Or perhaps it is fear that everyone else will have this and we will have no future: FOMO of the big data promise.
Investing in the future is a fantastic idea when collecting moon or Martian dust and storing it until our scientific tools and instruments improve, but for a company right now this is a lie. We are collecting and storing data in the hope that an algorithm will be able to mine more value than the combined cost of collection, storage and algorithm development. Right now, there is little evidence from all the work done so far that using data to change behaviour works at scale. This is a real cost today for “hope”, which the CFO has not found an ethical way to put on a balance sheet.
We have been sold a vision that the more data we have, the better our decisions and insights will be. Any data scientist or statistician will confirm that the resolution of a decision or insight does not substantially improve beyond a certain data set size, because sampling error typically shrinks only with the square root of the amount of data. You are not keeping and storing data for better decision making and insight today, just in the hope of new insights tomorrow. But the COO, CIO, CDO or CTO who is empire-building, and who has decided that size equates to power within your culture, depends on fear, uncertainty and doubt, and on directors who lack the skills to challenge this. The tech looks like magic, and every consulting company sees this as a revenue stream and writes reports to support this view.
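To make the diminishing-returns point concrete, here is a minimal sketch under one simplifying assumption: we are estimating a single proportion (say, the share of customers who respond to an offer) from a simple random sample. The standard 95% margin of error for that estimate is 1.96 × sqrt(p(1 − p)/n), so every tenfold increase in data buys only about a threefold gain in precision. The function name, the worst-case p = 0.5 and the sample sizes are illustrative assumptions, not figures from any real organisation.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    p = 0.5 is the worst case (widest margin). Illustrative only.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical data set sizes, each ten times larger than the last.
sizes = [100, 1_000, 10_000, 100_000, 1_000_000]

previous = None
for n in sizes:
    moe = margin_of_error(n)
    # Show how much precision each tenfold jump in data actually buys.
    gain = "" if previous is None else f"  (gain vs previous: {previous - moe:.4f})"
    print(f"n={n:>9,}  margin=±{moe:.4f}{gain}")
    previous = moe
```

Going from 100 to 1,000 records narrows the margin from roughly ±9.8% to ±3.1%, while going from 100,000 to 1,000,000 narrows it by only about two tenths of a percentage point, even though storage cost grows tenfold at each step. The sketch measures sampling noise only; it says nothing about whether the data is good.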
If you are a sceptic of the big data story, this article nicely aligns with your affinity and belief, and you may see your next action as stopping the ever-increasing data spend and focusing on traditional business. Yet, however difficult it is to understand, data is the right answer to business problems. The takeaway should be to ask: Where is the diminishing return on data for your organisation? How do you know the data is good? How do you know the analysis is good? How do you know whether what you are doing is the right thing to do? How do you incentivise the right actions?
Data is not the problem. Too much data will not create the return we hope for, given the cost we bear, but we have to stay focussed on the fact that data is essential. How do we make data work for most of the people most of the time, and not for a few people on the odd occasion?