Does data have a purpose?
We are continually moving towards better data-led decisions. However, without asking “What is the purpose of data?” (or “Does data have a purpose?”) of the data on which we base decisions and judgements, it is hard to know whether our north star (a good decision) is a good one. Why am I interested in this? Because I am focusing on how we do governance and oversight better in a data-led world.
I wrote a lengthy article on Data is Data. It was a kickback against the analogies that data is oil, gold, labour or sunlight; data is none of these. Data is unique; it has unique characteristics. That article concluded that the word “Data” is also part of the problem, and that we should think of data as if we were discovering a new element with unique characteristics.
For a while, the data community has rested on two key characteristics of data: non-rivalrous (which plays havoc with our current understanding of ownership) and non-fungible (which is true if you assume that data carries information). Whilst these are both accurate observations, they are not universal.
Non-rivalrous. Economists call an item that can only be used by one person at a time “rivalrous.” Money and capital are rivalrous. Data is said to be non-rivalrous because a single item of data can simultaneously fuel multiple algorithms, analytics, and applications. This is, however, not strictly true. What actually happens is that numerous perfect copies of the data are used simultaneously.
Non-fungible. When you can substitute one item for another, they are said to be fungible. One sovereign bill can be exchanged for another sovereign bill of the same value; one barrel of oil is the same as the next. So the thinking goes, data is non-fungible and cannot be substituted because it carries information. However, if your view is that data carries state (the particular condition that something is in at a specific time), then data is fungible.
I love Hugh’s work, and @gapingvoid nailed this. Data’s most basic representation is “state,” where it represents the particular condition something is in at a specific time. Information is knowing that there are different “states” (on/off). Knowledge is finding patterns and connections. Insight is knowing there is an exception to the current state. Wisdom is the journey. The point is that non-rivalrous and non-fungible are not good enough, as “data” is the mechanism for representing all these properties in a digital world.
Money as a framework to explore the purpose of data
Sovereign fiat currency, money in this setting, has two essential characteristics: it is rivalrous and fungible. Without these foundational characteristics, money cannot fulfil its purpose: to be a trusted medium of exchange. Money removes the former necessity of direct barter, where equal value had to be established and the two or more parties had to meet. What is interesting is that there are alternatives to fiat which exploit other properties. Because of fraud, we have to have security features, and there is a race to build the most secure wall.
[Just as a side note: money is an abstraction, and part of the rationale for a balance sheet was to try to connect the abstraction back to real things. Not sure that works any more.]
Revising the matrix but thinking about what problem is to be solved.
We are now adding data and other ideals to the matrix, as a different way to frame data.
These updates to the matrix highlight that, if data is non-rivalrous and non-fungible, it is very unclear what problem data is solving. Indeed, we see this all the time in the data market: because we cannot agree on what data is, it is messy.
The question for us as a data community is: “What are the axes [characteristics] that put data in the top corner of a matrix?” This is where data is a beautiful solution to a defined problem, given that data, at its core, is “share state.” We explored this question on a call with Scott David and proposed Rights and Attestation as the two axes.
Rights in this context means that you have gained rights from the Parties. What those rights are and how they were acquired is not the question; it is just that you have the rights you need to do what you need to do.
Attestation in this context is the evidence or proof of something. It is that you know what you have is true and that you can prove the state exists.
As we saw with the money example, data will never have these characteristics exclusively; it is just that when it has them, data is most purposeful. Without attestation, the data you have is compromised, and any conclusions you reach may not be true or real. We continually have to test both our assumptions and the provability of the data. Rights are different, as rights are not correlated with data quality. A business built without rights to the data it is using is not stable or sustainable. Whether and how those rights were obtained ethically are issues to be investigated. Interestingly, these characteristics would readily fit into a risk and audit framework today.
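To make the two axes concrete, here is a minimal sketch of how a data set might be placed on a rights/attestation matrix as part of a risk or audit check. It is only an illustration: the `DataSet` class, the `classify` function, the quadrant labels and the example data sets are all hypothetical names chosen for this sketch, and the labels simply echo the wording above (compromised without attestation, unstable without rights).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Hypothetical record describing a data set on the two proposed axes."""
    name: str
    has_rights: bool   # rights gained from the Parties, however acquired
    is_attested: bool  # the state can be evidenced or proven


def classify(ds: DataSet) -> str:
    """Return an illustrative quadrant label for the rights/attestation matrix."""
    if ds.has_rights and ds.is_attested:
        return "purposeful"
    if ds.has_rights:
        # Rights but no attestation: conclusions may not be true or real.
        return "compromised"
    if ds.is_attested:
        # Attestation but no rights: not stable or sustainable to build on.
        return "unstable"
    return "compromised and unstable"


# Made-up examples, one per quadrant.
examples = [
    DataSet("metered feed under contract", True, True),
    DataSet("figures copied from a public report", True, False),
    DataSet("verified records used without permission", False, True),
    DataSet("scraped data of unknown origin", False, False),
]

for ds in examples:
    print(f"{ds.name}: {classify(ds)}")
```

A real audit would need far more than two booleans (what the rights cover, how strong the attestation is), but even this coarse view makes it visible which data sets a decision is resting on.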
I have a specific focus on ESG, sustainability and better data for decision making, and better data for sharing. Most comparative ESG data comes from public reports (creative commons or free of rights) and, more importantly, there is a break in the attestation. ESG data right now is in the least useful data bucket for decision making, but we are making critical investment decisions on this analysis data set. It is something that we have to address.
In summary
If the purpose of data is “to share state” then the two essential characteristics data must have are rights and attestation. Further, as data becomes information (knowing state), knowledge (patterns of states), insight (issues in states) and wisdom, these characteristics of rights and attestation matter even more. If you are making decisions on data without knowing whether it is true or whether you have the rights to it, you are in a dangerous place.
As an aside, there are lots of technologies and processes to know if the state is true (as in correct, not truth); if the state sensing is working and the level of accuracy; if the state at both ends has the same representation (provenance / lineage); if it is secure; if we can gain information; if we can combine data sets and what the ontology is. But these are not so fundamental; they are supporting elements that make the ecosystem of data work.