What are "Safe People and Safe Projects" for data sharing?


This started as a directed ideation from Ian Oppermann, NSW Chief Data Scientist and CEO of the NSW Data Analytics Centre. The challenge: how do you design a privacy-preserving data sharing framework based on these papers? They are well written and provide a very good framework for the question.

Focus: Safe People, Safe Projects

Within this context, what are a safe project and safe people for a privacy-preserving data sharing idea? Here “safe” means privacy preserving, ignoring other factors such as sensitivity, importance, ethics and outcomes.

Assumption 1. The reconstruction/ re-identification problem

PII (personally identifiable information) can be created from combining personal information


ALSO


PII (personally identifiable information) can be created from the union of different personal information sources (where PII has been removed from each data source)
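
To make the second statement concrete, here is a minimal sketch of a linkage across two sources. Everything in it is hypothetical and invented for illustration: the records, the field names, and the choice of postcode, birth year and sex as the linking fields. Neither table on its own ties a name to a diagnosis, but the union of the two does.

```python
# Hypothetical toy data, invented for illustration only.
# Source 1: health records with names removed.
health_records = [
    {"postcode": "2000", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2000", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "2010", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]
# Source 2: a separate list with names but no health data.
membership_list = [
    {"name": "Person A", "postcode": "2000", "birth_year": 1975, "sex": "F"},
    {"name": "Person B", "postcode": "2010", "birth_year": 1990, "sex": "M"},
]

# Assumption: these three fields appear in both sources.
LINK_FIELDS = ("postcode", "birth_year", "sex")


def link_key(row):
    """Build the tuple of linking fields for one row."""
    return tuple(row[f] for f in LINK_FIELDS)


def link(records, named):
    """Return (name, record) pairs where a named person matches exactly one record."""
    by_key = {}
    for record in records:
        by_key.setdefault(link_key(record), []).append(record)
    matches = []
    for person in named:
        candidates = by_key.get(link_key(person), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((person["name"], candidates[0]))
    return matches


reidentified = link(health_records, membership_list)
for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

In this toy case one person is re-identified even though each source looked safe in isolation, which is why the assumption treats the combination of sources, not each source, as the thing to assess.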

Assumption 2. The sensitivity problem

Removing or de-sensitising data makes reconstruction harder but reduces the value of the data set. So whilst the data becomes safer to share, the problem it was meant to address becomes harder to solve. We know that a safe place (such as a secure government site) reduces the sensitivity problem, but the output cannot be shared. The focus here is on safe sharing in uncontrolled places: secure and controlled settings are easier, within reason, but the uncontrolled case still needs to be worked out.

Assumption 3. How do we know what is safe?

· Safe because it is within a standard or framework (however, there is no safe data sharing standard as yet, which is the reason for this work)
· Safe because it follows rules and guidelines (however, these are very contextual, each case is different, and as yet there is only a very limited set of rules to follow)
· Safe as best practice based on a risk assessment (which has to determine all unintended consequences); but this is complex, requires expert skills, and the consequences will usually only become apparent in delivery

Assumption 4. What is the range of safe people? Each option raises conflicts!

· If highly skilled, it is easier for the person to do things they should not
· If unskilled, the person is easy to trick and can create problems
· If highly screened, this may remove those associated with the data, but it also removes those who understand it
· If connected to the data, the person may have a motivation that means they will solve the problem

Assumption 5. Five safes model



http://www.fivesafes.org/

Safe Project: Is this use of the data appropriate, lawful, ethical and sensible?
Safe People: Can the researchers be trusted to use it in an appropriate manner?

Commentary

When considering a safe project, what is the intended outcome and how is safe measured? Where is the plan, project or programme run, and by whom?
When there is a change to people, data, setting or output, what are the controls and where do they sit?
How does the project (programme/plan) define who the people are that should work on the delivery, or on governance?
The debate separated into two models/options:
Option 1: The project is the master control. It determines the plan, the day-to-day execution and the review. The project has a purpose which is given; this is fixed, and the team can vary other aspects to deliver the purpose. The project knows the intended outcomes, it has an objective, it reports to a sponsor, there is a known recipient for the output and there is governance. The project manager can manage any change needed to deliver, as long as the purpose does not change.
Option 2: The project is part of a bigger programme, and it cannot of itself decide on any changes. It is focussed only on the initial approval. Changes to the project are outside of its own controls.

Conclusion

Thinking about a layered model: a governance layer for programmes, where the purpose will not change, enables a project to manage as more detail and conflicts emerge. "Safe G" people are safe for governance, and their purpose is governance. "Safe D" people are safe for delivery of the project. And "Safe execution" admits just the people who are safe for that part of the plan.
This allows complexity to be uncovered layer by layer, rather than trying to solve all the problems in one place.


There can be 1 to n layers, depending on the complexity of the purpose.
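
One way to picture the layered model is the small sketch below. It is an illustration under stated assumptions, not a defined framework: the class names, the layer names, the people, and the rule that a purpose-changing request is referred to governance are all hypothetical choices made here to show the idea of a fixed purpose with a separate set of safe people per layer.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One layer of the model, with the people considered safe for it."""
    name: str  # hypothetical names: "governance", "delivery", "execution"
    safe_people: set = field(default_factory=set)


@dataclass
class Programme:
    """A fixed purpose plus 1 to n layers."""
    purpose: str
    layers: list

    def is_safe_for(self, person, layer_name):
        """True if the person is listed as safe for the named layer."""
        return any(
            layer.name == layer_name and person in layer.safe_people
            for layer in self.layers
        )

    def request_change(self, person, layer_name, changes_purpose=False):
        """Decide a change request raised by a person within a layer."""
        if not self.is_safe_for(person, layer_name):
            return "rejected"
        if changes_purpose:
            # Assumption: the purpose is never changed inside a layer.
            return "refer to governance"
        return "approved"


programme = Programme(
    purpose="privacy-preserving data sharing",
    layers=[
        Layer("governance", {"Sponsor"}),
        Layer("delivery", {"Project manager"}),
        Layer("execution", {"Analyst"}),
    ],
)

print(programme.request_change("Project manager", "delivery"))
print(programme.request_change("Project manager", "delivery", changes_purpose=True))
print(programme.request_change("Analyst", "governance"))
```

The point of the sketch is that "safe" is checked per layer: the same person can be safe for delivery without being safe for governance, and adding a layer is just one more entry in the list.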