Information governance plays a critical role in enabling the data economy. Working with data carries significant risks because of legislative, reputational and commercial concerns. Amazon, for example, was fined $886m for an alleged breach of data protection rules, while Google DeepMind’s use of health data has caused significant reputational damage for the company. Information governance aims to mitigate those risks as far as possible.
Most current approaches to information governance can be thought of through the lens of data-based access control policies, which restrict which data can be used. One example is data minimisation, a risk mitigation approach set out as a principle in regulations like the GDPR. The idea is that only the minimum amount of data needed for a particular purpose should be kept, and that users should be restricted to accessing only the data they need.
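As a minimal sketch of a data-based policy, the snippet below assumes a hypothetical per-role allowlist of columns and strips every other field before the data reaches the user. The role names, column names and the `minimise` helper are illustrative, not part of any particular product.

```python
from typing import Dict, List, Set

# Hypothetical data-based policy: the columns each role is allowed to see.
ALLOWED_COLUMNS: Dict[str, Set[str]] = {
    "billing_analyst": {"customer_id", "amount"},
    "marketing_analyst": {"region", "amount"},
}


def minimise(rows: List[Dict[str, object]], role: str) -> List[Dict[str, object]]:
    """Return copies of the rows containing only the columns allowed for `role`."""
    allowed = ALLOWED_COLUMNS.get(role)
    if allowed is None:
        raise PermissionError(f"no data policy defined for role {role!r}")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


records = [
    {"customer_id": 1, "name": "Ada", "region": "UK", "amount": 40},
    {"customer_id": 2, "name": "Alan", "region": "UK", "amount": 25},
]
print(minimise(records, "marketing_analyst"))
# [{'region': 'UK', 'amount': 40}, {'region': 'UK', 'amount': 25}]
```

Note that the policy says nothing about what the marketing analyst then does with `region` and `amount`; it only limits which fields they receive.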
Data minimisation reduces the risk associated with data processing because the entity using the data only has access to the data it needs. It does not, however, address the legality of the processing in question.
In regulations like the GDPR, lawful processing generally requires a lawful basis for the processing. This has little to do with what data is being processed and much more to do with the purpose of the processing. The GDPR, for example, sets out six lawful bases, commonly referred to as consent, contract, legal obligation, vital interests, public task and legitimate interests. Consent requires that “the data subject has given consent to the processing of his or her personal data for one or more specific purposes”. The remaining five lawful bases all require that “processing is necessary for” some purpose. It is immediately clear that the legality of processing is determined largely by its purpose, not by which data is processed. The importance of purpose is also highlighted by the definition of a data controller: the entity that “determines the purposes and means” of processing.
What does that mean for information governance? The fundamental concern of legislation is not which data is processed, but why it is being processed. Data-based governance is important, but it does not tackle that fundamental concern: how, and for what purpose, the data is being used.
The purpose of processing is also the more critical concern in commercial settings, independent of any legislation. For example, allowing someone to evaluate the performance of a machine learning model on one’s proprietary data has fundamentally different commercial implications from allowing them to train a model on that same data. The purpose and type of processing are critical.
Usage-based access control policies
Usage-based access control policies are a new type of information governance policy that provide technical controls over how data is used. Where data-based policies restrict which data is processed, usage-based policies restrict how it is processed. The figure below gives a high-level comparison of the two in the context of the classic Data > Processing > Insights pipeline.
A benefit of usage-based policies is that they allow the data controller to keep the data within its own infrastructure. Any processing request from a given user must then run within that infrastructure, where it can be checked to confirm that the user is authorised to run that type of processing.
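A minimal sketch of that check, assuming a hypothetical controller that exposes a fixed set of named algorithms and a per-user allowlist of which ones may be run. The data itself never leaves the controller; only results are returned. All names here (`ALGORITHMS`, `USAGE_PERMISSIONS`, `run`) are illustrative.

```python
from typing import Callable, Dict, List, Set

# Hypothetical algorithms the data controller is willing to run on its own infrastructure.
ALGORITHMS: Dict[str, Callable[[List[int]], float]] = {
    "count": lambda xs: float(len(xs)),
    "mean": lambda xs: sum(xs) / len(xs),
}

# Hypothetical usage-based policy: which algorithms each user may run.
USAGE_PERMISSIONS: Dict[str, Set[str]] = {
    "alice": {"count"},
    "bob": {"count", "mean"},
}

# The data stays inside the controller; callers only ever receive results.
_DATA: List[int] = [2, 4, 9, 7]


def run(user: str, algorithm: str) -> float:
    """Run a named algorithm on the controller's data if `user` is authorised for it."""
    if algorithm not in USAGE_PERMISSIONS.get(user, set()):
        raise PermissionError(f"{user!r} may not run {algorithm!r}")
    return ALGORITHMS[algorithm](_DATA)


print(run("bob", "mean"))     # 5.5
print(run("alice", "count"))  # 4.0
```

Users pick from algorithms the controller implements rather than submitting arbitrary code, so the permission that is checked is the processing that actually runs. Calling `run("alice", "mean")` raises `PermissionError`.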
As a risk-mitigation approach, usage-based policies allow data controllers to reduce the risk that data is used for an unintended purpose. By analogy with the data minimisation principle in data protection legislation, usage minimisation would restrict the use of a set of data to the limited set of uses required for a given purpose. This is a significantly more direct mitigation of the legislative and commercial risks of processing.
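Usage minimisation can be checked mechanically if each approved purpose lists the uses it needs. In this sketch, the purposes and their requirements are hypothetical; the helper reports any usage permissions granted beyond what the purpose requires, echoing the evaluation-versus-training example earlier.

```python
from typing import Dict, Set

# Hypothetical: the uses each approved purpose actually requires.
PURPOSE_REQUIREMENTS: Dict[str, Set[str]] = {
    "validate_vendor_model": {"model_evaluation"},
    "build_churn_model": {"model_training", "model_evaluation"},
}


def excess_usages(granted: Set[str], purpose: str) -> Set[str]:
    """Return the usages granted beyond what `purpose` needs (ideally empty)."""
    return granted - PURPOSE_REQUIREMENTS[purpose]


print(excess_usages({"model_training", "model_evaluation"}, "validate_vendor_model"))
# {'model_training'}
```

A non-empty result signals that a user could train a model on data they were only meant to evaluate against.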
What does this mean for data-based policies?
Data-based policies are still an important aspect of information governance, and data minimisation is still an important risk mitigator. Having the wrong data used for a particular purpose could be a critical risk. An example is the potentially unfair use of ethnicity or gender data in decision making. Even when the data is not obviously problematic, it is better that data is not available for use when there is no clear reason for it to be.
Data-based policies provide a set of tools complementary to those provided by usage-based policies. By combining the two approaches, information governance teams can further reduce the risk of data loss while also reducing the risk of data being used for unapproved purposes.
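To illustrate how the two fit together, here is a sketch of a hypothetical combined policy in which a request must pass both a data-based check (which columns) and a usage-based check (which kind of processing). The `Policy` type and `check` function are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical combined policy for one user on one dataset."""
    columns: FrozenSet[str]  # data-based: which fields may be touched
    usages: FrozenSet[str]   # usage-based: which kinds of processing may run


def check(policy: Policy, columns: FrozenSet[str], usage: str) -> Tuple[bool, str]:
    """Return (allowed, reason); a request must satisfy both halves of the policy."""
    extra = columns - policy.columns
    if extra:
        return False, f"columns not permitted: {sorted(extra)}"
    if usage not in policy.usages:
        return False, f"usage not permitted: {usage}"
    return True, "ok"


policy = Policy(columns=frozenset({"age", "outcome"}), usages=frozenset({"model_evaluation"}))
print(check(policy, frozenset({"age", "outcome"}), "model_evaluation"))   # (True, 'ok')
print(check(policy, frozenset({"age", "ethnicity"}), "model_evaluation"))  # (False, "columns not permitted: ['ethnicity']")
print(check(policy, frozenset({"age"}), "model_training"))                 # (False, 'usage not permitted: model_training')
```

Neither check alone would block all three problematic cases: the column allowlist cannot stop training on permitted columns, and the usage allowlist cannot stop evaluation that touches `ethnicity`.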
Usage-based policies in practice
In practice, usage-based policy approaches need to define which algorithms can be run on a given set of data, along with any privacy controls. For example, the Bitfount platform allows data analysts and machine learning modellers to be given roles with different “usage permissions”. These take the form of permissions over the algorithms or protocols that may be run against the data.
Each “usage permission” corresponds to a particular type of usage and is associated with a level of risk that the data controller finds acceptable. Types of usage include:
- Arbitrary SQL: For analysts who need to perform arbitrary analysis and who have a special trust relationship with the data controller, so that the risk of this level of usage is tolerated.
- Private SQL: For analysts who need to analyse the data but where differential privacy is needed to mitigate the risk of data leakage and ensure that only suitably aggregated data is released.
- Model Training (with and without privacy requirements): For machine learning modellers training a new model on a given dataset. Acceptable risk levels can be achieved by requiring higher levels of privacy noise.
- Model Evaluation (with and without privacy requirements): For machine learning modellers evaluating a model they have trained separately on a given dataset. Acceptable risk levels for the metrics being returned can be achieved by requiring higher levels of privacy noise.
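To make the link between a permission and its risk level concrete, the sketch below ties each permission to a hypothetical privacy budget (epsilon) and answers a simple count query with the Laplace mechanism when a budget applies. This is a textbook differential privacy construction chosen for illustration, not the Bitfount API; the permission names and epsilon values are assumptions.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from a usage permission to the privacy budget (epsilon)
# the data controller accepts; None means exact results are returned.
PERMISSION_EPSILON: Dict[str, Optional[float]] = {
    "arbitrary_sql": None,
    "private_sql": 0.5,
}


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def count_query(rows: List[dict], predicate: Callable[[dict], bool],
                permission: str, rng: random.Random) -> float:
    """Count rows matching `predicate`, adding noise when the permission requires it.

    Adding or removing one row changes a count by at most 1 (sensitivity 1), so
    Laplace noise with scale 1/epsilon makes this single query epsilon-differentially private.
    """
    if permission not in PERMISSION_EPSILON:
        raise PermissionError(f"unknown permission {permission!r}")
    true_count = sum(1 for row in rows if predicate(row))
    epsilon = PERMISSION_EPSILON[permission]
    if epsilon is None:
        return float(true_count)
    return true_count + laplace_noise(1.0 / epsilon, rng)


rows = [{"age": a} for a in (23, 35, 41, 58, 62)]
rng = random.Random(0)
print(count_query(rows, lambda r: r["age"] > 40, "arbitrary_sql", rng))  # 3.0
print(count_query(rows, lambda r: r["age"] > 40, "private_sql", rng))    # 3 plus Laplace noise
```

A smaller epsilon means a larger noise scale, so a controller can demand more noise for lower-trust permissions. A real deployment would also track the cumulative budget spent across repeated queries, which this sketch does not.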
By using these usage controls along with standard data minimisation techniques, information governance teams can increase the certainty that data is being used for a purpose that has been approved. The fact that every usage is checked for authorisation also provides an audit trail for later analysis.
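As a sketch of how the audit trail falls out of the authorisation step, the snippet below records every decision, granted or denied, with a UTC timestamp. The permission table, log structure and `authorise` function are hypothetical; a production system would write to durable, append-only storage rather than an in-memory list.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical permissions; in practice these would come from the governance platform.
PERMISSIONS: Dict[str, Set[str]] = {"carol": {"private_sql"}}

# In-memory stand-in for an append-only audit store.
AUDIT_LOG: List[dict] = []


def authorise(user: str, usage: str) -> bool:
    """Check a request and record the decision, whether it is granted or denied."""
    allowed = usage in PERMISSIONS.get(user, set())
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "usage": usage,
        "allowed": allowed,
    })
    return allowed


authorise("carol", "private_sql")
authorise("carol", "model_training")
for entry in AUDIT_LOG:
    print(json.dumps(entry))
```

Logging denials as well as grants matters for later analysis: repeated attempts at an unapproved usage are exactly what a governance team would want to see.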
Usage-based information governance is a relatively new movement. If you’re interested in the ideas or have suggestions on how the approach can be improved we’d love to hear from you. Feel free to sign up to our Community Slack and get involved in the conversation, or sign up to our newsletter at bitfount.com.