The following is an initial, personal summary of high-level observations from the White House-MIT Big Data Workshop.
- Technical contributions from computer science can assess and in some cases control the privacy impact of data usage in a rigorous, quantitative manner. These techniques can help assure that systems handling personal data are functioning consistently with desired public policies and institutional rules. In some cases, these controls will prevent disclosure or misuse of data up front. In other cases, systems can detect misuse of personal data and enable those responsible to be held accountable for violating relevant rules. These technologies are at various stages of development, some ready for broad deployment and others needing more research to enable practical application. Developing the science base that enables people to be in control of their data and assure it is used accountably is a key challenge. Solutions to this challenge are feasible.
- Technology will not replace the need for laws and social norms protecting privacy. We should expect that systems are built to perform according to privacy rules and make enforcement of those rules easier, especially as the scale of data usage increases beyond the point that manual compliance and audit can be effective. However, throughout the workshop, a variety of implicit and explicit definitions of privacy were used. Some implied that privacy is synonymous with secrecy and complete confidentiality: that as soon as personal information is available to anyone else, privacy is lost. Others suggested that privacy is properly understood as the ability to control how personal information is disclosed and/or used. Finally, privacy is understood by some as a question of whether personal data is used in a manner that harms the individual. Therefore, in order to realize the goal of designing systems that do a better job of respecting privacy, we need greater clarity on rules for specific uses of big data privacy applications.
- Large-scale analytic techniques are unlocking socially important knowledge from personal data, including advances in health care, genomics research, transportation safety and educational effectiveness. We see evidence that in some cases, larger data sets yield otherwise unavailable knowledge. In some circumstances, researchers need unmediated access to raw data, including all personal information. In other cases, it may be possible to control access to data in a manner that sufficiently obscures personal identity while enabling researchers to extract valuable knowledge. In either case, techniques that log access to and use of personal data can help protect individuals from harmful uses of their personal data.
- Progress toward enabling beneficial uses of personal information in a manner that protects individual privacy requires answering the following questions:
- How can institutions (public and private) evolve best practices for handling personal information and taking maximum advantage of privacy enhancing technologies? From the scenario discussion at the end of the workshop, we learned that there are enormous legal and ethical complexities that arise with the use of large-scale analytics on personal data. In some cases, the law or social custom provides guidance, but in many cases it does not.
- How can government and the private sector work together to test privacy enhancing technologies in practical contexts at large scale?
- How can government and industry work together to advance the technical state of the art in privacy research, including developments in cryptographic algorithms that allow private computation on data, differential privacy, as well as accountable systems?
- How can we support multidisciplinary research on the application of privacy enhancing computational techniques informed by legal, ethical and social science research perspectives?
We learned in this workshop that there are a variety of developments from computer science that can help those who use big data techniques manage the use of personal data according to rules, as well as offer scalable solutions to those who seek to assess, manage and regulate privacy in these contexts. Some of these techniques are ready for immediate application. Other workshops in the series are expected to explore the legal and ethical meaning of privacy. In this first dialogue there was widespread agreement among the presenters that both citizens and those who seek to make large-scale use of personal data need clearer definitions of privacy. Those definitions may take the form of a single right, a more complex bundle of laws, as well as institutional procedures and social norms.
This is neither official nor final nor a consensus conclusion from the workshop. A more detailed summary will be available shortly.
-Daniel Weitzner, Principal Research Scientist, MIT Computer Science and Artificial Intelligence Lab