Privacy has become a significant concern in modern society as personal information about individuals is increasingly collected, used, and shared, often using digital technologies, by a wide range of organizations. One goal of this project is to precisely articulate what privacy means in various settings, and whether and how it can be achieved. In other words, we seek to develop conceptual and technical frameworks in which privacy notions (policies) are given precise semantics, algorithms for enforcing such policies, and characterizations of classes of policies that can or cannot be enforced. In addition to general results of this form, another goal of the project is to study specific application domains that raise significant privacy concerns in modern society and to apply these results (or specialized versions thereof) to these domains. Our current focus is on the healthcare domain. We are also thinking about privacy issues on the web and in online social media.
Specifically, to mitigate privacy concerns, organizations are required to respect privacy laws in regulated sectors (e.g., HIPAA in healthcare, GLBA in financial sector) and to adhere to self-declared privacy policies in self-regulated sectors (e.g., privacy policies of companies such as Google and Facebook in Web services). We investigate the possibility of formalizing and enforcing such practical privacy policies using computational techniques. We formalize privacy policies that prescribe and proscribe *flows* of personal information as well as those that place restrictions on the *purposes* for which a governed entity may use personal information. Recognizing that traditional preventive access control and information flow control mechanisms are inadequate for enforcing such privacy policies, we develop principled audit and accountability mechanisms with provable properties that seek to encourage policy-compliant behavior by detecting policy violations, assigning blame and punishing violators. We apply these techniques to several US privacy laws and organizational privacy policies, in particular, producing the first complete logical specification and audit of all disclosure-related clauses of the HIPAA Privacy Rule.
An overview paper:
Privacy Policy Semantics
and Specification:
Privacy policies often place requirements on the purposes for which a governed entity may use personal information. For example, regulations, such as HIPAA, require that hospital employees use medical information for only certain purposes, such as treatment. This paper presents semantics for purpose requirements using a formal model based on planning. We use the model to formalize when a sequence of actions is only for or not for a purpose.
This paper presents a semantic model and logic of privacy that formalizes the descriptive component of contextual integrity (a philosophical theory of privacy). The logic provides a way to represent the key construct in contextual integrity - context-relative informational norms. The norms prescribe and proscribe the flow of personal information in a given social context (e.g., healthcare, friendship, banking). The logic is used to represent a number of clauses from US privacy laws – Health Insurance Portability and Accountability Act (HIPAA), Gramm-Leach-Bliley Act (GLBA), and Children Online Privacy Protection Act (COPPA).
This paper presents PrivacyLFP, an expressive logic and signature for specifying privacy policies, building on prior work by Barth et al. (S & P 2006). It supports common concepts in privacy policies, including purposes for uses and disclosures, real-time provisions and obligations, and self-reference. The paper uses PrivacyLFP to produce the first complete logical formalization of the HIPAA Privacy Rule and GLBA.
Privacy Policy Enforcement:
· Policy auditing over incomplete logs
We present the design, implementation and evaluation of an algorithm that checks audit logs for compliance with privacy and security policies. The algorithm, which we name reduce, addresses two fundamental challenges in compliance checking that arise in practice. First, in order to be applicable to realistic policies, reduce operates on policies expressed in a first-order logic that allows restricted quantification over infinite domains. We build on ideas from logic programming to identify the restricted form of quantified formulas. The logic can, in particular, express all 84 disclosure-related clauses of the HIPAA Privacy Rule, which involve quantification over the infinite set of messages containing personal information. Second, since audit logs are inherently incomplete (they may not contain sufficient information to determine whether a policy is violated or not), reduce proceeds iteratively: in each iteration, it provably checks as much of the policy as possible over the current log and outputs a residual policy that can only be checked when the log is extended with additional information. We prove correctness, termination, time and space complexity results for reduce. We implement reduce and optimize the base implementation using two heuristics for database indexing that are guided by the syntactic structure of policies. The implementation is used to check simulated audit logs for compliance with the entire HIPAA Privacy Rule. Our experimental results demonstrate that the algorithm is fast enough to be used in practice.
· Learning to audit
Privacy policies constrain information flow and use based on conditions (such as beliefs) that may not be mechanically checkable. This paper presents a machine learning algorithm that can provide operational guidance to human auditors about the coverage and frequency of audits. The technical approach is based on regret minimization (an online learning technique) over repeated games of imperfect information.
· Auditing information use policies
This paper
presents a formal semantics for what it means for a sequence of actions to be only for or not for a purpose. This semantics enables us to provide an
algorithm for automated auditing of privacy policies that place requirements on
the purposes for which a governed entity may use personal information.
· Privacy-preserving release of statistics
Differential privacy is a promising approach to privacy preserving data analysis with a well-developed theory for functions. Despite recent work on implementing systems that aim to provide differential privacy, the problem of formally verifying that these systems have differential privacy has not been adequately addressed. This paper presents the first results towards automated verification of source code for differentially private interactive systems. We develop a formal probabilistic automaton model of differential privacy for systems by adapting prior work on differential privacy for functions. The main technical result of the paper is a sound proof technique based on a form of probabilistic bisimulation relation for proving that a system modeled as a probabilistic automaton satisfies differential privacy. The novelty lies in the way we track quantitative privacy leakage bounds using a relation family instead of a single relation. We illustrate the proof technique on a representative automaton motivated by PINQ, an implemented system that is intended to provide differential privacy. To make our proof technique easier to apply to realistic systems, we prove a form of refinement theorem and apply it to show that a refinement of the abstract PINQ automaton also satisfies our differential privacy definition. Finally, we begin the process of automating our proof technique by providing an algorithm for mechanically checking a restricted class of relations from the proof technique.
Technical Reports: