Medical Informatics: Datafly System
Problem Statement: Given patient-specific field-structured data, a list of requested data elements, and a level of desired anonymity, construct a method that produces a copy of the data which maintains the utility of the requested elements while protecting the privacy of the patients.
Description: Dr. Sweeney's Datafly System provides a solution. It anonymizes medical data according to two profiles: one specifies the privacy sensitivity of each field, and the other specifies the desired utility of each field. The program then generalizes, substitutes, and removes information as appropriate, seeking to satisfy the constraints of both profiles and using a "conservation of anonymity" principle to decide trade-offs. This approach tends to preserve more detail in useful fields by enforcing stronger privacy constraints on other fields. The result is the most general version of the data that remains useful to the recipient.
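The generalize-and-remove loop described above can be illustrated with a small sketch. This is not Dr. Sweeney's implementation: it replaces the two profiles with simplifying assumptions, namely a single group-size threshold `k` standing in for the desired level of anonymity, a suppression budget `max_suppressed`, and a greedy rule that coarsens whichever field currently has the most distinct values. The function names, field names, and sample records are all hypothetical.

```python
from collections import Counter


def mask_one_more(value):
    """Generalize a string by masking one more trailing character with '*'."""
    kept = value.rstrip("*")
    if not kept:
        return value  # already fully masked
    return kept[:-1] + "*" * (len(value) - len(kept) + 1)


def anonymize(rows, fields, k, max_suppressed):
    """Greedy sketch (an assumption, not the published algorithm).

    Coarsen the listed fields until at most `max_suppressed` rows belong to
    a group smaller than `k`, then remove those remaining rows.
    """
    rows = [dict(r) for r in rows]  # work on copies; inputs stay untouched

    def key(row):
        return tuple(row[f] for f in fields)

    while True:
        counts = Counter(key(r) for r in rows)
        rare = [r for r in rows if counts[key(r)] < k]
        # Fields that still contain at least one unmasked character.
        open_fields = [f for f in fields
                       if any(r[f].rstrip("*") for r in rows)]
        if len(rare) <= max_suppressed or not open_fields:
            # Remove the outliers that generalization did not absorb.
            return [r for r in rows if counts[key(r)] >= k]
        # Generalize the field with the most distinct values.
        target = max(open_fields, key=lambda f: len({r[f] for r in rows}))
        for r in rows:
            r[target] = mask_one_more(r[target])


# Hypothetical records; "diagnosis" is the field the recipient wants intact.
records = [
    {"zip": "02139", "birth_year": "1964", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": "1964", "diagnosis": "flu"},
    {"zip": "02141", "birth_year": "1965", "diagnosis": "diabetes"},
    {"zip": "02142", "birth_year": "1965", "diagnosis": "flu"},
    {"zip": "90210", "birth_year": "1931", "diagnosis": "asthma"},
]

released = anonymize(records, ["zip", "birth_year"], k=2, max_suppressed=1)
for row in released:
    print(row)
```

With these sample records, one round of coarsening the ZIP code leaves a single outlier row, which is removed; birth years and diagnoses are released unchanged. A fuller treatment would let per-field sensitivity and utility settings, rather than a distinct-value count, decide which field gives up detail.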
Scientific Influence and Impact: Datafly was one of the first systems to pose a completely algorithmic solution. Prior work on statistical databases required interactive expert decision-making [Winkler et al.]. Shortly after introducing Datafly, Dr. Sweeney also introduced k-anonymity (described separately), a simpler, more formal model that does not use profiles. After publication, other researchers proposed efficiencies and alternatives [Ohno-Machado, Vinterbo, et al.] and hardness proofs that worked across the two contributions. Sometimes it was not recognized that these were two distinct contributions. For example, one researcher who was aware of k-anonymity re-invented Datafly by adding profiles to k-anonymity [Iyengar, U.S. Patent 7024409]. The patent writers cited one of Dr. Sweeney's Datafly papers [cite] but erroneously described it as k-anonymity.