Research Accomplishments of Latanya Sweeney, Ph.D.


Medical Informatics
      Genomic identifiability
      Patient-centered management

Database Security
      Risk assessment server
      Face de-identification
      Contactless capture

Policy and Law
      Identifiability of de-identified data
      HIPAA assessments
      Privacy-preserving surveillance

Public Education
      Identity angel

Quantitative assessments

Medical Informatics: Datafly System

[cite, cite, cite, cite]

Problem Statement: Given patient-specific field-structured data, a list of requested data elements, and a level of desired anonymity, construct a method that produces a copy of the data which maintains the utility of the requested elements while protecting the privacy of the patients.

Description: Dr. Sweeney's Datafly System provides a solution. It anonymizes medical data according to two profiles: one specifies the privacy sensitivity of each field, and the other specifies the desired utility of each field. The program then generalizes, substitutes, and removes information as appropriate, seeking to satisfy the constraints of both profiles and using a "conservation of anonymity" principle to decide trade-offs. This approach tends to preserve more detail in useful fields by enforcing stronger privacy constraints on other fields. The result is the most general version of the data that remains useful to the recipient.
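The generalize-then-remove idea described above can be illustrated with a small Python sketch. This is not Dr. Sweeney's implementation. The field names, the generalization ladders, the max_level dictionary (a stand-in for the utility profile), the anonymity level k, the max_suppress limit, and the "generalize the field with the most distinct values" heuristic are all assumptions chosen for illustration, and substitution is omitted.

```python
from collections import Counter

# Hypothetical generalization ladders: level 0 leaves the value unchanged,
# higher levels return coarser values.
def generalize_zip(value, level):
    level = min(level, len(value))
    return value[:len(value) - level] + "*" * level  # mask trailing digits

def generalize_year(value, level):
    if level == 0:
        return str(value)
    if level == 1:
        return f"{value // 10 * 10}s"  # decade, e.g. "1960s"
    return "*"  # fully masked

def anonymize(records, ladders, max_level, k, max_suppress):
    """Greedy sketch: generalize fields until at most `max_suppress` rows
    are rarer than k, then remove those rows. Illustrative only."""
    fields = list(ladders)
    levels = {f: 0 for f in fields}
    while True:
        rows = [tuple(ladders[f](r[f], levels[f]) for f in fields)
                for r in records]
        counts = Counter(rows)
        rare = [row for row in rows if counts[row] < k]
        # Fields the (assumed) utility profile still allows us to coarsen.
        candidates = [f for f in fields if levels[f] < max_level[f]]
        if len(rare) <= max_suppress or not candidates:
            break
        # Assumed heuristic: coarsen the field with the most distinct values.
        field = max(candidates,
                    key=lambda f: len({row[fields.index(f)] for row in rows}))
        levels[field] += 1
    # Remove any rows whose combination of values is still rarer than k.
    released = [dict(zip(fields, row)) for row in rows if counts[row] >= k]
    return released, levels

records = [
    {"zip": "02138", "birth_year": 1965},
    {"zip": "02139", "birth_year": 1966},
    {"zip": "02138", "birth_year": 1961},
    {"zip": "02139", "birth_year": 1968},
    {"zip": "15213", "birth_year": 1990},
]
ladders = {"zip": generalize_zip, "birth_year": generalize_year}
released, levels = anonymize(records, ladders,
                             max_level={"zip": 1, "birth_year": 1},
                             k=2, max_suppress=1)
```

With these made-up inputs, the sketch generalizes birth years to decades, leaves ZIP codes intact, and removes the single record that remains unique. That mirrors, in a very reduced form, the trade-off of keeping detail in one field by being stricter on another.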

Scientific Influence and Impact: Datafly was one of the first systems to pose a completely algorithmic solution. Prior work on statistical databases required interactive expert decision-making [Winkler et al.]. Shortly after introducing Datafly, Dr. Sweeney also introduced k-anonymity (described separately), a simpler, more formal model that does not use profiles. After publication, other researchers proposed efficiencies and alternatives [Ohno-Machado, Vinterbo, et al.] and hardness proofs that worked across the two contributions. Sometimes the two were not recognized as distinct contributions. For example, one researcher, aware of k-anonymity, re-invented Datafly by adding profiles to k-anonymity [Iyengar, U.S. Patent 7024409]. The patent writers cited one of Dr. Sweeney's Datafly papers [cite] but erroneously described it as k-anonymity.

Other Achievements (see note 12):

  • Recognition (A Best Paper) Award, American Medical Informatics Association. [cite].

  • Datafly paper [cite] is the 4th most cited paper of all American Medical Informatics papers to date, having a statistically significant citation count at the 99.9th percentile.

  • Datafly papers ([cite], [cite]) are among the 1% of American Medical Informatics papers that enabled successful work by others.

  • The Datafly paper is among Dr. Sweeney's most cited computer science papers ([cite], [cite], [cite], [cite], [cite]), which jointly have the second highest citation count among those of Associate Professors (tenured and nontenured) in the School of Computer Science at Carnegie Mellon; the count is statistically significant at the 99.9th percentile.


12 See quantitative assessments for more details.


Fall 2009