Research Accomplishments of Latanya Sweeney, Ph.D.



Overview

Medical Informatics
      Scrub
      Datafly
      Genomic identifiability
      Patient-centered management

Database Security
      k-anonymity

Surveillance
      Selective-revelation
      Risk assessment server
      PrivaMix

Vision
      Face de-identification

Biometrics
      Contactless capture

Policy and Law
      Identifiability of de-identified data
      HIPAA assessments
      Privacy-preserving surveillance

Public Education
      Identity angel
      SSNwatch
      CameraWatch

Quantitative assessments

Medical Informatics: Genomic identifiability and privacy

[cite, cite, cite, cite]

Work done with her student, Bradley Malin. 11

Problem Statement: Given DNA information and publicly available data, or given systems that attempt to protect the privacy of DNA, show how the subjects of the data can be re-identified.

Description: Dr. Sweeney and her student, Bradley Malin, offer solutions in three directions. (a) They show how to re-identify the subjects of DNA data by matching gene-based disease characteristics in the genetic data to disease presentations in publicly available de-identified health data, and then re-identifying the subjects from demographics that also appear in the health data. (b) They show how to re-identify subjects by matching trails of DNA collections at different sites to publicly available medical claims data across those sites, and then re-identifying the subjects from demographics that appear in the claims data. (c) They provide algorithmic proofs of re-identification vulnerabilities found in various real-world systems that proposed to protect genomic privacy through the use of pseudonymous data or data previously believed to be anonymous.
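The trail-matching idea in (b) can be illustrated with a minimal sketch. This is not the authors' published algorithm. It is a toy version that assumes trails are complete and exact, and it links a DNA sample to a named person only when the same set of sites occurs exactly once in each data source. All names, sites, and the function match_unique_trails are hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy data, invented for illustration only.
# A "trail" is the set of sites at which a record appears.
# Pseudonymous DNA samples and the sites that hold each one:
dna_trails = {
    "dna_001": frozenset({"Hospital A", "Hospital C"}),
    "dna_002": frozenset({"Hospital B"}),
    "dna_003": frozenset({"Hospital A", "Hospital B"}),
}

# Identified claims data: the sites each named patient visited.
claims_trails = {
    "Alice": frozenset({"Hospital A", "Hospital C"}),
    "Bob": frozenset({"Hospital B"}),
    "Carol": frozenset({"Hospital B"}),
    "Dave": frozenset({"Hospital A", "Hospital B"}),
}


def match_unique_trails(dna_trails, claims_trails):
    """Link a DNA sample to a person when both have the same trail
    and that trail occurs exactly once in each data source.

    Assumption: trails are complete and exact. Real data may be
    partial or noisy, which this sketch does not handle.
    """
    # Group DNA samples by trail.
    dna_by_trail = defaultdict(list)
    for sample, trail in dna_trails.items():
        dna_by_trail[trail].append(sample)

    # Group named patients by trail.
    people_by_trail = defaultdict(list)
    for person, trail in claims_trails.items():
        people_by_trail[trail].append(person)

    # Keep only the unambiguous one-to-one matches.
    links = {}
    for trail, samples in dna_by_trail.items():
        people = people_by_trail.get(trail, [])
        if len(samples) == 1 and len(people) == 1:
            links[samples[0]] = people[0]
    return links


links = match_unique_trails(dna_trails, claims_trails)
for sample, person in sorted(links.items()):
    print(sample, "->", person)
```

In this toy data, dna_002 stays unlinked because two patients share its trail. The sketch is meant to show why unique trails across sites are what make re-identification possible, even when no single site releases identifying information alongside the DNA.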

Scientific Influence and Impact: The work of Dr. Sweeney and her student, Bradley Malin, began out of concern that NIH made human DNA databases publicly available under the false belief that, although a person's DNA is specific to that person, the person could not be re-identified from DNA alone. Their re-identification experiments (a) and (b) were the first to demonstrate that re-identification of DNA was possible. Other researchers subsequently introduced further re-identification vulnerabilities [Kohane, Altman, et al.], and the databases were finally removed after a high-profile example [McGuire et al.]. Open issues remain, however, because research requires sharing combined health and genomic information. Critical next steps will be solutions that enable genomic data sharing with guarantees of privacy.

Other Achievements: 12

  • Best of the Year (Best Paper Award), International Medical Informatics Association. [cite]

  • Funding from the National Institutes of Health.

  • Paper [cite] is among the 50 most cited American Medical Informatics papers to date, having a citation count statistically significant at the 90th percentile.



Notes

11 Bradley Malin graduated with a PhD in computer science and went on to be a tenure-track professor at Vanderbilt University in the Medical School and in the Computer Science Department.

12 See quantitative assessments for more details.

Fall 2009