Research Accomplishments of Latanya Sweeney, Ph.D.



Surveillance: Risk assessment server

[cite, cite]

Problem Statement: Given de-identified person-specific data, construct a method for predicting the number of subjects whose information can be re-identified.

Description: One solution is Dr. Sweeney's Risk Assessment Server. Its architecture (a) combines a population model, a meta-level database describing available databases, and an inference engine. One output (b) is an "identifiability report" that plots estimates of the number of explicitly known individuals whose information can be identified in the data. Re-identifications (c) are reported in graduated groupings termed "binsizes". The inference engine finds shortest paths from the given data to data containing explicit identifiers for the same populations. Dr. Sweeney's paper [cite] provides a real-world example from bioterrorism surveillance (d), in which re-identifications result from linking to hospital discharge data on medical history. A surprising finding is that releasing age ranges instead of exact ages does not thwart these re-identifications, no matter how coarse the aggregation (5-year age ranges shown).

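[Figure panels: (a) server architecture; (b) identifiability report; (c) re-identifications by binsize; (d) bioterrorism surveillance example]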
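The shortest-path idea in the description can be illustrated with a small sketch. This is not the Risk Assessment Server's actual implementation: the source names, their fields, the `linkable` rule (at least two shared fields), and the function `shortest_link_path` are all hypothetical, chosen only to show how a breadth-first search over a meta-level catalog might find the shortest chain of links from released data to data holding explicit identifiers.

```python
from collections import deque

# Hypothetical meta-level catalog: each data source and the fields it holds.
# Names and fields are illustrative assumptions, not the server's real catalog.
SOURCES = {
    "surveillance_release": {"age_range", "zip", "diagnosis", "visit_date"},
    "hospital_discharge": {"zip", "diagnosis", "visit_date", "birth_date", "gender"},
    "voter_list": {"name", "zip", "birth_date", "gender"},
    "pharmacy_summary": {"drug_class", "region"},
}
EXPLICIT_IDENTIFIERS = {"name"}


def linkable(a, b, min_shared=2):
    """Assumed rule: two sources can be linked if they share >= min_shared fields."""
    return len(SOURCES[a] & SOURCES[b]) >= min_shared


def shortest_link_path(start):
    """Breadth-first search from `start` to the nearest source that
    contains an explicit identifier; returns the chain of sources or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if SOURCES[path[-1]] & EXPLICIT_IDENTIFIERS:
            return path
        for nxt in sorted(SOURCES):
            if nxt not in seen and linkable(path[-1], nxt):
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of links reaches identified data


print(shortest_link_path("surveillance_release"))
```

In this toy catalog the release shares only one field with the identified list, so the search has to pass through the discharge data first. A source with no shared fields, such as the hypothetical `pharmacy_summary`, yields `None`.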

Scientific Influence and Impact: Dr. Sweeney's Risk Assessment Server originated with her study of the identifiability of basic demographics, which led to her highly cited result: "87% of the population of the United States is uniquely identified by {date of birth, gender, ZIP}". Other researchers have replicated these experiments. [Golle et al.] found that 64% of the US population was uniquely identified, using more recent information and a different model. [Malin et al.] explained the difference as model artifacts and demonstrated that at binsizes >= 5 there is no difference.
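To make the notions of uniqueness and binsize concrete, here is a minimal sketch on invented toy records. It assumes one reading of "binsize": the number of records that share the same values for a chosen set of fields, so a binsize of 1 means the record is unique. The records, the field names, and the helpers `binsizes` and `fraction_within` are hypothetical and do not reproduce the population models used in the studies cited above.

```python
from collections import Counter

# Invented toy records; not real data.
RECORDS = [
    {"dob": "1970-01-02", "gender": "F", "zip": "02138"},
    {"dob": "1970-01-02", "gender": "F", "zip": "02138"},
    {"dob": "1981-05-17", "gender": "M", "zip": "02139"},
    {"dob": "1990-11-30", "gender": "F", "zip": "15213"},
    {"dob": "1964-07-04", "gender": "M", "zip": "15213"},
]
QUASI_IDENTIFIER = ("dob", "gender", "zip")


def binsizes(records, fields):
    """For each record, count how many records share its values on `fields`."""
    def key(record):
        return tuple(record[f] for f in fields)

    counts = Counter(key(r) for r in records)
    return [counts[key(r)] for r in records]


def fraction_within(records, fields, k):
    """Fraction of records whose binsize is at most k (k=1 means unique)."""
    sizes = binsizes(records, fields)
    return sum(1 for s in sizes if s <= k) / len(sizes)


for k in (1, 2, 5):
    print(k, fraction_within(RECORDS, QUASI_IDENTIFIER, k))
```

On these five records, three are unique on {dob, gender, zip} and two share a bin of size 2. Dropping a field from `QUASI_IDENTIFIER` (for example, using only `zip`) enlarges the bins, which is the effect aggregation is meant to have.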

Other Achievements [12]:

  • Testimony before the Technology and Privacy Advisory Committee (TAPAC), a Federal Advisory Committee for the Department of Defense. Highly praised in committee report.

  • References to this research appear among 28 news articles profiling her work (over 300 total news citations to her work). Venues include Scientific American, Computerworld, CBS News, ABC News, Newsweek, USA Today, and National Public Radio.

  • Two businesses have licenses to perform HIPAA Risk Assessments.



Notes

[12] See quantitative assessments for more details.





Fall 2009