Research Accomplishments of Latanya Sweeney, Ph.D.



Public Education: SSNwatch

Problem Statement: Given the encoding scheme used to assign Social Security numbers (SSNs) and publicly available information, construct methods that (1) validate SSNs; (2) expose vulnerabilities in the scheme; and (3) propose an alternative scheme that supports current and new uses without those vulnerabilities.

Description: Dr. Sweeney's SSNwatch Validation Server [38] addresses (1). It is an online public service that, given a full or partial SSN, reports the issuing state, the date of issuance, the estimated age of the recipient, and the number's activity status. These results can be checked for consistency against other information the person provides (in resumes, job applications, etc.); mismatches can help flag suspicious presentations. See the examples below.

As background, the encoding scheme used to assign SSNs leaks information about geography and date of issuance. An SSN has the format aaa-gg-nnnn, where aaa (the area number) indicates the state of issuance, gg (the group number) reflects the order in which groups were issued, and nnnn is a serial number assigned sequentially within each aaa-gg combination. By correlating millions of SSNs from the Social Security Death Index, she constructed statistical models of recipient age and issuance dates; combined with government documents describing SSN assignments, these models form the SSNwatch database.
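To make the aaa-gg-nnnn structure concrete, here is a minimal Python sketch that splits an SSN into its three fields, applies a few structural checks, and ranks group numbers by issuance order. It assumes the assignment rules the Social Security Administration published for this scheme: area numbers 000, 666, and 900-999, group 00, and serial 0000 were never issued, and within an area the groups were used in the order odd 01-09, even 10-98, even 02-08, odd 11-99. The names `SSN`, `parse_ssn`, `is_structurally_valid`, and `group_rank` are hypothetical, not part of SSNwatch, and these checks are far narrower than what SSNwatch reports (for example, they ignore which area numbers had actually been allocated by a given date).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SSN:
    area: int    # aaa: area number, tied to the state of issuance
    group: int   # gg: group number, issued in a fixed (non-numeric) order
    serial: int  # nnnn: serial number within one area-group pair

def parse_ssn(text: str) -> SSN:
    """Split an 'aaa-gg-nnnn' string (dashes optional) into its three fields."""
    digits = text.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"not a 9-digit SSN: {text!r}")
    return SSN(int(digits[:3]), int(digits[3:5]), int(digits[5:]))

def is_structurally_valid(ssn: SSN) -> bool:
    """Reject field values never assigned under this scheme:
    area 000, 666, or 900-999; group 00; serial 0000."""
    if ssn.area == 0 or ssn.area == 666 or ssn.area >= 900:
        return False
    return ssn.group != 0 and ssn.serial != 0

def group_rank(group: int) -> int:
    """0-based position of a group number in the issuance order:
    odd 01-09, then even 10-98, then even 02-08, then odd 11-99."""
    if not 1 <= group <= 99:
        raise ValueError("group must be in 01-99")
    if group % 2 == 1 and group <= 9:
        return (group - 1) // 2          # 01, 03, ..., 09 -> 0..4
    if group % 2 == 0 and group >= 10:
        return 5 + (group - 10) // 2     # 10, 12, ..., 98 -> 5..49
    if group % 2 == 0:
        return 50 + (group - 2) // 2     # 02, 04, 06, 08 -> 50..53
    return 54 + (group - 11) // 2        # 11, 13, ..., 99 -> 54..98

print(parse_ssn("123-45-6789"))  # SSN(area=123, group=45, serial=6789)
print(group_rank(2))             # 50
```

Because `group_rank` orders the 99 group numbers chronologically, two SSNs from the same area can be compared by (group rank, serial) to tell which was likely issued first; that ordering is the kind of signal issuance-date estimates build on.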

[Example panels (a), (b), (c); images not reproduced.]
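The Description above mentions statistical models of recipient age built from Social Security Death Index records. The Python sketch below shows one deliberately simple way such a model could work; it is not the model SSNwatch uses. It groups records by (area, group), summarizes the observed birth years by their minimum, median, and maximum, and then flags a claimed birth year that falls outside the observed range. The record layout `(area, group, birth_year)`, the function names, and the two-year `slack` tolerance are hypothetical choices for illustration, and the records shown are synthetic.

```python
from collections import defaultdict
from statistics import median

def build_age_model(records):
    """Summarize birth years per (area, group) pair.

    records: iterable of (area, group, birth_year) tuples, e.g. extracted
    from death-index entries (hypothetical layout).
    Returns {(area, group): (min_year, median_year, max_year)}.
    """
    by_key = defaultdict(list)
    for area, group, birth_year in records:
        by_key[(area, group)].append(birth_year)
    return {
        key: (min(years), median(years), max(years))
        for key, years in by_key.items()
    }

def check_claimed_birth_year(model, area, group, claimed_year, slack=2):
    """True if claimed_year lies in the observed range widened by `slack`
    years (an arbitrary tolerance), False if outside, None if no data."""
    stats = model.get((area, group))
    if stats is None:
        return None
    low, _, high = stats
    return low - slack <= claimed_year <= high + slack

# Synthetic records, for illustration only.
records = [(123, 45, 1950), (123, 45, 1952), (123, 45, 1954)]
model = build_age_model(records)
print(model[(123, 45)])                                # (1950, 1952, 1954)
print(check_claimed_birth_year(model, 123, 45, 1953))  # True
print(check_claimed_birth_year(model, 123, 45, 1990))  # False
```

With real data one would likely prefer percentiles to the raw minimum and maximum, since a single mis-recorded entry can widen the range considerably.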

Scientific Influence and Impact: With respect to SSN validation, Dr. Sweeney's SSNwatch server receives about 1,000 hits per week; its primary users are district attorneys confirming information provided in statements and testimony.

With respect to SSN prediction, Dr. Sweeney was the first to warn of a pending crisis: the ability to predict all nine digits of a person's SSN given only {date of birth, home town}. In the late 1980s, SSNs began to be issued near birth, yielding a linear correlation between {date of birth, home town} and a person's SSN. A private company confirmed her suspicion using millions of SSNs of living people. Recently, one of her students, Ralph Gross, working with her colleague Alessandro Acquisti, repeated the experiment using SSNs of deceased people, obtaining noteworthy results and drawing considerable media attention.
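The prediction concern rests on a simple observation: once SSNs are issued near birth, home town fixes the area number (through the state), and date of birth predicts how far along the issuance sequence a number falls. The Python sketch below fits an ordinary least-squares line from birth date to serial number within a single area-group pair and uses it to predict the serial for a new birth date. It illustrates the linear correlation described above; it is not the method used by Dr. Sweeney, the private company, or Acquisti and Gross. The function names are hypothetical, the training data are synthetic, and the model ignores rollover from one group to the next, which a realistic predictor would have to handle.

```python
from datetime import date

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def predict_serial(training, birth_date):
    """Predict the serial (nnnn) for birth_date from (birth_date, serial)
    pairs observed in one area-group pair. Hypothetical helper: assumes
    serials grow linearly with birth date and ignores group rollover."""
    xs = [d.toordinal() for d, _ in training]  # dates as day counts
    ys = [serial for _, serial in training]
    a, b = fit_line(xs, ys)
    return round(a + b * birth_date.toordinal())

# Synthetic training data: serials that rise by 30 per day of birth.
training = [(date(1990, 1, day), 1000 + 30 * (day - 1)) for day in range(1, 5)]
print(predict_serial(training, date(1990, 1, 11)))  # 1300
```

Real data would be noisy rather than perfectly linear, so a predictor would typically output a range of likely serials rather than a single value.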

Other Achievements [12]:

  • This work has been specifically profiled in 28 news articles, in venues including Scientific American, CBS News, ABC News, Newsweek, USA Today, and NPR.



Notes

[12] See Quantitative assessments for more details.

Fall 2009