Research Accomplishments of Latanya Sweeney, Ph.D.



Overview

Medical Informatics
      Scrub
      Datafly
      Genomic identifiability
      Patient-centered management

Database Security
      k-anonymity

Surveillance
      Selective-revelation
      Risk assessment server
      PrivaMix

Vision
      Face de-identification

Biometrics
      Contactless capture

Policy and Law
      Identifiability of de-identified data
      HIPAA assessments
      Privacy-preserving surveillance

Public Education
      Identity angel
      SSNwatch
      CameraWatch

Quantitative assessments

Approach and Area Overview

Latanya Sweeney believes that computer scientists who pioneer privacy-invasive technologies and scholars who design related policy are in the best positions to solve privacy-technology clashes through design. 5 So, she works within the communities that construct privacy-invasive technologies, and in related disciplines, to encourage such solutions. 6 Her approach is to identify a privacy-technology clash within a community, formulate a privacy problem statement, and then offer a solution to the problem as an exemplar to seed privacy-preserving work within the originating community. The following summarizes her contributions by area.

Medical Informatics

In medical informatics, Dr. Sweeney contributed: (1) Scrub, which de-identifies textual documents [cite, cite]; (2) Datafly, which balances privacy and utility in field-structured data [cite, cite, cite, cite]; (3) reidentifications of genomic data [cite, cite, cite, cite]; and, (4) a healthcare experiment using anonymous data to compare cohort outcomes [cite].

Scientific influence and impact:

  • Scrub [cite] was the first to introduce the problem of de-identifying medical text and to pose a solution. Academic institutions, such as MIT [Szolovits et al.] and the University of Pittsburgh, have implemented versions of Scrub and related alternatives, which they currently license to medical organizations (e.g., Vanderbilt) for real-world use.

  • Datafly [cite] was one of the first to pose a completely algorithmic solution for balancing privacy and utility in field-structured data. Other researchers have since proposed efficiencies and alternatives [Ohno-Machado, Vinterbo, et al.].

  • The DNA re-identification experiments of Dr. Sweeney and her student, Bradley Malin, were the first of their kind [cite, cite, cite, cite]. Other researchers then showed other vulnerabilities [Kohane, Altman, et al.], and most recently NIH ceased providing human DNA databases publicly because of re-identifications [McGuire et al.].

  • Dr. Sweeney's work seems to be the first to introduce an experimental design for comparing health outcomes of cohorts over time using provably anonymous data for analysis [cite]. Hagan at Price Waterhouse Coopers reports that other healthcare organizations (e.g., Healthnet, Alere, et al.) are already using variants of the experimental design.

Other achievements: 3 best paper awards; papers among the 15 most cited American Medical Informatics papers; Fellow in the College of Medical Informatics; and Privacy Leadership Award.

See more about Dr. Sweeney's accomplishments with Scrub, Datafly, Genomic identifiability, and Patient-centered management.

Database Security

In database security, Dr. Sweeney contributed k-anonymity [cite, cite, cite, cite, cite, cite, cite]. Data are k-anonymized if the data for each person are indistinguishable from the data of at least k-1 other individuals who also appear in the data.
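The definition above can be illustrated with a minimal Python sketch. This is not Dr. Sweeney's implementation: the function name `is_k_anonymous`, the field names, and the sample records are hypothetical, and the sketch assumes one record per person and that the fields to be compared (often called quasi-identifiers) are already known and already generalized.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `records` is shared by at least k records."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records (illustrative values only).
records = [
    {"birth_year": 1965, "gender": "F", "zip3": "021", "diagnosis": "asthma"},
    {"birth_year": 1965, "gender": "F", "zip3": "021", "diagnosis": "flu"},
    {"birth_year": 1970, "gender": "M", "zip3": "152", "diagnosis": "flu"},
    {"birth_year": 1970, "gender": "M", "zip3": "152", "diagnosis": "diabetes"},
]
QUASI_IDENTIFIERS = ("birth_year", "gender", "zip3")

print(is_k_anonymous(records, QUASI_IDENTIFIERS, k=2))  # True
print(is_k_anonymous(records, QUASI_IDENTIFIERS, k=3))  # False
```

Each group of matching records here has exactly two members, so the sample satisfies the definition for k=2 but not for k=3. The check only verifies the property; producing k-anonymous data by generalizing or suppressing values is a separate step that the sketch does not attempt.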

Scientific influence and impact:

  • k-anonymity was the first formal privacy protection model. It was originally intended to thwart the linking of field-structured databases, but it has been viewed more broadly, and that broader view spurred a series of highly cited works. For example, other researchers have proposed efficiencies, alternatives, and hardness proofs [Meyerson, Williams, et al.]. To improve utility, k-anonymity may be enforced on only the subset of fields known to lead to re-identifications. L-diversity [Gehrke et al.] poses an alternative motivated by the case in which that subset is chosen incorrectly. T-closeness [Li et al.] poses an alternative that addresses concerns found in l-diversity and vulnerabilities that arise if k-anonymity is applied generally. Most recently, differential privacy [Dwork et al.] poses another alternative, which typically distorts data using randomization and noise, enforced across all values, in order to report inexact, commonly occurring information.

Other achievements: recognition award; patent; second highest citation count among joint citation counts of Associate Professors in the School of Computer Science at Carnegie Mellon.

See more about Dr. Sweeney's accomplishments with k-anonymity.

Surveillance

In surveillance, Dr. Sweeney contributed: (1) Selective Revelation: a data sharing architecture that matches identifiability and utility [cite]; (2) Risk Assessment Server: computes the identifiability of data [cite, cite, cite]; and, (3) PrivaMix: allows a network of data holders to jointly produce a de-identified linked dataset without a trusted third party [cite, cite].

Scientific influence and impact:

  • Selective-revelation [cite] was part of congressional and media discussions regarding surveillance of Americans through secondary uses of data they leave behind. Robert Popp, then Deputy Director at DARPA for the Total Information Awareness Project (TIA), described it often in response to privacy concerns.

  • Risk Assessment Server [cite, cite, cite] originated with Dr. Sweeney's study of the identifiability of basic demographics, leading to her highly cited result: “87% of the population of the United States is uniquely identified by {date of birth, gender, ZIP}.” [Golle et al.] found 64% using more recent population data and a different model. [Malin et al.] explained the difference as due to the models and showed there is no difference for bin sizes >= 5.

  • Even though PrivaMix [cite, cite] is very recent, HUD had the system and functions evaluated by independent security and cryptographic experts, who confirmed their correctness and applicability. PrivaMix worked flawlessly in real-world HUD experiments in Iowa. NIH provided support to help port PrivaMix to healthcare.
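The kind of identifiability figure quoted for the Risk Assessment Server above (the share of a population uniquely identified by {date of birth, gender, ZIP}, and how results compare at larger bin sizes) can be sketched as a simple counting exercise. The following Python sketch is illustrative only and is not the Risk Assessment Server's method: the function name `fraction_in_small_bins`, the field names, and the four sample people are hypothetical, and the published studies relied on population models rather than a full list of individuals.

```python
from collections import Counter


def fraction_in_small_bins(people, fields, max_bin_size=1):
    """Fraction of people whose combination of `fields` values is shared
    by at most `max_bin_size` people; max_bin_size=1 counts only people
    who are unique on those fields."""
    if not people:
        return 0.0
    bins = Counter(tuple(person[f] for f in fields) for person in people)
    at_risk = sum(size for size in bins.values() if size <= max_bin_size)
    return at_risk / len(people)


# Hypothetical population (illustrative values only).
people = [
    {"dob": "1965-07-31", "gender": "F", "zip": "02138"},
    {"dob": "1965-07-31", "gender": "F", "zip": "02139"},
    {"dob": "1970-01-15", "gender": "M", "zip": "15213"},
    {"dob": "1970-01-15", "gender": "M", "zip": "15213"},
]
FIELDS = ("dob", "gender", "zip")

print(fraction_in_small_bins(people, FIELDS))                  # 0.5
print(fraction_in_small_bins(people, FIELDS, max_bin_size=2))  # 1.0
```

In this toy population, two of the four people are unique on the three fields, and all four fall in bins of size two or smaller. Raising `max_bin_size` corresponds to asking how many people share their demographic values with only a handful of others rather than with no one.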

Other achievements: praise from a Federal Advisory Committee; DARPA, HUD, and NIH funding; patent filing; 2 licenses to businesses; and, news articles 7.

See more about Dr. Sweeney's accomplishments with Selective revelation, Risk assessment server, and PrivaMix.

Vision

In vision, Dr. Sweeney and her students contributed formal methods for de-identifying and anonymizing faces in video and photographs [cite, cite, cite, cite, cite, cite, cite].

Scientific influence and impact:

  • Dr. Sweeney and her students were the first to demonstrate the importance of using provable privacy protection over ad hoc approaches, by showing how face recognition, used in its most ideal settings, could re-identify faces distorted by masking, additive noise, and pixelation [cite]. They then introduced the first formal model for protection [cite]. Others have introduced alternatives and enhancements [Defaux et al.]. Senior recently edited a book on the topic [cite], and in more recent work [cite, cite, cite, cite, cite], Dr. Sweeney, working with her student, Ralph Gross, and other collaborators, produced anonymized, photorealistic video. Working with Gross, Cohn, de la Torre, and Baker, she produced anonymized, photorealistic video of pain grimace in patients for NIH [cite].

Other achievements: paper in a top CS journal (IEEE TKDE); paper in a top CS conference (IEEE Conference on Biometrics, 10% acceptance rate).

See more about Dr. Sweeney's accomplishments with Face de-identification.

Biometrics

In biometrics, Dr. Sweeney contributed new technologies that use photography for contactless capture of fingerprints [cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite].

Scientific influence and impact:

  • This work is still underway, but it has already drawn considerable interest from government funding agencies, including DOJ, DOD, and DHS, and from local jails (for booking) and U.S. Border stations in early real-world trials.

Other achievements: DOJ funding; paper in a top CS conference (IEEE BTAS, 10% acceptance rate); 2 patent filings; business venture; and, news articles 7.

See more about Dr. Sweeney's accomplishments with Contactless capture.

Policy and Law

In policy and law, Dr. Sweeney contributed: (1) numerous real-world re-identification studies [cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite]; (2) operational standards for determining compliance (e.g., HIPAA) [cite, cite, cite, cite]; and, (3) real-world examples of surveillance with privacy protection [cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite].

Scientific influence and impact:

  • Dr. Sweeney's earliest re-identification studies were discussed and cited as reasons for approaches taken in the HIPAA Privacy Rule [Gellman, Federal Register, et al.]. Four court decisions cite and discuss her re-identifications, and in one case, her method was sealed [Southern Illinoisan v. Dept. of Public Health]. Researchers have replicated her experiments in other countries [Emam, et al.]. Legal scholars have discussed ramifications [Kerr, et al.] and offered new legal theories to address her findings [Rothstein, Ohm, Weitzner, et al.].

  • Attorneys publicly endorsed Dr. Sweeney's standard for determining HIPAA compliance as a means of reducing litigation risk [Tupman, et al.] and support its use in practice [American health lawyers, et al.]. Two companies have licenses to her related technology and use it to commercially provide HIPAA Compliance Assessments [Privacert, et al.].

Other achievements: citations in the commentary of the HIPAA Privacy Rule, in Medical Breach Regulation, and in 4 court decisions; presentations at the European Union and the U.S. Senate; Privacy Advocacy award; appointment to the Privacy and Security Seat of the Federal HIT Policy Committee in the Obama Administration; and, news articles 7.

See more about Dr. Sweeney's accomplishments with Identifiability of de-identified data, HIPAA assessments, and Privacy-preserving surveillance.

Public Education

In public education, Dr. Sweeney contributed: (1) Identity Angel, which crawls the Web and notifies people of sensitive personal information found about them on-line [cite, cite, cite]; (2) SSNwatch, which validates Social Security numbers [cite]; and, (3) CameraWatch, which locates URLs of publicly available webcams [cite, cite] 7.

Scientific influence and impact:

  • Dr. Sweeney's Identity Angel program [cite, cite, cite] found almost 10,000 Social Security numbers on-line and attempted to email about 3000 individuals whose {SSN, email} were found. A month later, about 2000 SSNs were removed. CBS News interviewed different people in different cities for reactions and aired the interviews on local stations, e.g. Denver [cbs4denver.com/video/?id=10164@kcnc.dayport.com].

  • With respect to SSN validation, SSNwatch [cite] receives about 1000 hits/week. District attorneys are the primary users, apparently checking SSNs against information given in statements.

  • With respect to SSN prediction, Dr. Sweeney was the first to warn of a pending crisis in the ability to predict the 9 digits of a person's SSN given only {date of birth, hometown}. A private company confirmed her suspicion using millions of SSNs of living people. One of her students, Ralph Gross, working with a colleague, Alessandro Acquisti, repeated the experiment using SSNs of deceased people and obtained publishable results and well-deserved media attention.

See more about Dr. Sweeney's accomplishments with Identity angel, SSNwatch, and CameraWatch.



Notes

5 Helen Nissenbaum discusses design decisions made by technology developers. See her book, Privacy in Context (2009).

6 Working across areas is unorthodox. Rather than residing in one community, as is customary, Dr. Sweeney's work pursues scientific contributions to privacy in multiple communities and in the real world too, in the places where technology-privacy clashes are underway. This makes her work difficult to review for the same reasons that working across areas is difficult: each area has its own language, concepts, history, and scientific methods. Even though her papers are reviewed with the same rigor as others within a community, it is not easy to assess impact from outside that community. So, an array of quantitative assessments is available.

7 Featured news articles on Dr. Sweeney's work in surveillance, biometrics, policy and law, and public education have appeared in venues including Scientific American, CBS, NBC, ABC, Newsweek, USA Today, and National Public Radio.

Fall 2009