Approach and Area Overview
Latanya Sweeney believes that computer scientists who pioneer privacy-invasive technologies
and scholars who design related policy are in the best positions to solve privacy-technology
clashes through design (see note 5). So, she works within the communities engaged in constructing
privacy-invasive technologies, and in related disciplines, to encourage such solutions (see note 6).
Her approach is to identify a privacy-technology clash within a community, formulate a privacy
problem statement, and then offer a solution to the problem as an exemplar to seed
privacy-preserving work within the originating community.
The following summarizes her contributions by area.
In medical informatics, Dr. Sweeney contributed: (1) Scrub, which de-identifies textual documents
[cite, cite]; (2) Datafly, which balances privacy and utility in field-structured data
[cite, cite, cite, cite]; (3) re-identifications of genomic data [cite, cite, cite, cite];
and (4) a healthcare experiment using anonymous data to compare cohort outcomes [cite].
Scientific influence and impact:
- Scrub [cite] was the first to introduce the problem of de-identifying medical text and to pose a
solution. Academic institutions, such as MIT [Szolovits et al.] and the University of
Pittsburgh, have implemented versions of Scrub and related alternatives, which they
currently license to medical organizations (e.g., Vanderbilt) for real-world use.
- Datafly [cite] was one of the first to pose a completely algorithmic solution for balancing
privacy and utility in field-structured data. Researchers proposed efficiencies and
alternatives [Ohno-Machado, Vinterbo, et al.].
- The DNA re-identification experiments of Dr. Sweeney and her student,
Bradley Malin, were the first [cite, cite, cite, cite]. Other researchers
then showed other vulnerabilities [Kohane, Altman, et al.], and most recently NIH
ceased providing human DNA databases publicly based on re-identifications [McGuire et
al.].
- Dr. Sweeney's work seems to be the first to introduce an experimental design for comparing health
outcomes of cohorts over time using provably anonymous data for analysis [cite]. Hagan at
Price Waterhouse Coopers reports that other healthcare organizations [e.g., Healthnet,
Alere, et al.] are already using variants of the experimental design.
Other achievements: 3 best paper awards; papers among the 15 most cited American Medical
Informatics papers; Fellow in College of Medical Informatics; and Privacy Leadership Award.
See more about Dr. Sweeney's accomplishments with Scrub, Datafly, Genomic identifiability,
and Patient-centered management.
In database security, Dr. Sweeney contributed k-anonymity
[cite, cite, cite, cite, cite, cite, cite].
A data set is k-anonymized if the data for each person are indistinguishable from the data of
at least k-1 other individuals who also appear in the data set.
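The definition above can be checked mechanically. The following is a minimal Python sketch, not Dr. Sweeney's implementation: the function names, the field names, and the sample records are all hypothetical, and the fields treated as linkable (the "quasi-identifiers") are an assumption chosen for illustration.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    values on every quasi-identifier field (0 for an empty data set)."""
    groups = Counter(tuple(r[f] for f in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every record is indistinguishable, on the quasi-identifier
    fields, from at least k-1 other records in the same data set."""
    if not records:
        return True  # an empty release satisfies the definition vacuously
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical, already-generalized records (illustration only).
RECORDS = [
    {"birth_year": "1965", "zip": "021**", "gender": "F", "diagnosis": "asthma"},
    {"birth_year": "1965", "zip": "021**", "gender": "F", "diagnosis": "flu"},
    {"birth_year": "1970", "zip": "152**", "gender": "M", "diagnosis": "diabetes"},
    {"birth_year": "1970", "zip": "152**", "gender": "M", "diagnosis": "asthma"},
    {"birth_year": "1970", "zip": "152**", "gender": "M", "diagnosis": "flu"},
]
QUASI_IDENTIFIERS = ("birth_year", "zip", "gender")

if __name__ == "__main__":
    print(smallest_group_size(RECORDS, QUASI_IDENTIFIERS))
    print(is_k_anonymous(RECORDS, QUASI_IDENTIFIERS, 2))
    print(is_k_anonymous(RECORDS, QUASI_IDENTIFIERS, 3))
```

In this sample the smallest group has two records, so the data are 2-anonymous but not 3-anonymous with respect to the chosen fields. The check says nothing about how to generalize or suppress values to reach a given k.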
Scientific influence and impact:
- k-anonymity was the first formal privacy protection model. Its original intention was to
thwart the ability to link field-structured databases, but it has been viewed more broadly
and, in so doing, has spurred a series of highly cited works. For example,
other researchers have proposed efficiencies, alternatives, and hardness proofs [Meyerson, Williams, et al.].
To improve utility, k-anonymity can be enforced on only the subset of fields known
to lead to re-identifications. L-diversity [Gehrke et al.] poses an alternative motivated by
the case in which that subset is chosen incorrectly. T-closeness [Li et al.] poses an alternative
to address concerns found in l-diversity and vulnerabilities that arise if k-anonymity is applied generally.
Most recently, differential privacy [Dwork et al.] poses another alternative,
which typically distorts data using randomization and noise, enforced across all values,
to report inexact commonly occurring information.
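To make the noise-based alternative concrete, here is a minimal Python sketch of one standard way to realize differential privacy for a counting query, the Laplace mechanism. It is not part of Dr. Sweeney's work, and the text above does not say which mechanism is meant; the function names and the parameter `epsilon` (the privacy budget) are assumptions introduced for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.
    The difference of two independent exponential draws with mean
    `scale` follows Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Report a count with Laplace noise added. A counting query changes
    by at most 1 when one person is added or removed (sensitivity 1),
    so the noise scale is 1 / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()  # unseeded: each run reports a different value
    print(noisy_count(100, epsilon=0.5, rng=rng))
```

A smaller `epsilon` means more noise and stronger protection. Each reported value is inexact, but averages over many independent reports stay close to the true count, which is why such releases remain useful for commonly occurring information.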
Other achievements: recognition award; patent; second highest citation count among joint
citation counts of Associate Professors in the School of Computer Science at Carnegie Mellon.
See more about Dr. Sweeney's accomplishments with k-anonymity.
In surveillance, Dr. Sweeney contributed: (1) Selective Revelation: a data sharing architecture that matches
identifiability and utility [cite]; (2) Risk Assessment Server: computes the identifiability of data
[cite, cite, cite]; and (3) PrivaMix: allows a network of data holders to jointly produce a
de-identified linked dataset without a trusted third party [cite, cite].
Scientific influence and impact:
- Selective-revelation [cite] was part of congressional and media discussions regarding
surveillance of Americans through secondary uses of data they leave behind. Robert
Popp, then Deputy Director at DARPA for the Total Information Awareness Project
(TIA), described it often in response to privacy concerns.
- Risk Assessment Server [cite, cite, cite] originated with Dr. Sweeney's study of the
identifiability of basic demographics, leading to her highly cited result: “87% of the
population of the United States is uniquely identified by {date of birth, gender, ZIP}.”
[Golle et al.] found 64% using more recent population data and a different model.
[Malin et al.] explained the difference as due to the models and showed there is no
difference for bin sizes >= 5.
- Even though PrivaMix [cite, cite] is very recent, HUD had the system and functions
evaluated by independent security and cryptographic experts, who confirmed their
correctness and applicability. PrivaMix worked flawlessly in real-world HUD
experiments in Iowa. NIH provided support to help port PrivaMix to healthcare.
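The identifiability measurement discussed in the Risk Assessment Server item above can be illustrated with a small Python sketch. It counts how many records fall into bins (groups sharing the same field values) at or below a given size; a bin size of 1 means the person is unique. This is not the Risk Assessment Server, nor the statistical models used in the cited studies; the function name and the five-person population are hypothetical.

```python
from collections import Counter


def fraction_in_small_bins(records, fields, max_binsize=1):
    """Fraction of records whose combination of values on `fields` is
    shared by at most `max_binsize` records (max_binsize=1: unique)."""
    if not records:
        return 0.0
    bins = Counter(tuple(r[f] for f in fields) for r in records)
    at_risk = sum(size for size in bins.values() if size <= max_binsize)
    return at_risk / len(records)


# Hypothetical five-person population (illustration only).
PEOPLE = [
    {"dob": "1961-03-14", "gender": "F", "zip": "15213"},
    {"dob": "1961-03-14", "gender": "F", "zip": "15213"},
    {"dob": "1975-07-02", "gender": "M", "zip": "02138"},
    {"dob": "1975-07-02", "gender": "F", "zip": "02138"},
    {"dob": "1988-11-30", "gender": "M", "zip": "50309"},
]

if __name__ == "__main__":
    fields = ("dob", "gender", "zip")
    print(fraction_in_small_bins(PEOPLE, fields))       # unique records
    print(fraction_in_small_bins(PEOPLE, fields, 2))    # bins of size <= 2
    print(fraction_in_small_bins(PEOPLE, ("gender",)))  # gender alone
```

In this toy population three of the five people are unique on {date of birth, gender, ZIP}, while no one is unique on gender alone. A real assessment works from population-scale data rather than an enumerated list.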
Other achievements: praise from a Federal Advisory Committee; DARPA, HUD, and NIH
funding; patent filing; 2 licenses to businesses; and news articles (see note 7).
See more about Dr. Sweeney's accomplishments with Selective revelation,
Risk assessment server, and PrivaMix.
In vision, Dr. Sweeney and her students contributed formal methods for de-identifying and anonymizing
faces in video and photographs [cite, cite, cite, cite, cite, cite, cite].
Scientific influence and impact:
- Dr. Sweeney and her students were the first to demonstrate the importance of using provable privacy protection
over ad hoc approaches, by showing how face recognition, used in its most ideal settings,
could re-identify faces distorted by masking, additive noise, and pixelation [cite].
They then introduced the first formal model for protection [cite]. Others have introduced
alternatives and enhancements [Defaux et al.]. Senior recently edited a book on the topic [cite],
and in more recent work [cite, cite, cite, cite, cite],
Dr. Sweeney, working with her student Ralph Gross and other collaborators,
produced anonymized, photo-realistic video. Working with Gross, Cohn, de la Torre, and Baker,
they produced anonymized, photo-realistic video of pain grimace in patients for NIH [cite].
Other achievements: paper in a top CS journal (IEEE TKDE); paper in a top CS conference
(IEEE Conference on Biometrics, 10% acceptance rate).
See more about Dr. Sweeney's accomplishments with Face de-identification.
In biometrics, Dr. Sweeney contributed new technologies that use photography for contactless capture of
fingerprints
[cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite].
Scientific influence and impact:
- This work is still underway but has already drawn considerable interest from government
funding agencies, including DOJ, DOD, and DHS, and, in early real-world trials, from
local jails (for booking) and U.S. border stations.
Other achievements: DOJ funding; paper in a top CS conference (IEEE BTAS, 10% acceptance
rate); 2 patent filings; business venture; and news articles (see note 7).
See more about Dr. Sweeney's accomplishments with Contactless capture.
In policy and law, Dr. Sweeney contributed: (1) numerous real-world re-identification studies
[cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite];
(2) operational standards for determining compliance (e.g., HIPAA) [cite, cite, cite, cite];
and (3) real-world examples of surveillance with privacy protection
[cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite, cite].
Scientific influence and impact:
- Dr. Sweeney's earliest re-identification studies were discussed and cited as reasons for approaches
taken in the HIPAA Privacy Rule [Gellman, Federal Register, et al.]. Four court
decisions cite and discuss her re-identifications, and in one case, her method was sealed
[Southern Illinoisan v. Dept. of Public Health]. Researchers have replicated her experiments in
other countries [Emam, et al.]. Legal scholars have discussed ramifications [Kerr, et al.] and
offer new legal theories to address her findings [Rothstein, Ohm, Weitzner, et al.].
- Attorneys publicly endorsed Dr. Sweeney's standard for determining HIPAA compliance as a means
of reducing litigation risk [Tupman, et al.] and support its use in practice [American
health lawyers, et al.]. Two companies have licenses to her related technology and use it
to commercially provide HIPAA Compliance Assessments [Privacert, et al.].
Other achievements: citation in the commentary of the HIPAA Privacy Rule, in Medical
Breach Regulation, and in 4 court decisions; presentations at the European Union and the U.S.
Senate; Privacy Advocacy award; appointment to the Privacy and Security Seat of the Federal
HIT Policy Committee in the Obama Administration; and news articles (see note 7).
See more about Dr. Sweeney's accomplishments with Identifiability of de-identified data,
HIPAA assessments, and Privacy-preserving surveillance.
In public education, Dr. Sweeney contributed: (1) Identity Angel, which crawls the Web and notifies people of
sensitive personal information found about them on-line [cite, cite, cite];
(2) SSNwatch, which validates Social Security numbers [cite];
and (3) CameraWatch, which locates URLs of publicly available webcams [cite, cite] (see note 7).
Scientific influence and impact:
- Dr. Sweeney's Identity Angel program [cite, cite, cite] found almost 10,000 Social Security numbers on-line
and attempted to email about 3000 individuals for whom an {SSN, email} pair was found. A
month later, about 2000 SSNs had been removed. CBS News interviewed different people in
different cities for reactions and aired the interviews on local stations, e.g., Denver
[cbs4denver.com/video/?id=10164@kcnc.dayport.com].
- With respect to SSN validation, SSNwatch [cite] receives about 1000 hits/week. District
attorneys are the primary users, apparently to match SSNs to information in statements.
- With respect to SSN prediction, Dr. Sweeney was the first to warn of a pending crisis in the ability to
predict the 9 digits of a person's SSN given only {date of birth, home town}. A private
company confirmed her suspicion using millions of SSNs of live people. One of her
students, Ralph Gross, working with a colleague, Alessandro Acquisti, repeated the experiment using
SSNs of dead people and obtained publishable results and well-deserved media attention.
See more about Dr. Sweeney's accomplishments with Identity angel,
SSNwatch, and CameraWatch.
Notes
5. Helen Nissenbaum discusses design decisions made by technology developers. See her book, Privacy in Context (2009).
6. Working across areas is unorthodox. Rather than residing in one community, as is customary, Dr. Sweeney's work pursues
scientific contributions to privacy in multiple communities and in the real world too, in the places where
technology-privacy clashes are underway. This makes review of her work difficult for the same reasons that working
across areas is difficult: each area has its own language, concepts, history, and scientific methods. Even
though her papers are reviewed with the same rigor as others within a community, it is not easy to assess impact
from outside that community. So, an array of quantitative assessments is available.
7. News venues that have featured Dr. Sweeney's work in surveillance, biometrics, policy and law, and public
education include Scientific American, CBS, NBC, ABC, Newsweek, USA Today, and National Public Radio.