Medical Informatics: Genomic identifiability and privacy
Work done with her student, Brad Malin.
Problem Statement: Given DNA information and publicly available data, or given systems that attempt to protect the privacy of DNA, show how the subjects of the data can be re-identified.
Description: Dr. Sweeney and her student, Bradley Malin, offer solutions in three directions. They show methods to re-identify the subjects of DNA data (a) by matching gene-based disease characteristics in the genetic data to disease presentations in publicly available de-identified health data, and then re-identifying subjects from demographics that also appear in the health data; and (b) by matching trails of DNA collections at different sites to publicly available medical claims data across those sites, and then re-identifying subjects from demographics appearing in the claims data. (c) They also provide algorithmic proofs of re-identification vulnerabilities found in various real-world systems that proposed to protect genomic privacy through the use of pseudonymous data or data previously believed to be anonymous.
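The trail matching in (b) can be illustrated with a minimal sketch. It assumes a simplified exact-match rule: a pseudonymous DNA record is linked to an identified patient only when the set of sites holding that DNA equals the set of sites in exactly one patient's claims trail, and no other DNA record shares that trail. This is a toy illustration and not the authors' published algorithm; the function name, site labels, and record identifiers are hypothetical.

```python
from collections import defaultdict


def link_by_trail(dna_trails, claims_trails):
    """Link DNA records to identities whose site trails match uniquely.

    dna_trails:    {dna_id: set of sites where that DNA sample was collected}
    claims_trails: {identity: set of sites appearing in that person's claims}
    Returns {dna_id: identity} for unambiguous one-to-one trail matches only.
    """
    # Group identified patients by their trail (hypothetical exact-match rule).
    identities_by_trail = defaultdict(list)
    for identity, sites in claims_trails.items():
        identities_by_trail[frozenset(sites)].append(identity)

    # Group pseudonymous DNA records by their trail.
    dna_by_trail = defaultdict(list)
    for dna_id, sites in dna_trails.items():
        dna_by_trail[frozenset(sites)].append(dna_id)

    links = {}
    for trail, dna_ids in dna_by_trail.items():
        candidates = identities_by_trail.get(trail, [])
        # Accept a link only when the trail is unique on both sides.
        if len(dna_ids) == 1 and len(candidates) == 1:
            links[dna_ids[0]] = candidates[0]
    return links


# Toy data, invented purely for illustration.
dna_trails = {"dna1": {"H1", "H2"}, "dna2": {"H2"}, "dna3": {"H3"}}
claims_trails = {
    "patient_A": {"H1", "H2"},
    "patient_B": {"H2"},
    "patient_C": {"H2"},
    "patient_D": {"H3"},
}
links = link_by_trail(dna_trails, claims_trails)
```

In this toy data, dna2 stays unlinked because two patients share the trail {H2}. The sketch is meant to show that a record is exposed when its trail is unique, so a person who visits an unusual combination of sites is easier to re-identify.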
Scientific Influence and Impact: The work of Dr. Sweeney and her student, Bradley Malin, began out of concern that NIH was making human DNA databases publicly available under the false belief that, although a person's DNA is specific to that person, DNA alone could not be used to re-identify them. Their re-identification experiments (a) and (b) were the first to demonstrate that re-identification of DNA was possible. Other researchers subsequently reported further re-identification vulnerabilities [Kohane, Altman, et al.], and the databases were finally removed after a high-profile example [McGuire et al.]. Open issues remain, however, because research requires sharing combined health and genomic information. Critical next steps will be solutions that enable genomic data sharing with guarantees of privacy.
Other Achievements: