|
Surveillance: Risk Assessment Server
Problem Statement: Given de-identified person-specific data, construct a method for predicting the number of subjects whose information can be re-identified.
Description: A solution is Dr. Sweeney's Risk Assessment Server. Its architecture (a) uses a population model, a meta-level database describing available databases, and an inference engine. An output (b) is an "identifiability report" that plots estimates of the number of explicitly known individuals whose information can be identified in the data. Re-identifications (c) appear in graduated groupings termed "binsizes". The inference engine finds shortest paths from the given data to data containing explicit identifiers for the same populations. Dr. Sweeney's paper [cite] provides a real-world example from bioterrorism surveillance (d). Re-identifications result from linking to hospital discharge data on medical history. A surprise is that releasing age ranges cannot thwart these re-identifications, no matter how aggregated the ranges are (5-year ages shown).
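The shortest-path step described above can be illustrated with a small sketch. This is not the Risk Assessment Server's actual implementation. The collection names, field lists, the `MIN_SHARED_FIELDS` linkage threshold, and the rule that two collections are linkable when they share enough fields are all assumptions made for illustration. The search itself is an ordinary breadth-first search.

```python
from collections import deque

# Hypothetical meta-level database: each data collection and the fields it holds.
COLLECTIONS = {
    "surveillance_feed": {"age_range", "zip", "gender", "diagnosis"},
    "hospital_discharge": {"zip", "gender", "diagnosis", "birth_date"},
    "voter_list": {"name", "address", "zip", "gender", "birth_date"},
    "pharmacy_claims": {"diagnosis", "drug"},
}
EXPLICIT_IDENTIFIERS = {"name", "address"}
MIN_SHARED_FIELDS = 3  # assumed threshold for a usable link between collections


def linkable(a, b):
    """Assume two collections can be linked if they share enough fields."""
    return len(COLLECTIONS[a] & COLLECTIONS[b]) >= MIN_SHARED_FIELDS


def shortest_link_path(start):
    """Breadth-first search from `start` to the nearest collection that
    holds explicit identifiers; returns the chain of collections or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if COLLECTIONS[current] & EXPLICIT_IDENTIFIERS:
            return path
        for other in COLLECTIONS:
            if other not in seen and linkable(current, other):
                seen.add(other)
                queue.append(path + [other])
    return None
```

In this toy setup, `shortest_link_path("surveillance_feed")` reaches the hypothetical voter list by way of the hospital discharge data, while `shortest_link_path("pharmacy_claims")` returns `None` because that collection shares too few fields with any other.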
Scientific Influence and Impact: Dr. Sweeney's Risk Assessment Server originated with her study of the identifiability of basic demographics, leading to her highly cited result "87% of the population of the United States is uniquely identified by {date of birth, gender, ZIP}". Researchers replicated these experiments. [Golle et al.] found 64% were uniquely identified in the US using more recent information and a different model. [Malin et al.] explained the difference as model artifacts and demonstrated that at binsizes >= 5 the difference disappears.
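To make the notions of uniqueness and binsize concrete, the following is a minimal sketch of how such counts might be tabulated on a toy table. It assumes a binsize is the number of records that share the same values on a chosen set of fields. The sample records, field names, and function names are hypothetical and do not come from the studies cited above.

```python
from collections import Counter

# Hypothetical toy records; real studies use census-scale population data.
SAMPLE_RECORDS = [
    {"birth_date": "1970-01-02", "gender": "F", "zip": "02138"},
    {"birth_date": "1970-01-02", "gender": "F", "zip": "02138"},
    {"birth_date": "1981-06-15", "gender": "M", "zip": "02139"},
    {"birth_date": "1990-11-30", "gender": "F", "zip": "15213"},
    {"birth_date": "1990-11-30", "gender": "M", "zip": "15213"},
]


def binsize_report(records, fields):
    """Map each binsize to the number of records that fall in a group of
    exactly that size when records are grouped by `fields`."""
    group_sizes = Counter(tuple(r[f] for f in fields) for r in records)
    report = Counter()
    for size in group_sizes.values():
        report[size] += size  # a group of `size` contributes `size` records
    return dict(report)


def fraction_within(report, max_binsize):
    """Fraction of records whose group has at most `max_binsize` members."""
    total = sum(report.values())
    return sum(n for size, n in report.items() if size <= max_binsize) / total


full_report = binsize_report(SAMPLE_RECORDS, ("birth_date", "gender", "zip"))
unique_fraction = fraction_within(full_report, 1)
```

On this toy table, three of the five records are unique on {date of birth, gender, ZIP}, so `unique_fraction` is 0.6. Grouping on gender alone leaves no unique records.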
|