Medical Informatics: Scrub System

Problem Statement: Given clinical notes and letters between physicians, locate and replace all personally-identifying information with made-up alternatives.

Description: Dr. Sweeney's Scrub System is one solution to this problem. A straightforward global search and replace located only 30-60% of the personally-identifying information that appeared explicitly in test databases, whereas Scrub found 99-100% of these references. Scrub uses detection algorithms that employ templates and specialized knowledge of what constitutes a name, address, phone number, and so forth. For example, Fred and Bill are common first names, and knowing so makes it easier to recognize them as likely names. Virginia, on the other hand, could be a state or a first name, and typing mistakes and abbreviations are common. Scrub therefore uses a system of competing detectors that communicate over a blackboard [11], where precedence and prior occurrence help resolve conflicts.
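
The idea of competing detectors can be illustrated with a minimal sketch. This is not the Scrub implementation: the detector functions, the word lists, the precedence values, and the fixed replacement strings are all hypothetical, chosen only to show how several detectors might post claims to a shared blackboard and how precedence might settle a conflict such as "Virginia" the name versus "Virginia" the state. Prior occurrence, typing mistakes, and abbreviations are not modeled.

```python
import re
from dataclasses import dataclass

# All names, word lists, and precedence values below are illustrative
# assumptions, not details taken from the Scrub system.

@dataclass(frozen=True)
class Claim:
    start: int       # start offset of the claimed span
    end: int         # end offset (exclusive)
    kind: str        # what the detector believes the span is
    precedence: int  # higher wins when two claims overlap

FIRST_NAMES = {"Fred", "Bill", "Virginia"}
STATES = {"Virginia"}

def detect_first_names(text):
    # Claims any capitalized word found in the list of common first names.
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        if m.group() in FIRST_NAMES:
            yield Claim(m.start(), m.end(), "NAME", 1)

def detect_states(text):
    # Template: a state name directly after ", " as in "Richmond, Virginia".
    for m in re.finditer(r"(?<=, )[A-Z][a-z]+\b", text):
        if m.group() in STATES:
            yield Claim(m.start(), m.end(), "STATE", 2)

def detect_phones(text):
    # Template: a phone number written as ddd-ddd-dddd.
    for m in re.finditer(r"\b\d{3}-\d{3}-\d{4}\b", text):
        yield Claim(m.start(), m.end(), "PHONE", 3)

DETECTORS = [detect_first_names, detect_states, detect_phones]
REPLACEMENTS = {"NAME": "Alex", "STATE": "Ohio", "PHONE": "555-000-0000"}

def scrub(text):
    # Every detector posts its claims to the shared blackboard.
    blackboard = [claim for detect in DETECTORS for claim in detect(text)]
    # Resolve conflicts: consider high-precedence claims first and drop
    # any later claim that overlaps an already accepted one.
    accepted = []
    for claim in sorted(blackboard, key=lambda c: -c.precedence):
        if all(claim.end <= a.start or claim.start >= a.end for a in accepted):
            accepted.append(claim)
    # Replace from right to left so earlier offsets stay valid.
    for claim in sorted(accepted, key=lambda c: c.start, reverse=True):
        text = text[:claim.start] + REPLACEMENTS[claim.kind] + text[claim.end:]
    return text

print(scrub("Fred saw Virginia in Richmond, Virginia. Call 617-555-1234."))
# prints: Alex saw Alex in Richmond, Ohio. Call 555-000-0000.
```

In this sketch the second "Virginia" is claimed by both the name detector and the state detector, and the state claim wins because it was assigned the higher precedence. A realistic system would also draw replacements from a pool of made-up alternatives rather than from fixed strings.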

Scientific Influence and Impact: Scrub appears to be the first work to introduce both the problem and a solution. Dr. Sweeney reports that it was motivated by data-sharing needs at Children's Hospital [Kohane et al.] and Massachusetts General Hospital [Barnett et al.]. Academic institutions, such as MIT [Szolovits et al.] and the University of Pittsburgh, have implemented versions of Scrub and related alternatives, which they currently license to medical organizations (e.g., Vanderbilt) for real-world use.

Other Achievements [12]:

  • First prize (Best Paper Award), American Medical Informatics Association. [cite]

  • The Scrub paper [cite] is among the 15 most cited American Medical Informatics papers to date, with a statistically significant citation count at the 99.9th percentile.


[11] Raj Reddy and colleagues introduced the notion of a blackboard architecture. In the area of speech recognition, Hearsay-II's blackboard architecture [Erman, Hayes-Roth, Lesser, Reddy, ACM Computing Surveys, 1980] engages multiple knowledge sources that work in parallel. This is similar to Dr. Sweeney's Scrub system, except that communication is central to Hearsay-II: each level is believed to be so uncertain that a collaborative effort is required, rather than a competitive one as in Scrub.

[12] See quantitative assessments for more details.

