Medical Informatics: Scrub System
Problem Statement: Given clinical notes and letters between physicians, locate all personally identifying information and replace it with made-up alternatives.
Description: Dr. Sweeney's Scrub System addresses this problem. The straightforward approach of global search and replace correctly located only 30-60% of the personally identifying information that appeared explicitly in test databases, whereas Scrub found 99-100% of these references. Scrub uses detection algorithms that employ templates and specialized knowledge of what constitutes a name, address, phone number, and so forth. For example, Fred and Bill are common first names, and knowing this makes it easier to recognize them as likely names. By contrast, Virginia could be either a state or a first name, and typing mistakes and abbreviations are common. To handle such ambiguity, Scrub uses a system of competing detectors that communicate over a blackboard, where precedence and prior occurrence help resolve conflicts.
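The competing-detector idea can be illustrated with a minimal Python sketch. It is not Scrub's actual implementation: the word lists, integer scores, precedence order, prior-occurrence bonus, and substitute values are all hypothetical, and only three detectors (phone number, first name, U.S. state) are shown. Each detector posts scored claims to a shared blackboard; at each position the highest-scoring claim wins, a claim whose type matches how the same word was resolved earlier gets a bonus, and remaining ties go to a fixed precedence order. Winning spans are replaced with made-up values, and a repeated word of the same type reuses the same substitute.

```python
import re
from dataclasses import dataclass

# Hypothetical, tiny knowledge bases standing in for Scrub's specialized knowledge.
FIRST_NAMES = {"Fred", "Bill", "Virginia"}
STATES = {"Virginia", "Ohio", "Texas"}
# Assumed tie-break order: an earlier entry wins when scores are equal.
PRECEDENCE = ["PHONE", "STATE", "NAME"]
PRIOR_BONUS = 2  # assumed boost when a word was resolved to the same type earlier
# Hypothetical made-up substitutes per type.
FAKES = {"PHONE": ["555-0100", "555-0101"], "NAME": ["Alex", "Sam", "Pat"],
         "STATE": ["Utah", "Maine"]}


@dataclass
class Claim:
    start: int
    end: int
    kind: str
    word: str   # triggering word, used for prior-occurrence lookups
    score: int  # detector certainty on an arbitrary 0-10 scale (assumed)


def phone_detector(text):
    # Template: 555-1234 or 617-555-1234.
    for m in re.finditer(r"\b(?:\d{3}-)?\d{3}-\d{4}\b", text):
        yield Claim(m.start(), m.end(), "PHONE", m.group(), 9)


def name_detector(text):
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        if m.group() in FIRST_NAMES:
            # Context cue: a following capitalized word is taken as a surname.
            surname = re.match(r"\s+[A-Z][a-z]+", text[m.end():])
            end = m.end() + surname.end() if surname else m.end()
            yield Claim(m.start(), end, "NAME", m.group(), 8 if surname else 5)


def state_detector(text):
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        if m.group() in STATES:
            # Context cue: "in X" or ", X" suggests a place.
            place = re.search(r"(?:\bin|,)\s+$", text[:m.start()])
            yield Claim(m.start(), m.end(), "STATE", m.group(), 8 if place else 5)


def rank(claim, seen):
    # Higher score wins; prior occurrence adds a bonus; precedence breaks ties.
    bonus = PRIOR_BONUS if seen.get(claim.word) == claim.kind else 0
    return (claim.score + bonus, -PRECEDENCE.index(claim.kind))


def scrub(text, detectors=(phone_detector, name_detector, state_detector)):
    # Blackboard: every detector posts its claims, grouped by start offset.
    board = {}
    for detect in detectors:
        for claim in detect(text):
            board.setdefault(claim.start, []).append(claim)

    seen = {}     # word -> type it was resolved to earlier (prior occurrence)
    mapping = {}  # (type, word) -> made-up substitute, so repeats stay consistent
    used = {kind: 0 for kind in FAKES}
    out, last = [], 0
    for start in sorted(board):
        if start < last:  # inside a span that was already replaced
            continue
        best = max(board[start], key=lambda c: rank(c, seen))
        seen[best.word] = best.kind
        key = (best.kind, best.word)
        if key not in mapping:
            pool = FAKES[best.kind]
            mapping[key] = pool[used[best.kind] % len(pool)]
            used[best.kind] += 1
        out.append(text[last:best.start])
        out.append(mapping[key])
        last = best.end
    out.append(text[last:])
    return "".join(out)


note = ("Fred Smith lives in Virginia. Virginia Jones called 617-555-1234. "
        "Later Virginia said she was fine.")
print(scrub(note))
# prints: Alex lives in Utah. Sam called 555-0100. Later Sam said she was fine.
```

On the sample note, the first "Virginia" is read as a state because of the preceding "in", the second as a name because of the following surname, and the third as a name only because of its prior occurrence: without the bonus, the 5-5 tie would go to STATE under the assumed precedence order.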
Scientific Influence and Impact: Scrub appears to be the first work to introduce both the problem and a solution. Dr. Sweeney reports that it was motivated by data-sharing needs at Children's Hospital [Kohane et al.] and Massachusetts General Hospital [Barnett et al.]. Academic institutions, such as MIT [Szolovits et al.] and the University of Pittsburgh, have implemented versions of Scrub and related alternatives, which they currently license to medical organizations (e.g., Vanderbilt) for real-world use.
Other Achievements: