Feature - Finding a clue in a data-stack
Thanks to a distributed data analysis system under development at Rutgers University, New Jersey, the time it takes police to connect the dots between illicit activities across the globe could go from months to minutes.
Imagine that law enforcement officers in Los Angeles are investigating labs that manufacture the illegal drug methamphetamine. Meanwhile, investigators in Chicago are looking into the illegal sale of large quantities of Canadian-made pseudoephedrine, an over-the-counter medication often abused to make methamphetamine.
The Chicago and L.A. investigators could unknowingly be working two ends of the same case, which could take months to solve. But with the DI-HOPE KD (Distributed Higher Order Privacy Enhanced Knowledge Discovery) system, that could change.
The Chicago-L.A. example is hypothetical, but according to William Pottenger, a Rutgers associate research professor, the scenario is based on a real incident. DI-HOPE KD will provide a virtual collaboration environment with a variety of tools to aid investigators, says Pottenger, who is also the director of transition for the Homeland Security Center of Excellence for Command, Control and Interoperability, based at Rutgers.
Once DI-HOPE KD is fully online, the system will use enormous amounts of data from multiple sources nationwide to turn up new leads on cases. In the meantime, the Rutgers group is testing the system's algorithms on supercomputers that simulate a distributed environment, Pottenger says. While some parts of DI-HOPE KD are already available, the full system is expected to come online within the next five to ten years.
The first step in any investigation is to interview sources, collect information, and compare reports. To increase efficiency, DI-HOPE KD will take advantage of existing web interfaces through which investigators can directly input data for easier sharing and comparison.
Often, however, there is so much data that it is difficult to identify the important pieces, and quickly pruning out irrelevant data is a challenge. To address this, the Rutgers group is developing a set of 'Higher Order Learning algorithms' that mimic human intuition to identify important patterns and sort the data into manageable groups. The pruned data can then be processed by data extraction algorithms that make educated guesses about the value of particular pieces of data, highlighting potential names, addresses and phone numbers that could be useful.
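The article does not say how DI-HOPE KD's extraction algorithms work. As a rough illustration of what "highlighting potential phone numbers and addresses" means, here is a minimal sketch that swaps in simple hand-written regular expressions; the patterns and the `extract_candidates` function are hypothetical, and a real system would learn such guesses from data rather than hard-code them. Names are left out because they are much harder to spot with patterns alone.

```python
import re

# Hypothetical hand-written patterns (an assumption, not DI-HOPE KD's method).
# North American phone numbers such as "(312) 555-0142" or "312.555.0142".
PHONE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s](\d{4})\b")
# Street addresses such as "1200 West Lake Street" or "42 Elm Ave".
ADDRESS = re.compile(
    r"\b\d{1,5}(?:\s+[A-Z][a-z]*)+\s+(?:Street|St|Avenue|Ave|Road|Rd)\b"
)


def extract_candidates(text: str) -> dict[str, list[str]]:
    """Flag substrings that look like phone numbers or street addresses."""
    # Normalize every phone number to the form NNN-NNN-NNNN for easy comparison.
    phones = ["-".join(m.groups()) for m in PHONE.finditer(text)]
    addresses = [m.group(0) for m in ADDRESS.finditer(text)]
    return {"phones": phones, "addresses": addresses}


report = "Broker called (312) 555-0142 twice; shipment went to 1200 West Lake Street."
print(extract_candidates(report))
```

Normalizing the phone numbers matters because the same number typed differently in two reports should still be recognized as one lead.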
The next step is to make sense of the data. Normally, investigators meet in a room to search through huge amounts of information distributed across multiple databases for possible links. With DI-HOPE KD, however, investigators can participate remotely in a privacy-enhanced virtual environment, using the system to help connect the dots to solve the case.
In the hypothetical meth case, DI-HOPE KD would be able to connect the source in Canada, the broker in Chicago and the lab in L.A., and alert investigators in these cities to potential links. The system would report just the existence of a possible link; only investigators with the appropriate clearance would be able to see why the cases are linked.
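The article does not describe how DI-HOPE KD matches cases or protects the underlying details. The sketch below shows one way the idea could work, under assumptions that are entirely ours: each case is reduced to a set of extracted entities, and participating agencies share a secret key (the hypothetical `KEY`). Entities are replaced with keyed hashes (HMAC-SHA256 from Python's standard library), so the matcher can tell that two cases share something without displaying what it is, and it reports only which cases are connected. Note that the Canadian and L.A. cases never share an entity directly; they are connected only through the Chicago case.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical shared secret held by participating agencies (assumption).
KEY = b"demo-key-not-for-production"


def blind(entity: str) -> str:
    """Replace an entity with a keyed hash so it can be matched but not read."""
    normalized = entity.strip().lower()
    return hmac.new(KEY, normalized.encode(), hashlib.sha256).hexdigest()


def find_links(cases: dict[str, set[str]]) -> set[frozenset[str]]:
    """Return pairs of case IDs that share at least one blinded entity.

    Only the case IDs are reported, never the entity they have in common.
    """
    cases_by_token = defaultdict(set)
    for case_id, entities in cases.items():
        for entity in entities:
            cases_by_token[blind(entity)].add(case_id)
    links = set()
    for case_ids in cases_by_token.values():
        ids = sorted(case_ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                links.add(frozenset((ids[i], ids[j])))
    return links


def linked_groups(cases: dict[str, set[str]]) -> list[set[str]]:
    """Group cases connected directly or through a chain of other cases."""
    neighbors = defaultdict(set)
    for link in find_links(cases):
        a, b = tuple(link)
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(cases):
        if start in seen or start not in neighbors:
            continue  # already grouped, or linked to nothing
        stack, group = [start], set()
        while stack:  # depth-first walk over the case graph
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical case files with fictional entities.
cases = {
    "CAN-source": {"Northway Pharma Ltd", "416-555-0199"},
    "CHI-broker": {"northway pharma ltd", "312-555-0142"},
    "LA-lab": {"312-555-0142", "Van Nuys warehouse"},
    "NYC-fraud": {"212-555-0100"},
}
print(linked_groups(cases))
```

In this sketch, the explanation of *why* two cases are linked (the shared entity) stays with the agencies that hold the original records, which mirrors the clearance-gated behaviour the article describes; a keyed hash alone is only a simple stand-in for stronger privacy-preserving techniques.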
"We've got a system that will use a number of technologies in data analytics to speed up investigations," Pottenger says. "Supercomputers are critical to the development of this technology because without them you cannot scale [large enough to accurately test the tools]."
-Amelia Williamson Smith, for iSGTW