“There are only two hard problems in computer science: naming things and cache invalidation.” -Phil Karlton
Why is a data forensics computer program named after a tenacious hunting dog?
Jay Koven, a PhD candidate at the NYU Tandon School of Engineering who’s a member of the security and privacy research group working on data forensics, helped create the forensic visual analytics tool called “Beagle.”
He shared some of the challenges he faced, his exciting future, and the surprises he learned along the way, including the inspiration for the name Beagle.
What Is Beagle?
Beagle is a forensic investigation tool based on visual analytics. Koven and his team designed it to resemble an email client to make it a comfortable fit for investigators who commonly use email.
It has a powerful query language for searches. The team kept that language simple so that an investigator who is not a computer scientist can still build very complex queries.
Koven noted that the concept may sound simple, but the logistics are challenging. People use abbreviations and speak in slang. That makes a lot of the standard algorithms fall apart. AI doesn’t work well on this subject for the same reasons.
This Beagle Sniffs out Crime
When Koven and his team started this project, they were thinking in terms of big institutional financial fraud, maybe something the SEC or Secret Service (part of the U.S. Department of Homeland Security) would be investigating. All of those things have large email datasets associated with them.
Emails, texts, and other electronic communications go hand in hand with many kinds of crimes. In financial fraud, they leave a documented trail of what was done. Transactions generate confirmations. There’s a lot of information passed back and forth.
If you’re a cyber scammer, obviously you’re using email to communicate with the victims, banks, and credit card companies. Communications play a major role.
Beagle allows you to attack a large dataset. In one test case, it pored through three and a half million emails to tease out the criminal network involved, find the actual financial fraud (which usually shows up as proof of various bank transfers or credit card transactions), and find the actual identity of the account owners.
All of this information was buried in the dataset. Beagle adds the ability to do a lot more filtering in the searches and give feedback regarding the results. The system is so fast that users can do query after query with almost instant returns.
Building the Beagle
The team approached the problem initially from a social network point of view. The social network in that dataset is going to be a crucial part of the investigation.
The reality is that the overview of the social network is useless. You’re talking about millions of emails, all the connections; it’s what Koven referred to as a “hairball.”
Koven and his team needed to give the investigator the ability to build a social network of the people they discover, and then let the investigators work off of that.
They’re looking for the ringleader, they’re looking for the bosses, they’re looking for the mules, they’re looking for people who have different roles. And so they are doing some of that analysis, but it’s more of a semantic analysis of what’s in the emails.
As with a lot of things in life, the team didn’t realize that until they were well into the process.
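To make the contrast with the “hairball” concrete, here is a minimal sketch of an investigator-driven network: only people the investigator has already flagged become nodes, and edges count the messages exchanged between them. This is not Beagle’s actual code. The function name, the dictionary fields, and the sample addresses are all hypothetical and exist only for illustration.

```python
from collections import Counter


def build_focused_network(emails, people_of_interest):
    """Count messages between pairs of flagged people (illustrative only).

    `emails` is assumed to be a list of dicts with a "from" address and a
    "to" list; this layout is an assumption, not Beagle's real data model.
    """
    focus = {p.lower() for p in people_of_interest}
    edges = Counter()
    for msg in emails:
        sender = msg["from"].lower()
        if sender not in focus:
            continue  # ignore traffic from people who have not been flagged
        for rcpt in msg["to"]:
            rcpt = rcpt.lower()
            if rcpt in focus and rcpt != sender:
                # Store each pair in sorted order so A->B and B->A share one edge.
                edges[tuple(sorted((sender, rcpt)))] += 1
    return edges


# Hypothetical sample data.
emails = [
    {"from": "boss@example.com", "to": ["mule1@example.com", "bank@example.com"]},
    {"from": "mule1@example.com", "to": ["boss@example.com"]},
    {"from": "stranger@example.com", "to": ["mule1@example.com"]},
]
network = build_focused_network(emails, ["boss@example.com", "mule1@example.com"])
for pair, count in network.items():
    print(pair, count)
```

Because the graph only ever contains people the investigator has chosen, it stays small enough to read, and it grows one discovery at a time instead of starting from millions of connections.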
Finding the Bad Guys
The development team got input in general terms from the Secret Service, FBI, and different district attorney offices, because this is a problem they all face. Unfortunately, no one was willing to share their datasets for legal reasons.
Which was a big problem.
Fortunately, they were able to work with a California data security company called Agari.
Agari had collected email datasets from cyber scammers who had attacked their clients. They had collected the data because they wanted to understand how cyber scammers operated so they could improve their tools to protect their clients.
Koven’s team wanted to identify the kinds of scams in the data and how the scammers carried them out. As a bonus, if the team could identify any of the individuals, that information was passed on to law enforcement.
In some cases, law enforcement was already watching the people Beagle identified. In other cases, the names were new information. The team learned a lot about how cyber scammers operate, and what constitutes a successful investigation.
Koven’s long-term goal is to turn Beagle into a real product rather than a research tool. When somebody asks if Beagle is in beta testing, he has to admit that it’s not even in alpha testing.
Beagle started out as a research tool where the original goals were to understand the process and to see if it could help.
The feedback Koven has gotten from law enforcement is, “Not only did you succeed but we need this yesterday.”
Koven will wrap up his PhD shortly and his goal is to make Beagle available to law enforcement. That may be through open source, or through creating a company. One way or another he will get the tool out there so that people can use it.
Naming the Beagle
Koven, clearly a Renaissance Man, remembered reading a book that said the Middle English term “beagle” meant to investigate doggedly.
The computer program is certainly well named.