Dartmouth Engineer - The Magazine of Thayer School of Engineering

Hacking the Hackers

Deciphering patterns in digital behaviors, Professor George Cybenko separates the good from the bad.

By Michael Blanding

Imagine a house.

You put a motion sensor at the doorway and scatter several others around the hallways and stairwells. Then you sit back and watch as they light up in a pattern. It’s easy enough in this imaginary house to determine when an intruder enters and exactly where he or she goes inside the building.

Now, however, imagine several people enter and exit the house at different times and take different routes through the halls and rooms. Based on the pattern of sensors, could you tell how many people arrived and where each one went? Now imagine hundreds of people enter and exit over the course of the day—and one of them is an intruder who has set out to rob the place. Can you tell which one he is?

These are the kinds of questions that preoccupy George Cybenko, Dorothy and Walter Gramm Professor of Engineering at Thayer School, and have informed decades of his research into signal processing. “If I see a sequence of events associated with different behaviors, can I associate those events with the right behaviors?” he says, drawing out the thought experiment. “Based on where the sensors are and how fast people typically walk, can I take all these reports and say with high probability that this is the track of one person?”
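As a rough sketch of that kind of reasoning (a toy illustration only, not Cybenko's actual algorithm), imagine each sensor report carries a position and a timestamp; two reports can belong to the same person only if the distance between the sensors could be covered at walking speed in the time between them. The Python below, with an assumed 2 m/s walking-speed bound, greedily chains reports into per-person tracks on that basis:

    from dataclasses import dataclass

    MAX_WALK_SPEED = 2.0  # meters per second; assumed upper bound for a walking person

    @dataclass
    class Report:
        sensor: str
        x: float       # sensor position, in meters
        y: float
        time: float    # seconds since monitoring began

    def could_be_same_person(a: Report, b: Report) -> bool:
        """Two reports can belong to one track only if the gap is physically walkable."""
        dist = ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
        return dist <= MAX_WALK_SPEED * abs(a.time - b.time)

    def link_reports(reports: list[Report]) -> list[list[Report]]:
        """Greedily chain time-ordered sensor reports into tracks of individual people."""
        tracks: list[list[Report]] = []
        for report in sorted(reports, key=lambda r: r.time):
            for track in tracks:
                if could_be_same_person(track[-1], report):
                    track.append(report)
                    break
            else:
                tracks.append([report])  # no existing track fits: a new person entered
        return tracks

A real tracker has to handle ambiguity probabilistically rather than greedily, which is exactly where the hard mathematics comes in.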

Professor George Cybenko. Photograph by John Sherman.

That may seem like a simple task, but the answer to the problem has implications in areas as disparate as cybersecurity, stock-market fraud, and counterterrorism. During the past 30 years, Cybenko has become one of the preeminent experts in finding patterns in the vast amounts of data that accumulate every time we enter a keystroke on our browser or make a transaction with a credit card. Most of the time such “behaviors,” as Cybenko calls them, are benign. But some behaviors, both online and in the physical world, are dangerous—a hacker trying to gain access to a company database, say, or a drug cartel trying to cross an international border.

Law enforcement must struggle to identify those bad behaviors before it is too late. Making matters more difficult, says Cybenko, at the same time that one person is acting, another is often reacting, the two competing in a high-stakes game of hide-and-seek. “I may have some confidential information on my computer, and I am competing with a hacker who is trying to gain access to that information,” he says. “He’s developing techniques to gain access, and I am trying to develop techniques to prevent him.”

In that environment, it’s not individual actions but the dynamic patterns of behavior that must be identified in order to protect the system. The question is, with billions of bytes of data flying back and forth around the world, how can you separate the bad behaviors from the good ones?

 

THAT IS JUST THE KIND OF COMPLEX WEB that Cybenko has spent his career patiently and methodically finding ways to unravel. Eugene Santos, a professor of engineering at Thayer School who frequently collaborates with Cybenko, first discovered him 25 years ago, when, as a grad student, he read a seminal paper Cybenko published about the nature of neural networks. Computer scientists had been using these networks—which mimic the structure of the brain—to solve complex problems, but largely without understanding how they worked. Cybenko’s paper not only provided the mathematical underpinning of neural networks but also showed what they could do.

“He showed mathematically what people didn’t know—that a neural network could learn any mathematical function,” says Santos. “Before then, they were successful, but more of a black box. He managed to ground them in a theory people could build from.”

Santos contacted Cybenko about the paper and received a friendly note of encouragement that has stayed with him ever since. Now that Santos works closely with him, those initial impressions have only been confirmed. “I don’t want to use this word—but I can’t find another one—he is one of the nicest people in the world,” says Santos. “He is thoughtful, he is sociable, and he is also very approachable for someone of his stature. When he is in a group of people he is almost like a gravity well—people flock to him.”

“He is one of the most relaxed people I know,” says Vincent Berk, a former Thayer professor who continues to collaborate with Cybenko. “I always joke that if he was any more relaxed, he would slip into a coma. It takes a lot to rile him.” At the same time, Berk has seen how that calm demeanor helps Cybenko get elegantly to the source of problems.

“He sets a problem, then takes stabs at it from three or four different viewpoints. Then he goes away and comes back with the math to explain it,” Berk says. “Too many scientists, when asked to boil an egg, will study the laws of thermodynamics and whether it’s a gas stove or an electric stove or what kind of metal the pan is. He has a way of figuring out what parts of a problem are worth the thinking time and what parts can be ignored or dealt with early.”

In research funded by the Department of Justice, for example, Cybenko and Santos investigated how criminal gangs might be transporting drugs across international borders—what routes they might take and why, and how they might respond to actions of the border patrol in real time. “His simple intuition was to capture things in terms of different costs—of transporting something on a certain route, or if military was operating in an area, or if citizens of a city were leaning toward them—but being able to capture those costs in a structured way,” says Santos. “He took something very chaotic and trial-and-error and made it methodical again.”

 

CYBENKO GREW UP IN TORONTO, where he was interested in math from an early age. He received his B.A. from the University of Toronto and his Ph.D. in applied mathematics from Princeton. One of his first forays into analyzing patterns in data focused on tracking aircraft on a radar screen. We are all familiar with the “blips” that appear onscreen to designate an object—however, those blips aren’t as straightforward as they seem. “You have to be able to tell which is a plane, which is a helicopter, which is a bird, and which is just noise. If I have 20 aircraft, I have to know which is which,” Cybenko says.

By using the knowledge about how fast airplanes can fly and how quickly they can speed up or turn, he was able to construct a mathematical algorithm to separate out which blip belonged to which plane. “The radar returns are constrained by the kinematics of aircraft behaviors, and in a sense those constraints are an important part of doing the processing,” he says.
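A toy version of that kind of kinematic gating (a generic illustration of the idea, not the algorithm Cybenko built) predicts where each tracked aircraft should appear on the next radar sweep from its last known velocity, then accepts a blip for that track only if it falls within a gate set by how hard an aircraft can plausibly maneuver. The acceleration bound below is assumed for illustration:

    import math

    MAX_ACCEL = 30.0  # m/s^2; assumed bound on how hard an aircraft can maneuver

    def predict(track, dt):
        """Where a tracked aircraft should appear next, assuming constant velocity."""
        x, y, vx, vy = track
        return x + vx * dt, y + vy * dt

    def assign_blips(tracks, blips, dt):
        """Match radar blips to known tracks using a simple kinematic gate.

        tracks: dict of track_id -> (x, y, vx, vy) from the previous sweep.
        blips:  list of (x, y) returns from the current sweep.
        Blips that fit no track are left over as noise, birds, or new objects.
        """
        gate = 0.5 * MAX_ACCEL * dt ** 2  # max deviation from straight-line flight
        assignments, unexplained = {}, []
        for i, (bx, by) in enumerate(blips):
            best_id, best_dist = None, gate
            for track_id, track in tracks.items():
                px, py = predict(track, dt)
                dist = math.hypot(bx - px, by - py)
                if dist < best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                unexplained.append((bx, by))
            else:
                assignments[i] = best_id
        return assignments, unexplained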

When he started applying the same techniques to computer security, however, the constraints weren’t as immediately identifiable. Unlike airplanes, which follow set rules of behavior in how they are able to move, human beings have no such limits on their behavior. Cybenko began looking for patterns of expected behavior. “It’s not just the individual events that are meaningful, but it’s the sequence of events tied together that are interesting. A failed login attempt is pretty common; people forget their passwords all the time. But let’s say there is a sequence of failed logins, followed by a successful login, followed by opening files. That’s a lot more suspicious,” says Cybenko.

Using the house analogy, it is as if in 99 out of 100 times, the sensor in the front hallway was tripped 10 seconds after the sensor at the front door—and then in one case, the sensor in a side hallway was tripped instead. Similarly, Cybenko looked for expected actions that differed depending on the category of behavior being tracked, and then identified suspicious behavior outside of those actions. “The normal constraint on opening a web browser is that you have to be logged into a computer before you can open a browser window. So if you see a browser window open but there is no login associated with it, then that is a strange behavior.”
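A hypothetical, much-simplified version of such checks might scan a time-ordered event log for two of the patterns described above: a burst of failed logins immediately followed by a success, and a browser window opening with no login on record. The rule set and event names below are invented for illustration and are not an actual intrusion-detection product:

    def find_suspicious(events, max_failures=3):
        """Flag event sequences that violate simple behavioral constraints.

        events: time-ordered list of (user, action) tuples, where action is one of
        "login_failed", "login_ok", "open_file", "open_browser".
        """
        alerts = []
        failed = {}        # user -> consecutive failed login attempts
        logged_in = set()  # users currently logged in

        for user, action in events:
            if action == "login_failed":
                failed[user] = failed.get(user, 0) + 1
            elif action == "login_ok":
                if failed.get(user, 0) >= max_failures:
                    alerts.append((user, "success after repeated failed logins"))
                failed[user] = 0
                logged_in.add(user)
            elif action == "open_browser" and user not in logged_in:
                # Normally a browser window can only follow a login; this breaks the constraint.
                alerts.append((user, "browser opened with no login on record"))
        return alerts

    # Example: three failed attempts, then a success and file access, looks suspicious.
    log = [("alice", "login_failed"), ("alice", "login_failed"),
           ("alice", "login_failed"), ("alice", "login_ok"), ("alice", "open_file")]
    print(find_suspicious(log))  # [('alice', 'success after repeated failed logins')]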

Arriving at Thayer School in 1992, Cybenko helped disseminate his ideas through the Dartmouth-based Institute for Information Infrastructure Protection (I3P), a consortium of 28 universities and other institutions that promotes greater cybersecurity. At the same time, he began working with other professors at Dartmouth to explore new forms of computer attacks that were just beginning to appear.

Dartmouth computer science Professor Paul Thompson describes a project they worked on 10 years ago funded by the Department of Justice that looked at cases in which people published fake press releases about companies online in order to manipulate their stock prices. “As a mathematician, he was interested in looking at irregular fluctuations in the stock market that might indicate something was manipulating the price,” says Thompson. “I have a background in computational linguistics, and was interested in looking at deception in language.” Together, they wrote a paper in 2003 that coined the term “cognitive hacking” to describe the phenomenon. “By manipulating information, hackers can alter our perceptions of reality in subtle ways—without launching a virus or a network worm,” Cybenko told a reporter at the time. Of course, now, when fake emails and other “phishing” attempts are commonplace, such manipulations are regular occurrences. At the time, however, the concept was groundbreaking, defining a burgeoning new threat.

“When George sees something and moves into a field, he is usually way ahead of the curve—he sees something that really changes the direction of things,” notes Santos.

 

THAT COULD CERTAINLY BE SAID about Cybenko’s latest breakthroughs in analyzing patterns in data. During the past decade, he and colleagues at Dartmouth, including Berk and Santos, developed the analysis techniques into a paradigm called Process Query Systems (PQS), which includes both an algorithm to identify anomalies in data and a software framework in which to store them. Their progress, however, hit a major limitation. The analysis required knowing the expected pattern of behavior—as determined by experts or longtime observers of a field—and then detecting behaviors that deviate from that normal pattern.

But what happens when you don’t know what’s normal? That’s the question Cybenko has been tackling more recently—using processing techniques in order to actually learn behaviors. “If I am given a collection of data, how can I create a model of behavior?” he asks. Humans are good at learning models of behavior intuitively. That’s what enables a 4-year-old to catch a ball by observing the arc and trajectory of the ball and modifying behavior to meet it, without knowing a lick of calculus. People playing rock-paper-scissors or watching an opponent bluff at poker are similarly recognizing patterns. How we do that is another matter. In fact, we know surprisingly little about how the brain processes that information to learn behaviors.
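One generic way to learn “normal” from data alone, offered here purely as an illustration rather than a description of Cybenko’s own algorithms, is to count how often each event tends to follow another in historical logs and then score new sequences by how surprising their transitions are under that learned model:

    import math
    from collections import Counter, defaultdict

    def learn_transitions(sequences):
        """Estimate P(next event | current event) from historical event sequences."""
        counts = defaultdict(Counter)
        for seq in sequences:
            for current, nxt in zip(seq, seq[1:]):
                counts[current][nxt] += 1
        return {cur: {nxt: n / sum(ctr.values()) for nxt, n in ctr.items()}
                for cur, ctr in counts.items()}

    def surprise(model, seq, floor=1e-6):
        """Negative log-likelihood of a sequence; higher means more anomalous."""
        total = 0.0
        for current, nxt in zip(seq, seq[1:]):
            total -= math.log(model.get(current, {}).get(nxt, floor))
        return total

    # Learn "normal" from past sessions, then score new ones.
    normal = [["login", "open_mail", "open_browser", "logout"]] * 50
    model = learn_transitions(normal)
    print(surprise(model, ["login", "open_mail", "open_browser", "logout"]))  # 0.0: familiar behavior
    print(surprise(model, ["login", "copy_files", "copy_files"]))             # high: transitions never seen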

Developing an entirely new kind of algorithm, Cybenko has recently begun teaching computers to identify anomalies in patterns even when they don’t know at the outset what the pattern should look like. In one recent experiment that picked up from prior research about traders gaming the stock market, he and Ph.D. graduate David Twardowski Th’11 analyzed activity on the NASDAQ stock exchange, which processes millions of buy and sell orders every day, 80 percent of them anonymous. By analyzing the patterns of when stock was bought and sold, Cybenko and Twardowski were able to identify a suspicious pattern and trace it to one particular trader. In other words, they effectively de-anonymized the data.

Such techniques could have applications in a wide variety of contexts, including identifying credit card fraud. We’ve all had the experience of getting a call from our credit card company about a purchase that falls outside of our normal purchasing patterns—but currently credit companies are very unsophisticated in how they determine those patterns. “What is normal for me is different than what is normal for you,” says Cybenko. “For every individual you have to build a specific model, so you have to automate it in some way.”

The question is: How does the company avoid “false positives” that would lead it to flag legitimate purchases? “If they called you every time you tried to buy lunch, they’d catch all the fraud, but you’d change credit cards pretty quickly,” says Cybenko. If computers could learn patterns of purchasing based on each individual customer, they could better identify fraudulent purchases or identity theft before it was too late.
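A minimal sketch of that per-customer idea (with a made-up rule, not anything a real card issuer necessarily uses) models each customer’s typical purchase amounts and flags only charges far outside that personal range; the threshold is the knob that trades missed fraud against the lunch-call false positives Cybenko describes:

    import statistics

    def build_profile(purchase_history):
        """Summarize one customer's typical spending from past purchase amounts."""
        return statistics.mean(purchase_history), statistics.stdev(purchase_history)

    def is_suspicious(amount, profile, threshold=3.0):
        """Flag a charge that sits far outside this customer's normal range.

        A lower threshold catches more fraud but also flags more legitimate
        purchases; a higher one does the reverse.
        """
        mean, stdev = profile
        return abs(amount - mean) > threshold * stdev

    # Two customers with very different "normal" spending.
    frugal = build_profile([4.50, 12.00, 8.75, 6.20, 9.99])
    big_spender = build_profile([400.0, 950.0, 620.0, 810.0, 530.0])

    print(is_suspicious(600.0, frugal))       # True: wildly out of character
    print(is_suspicious(600.0, big_spender))  # False: routine for this customer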

 

CYBENKO HAS BECOME INCREASINGLY INTERESTED in finding applications such as this for his work. With Berk, he recently cofounded a company called FlowTraq, which uses proprietary algorithms to analyze the network traffic of firms in order to create a “behavioral fingerprint” that can then be used to identify any potential security breaches. Berk left Dartmouth in 2012 to become CEO of the new company, which counts major Fortune 500 companies and several cloud-computing service providers among its clients. While Cybenko isn’t involved in the operation of the company, he says he is excited to see his ideas applied in commercial settings.

“Academic journals are graveyards for ideas,” Cybenko says, despite having served as editor-in-chief of several journals for the Institute of Electrical and Electronics Engineers (IEEE). “People have written papers about thousands of good ideas in the field of cybersecurity, but unless a Microsoft or Symantec picks them up and makes a product out of it, your idea will remain a paper.”

Cybenko realized he needed to take action. “For most of my career it was good enough to develop good ideas and hope the world was a better place because I wrote it down and published it,” says Cybenko. “Now I would like to see if some of the more recent ideas will gain traction out in the world.”

Along with continuing to explore computer behavioral learning, Cybenko also hopes to spur more collaboration between researchers studying behaviors in the engineering realm and others studying them in the social sciences, such as behavioral psychology and behavioral economics. “You can go through the literature and find 23 different definitions for ‘behavior,’ ” says Cybenko. “In the next five to 10 years, I’d like to see a more coherent body of knowledge around behavior modeling that is truly cross-disciplinary.”

For better or for worse, more and more of our behaviors are being tracked and collated—whether it’s our activity on social media sites or information captured by the GPS on our phones. While that information has obvious marketing potential, Cybenko would like to see it analyzed as well for what it might tell us about the daily patterns of people in society, and how we might better design our environment to improve life.

“My background is in hard-core signal processing and computer engineering, but it’s become obvious to me that for many of the systems we are building, the human user and social aspects of those systems are really critical,” says Cybenko. “The technology behind Twitter is pretty simple, but the phenomenon of Twitter—how it’s being used and how it will evolve and develop—is pretty complicated. Being able to incorporate the human as part of the engineering system will be even more important as we move ahead.”

—Michael Blanding is an award-winning investigative journalist. His most recent book is The Map Thief.
