In 2015, more than 2000 people died in a stampede during the Hajj pilgrimage in Saudi Arabia. In 2013, two terrorists deposited backpacks carrying bombs at the Boston Marathon and slipped away, leaving three spectators to die. If technology could track and analyze the movement of individuals in dense crowds in real time, we might better predict dangerous pileups or spot suspicious behavior, potentially saving many lives each year. A pair of researchers has just taken a large step in that direction, writing software that for the first time can track hundreds of people in a crowd simultaneously.
Following the paths of many individuals at the same time is enormously difficult, even for humans. Previous computer-based efforts to analyze dense crowd movement have focused on tracking one individual at a time in recorded video. But there are problems with that method. First, you have to run the programs over and over again for each person you want to track. Second, the programs tend to identify people in each frame of a video based on appearance—but heads and faces can be hard to distinguish from above, especially in tight crowds and low-resolution video. The new research, which will be published in IEEE Transactions on Pattern Analysis and Machine Intelligence, finds a way to increase both the efficiency and accuracy of tracking a person, enabling a software program to finally follow many people at the same time.
The trick involves predicting where an individual will go next. The researchers wrote a mathematical function that analyzes five factors, based on previous frames of a video, to anticipate where each person will be in the current frame. One is appearance: Which patches of pixels resemble the target from the previous frame? Another is target motion: Where could the target be, given its speed and direction? A third is neighbor motion: If the target is obscured, the program estimates its location from the motion of the person’s neighbors. Fourth is spatial proximity: The program won’t guess that two people are in the same place, standing on top of one another. And last is grouping: If the program identifies a few people walking in a group, it will assume that they’ll retain the same formation.
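As a rough illustration of how five such factors might be combined, here is a minimal Python sketch that scores candidate locations for one target and picks the cheapest. It is not the researchers' function: the names `candidate_cost` and `best_candidate`, the equal weights, the `min_gap` distance, and the use of plain Euclidean distances are all assumptions made for this example.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def candidate_cost(candidate, appearance_dist, own_prediction,
                   neighbor_prediction, other_positions, group_anchors,
                   weights=(1.0, 1.0, 1.0, 1.0, 1.0), min_gap=1.0):
    """Hypothetical five-term cost for placing a target at `candidate`.

    appearance_dist     -- how unlike the target the pixel patch looks (0 = identical)
    own_prediction      -- where the target's own speed and direction would put it
    neighbor_prediction -- where the neighbors' motion suggests it should be
    other_positions     -- locations already claimed by other targets
    group_anchors       -- spots that would preserve the group's formation
    """
    w_app, w_own, w_nbr, w_prox, w_grp = weights
    own_motion = dist(candidate, own_prediction)
    neighbor_motion = dist(candidate, neighbor_prediction)
    # Penalize standing closer than min_gap to someone else.
    proximity = sum(max(0.0, min_gap - dist(candidate, p)) for p in other_positions)
    grouping = sum(dist(candidate, g) for g in group_anchors)
    return (w_app * appearance_dist + w_own * own_motion
            + w_nbr * neighbor_motion + w_prox * proximity + w_grp * grouping)

def best_candidate(candidates, **context):
    """Return the (x, y) of the lowest-cost (position, appearance_dist) pair."""
    return min(candidates, key=lambda c: candidate_cost(c[0], c[1], **context))[0]

# Made-up example: three candidate patches for one target.
candidates = [((5.0, 5.0), 0.2), ((5.2, 5.0), 0.1), ((9.0, 9.0), 0.0)]
context = dict(own_prediction=(5.0, 5.0), neighbor_prediction=(5.0, 5.0),
               other_positions=[(5.2, 5.0)], group_anchors=[(5.0, 5.0)])
chosen = best_candidate(candidates, **context)
```

In this toy setup the patch at (9.0, 9.0) looks most like the target but sits far from every motion prediction, and the patch at (5.2, 5.0) is already occupied by someone else, so the sketch settles on (5.0, 5.0).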
For a group of more than about 10 people, computing an exact solution to the formula is infeasible, because of the number of targets and the interactions between them. So the team uses an iterative algorithm to produce a slightly rougher but easier-to-calculate prediction of each person’s location, frame by frame. “It’s not feasible to find an exact solution for a large number of targets,” says Afshin Dehghan, a computer scientist who conducted the work with Mubarak Shah while at the University of Central Florida in Orlando. “We introduced a new technique that is actually popular in other areas, like math.” (It’s a version of what’s called the Frank-Wolfe algorithm, if you’re curious.)
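For the curious, the textbook Frank-Wolfe iteration is short enough to sketch in Python. The example below is not the researchers' version: it applies the standard algorithm to a made-up objective, the squared distance to a fixed `target` vector, minimized over vectors of nonnegative weights that sum to 1 (think of a soft choice among three candidate locations). The function names and the objective are assumptions chosen only to show the idea of trading an exact answer for cheap, repeated approximate steps.

```python
def frank_wolfe_simplex(grad, n, iterations=5000):
    """Standard Frank-Wolfe over the probability simplex in n dimensions.

    grad -- function returning the gradient of a convex objective at x
    """
    x = [1.0 / n] * n  # start from the uniform weighting
    for k in range(iterations):
        g = grad(x)
        # Linear step: over the simplex, the best corner is simply the
        # coordinate with the smallest gradient entry.
        i = min(range(n), key=g.__getitem__)
        gamma = 2.0 / (k + 2.0)  # classic diminishing step size
        # Move part of the way toward that corner; x stays on the simplex.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i] += gamma
    return x

# Toy objective (an assumption): f(x) = 0.5 * ||x - target||^2.
target = [0.2, 0.5, 0.3]

def toy_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]

solution = frank_wolfe_simplex(toy_grad, 3)
```

Each pass needs only a gradient and a search for its smallest entry, which is what makes the approach attractive when an exact solution is out of reach. The price is that the answer approaches the optimum gradually rather than landing on it exactly.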
To test their method, the researchers analyzed nine crowd videos that had been used in previous research, covering marathons, commuters, a crosswalk, a train station, an airport, and a Hajj pilgrimage. The crowd sizes ranged from 57 to 747 people. By one measure, the program’s accuracy at tracking everyone in a video ranged from 67% to 99%. That performance matched or greatly exceeded five comparison algorithms—all of which tracked people one by one.
Incorporating collective movements was key to the program’s success. “There are group dynamics at play when you’re with other people, especially in a dense situation,” says Aniket Bera, a computer scientist at The University of North Carolina in Chapel Hill, who studies crowd behavior but was not involved in the work. “That information has been exploited in this paper.”
Hamid Rezatofighi, a computer scientist at the University of Adelaide in Australia, who has worked on multitarget tracking, says the technique is interesting. But he questions its practicality, because a user has to manually select targets to track. That means at the start of a video, users have to click on each person they want to follow.
And currently the method does not work in real time when tracking hundreds of people—it can take close to a second per frame to finish the needed calculations—but Dehghan notes that “the code has a lot of room to be optimized.” He’s hoping to release it publicly.
“There are many applications for tracking individuals in a crowd,” Dehghan says. You might analyze flow to design public spaces better and prevent overcrowding or bottlenecking, or program rules that spot anomalous behavior in security videos, such as lingering or stalking or erratic walking. Crowds form not just at marathons and pilgrimages, of course, but at concerts and political rallies and soccer games. In fact, the research was partially funded by the Qatar National Research Fund, as Qatar will be hosting the 2022 World Cup and wants to ensure crowd safety. (Where should entrances and exits be placed? Can a terrorist be identified before detonating a bomb?)
And humans aren’t the only ones to flock together. The same technique, Dehghan says, could be used to follow members of herds of animals or schools of fish for scientific research. Or cells in medical imaging to diagnose disease or understand healthy functioning. “Tracking in dense crowds is a relatively unexplored problem,” he says.
[“Source-sciencemag”]