Researchers at the University of Michigan are teaching self-driving cars to recognize and predict pedestrians’ movements with greater precision than current technologies by zeroing in on their gait, body symmetry, and foot placement.
Data collected by vehicles through cameras, LiDAR, and GPS let the researchers capture video snippets of humans in motion and then recreate them in 3D computer simulations. From those simulations, they’ve created a “biomechanically inspired recurrent neural network” that catalogs human movements.
The network can predict poses and future locations for one or several pedestrians up to 50 yards from the vehicle, roughly the span of a city intersection.
“Prior work in this area typically looked at only still images—it wasn’t concerned with how people move in three dimensions,” says Ram Vasudevan, a UM assistant professor of mechanical engineering. “But if vehicles are going to operate and interact in the real world, we need to make sure our predictions of where a pedestrian is going doesn’t coincide with where the vehicle is going next.”
Equipping vehicles with the necessary predictive power requires that the network dive into the minutiae of human movement: the pace of a human’s gait (periodicity), the mirror symmetry of limbs, and the way foot placement affects stability during walking.
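To make "periodicity" concrete, here is a minimal sketch of one common way to estimate how often a walking motion repeats: autocorrelation of a single joint signal. It is an illustration only, not the researchers' network. The function name `estimate_gait_period`, the synthetic `ankle_height` signal, and the lag bounds are all assumptions made for this example.

```python
import math

def estimate_gait_period(signal, min_lag, max_lag):
    """Return the lag (in frames) at which the signal best matches itself.

    signal: per-frame values of one joint coordinate (hypothetical input).
    min_lag, max_lag: plausible range of stride periods to search, in frames.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        overlap = n - lag
        # Average product of the signal with a copy shifted by `lag` frames.
        score = sum(centered[i] * centered[i + lag] for i in range(overlap)) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic stand-in for an ankle height that repeats every 20 frames.
ankle_height = [math.sin(2 * math.pi * i / 20) for i in range(100)]
period = estimate_gait_period(ankle_height, min_lag=5, max_lag=30)
```

Restricting the search to a plausible range of stride periods keeps the estimate from locking onto a trivially small lag or onto a multiple of the true period.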
Much of the machine learning used to bring autonomous technology to its current level has dealt with two-dimensional images—still photos. A computer shown several million photos of a stop sign will eventually come to recognize stop signs in the real world and in real time.
Video clips that run for several seconds, by contrast, let researchers use the first half of each snippet to make predictions and then verify their accuracy against the second half.
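The split-and-verify idea can be sketched in a few lines. This example assumes a pedestrian track is a list of (x, y) positions in meters, one per frame, and it swaps in a simple constant-velocity forecast as a stand-in for the neural network. The names `split_clip` and `constant_velocity_forecast` are hypothetical.

```python
def split_clip(frames):
    """Split a track into an observed first half and a held-out second half."""
    mid = len(frames) // 2
    return frames[:mid], frames[mid:]

def constant_velocity_forecast(observed, steps):
    """Stand-in predictor: extend the last observed per-frame displacement.

    This is NOT the researchers' model, only a baseline for illustration.
    """
    (x0, y0), (x1, y1) = observed[-2], observed[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, steps + 1)]

# Hypothetical track: a pedestrian walking 0.5 m per frame along x.
track = [(0.5 * i, 0.0) for i in range(8)]
observed, future = split_clip(track)
predicted = constant_velocity_forecast(observed, len(future))
```

Because the second half of the clip is never shown to the predictor, comparing `predicted` against `future` gives an honest measure of how well the forecast holds up.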
“Now, we’re training the system to recognize motion and making predictions of not just one single thing—whether it’s a stop sign or not—but where that pedestrian’s body will be at the next step and the next and the next,” says Matthew Johnson-Roberson, an associate professor in UM’s Department of Naval Architecture and Marine Engineering.
To explain the kind of extrapolations the neural network can make, Vasudevan describes a common sight. “If a pedestrian is playing with their phone, you know they’re distracted,” he explains. “Their pose and where they’re looking tells you a lot about their level of attentiveness. It’s also telling you a lot about what they can do next.”
Results show that this new approach improves a driverless vehicle’s ability to predict what’s most likely to happen next.
“The median translation error of our prediction was approximately 10 cm after one second and less than 80 cm after six seconds,” says Johnson-Roberson. “All other comparison methods were up to 7 meters off … We are better at figuring out where a person is going to be.”
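For readers unfamiliar with the metric, a median translation error at a given time horizon could be computed roughly as follows. This sketch assumes the error is the straight-line distance between each predicted and actual position; the function name and sample numbers are made up for illustration and are not the study's data.

```python
import math
from statistics import median

def median_translation_error(predicted, actual):
    """Median Euclidean distance between paired predicted and actual positions.

    Each list holds one (x, y) position per pedestrian at the same time horizon.
    """
    errors = [math.dist(p, a) for p, a in zip(predicted, actual)]
    return median(errors)

# Made-up positions in meters for three pedestrians at one horizon.
predicted_positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
actual_positions = [(3.0, 4.0), (1.0, 0.0), (2.0, 1.0)]
error = median_translation_error(predicted_positions, actual_positions)
```

The median is less sensitive than the mean to a single badly missed prediction, such as the 5-meter miss for the first pedestrian in this made-up sample.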
To rein in the number of options for predicting the next movement, the researchers applied the physical constraints of the human body, such as the inability to fly or the fastest possible speed on foot.
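One simple way to picture such a constraint is a speed cap applied to each predicted step. The sketch below is an assumption-laden illustration, not the researchers' method: the 12 m/s default is a rough figure for top human sprinting speed chosen for this example, and `clamp_step` is a hypothetical helper.

```python
import math

# Assumed illustrative cap, roughly elite sprinting speed in meters per second.
MAX_SPEED_M_PER_S = 12.0

def clamp_step(current, proposed, dt, max_speed=MAX_SPEED_M_PER_S):
    """Pull a proposed next position back within reach of a person on foot.

    current, proposed: (x, y) positions in meters; dt: time step in seconds.
    """
    dx, dy = proposed[0] - current[0], proposed[1] - current[1]
    distance = math.hypot(dx, dy)
    limit = max_speed * dt
    if distance <= limit:
        return proposed
    # Keep the direction of travel but shorten the step to the allowed length.
    scale = limit / distance
    return (current[0] + dx * scale, current[1] + dy * scale)

# A 10 m jump in half a second is implausible at a 10 m/s cap, so it is shortened.
clamped = clamp_step((0.0, 0.0), (6.0, 8.0), dt=0.5, max_speed=10.0)
```

Ruling out physically impossible moves shrinks the set of candidate next positions a predictor has to consider.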
To create the dataset used to train UM’s neural network, researchers parked a vehicle with Level 4 autonomous features at several Ann Arbor intersections. With the car’s cameras and LiDAR facing the intersection, the vehicle recorded several days’ worth of data at a time. Researchers bolstered that real-world, “in the wild” data with traditional pose datasets captured in a lab.