Human trajectory forecasting with multiple socially interacting agents is of critical importance for autonomous navigation in human environments, e.g., for self-driving cars and social robots. In this work, we present the Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction. PECNet infers distant trajectory endpoints to assist in long-range multi-modal trajectory prediction. A novel nonlocal social pooling layer enables PECNet to infer diverse yet socially compliant trajectories. Additionally, we present a simple “truncation trick” for improving few-shot multi-modal trajectory prediction performance. We show that PECNet improves state-of-the-art performance on the Stanford Drone prediction benchmark by ∼20.9% and on the ETH/UCY benchmark by ∼40.8%. |
||
|
Imitating the Human Path Planning Process: We posit that pedestrians in a scene move towards a predetermined position, and that interactions such as social signalling shape their trajectories only locally while they continue to pursue their original intention. Instantiating this idea, we model the pedestrian trajectory prediction problem (top left) by breaking the task into two sequential steps that are learned end to end: (a) inferring the local endpoint distribution (top right), from which diverse endpoints are sampled for each agent independently; and (b) conditioning on the sampled future endpoints (bottom left) to plan socially compliant trajectories for all agents in the scene jointly (bottom right). |
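The two-step decomposition can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not PECNet's architecture: the hypothetical `sample_endpoints` replaces the learned endpoint distribution with a Gaussian around a constant-velocity extrapolation, and the hypothetical `plan_to_endpoint` replaces the learned, socially aware predictor with straight-line interpolation for a single agent, so no interaction between agents is modelled.

```python
import random


def sample_endpoints(past, horizon, k, sigma, rng):
    """Step (a): draw k candidate endpoints for one agent.

    Stand-in (assumption) for the learned endpoint distribution: a Gaussian
    centred on the constant-velocity extrapolation of the last observed step.
    """
    (x0, y0), (x1, y1) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    cx, cy = x1 + horizon * vx, y1 + horizon * vy
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(k)]


def plan_to_endpoint(past, endpoint, horizon):
    """Step (b): fill in the future, conditioned on one sampled endpoint.

    Stand-in (assumption) for the learned predictor: linear interpolation
    from the last observed position to the endpoint, ignoring other agents.
    """
    x1, y1 = past[-1]
    ex, ey = endpoint
    return [
        (x1 + (ex - x1) * t / horizon, y1 + (ey - y1) * t / horizon)
        for t in range(1, horizon + 1)
    ]


rng = random.Random(0)                      # fixed seed for repeatability
past = [(0.4 * t, 0.0) for t in range(8)]   # 8 observed positions
endpoints = sample_endpoints(past, horizon=12, k=5, sigma=0.5, rng=rng)
futures = [plan_to_endpoint(past, e, horizon=12) for e in endpoints]
```

Each sampled endpoint yields one candidate future, which is how conditioning on endpoints produces several distinct modes from a single observed track.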
||
|
||
Visualizing Multimodal Predictions: Qualitative results for diverse multi-modal predictions produced by PECNet on the Stanford Drone Dataset. White represents the past trajectory over 3.2 seconds (8 frames), while red and cyan represent the predicted and ground-truth futures, respectively, over the next 4.8 seconds (12 frames). As demonstrated, PECNet's predictions capture a wide range of plausible trajectory behaviours while discarding improbable ones, such as endpoints incompatible with the direction of motion. |
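The frame counts and durations in this caption imply a sampling interval of 0.4 seconds, i.e. 2.5 frames per second, for both windows. The short sketch below checks that arithmetic and splits a track into the observed and future windows; the helper names `frame_interval` and `split_track` are hypothetical and are not taken from the PECNet code.

```python
OBS_FRAMES, PRED_FRAMES = 8, 12        # frame counts from the caption
OBS_SECONDS, PRED_SECONDS = 3.2, 4.8   # durations from the caption


def frame_interval(frames, seconds):
    """Seconds between consecutive frames of a window."""
    return seconds / frames


def split_track(track, obs=OBS_FRAMES, pred=PRED_FRAMES):
    """Split a track into (observed past, future to predict)."""
    if len(track) < obs + pred:
        raise ValueError(f"need at least {obs + pred} frames, got {len(track)}")
    return track[:obs], track[obs:obs + pred]


# Both windows should share the same sampling interval (up to rounding).
dt_obs = frame_interval(OBS_FRAMES, OBS_SECONDS)
dt_pred = frame_interval(PRED_FRAMES, PRED_SECONDS)

# A synthetic 20-frame track moving along the x axis at 1 unit per second.
track = [(dt_obs * t, 0.0) for t in range(OBS_FRAMES + PRED_FRAMES)]
observed, future = split_track(track)
```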
Figure columns, left to right: Ground Truth, Final Predictions, Multimodal Predictions. |
Qualitative Results at Mergers & Intersections: We demonstrate PECNet's socially compliant and diverse trajectories in multi-agent settings in tricky scenarios such as path mergers (top row) and collision avoidance at lane intersections (bottom row). The left column shows the ground truth trajectories from the Stanford Drone Dataset, and the middle and right columns show our predictions. For ground truth and final predictions, circles denote the past input fed into PECNet, while stars denote the future to be predicted (ground truth) or the predicted future (final predictions), with tails denoting the last four observed positions in both cases. Our "best" predictions follow the ground truth closely while avoiding collisions with other pedestrians in a natural, seamless way. PECNet's multimodal predictions, in turn, are diverse, socially compliant trajectories produced jointly for all pedestrians in the scene (extended temporally for visualization using recurrent prediction). For quantitative results, please see our paper. |
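The caption's notion of a "best" prediction can be made concrete with a small sketch. It assumes the displacement metrics commonly reported on trajectory benchmarks: average displacement error (ADE) and final displacement error (FDE), evaluated on the best of K sampled futures. The page itself does not name its metrics, so this is an assumption, as is the choice to select the best sample by FDE; the function names are hypothetical.

```python
import math


def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, truth)) / len(truth)


def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last step."""
    return math.dist(pred[-1], truth[-1])


def best_of_k(candidates, truth):
    """Return (ADE, FDE) of the candidate whose endpoint is closest to truth.

    Selecting by FDE is an assumption; some evaluations instead take the
    minimum ADE and the minimum FDE independently.
    """
    best = min(candidates, key=lambda c: fde(c, truth))
    return ade(best, truth), fde(best, truth)


truth = [(1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(1.0, 1.0), (2.0, 1.0)],   # parallel to the truth, offset by 1
    [(1.0, 0.0), (2.0, 0.5)],   # drifts by 0.5 at the final step
]
best_ade, best_fde = best_of_k(candidates, truth)
```

Here the second candidate is selected because its endpoint is closer to the ground truth, which mirrors the idea that a good endpoint is what matters most when several futures are sampled.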
Mangalam, Girase, Agarwal, Lee, Adeli, Malik, Gaidon. It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction. ECCV 2020 (Oral). [Paper] [Bibtex] [Github] |
Acknowledgements |