From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

Karttikeya Mangalam1,   Yang An2,   Harshayu Girase1,   Jitendra Malik1
1UC Berkeley
2TU Munich

ICCV 2021

[Paper]             [Bibtex]        

Overview Video


Human trajectory forecasting is an inherently multi-modal problem. Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long term goals, and (b) sources that are unknown to both the agent & the model, such as the intent of other agents & irreducible randomness in decisions. We propose to factorize this uncertainty into its epistemic & aleatoric sources. We model the epistemic uncertainty through multimodality in long term goals and the aleatoric uncertainty through multimodality in waypoints & paths. To exemplify this dichotomy, we also propose a novel long term trajectory forecasting setting, with prediction horizons up to a minute, an order of magnitude longer than prior works. Finally, we present Y-net, a scene compliant trajectory forecasting network that exploits the proposed epistemic & aleatoric structure for diverse trajectory predictions across long prediction horizons. Y-net significantly improves previous state-of-the-art performance on both (a) the well studied short prediction horizon settings on the Stanford Drone & ETH/UCY datasets and (b) the proposed long prediction horizon setting on the re-purposed Stanford Drone & Intersection Drone datasets.

Key Ideas

Imitating the Human Path Planning Process: We posit that pedestrians in a scene move towards a predetermined position, and that interactions such as social signalling shape their trajectories only locally while they continue to pursue their original intention. Instantiating this idea, we propose to model the pedestrian trajectory prediction problem (top left) by breaking the task into two sequential steps that are learned end to end: (a) inferring the local endpoint distribution (top right) for diverse endpoint sampling for each agent independently; and then (b) conditioning on sampled future endpoints (bottom left) to plan socially compliant trajectories for all the agents in the scene jointly (bottom right).
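The two-step structure above can be sketched in a few lines of NumPy. This is only an illustrative stand-in, not the actual Y-net architecture: `sample_goals` mimics step (a) by drawing diverse endpoints from a predicted goal probability map (here a random array stands in for the network's heatmap output), and `decode_trajectory` mimics step (b) with a straight-line decoder in place of the learned goal-conditioned path decoder. All function names are hypothetical.

```python
import numpy as np

def sample_goals(goal_heatmap, k, rng):
    """Step (a): sample k diverse endpoint cells from a 2-D goal probability map."""
    h, w = goal_heatmap.shape
    probs = goal_heatmap.ravel()
    probs = probs / probs.sum()                  # normalize to a valid distribution
    idx = rng.choice(h * w, size=k, p=probs)     # epistemic multimodality via sampling
    return np.stack([idx // w, idx % w], axis=1).astype(float)

def decode_trajectory(last_pos, goal, horizon):
    """Step (b): stand-in decoder that plans a path conditioned on a sampled goal.
    A straight line replaces the learned, scene- and socially-aware decoder."""
    steps = np.linspace(0.0, 1.0, horizon + 1)[1:, None]
    return last_pos + steps * (goal - last_pos)  # (horizon, 2) future positions

rng = np.random.default_rng(0)
heatmap = rng.random((64, 64))                   # stand-in for a predicted goal map
goals = sample_goals(heatmap, k=5, rng=rng)
trajs = [decode_trajectory(np.array([10.0, 10.0]), g, horizon=12) for g in goals]
```

Each sampled goal yields one full future trajectory, so k goal samples give k distinct predictions for the same observed past, which is the source of the diverse multi-modal outputs shown below.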

Multimodal Results

Visualizing Multimodal Predictions: Qualitative results for diverse multi-modal predictions produced by PECNet on the Stanford Drone Dataset. White represents the past 3.2 second trajectory (8 frames), while red & cyan represent the predicted & ground truth futures respectively over the next 4.8 seconds (12 frames). As demonstrated, PECNet's predictions capture a wide range of plausible trajectory behaviours while discarding improbable ones, such as endpoints incompatible with the direction of motion.

Socially Compliant Diverse Predictions

 Ground Truth

 Final Predictions

 Multimodal Predictions

Qualitative Results at Mergers & Intersections: We demonstrate PECNet's socially compliant & diverse trajectories in multi-agent settings in tricky scenarios such as path mergers (top row) and collision avoidance at lane intersections (bottom row). The left column denotes the ground truth trajectories from the Stanford Drone Dataset, and the middle and right columns denote our predictions. For the ground truth/final predictions, circles denote the past input fed into PECNet, while stars denote the future to be predicted/the predicted future, with tails denoting the last four observed positions for both. Our "best" predictions follow the ground truth closely while effectively avoiding collisions with other pedestrians in a natural, seamless way, and PECNet's multimodal predictions produce diverse, socially compliant trajectories jointly for all the pedestrians in the scene (extended temporally for visualization using recurrent prediction). For quantitative results, please see our paper.



[Paper]     [Bibtex]     [Github]


We thank Prof. Juan Carlos Niebles for helpful advice and suggestions. This webpage template was borrowed from some colorful folks.