It Is Not the Journey but the Destination:
Endpoint Conditioned Trajectory Prediction

Karttikeya Mangalam1,   Harshayu Girase1,   Shreyas Agarwal1,   Kuan-Hui Lee2,   Ehsan Adeli3
Prof. Jitendra Malik1,    Adrien Gaidon2
1UC Berkeley
2Toyota Research Institute
3Stanford University

ECCV 2020 (Oral)

[Paper]             [Bibtex]         [Short Talk]         [Long Talk]         [Github]          


Human trajectory forecasting with multiple socially interacting agents is of critical importance for autonomous navigation in human environments, e.g., for self-driving cars and social robots. In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction. PECNet infers distant trajectory endpoints to assist in long-range multi-modal trajectory prediction. A novel non-local social pooling layer enables PECNet to infer diverse yet socially compliant trajectories. Additionally, we present a simple “truncation trick” for improving few-shot multi-modal trajectory prediction performance. We show that PECNet improves state-of-the-art performance on the Stanford Drone prediction benchmark by ∼20.9% and on the ETH/UCY benchmark by ∼40.8%.
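
The abstract does not spell out the “truncation trick”. As a rough illustration only, the sketch below assumes it resembles truncated-normal latent sampling as commonly used with generative models: latent samples far from the prior mean are rejected, so that a small number of samples stays close to the most likely modes. The function name, the `cutoff` value, and the latent size are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_truncated_latents(num_samples, latent_dim, cutoff, rng):
    """Draw standard-normal latents, redrawing any coordinate with |z| > cutoff.

    `cutoff` is a hypothetical parameter chosen for illustration.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    z = rng.standard_normal((num_samples, latent_dim))
    outside = np.abs(z) > cutoff
    # Rejection sampling: keep redrawing only the out-of-range coordinates.
    while outside.any():
        z[outside] = rng.standard_normal(int(outside.sum()))
        outside = np.abs(z) > cutoff
    return z

rng = np.random.default_rng(0)
z = sample_truncated_latents(num_samples=5, latent_dim=16, cutoff=1.0, rng=rng)
```

A smaller `cutoff` trades diversity for samples nearer the prior mean, which is the usual motivation for truncation when only a few samples can be drawn.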

Key Ideas

Imitating the Human Path Planning Process: We posit that pedestrians in the scene move towards a predetermined position, and that interactions such as social signalling shape their trajectories only locally while they continue towards their original goal. Instantiating this idea, we propose to model the pedestrian trajectory prediction problem (top left) by breaking the task down into two sequential steps that are learned end to end: (a) inferring the local endpoint distribution (top right) for diverse endpoint sampling for each agent independently; and then (b) conditioning on sampled future endpoints (bottom left) for planning socially compliant trajectories for all the agents in the scene jointly (bottom right).
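
The control flow of steps (a) and (b) above can be sketched with simple stand-ins. This is not PECNet itself: the learned endpoint distribution is replaced here by constant-velocity extrapolation plus Gaussian noise, the learned planner is replaced by straight-line interpolation, and the joint social planning across agents is omitted entirely. All function names and parameters are hypothetical.

```python
import numpy as np

def sample_endpoints(past, num_samples, horizon, noise_std, rng):
    """Step (a) stand-in: sample candidate endpoints for one agent.

    past: array of shape (T_obs, 2) with observed positions.
    Returns an array of shape (num_samples, 2).
    """
    velocity = (past[-1] - past[0]) / (len(past) - 1)  # mean step per frame
    mean_endpoint = past[-1] + horizon * velocity
    return mean_endpoint + noise_std * rng.standard_normal((num_samples, 2))

def plan_to_endpoint(past, endpoint, horizon):
    """Step (b) stand-in: fill in the path from the last observation to the endpoint.

    Returns an array of shape (horizon, 2) whose last row is the endpoint.
    """
    fractions = np.arange(1, horizon + 1)[:, None] / horizon
    return past[-1] + fractions * (endpoint - past[-1])

rng = np.random.default_rng(0)
# Toy agent: 8 observed frames moving along x at one unit per frame.
past = np.stack([np.arange(8.0), np.zeros(8)], axis=1)
endpoints = sample_endpoints(past, num_samples=3, horizon=12, noise_std=0.5, rng=rng)
futures = np.stack([plan_to_endpoint(past, e, horizon=12) for e in endpoints])
```

Each sampled endpoint yields one complete future, so diversity in the predictions comes from step (a) while step (b) only has to connect the present to a fixed destination.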

Multimodal Results

Visualizing Multimodal Predictions: Qualitative results for diverse multi-modal predictions produced by PECNet on the Stanford Drone Dataset. White represents the past 3.2 seconds of trajectory (8 frames), while red and cyan represent the predicted and ground truth future respectively over the next 4.8 seconds (12 frames). As demonstrated, PECNet's predictions capture a wide range of plausible trajectory behaviours while discarding improbable ones, such as endpoints incompatible with the direction of motion.
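
The window sizes quoted above imply 0.4 seconds per frame (3.2 s over 8 frames, 4.8 s over 12 frames). A minimal sketch of splitting a single track into observed and future segments under that setup follows; the `split_track` helper and the toy track are hypothetical and only illustrate the bookkeeping.

```python
import numpy as np

OBS_FRAMES = 8                         # past window: 3.2 seconds
PRED_FRAMES = 12                       # future window: 4.8 seconds
SECONDS_PER_FRAME = 3.2 / OBS_FRAMES   # 0.4 seconds between frames

def split_track(track):
    """Split a (T, 2) array of positions into observed past and future to predict."""
    if len(track) < OBS_FRAMES + PRED_FRAMES:
        raise ValueError("track is shorter than the observation plus prediction window")
    past = track[:OBS_FRAMES]
    future = track[OBS_FRAMES:OBS_FRAMES + PRED_FRAMES]
    return past, future

# Toy track of 20 frames moving diagonally by one unit per frame.
track = np.cumsum(np.ones((20, 2)), axis=0)
past, future = split_track(track)
```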

Socially Compliant Diverse Predictions

Columns, left to right: Ground Truth, Final Predictions, Multimodal Predictions

Qualitative Results at Mergers & Intersections: We demonstrate PECNet's socially compliant and diverse trajectories in multi-agent settings in tricky scenarios such as path mergers (top row) or collision avoidance at lane intersections (bottom row). The left column shows the ground truth trajectories from the Stanford Drone Dataset, and the middle and right columns show our predictions. For ground truth and final predictions, circles denote the past input fed into PECNet, while stars denote the future to be predicted (ground truth) or the predicted future (final predictions), with tails denoting the last four observed positions for both. Our "best" predictions follow the ground truth closely while effectively avoiding collisions with other pedestrians in a natural, seamless way. PECNet's multimodal predictions, in turn, produce diverse socially compliant trajectories jointly for all the pedestrians in the scene (extended temporally for visualization using recurrent prediction). For quantitative results, please see our paper.


Mangalam, Girase, Agarwal, Lee,
Adeli, Malik, Gaidon.

It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction

ECCV 2020 (Oral)


We thank Prof. Juan Carlos Niebles for helpful advice and suggestions.