Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision

Abstract

We tackle the problem of Human Locomotion Forecasting: jointly predicting the near-future spatial positions of several keypoints on the human body in an egocentric setting. In contrast to previous work, which addresses either pose prediction or trajectory forecasting in isolation, we propose a framework that unifies these two problems and addresses the practically useful task of pedestrian locomotion prediction in the wild. A major challenge in solving this task is the scarcity of egocentric video datasets with dense annotations for pose, depth, or egomotion. To surmount this difficulty, we use state-of-the-art models to generate (noisy) annotations and propose robust forecasting models that can learn from this noisy supervision. We present a method that disentangles the overall pedestrian motion into easier-to-learn subparts using a pose completion module and a decomposition module. The completion module fills in missing keypoint annotations, and the decomposition module breaks the cleaned locomotion down into a global stream (trajectory) and a local stream (pose keypoint movements). Further, with Quasi RNN as our backbone, we propose a novel hierarchical trajectory forecasting network that uses domain-specific low-level vision signals such as egomotion and depth to predict the global trajectory. Our method achieves state-of-the-art results for the prediction of human locomotion in the egocentric view.
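
To make the decomposition idea concrete, here is a minimal sketch of splitting a keypoint sequence into a global stream (a trajectory) and a local stream (pose relative to that trajectory). It is an illustration only, not the paper's implementation: the choice of a single root keypoint at index 0, the 2D image coordinates, and the function names `decompose` and `recompose` are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Pose = List[Point]  # one frame: K keypoints as (x, y) image coordinates


def decompose(frames: List[Pose], root: int = 0) -> Tuple[List[Point], List[Pose]]:
    """Split a keypoint sequence into a global and a local stream.

    Global stream: position of the root keypoint in each frame (trajectory).
    Local stream:  every keypoint expressed relative to that root (pose).
    The root index is a hypothetical choice made for this sketch.
    """
    trajectory = [frame[root] for frame in frames]
    local = [
        [(x - rx, y - ry) for (x, y) in frame]
        for frame, (rx, ry) in zip(frames, trajectory)
    ]
    return trajectory, local


def recompose(trajectory: List[Point], local: List[Pose]) -> List[Pose]:
    """Inverse of decompose: add the root position back to each keypoint."""
    return [
        [(dx + rx, dy + ry) for (dx, dy) in pose]
        for pose, (rx, ry) in zip(local, trajectory)
    ]


if __name__ == "__main__":
    # Two frames, three keypoints each (toy values).
    frames = [
        [(10.0, 5.0), (11.0, 7.0), (9.0, 7.0)],
        [(12.0, 5.0), (13.0, 8.0), (11.0, 6.0)],
    ]
    trajectory, local = decompose(frames)
    print(trajectory)  # root position per frame
    print(local[1])    # second frame's pose relative to its root
    assert recompose(trajectory, local) == frames
```

Under this split, a forecaster could predict the two streams separately and sum them with `recompose` to obtain the full future locomotion. The sketch covers only the arithmetic of the split; the learned completion and forecasting networks are out of scope here.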

Publication
Winter Conference on Applications of Computer Vision 2020 (Oral)
Oral Presentation Video (WACV 2020)

Qualitative Results

Qualitative results for overall locomotion forecasting in self-driving scenarios. Blue represents the input locomotion to the network; green is the predicted output locomotion. The network reasons jointly about path and pose to produce plausible future locomotion predictions.

Please cite using the following BibTeX:

@inproceedings{mangalam2020disentangling,
  title={Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision},
  author={Mangalam, Karttikeya and Adeli, Ehsan and Lee, Kuan-Hui and Gaidon, Adrien and Niebles, Juan Carlos},
  booktitle={The IEEE Winter Conference on Applications of Computer Vision},
  pages={2784--2793},
  year={2020}
}