HumanGPS

In this paper, we address the problem of building dense correspondences between human images under arbitrary camera viewpoints and body poses. Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts, e.g., left vs. right hand. In contrast, we propose a deep learning framework that maps each pixel to a feature space, where the feature distances reflect the geodesic distances among pixels as if they were projected onto the surface of a 3D human scan. To this end, we introduce novel loss functions to push features apart according to their geodesic distances on the surface. Without any semantic annotation, the proposed embeddings automatically learn to differentiate visually similar parts and align different subjects into a unified feature space. Extensive experiments show that the learned embeddings can produce accurate correspondences between images, with remarkable generalization capabilities both within and across subjects.
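The core idea, feature distances that track geodesic distances on the body surface, can be illustrated with a small NumPy sketch. This is not the loss defined in the paper: the hinge form, the `scale` parameter, and both function names are assumptions made for illustration, and the features are assumed to come from some per-pixel embedding network that is not shown.

```python
import numpy as np


def geodesic_margin_loss(features, geodesic, scale=1.0):
    """Illustrative hinge penalty (an assumption, not the paper's loss).

    features: (N, D) per-pixel embeddings from a hypothetical network.
    geodesic: (N, N) geodesic distances between the surface points
              that the N pixels project to.
    """
    # Pairwise Euclidean distances between all embeddings.
    diff = features[:, None, :] - features[None, :, :]
    feat_dist = np.linalg.norm(diff, axis=-1)
    # Penalize pairs whose features are closer than their scaled
    # geodesic distance, which pushes distant surface points apart.
    violation = np.maximum(0.0, scale * geodesic - feat_dist)
    return float(violation.mean())


def nearest_neighbor_matches(feat_a, feat_b):
    """For each feature in image A, index of the closest feature in image B."""
    dists = np.linalg.norm(feat_a[:, None, :] - feat_b[None, :, :], axis=-1)
    return dists.argmin(axis=1)
```

Once embeddings behave this way, correspondences between two images reduce to a nearest-neighbor search in feature space, which is what `nearest_neighbor_matches` sketches.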

Publications

HumanGPS: Geodesic PreServing Feature for Dense Human Correspondence

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Keywords: correspondences, geodesic distance, embeddings, neural networks, digital human, interactive perception

Videos

HumanGPS: Geodesic PreServing Feature for Dense Human Correspondence


Cited By

  • Dance in the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis. https://arxiv.org/pdf/2111.05916.pdf. Tuanfeng Y. Wang, Duygu Ceylan, Krishna Kumar Singh, and Niloy J. Mitra.
  • Human View Synthesis Using a Single Sparse RGB-D Input. arXiv:2112.13889. Phong Nguyen, Nikolaos Sarafianos, Christoph Lassner, Janne Heikkila, and Tony Tung.
  • BodyMap: Learning Full-Body Dense Correspondence Map. arXiv:2205.09111. Anastasia Ianina, Nikolaos Sarafianos, Yuanlu Xu, Ignacio Rocco, and Tony Tung.
  • Free-Viewpoint RGB-D Human Performance Capture and Rendering. Lecture Notes in Computer Science. Phong Nguyen-Ha, Nikolaos Sarafianos, Christoph Lassner, Janne Heikkilä, and Tony Tung.
  • VoLux-GAN: A Generative Model for 3D Face Synthesis with HDRI Relighting. Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings. Feitong Tan, Sean Fanello, Abhimitra Meka, Sergio Orts-Escolano, Danhang Tang, Rohit Pandey, Jonathan Taylor, Ping Tan, and Yinda Zhang.
  • Pri3D: Can 3D Priors Help 2D Representation Learning? https://arxiv.org/abs/2104.11225.pdf. Ji Hou, Saining Xie, Benjamin Graham, Angela Dai, and M. Nießner.
  • Versatile Multi-Modal Pre-Training for Human-Centric Perception. arXiv:2203.13815. Fangzhou Hong, Liang Pan, Zhongang Cai, and Ziwei Liu.
  • Normal-guided Garment UV Prediction for Human Re-Texturing. arXiv:2303.06504. Yasamin Jafarian, Tuanfeng Y. Wang, Duygu Ceylan, Jimei Yang, Nathan Carr, Yi Zhou, and Hyun Soo Park.
  • LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling. arXiv:2208.08622. Boyan Jiang, Xinlin Ren, Mingsong Dou, Xiangyang Xue, Yanwei Fu, and Yinda Zhang.
  • ConVol-E: Continuous Volumetric Embeddings for Human-Centric Dense Correspondence Estimation. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Amogh Tiwari, Pranav Manu, Nakul Rathore, Astitva Srivastava, and Avinash Sharma.