3D-Aware Object Goal Navigation
via Simultaneous Exploration and Identification

CVPR 2023


Jiazhao Zhang1,2*, Liu Dai3*, Fanpeng Meng4, Qingnan Fan5, Xuelin Chen5, Kai Xu6, He Wang1†

1 CFCS, Peking University   2 BAAI   3 CEIE, Tongji University   4 Huazhong University of Science and Technology  
5 Tencent AI Lab   6 National University of Defense Technology  

* equal contributions   † corresponding author  



Teaser. We present a 3D-aware ObjectNav framework with simultaneous exploration and identification policies: A$\rightarrow$B, the agent is guided by the exploration policy to search for its target; B$\rightarrow$C, the agent consistently identifies the target object and finally calls STOP.


Abstract


Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Since this task takes place in 3D space, a 3D-aware agent can advance its ObjectNav capability by learning from fine-grained spatial information. However, leveraging 3D scene representations can be prohibitively impractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav task built on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, operate simultaneously and utilize online-fused 3D points as observation.


Methods



At each time step $t$, we take a posed RGB-D image and run a point-based construction algorithm to online-fuse a 3D scene representation $M_{3D}^{(t)}$, along with a 2D map $M_{2D}^{(t)}$ obtained by semantic projection. We then simultaneously apply two policies, a corner-guided exploration policy $\pi_e$ and a category-aware identification policy $\pi_f$, to predict a discrete corner goal $g_e^{(t)}$ and a target goal $g_f^{(t)}$ (if one exists), respectively. Finally, the local planning module drives the agent to the target goal $g_f^{(t)}$ (top priority) or, otherwise, to the corner goal $g_e^{(t)}$.
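
For concreteness, the following is a minimal, runnable Python sketch of this per-step decision logic. It is not the authors' implementation; all component names (fuse_frame, pi_e, pi_f, local_planner) are hypothetical placeholders for the modules in the diagram.

from typing import Callable, Optional, Tuple

Goal = Tuple[float, float]  # a 2D goal location on the floor plane


def objectnav_step(
    fuse_frame: Callable[[object, object], Tuple[object, object]],  # -> (M_3D^(t), M_2D^(t))
    pi_e: Callable[[object], Goal],                  # corner-guided exploration policy
    pi_f: Callable[[object, str], Optional[Goal]],   # category-aware identification policy
    local_planner: Callable[[Goal, bool], str],      # chosen goal -> low-level action
    rgbd_obs: object,
    pose: object,
    target_category: str,
) -> str:
    # Online-fuse the posed RGB-D frame into the 3D points M_3D^(t)
    # and project semantics into the 2D map M_2D^(t).
    m3d, m2d = fuse_frame(rgbd_obs, pose)

    # Run both sub-policies simultaneously on the fused observation.
    corner_goal = pi_e(m2d)                      # g_e^(t)
    target_goal = pi_f(m3d, target_category)     # g_f^(t), or None if not identified

    # The target goal has top priority; otherwise fall back to the corner goal.
    if target_goal is not None:
        return local_planner(target_goal, True)   # may return "stop" at the target
    return local_planner(corner_goal, False)


# Toy usage with stand-in components, just to show the control flow:
if __name__ == "__main__":
    action = objectnav_step(
        fuse_frame=lambda obs, pose: ({"points": []}, {"map": []}),
        pi_e=lambda m2d: (3.0, 1.0),
        pi_f=lambda m3d, cat: None,             # target not yet identified
        local_planner=lambda goal, is_target: "move_forward",
        rgbd_obs=None,
        pose=None,
        target_category="chair",
    )
    print(action)  # -> "move_forward": the agent keeps exploring toward the corner goal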


Online points fusion*


Left: a robot takes multi-view observations during navigation. Right: the points $p$ are organized into dynamically allocated blocks $B$ and per-point octrees $O$, which can be used to query the neighboring points of any given point.
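
As a rough illustration, here is a simplified, runnable Python sketch of this block-based point organization; it is not the authors' implementation. Points are hashed into dynamically allocated blocks keyed by integer block coordinates, so a neighborhood query only visits the blocks adjacent to the query point. The per-point octrees used for finer-grained lookup inside each block are omitted, and all names and the block size are illustrative.

from collections import defaultdict
import numpy as np


class BlockPointMap:
    def __init__(self, block_size: float = 0.5):
        # Block edge length in meters (illustrative value).
        self.block_size = block_size
        # Block index (ix, iy, iz) -> list of 3D points; blocks are allocated lazily.
        self.blocks = defaultdict(list)

    def _block_index(self, p: np.ndarray) -> tuple:
        return tuple(np.floor(p / self.block_size).astype(int))

    def insert(self, points: np.ndarray) -> None:
        # Dynamically allocate blocks as new regions of the scene are observed.
        for p in points:
            self.blocks[self._block_index(p)].append(p)

    def query_neighbors(self, q: np.ndarray, radius: float) -> np.ndarray:
        # Visit only the 3x3x3 blocks around the query point, then filter by radius.
        # Correct as long as radius does not exceed the block size.
        ix, iy, iz = self._block_index(q)
        candidates = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    candidates.extend(self.blocks.get((ix + dx, iy + dy, iz + dz), []))
        if not candidates:
            return np.empty((0, 3))
        pts = np.stack(candidates)
        return pts[np.linalg.norm(pts - q, axis=1) <= radius]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fused = BlockPointMap()
    fused.insert(rng.uniform(0, 5, size=(10000, 3)))   # points fused from earlier frames
    neighbors = fused.query_neighbors(np.array([2.5, 2.5, 2.5]), radius=0.3)
    print(neighbors.shape)  # local neighborhood of the query point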




Visualization



More results can be found in our paper.


Video





Team


Jiazhao Zhang1,2*
Liu Dai3*
Fanpeng Meng4
Qingnan Fan5
Xuelin Chen5
Kai Xu6
He Wang1†

1 CFCS, Peking University   2 BAAI   3 CEIE, Tongji University   4 Huazhong University of Science and Technology   5 Tencent AI Lab   6 National University of Defense Technology  
* equal contributions   † corresponding author  


Citation



@article{zhang20223d,
  title={3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification},
  author={Zhang, Jiazhao and Dai, Liu and Meng, Fanpeng and Fan, Qingnan and Chen, Xuelin and Xu, Kai and Wang, He},
  journal={arXiv preprint arXiv:2212.00338},
  year={2022}
}

Contact


If you have any questions, please feel free to contact Jiazhao Zhang at zhngjizh_at_gmail_dot_com, Liu Dai at dailiu_dot_cndl_at_gmail_dot_com, or He Wang at hewang_at_pku_dot_edu_dot_cn.