1 CFCS, Peking University
2 BAAI
3 CEIE, Tongji University
4 Huazhong University of Science and Technology
5 Tencent AI Lab
6 National University of Defense Technology
* equal contributions
† corresponding author
Teaser. We present a 3D-aware ObjectNav framework along with simultaneous exploration and identification policies: A$\rightarrow$B, the agent is guided by an exploration policy to search for its target; B$\rightarrow$C, the agent consistently identifies the target object and finally calls STOP.
Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Since this task takes place in 3D space, a 3D-aware agent can advance its ObjectNav capability by learning from fine-grained spatial information. However, leveraging a 3D scene representation can be prohibitively impractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, perform simultaneously by utilizing online-fused 3D points as observation.
We take in a posed RGB-D image at time step $t$ and perform a point-based construction algorithm to online fuse a 3D scene representation $M_{3D}^{(t)}$, along with a 2D map $M_{2D}^{(t)}$ obtained by projecting semantics. Then, we simultaneously leverage two policies, a corner-guided exploration policy $\pi_e$ and a category-aware identification policy $\pi_f$, to predict a discrete corner goal $g_e^{(t)}$ and a target goal $g_f^{(t)}$ (if it exists), respectively. Finally, the local planning module drives the agent to the given target goal $g_f^{(t)}$ (top priority) or the corner goal $g_e^{(t)}$.
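The per-step loop described above can be summarized as the following minimal Python sketch. All function names here (fuse_points, project_semantics, exploration_policy, identification_policy, local_planner) are hypothetical placeholders for illustration and do not correspond to the released implementation.

# Minimal sketch of the per-step navigation loop described above.
def navigation_step(rgb, depth, pose, M_3d, agent_pose):
    # Online point-based fusion: add the new posed RGB-D frame to the
    # 3D scene representation M_3D^(t).
    M_3d = fuse_points(M_3d, rgb, depth, pose)

    # Project per-point semantics down to the 2D map M_2D^(t).
    M_2d = project_semantics(M_3d)

    # The two sub-policies run simultaneously on the fused observation.
    corner_goal = exploration_policy(M_3d, M_2d)      # g_e^(t)
    target_goal = identification_policy(M_3d, M_2d)   # g_f^(t), may be None

    # The target goal (if identified) takes priority over the corner goal.
    goal = target_goal if target_goal is not None else corner_goal
    action = local_planner(M_2d, agent_pose, goal)
    return action, M_3d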
Left: A robot takes multi-view observations during navigation. Right: The points $p$ are organized by dynamically allocated blocks $B$ and per-point octrees $O$, which can be used to query the neighboring points of any given point.
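To make the block-based organization above concrete, the following self-contained Python sketch hashes points into dynamically allocated blocks and answers a radius query by inspecting only the block containing the query point and its adjacent blocks. The names BlockPointMap and BLOCK_SIZE are illustrative assumptions, and the per-point octrees $O$ are simplified here to a brute-force search inside the candidate blocks.

from collections import defaultdict
import numpy as np

BLOCK_SIZE = 0.5  # block edge length in meters (assumed value)

def block_key(p):
    # Integer block coordinate for a 3D point p.
    return tuple(np.floor(np.asarray(p) / BLOCK_SIZE).astype(int))

class BlockPointMap:
    def __init__(self):
        # Blocks are allocated dynamically: a key appears only when a
        # point falls inside that block.
        self.blocks = defaultdict(list)

    def insert(self, points):
        for p in np.asarray(points, dtype=float):
            self.blocks[block_key(p)].append(p)

    def query_neighbors(self, q, radius):
        # Return all stored points within `radius` of the query point q.
        q = np.asarray(q, dtype=float)
        reach = int(np.ceil(radius / BLOCK_SIZE))
        cx, cy, cz = block_key(q)
        neighbors = []
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for dz in range(-reach, reach + 1):
                    for p in self.blocks.get((cx + dx, cy + dy, cz + dz), []):
                        if np.linalg.norm(p - q) <= radius:
                            neighbors.append(p)
        return neighbors

# Usage: fuse a batch of points and query a local neighborhood.
pmap = BlockPointMap()
pmap.insert(np.random.rand(1000, 3) * 5.0)
local_pts = pmap.query_neighbors([2.5, 2.5, 2.5], radius=0.3)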
@article{zhang20223d,
title={3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification},
author={Zhang, Jiazhao and Dai, Liu and Meng, Fanpeng and Fan, Qingnan and Chen, Xuelin and Xu, Kai and Wang, He},
journal={arXiv preprint arXiv:2212.00338},
year={2022}
}
If you have any questions, please feel free to contact Jiazhao Zhang at zhngjizh_at_gmail_dot_com, Liu Dai at dailiu_dot_cndl_at_gmail_dot_com, and He Wang at hewang_at_pku_dot_edu_dot_cn