Layered Depth Refinement with Mask Guidance

CVPR 2022

Soo Ye Kim Jianming Zhang Simon Niklaus Yifei Fan
KAIST Adobe Adobe Adobe
Simon Chen Zhe Lin Munchurl Kim
Adobe Adobe KAIST
Paper arXiv GitHub

teaser

Our layered depth refinement result on an initial prediction by DPT. Aided by a high-quality mask M, automatically generated using a commercial tool, our method is able to accurately refine mask boundaries and correct depth values in isolated background regions. Regions in M and 1-M are refined and inpainted/outpainted separately with our layered approach.


Abstract

Depth maps are used in a wide range of applications from 3D rendering to 2D image effects such as Bokeh. However, those predicted by single image depth estimation (SIDE) models often fail to capture isolated holes in objects and/or have inaccurate boundary regions. Meanwhile, high-quality masks are much easier to obtain, using commercial auto-masking tools or off-the-shelf methods of segmentation and matting or even by manual editing. Hence, in this paper, we formulate a novel problem of mask-guided depth map refinement that utilizes a generic mask to refine the depth prediction of SIDE models. Our framework performs layered refinement and inpainting/outpainting, decomposing the depth map into two separate layers signified by the mask and the inverse mask. As datasets with both depth and mask annotations are scarce, we propose a self-supervised learning scheme that uses arbitrary masks and RGB-D datasets. We empirically show that our method is robust to different types of masks and initial depth predictions, accurately refining depth values in inner and outer mask boundary regions. We further analyze our model with an ablation study and demonstrate results on real applications.

Demo

Select an image from the left and hover over the buttons on the right to compare the initial depth map by DPT [1] and the refined results by Boosting [2] and Ours. All images are from unsplash, pixabay and Adobe Stock.

[1] Ranftl et al., Vision Transformers for Dense Prediction, ICCV, 2021.
[2] Miangoleh et al., Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging, CVPR, 2021.

BibTeX

@inproceedings{kim2022layered,
  title     = {Layered Depth Refinement with Mask Guidance},
  author    = {Kim, Soo Ye and Zhang, Jianming and Niklaus, Simon and Fan, Yifei and Lin, Zhe and Kim, Munchurl},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}