End-to-End Supervised Stereo Imaging-Based Method for Depth Estimation
Posted:
Proposed Architecture
The proposed approach employs two encoders and one decoder. The encoders extract the features of stereo images, and the decoder reconstructs the depth map.
- An attention module named Spatial and Channel Attention Module (SCAM) is incorporated in the bottleneck to combine and emphasize the most meaningful features of encoders.
- Moreover, for better gradient propagation and faster convergence, the decoder module reuses the feature maps of encoders in two different ways, i.e., (1) concatenation and (2) element-wise addition using non-identity mapping.
- Extensive ablation studies are conducted to evaluate the proposed architecture’s effectiveness and strategies. The experiments are conducted on three publicly available datasets: RGB+D scene, Cityscapes, and KITTI dataset