Volume 17, Issue 2
LTDNet: A Lightweight Two-Stage Decoder Network for RGB-D Salient Object Detection

Jian Wang & Wenbing Chen

J. Info. Comput. Sci., 17 (2022), pp. 102-117.

  • Abstract

Most existing RGB-D salient object detection (SOD) models use heavy backbones such as VGG and ResNet, which lead to large model sizes and high computational costs. To address this problem, we propose a lightweight two-stage decoder network. First, the network uses MobileNet-V2 and a customized backbone to extract features from RGB images and depth maps, respectively. To mine and combine cross-modality information, a cross reference module fuses complementary information from the two modalities. Next, a feature enhancement module, built from four parallel convolutions with different expansion rates, strengthens the cues in the fused features. Finally, a two-stage decoder predicts the saliency maps: it processes high-level and low-level features separately and then merges them. Experiments on 5 benchmark datasets against 10 state-of-the-art models show that our model achieves significant improvement with the smallest model size.
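
The idea of running parallel convolutions with different rates can be illustrated with a minimal pure-Python sketch. It assumes that "expansion rates" refers to dilation rates, works on a 1-D signal instead of 2-D feature maps, and fuses the branches by element-wise summation. The rates (1, 2, 4, 8), the summation, and all function names are hypothetical choices for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D cross-correlation with a dilated, odd-length kernel."""
    half = (len(kernel) // 2) * dilation  # zero padding added on each side
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            acc += w * padded[i + j * dilation]
        out.append(acc)
    return out


def parallel_dilated_branches(signal, kernel, dilations=(1, 2, 4, 8)):
    """Run one branch per dilation rate and sum the outputs element-wise.

    The default rates and the summation are assumptions for this sketch.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1
```

With a kernel of size 3, the four assumed rates span 3, 5, 9, and 17 input samples respectively, so each branch sees context at a different scale while using the same number of weights.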

  • Copyright

COPYRIGHT: © Global Science Press

@Article{JICS-17-102,
  author  = {Wang, Jian and Chen, Wenbing},
  title   = {LTDNet: A Lightweight Two-Stage Decoder Network for RGB-D Salient Object Detection},
  journal = {Journal of Information and Computing Science},
  year    = {2024},
  volume  = {17},
  number  = {2},
  pages   = {102--117},
  issn    = {1746-7659},
  doi     = {https://doi.org/},
  url     = {http://global-sci.org/intro/article_detail/jics/22353.html}
}
TY  - JOUR
T1  - LTDNet: A Lightweight Two-Stage Decoder Network for RGB-D Salient Object Detection
AU  - Wang, Jian
AU  - Chen, Wenbing
JO  - Journal of Information and Computing Science
VL  - 17
IS  - 2
SP  - 102
EP  - 117
PY  - 2024
DA  - 2024/01
SN  - 1746-7659
DO  - http://doi.org/
UR  - https://global-sci.org/intro/article_detail/jics/22353.html
KW  - salient object detection
KW  - RGB-D
KW  - lightweight
KW  - efficient
ER  -

Wang, Jian and Chen, Wenbing. (2024). LTDNet: A Lightweight Two-Stage Decoder Network for RGB-D Salient Object Detection. Journal of Information and Computing Science. 17 (2). 102-117. doi: