FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation

Motivation

  • Dilated convolutions in the backbone are effective because they keep the final feature maps at high resolution, but they bring heavy computational complexity and a large memory footprint.

Contributions

  • Propose a computationally efficient joint upsampling module, Joint Pyramid Upsampling (JPU), to replace dilated convolutions.
  • Reduce the computation time and memory footprint of the whole segmentation network by a factor of more than 3.
  • Achieve new state-of-the-art performance on the Pascal Context and ADE20K datasets.

Methods

  • Joint upsampling

    yh - high-resolution target image, yl - low-resolution target image, xh - high-resolution guidance image, xl - low-resolution guidance image
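As a rough illustration of the definitions above: joint upsampling is commonly written as yh = f(xh), where f is the transformation in some family H that best maps xl to yl (it minimizes ||yl - f(xl)||). The sketch below is a toy 1D instance, assuming H is the set of affine maps a*x + b. The function names and the data are hypothetical and are not taken from the paper.

```python
# Toy joint upsampling in 1D.
# Assumption: the hypothesis space H contains only affine maps f(x) = a * x + b.

def fit_affine(x_l, y_l):
    """Closed-form least-squares fit of y_l ~ a * x_l + b."""
    n = len(x_l)
    mean_x = sum(x_l) / n
    mean_y = sum(y_l) / n
    var_x = sum((x - mean_x) ** 2 for x in x_l)
    if var_x == 0:
        raise ValueError("x_l must contain at least two distinct values")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_l, y_l))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def joint_upsample(x_h, x_l, y_l):
    """Learn f on the low-resolution pair (x_l, y_l), then apply it to x_h."""
    a, b = fit_affine(x_l, y_l)
    return [a * x + b for x in x_h]


# Hypothetical data: the guidance signal at high and low resolution,
# and a low-resolution target that depends on the guidance affinely.
x_h = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x_l = x_h[::2]
y_l = [3.0 * x + 1.0 for x in x_l]

# High-resolution target estimated from the low-resolution pair.
y_h = joint_upsample(x_h, x_l, y_l)
```

In the paper's setting f is learned by a CNN rather than fitted in closed form; the affine family is used here only to keep the example small.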

  • JPU module

    • Problem formulation



      S - split, M - merge, Cr - regular convolution, Cd - dilated convolution, Cs - strided convolution. Adjacent S and M operations cancel out. ys - output feature map of a normal FCN (output stride 32), yd - output feature map of DilatedFCN (output stride 8).


    y is an approximation of yd. Approximating the high-resolution yd from the low-resolution ys is therefore an instance of the joint upsampling problem.
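The decomposition behind this formulation can be checked numerically. The sketch below is a simplified 1D, single-channel version with zero padding, no nonlinearities, and made-up integer weights, all of which are assumptions for illustration. It verifies that a dilated convolution equals split, regular convolution, merge; that a strided convolution equals a regular convolution followed by keeping the even positions; and that ys then coincides with the even-position half of yd, so only the other half has to be approximated.

```python
def conv1d(x, w, dilation=1, stride=1):
    """Zero-padded 'same' 1D cross-correlation for an odd-length kernel w."""
    half = len(w) // 2
    out = []
    for i in range(0, len(x), stride):
        acc = 0
        for k, wk in enumerate(w):
            j = i + (k - half) * dilation
            if 0 <= j < len(x):  # positions outside x count as zero
                acc += wk * x[j]
        out.append(acc)
    return out


def split(x):
    """S: separate even-indexed and odd-indexed positions."""
    return x[0::2], x[1::2]


def merge(even, odd):
    """M: interleave the two halves back into one signal."""
    out = [0] * (len(even) + len(odd))
    out[0::2] = even
    out[1::2] = odd
    return out


# Hypothetical input and kernels.
x = [3, 1, 4, 1, 5, 9, 2, 6]
w1, w2, w3 = [1, 2, 1], [1, -2, 3], [2, 0, -1]

# Cd (dilation 2) equals S -> Cr -> M.
even, odd = split(x)
dilated = conv1d(x, w2, dilation=2)
via_split = merge(conv1d(even, w2), conv1d(odd, w2))

# Cs (stride 2) equals Cr followed by keeping the even positions.
strided = conv1d(x, w1, stride=2)
reduced = conv1d(x, w1)[0::2]

# DilatedFCN-style path: Cr, then two dilated convolutions.
y_m = conv1d(x, w1)
y_d = conv1d(conv1d(y_m, w2, dilation=2), w3, dilation=2)

# Normal FCN-style path: Cs, then two regular convolutions.
y_s = conv1d(conv1d(conv1d(x, w1, stride=2), w2), w3)

print(dilated == via_split, strided == reduced, y_s == y_d[0::2])
```

In 2D with stride 2 in both directions, the split produces four groups instead of two, and the same argument applies per group.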

    • Problem solving

Results

  • Pascal Context dataset

  • ADE20K dataset

Conclusion

  • The JPU module is effective at improving the segmentation results while reducing the computational complexity.
  • The analysis that formulates approximating the DilatedFCN feature output from the normal FCN feature output as a joint upsampling problem is interesting, but the paper does not demonstrate convincingly that the proposed JPU module actually solves that problem.
  • The JPU module resembles a simplified version of ASPP. Taking the three levels of feature outputs and feeding them to ASPP directly may produce better results.
  • Depthwise separable convolutions are used in the JPU module, and their contribution needs deeper investigation.