Empirical Remarks on the Translational Equivariance of Convolutional Layers
Figure 1. Illustration of translational equivariance.
Figure 2. Spatial images and their corresponding frequency images. (a) Spatial images shifted from the center by zero to five pixels; (b) their corresponding frequency images, which share the same appearance.
Figure 3. Digit images (first row) and their corresponding frequency images (second row), which share the bright center area.
Figure 4. The fusion strategies. (a) Early fusion by concatenating the raw image and its frequency image; (b) late fusion by concatenation; (c) late fusion by addition.
Figure 5. Test accuracy for the Net-1 network with 3 × 3 kernels trained on spatial images only.
Figure 6. Accuracy of the networks tested with N-pixel-translated images (denoted Trans. N). (a) Deep networks with up to 17 convolutional layers and a fixed 3 × 3 kernel size; (b) the Net-1 network with kernel sizes of up to 37.
Figure 7. Comparison of networks with and without max-pooling, varying network depth with a fixed 3 × 3 kernel size. (a–d) The Net-1, Net-2, Net-3, and Net-4 networks, respectively.
Figure 8. Comparison of networks with and without augmentation within a 2-pixel translation, varying network depth with a fixed 3 × 3 kernel size. (a–d) The Net-1, Net-2, Net-3, and Net-4 networks, respectively.
Figure 9. Comparison of networks with a fixed 3 × 3 kernel size. (a) The Net-4 network with and without both max-pooling and augmentation (A&M) within a 2-pixel translation; (b) the Net-1 network with and without augmentation within a 5-pixel translation.
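The identical appearance of the frequency images in Figure 2 follows from the Fourier shift theorem: translating an image changes only the phase of its 2-D DFT, while the magnitude spectrum (the "frequency image") stays the same. A minimal NumPy sketch on a toy random image (not the paper's MNIST data) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((28, 28))                          # toy 28x28 "spatial image"
shifted = np.roll(img, shift=(3, 5), axis=(0, 1))   # circular 3- and 5-pixel shift

# Frequency "images": magnitude of the 2-D DFT.
mag = np.abs(np.fft.fft2(img))
mag_shifted = np.abs(np.fft.fft2(shifted))

# Fourier shift theorem: translation only alters the phase, so the
# magnitude spectra coincide (exactly, for circular shifts).
assert np.allclose(mag, mag_shifted)
```

Note that the equality is exact only for circular shifts; for a digit shifted within a zero background, it holds to the extent that no content is clipped at the border.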
Abstract
1. Introduction
2. Equivariance versus Invariance
3. Fusion Strategy for Frequency Image as Complementary Information
4. Experimental Results
4.1. Dataset
4.2. Experimental Setup
4.3. On the Translational Invariance of Spatial Image Trained Networks
4.4. On the Translational Invariance of Fusion Strategy
4.5. On the Translational Invariance of Max-Pooling and Augmentation
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Chang, J.-R.; Chen, Y.-S. Batch-Normalized Maxout Network in Network. arXiv 2015, arXiv:1511.02583.
- Van Dyk, D.; Meng, X.-L. The Art of Data Augmentation. J. Comput. Graph. Stat. 2001, 10, 1–50.
- Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic Routing Between Capsules. In Advances in Neural Information Processing Systems (NIPS); 2017; pp. 3859–3869.
- Chidester, B.; Do, M.N.; Ma, J. Rotation Equivariance and Invariance in Convolutional Neural Networks. arXiv 2018, arXiv:1805.12301.
- Worrall, D.E.; Garbin, S.J.; Turmukhambetov, D.; Brostow, G.J. Harmonic Networks: Deep Translation and Rotation Equivariance. arXiv 2016, arXiv:1612.04642.
- Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Cohen, T.; Welling, M. Group Equivariant Convolutional Networks. In Proceedings of Machine Learning Research (PMLR), Vol. 48, New York, NY, USA, 20–22 June 2016; pp. 2990–2999.
- Levi, G.; Hassner, T.; Zhang, Z.; Cohen, P.; Bohus, D.; Horaud, R.; Meng, H. Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction—ICMI '15, Seattle, WA, USA, 9–13 November 2015; pp. 503–510.
- Muhammad Anwer, R.; Khan, F.S.; van de Weijer, J.; Laaksonen, J. Tex-Nets: Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition. In Proceedings of the ACM on International Conference on Multimedia Retrieval, Bucharest, Romania, 6–9 June 2017; pp. 125–132.
- Van Hoai, D.P.; Hoang, V.T. Feeding Convolutional Neural Network by Hand-Crafted Features Based on Enhanced Neighbor-Center Different Image for Color Texture Classification. In Proceedings of the 2019 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), Ho Chi Minh City, Vietnam, 9–10 May 2019; pp. 1–6.
- Hosseini, S.; Lee, S.H.; Cho, N.I. Feeding Hand-Crafted Features for Enhancing the Performance of Convolutional Neural Networks. arXiv 2018, arXiv:1801.07848.
- Geirhos, R.; Rubisch, P.; Michaelis, C.; Bethge, M.; Wichmann, F.A.; Brendel, W. ImageNet-Trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image Style Transfer Using Convolutional Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Jiang, R.; Mei, S. Polar Coordinate Convolutional Neural Network: From Rotation-Invariance to Translation-Invariance. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 355–359.
- Esteves, C.; Allen-Blanchette, C.; Zhou, X.; Daniilidis, K. Polar Transformer Networks. arXiv 2017, arXiv:1709.01889.
- Henriques, J.F.; Vedaldi, A. Warped Convolutions: Efficient Invariance to Spatial Transformations. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, NSW, Australia, 6–11 August 2017; pp. 1461–1469.
- Xu, C.; Makihara, Y.; Li, X.; Yagi, Y.; Lu, J. Cross-View Gait Recognition Using Pairwise Spatial Transformer Networks. IEEE Trans. Circuits Syst. Video Technol. 2020.
- Reddy, B.; Chatterji, B. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 1996, 5, 1266–1271.
- Nussbaumer, H.J. The Fast Fourier Transform. In Fast Fourier Transform and Convolution Algorithms; Springer: New York, NY, USA, 1981; pp. 80–111.
- Salazar, A.; Igual, J.; Safont, G.; Vergara, L.; Vidal, A. Image Applications of Agglomerative Clustering Using Mixtures of Non-Gaussian Distributions. In Proceedings of the 2015 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 7–9 December 2015; pp. 459–463.
- Comon, P.; Jutten, C. Handbook of Blind Source Separation: Independent Component Analysis and Applications; Academic Press: New York, NY, USA, 2010.
- LeCun, Y.; Cortes, C.; Burges, C.J. The MNIST Database. Available online: http://yann.lecun.com/exdb/mnist (accessed on 1 January 2020).
- Derrode, S.; Ghorbel, F. Robust and Efficient Fourier–Mellin Transform Approximations for Gray-Level Image Reconstruction and Complete Invariant Description. Comput. Vis. Image Underst. 2001, 83, 57–78.
| Translation (Pixels) | No-Fusion Baseline | Early Fusion | Late-Fusion Concatenation | Late-Fusion Addition |
|---|---|---|---|---|
| 0 | 1.98 | 1.66 (19.28) | 1.74 (13.79) | 1.56 (26.92) |
| 1 | 8.47 | 6.04 (40.23) | 5.08 (66.73) | 5.03 (68.39) |
| 2 | 42.55 | 30.63 (38.92) | 25.59 (66.28) | 26.56 (60.20) |
| 3 | 74.86 | 66.62 (12.37) | 58.06 (28.94) | 60.66 (23.41) |
| 4 | 87.25 | 81.06 (7.64) | 73.08 (19.39) | 79.48 (9.78) |
| 5 | 92.16 | 85.75 (7.48) | 79.14 (16.45) | 86.05 (7.10) |
| Translation (Pixels) | No-Fusion Baseline | Early Fusion | Late-Fusion Concatenation | Late-Fusion Addition |
|---|---|---|---|---|
| 0 | 1.08 | 1.19 (−9.24) | 1.00 (8.00) | 0.99 (9.09) |
| 1 | 4.51 | 3.67 (22.89) | 2.95 (52.88) | 4.07 (10.81) |
| 2 | 31.16 | 26.67 (16.84) | 19.17 (62.55) | 29.62 (5.20) |
| 3 | 70.93 | 71.15 (−0.31) | 53.18 (33.38) | 68.14 (4.09) |
| 4 | 90.83 | 89.70 (1.26) | 72.36 (25.53) | 88.07 (3.13) |
| 5 | 96.77 | 96.01 (0.79) | 79.03 (22.45) | 94.33 (2.59) |
| Translation (Pixels) | No-Fusion Baseline | Early Fusion | Late-Fusion Concatenation | Late-Fusion Addition |
|---|---|---|---|---|
| 0 | 1.07 | 0.93 (15.05) | 0.99 (8.08) | 1.03 (3.88) |
| 1 | 3.44 | 3.22 (6.83) | 2.74 (25.55) | 2.87 (19.86) |
| 2 | 24.10 | 20.14 (19.66) | 13.86 (73.88) | 17.77 (35.62) |
| 3 | 67.51 | 64.02 (5.45) | 45.10 (49.69) | 54.79 (23.22) |
| 4 | 88.67 | 89.07 (−0.45) | 69.23 (28.08) | 79.05 (12.17) |
| 5 | 96.15 | 95.69 (0.48) | 79.18 (21.43) | 87.12 (10.37) |
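The parenthesized values in the tables above are relative improvements (%) of each fusion variant over the no-fusion baseline, computed against the fused value as (baseline − fused) / fused × 100, which can be verified directly from the tabulated numbers:

```python
def rel_improvement(baseline: float, fused: float) -> float:
    """Relative improvement (%) of a fusion variant over the no-fusion
    baseline, expressed relative to the fused value; negative entries
    mean the fusion variant did worse than the baseline."""
    return (baseline - fused) / fused * 100.0

# Spot-check against the first table (1-pixel translation, early fusion):
assert round(rel_improvement(8.47, 6.04), 2) == 40.23
# And a negative entry from the second table (0-pixel translation):
assert round(rel_improvement(1.08, 1.19), 2) == -9.24
```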
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cheoi, K.J.; Choi, H.; Ko, J. Empirical Remarks on the Translational Equivariance of Convolutional Layers. Appl. Sci. 2020, 10, 3161. https://doi.org/10.3390/app10093161
Cheoi KJ, Choi H, Ko J. Empirical Remarks on the Translational Equivariance of Convolutional Layers. Applied Sciences. 2020; 10(9):3161. https://doi.org/10.3390/app10093161
Chicago/Turabian Style
Cheoi, Kyung Joo, Hyeonyeong Choi, and Jaepil Ko. 2020. "Empirical Remarks on the Translational Equivariance of Convolutional Layers" Applied Sciences 10, no. 9: 3161. https://doi.org/10.3390/app10093161

APA Style
Cheoi, K. J., Choi, H., & Ko, J. (2020). Empirical Remarks on the Translational Equivariance of Convolutional Layers. Applied Sciences, 10(9), 3161. https://doi.org/10.3390/app10093161