Research article · Open access
DOI: 10.1145/3625468.3647623
A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation

Published: 17 April 2024

Abstract

Performance degradation caused by corrupted multimedia samples is a critical challenge for machine learning models. Three groups of approaches have previously been proposed to tackle this issue: i) enhancer and denoiser modules that improve the quality of noisy data, ii) data augmentation approaches, and iii) domain adaptation strategies. All have drawbacks that limit their applicability: the first requires paired clean-corrupted data for training and has a high computational cost, while the others can only be used on the same task they were trained on. In this paper, we propose SyMPIE to overcome these shortcomings: a small, modular, and efficient system that enhances input data for robust downstream multimedia understanding at minimal computational cost. SyMPIE is pre-trained on an upstream task/network that need not match the downstream ones and does not require paired clean-corrupted samples. Our key insight is that most input corruptions found in real-world tasks can be modeled through global operations on the color channels of images or through spatial filters with small kernels. We validate our approach on multiple datasets and tasks, including image classification (on ImageNetC, ImageNetC-Bar, VizWiz, and a newly proposed mixed-corruption benchmark named ImageNetC-mixed) and semantic segmentation (on Cityscapes, ACDC, and DarkZurich), with consistent relative accuracy gains of about 5% across the board.
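The key insight above — that many real-world corruptions reduce to a global operation on color channels plus a spatial filter with a small kernel — can be illustrated with a toy sketch. The code below is an assumption-laden illustration of that corruption model, not the authors' SyMPIE implementation: it applies a 3x3 color-mixing matrix globally to every pixel, then a small 3x3 spatial kernel with zero padding, to an image stored as a nested H x W list of RGB tuples.

```python
def apply_color_transform(img, matrix, bias):
    """Global color operation: each output pixel is matrix @ rgb + bias."""
    out = []
    for row in img:
        new_row = []
        for (r, g, b) in row:
            new_row.append(tuple(
                sum(m * c for m, c in zip(matrix[k], (r, g, b))) + bias[k]
                for k in range(3)
            ))
        out.append(new_row)
    return out

def apply_small_kernel(img, kernel):
    """Spatial filter with a small kernel, zero padding, applied per channel."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = [0.0, 0.0, 0.0]
            for ky in range(kh):
                for kx in range(kw):
                    yy, xx = y + ky - oy, x + kx - ox
                    if 0 <= yy < h and 0 <= xx < w:
                        for c in range(3):
                            acc[c] += kernel[ky][kx] * img[yy][xx][c]
            row.append(tuple(acc))
        out.append(row)
    return out

# Example corruption: a desaturating color mix followed by a mild 3x3 blur.
desaturate = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
blur = [[1 / 9] * 3 for _ in range(3)]
img = [[(1.0, 0.0, 0.0) for _ in range(4)] for _ in range(4)]  # 4x4 pure red
corrupted = apply_small_kernel(apply_color_transform(img, desaturate, [0.0, 0.0, 0.0]), blur)
```

Because both operations are parameterized by a handful of values (a 3x3 color matrix, a bias vector, and a small kernel), a compact network can plausibly estimate the inverse transform, which is the intuition behind the paper's lightweight parametric enhancement.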


Published In

MMSys '24: Proceedings of the 15th ACM Multimedia Systems Conference
April 2024, 557 pages
ISBN: 9798400704123
DOI: 10.1145/3625468
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


          Author Tags

          1. Content Enhancement
          2. Denoising
          3. Image Classification
          4. Image Segmentation
          5. Model Robustness

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Funding Sources

          • University of Padova

          Conference

          MMSys '24
          Acceptance Rates

          Overall Acceptance Rate 176 of 530 submissions, 33%

