Research article · Open access
DOI: 10.1145/3625468.3647623
A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation

Published: 17 April 2024

Abstract

Performance degradation caused by corrupted multimedia samples is a critical challenge for machine learning models. Three groups of approaches have previously been proposed to tackle this issue: i) enhancer and denoiser modules that improve the quality of noisy data, ii) data augmentation approaches, and iii) domain adaptation strategies. All have drawbacks that limit their applicability: the first requires paired clean-corrupted data for training and has a high computational cost, while the others can only be used on the same task they were trained on. In this paper, we propose SyMPIE to overcome these shortcomings: a small, modular, and efficient system that enhances input data for robust downstream multimedia understanding at minimal computational cost. SyMPIE is pre-trained on an upstream task/network that need not match the downstream ones and does not require paired clean-corrupted samples. Our key insight is that most input corruptions found in real-world tasks can be modeled through global operations on the color channels of images or through spatial filters with small kernels. We validate our approach on multiple datasets and tasks, including image classification (on ImageNetC, ImageNetC-Bar, VizWiz, and a newly proposed mixed-corruption benchmark named ImageNetC-mixed) and semantic segmentation (on Cityscapes, ACDC, and DarkZurich), with consistent relative accuracy gains of about 5% across the board.
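The key insight above — that many real-world corruptions reduce to a global operation on color channels plus a spatial filter with a small kernel — can be illustrated with a toy sketch. The code below is an assumption-laden illustration of that corruption model, not the authors' SyMPIE implementation: it applies a 3x3 color-mixing matrix globally to every pixel, then a small 3x3 spatial kernel with zero padding, to an image stored as a nested H x W list of RGB tuples.

```python
def apply_color_transform(img, matrix, bias):
    """Global color operation: each output pixel is matrix @ rgb + bias."""
    out = []
    for row in img:
        new_row = []
        for (r, g, b) in row:
            new_row.append(tuple(
                sum(m * c for m, c in zip(matrix[k], (r, g, b))) + bias[k]
                for k in range(3)
            ))
        out.append(new_row)
    return out

def apply_small_kernel(img, kernel):
    """Spatial filter with a small kernel, zero padding, applied per channel."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = [0.0, 0.0, 0.0]
            for ky in range(kh):
                for kx in range(kw):
                    yy, xx = y + ky - oy, x + kx - ox
                    if 0 <= yy < h and 0 <= xx < w:
                        for c in range(3):
                            acc[c] += kernel[ky][kx] * img[yy][xx][c]
            row.append(tuple(acc))
        out.append(row)
    return out

# Example corruption: a desaturating color mix followed by a mild 3x3 blur.
desaturate = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
blur = [[1 / 9] * 3 for _ in range(3)]
img = [[(1.0, 0.0, 0.0) for _ in range(4)] for _ in range(4)]  # 4x4 pure red
corrupted = apply_small_kernel(apply_color_transform(img, desaturate, [0.0, 0.0, 0.0]), blur)
```

Because both operations are parameterized by a handful of values (a 3x3 color matrix, a bias vector, and a small kernel), a compact network can plausibly estimate the inverse transform, which is the intuition behind the paper's lightweight parametric enhancement.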


Published In

MMSys '24: Proceedings of the 15th ACM Multimedia Systems Conference
April 2024, 557 pages
ISBN: 9798400704123
DOI: 10.1145/3625468
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


          Author Tags

          1. Content Enhancement
          2. Denoising
          3. Image Classification
          4. Image Segmentation
          5. Model Robustness

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Funding Sources

          • University of Padova

          Conference

          MMSys '24
          Acceptance Rates

          Overall Acceptance Rate 176 of 530 submissions, 33%

