Sketch classification has been extensively investigated through task-driven deep neural networks. Despite their strong performance, few works have attempted to explain the predictions of sketch classifiers. An intuitive way to explain a classifier's prediction is to visualize its activation maps by computing gradients. However, visualization-based explanations are constrained by several factors when applied directly to sketch classifiers: (i) the visualized regions carry little semantic meaning for human understanding, and (ii) inter-class correlations among distinct categories are neglected. To address these issues, we introduce a novel explanation method that interprets the decisions of sketch classifiers with stroke-level evidence. Specifically, to obtain stroke-level semantic regions, we first develop a sketch parser that decomposes a sketch into strokes while preserving their geometric structure. Then, we design a counterfactual map generator to discover the stroke-level principal components for a specific category. Finally, based on the counterfactual feature maps, our model answers the question "why is the sketch classified as X" by providing positive and negative semantic evidence. Experiments conducted on two public sketch benchmarks, SketchyCOCO and TU-Berlin, demonstrate the effectiveness of the proposed model. Furthermore, our model provides more discriminative and human-understandable explanations than existing methods.
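To make the described pipeline concrete, the following minimal Python sketch illustrates the three stages named above: stroke-level parsing, counterfactual scoring, and positive/negative evidence selection. It is an illustrative assumption, not the authors' implementation; all function names, data shapes, and the toy classifier are hypothetical, and the counterfactual scoring is a simple occlusion-style stand-in for the counterfactual map generator.

```python
import numpy as np

# Hypothetical stroke-level explanation pipeline (names and shapes are illustrative only).

def parse_strokes(points, pen_lift):
    """Split a point sequence into strokes at pen-lift positions,
    preserving each stroke's point coordinates (geometric structure)."""
    strokes, current = [], []
    for p, lift in zip(points, pen_lift):
        current.append(p)
        if lift:                               # pen lifted: close the current stroke
            strokes.append(np.array(current))
            current = []
    if current:
        strokes.append(np.array(current))
    return strokes

def counterfactual_scores(classifier, strokes, target_class):
    """Score each stroke by how much removing it changes the target-class
    probability: an occlusion-style counterfactual attribution."""
    base = classifier(strokes)[target_class]
    scores = []
    for i in range(len(strokes)):
        masked = strokes[:i] + strokes[i + 1:]  # drop stroke i
        scores.append(base - classifier(masked)[target_class])
    return np.array(scores)

def explain(scores, strokes):
    """Positive evidence: strokes supporting class X; negative: strokes against it."""
    positive = [s for s, w in zip(strokes, scores) if w > 0]
    negative = [s for s, w in zip(strokes, scores) if w <= 0]
    return positive, negative

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((10, 2))                   # toy 2-D point sequence
    lifts = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]      # pen-lift flags per point

    def toy_classifier(stroke_list):
        # Stand-in classifier: class-0 probability grows with total stroke length.
        p = sum(len(s) for s in stroke_list) / 10.0
        return np.array([p, 1.0 - p])

    strokes = parse_strokes(pts, lifts)
    scores = counterfactual_scores(toy_classifier, strokes, target_class=0)
    pos, neg = explain(scores, strokes)
    print(f"{len(pos)} positive and {len(neg)} negative stroke evidences")
```

In this toy setup, strokes whose removal lowers the target-class probability are reported as positive evidence for "why the sketch is classified as X", and the remaining strokes as negative evidence.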