Prediction With Visual Evidence: Sketch Classification Explanation via Stroke-Level Attributions

Published: 01 January 2023

Abstract

Sketch classification models have been extensively investigated by designing task-driven deep neural networks. Despite their strong performance, few works have attempted to explain the predictions of sketch classifiers. An intuitive way to explain a classifier's prediction is to visualize its activation maps by computing gradients. However, such visualization-based explanations are constrained by several factors when directly applied to sketch classifiers: (i) the visualized regions carry little semantics for human understanding, and (ii) inter-class correlations among distinct categories are neglected. To address these issues, we introduce a novel explanation method that interprets the decisions of sketch classifiers with stroke-level evidence. Specifically, to obtain stroke-level semantic regions, we first develop a sketch parser that decomposes a sketch into strokes while preserving their geometric structure. We then design a counterfactual map generator to discover the stroke-level principal components of a specific category. Finally, based on the counterfactual feature maps, our model can answer the question of “why the sketch is classified as X” by providing positive and negative semantic evidence. Experiments conducted on two public sketch benchmarks, SketchyCOCO and TU-Berlin, demonstrate the effectiveness of the proposed model. Furthermore, our model provides more discriminative and human-understandable explanations than existing works.
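To make the idea concrete, the minimal Python sketch below approximates stroke-level counterfactual attribution by occlusion: each stroke is removed in turn, and the drop in the classifier's confidence for the predicted category is taken as that stroke's evidence score. This is an illustration under stated assumptions, not the paper's model (which learns a counterfactual map generator); the `render` and `classifier` callables are hypothetical placeholders.

```python
import numpy as np

def stroke_attributions(strokes, classifier, render, target_class):
    """Score each stroke by how much its removal changes the classifier's
    confidence for `target_class` (a simple occlusion-style counterfactual).

    strokes      : list of per-stroke point arrays, e.g. (N_i, 2) coordinates
    classifier   : callable, rasterized sketch -> class-probability vector
    render       : callable, list of strokes -> rasterized sketch image
    target_class : index of the predicted category X
    """
    full_prob = classifier(render(strokes))[target_class]
    scores = []
    for i in range(len(strokes)):
        # Counterfactual input: the same sketch with stroke i removed.
        reduced = strokes[:i] + strokes[i + 1:]
        prob = classifier(render(reduced))[target_class]
        # A positive score means removing the stroke lowers confidence in X,
        # so the stroke counts as positive evidence; a negative score marks
        # a stroke that argues against the prediction.
        scores.append(full_prob - prob)
    return np.array(scores)
```

Strokes with large positive scores would then be highlighted as positive evidence for “why the sketch is classified as X,” and strokes with negative scores as negative evidence.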


Cited By

  • (2024) A Trustworthy Counterfactual Explanation Method With Latent Space Smoothing, IEEE Transactions on Image Processing, vol. 33, pp. 4584–4599. Online publication date: 19-Aug-2024. DOI: 10.1109/TIP.2024.3442614


Information

Published In

IEEE Transactions on Image Processing, Volume 32, 2023
5324 pages

Publisher

IEEE Press


Qualifiers

  • Research-article

