Sketch classification has been extensively investigated through task-driven deep neural networks. Despite their strong performance, few works have attempted to explain the predictions of sketch classifiers. An intuitive way to explain a classifier's prediction is to visualize its activation maps by computing gradients. However, visualization-based explanations are constrained by several factors when applied directly to sketch classifiers: (i) the visualized regions carry little semantic meaning for human understanding, and (ii) inter-class correlations among distinct categories are neglected. To address these issues, we introduce a novel explanation method that interprets the decisions of sketch classifiers with stroke-level evidence. Specifically, to obtain stroke-level semantic regions, we first develop a sketch parser that decomposes a sketch into strokes while preserving their geometric structure. Then, we design a counterfactual map generator to discover the stroke-level principal components for a specific category. Finally, based on the counterfactual feature maps, our model answers the question "why is the sketch classified as X" by providing positive and negative semantic evidence. Experiments conducted on two public sketch benchmarks, SketchyCOCO and TU-Berlin, demonstrate the effectiveness of the proposed model. Furthermore, our model provides more discriminative and human-understandable explanations than existing methods.
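To make the described pipeline concrete, the following minimal Python sketch illustrates the three stages named above: stroke-level parsing, counterfactual scoring, and positive/negative evidence selection. It is an illustrative assumption, not the authors' implementation; all function names, data shapes, and the toy classifier are hypothetical, and the counterfactual scoring is a simple occlusion-style stand-in for the counterfactual map generator.

```python
import numpy as np

# Hypothetical stroke-level explanation pipeline (names and shapes are illustrative only).

def parse_strokes(points, pen_lift):
    """Split a point sequence into strokes at pen-lift positions,
    preserving each stroke's point coordinates (geometric structure)."""
    strokes, current = [], []
    for p, lift in zip(points, pen_lift):
        current.append(p)
        if lift:                               # pen lifted: close the current stroke
            strokes.append(np.array(current))
            current = []
    if current:
        strokes.append(np.array(current))
    return strokes

def counterfactual_scores(classifier, strokes, target_class):
    """Score each stroke by how much removing it changes the target-class
    probability: an occlusion-style counterfactual attribution."""
    base = classifier(strokes)[target_class]
    scores = []
    for i in range(len(strokes)):
        masked = strokes[:i] + strokes[i + 1:]  # drop stroke i
        scores.append(base - classifier(masked)[target_class])
    return np.array(scores)

def explain(scores, strokes):
    """Positive evidence: strokes supporting class X; negative: strokes against it."""
    positive = [s for s, w in zip(strokes, scores) if w > 0]
    negative = [s for s, w in zip(strokes, scores) if w <= 0]
    return positive, negative

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((10, 2))                   # toy 2-D point sequence
    lifts = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]      # pen-lift flags per point

    def toy_classifier(stroke_list):
        # Stand-in classifier: class-0 probability grows with total stroke length.
        p = sum(len(s) for s in stroke_list) / 10.0
        return np.array([p, 1.0 - p])

    strokes = parse_strokes(pts, lifts)
    scores = counterfactual_scores(toy_classifier, strokes, target_class=0)
    pos, neg = explain(scores, strokes)
    print(f"{len(pos)} positive and {len(neg)} negative stroke evidences")
```

In this toy setup, strokes whose removal lowers the target-class probability are reported as positive evidence for "why the sketch is classified as X", and the remaining strokes as negative evidence.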