DOI: 10.1145/3627673.3679538
Research Article | Open Access

HiLite: Hierarchical Level-implemented Architecture Attaining Part-Whole Interpretability

Published: 21 October 2024

Abstract

Beyond the traditional CNN structure, we have recently witnessed numerous breakthroughs in computer vision architectures, such as the Vision Transformer, MLP-Mixer, and SNN-MLP. However, much of the effort in developing novel architectures for vision tasks is heavily focused on achieving strong performance, and how to attain interpretability in a trained neural network remains an open question. Inspired by the imaginary system GLOM, we present HiLite: Hierarchical Level-implemented Architecture attaining Part-Whole Interpretability, in which islands of identical vectors can provide unprecedented interpretability. In our column-like structure, each level is a layer of a part-whole hierarchy composed of multiple neurons, and the function defining the neural field over an image input patch is initialized as the level vector inside the model. We propose two column networks, Top-Down (TD) and Bottom-Up (BU), that enable inter-level communication between adjacent levels on a specific patch, and Gated Consensus Attention, which performs intra-level communication across different patches within a level. At each time step, the level vector and the outputs from the different networks are combined into a weighted sum and passed to the next step, and the outputs from the final time step serve as representation vectors. Supervised contrastive learning is then used to find the relationships among meaningful patches in each class, where negative examples help prevent representation collapse between neighboring patches. HiLite demonstrates promising performance in quantitative evaluations on four image classification datasets and under two metrics assessing representation quality, and it showcases intrinsic interpretability by simply generating a visual cue. We believe that our work is a solid step towards novel research on neural architectures attaining interpretability.
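To make the per-time-step update described in the abstract concrete, here is a minimal PyTorch sketch of one HiLite-style step for a single level: the level vector at each patch is combined, via a learned weighted sum, with a Bottom-Up signal from the level below, a Top-Down signal from the level above, and a gated intra-level attention (consensus) signal across patches. All module shapes, the gating form, and the mixing parameterization are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of one HiLite-style time step; module shapes, gating,
# and mixing weights are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HiLiteStep(nn.Module):
    """One time step for a single level l: combine the current level vector
    with bottom-up (level l-1), top-down (level l+1), and gated intra-level
    attention signals into a weighted sum passed to the next step."""
    def __init__(self, dim: int):
        super().__init__()
        self.bottom_up = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.top_down = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(dim, 1)               # per-patch gate on the consensus signal
        self.mix = nn.Parameter(torch.ones(4) / 4)  # weights for the four contributions

    def forward(self, level, below, above):
        # level, below, above: (batch, num_patches, dim)
        bu = self.bottom_up(below)                      # inter-level, from the level below
        td = self.top_down(above)                       # inter-level, from the level above
        consensus, _ = self.attn(level, level, level)   # intra-level, across patches
        consensus = torch.sigmoid(self.gate(level)) * consensus  # gated consensus
        w = F.softmax(self.mix, dim=0)
        return w[0] * level + w[1] * bu + w[2] * td + w[3] * consensus
```

The abstract also states that final-step representations are trained with supervised contrastive learning, where same-class patches are pulled together and negatives prevent representation collapse. Below is a compact sketch of the standard SupCon objective [34] over L2-normalized representation vectors; the batch construction details are assumptions.

```python
# Standard supervised contrastive (SupCon) loss sketch; how representation
# vectors and labels are batched in HiLite is an assumption here.
import torch
import torch.nn.functional as F

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z: (N, dim) representation vectors; labels: (N,) class ids.
    Same-class pairs are positives; all other samples act as negatives."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                   # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))         # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_counts
    return loss[pos_mask.any(1)].mean()                     # anchors with >=1 positive
```

These sketches only mirror the mechanisms named in the abstract; the actual layer counts, gating, and weighting schemes should be taken from the full paper.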

References

[1]
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Learning to Compose Neural Networks for Question Answering. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Kevin Knight, Ani Nenkova, and Owen Rambow (Eds.). Association for Computational Linguistics, San Diego, California, 1545--1554. https://doi.org/10.18653/v1/N16-1181
[2]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[3]
Luca Bertinetto, Romain Mueller, Konstantinos Tertikas, Sina Samangooei, and Nicholas A Lord. 2020. Making better mistakes: Leveraging class hierarchies with deep networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 12506--12515.
[5]
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems, Vol. 33 (2020), 1877--1901.
[6]
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In European conference on computer vision. Springer, 213--229.
[7]
Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. 2021. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 12299--12310.
[8]
Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever. 2020. Generative pretraining from pixels. In International conference on machine learning. PMLR, 1691--1703.
[9]
Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, and Ping Luo. 2021. Cyclemlp: A mlp-like architecture for dense prediction. arXiv preprint arXiv:2107.10224 (2021).
[10]
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PMLR, 1597--1607.
[11]
Francois Chollet. 2017. Xception: Deep Learning With Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1251--1258.
[12]
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2022. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 (2022).
[13]
Ching-Yao Chuang, Joshua Robinson, Yen-Chen Lin, Antonio Torralba, and Stefanie Jegelka. 2020. Debiased contrastive learning. Advances in neural information processing systems, Vol. 33 (2020), 8765--8775.
[14]
Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020).
[15]
Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. 2020. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. 702--703.
[16]
Toon Van de Maele, Tim Verbelen, Ozan Catal, and Bart Dhoedt. 2021. Disentangling what and where for 3d object-centric representations through active inference. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 701--714.
[17]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[18]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
[19]
Nicola Garau, Niccoló Bisagno, Zeno Sambugaro, and Nicola Conci. 2022. Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13689--13698.
[20]
Bryce Goodman and Seth Flaxman. 2017. European Union regulations on algorithmic decision-making and a 'right to explanation'. AI Magazine, Vol. 38, 3 (2017), 50--57.
[21]
Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, and Matt Gardner. 2019. Neural module networks for reasoning over text. arXiv preprint arXiv:1912.04971 (2019).
[22]
Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, and Yunhe Wang. 2021. Transformer in transformer. Advances in Neural Information Processing Systems, Vol. 34 (2021), 15908--15919.
[23]
John A Hartigan and Manchek A Wong. 1979. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), Vol. 28, 1 (1979), 100--108.
[24]
Jeff Hawkins. 2021. A thousand brains: A new theory of intelligence. Basic Books.
[25]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9729--9738.
[26]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.
[27]
Geoffrey Hinton. 2022. How to represent part-whole hierarchies in a neural network. Neural Computation (2022), 1--40.
[28]
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
[29]
Kjell Jørgen Hole and Subutai Ahmad. 2021. A thousand brains: toward biologically constrained AI. SN Applied Sciences, Vol. 3, 8 (2021), 743.
[30]
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4700--4708.
[31]
Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016).
[32]
Eric Jang, Shixiang Gu, and Ben Poole. 2016. Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144 (2016).
[33]
Yifan Jiang, Shiyu Chang, and Zhangyang Wang. 2021. Transgan: Two pure transformers can make one strong gan, and that can scale up. Advances in Neural Information Processing Systems, Vol. 34 (2021), 14745--14758.
[34]
Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning. Advances in neural information processing systems, Vol. 33 (2020), 18661--18673.
[35]
Seonggyeom Kim and Dong-Kyu Chae. 2022. ExMeshCNN: An explainable convolutional neural network architecture for 3d shape analysis. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 795--803.
[36]
Seonggyeom Kim and Dong-Kyu Chae. 2024. What Does a Model Really Look at?: Extracting Model-Oriented Concepts for Explaining Deep Neural Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
[37]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[38]
Alex Krizhevsky, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images. (2009).
[39]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, Vol. 25 (2012).
[40]
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE, Vol. 86, 11 (1998), 2278--2324.
[41]
Kehan Li, Runyi Yu, Zhennan Wang, Li Yuan, Guoli Song, and Jie Chen. 2022. Locality guidance for improving vision transformers on tiny datasets. In Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXIV. Springer, 110--127.
[42]
Wenshuo Li, Hanting Chen, Jianyuan Guo, Ziyang Zhang, and Yunhe Wang. 2022. Brain-inspired multilayer perceptron with spiking neurons. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 783--793.
[43]
Dongze Lian, Zehao Yu, Xing Sun, and Shenghua Gao. 2021. As-mlp: An axial shifted mlp architecture for vision. arXiv preprint arXiv:2107.08391 (2021).
[44]
Zachary C Lipton. 2018. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, Vol. 16, 3 (2018), 31--57.
[45]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[46]
Yahui Liu et al. 2021. Efficient training of visual transformers with small datasets. Advances in Neural Information Processing Systems, Vol. 34 (2021).
[47]
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision. 10012--10022.
[48]
Zhiying Lu, Hongtao Xie, Chuanbin Liu, and Yongdong Zhang. 2022. Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets. arXiv preprint arXiv:2210.05958 (2022).
[49]
Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, and Sarath Chandar. 2024. Interpretability Needs a New Paradigm. arXiv preprint arXiv:2405.05386 (2024).
[50]
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM, Vol. 65, 1 (2021), 99--106.
[51]
Nithesh Naik, BM Hameed, Dasharathraj K Shetty, Dishant Swain, Milap Shah, Rahul Paul, Kaivalya Aggarwal, Sufyan Ibrahim, Vathsala Patil, Komal Smriti, et al. 2022. Legal and ethical consideration in artificial intelligence in healthcare: who takes responsibility? Frontiers in Surgery, Vol. 9 (2022), 266.
[52]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, Vol. 21, 1 (2020), 5485--5551.
[53]
Tilman Räuker, Anson Ho, Stephen Casper, and Dylan Hadfield-Menell. 2023. Toward transparent ai: A survey on interpreting the inner structures of deep neural networks. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 464--483.
[54]
David N Reshef, Yakir A Reshef, Hilary K Finucane, Sharon R Grossman, Gilean McVean, Peter J Turnbaugh, Eric S Lander, Michael Mitzenmacher, and Pardis C Sabeti. 2011. Detecting novel associations in large data sets. Science, Vol. 334, 6062 (2011), 1518--1524.
[56]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 1135--1144.
[57]
Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, Vol. 1, 5 (2019), 206--215.
[58]
Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. 2017. Dynamic routing between capsules. Advances in neural information processing systems, Vol. 30 (2017).
[59]
Johannes Schmidt-Hieber. 2021. The Kolmogorov--Arnold representation theorem revisited. Neural Networks, Vol. 137 (2021), 119--126.
[60]
Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision. 618--626.
[61]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[62]
Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. 2020. Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems, Vol. 33 (2020), 7462--7473.
[63]
Leslie N Smith and Nicholay Topin. 2019. Super-convergence: Very fast training of neural networks using large learning rates. In Artificial intelligence and machine learning for multi-domain operations applications, Vol. 11006. SPIE, 369--386.
[64]
Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, and Yunhe Wang. 2022. An image patch is a wave: Phase-aware vision mlp. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10935--10944.
[65]
Daniel W Tigard. 2021. There is no techno-responsibility gap. Philosophy & Technology, Vol. 34, 3 (2021), 589--607.
[66]
Ilya O Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, et al. 2021. Mlp-mixer: An all-mlp architecture for vision. Advances in neural information processing systems, Vol. 34 (2021), 24261--24272.
[67]
Daniel Vale, Ali El-Sharif, and Muhammed Ali. 2022. Explainable artificial intelligence (XAI) post-hoc explainability methods: Risks and limitations in non-discrimination law. AI and Ethics, Vol. 2, 4 (2022), 815--826.
[68]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems, Vol. 30 (2017).
[69]
Murray J White. 1969. Laterality differences in perception: a review. Psychological Bulletin, Vol. 72, 6 (1969), 387--405.
[70]
Murray J White. 1976. Order of processing in visual perception. Canadian Journal of Psychology/Revue canadienne de psychologie, Vol. 30, 3 (1976), 140--156.
[71]
Svante Wold, Kim Esbensen, and Paul Geladi. 1987. Principal component analysis. Chemometrics and intelligent laboratory systems, Vol. 2, 1--3 (1987), 37--52.
[72]
Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017).
[73]
Zhiyu Yao, Xinyang Chen, Sinan Wang, Qinyan Dai, Yumeng Li, Tanchao Zhu, and Mingsheng Long. 2022. Recommender Transformers with Behavior Pathways. arXiv preprint arXiv:2206.06804 (2022).
[74]
Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, and Ping Li. 2022. S2-mlp: Spatial-shift mlp architecture for vision. In Proceedings of the IEEE/CVF winter conference on applications of computer vision. 297--306.
[75]
Éloi Zablocki, Hédi Ben-Younes, Patrick Pérez, and Matthieu Cord. 2022. Explainability of deep vision-based autonomous driving systems: Review and challenges. International Journal of Computer Vision, Vol. 130, 10 (2022), 2425--2452.
[76]
Haoran Zhu, Boyuan Chen, and Carter Yang. 2023. Understanding Why ViT Trains Badly on Small Datasets: An Intuitive Perspective. arXiv preprint arXiv:2302.03751 (2023).
[77]
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. 2020. Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020).


    Published In

    CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, October 2024, 5705 pages. ISBN: 9798400704369. DOI: 10.1145/3627673.

    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. explainable AI
    2. hierarchical architecture
    3. neural networks with interpretability


    Acceptance Rates

    Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
