DOI: 10.1145/3581783.3613832
research-article
Open access

MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

Published: 27 October 2023

Abstract

Reconstructing visual stimuli from brain recordings is a meaningful and challenging task. In particular, precise and controllable image reconstruction is of great significance for advancing the development and use of brain-computer interfaces. Despite advances in complex image reconstruction techniques, it remains difficult to align both the semantics (concepts and objects) and the structure (position, orientation, and size) of a reconstruction with the image stimulus. To address this issue, we propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are fed into Stable Diffusion, which yields a preliminary image that contains semantic information. In Stage 2, we use the CLIP visual feature decoded from fMRI as supervisory information and iteratively adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information. Both qualitative and quantitative analyses demonstrate that our model surpasses the current state-of-the-art models on the Natural Scenes Dataset (NSD). Subsequent experimental findings corroborate the neurobiological plausibility of the model, as evidenced by the interpretability of the multimodal features employed, which align with the corresponding brain responses.
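The Stage 2 idea of adjusting two decoded vectors until the features of the generated image match a decoded target can be illustrated with a toy sketch. This is not the authors' implementation. The linear maps `A`, `B`, and `W` are hypothetical stand-ins for the generator (Stable Diffusion) and the feature extractor (CLIP), `z` and `c` stand in for the two Stage 1 vectors, and hand-derived gradient descent replaces backpropagation through the real networks.

```python
import numpy as np

def refine_latents(z, c, target, A, B, W, steps=200):
    """Jointly adjust z and c by gradient descent so that the features of the
    toy 'image' A @ z + B @ c move toward `target`.
    Returns the refined z, refined c, and the loss recorded at every step."""
    WA, WB = W @ A, W @ B  # feature extractor composed with the toy generator
    # The loss is quadratic, so its gradient has Lipschitz constant
    # 2 * sigma_max^2; a step of 1/L guarantees the loss never increases.
    sigma = np.linalg.norm(np.hstack([WA, WB]), 2)
    lr = 1.0 / (2.0 * sigma**2)
    z, c = z.copy(), c.copy()  # leave the caller's arrays untouched
    losses = []
    for _ in range(steps):
        residual = WA @ z + WB @ c - target
        losses.append(float(residual @ residual))
        # Gradients of ||residual||^2 with respect to z and c.
        z -= lr * 2.0 * (WA.T @ residual)
        c -= lr * 2.0 * (WB.T @ residual)
    residual = WA @ z + WB @ c - target
    losses.append(float(residual @ residual))
    return z, c, losses

# Hypothetical sizes: 16-dim "image", 8-dim "feature", 4- and 3-dim latents.
rng = np.random.default_rng(0)
A = rng.normal(size=(16, 4))
B = rng.normal(size=(16, 3))
W = rng.normal(size=(8, 16))
z0 = rng.normal(size=4)
c0 = rng.normal(size=3)
target = rng.normal(size=8)

z1, c1, losses = refine_latents(z0, c0, target, A, B, W)
print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

In the real setting the generator and feature extractor are deep nonlinear networks, so the gradients would come from automatic differentiation rather than the closed-form expressions used here.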



      Published In

      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
      ISBN: 9798400701085
      DOI: 10.1145/3581783
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. brain-computer interface (BCI)
      2. controlled image reconstruction
      3. fMRI
      4. probabilistic diffusion model

      Conference

      MM '23: The 31st ACM International Conference on Multimedia
      October 29 - November 3, 2023
      Ottawa ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

      Article Metrics

      • Downloads (Last 12 months): 931
      • Downloads (Last 6 weeks): 80
      Reflects downloads up to 14 Feb 2025

      Cited By

      • (2025) NeuralDiffuser: Neuroscience-Inspired Diffusion Guidance for fMRI Visual Reconstruction. IEEE Transactions on Image Processing 34 (552-565). DOI: 10.1109/TIP.2025.3526051. Online publication date: 2025
      • (2024) FedMinds: Privacy-Preserving Personalized Brain Visual Decoding. Proceedings of the 2024 4th International Joint Conference on Robotics and Artificial Intelligence (174-177). DOI: 10.1145/3696474.3698034. Online publication date: 13-Sep-2024
      • (2024) Query Augmentation with Brain Signals. Proceedings of the 32nd ACM International Conference on Multimedia (7561-7570). DOI: 10.1145/3664647.3681658. Online publication date: 28-Oct-2024
      • (2024) BrainRAM: Cross-Modality Retrieval-Augmented Image Reconstruction from Human Brain Activity. Proceedings of the 32nd ACM International Conference on Multimedia (3994-4003). DOI: 10.1145/3664647.3681296. Online publication date: 28-Oct-2024
      • (2024) DREAM: Visual Decoding from REversing HumAn Visual SysteM. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (8211-8220). DOI: 10.1109/WACV57701.2024.00804. Online publication date: 3-Jan-2024
      • (2024) A Weighted Co-Training Framework for Emotion Recognition Based on EEG Data Generation Using Frequency-Spatial Diffusion Transformer. IEEE Transactions on Affective Computing 15:4 (2055-2069). DOI: 10.1109/TAFFC.2024.3395359. Online publication date: Oct-2024
      • (2024) VTVBrain: A Two-stage Brain Encoding Model for Decoding Key Neural Responses in Multimodal Contexts. 2024 Photonics & Electromagnetics Research Symposium (PIERS) (1-9). DOI: 10.1109/PIERS62282.2024.10618584. Online publication date: 21-Apr-2024
      • (2024) Mind Artist: Creating Artistic Snapshots with Human Thought. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (27197-27207). DOI: 10.1109/CVPR52733.2024.02569. Online publication date: 16-Jun-2024
      • (2024) MindBridge: A Cross-Subject Brain Decoding Framework. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (11333-11342). DOI: 10.1109/CVPR52733.2024.01077. Online publication date: 16-Jun-2024
      • (2024) Mental image reconstruction from human brain activity. Neural Networks 170:C (349-363). DOI: 10.1016/j.neunet.2023.11.024. Online publication date: 12-Apr-2024
