Attention, please! A survey of neural attention models in deep learning

Published: 01 December 2022

Abstract

In humans, attention is a core property of all perceptual and cognitive operations. Given our limited capacity to process competing sources of information, attention mechanisms select, modulate, and focus on the information most relevant to behavior. For decades, the concepts and functions of attention have been studied in philosophy, psychology, neuroscience, and computing. Over the last six years, this property has been widely explored in deep neural networks, and neural attention models now define the state of the art in deep learning across several application domains. This survey provides a comprehensive overview and analysis of developments in neural attention models. We systematically reviewed hundreds of architectures in the area, identifying and discussing those in which attention has shown a significant impact. We also developed, and made publicly available, an automated methodology to facilitate future reviews in the area. By critically analyzing 650 works, we describe the primary uses of attention in convolutional networks, recurrent networks, and generative models, identifying common subgroups of uses and applications. Furthermore, we discuss how attention affects different application domains and the interpretability of neural networks. Finally, we list possible trends and opportunities for further research, hoping that this review will provide a succinct overview of the main attentional models in the area and will guide researchers in developing future approaches that will drive further improvements.
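The selection-and-modulation behavior described above has a canonical computational form in deep learning: each query produces a softmax distribution over the inputs (selection), and the output is the resulting weighted sum of values (modulation/focus). As a minimal sketch, assuming the scaled dot-product scoring function popularized by Transformer-style models (function and variable names here are illustrative, not from the survey):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends over all key/value pairs: scores measure
    query-key similarity, softmax turns them into a distribution,
    and the output is a convex combination of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_kv) similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the inputs
    return weights @ V, weights                      # weighted sum of values

# Toy example: 2 queries attending over 3 key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Because the weights form a probability distribution over the inputs, they are also the quantity typically inspected when attention is used to interpret a model's behavior, as discussed in the survey.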

References

[1]
Abdulnabi AH, Shuai B, Winkler S, Wang G (2017) Episodic camn: contextual attention-based memory networks with iterative feedback for scene labeling. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5561–5570
[2]
Abolghasemi P, Mazaheri A, Shah M, Boloni L (2019) Pay attention!-robustifying a deep visuomotor policy through task-focused visual attention. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4254–4262
[3]
Abu-El-Haija S, Perozzi B, Al-Rfou R, Alemi AA (2018) Watch your step: learning node embeddings via graph attention. In: Advances in neural information processing systems, pp 9180–9190
[4]
Agrawal A, Batra D, Parikh D, Kembhavi A (2018) Don’t just assume; look and answer: overcoming priors for visual question answering. In: Proceedings of the IEEE CVPR, pp 4971–4980
[5]
Ahmadi S (2017) Attention-based encoder-decoder networks for spelling and grammatical error correction. PhD thesis, Paris Descartes University
[6]
Ahmadi AHK, Hassani K, Moradi P, Lee L, Morris Q (2020) Memory-based graph networks. In: 8th International conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020, OpenReview.net. https://openreview.net/forum?id=r1laNeBYPB
[7]
Allamanis M, Peng H, Sutton C (2016) A convolutional attention network for extreme summarization of source code. In: International conference on machine learning, pp 2091–2100
[8]
Alt C, Hübner M, Hennig L (2018) Improving relation extraction by pre-trained language representations. In: Automated knowledge base construction (AKBC)
[9]
Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L (2018) Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6077–6086
[10]
Andreas J, Rohrbach M, Darrell T, Klein D (2015) Deep compositional question answering with neural module networks. arXiv:abs/1511.02799
[11]
Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, Parikh D (2015) Vqa: visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 2425–2433
[12]
Arjovsky M (2020) Out of distribution generalization in machine learning. PhD thesis, New York University
[13]
Ba J, Mnih V, Kavukcuoglu K (2015) Multiple object recognition with visual attention. In: ICLR (Poster). arXiv:1412.7755
[14]
Baevski A, Auli M (2019) Adaptive input representations for neural language modeling. In: International conference on learning representations. https://openreview.net/forum?id=ByxZX20qFQ
[15]
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd International conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7=-9, 2015, conference track proceedings
[16]
Bahdanau D, Chorowski J, Serdyuk D, Brakel P, Bengio Y (2016) End-to-end attention-based large vocabulary speech recognition. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4945–4949
[17]
Baker B, Kanitscheider I, Markov T, Wu Y, Powell G, McGrew B, Mordatch I (2020) Emergent tool use from multi-agent autocurricula. In: International conference on learning representations. https://openreview.net/forum?id=SkxpxJBKwS
[18]
Baluja S and Pomerleau DA Expectation-based selective attention for visual monitoring and control of a robot vehicle Robot Auton Syst 1997 22 3–4 329-344
[19]
Baraldi L, Grana C, Cucchiara R (2017) Hierarchical boundary-aware neural encoder for video captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1657–1666
[20]
Bartunov S, Vetrov DP (2017) Fast adaptation in generative models with generative matching networks. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, Workshop track proceedings, OpenReview.net. https://openreview.net/forum?id=r1IvyjVYl
[21]
Bastings J, Titov I, Aziz W, Marcheggiani D, Sima’an K (2017) Graph convolutional encoders for syntax-aware neural machine translation. In: Proceedings of the 2017 conference on empirical methods in natural language processing, association for computational linguistics, Copenhagen, Denmark, pp 1957–1967.
[22]
Bauer L, Wang Y, Bansal M (2018) Commonsense for generative multi-hop question answering tasks. In: Proceedings of the empirical methods in natural language processing
[23]
Baziotis C, Pelekis N, Doulkeridis C (2017) Datastories at semeval-2017 task 4: deep lstm with attention for message-level and topic-based sentiment analysis. In: Proceedings of the 11th international workshop on semantic evaluation (SemEval-2017), pp 747–754
[24]
Bazzani L, Larochelle H, Torresani L (2017) Recurrent mixture density network for spatiotemporal visual attention. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference track proceedings, OpenReview.net. https://openreview.net/forum?id=SJRpRfKxx
[25]
Bello I, Zoph B, Vaswani A, Shlens J, Le QV (2019) Attention augmented convolutional networks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3286–3295
[26]
Ben-Shimon D, Tsikinovsky A, Friedmann M, Shapira B, Rokach L, Hoerle J (2015) Recsys challenge 2015 and the yoochoose dataset. In: Proceedings of the 9th ACM conference on recommender systems, pp 357–358
[27]
Bhunia AK, Konwer A, Bhunia AK, Bhowmick A, Roy PP, and Pal U Script identification in natural scene image and video frames using an attention based convolutional-lstm network Pattern Recognit 2019 85 172-184
[28]
Bielski A, Trzcinski T (2018) Pay attention to virality: understanding popularity of social media videos with the attention mechanism. In: Proceedings of the IEEE CVPR workshops, pp 2335–2337
[29]
Bin Y, Yang Y, Shen F, Xie N, Shen HT, and Li X Describing video with attention-based bidirectional lstm IEEE Trans Cybern 2018 49 7 2631-2641
[30]
Bornschein J, Mnih A, Zoran D, Rezende DJ (2017) Variational memory addressing in generative models. In: Proceedings of the 31st international conference on neural information processing systems, pp 3923–3932
[31]
Breazeal C, Scassellati B (1999) A context-dependent attention system for a social robot. rn 255:3
[32]
Broadbent DE Perception and communication 2013 New York Elsevier
[33]
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. In: Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H (eds) Advances in neural information processing systems 33: annual conference on neural information processing systems 2020, NeurIPS 2020, December 6–12, 2020, virtual
[34]
Burgess CP, Matthey L, Watters N, Kabra R, Higgins I, Botvinick M, Lerchner A (2019) Monet: unsupervised scene decomposition and representation. arXiv:abs/1901.11390
[35]
Cao Q, Lin L, Shi Y, Liang X, Li G (2017) Attention-aware face hallucination via deep reinforcement learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 690–698
[36]
Chan W, Jaitly N, Le Q, Vinyals O (2016) Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. 2016 IEEE international conference on acoustics. speech and signal processing (ICASSP). IEEE, pp 4960–4964
[37]
Chaudhari S, Polatkan G, Ramanath R, Mithal V (2019) An attentive survey of attention models. arXiv:190402874
[38]
Chen J (2016) Multi-modal learning: study on a large-scale micro-video data collection. In: Proceedings of the 24th ACM international conference on Multimedia, pp 1454–1458
[39]
Chen H and Li Y Three-stream attention-aware network for rgb-d salient object detection IEEE Trans Image Process 2019 28 6 2825-2835
[40]
Chen H, Sun M, Tu C, Lin Y, Liu Z (2016a) Neural sentiment classification with user and product attention. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 1650–1659
[41]
Chen LC, Yang Y, Wang J, Xu W, Yuille AL (2016b) Attention to scale: scale-aware semantic image segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3640–3649
[42]
Chen J, Zhang H, He X, Nie L, Liu W, Chua TS (2017a) Attentive collaborative filtering: multimedia recommendation with item-and component-level attention. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, pp 335–344
[43]
Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chua TS (2017b) Sca-cnn: spatial and channel-wise attention in convolutional networks for image captioning. In: The IEEE conference on computer vision and pattern recognition (CVPR)
[44]
Chen P, Sun Z, Bing L, Yang W (2017c) Recurrent attention network on memory for aspect sentiment analysis. In: Proceedings of the 2017 conference on empirical methods in natural language processing, pp 452–461
[45]
Chen D, Li H, Xiao T, Yi S, Wang X (2018a) Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1169–1178
[46]
Chen S, Tan X, Wang B, Hu X (2018b) Reverse attention for salient object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 234–250
[47]
Chen X, Li LJ, Fei-Fei L, Gupta A (2018c) Iterative visual reasoning beyond convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7239–7248
[48]
Chen X, Xu C, Yang X, Tao D (2018d) Attention-gan for object transfiguration in wild images. In: Proceedings of the European conference on computer vision (ECCV), pp 164–180
[49]
Chen Y, Kalantidis Y, Li J, Yan S, Feng J (2018e) A2-nets: Double attention networks. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31. Curran Associates, Inc., pp 352–361. http://papers.nips.cc/paper/7318-a2-nets-double-attention-networks.pdf
[50]
Chen C, Liu Y, Kreiss S, Alahi A (2019a) Crowd-robot interaction: crowd-aware robot navigation with attention-based deep reinforcement learning. In: 2019 IEEE ICRA. IEEE, pp 6015–6022
[51]
Chen X, Zhang R, Yan P (2019b) Feature fusion encoder decoder network for automatic liver lesion segmentation. In: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019). IEEE, pp 430–433
[52]
Chen M, Radford A, Child R, Wu J, Jun H, Dhariwal P, Luan D, Sutskever I (2020) Generative pretraining from pixels. In: Proceedings of the 37th international conference on machine learning, vol 1
[53]
Cheng J, Dong L, Lapata M (2016) Long short-term memory-networks for machine reading. In: Su J, Carreras X, Duh K (eds) Proceedings of the 2016 conference on empirical methods in natural language processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, The Association for Computational Linguistics, pp 551–561.
[54]
Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S (2017a) Focusing attention: towards accurate text recognition in natural images. In: 2017 IEEE international conference on computer vision (ICCV), pp 5086–5094,
[55]
Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S (2017b) Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE ICCV, pp 5076–5084
[56]
Chiang T, Huang C, Su S, and Chen Y Learning multi-level information for dialogue response selection by highway recurrent transformer Comput Speech Lang 2020 63 101073
[57]
Cho K, van Merrienboer B, Bahdanau D, Bengio Y (2014a) On the properties of neural machine translation: encoder-decoder approaches. In: Wu D, Carpuat M, Carreras X, Vecchi EM (eds) Proceedings of SSST@EMNLP 2014, Eighth workshop on syntax, semantics and structure in statistical translation, Doha, Qatar, 25 October 2014, Association for Computational Linguistics, pp 103–111., https://www.aclweb.org/anthology/W14-4012/
[58]
Cho K, van Merrienboer B, Gülçehre Ç, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014b) Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Moschitti A, Pang B, Daelemans W (eds) Proceedings of the 2014 conference on empirical methods in natural language processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, ACL, pp 1724–1734.
[59]
Cho K, Courville A, and Bengio Y Describing multimedia content using attention-based encoder-decoder networks IEEE Trans Multimed 2015 17 11 1875-1886
[60]
Choi J, Chang HJ, Yun S, Fischer T, Demiris Y, Choi JY (2017) Attentional correlation filter network for adaptive visual tracking. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 4828–4837.
[61]
Choi J, Seo H, Im S, Kang M (2019a) Attention routing between capsules. In: Proceedings of the IEEE/CVF ICCV workshops, pp 0–0
[62]
Choi M, Park C, Yang S, Kim Y, Choo J, Hong SR (2019b) Aila: Attentive interactive labeling assistant for document classification through attention-based deep neural networks. In: Proceedings of the 2019 CHI conference on human factors in computing systems, pp 1–12
[63]
Chopra S, Auli M, Rush AM (2016) Abstractive sentence summarization with attentive recurrent neural networks. In: Proceedings of the 2016 conference of the north American chapter of the association for computational linguistics: human language technologies, association for computational linguistics, San Diego, California, pp 93–98., http://aclweb.org/anthology/N16-1012
[64]
Chorowski J, Bahdanau D, Cho K, Bengio Y (2014) End-to-end continuous speech recognition using attention-based recurrent NN: first results. arXiv:abs/1412.1602
[65]
Chorowski J, Bahdanau D, Serdyuk D, Cho K, Bengio Y (2015) Attention-based models for speech recognition. In: Proceedings of the 28th international conference on neural information processing systems, Vol 1, pp 577–585
[66]
Chowdhury FRR, Wang Q, Moreno IL, Wan L, (2018) Attention-based models for text-dependent speaker verification. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5359–5363
[67]
Chu Q, Ouyang W, Li H, Wang X, Liu B, Yu N (2017a) Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism. In: Proceedings of the IEEE international conference on computer vision, pp 4836–4845
[68]
Chu X, Yang W, Ouyang W, Ma C, Yuille AL, Wang X (2017b) Multi-context attention for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1831–1840
[69]
Chun MM, Golomb JD, and Turk-Browne NB A taxonomy of external and internal attention Ann Rev Psychol 2011 62 73-101
[70]
Clark JJ, Ferrier NJ (1988) Modal control of an attentive vision system. In: IEEE ICCV. IEEE, pp 514–523
[71]
Clark JJ, Ferrier NJ (1992) Attentive visual servoing. In: Active vision, Citeseer
[72]
Clark K, Khandelwal U, Levy O, Manning CD (2019) What does bert look at? An analysis of Bert’s attention. In: Proceedings of the 2019 ACL workshop BlackboxNLP: analyzing and interpreting neural networks for NLP, pp 276–286
[73]
Cohan A, Dernoncourt F, Kim DS, Bui T, Kim S, Chang W, Goharian N (2018) A discourse-aware attention model for abstractive summarization of long documents. In: Walker MA, Ji H, Stent A (eds) Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1–6, 2018, Volume 2 (Short Papers), Association for Computational Linguistics, pp 615–621.
[74]
Colombini EL, da Silva Simoes A, Ribeiro C (2014) An attentional model for intelligent robotics agents. PhD thesis, Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
[75]
Cornia M, Baraldi L, Serra G, and Cucchiara R Predicting human eye fixations via an lstm-based saliency attentive model IEEE Trans Image Process 2018 27 10 5142-5154
[76]
Cui Y, Liu T, Chen Z, Wang S, Hu G (2016) Consensus attention-based neural networks for Chinese reading comprehension. In: Calzolari N, Matsumoto Y, Prasad R (eds) COLING 2016, 26th international conference on computational linguistics, Proceedings of the conference: technical papers, December 11–16, 2016, Osaka, Japan, ACL, pp 1777–1786. https://www.aclweb.org/anthology/C16-1167/
[77]
Cui Y, Chen Z, Wei S, Wang S, Liu T, Hu G (2017) Attention-over-attention neural networks for reading comprehension. In: Proceedings of the 55th annual meeting of the association for computational linguistics (volume 1: long papers), pp 593–602
[78]
Dai N, Liang J, Qiu X, Huang X (2019a) Style transformer: unpaired text style transfer without disentangled latent representation. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, volume 1: long papers. Association for Computational Linguistics, pp 5997–6007.
[79]
Dai Z, Yang Z, Yang Y, Carbonell JG, Le QV, Salakhutdinov R (2019b) Transformer-xl: attentive language models beyond a fixed-length context. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, volume 1: long papers. Association for Computational Linguistics, pp 2978–2988.
[80]
Damirchi H, Khorrambakht R, Taghirad HD (2020) Exploring self-attention for visual odometry. arXiv:abs/2011.08634
[81]
Daniluk M, Rocktäschel T, Welbl J, Riedel S (2017) Frustratingly short attention spans in neural language modeling. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=ByIAPUcee
[82]
Das R, Neelakantan A, Belanger D, McCallum A (2017) Chains of reasoning over entities, relations, and text using recurrent neural networks. In: Lapata M, Blunsom P, Koller A (eds) Proceedings of the 15th conference of the european chapter of the association for computational linguistics, EACL 2017, Valencia, Spain, April 3–7, 2017, volume 1: long papers. Association for Computational Linguistics, pp 132–141.
[83]
Dehghani M, Gouws S, Vinyals O, Uszkoreit J, Kaiser L (2019) Universal transformers. In: 7th International conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019. OpenReview.net. https://openreview.net/forum?id=HyzdRiR9Y7
[84]
Deng Y, Kim Y, Chiu J, Guo D, Rush A (2018) Latent alignment and variational attention. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31. Curran Associates, Inc., pp 9712–9724. http://papers.nips.cc/paper/8179-latent-alignment-and-variational-attention.pdf
[85]
Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, volume 1 (long and short papers). Association for Computational Linguistics, pp 4171–4186.
[86]
Dhingra B, Liu H, Yang Z, Cohen WW, Salakhutdinov R (2017) Gated-attention readers for text comprehension. In: Barzilay R, Kan M (eds) Proceedings of the 55th annual meeting of the association for computational linguistics, ACL 2017, Vancouver, Canada, July 30–August 4, volume 1: long papers, association for computational linguistics. pp 1832–1846.
[87]
Dong L, Lapata M (2016) Language to logical form with neural attention. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), pp 33–43
[88]
Dong L, Wang F, Xu B (2019) Self-attention aligner: a latency-control end-to-end model for asr using self-attention network and chunk-hopping. ICASSP 2019–2019 IEEE international conference on acoustics. Speech and signal processing (ICASSP). IEEE, pp 5656–5660
[89]
Doughty H, Mayol-Cuevas W, Damen D (2019) The pros and cons: rank-aware temporal attention for skill determination in long videos. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7862–7871
[90]
Dozat T, Manning CD (2017) Deep biaffine attention for neural dependency parsing. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings. OpenReview.net. https://openreview.net/forum?id=Hk95PK9le
[91]
Du W, Wang Y, Qiao Y (2017) Rpan: an end-to-end recurrent pose-attention network for action recognition in videos. In: Proceedings of the IEEE international conference on computer vision, pp 3725–3734
[92]
Du J, Gui L, He Y, Xu R, and Wang X Convolution-based neural attention with applications to sentiment classification IEEE Access 2019 7 27983-27992
[93]
Duan Y, Andrychowicz M, Stadie B, Ho OJ, Schneider J, Sutskever I, Abbeel P, Zaremba W (2017) One-shot imitation learning. In: Advances in neural information processing systems, pp 1087–1098
[94]
Edel M, Lausch J (2016) Capacity visual attention networks. In: GCAI, pp 72–80
[95]
Eriguchi A, Hashimoto K, Tsuruoka Y (2016) Tree-to-sequence attentional neural machine translation. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), pp 823–833
[96]
Escolano C, Costa-jussà MR, Fonollosa JAR (2018) (self-attentive) autoencoder-based universal language representation for machine translation. arXiv:abs/1810.06351
[97]
Eyzaguirre C, Soto A (2020) Differentiable adaptive computation time for visual reasoning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12817–12825
[98]
Fakoor R, Mohamed A, Mitchell M, Kang SB, Kohli P (2016) Memory-augmented attention modelling for videos. arXiv:abs/1611.02261
[99]
Fan A, Lewis M, Dauphin YN (2018a) Hierarchical neural story generation. In: Gurevych I, Miyao Y (eds) Proceedings of the 56th annual meeting of the association for computational linguistics, ACL 2018, Melbourne, Australia, July 15–20, 2018, volume 1: long papers. Association for Computational Linguistics, pp 889–898., https://www.aclweb.org/anthology/P18-1082/
[100]
Fan Z, Zhao X, Lin T, and Su H Attention-based multiview re-observation fusion network for skeletal action recognition IEEE Trans Multimed 2018 21 2 363-374
[101]
Fang H, Gupta S, Iandola F, Srivastava RK, Deng L, Dollár P, Gao J, He X, Mitchell M, Platt JC, et al. (2015) From captions to visual concepts and back. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1473–1482
[102]
Fang B, Li Y, Zhang H, and Chan JCW Hyperspectral images classification based on dense convolutional networks with spectral-wise attention mechanism Remote Sens 2019 11 2 159
[103]
Fang K, Toshev A, Fei-Fei L, Savarese S (2019b) Scene memory transformer for embodied agents in long-horizon tasks. In: Proceedings of the IEEE CVPR, pp 538–547
[104]
Feng S, Wang Y, Liu L, Wang D, and Yu G Attention based hierarchical lstm network for context-aware microblog sentiment classification World Wide Web 2019 22 1 59-81
[105]
Ferret J, Marinier R, Geist M, Pietquin O (2020) Self-attentional credit assignment for transfer in reinforcement learning. In: Bessiere C (ed) Proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI 2020, ijcai.org, pp 2655–2661.
[106]
Figurnov M, Collins MD, Zhu Y, Zhang L, Huang J, Vetrov D, Salakhutdinov R (2017) Spatially adaptive computation time for residual networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1039–1048
[107]
Firat O, Cho K, Bengio Y (2016) Multi-way, multilingual neural machine translation with a shared attention mechanism. In: Knight K, Nenkova A, Rambow O (eds) NAACL HLT 2016, The 2016 conference of the north american chapter of the association for computational linguistics: human language technologies, San Diego California, USA, June 12–17, 2016. The Association for Computational Linguistics, pp 866–875.
[108]
Frintrop S VOCUS: a visual attention system for object detection and goal-directed search 2006 Berlin Springer
[109]
Frintrop S and Jensfelt P Attentional landmarks and active gaze control for visual slam IEEE Trans Robot 2008 24 5 1054-1065
[110]
Frintrop S, Rome E, and Christensen H Computational visual attention systems and their cognitive foundations: a survey ACM Trans Appl Percept 2010
[111]
Frintrop S, Rome E, and Christensen HI Computational visual attention systems and their cognitive foundations: a survey ACM Trans Appl Percept (TAP) 2010 7 1 1-39
[112]
Fu J, Zheng H, Mei T (2017) Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Honolulu, pp 4476–4484., http://ieeexplore.ieee.org/document/8099959/
[113]
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, June 16–20, 2019, Computer Vision Foundation/IEEE, pp 3146–3154.
[114]
Fukui H, Hirakawa T, Yamashita T, Fujiyoshi H (2019a) Attention branch network: learning of attention mechanism for visual explanation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10705–10714
[115]
Fukui H, Hirakawa T, Yamashita T, Fujiyoshi H (2019b) Attention branch network: Learning of attention mechanism for visual explanation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10705–10714
[116]
Galassi A, Lippi M, Torroni P (2020) Attention in natural language processing. In: IEEE transactions on neural networks and learning systems
[117]
Gammulle H, Denman S, Sridharan S, Fookes C (2017) Two stream lstm: A deep fusion framework for human action recognition. In: 2017 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 177–186
[118]
Gan Z, Cheng Y, Kholy AE, Li L, Liu J, Gao J (2019) Multi-step reasoning via recurrent dual attention for visual dialog. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, July 28– August 2, 2019, volume 1: long papers. Association for Computational Linguistics, pp 6463–6474.
[119]
Ganea O, Hofmann T (2017) Deep joint entity disambiguation with local neural attention. In: Palmer M, Hwa R, Riedel S (eds) Proceedings of the 2017 conference on empirical methods in natural language processing, EMNLP 2017, Copenhagen, September 9–11, 2017. Association for Computational Linguistics, pp 2619–2629.
[120]
Gao L, Guo Z, Zhang H, Xu X, and Shen HT Video captioning with attention-based lstm and semantic consistency IEEE Trans Multimed 2017 19 9 2045-2055
[121]
Gao F, Yu J, Shen H, Wang Y, Yang H (2020a) Attentional separation-and-aggregation network for self-supervised depth-pose learning in dynamic scenes. arXiv:abs/2011.09369
[122]
Gao J, Li P, Chen Z, and Zhang J A survey on deep learning for multimodal data fusion Neural Comput 2020 32 5 829-864
[123]
Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: The kitti dataset. Int J Robot Res (IJRR)
[124]
Geng X, Zhang H, Bian J, Chua TS (2015) Learning image and user features for recommendation in social networks. In: Proceedings of the IEEE international conference on computer vision, pp 4274–4282
[125]
Girdhar R, Ramanan D (2017) Attentional pooling for action recognition. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4–9, 2017, Long Beach, pp 34–45
[126]
Girdhar R, Carreira J, Doersch C, Zisserman A (2019) Video action transformer network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 244–253
[127]
Goyal A, Lamb A, Hoffmann J, Sodhani S, Levine S, Bengio Y, Schölkopf B (2019) Recurrent independent mechanisms. arXiv:190910893
[128]
Graves A (2016) Adaptive computation time for recurrent neural networks. arXiv:abs/1603.08983
[129]
Graves A, Wayne G, Danihelka I (2014) Neural turing machines. arXiv:abs/1410.5401
[130]
Grefenstette E, Hermann KM, Suleyman M, Blunsom P (2015) Learning to transduce with unbounded memory. In: Advances in neural information processing systems, pp 1828–1836
[131]
Gregor K, Danihelka I, Mnih A, Blundell C, Wierstra D (2014) Deep autoregressive networks. In: International conference on machine learning, PMLR, pp 1242–1250
[132]
Gregor K, Danihelka I, Graves A, Rezende DJ, Wierstra D (2015) DRAW: A recurrent neural network for image generation. In: Bach FR, Blei DM (eds) Proceedings of the 32nd international conference on machine learning, ICML 2015, Lille, France, 6–11 July 2015, JMLR.org, JMLR workshop and conference proceedings, vol 37, pp 1462–1471. http://proceedings.mlr.press/v37/gregor15.html
[133]
Guan Q, Huang Y, Zhong Z, Zheng Z, Zheng L, Yang Y (2018) Diagnose like a radiologist: attention guided convolutional neural network for thorax disease classification. arXiv:abs/1801.09927
[134]
Gülçehre Ç, Denil M, Malinowski M, Razavi A, Pascanu R, Hermann KM, Battaglia PW, Bapst V, Raposo D, Santoro A, de Freitas N (2019) Hyperbolic attention networks. In: 7th International conference on learning representations, ICLR 2019, New Orleans, May 6–9, 2019, OpenReview.net. https://openreview.net/forum?id=rJxHsjRqFQ
[135]
Guo Q, Qiu X, Liu P, Shao Y, Xue X, Zhang Z (2019a) Star-transformer. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American Chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, volume 1 (long and short papers). Association for Computational Linguistics, pp 1315–1325.
[136]
Guo X, Zhang H, Yang H, Xu L, Ye Z (2019b) A single attention-based combination of CNN and RNN for relation classification. IEEE Access 7:12467–12475
[137]
Gupta A, Johnson J, Fei-Fei L, Savarese S, Alahi A (2018) Social gan: Socially acceptable trajectories with generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2255–2264
[138]
Hackel T, Usvyatsov M, Galliani S, Wegner JD, Schindler K (2018) Inference, learning and attention mechanisms that exploit and preserve sparsity in convolutional networks. arXiv:1801.10585
[139]
Hamker FH (2005) The emergence of attention by population-based inference and its role in distributed processing and cognitive control of vision. Comput Vis Image Underst 100(1–2):64–106
[140]
Hamker FH (2006) Modeling feature-based attention as an active top-down inference process. BioSystems 86(1–3):91–99
[141]
Han K, Guo J, Zhang C, Zhu M (2018) Attribute-aware attention model for fine-grained representation learning. In: Boll S, Lee KM, Luo J, Zhu W, Byun H, Chen CW, Lienhart R, Mei T (eds) 2018 ACM Multimedia conference on multimedia conference, MM 2018, Seoul, Republic of Korea, October 22–26, 2018. ACM, pp 2040–2048.
[142]
Han Z, Wang X, Vong C, Liu Y, Zwicker M, Chen CLP (2019) 3dviewgraph: learning global features for 3d shapes from A graph of unordered views with attention. In: Kraus S (ed) Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI 2019, Macao, China, August 10–16, 2019, ijcai.org, pp 758–765.
[143]
Hao Y, Zhang Y, Liu K, He S, Liu Z, Wu H, Zhao J (2017) An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In: Proceedings of the 55th annual meeting of the association for computational linguistics (volume 1: long papers), pp 221–231
[144]
Hao J, Wang X, Yang B, Wang L, Zhang J, Tu Z (2019) Modeling recurrence for transformer. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, June 2–7, 2019, volume 1 (long and short papers). Association for Computational Linguistics, pp 1198–1207.
[145]
Harsha Vardhan LV, Jia G, Kok S (2020) Probabilistic logic graph attention networks for reasoning. In: Companion proceedings of the Web Conference 2020, pp 669–673
[146]
He P, Huang W, He T, Zhu Q, Qiao Y, Li X (2017) Single shot text detector with regional attention. In: Proceedings of the IEEE international conference on computer vision, pp 3047–3055
[147]
He A, Luo C, Tian X, Zeng W (2018a) A twofold siamese network for real-time object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4834–4843
[148]
He T, Tian Z, Huang W, Shen C, Qiao Y, Sun C (2018b) An end-to-end textspotter with explicit alignment and attention. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, June 18–22, 2018. IEEE Computer Society, pp 5020–5029. http://openaccess.thecvf.com/content_cvpr_2018/html/He_An_End-to-End_TextSpotter_CVPR_2018_paper.html
[149]
He X, Yang Y, Shi B, Bai X (2019a) VD-SAN: visual-densely semantic attention network for image caption generation. Neurocomputing 328:48–55
[150]
He Z, Zuo W, Kan M, Shan S, Chen X (2019b) AttGAN: facial attribute editing by only changing what you want. IEEE Trans Image Process 28(11):5464–5478
[151]
Hendrycks D, Liu X, Wallace E, Dziedzic A, Krishnan R, Song D (2020) Pretrained transformers improve out-of-distribution robustness. arXiv:2004.06100
[152]
Hermann KM, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. In: Advances in neural information processing systems, pp 1693–1701
[153]
Hidasi B, Karatzoglou A, Baltrunas L, Tikk D (2015) Session-based recommendations with recurrent neural networks. arXiv:1511.06939
[154]
Hieber F, Domhan T, Denkowski M, Vilar D (2020) Sockeye 2: A toolkit for neural machine translation. In: Forcada ML, Martins A, Moniz H, Turchi M, Bisazza A, Moorkens J, Arenas AG, Nurminen M, Marg L, Fumega S, Martins B, Batista F, Coheur L, Escartín CP, Trancoso I (eds) Proceedings of the 22nd annual conference of the european association for machine translation, EAMT 2020, Lisboa, Portugal, November 3–5, 2020. European Association for Machine Translation, pp 457–458. https://www.aclweb.org/anthology/2020.eamt-1.50/
[155]
Hole KJ, Ahmad S (2021) A thousand brains: toward biologically constrained AI. SN Appl Sci 3(8):1–14
[156]
Hoogi A, Wilcox B, Gupta Y, Rubin DL (2019) Self-attention capsule networks for image classification. arXiv:1904.12483
[157]
Hori C, Hori T, Lee T, Zhang Z, Harsham B, Hershey JR, Marks TK, Sumi K (2017a) Attention-based multimodal fusion for video description. In: IEEE international conference on computer vision, ICCV 2017, Venice, Italy, October 22–29, 2017. IEEE Computer Society, pp 4203–4212. http://doi.ieeecomputersociety.org/10.1109/ICCV.2017.450
[158]
Hori T, Watanabe S, Zhang Y, Chan W (2017b) Advances in joint ctc-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM. In: Lacerda F (ed) Interspeech 2017, 18th annual conference of the international speech communication association. Stockholm, Sweden, August 20–24, 2017, ISCA, pp 949–953. http://www.isca-speech.org/archive/Interspeech_2017/abstracts/1296.html
[159]
Hossain M, Hosseinzadeh M, Chanda O, Wang Y (2019) Crowd counting using scale-aware attention networks. In: 2019 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1280–1288
[160]
Hu D (2019) An introductory survey on attention mechanisms in nlp problems. In: Proceedings of SAI intelligent systems conference. Springer, pp 432–448
[161]
Hu X, Zhu L, Fu CW, Qin J, Heng PA (2018) Direction-aware spatial context features for shadow detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7454–7462
[162]
Hu D, Zhou S, Shen Q, Zheng S, Zhao Z, Fan Y (2019a) Digital image steganalysis based on visual attention and deep reinforcement learning. IEEE Access 7:25924–25935
[163]
Hu H, Xiao A, Zhang S, Li Y, Shi X, Jiang T, Zhang L, Zhang L, Zeng J (2019b) DeepHINT: understanding HIV-1 integration via deep learning with attention. Bioinformatics 35(10):1660–1667
[164]
Hu X, Yang K, Fei L, Wang K (2019c) Acnet: Attention based network to exploit complementary features for rgbd semantic segmentation. In: 2019 IEEE international conference on image processing (ICIP). IEEE, pp 1440–1444
[165]
Hu J, Shen L, Albanie S, Sun G, Wu E (2020a) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42(8):2011–2023
[166]
Hu X, Fu CW, Zhu L, Wang T, Heng PA (2020b) Sac-net: Spatial attenuation context for salient object detection. IEEE Trans Circuits Syst Video Technol
[167]
Huang W, Zhou F (2020) DA-CapsNet: dual attention mechanism capsule network. Sci Rep 10(1):1–13
[168]
Huang CZA, Vaswani A, Uszkoreit J, Simon I, Hawthorne C, Shazeer N, Dai AM, Hoffman MD, Dinculescu M, Eck D (2018a) Music transformer: generating music with long-term structure. In: International conference on learning representations
[169]
Huang F, Zhang X, Zhao Z, Li Z (2018b) Bi-directional spatial-semantic attention networks for image-text matching. IEEE Trans Image Process 28(4):2008–2020
[170]
Huang H, Zhu C, Shen Y, Chen W (2018c) Fusionnet: Fusing via fully-aware attention with application to machine comprehension. In: 6th international conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=BJIgi_eCZ
[171]
Huang J, Zhou W, Zhang Q, Li H, Li W (2018d) Video-based sign language recognition without temporal segmentation. In: Thirty-second AAAI conference on artificial intelligence
[172]
Huang KY, Wu CH, Su MH (2019) Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses. Pattern Recognit 88:668–678
[173]
Hudson DA, Manning CD (2018) Compositional attention networks for machine reasoning. In: 6th International conference on learning representations, ICLR 2018, Vancouver. April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=S1Euwz-Rb
[174]
Hudson DA, Manning CD (2019) Gqa: A new dataset for real-world visual reasoning and compositional question answering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6700–6709
[175]
Ilse M, Tomczak JM, Welling M (2018) Attention-based deep multiple instance learning. In: Dy JG, Krause A (eds) Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, PMLR, proceedings of machine learning research, vol 80, pp 2132–2141. http://proceedings.mlr.press/v80/ilse18a.html
[176]
Irie K, Zeyer A, Schlüter R, Ney H (2019) Language modeling with deep transformers. In: Kubin G, Kacic Z (eds) Interspeech 2019, 20th annual conference of the international speech communication association, Graz, Austria, 15–19 September 2019, ISCA, pp 3905–3909.
[177]
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259
[178]
Jaderberg M, Simonyan K, Zisserman A, kavukcuoglu k (2015) Spatial transformer networks. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in neural information processing systems 28. Curran Associates, Inc., pp 2017–2025. http://papers.nips.cc/paper/5854-spatial-transformer-networks.pdf
[179]
Jain S, Wallace BC (2019) Attention is not explanation. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, volume 1 (long and short papers). Association for Computational Linguistics, pp 3543–3556.
[180]
James W (1890) The principles of psychology. Dover Publications, New York
[181]
Jetley S, Lord NA, Lee N, Torr PHS (2018) Learn to pay attention. In: 6th International conference on learning representations, ICLR 2018, Vancouver, April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=HyzbhfWRW
[182]
Ji G, Liu K, He S, Zhao J (2017) Distant supervision for relation extraction with sentence-level attention and entity descriptions. In: Proceedings of the AAAI conference on artificial intelligence
[183]
Ji Z, Fu Y, Guo J, Pang Y, Zhang ZM, et al. (2018) Stacked semantics-guided attention model for fine-grained zero-shot learning. In: Advances in neural information processing systems, pp 5995–6004
[184]
Ji Z, Xiong K, Pang Y, Li X (2019) Video summarization with attention-based encoder-decoder networks. IEEE Trans Circuits Syst Video Technol 30(6):1709–1717
[185]
Jiang H, Shi T, Bai Z, Huang L (2019) AHCNet: an application of attention mechanism and hybrid connection for liver tumor segmentation in CT volumes. IEEE Access 7:24898–24909
[186]
Jiang M, Chen S, Yang J, Zhao Q (2020) Fantastic answers and where to find them: Immersive question-directed visual attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2980–2989
[187]
Jin S, Yao H, Sun X, Zhou S, Zhang L, Hua X (2020) Deep saliency hashing for fine-grained retrieval. IEEE Trans Image Process 29:5336–5351
[188]
Johansen-Berg H, Lloyd DM (2000) The physiology and psychology of selective attention to touch. Front Biosci 5:D894–D904
[189]
Johnston A, Carneiro G (2020) Self-supervised monocular trained depth estimation using self-attention and discrete disparity volume. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4756–4765
[190]
Joulin A, Mikolov T (2015) Inferring algorithmic patterns with stack-augmented recurrent nets. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in neural information processing systems 28: annual conference on neural information processing systems 2015, December 7–12, 2015, Montreal, pp 190–198
[191]
Kadlec R, Schmid M, Bajgar O, Kleindienst J (2016) Text understanding with the attention sum reader network. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), pp 908–918
[192]
Kahneman D (1973) Attention and effort. Prentice-Hall, Englewood Cliffs
[193]
Kaiser L, Gomez AN, Shazeer N, Vaswani A, Parmar N, Jones L, Uszkoreit J (2017) One model to learn them all. arXiv:1706.05137
[194]
Kang G, Zheng L, Yan Y, Yang Y (2018) Deep adversarial attention alignment for unsupervised domain adaptation: the benefit of target expectation maximization. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer vision-ECCV 2018—15th european conference, Munich, September 8–14, 2018, Proceedings, Part XI, Springer, Lecture Notes in Computer Science, vol 11215, pp 420–436.
[195]
Kastaniotis D, Ntinou I, Tsourounis D, Economou G, Fotopoulos S (2018) Attention-aware generative adversarial networks (ata-gans). In: 2018 IEEE 13th image, video, and multidimensional signal processing workshop (IVMSP). IEEE, pp 1–5
[196]
Ke NR, Goyal A, Bilaniuk O, Binas J, Mozer MC, Pal C, Bengio Y (2018) Sparse attentive backtracking: temporal credit assignment through reminding. In: Proceedings of the 32nd international conference on neural information processing systems, pp 7651–7662
[197]
Kim JH, Lee SW, Kwak D, Heo MO, Kim J, Ha JW, Zhang BT (2016) Multimodal residual learning for visual qa. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems 29. Curran Associates, Inc., pp 361–369. http://papers.nips.cc/paper/6446-multimodal-residual-learning-for-visual-qa.pdf
[198]
Kim S, Hori T, Watanabe S (2017a) Joint ctc-attention based end-to-end speech recognition using multi-task learning. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4835–4839
[199]
Kim Y, Denton C, Hoang L, Rush AM (2017b) Structured attention networks. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=HkE0Nvqlg
[200]
Kim JH, Jun J, Zhang BT (2018a) Bilinear attention networks. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31. Curran Associates, Inc., pp 1564–1574. http://papers.nips.cc/paper/7429-bilinear-attention-networks.pdf
[201]
Kim W, Goyal B, Chawla K, Lee J, Kwon K (2018b) Attention-based ensemble for deep metric learning. In: Proceedings of the European conference on computer vision (ECCV), pp 736–751
[202]
Kim Y, Kim D, Kumar A, Sarikaya R (2018c) Efficient large-scale neural domain classification with personalized attention. In: Gurevych I, Miyao Y (eds) Proceedings of the 56th annual meeting of the association for computational linguistics, ACL 2018, Melbourne, July 15–20, 2018, volume 1: long papers. Association for Computational Linguistics, pp 2214–2224. https://www.aclweb.org/anthology/P18-1206/
[203]
Kim S, Kang I, Kwak N (2019) Semantic sentence matching with densely-connected recurrent and co-attentive information. In: Proceedings of the AAAI conference on artificial intelligence, pp 6586–6593
[204]
Kim ES, Kang WY, On KW, Heo YJ, Zhang BT (2020a) Hypergraph attention networks for multimodal learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14581–14590
[205]
Kim J, Ma M, Pham T, Kim K, Yoo CD (2020b) Modality shifting attention network for multi-modal video question answering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10106–10115
[206]
Kiros R, Salakhutdinov R, Zemel R (2014) Multimodal neural language models. In: International conference on machine learning, PMLR, pp 595–603
[207]
Koch C, Ullman S (1987) Shifts in selective visual attention: towards the underlying neural circuitry. In: Matters of intelligence. Springer, pp 115–141
[208]
Kong S, Fowlkes C (2019) Pixel-wise attentional gating for scene parsing. In: 2019 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1024–1033
[209]
Kong Q, Xu Y, Wang W, Plumbley MD (2018a) Audio set classification with attention model: a probabilistic perspective. In: 2018 IEEE international conference on acoustics. Speech and signal processing (ICASSP). IEEE, pp 316–320
[210]
Kong T, Sun F, Tan C, Liu H, Huang W (2018b) Deep feature pyramid reconfiguration for object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 169–185
[211]
Koren Y (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 426–434
[212]
Krishna R, Hata K, Ren F, Fei-Fei L, Carlos Niebles J (2017) Dense-captioning events in videos. In: Proceedings of the IEEE international conference on computer vision, pp 706–715
[213]
Krizhevsky A, Hinton G, et al. (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
[214]
Kuen J, Wang Z, Wang G (2016) Recurrent attentional networks for saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3668–3677
[215]
Kumar A, Irsoy O, Ondruska P, Iyyer M, Bradbury J, Gulrajani I, Zhong V, Paulus R, Socher R (2016) Ask me anything: Dynamic memory networks for natural language processing. In: International conference on machine learning, PMLR, pp 1378–1387
[216]
Kumar A, Sangwan SR, Arora A, Nayyar A, Abdel-Basset M, et al. (2019) Sarcasm detection using soft attention-based bidirectional long short-term memory model with convolution network. IEEE Access 7:23319–23328
[217]
Kuncoro A, Ballesteros M, Kong L, Dyer C, Neubig G, Smith NA (2017) What do recurrent neural network grammars learn about syntax? In: Lapata M, Blunsom P, Koller A (eds) Proceedings of the 15th conference of the European chapter of the association for computational linguistics, EACL 2017, Valencia, April 3–7, 2017, volume 1: long papers. Association for Computational Linguistics, pp 1249–1258
[218]
Kuo XY, Liu C, Lin KC, Lee CY (2020) Dynamic attention-based visual odometry. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 36–37
[219]
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
[220]
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
[221]
Lee CY, Osindero S (2016) Recursive recurrent nets with attention modeling for ocr in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2231–2239
[222]
Lee JB, Rossi R, Kong X (2018a) Graph classification using structural attention. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1666–1674
[223]
Lee KH, Chen X, Hua G, Hu H, He X (2018b) Stacked cross attention for image-text matching. In: Proceedings of the European conference on computer vision (ECCV), pp 201–216
[224]
Lee J, Lee Y, Kim J, Kosiorek AR, Choi S, Teh YW (2019a) Set transformer: A framework for attention-based permutation-invariant neural networks. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA, PMLR, Proceedings of machine learning research, vol 97, pp 3744–3753. http://proceedings.mlr.press/v97/lee19d.html
[225]
Lee JB, Rossi RA, Kim S, Ahmed NK, Koh E (2019b) Attention models in graphs: a survey. ACM Trans Knowl Discov Data 13(6):62:1–62:25
[226]
Li X, Loy CC (2018) Video object segmentation with joint re-identification and attention-aware mask propagation. In: Proceedings of the European conference on computer vision (ECCV), pp 90–105
[227]
Li C, Zhu J, Zhang B (2016a) Learning to generate with memory. In: International conference on machine learning, pp 1177–1186
[228]
Li J, Monroe W, Jurafsky D (2016b) Understanding neural networks through representation erasure. arXiv:1612.08220
[229]
Li J, Ren P, Chen Z, Ren Z, Lian T, Ma J (2017) Neural attentive session-based recommendation. In: Proceedings of the 2017 ACM on conference on information and knowledge management, pp 1419–1428
[230]
Li D, Yao T, Duan LY, Mei T, Rui Y (2018a) Unified spatio-temporal attention networks for action recognition in videos. IEEE Trans Multimed 21(2):416–428
[231]
Li G, Gan Y, Wu H, Xiao N, Lin L (2018b) Cross-modal attentional context learning for RGB-D object detection. IEEE Trans Image Process 28(4):1591–1601
[232]
Li H, Xiong P, An J, Wang L (2018c) Pyramid attention network for semantic segmentation. In: British machine vision conference 2018, BMVC 2018, Newcastle, UK, September 3–6, 2018, BMVA Press, p 285. http://bmvc2018.org/contents/papers/1120.pdf
[233]
Li K, Wu Z, Peng K, Ernst J, Fu Y (2018d) Tell me where to look: Guided attention inference network. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, June 18–22, 2018. IEEE Computer Society, pp 9215–9223.
[234]
Li S, Bak S, Carr P, Wang X (2018e) Diversity regularized spatiotemporal attention for video-based person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 369–378
[235]
Li W, Zhu X, Gong S (2018f) Harmonious attention network for person re-identification. In: The IEEE conference on computer vision and pattern recognition (CVPR)
[236]
Li Y, Zeng J, Shan S, Chen X (2018g) Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans Image Process 28(5):2439–2450
[237]
Li Z, Gavrilyuk K, Gavves E, Jain M, Snoek CG (2018h) VideoLSTM convolves, attends and flows for action recognition. Comput Vis Image Underst 166:41–50
[238]
Li H, Chen J, Hu R, Yu M, Chen H, Xu Z (2019a) Action recognition using visual attention with reinforcement learning. In: Kompatsiaris I, Huet B, Mezaris V, Gurrin C, Cheng WH, Vrochidis S (eds) MultiMedia modeling. Lecture Notes in Computer Science. Springer, pp 365–376
[239]
Li H, Wang P, Shen C, Zhang G (2019b) Show, attend and read: A simple and strong baseline for irregular text recognition. In: Proceedings of the AAAI conference on artificial intelligence, pp 8610–8617
[240]
Li J, Yang B, Dou ZY, Wang X, Lyu MR, Tu Z (2019c) Information aggregation for multi-head attention with routing-by-agreement. In: NAACL-HLT (1), pp 3566–3575. https://aclweb.org/anthology/papers/N/N19/N19-1359/
[241]
Li J, Yang J, Hertzmann A, Zhang J, Xu T (2019d) Layoutgan: Generating graphic layouts with wireframe discriminators. In: 7th international conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019. OpenReview.net. https://openreview.net/forum?id=HJxB5sRcFQ
[242]
Li L, Gan Z, Cheng Y, Liu J (2019e) Relation-aware graph attention network for visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 10313–10322
[243]
Li N, Liu S, Liu Y, Zhao S, Liu M (2019f) Neural speech synthesis with transformer network. In: Proceedings of the AAAI conference on artificial intelligence, pp 6706–6713
[244]
Li R, Li M, Li J (2019g) Connection sensitive attention U-NET for accurate retinal vessel segmentation. arXiv:1903.05558
[245]
Li X, Chebiyyam V, Kirchhoff K (2019h) Multi-stream network with temporal attention for environmental sound classification. In: Kubin G, Kacic Z (eds) Interspeech 2019, 20th Annual conference of the international speech communication Association, Graz, Austria, 15–19 September 2019, ISCA, pp 3604–3608.
[246]
Li X, Xiong H, Wang H, Rao Y, Liu L, Huan J (2019i) Delta: deep learning transfer using feature map with attention for convolutional networks. In: 7th International conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019, OpenReview.net. https://openreview.net/forum?id=rkgbwsAcYm
[247]
Li X, Zhou Z, Chen L, Gao L (2019j) Residual attention-based LSTM for video captioning. World Wide Web 22(2):621–636
[248]
Li Y, Yao T, Pan Y, Chao H, Mei T (2019k) Pointing novel objects in image captioning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12497–12506
[249]
Li X, Hou Y, Wang P, Gao Z, Xu M, Li W (2021) Transformer guided geometry model for flow-based unsupervised visual odometry. Neural Comput Appl 1–12
[250]
Liang X, Hu Z, Zhang H, Gan C, Xing EP (2017) Recurrent topic-transition gan for visual paragraph generation. In: Proceedings of the IEEE ICCV, pp 3362–3371
[251]
Liang J, Jiang L, Cao L, Li J, Haupmann A (2018a) Focal visual-text attention for visual question answering. In: CVPR
[252]
Liang Y, Ke S, Zhang J, Yi X, Zheng Y (2018b) Geoman: multi-level attention networks for geo-sensory time series prediction. In: IJCAI, pp 3428–3434
[253]
Liang J, Jiang L, Cao L, Kalantidis Y, Li LJ, Hauptmann AG (2019) Focal visual-text attention for memex question answering. IEEE Trans Pattern Anal Mach Intell 41(8):1893–1908
[254]
Liao X, He L, Yang Z, Zhang C (2018) Video-based person re-identification via 3d convolutional networks and non-local attention. In: Asian conference on computer vision. Springer, pp 620–634
[255]
Libovický J, Helcl J, Marecek D (2018) Input combination strategies for multi-source transformer decoder. In: Bojar O, Chatterjee R, Federmann C, Fishel M, Graham Y, Haddow B, Huck M, Jimeno-Yepes A, Koehn P, Monz C, Negri M, Névéol A, Neves ML, Post M, Specia L, Turchi M, Verspoor K (eds) Proceedings of the third conference on machine translation: research papers. WMT 2018, Belgium, Brussels, October 31–November 1, 2018. Association for Computational Linguistics, pp 253–260
[256]
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision. Springer, pp 740–755
[257]
Lin Y, Shen S, Liu Z, Luan H, Sun M (2016) Neural relation extraction with selective attention over instances. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), pp 2124–2133
[258]
Lin Z, Feng M, dos Santos CN, Yu M, Xiang B, Zhou B, Bengio Y (2017) A structured self-attentive sentence embedding. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference track proceedings. OpenReview.net, https://openreview.net/forum?id=BJC_jUqxe
[259]
Liu B, Lane I (2016) Attention-based recurrent neural network models for joint intent detection and slot filling. In: Morgan N (ed) Interspeech 2016, 17th annual conference of the international speech communication association, San Francisco, September 8–12, 2016, ISCA, pp 685–689
[260]
Liu J, Zhang Y (2017) Attention modeling for targeted sentiment. In: Proceedings of the 15th conference of the European chapter of the association for computational linguistics: volume 2, short papers, pp 572–577
[261]
Liu G, Guo J (2019) Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 337:325–338
[262]
Liu Y, Lapata M (2019) Hierarchical transformers for multi-document summarization. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, volume 1: long papers. Association for Computational Linguistics, pp 5070–5081.
[263]
Liu Y, Sun C, Lin L, Wang X (2016) Learning natural language inference using bidirectional LSTM model and inner-attention. arXiv:1605.09090
[264]
Liu H, Feng J, Qi M, Jiang J, Yan S (2017a) End-to-end comparative attention networks for person re-identification. IEEE Trans Image Process 26(7):3492–3506
[265]
Liu J, Wang G, Duan LY, Abdiyeva K, Kot AC (2017b) Skeleton-based human action recognition with global context-aware attention LSTM networks. IEEE Trans Image Process 27(4):1586–1599
[266]
Liu J, Wang G, Hu P, Duan LY, Kot AC (2017c) Global context-aware attention lstm networks for 3d action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1647–1656
[267]
Liu J, Wang G, Hu P, Duan LY, Kot AC (2017d) Global context-aware attention lstm networks for 3d action recognition. In: The IEEE conference on computer vision and pattern recognition (CVPR)
[268]
Liu X, Zhao H, Tian M, Sheng L, Shao J, Yi S, Yan J, Wang X (2017e) Hydraplus-net: attentive deep features for pedestrian analysis. In: Proceedings of the IEEE international conference on computer vision, pp 350–359
[269]
Liu J, Gao C, Meng D, Hauptmann AG (2018a) Decidenet: Counting varying density crowds through attention guided detection and density estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5197–5206
[270]
Liu L, Wang H, Li G, Ouyang W, Lin L (2018b) Crowd counting using deep recurrent spatial-aware network. In: Lang J (ed) Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, July 13–19, 2018, Stockholm, Sweden, ijcai.org, pp 849–855.
[271]
Liu N, Han J, Yang MH (2018c) Picanet: Learning pixel-wise contextual attention for saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3089–3098
[272]
Liu Q, Zeng Y, Mokhosi R, Zhang H (2018d) Stamp: short-term attention/memory priority model for session-based recommendation. In: Proceedings of the 24th ACM SIGKDD international conference on on knowledge discovery and data mining, pp 1831–1839
[273]
Liu N, Long Y, Zou C, Niu Q, Pan L, Wu H (2019a) Adcrowdnet: an attention-injective deformable convolutional network for crowd understanding. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3225–3234
[274]
Liu S, Johns E, Davison AJ (2019b) End-to-end multi-task learning with attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1871–1880
[275]
Liu S, Johns E, Davison AJ (2019c) End-to-end multi-task learning with attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1871–1880
[276]
Liu S, Zhang S, Zhang X, Wang H (2019d) R-Trans: RNN transformer network for Chinese machine reading comprehension. IEEE Access 7:27736–27745
[277]
Liu X, Wang Z, Shao J, Wang X, Li H (2019e) Improving referring expression grounding with cross-modal attention-guided erasing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1950–1959
[278]
Long X, Gan C, De Melo G, Wu J, Liu X, Wen S (2018) Attention clusters: Purely attention based local feature integration for video classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7834–7843
[279]
Lu J, Yang J, Batra D, Parikh D (2016) Hierarchical question-image co-attention for visual question answering. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems 29. Curran Associates, Inc., pp 289–297
[280]
Lu J, Kannan A, Yang J, Parikh D, Batra D (2017a) Best of both worlds: transferring knowledge from discriminative learning to a generative visual dialog model. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in Neural Information Processing Systems 30: Annual conference on neural information processing systems 2017, December 4–9, 2017, Long Beach, CA, USA, pp 314–324
[281]
Lu J, Xiong C, Parikh D, Socher R (2017b) Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 375–383
[282]
Lu N, Wu Y, Feng L, Song J (2018) Deep learning for fall detection: three-dimensional cnn combined with lstm on video kinematic data. IEEE J Biomed Health Inf 23(1):314–323
[283]
Lu X, Wang W, Danelljan M, Zhou T, Shen J, Gool LV (2020) Video object segmentation with episodic graph memory networks. In: Vedaldi A, Bischof H, Brox T, Frahm J (eds) Computer vision-ECCV 2020—16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III, Springer, Lecture Notes in Computer Science, vol 12348, pp 661–679.
[284]
Luo C, Jin L, Sun Z (2019) Moran: a multi-object rectified attention network for scene text recognition. Pattern Recogn 90:109–118
[285]
Luong T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 conference on empirical methods in natural language processing. Association for Computational Linguistics, Lisbon, Portugal, pp 1412–1421., http://aclweb.org/anthology/D15-1166
[286]
Lüscher C, Beck E, Irie K, Kitza M, Michel W, Zeyer A, Schlüter R, Ney H (2019) Rwth asr systems for librispeech: hybrid vs attention. Proc Interspeech 2019, pp 231–235
[287]
Ma D, Li S, Zhang X, Wang H (2017) Interactive attention networks for aspect-level sentiment classification. In: Sierra C (ed) Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI 2017, Melbourne, August 19–25, 2017. ijcai.org, pp 4068–4074.
[288]
Ma C, Kadav A, Melvin I, Kira Z, AlRegib G, Graf HP (2018a) Attend and interact: higher-order object interactions for video understanding. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, June 18–22, 2018. IEEE Computer Society, pp 6790–6800.
[289]
Ma S, Fu J, Chen CW, Mei T (2018b) Da-gan: Instance-level image translation by deep attention generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5657–5666
[290]
Ma Y, Peng H, Cambria E (2018c) Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive lstm. In: Proceedings of the AAAI conference on artificial intelligence
[291]
Majumder N, Poria S, Hazarika D, Mihalcea R, Gelbukh A, Cambria E (2019) Dialoguernn: an attentive rnn for emotion detection in conversations. In: Proceedings of the AAAI conference on artificial intelligence, pp 6818–6825
[292]
Matthews D, Behne T, Lieven E, Tomasello M (2012) Origins of the human pointing gesture: a training study. Dev Sci 15(6):817–829
[293]
Mei X, Pan E, Ma Y, Dai X, Huang J, Fan F, Du Q, Zheng H, Ma J (2019) Spectral-spatial attention networks for hyperspectral image classification. Remote Sens 11(8):963
[294]
Mejjati YA, Richardt C, Tompkin J, Cosker D, Kim KI (2018) Unsupervised attention-guided image-to-image translation. In: Advances in neural information processing systems, pp 3693–3703
[295]
Meng X, Deng X, Zhu S, Liu S, Wang C, Chen C, Zeng B (2018) Mganet: a robust model for quality enhancement of compressed video. arXiv:181109150
[296]
Mensch A, Blondel M (2018) Differentiable dynamic programming for structured prediction and attention. In: Dy JG, Krause A (eds) Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, PMLR, proceedings of machine learning research, vol 80, pp 3459–3468. http://proceedings.mlr.press/v80/mensch18a.html
[297]
Miller AH, Fisch A, Dodge J, Karimi A, Bordes A, Weston J (2016) Key-value memory networks for directly reading documents. In: Su J, Carreras X, Duh K (eds) Proceedings of the 2016 conference on empirical methods in natural language processing, EMNLP 2016, Austin, Texas, USA, November 1–4, 2016. The Association for Computational Linguistics, pp 1400–1409.
[298]
Minaee S, Abdolrashidi A (2019) Deep-emotion: facial expression recognition using attentional convolutional network. arXiv:190201019
[299]
Mirsamadi S, Barsoum E, Zhang C (2017) Automatic speech emotion recognition using recurrent neural networks with local attention. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2227–2231
[300]
Mishra N, Rohaninejad M, Chen X, Abbeel P (2018) A simple neural attentive meta-learner. In: 6th International conference on learning representations, ICLR 2018, Vancouver, April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=B1DmUzWAW
[301]
Mittal S, Lamb A, Goyal A, Voleti V, Shanahan M, Lajoie G, Mozer M, Bengio Y (2020) Learning to combine top-down and bottom-up signals in recurrent neural networks with attention over modules. In: International conference on machine learning, PMLR, pp 6972–6986
[302]
Mnih V, Heess N, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27: annual conference on neural information processing systems 2014, December 8–13 2014, Montreal, pp 2204–2212
[303]
Moon S, Shah P, Kumar A, Subba R (2019) Memory graph networks for explainable memory-grounded question answering. In: Proceedings of the 23rd conference on computational natural language learning (CoNLL), pp 728–736
[304]
Munkhdalai T, Yu H (2017) Neural tree indexers for text understanding. In: Proceedings of the conference. Association for Computational Linguistics. Meeting, NIH Public Access, vol 1, p 11
[305]
Nallapati R, Zhou B, dos Santos C, Gulçehre Ç, Xiang B (2016) Abstractive text summarization using sequence-to-sequence rnns and beyond. In: Proceedings of The 20th SIGNLL conference on computational natural language learning, pp 280–290
[306]
Nam H, Ha JW, Kim J (2017) Dual attention networks for multimodal reasoning and matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 299–307
[307]
Navalpakkam V, Itti L (2006) An integrated model of top-down and bottom-up attention for optimizing detection speed. In: 2006 IEEE CVPR), vol 2. IEEE, pp 2049–2056
[308]
Neelakantan A, Le QV, Sutskever I (2016) Neural programmer: inducing latent programs with gradient descent. In: Bengio Y, LeCun Y (eds) 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, conference track proceedings. arXiv:1511.04834
[309]
Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng AY (2011) Reading digits in natural images with unsupervised feature learning
[310]
Neumann M, Vu NT (2017) Attentive convolutional neural network based speech emotion recognition: A study on the impact of input features, signal length, and acted speech. In: Lacerda F (ed) Interspeech 2017, 18th annual conference of the international speech communication association, Stockholm, August 20–24, 2017, ISCA, pp 1263–1267. http://www.isca-speech.org/archive/Interspeech_2017/abstracts/0917.html
[311]
Nguyen DK, Okatani T (2018) Improved fusion of visual and language representations by dense symmetric co-attention for visual question answering. In: The IEEE conference on computer vision and pattern recognition (CVPR)
[312]
Nguyen PX, Joty S (2018) Phrase-based attentions. arXiv:181003444
[313]
Norman DA (1968) Toward a theory of memory and attention. Psychol Rev 75(6):522
[314]
Norouzian A, Mazoure B, Connolly D, Willett D (2019) Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 7310–7314
[315]
Oh J, Chockalingam V, Lee H, et al. (2016) Control of memory, active perception, and action in minecraft. In: International conference on machine learning, PMLR, pp 2790–2799
[316]
Oh SW, Lee JY, Xu N, Kim SJ (2019) Video object segmentation using space-time memory networks. In: Proceedings of the IEEE international conference on computer vision, pp 9226–9235
[317]
Okabe K, Koshinaka T, Shinoda K (2018) Attentive statistics pooling for deep speaker embedding. In: Yegnanarayana B (ed) Interspeech 2018, 19th Annual conference of the international speech communication association, Hyderabad, 2–6 September 2018, ISCA, pp 2252–2256.
[318]
Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B, et al. (2018) Attention u-net: learning where to look for the pancreas. arXiv:180403999
[319]
Olivastri S, Singh G, Cuzzolin F (2019) An end-to-end baseline for video captioning. arXiv:190402628
[320]
Osman A, Samek W (2019) DRAU: dual recurrent attention units for visual question answering. Comput Vis Image Underst 185:24–30
[321]
Ouerhani N (2003) Visual attention: from bio-inspired modeling to real-time implementation. PhD thesis, Université de Neuchâtel
[322]
Ouyang D, Zhang Y, Shao J (2019) Video-based person re-identification via spatio-temporal attentional and two-stream fusion convolutional networks. Pattern Recognit Lett 117:153–160
[323]
Pan Y, Yao T, Li Y, Mei T (2020) X-linear attention networks for image captioning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10971–10980
[324]
Parikh AP, Täckström O, Das D, Uszkoreit J (2016) A decomposable attention model for natural language inference. In: Su J, Carreras X, Duh K (eds) Proceedings of the 2016 conference on empirical methods in natural language processing, EMNLP 2016, Austin, Texas, USA, November 1–4, 2016. The Association for Computational Linguistics, pp 2249–2255,
[325]
Parisotto E, Salakhutdinov R (2018) Neural map: structured memory for deep reinforcement learning. In: 6th International conference on learning representations, ICLR 2018, Vancouver, April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=Bk9zbyZCZ
[326]
Park D, Kim J, Chun SY (2019) Down-scaling with learned kernels in multi-scale deep neural networks for non-uniform single image deblurring. arXiv:190310157
[327]
Parmar N, Vaswani A, Uszkoreit J, Kaiser L, Shazeer N, Ku A, Tran D (2018) Image transformer. In: International conference on machine learning. PMLR, pp 4055–4064
[328]
Paulus R, Xiong C, Socher R (2018) A deep reinforced model for abstractive summarization. In: 6th International conference on learning representations, ICLR 2018, Vancouver, April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=HkAClQgA-
[329]
Pedersoli M, Lucas T, Schmid C, Verbeek J (2017) Areas of attention for image captioning. In: IEEE International conference on computer vision, ICCV 2017, Venice, Italy, October 22–29, 2017. IEEE Computer Society, pp 1251–1259.
[330]
Pei W, Baltrusaitis T, Tax DM, Morency LP (2017) Temporal attention-gated model for robust sequence classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6730–6739
[331]
Peng Y, He X, Zhao J (2017) Object-part attention model for fine-grained image classification. IEEE Trans Image Process 27(3):1487–1500
[332]
Perera D, Zimmermann R (2018) LSTM networks for online cross-network recommendations. In: Lang J (ed) Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, July 13–19, 2018, Stockholm. ijcai.org, pp 3825–3833.
[333]
Pesce E, Ypsilantis P, Withey S, Bakewell R, Goh V, Montana G (2017) Learning to detect chest radiographs containing lung nodules using visual attention networks. arXiv:171200996
[334]
Phaf RH, Van der Heijden A, Hudson PT (1990) Slam: a connectionist model for attention in visual selection tasks. Cognit Psychol 22(3):273–341
[335]
Poulos J, Valle R (2021) Character-based handwritten text transcription with attention networks. Neural Comput Appl 1–11
[336]
Prabhavalkar R, Sainath T, Wu Y, Nguyen P, Chen Z, Chiu CC, Kannan A (2018) Minimum word error rate training for attention-based sequence-to-sequence models. https://ai.google/research/pubs/pub46670
[337]
Pu Y, Min MR, Gan Z, Carin L (2018) Adaptive feature abstraction for translating video to text. In: Thirty-second AAAI Conference on artificial intelligence
[338]
Qian R, Tan RT, Yang W, Su J, Liu J (2018) Attentive generative adversarial network for raindrop removal from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2482–2491
[339]
Qin Y, Song D, Chen H, Cheng W, Jiang G, Cottrell GW (2017) A dual-stage attention-based recurrent neural network for time series prediction. In: Sierra C (ed) Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI 2017, Melbourne, August 19–25, 2017, ijcai.org, pp 2627–2633.
[340]
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1(8):9
[341]
Raffel C, Eck D, Liu P, Weiss RJ, Luong T (2017) Online and linear-time attention by enforcing monotonic alignments. https://ai.google/research/pubs/pub46110
[342]
Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) Squad: 100,000+ questions for machine comprehension of text. arXiv:160605250
[343]
Ramachandram D, Taylor GW (2017) Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process Mag 34(6):96–108
[344]
Ramesh A, Pavlov M, Goh G, Gray S (2021) Dall·e: creating images from text
[345]
Ran Q, Li P, Hu W, Zhou J (2019) Option comparison network for multiple-choice reading comprehension. arXiv:190303033
[346]
Rao Y, Lu J, Zhou J (2017) Attention-aware deep reinforcement learning for video face recognition. In: Proceedings of the IEEE international conference on computer vision, pp 3931–3940
[347]
Reed SE, Chen Y, Paine T, van den Oord A, Eslami SMA, Rezende DJ, Vinyals O, de Freitas N (2018) Few-shot autoregressive density estimation: towards learning to learn distributions. In: 6th International conference on learning representations, ICLR 2018, Vancouver, April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=r1wEFyWCW
[348]
Rekabdar B, Mousas C, Gupta B (2019) Generative adversarial network with policy gradient for text summarization. In: 2019 IEEE 13th international conference on semantic computing (ICSC). IEEE, pp 204–207
[349]
Ren M, Zemel RS (2017) End-to-end instance segmentation with recurrent attention. In: The IEEE conference on computer vision and pattern recognition (CVPR)
[350]
Ren M, Liao R, Fetaya E, Zemel R (2019) Incremental few-shot learning with attention attractor networks. In: Advances in neural information processing systems, pp 5275–5285
[351]
Rensink RA (2000) The dynamic representation of scenes. Visual Cognit 7(1–3):17–42
[352]
Rezende DJ, Mohamed S, Danihelka I, Gregor K, Wierstra D (2016) One-shot generalization in deep generative models. arXiv:160305106
[353]
Riedl MO (2019) Human-centered artificial intelligence and machine learning. Human Behav Emerg Technol 1(1):33–36
[354]
Robicquet A, Sadeghian A, Alahi A, Savarese S (2016) Learning social etiquette: Human trajectory understanding in crowded scenes. In: European conference on computer vision. Springer, pp 549–565
[355]
Rocktäschel T, Grefenstette E, Hermann KM, Kociský T, Blunsom P (2016) Reasoning about entailment with neural attention. In: Bengio Y, LeCun Y (eds) 4th International conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, conference track proceedings. http://arxiv.org/abs/1509.06664
[356]
Rodríguez P, Cucurull G, Gonzàlez J, Gonfaus JM, Roca X (2018) A painless attention mechanism for convolutional neural networks. ICLR 2018
[357]
Rohrbach A, Rohrbach M, Hu R, Darrell T, Schiele B (2016) Grounding of textual phrases in images by reconstruction. In: European conference on computer vision. Springer, pp 817–834
[358]
Rossi E, Chamberlain B, Frasca F, Eynard D, Monti F, Bronstein M (2020) Temporal graph networks for deep learning on dynamic graphs. arXiv:200610637
[359]
Rotenstein A, Andreopoulos A, Fazl E, Jacob D, Robinson M, Shubina K, Zhu Y, Tsotsos J (2007) Towards the dream of intelligent, visually-guided wheelchairs. In: Proceedings of the 2nd international conference on technology and aging
[360]
Rudin C (2018) Please stop explaining black box models for high stakes decisions. arXiv:181110154
[361]
Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Màrquez L, Callison-Burch C, Su J, Pighin D, Marton Y (eds) Proceedings of the 2015 conference on empirical methods in natural language processing, EMNLP 2015, Lisbon, Portugal, September 17–21, 2015. The Association for Computational Linguistics, pp 379–389.
[362]
Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4–9, 2017, Long Beach, pp 3856–3866
[363]
Sadeghian A, Legros F, Voisin M, Vesel R, Alahi A, Savarese S (2018) Car-net: clairvoyant attentive recurrent network. In: Proceedings of the European conference on computer vision (ECCV), pp 151–167
[364]
Sadeghian A, Kosaraju V, Sadeghian A, Hirose N, Rezatofighi H, Savarese S (2019) Sophie: An attentive gan for predicting paths compliant to social and physical constraints. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1349–1358
[365]
Salah AA, Alpaydin E, Akarun L (2002) A selective attention-based method for visual pattern recognition with application to handwritten digit recognition and face recognition. IEEE PAMI 24(3):420–425
[366]
Salakhutdinov R, Hinton G (2009) Deep Boltzmann machines. In: Artificial intelligence and statistics. PMLR, pp 448–455
[367]
Salazar J, Kirchhoff K, Huang Z (2019) Self-attention networks for connectionist temporal classification in speech recognition. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 7115–7119
[368]
Santoro A, Faulkner R, Raposo D, Rae J, Chrzanowski M, Weber T, Wierstra D, Vinyals O, Pascanu R, Lillicrap T (2018) Relational recurrent neural networks. In: Proceedings of the 32nd international conference on neural information processing systems, pp 7310–7321
[369]
Santos Cd, Tan M, Xiang B, Zhou B (2016) Attentive pooling networks. arXiv:160203609
[370]
Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th international conference on World Wide Web, pp 285–295
[371]
Savchuk V, Shults V, Galeev F (2018) Question answering with squad v2.0
[372]
Schaul T, Glasmachers T, Schmidhuber J (2011) High dimensions and heavy tails for natural evolution strategies. In: Proceedings of the 13th annual conference on Genetic and evolutionary computation, pp 845–852
[373]
Scheier C, Egner S (1997) Visual attention in a mobile robot. In: ISIE’97 Proceeding of the IEEE international symposium on industrial electronics. vol 1. IEEE, pp SS48–SS52
[374]
Schick T, Schütze H (2019) Attentive mimicking: better word embeddings by attending to informative contexts. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, volume 1 (long and short papers). Association for Computational Linguistics, pp 489–494.
[375]
Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B, Rueckert D (2019) Attention gated networks: learning to leverage salient regions in medical images. Medical Image Anal 53:197–207
[376]
Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673–2681
[377]
See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of the 55th annual meeting of the association for computational linguistics (volume 1: long papers), pp 1073–1083
[378]
Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers). Association for Computational Linguistics, Berlin, pp 1715–1725., https://www.aclweb.org/anthology/P16-1162
[379]
Seo PH, Lin Z, Cohen S, Shen X, Han B (2016) Hierarchical attention networks. arXiv:abs/160602393
[380]
Seo MJ, Kembhavi A, Farhadi A, Hajishirzi H (2017) Bidirectional attention flow for machine comprehension. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=HJ0UKP9ge
[381]
Sermanet P, Frome A, Real E (2015) Attention for fine-grained categorization. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning representations, ICLR 2015, San Diego, May 7–9, 2015, workshop track proceedings. arXiv:abs/1412.7054
[382]
Serra J, Suris D, Miron M, Karatzoglou A (2018) Overcoming catastrophic forgetting with hard attention to the task. In: International conference on machine learning. PMLR, pp 4548–4557
[383]
Serrano S, Smith NA (2019) Is attention interpretable? In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, July 28–August 2, 2019, volume 1: long papers. Association for Computational Linguistics, pp 2931–2951.
[384]
Shahroudy A, Liu J, Ng TT, Wang G (2016) Ntu rgb+ d: a large scale dataset for 3d human activity analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1010–1019
[385]
Shao Y, Gouws S, Britz D, Goldie A, Strope B, Kurzweil R (2017) Generating high-quality and informative conversation responses with sequence-to-sequence models. In: Palmer M, Hwa R, Riedel S (eds) Proceedings of the 2017 conference on empirical methods in natural language processing, EMNLP 2017, Copenhagen, September 9–11, 2017. Association for Computational Linguistics, pp 2210–2219.
[386]
Sharma S, Kiros R, Salakhutdinov R (2015) Action recognition using visual attention. arXiv:151104119
[387]
Shaw P, Uszkoreit J, Vaswani A (2018) Self-attention with relative position representations. https://ai.google/research/pubs/pub46989
[388]
She H, Wu B, Wang B, Chi R (2018) Distant supervision for relation extraction with hierarchical attention and entity descriptions. In: Proceedings of the IEEE IJCNN. IEEE, pp 1–8
[389]
Shen T, Zhou T, Long G, Jiang J, Pan S, Zhang C (2018a) Disan: Directional self-attention network for rnn/cnn-free language understanding. In: Proceedings of the AAAI conference on artificial intelligence
[390]
Shen T, Zhou T, Long G, Jiang J, Wang S, Zhang C (2018b) Reinforced self-attention network: a hybrid of hard and soft attention for sequence modeling. In: Lang J (ed) Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, July 13–19, 2018, Stockholm. ijcai.org, pp 4345–4352.
[391]
Shen T, Zhou T, Long G, Jiang J, Zhang C (2018c) Bi-directional block self-attention for fast and memory-efficient sequence modeling. In: 6th International conference on learning representations, ICLR 2018, Vancouver, April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=H1cWzoxA-
[392]
Shih KJ, Singh S, Hoiem D (2016) Where to look: Focus regions for visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4613–4621
[393]
Shin B, Lee T, Choi JD (2017) Lexicon integrated CNN models with attention for sentiment analysis. In: Balahur A, Mohammad SM, van der Goot E (eds) Proceedings of the 8th workshop on computational approaches to subjectivity, sentiment and social media analysis, WASSA@EMNLP 2017, Copenhagen, September 8, 2017. Association for Computational Linguistics, pp 149–158.
[394]
Shuai B, Zuo Z, Wang B, Wang G (2017) Scene segmentation with dag-recurrent neural networks. IEEE PAMI 40(6):1480–1493
[395]
Shuang K, Ren X, Yang Q, Li R, Loo J (2019) Aela-dlstms: attention-enabled and location-aware double lstms for aspect-level sentiment classification. Neurocomputing 334:25–34
[396]
Si J, Zhang H, Li CG, Kuen J, Kong X, Kot AC, Wang G (2018) Dual attention matching network for context-aware feature sequence based person re-identification. In: Proceedings of the IEEE CVPR, pp 5363–5372
[397]
Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional lstm network for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1227–1236
[398]
Song J, Gao L, Guo Z, Liu W, Zhang D, Shen HT (2017a) Hierarchical LSTM with adjusted temporal attention for video captioning. In: Sierra C (ed) Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI 2017, Melbourne. August 19–25, 2017, ijcai.org, pp 2737–2743.
[399]
Song S, Lan C, Xing J, Zeng W, Liu J (2017b) An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In: Proceedings of the AAAI conference on artificial intelligence
[400]
Song C, Huang Y, Ouyang W, Wang L (2018) Mask-guided contrastive attention model for person re-identification. In: Proceedings of the IEEE CVPR, pp 1179–1188
[401]
Song M, Park H, Shin KS (2019) Attention-based long short-term memory network using sentiment lexicon embedding for aspect-level sentiment analysis in Korean. Inf Process Manag 56(3):637–653
[402]
Song Y, Wang J, Jiang T, Liu Z, Rao Y (2019b) Attentional encoder network for targeted sentiment classification. arXiv:190209314
[403]
Sordoni A, Bachman P, Trischler A, Bengio Y (2016) Iterative alternating neural attention for machine reading. arXiv:160602245
[404]
Sperber M, Neubig G, Niehues J, Waibel A (2019) Attention-passing models for robust and data-efficient end-to-end speech translation. Trans Assoc Computat Linguist 7:313–325
[405]
Stollenga MF, Masci J, Gomez FJ, Schmidhuber J (2014) Deep networks with internal selective attention through feedback connections. In: NIPS
[406]
Strubell E, Verga P, Andor D, Weiss D, McCallum A (2018) Linguistically-informed self-attention for semantic role labeling. In: Riloff E, Chiang D, Hockenmaier J, Tsujii J (eds) Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, October 31–November 4, 2018. Association for Computational Linguistics, pp 5027–5038. https://www.aclweb.org/anthology/D18-1548/
[407]
Sudhakaran S, Escalera S, Lanz O (2019) Lsta: Long short-term attention for egocentric action recognition. In: Proceeedings of the IEEE/CVF CVPR, pp 9954–9963
[408]
Suganuma M, Liu X, Okatani T (2019) Attention-based adaptive selection of operations for image restoration in the presence of unknown combined distortions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9039–9048
[409]
Sukhbaatar S, Weston J, Fergus R, et al. (2015) End-to-end memory networks. In: Advances in neural information processing systems, pp 2440–2448
[410]
Sukhbaatar S, Grave E, Bojanowski P, Joulin A (2019) Adaptive attention span in transformers. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, Italy, July 28– August 2, 2019, volume 1: long papers. Association for Computational Linguistics, pp 331–335.
[411]
Sun B, Zhu Y, Xiao Y, Xiao R, Wei Y (2018) Automatic question tagging with deep neural networks. IEEE Trans Learn Technol 12(1):29–43
[412]
Tachibana H, Uenoyama K, Aihara S (2018) Efficiently trainable text-to-speech system based on deep convolutional networks with guided attention. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4784–4788
[413]
Tan M, Santos Cd, Xiang B, Zhou B (2015) Lstm-based deep learning models for non-factoid answer selection. arXiv:151104108
[414]
Tan YK, Xu X, Liu Y (2016) Improved recurrent neural networks for session-based recommendations. In: Proceedings of the 1st workshop on deep learning for recommender systems, pp 17–22
[415]
Tan Z, Wang M, Xie J, Chen Y, Shi X (2018) Deep semantic role labeling with self-attention. In: Proceedings of the AAAI conference on artificial intelligence
[416]
Tan ZX, Goel A, Nguyen TS, Ong DC (2019) A multimodal lstm for predicting listener empathic responses over time. In: 2019 14th IEEE international conference on automatic face and gesture recognition (FG 2019). IEEE, pp 1–4
[417]
Tang H, Xu D, Sebe N, Wang Y, Corso JJ, Yan Y (2019a) Multi-channel attention selection gan with cascaded semantic guidance for cross-view image translation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2417–2426
[418]
Tang H, Xu D, Sebe N, Yan Y (2019b) Attention-guided generative adversarial networks for unsupervised image-to-image translation. In: 2019 International joint conference on neural networks (IJCNN). IEEE, pp 1–8
[419]
Tay Y, Luu AT, Hui SC, Su J (2018) Densely connected attention propagation for reading comprehension. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31. Curran Associates, Inc., pp 4906–4917
[420]
Tenney I, Das D, Pavlick E (2019) BERT rediscovers the classical NLP pipeline. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, July 28–August 2, 2019, volume 1: long papers. Association for Computational Linguistics, pp 4593–4601.
[421]
Tian T, Fang ZF (2019) Attention-based autoencoder topic model for short texts. Procedia Comput Sci 151:1134–1139
[422]
Tian J, Li C, Shi Z, Xu F (2018a) A diagnostic report generator from ct volumes on liver tumor with semi-supervised attention mechanism. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 702–710
[423]
Tian W, Wang Z, Shen H, Deng W, Meng Y, Chen B, Zhang X, Zhao Y, Huang X (2018b) Learning better features for face detection with feature fusion and segmentation supervision. arXiv:181108557
[424]
Tootell RB, Hadjikhani N, Hall EK, Marrett S, Vanduffel W, Vaughan JT, Dale AM (1998) The retinotopy of visual spatial attention. Neuron 21(6):1409–1422
[425]
Treisman AM, Gelade G (1980) A feature-integration theory of attention. Cognit Psychol 12(1):97–136
[426]
Tsai YH, Srivastava N, Goh H, Salakhutdinov R (2020) Capsules with inverted dot-product attention routing. In: 8th International conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020, OpenReview.net. https://openreview.net/forum?id=HJe6uANtwH
[427]
Van Den Oord A, Dieleman S, Schrauwen B (2013) Deep content-based music recommendation. In: Neural information processing systems conference (NIPS 2013), neural information processing systems foundation (NIPS), vol 26
[428]
Vashishth S, Upadhyay S, Tomar GS, Faruqui M (2019) Attention interpretability across nlp tasks. arXiv:190911218
[429]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4–9, 2017, Long Beach, pp 5998–6008
[430]
Van der Velde F, de Kamps M, et al. Clam: closed-loop attention model for visual search Neurocomputing 2004 58 607-612
[431]
Veldhuizen MG, Bender G, Constable RT, and Small DM Trying to detect taste in a tasteless solution: modulation of early gustatory cortex by attention to taste Chem Sens 2007 32 6 569-581
[432]
Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y (2018) Graph attention networks. In: International conference on learning representations. https://openreview.net/forum?id=rJXMpikCZ
[433]
Vemula A, Muelling K, Oh J (2018) Social attention: Modeling attention in human crowds. In: IEEE ICRA. IEEE, pp 1–7
[434]
Verga P, Strubell E, McCallum A (2018) Simultaneously self-attending to all mentions for full-abstract biological relation extraction. In: Walker MA, Ji H, Stent A (eds) Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2018, New Orleans, June 1–6, 2018, volume 1 (long papers). Association for Computational Linguistics, pp 872–884.
[435]
Vig J, Belinkov Y (2019) Analyzing the structure of attention in a transformer language model. In: Proceedings of the 2019 ACL workshop BlackboxNLP: analyzing and interpreting neural networks for NLP, pp 63–76
[436]
Vinyals O, Le Q (2015) A neural conversational model. arXiv:150605869
[437]
Vinyals O, Fortunato M, Jaitly N (2015a) Pointer networks. In: Advances in neural information processing systems, pp 2692–2700
[438]
Vinyals O, Kaiser Ł, Koo T, Petrov S, Sutskever I, Hinton G (2015b) Grammar as a foreign language. In: Advances in neural information processing systems, pp 2773–2781
[439]
Vinyals O, Toshev A, Bengio S, Erhan D (2015c) Show and tell: a neural image caption generator. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
[440]
Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D (2016) Matching networks for one shot learning. In: Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems 29: annual conference on neural information processing systems 2016, December 5–10, 2016, Barcelona, pp 3630–3638
[441]
Walther D (2006) Interactions of visual attention and object recognition: computational modeling, algorithms, and psychophysics. PhD thesis, California Institute of Technology
[442]
Walther D, Edgington DR, Koch C (2004) Detection and tracking of objects in underwater video. In: Proceedings of the IEEE CVPR, vol 1. IEEE, pp I–I
[443]
Wang S, Jiang J (2016) Learning natural language inference with LSTM. In: Knight K, Nenkova A, Rambow O (eds) NAACL HLT 2016, The 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, San Diego California, June 12–17, 2016. The Association for Computational Linguistics, pp 1442–1451.
[444]
Wang F, Tax DM (2016) Survey on the attention based rnn model and its applications in computer vision. arXiv:160106823
[445]
Wang B, Liu K, Zhao J (2016a) Inner attention based recurrent neural networks for answer selection. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), pp 1288–1297
[446]
Wang L, Cao Z, De Melo G, Liu Z (2016b) Relation classification via multi-level attention cnns. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers), pp 1298–1307
[447]
Wang X, Gao L, Song J, and Shen H Beyond frame-level cnn: saliency-aware 3-d cnn with lstm for video action recognition IEEE Signal Process Lett 2016 24 4 510-514
[448]
Wang Y, Huang M, Zhu X, Zhao L (2016d) Attention-based lstm for aspect-level sentiment classification. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 606–615
[449]
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017a) Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
[450]
Wang W, Pan SJ, Dahlmeier D, Xiao X (2017b) Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In: Proceedings of the AAAI conference on artificial intelligence
[451]
Wang Z, Chen T, Li G, Xu R, Lin L (2017c) Multi-label image recognition by recurrently discovering attentional regions. In: Proceedings of the IEEE ICCV, pp 464–472
[452]
Wang A, Singh A, Michael J, Hill F, Levy O, Bowman SR (2018a) Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv:180407461
[453]
Wang C, Zhang Q, Huang C, Liu W, Wang X (2018b) Mancs: A multi-task attentional network with curriculum sampling for person re-identification. In: Proceedings of the European conference on computer vision (ECCV), pp 365–381
[454]
Wang J, Jiang W, Ma L, Liu W, Xu Y (2018c) Bidirectional attentive fusion with context gating for dense video captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7190–7198
[455]
Wang Q, Liu S, Chanussot J, and Li X Scene classification with recurrent attention of vhr remote sensing images IEEE Trans Geosci Remote Sens 2018 57 2 1155-1167
[456]
Wang W, Xu Y, Shen J, Zhu SC (2018e) Attentive fashion grammar network for fashion landmark detection and clothing category classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4271–4280
[457]
Wang W, Yan M, Wu C (2018f) Multi-granularity hierarchical attention fusion networks for reading comprehension and question answering. In: Proceedings of the 56th annual meeting of the association for computational linguistics, pp 1705–1714
[458]
Wang X, Peng Y, Lu L, Lu Z, Summers RM (2018g) Tienet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9049–9058
[459]
Wang Y, Jiang L, Yang MH, Li LJ, Long M, Fei-Fei L (2018h) Eidetic 3d lstm: a model for video prediction and beyond. In: International conference on learning representations
[460]
Wang L, Huang Y, Hou Y, Zhang S, Shan J (2019a) Graph attention convolution for point cloud semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 10296–10305
[461]
Wang P, Han J, Li C, and Pan R Logic attention based neighborhood aggregation for inductive knowledge graph embedding Proc AAAI Conf Artif Intell 2019 33 7152-7159
[462]
Wang X, Cai Z, Gao D, Vasconcelos N (2019c) Towards universal object detection by domain attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7289–7298
[463]
Wang X, Ji H, Shi C, Wang B, Ye Y, Cui P, Yu PS (2019d) Heterogeneous graph attention network. In: The world wide web conference, pp 2022–2032
[464]
Wang X, Li R, Mallidi SH, Hori T, Watanabe S, Hermansky H (2019e) Stream attention-based multi-array end-to-end speech recognition. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 7105–7109
[465]
Wang Y, Fan X, Chen IF, Liu Y, Chen T, Hoffmeister B (2019f) End-to-end anchored speech recognition. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 7090–7094
[466]
Watanabe S, Hori T, Kim S, Hershey JR, and Hayashi T Hybrid ctc/attention architecture for end-to-end speech recognition IEEE J Sel Top Signal Process 2017 11 8 1240-1253
[467]
Weston J, Chopra S, Bordes A (2014) Memory networks. arXiv:14103916
[468]
Wiegreffe S, Pinter Y (2019) Attention is not not explanation. In: Inui K, Jiang J, Ng V, Wan X (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, EMNLP-IJCNLP 2019, Hong Kong, November 3–7, 2019. Association for Computational Linguistics, pp 11–20.
[469]
Wojna Z, Gorban AN, Lee DS, Murphy K, Yu Q, Li Y, Ibarz J (2017) Attention-based extraction of structured information from street view imagery. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 844–850
[470]
Woldorff MG, Gallen CC, Hampson SA, Hillyard SA, Pantev C, Sobel D, and Bloom FE Modulation of early sensory processing in human auditory cortex during auditory selective attention Proc Natl Acad Sci 1993 90 18 8722-8726
[471]
Wolfe JM, Cave KR, and Franzel SL Guided search: an alternative to the feature integration model for visual search J Exp Psychol Hum Percep Perform 1989 15 3 419
[472]
Woo S, Park J, Lee JY, Kweon IS (2018) Cbam: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19
[473]
Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, et al. (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv:160908144
[474]
Wu C, Wei Y, Chu X, Weichen S, Su F, and Wang L Hierarchical attention-based multimodal fusion for video captioning Neurocomputing 2018 315 362-370
[475]
Wu L, Wang Y, Li X, and Gao J Deep attention-based spatially recursive networks for fine-grained visual recognition IEEE Trans Cybern 2018 49 5 1791-1802
[476]
Wu W, Chen Y, Xu J, Zhang Y (2018c) Attention-based convolutional neural networks for chinese relation extraction. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. Springer, pp 147–158
[477]
Wu Z, Pan S, Chen F, Long G, Zhang C, Philip SY (2020) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst
[478]
Xia L, Chen CC, Aggarwal JK (2012) View invariant human action recognition using histograms of 3d joints. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops. IEEE, pp 20–27
[479]
Xiao T, Xu Y, Yang K, Zhang J, Peng Y, Zhang Z (2015) The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In: Proceedings of the IEEE CVPR, pp 842–850
[480]
Xiao S, Feng J, Xing J, Lai H, Yan S, Kassim A (2016) Robust facial landmark detection via recurrent attentive-refinement networks. In: European conference on computer vision. Springer, pp 57–72
[481]
Xiao F, Li J, Zhao H, Wang R, Chen K (2019) Lattice-based transformer encoder for neural machine translation. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, July 28–August 2, 2019, volume 1: long papers. Association for Computational Linguistics, pp 3090–3097.
[482]
Xie D, Deng C, Wang H, Li C, and Tao D Semantic adversarial network with multi-scale pyramid attention for video classification Proc AAAI Conf Artif Intell 2019 33 9030-9037
[483]
Xie H, Fang S, Zha ZJ, Yang Y, Li Y, and Zhang Y Convolutional attention networks for scene text recognition ACM Trans Multimed Comput Commun Appl (TOMM) 2019 15 1s 1-17
[484]
Xie S, Hu H, and Wu Y Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition Pattern Recognit 2019 92 177-191
[485]
Xin M, Zhang H, Sun M, Yuan D (2016) Recurrent temporal sparse autoencoder for attention-based action recognition. In: 2016 International joint conference on neural networks (IJCNN). IEEE, pp 456–463
[486]
Xing C, Wu W, Wu Y, Liu J, Huang Y, Zhou M, Ma WY (2017) Topic aware neural response generation. In: Proceedings of the AAAI conference on artificial intelligence
[487]
Xing C, Wu Y, Wu W, Huang Y, Zhou M (2018) Hierarchical recurrent attention network for response generation. In: Proceedings of the AAAI conference on artificial intelligence
[488]
Xiong C, Merity S, Socher R (2016) Dynamic memory networks for visual and textual question answering. In: Balcan M, Weinberger KQ (eds) Proceedings of the 33nd international conference on machine learning, ICML 2016, New York City, June 19–24, 2016, JMLR.org, JMLR workshop and conference proceedings, vol 48, pp 2397–2406. http://proceedings.mlr.press/v48/xiong16.html
[489]
Xiong C, Zhong V, Socher R (2017) Dynamic coattention networks for question answering. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24-26, 2017, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=rJeKjwvclx
[490]
Xu H, Saenko K (2016) Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In: European conference on computer vision. Springer, pp 451–466
[491]
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–205. http://proceedings.mlr.press/v37/xuc15.html
[492]
Xu D, Ouyang W, Alameda-Pineda X, Ricci E, Wang X, Sebe N (2017a) Learning deep structured multi-scale features using attention-gated crfs for contour prediction. In: Advances in neural information processing systems, pp 3961–3970
[493]
Xu S, Cheng Y, Gu K, Yang Y, Chang S, Zhou P (2017b) Jointly attentive spatial-temporal pooling networks for video-based person re-identification. In: Proceedings of the IEEE international conference on computer vision, pp 4733–4742
[494]
Xu D, Wang W, Tang H, Liu H, Sebe N, Ricci E (2018a) Structured attention guided convolutional neural fields for monocular depth estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3917–3925
[495]
Xu J, Zhao R, Zhu F, Wang H, Ouyang W (2018b) Attention-aware compositional network for person re-identification. In: Proceedings of the IEEE CVPR, pp 2119–2128
[496]
Xu K, Wu L, Wang Z, Feng Y, Witbrock M, Sheinin V (2018c) Graph2seq: graph to sequence learning with attention-based neural networks. arXiv:180400823
[497]
Xu T, Zhang P, Huang Q, Zhang H, Gan Z, Huang X, He X (2018d) Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1316–1324
[498]
Xue W, Li T (2018) Aspect based sentiment analysis with gated convolutional networks. In: Gurevych I, Miyao Y (eds) Proceedings of the 56th annual meeting of the association for computational linguistics, ACL 2018, Melbourne, Australia, July 15–20, 2018, volume 1: long papers, association for computational linguistics, pp 2514–2523. https://www.aclweb.org/anthology/P18-1234/
[499]
Xue F, Wang X, Wang J, Zha H (2020) Deep visual odometry with adaptive memory. IEEE Trans Pattern Anal Mach Intell
[500]
Yang B, Mitchell TM (2017) Leveraging knowledge bases in lstms for improving machine reading. In: Barzilay R, Kan M (eds) Proceedings of the 55th annual meeting of the association for computational linguistics, ACL 2017, Vancouver, July 30–August 4, volume 1: long papers. Association for Computational Linguistics, pp 1436–1446.
[501]
Yang L, Ai Q, Guo J, Croft WB (2016a) anmm: ranking short answer texts with attention-based neural matching model. In: Mukhopadhyay S, Zhai C, Bertino E, Crestani F, Mostafa J, Tang J, Si L, Zhou X, Chang Y, Li Y, Sondhi P (eds) Proceedings of the 25th ACM international conference on information and knowledge management, CIKM 2016, Indianapolis, October 24–28, 2016. ACM, pp 287–296.
[502]
Yang Z, He X, Gao J, Deng L, Smola A (2016b) Stacked attention networks for image question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 21–29
[503]
Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016c) Hierarchical attention networks for document classification. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 1480–1489
[504]
Yang Z, Yuan Y, Wu Y, Cohen WW, Salakhutdinov R (2016d) Review networks for caption generation. In: Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems 29: annual conference on neural information processing systems 2016, December 5–10, 2016, Barcelona, pp 2361–2369
[505]
Yang F, Yang Z, Cohen WW (2017a) Differentiable learning of logical rules for knowledge base reasoning. In: Advances in neural information processing systems, pp 2319–2328
[506]
Yang J, Ren P, Zhang D, Chen D, Wen F, Li H, Hua G (2017b) Neural aggregation network for video face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4362–4371
[507]
Yang B, Li J, Wong DF, Chao LS, Wang X, Tu Z (2019a) Context-aware self-attention networks. In: Proceedings of the AAAI conference on artificial intelligence, pp 387–394
[508]
Yang B, Wang L, Wong DF, Chao LS, Tu Z (2019b) Convolutional self-attention networks. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pp 4040–4045
[509]
Yang F, Jin L, Lai S, Gao X, and Li Z Fully convolutional sequence recognition network for water meter number reading IEEE Access 2019 7 11679-11687
[510]
Yang Z, Raymond OI, Sun W, and Long J Deep attention-guided hashing IEEE Access 2019 7 11209-11221
[511]
Yao L, Torabi A, Cho K, Ballas N, Pal C, Larochelle H, Courville A (2015) Describing videos by exploiting temporal structure. In: Proceedings of the IEEE international conference on computer vision, pp 4507–4515
[512]
Yao T, Pan Y, Li Y, Mei T (2018) Exploring visual relationship for image captioning. In: Proceedings of the European conference on computer vision (ECCV), pp 684–699
[513]
Yasuda Y, Wang X, Takaki S, Yamagishi J (2019) Investigation of enhanced tacotron text-to-speech synthesis systems with self-attention for pitch accent language. In: ICASSP 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6905–6909
[514]
Ye HJ, Hu H, Zhan DC, Sha F (2018) Learning embedding adaptation for few-shot learning. arXiv:181203664
[515]
Yeung S, Russakovsky O, Mori G, Fei-Fei L (2016) End-to-end learning of action detection from frame glimpses in videos. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), IEEE, Las Vegas, pp 2678–2687. http://ieeexplore.ieee.org/document/7780662/
[516]
Yin W, Schütze H, Xiang B, and Zhou B Abcnn: attention-based convolutional neural network for modeling sentence pairs Trans Assoc Comput Linguist 2016 4 259-272
[517]
Yin Q, Wang J, Luo X, Zhai J, Jha SK, and Shi YQ Quaternion convolutional neural network for color image classification and forensics IEEE Access 2019 7 20293-20301
[518]
You Q, Jin H, Wang Z, Fang C, Luo J (2016) Image captioning with semantic attention. In: The IEEE conference on computer vision and pattern recognition (CVPR)
[519]
Yu H, Wang J, Huang Z, Yang Y, Xu W (2016) Video paragraph captioning using hierarchical recurrent neural networks. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016. IEEE Computer Society, pp 4584–4593.
[520]
Yu D, Fu J, Mei T, Rui Y (2017a) Multi-level attention networks for visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4709–4717
[521]
Yu Z, Yu J, Fan J, Tao D (2017b) Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 1821–1830
[522]
Yu AW, Dohan D, Luong M, Zhao R, Chen K, Norouzi M, Le QV (2018a) Qanet: Combining local convolution with global self-attention for reading comprehension. In: 6th international conference on learning representations, ICLR 2018, Vancouver, April 30–May 3, 2018, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=B14TlG-RW
[523]
Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2018b) Generative image inpainting with contextual attention. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, June 18–22, 2018. IEEE Computer Society, pp 5505–5514,
[524]
Yu L, Lin Z, Shen X, Yang J, Lu X, Bansal M, Berg TL (2018c) Mattnet: Modular attention network for referring expression comprehension. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1307–1315
[525]
Yu Z, Yu J, Xiang C, Fan J, and Tao D Beyond bilinear: generalized multimodal factorized high-order pooling for visual question answering IEEE Trans Neural Netw Learn Syst 2018 29 12 5947-5959
[527]
Yuan Y, Wang J (2018) Ocnet: Object context network for scene parsing. arXiv:180900916
[528]
Yuan Y, Xiong Z, and Wang Q Vssa-net: vertical spatial sequence attention network for traffic sign detection IEEE Trans Image Process 2019 28 7 3423-3434
[529]
Yun K, Honorio J, Chattopadhyay D, Berg TL, Samaras D (2012) Two-person interaction detection using body-pose features and multiple instance learning. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops. IEEE, pp 28–35
[530]
Zadeh A, Liang PP, Mazumder N, Poria S, Cambria E, Morency LP (2018a) Memory fusion network for multi-view sequential learning. In: Thirty-second AAAI conference on artificial intelligence
[531]
Zadeh A, Liang PP, Poria S, Vij P, Cambria E, Morency LP (2018b) Multi-attention recurrent network for human communication comprehension. In: Thirty-second AAAI conference on artificial intelligence
[532]
Zagoruyko S, Komodakis N (2017) Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings, OpenReview.net. https://openreview.net/forum?id=Sks9_ajex
[533]
Zambaldi V, Raposo D, Santoro A, Bapst V, Li Y, Babuschkin I, Tuyls K, Reichert D, Lillicrap T, Lockhart E, Shanahan M, Langston V, Pascanu R, Botvinick M, Vinyals O, Battaglia P (2019) Deep reinforcement learning with relational inductive biases. In: International conference on learning representations. https://openreview.net/forum?id=HkxaFoC9KQ
[534]
Zang J, Wang L, Liu Z, Zhang Q, Hua G, Zheng N (2018a) Attention-based temporal weighted convolutional neural network for action recognition. In: IFIP international conference on artificial intelligence applications and innovations. Springer, pp 97–108
[535]
Zang X, Pokle A, Vázquez M, Chen K, Niebles JC, Soto A, Savarese S (2018b) Translating navigation instructions in natural language to a high-level plan for behavioral robot navigation. In: EMNLP
[536]
Zelano C, Bensafi M, Porter J, Mainland J, Johnson B, Bremner E, Telles C, Khan R, and Sobel N Attentional modulation in human primary olfactory cortex Nat Neurosci 2005 8 1 114-120
[537]
Zeng J, Ma X, and Zhou K Enhancing attention-based lstm with position context for aspect-level sentiment classification IEEE Access 2019 7 20462-20471
[538]
Zeng Z, Xie W, Zhang Y, and Lu Y Ric-unet: an improved neural network based on unet for nuclei segmentation in histology images IEEE Access 2019 7 21420-21428
[539]
Zenkel T, Wuebker J, DeNero J (2019) Adding interpretable attention to neural translation models improves word alignment. arXiv:190111359
[540]
Zeyer A, Irie K, Schlüter R, and Ney H Improved training of end-to-end attention models for speech recognition Proc Interspeech 2018 2018 7-11
[541]
Zhang H, Li J, Ji Y, and Yue H Understanding subtitles by character-level sequence-to-sequence learning IEEE Trans Ind Inf 2016 13 2 616-624
[542]
Zhang B, Xiong D, Su J (2017a) Battrae: bidimensional attention-based recursive autoencoders for learning bilingual phrase embeddings. In: Proceedings of the AAAI conference on artificial intelligence
[543]
Zhang J, Du J, Dai L (2017b) A gru-based encoder-decoder approach with attention for online handwritten mathematical expression recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 902–907
[544]
Zhang P, Lan C, Xing J, Zeng W, Xue J, Zheng N (2017c) View adaptive recurrent neural networks for high performance human action recognition from skeleton data. In: Proceedings of the IEEE international conference on computer vision, pp 2117–2126
[545]
Zhang Y, Zhong V, Chen D, Angeli G, Manning CD (2017d) Position-aware attention and supervised data improve slot filling. In: Proceedings of the 2017 conference on empirical methods in natural language processing, pp 35–45
[546]
Zhang B, Xiong D, Su J, and Zhang M Learning better discourse representation for implicit discourse relation recognition via attention networks Neurocomputing 2018 275 1241-1249
[547]
Zhang J, Bargal SA, Lin Z, Brandt J, Shen X, and Sclaroff S Top-down neural attention by excitation backprop Int J Comput Vis 2018 126 10 1084-1102
[548]
Zhang L, Zhu G, Mei L, Shen P, Shah SAA, Bennamoun M (2018c) Attention in convolutional lstm for gesture recognition. In: Advances in neural information processing systems, pp 1953–1962
[549]
Zhang P, Xue J, Lan C, Zeng W, Gao Z, Zheng N (2018d) Adding attentiveness to the neurons in recurrent neural networks. In: Proceedings of the European conference on computer vision (ECCV), pp 135–151
[550]
Zhang S, Yang J, Schiele B (2018e) Occluded pedestrian detection through guided attention in cnns. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6995–7003
[551]
Zhang T, Liu B, Niu D, Lai K, Xu Y (2018f) Multiresolution graph attention networks for relevance matching. In: Proceedings of the 27th ACM international conference on information and knowledge management, pp 933–942
[552]
Zhang X, Su J, Qin Y, Liu Y, Ji R, Wang H (2018g) Asynchronous bidirectional decoding for neural machine translation. In: Proceedings of the AAAI conference on artificial intelligence
[553]
Zhang X, Wang T, Qi J, Lu H, Wang G (2018h) Progressive attention guided recurrent network for salient object detection. In: IEEE CVPR
[554]
Zhang Y, Du J, Wang Z, Zhang J, Tu Y (2018i) Attention based fully convolutional network for speech emotion recognition. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE, pp 1771–1775
[555]
Zhang Y, Hare J, Prügel-Bennett A (2018j) Learning to count objects in natural images for visual question answering. In: International conference on learning representations. https://openreview.net/forum?id=B12Js_yRb
[556]
Zhang Y, Li K, Li K, Wang L, Zhong B, Fu Y (2018k) Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European conference on computer vision (ECCV), pp 286–301
[557]
Zhang H, Goodfellow I, Metaxas D, Odena A (2019a) Self-attention generative adversarial networks. In: International conference on machine learning. PMLR, pp 7354–7363
[558]
Zhang JX, Ling ZH, Liu LJ, Jiang Y, and Dai LR Sequence-to-sequence acoustic modeling for voice conversion IEEE/ACM Trans Audio Speech Lang Process 2019 27 3 631-644
[559]
Zhang L, Liu Z, Zhang S, Yang X, Qiao H, Huang K, and Hussain A Cross-modality interactive attention network for multispectral pedestrian detection Inf Fusion 2019 50 20-29
[560]
Zhang M, Wang X, Fang F, Li H, and Yamagishi J Joint training framework for text-to-speech and voice conversion using multi-source tacotron and wavenet Proc Interspeech 2019 2019 1298-1302
[561]
Zhang N, Deng S, Sun Z, Wang G, Chen X, Zhang W, Chen H (2019e) Long-tail relation extraction via knowledge graph embeddings and graph convolution networks. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, June 2–7, 2019, volume 1 (long and short papers). Association for Computational Linguistics, pp 3016–3025.
[562]
Zhang P, Liu W, Wang H, Lei Y, and Lu H Deep gated attention networks for large-scale street-level scene segmentation Pattern Recognit 2019 88 702-714
[563]
Zhang R, Li J, Sun H, Ge Y, Luo P, Wang X, and Lin L Scan: Self-and-collaborative attention network for video person re-identification IEEE Trans Image Process 2019 28 10 4870-4882
[564]
Zhang X, Wang X, Tang X, Zhou H, and Li C Description generation for remote sensing images using attribute attention mechanism Remote Sens 2019 11 6 612
[565]
Zhang XY, Shi H, Li C, Zheng K, Zhu X, Duan L (2019i) Learning transferable self-attentive representations for action recognition in untrimmed videos with weak supervision. In: Proceedings of the AAAI conference on artificial intelligence, pp 9227–9234
[566]
Zhang Y, Li K, Li K, Zhong B, Fu Y (2019j) Residual non-local attention networks for image restoration. In: 7th International conference on learning representations, ICLR 2019, New Orleans, May 6–9, 2019, OpenReview.net. https://openreview.net/forum?id=HkeGhoA5FX
[567]
Zhang Y, Niebles JC, Soto A (2019k) Interpretable visual question answering by visual grounding from attention supervision mining. In: 2019 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 349–357
[568]
Zhang Y, Wang X, Jiang X, Shi C, Ye Y (2019l) Hyperbolic graph attention network. arXiv:191203046
[569]
Zhang Y, Wang ZR, Du J (2019m) Deep fusion: An attention guided factorized bilinear pooling for audio-video emotion recognition. In: 2019 International joint conference on neural networks (IJCNN). IEEE, pp 1–8
[570]
Zhang Y, Xu X, Liu X (2019n) Robust and high performance face detector. arXiv:190102350
[571]
Zhang Y, Zhou C, Chang F, and Kot AC Multi-resolution attention convolutional neural network for crowd counting Neurocomputing 2019 329 144-152
[572]
Zhang Z, Liao L, Huang M, Zhu X, Chua TS (2019p) Neural multimodal belief tracker with adaptive attention for dialogue systems. In: The world wide web conference, pp 2401–2412
[573]
Zhao D, Chen Y, and Lv L Deep reinforcement learning with visual attention for vehicle classification IEEE Trans Cognit Dev Syst 2016 9 4 356-367
[574]
Zhao B, Wu X, Feng J, Peng Q, and Yan S Diversified visual attention networks for fine-grained object classification IEEE Trans Multimed 2017 19 6 1245-1256
[575]
Zhao L, Li X, Zhuang Y, Wang J (2017b) Deeply-learned part-aligned representations for person re-identification. In: Proceedings of the IEEE international conference on computer vision, pp 3219–3228
[576]
Zhao B, Li X, Lu X, and Wang Z A cnn-rnn architecture for multi-label weather recognition Neurocomputing 2018 322 47-57
[577]
Zhao H, Zhang Y, Liu S, Shi J, Loy CC, Lin D, Jia J (2018b) Psanet: Point-wise spatial attention network for scene parsing. In: Proceedings of the European conference on computer vision (ECCV), pp 267–283
[578]
Zheng H, Fu J, Mei T, Luo J (2017) Learning multi-attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE international conference on computer vision, pp 5209–5217
[579]
Zheng Z, Zheng L, and Yang Y Pedestrian alignment network for large-scale person re-identification IEEE Trans Circuits Syst Video Technol 2018 29 10 3037-3045
[580]
Zheng M, Karanam S, Wu Z, Radke RJ (2019a) Re-identification with consistent attentive siamese networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5735–5744
[581]
Zheng W, Li L, Zhang Z, Huang Y, Wang L (2019b) Relational network for skeleton-based action recognition. In: 2019 IEEE international conference on multimedia and expo (ICME). IEEE, pp 826–831
[582]
Zhong V, Xiong C, Keskar NS, Socher R (2019) Coarse-grain fine-grain coattention network for multi-evidence question answering. In: 7th International conference on learning representations, ICLR 2019, New Orleans, May 6–9, 2019, OpenReview.net. https://openreview.net/forum?id=Syl7OsRqY7
[583]
Zhou Y, Shao L (2018) Aware attentive multi-view inference for vehicle re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6489–6498
[584]
Zhou J, Cao Y, Wang X, Li P, Xu W (2016a) Deep recurrent models with fast-forward connections for neural machine translation. Trans Assoc Comput Linguist 4:371–383
[585]
Zhou P, Shi W, Tian J, Qi Z, Li B, Hao H, Xu B (2016b) Attention-based bidirectional long short-term memory networks for relation classification. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 2: Short papers), pp 207–212
[586]
Zhou Z, Huang Y, Wang W, Wang L, Tan T (2017) See the forest for the trees: Joint spatial and temporal recurrent neural networks for video-based person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4747–4756
[587]
Zhou H, Young T, Huang M, Zhao H, Xu J, Zhu X (2018a) Commonsense knowledge aware conversation generation with graph attention. In: IJCAI, pp 4623–4629
[588]
Zhou L, Zhou Y, Corso JJ, Socher R, Xiong C (2018b) End-to-end dense video captioning with masked transformer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8739–8748
[589]
Zhou S, Dong L, Xu S, Xu B (2018c) Syllable-based sequence-to-sequence speech recognition with the transformer in mandarin Chinese. In: Yegnanarayana B (ed) Interspeech 2018, 19th annual conference of the international speech communication association, Hyderabad, India, 2–6 September 2018, ISCA, pp 791–795.
[590]
Zhou X, Li L, Dong D, Liu Y, Chen Y, Zhao WX, Yu D, Wu H (2018d) Multi-turn response selection for chatbots with deep attention matching network. In: Proceedings of the 56th annual meeting of the association for computational linguistics (volume 1: long papers), pp 1118–1127
[591]
Zhu L, Yang Y (2020) Actbert: learning global-local video-text representations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8746–8755
[592]
Zhu Y, Groth O, Bernstein MS, Fei-Fei L (2016) Visual7w: Grounded question answering in images. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, June 27–30, 2016. IEEE Computer Society, pp 4995–5004.
[593]
Zhu C, Zeng M, Huang X (2018a) Sdnet: Contextualized attention-based deep network for conversational question answering. arXiv:1812.03593
[594]
Zhu X, Li L, Liu J, Li Z, Peng H, Niu X (2018b) Image captioning with triple-attention and stack parallel lstm. Neurocomputing 319:55–65
[595]
Zhu Y, Ko T, Snyder D, Mak B, Povey D (2018c) Self-attentive speaker embeddings for text-independent speaker verification. In: Interspeech, pp 3573–3577

Published In

Artificial Intelligence Review, Volume 55, Issue 8, Dec 2022, 783 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 01 December 2022

Author Tags

1. Survey
2. Attention mechanism
3. Neural networks
4. Deep learning
5. Attention models

          Contributors

          Other Metrics

          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • Downloads (Last 12 months)0
          • Downloads (Last 6 weeks)0
          Reflects downloads up to 22 Dec 2024

          Other Metrics

          Citations

          Cited By

          View all
          • (2024)Study on an Intelligent English Translation Method Using an Improved Convolutional Neural Network ModelInternational Journal of e-Collaboration10.4018/IJeC.35755620:1(1-14)Online publication date: 7-Nov-2024
          • (2024)IOTeethProceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies10.1145/36435168:1(1-29)Online publication date: 6-Mar-2024
          • (2024)Intelligent pest forecasting with meteorological dataExpert Systems with Applications: An International Journal10.1016/j.eswa.2024.124137252:PAOnline publication date: 24-Jul-2024
          • (2024)DNTCEngineering Applications of Artificial Intelligence10.1016/j.engappai.2023.107319127:PBOnline publication date: 1-Jan-2024
          • (2024)Empowering precision medicine: AI-driven schizophrenia diagnosis via EEG signals: A comprehensive review from 2002–2023Applied Intelligence10.1007/s10489-023-05155-654:1(35-79)Online publication date: 1-Jan-2024
          • (2023)Interactive Model and Application of Joint Knowledge Base Question Answering and Semantic MatchingWeb Information Systems and Applications10.1007/978-981-99-6222-8_18(206-217)Online publication date: 15-Sep-2023
          • (2023)An Industrial Defect Detection Network with Fine-Grained Supervision and Adaptive Contrast EnhancementAdvanced Intelligent Computing Technology and Applications10.1007/978-981-99-4761-4_16(181-192)Online publication date: 10-Aug-2023
          • (2022)Similarity Embedded Temporal Transformers: Enhancing Stock Predictions with Historically Similar TrendsFoundations of Intelligent Systems10.1007/978-3-031-16564-1_37(388-398)Online publication date: 3-Oct-2022

          View Options

          View options

          Media

          Figures

          Other

          Tables

          Share

          Share

          Share this Publication link

          Share on social media