
Transformers aftermath: current research and rising trends

Published: 22 March 2021

Abstract

Attention, particularly self-attention, is a standard in current NLP literature, but to achieve meaningful models, attention is not enough.
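
To make the abstract's terminology concrete, the sketch below shows the scaled dot-product self-attention computation that underlies the Transformer family this survey covers. It is an illustrative NumPy sketch, not code from the article; the function name, the single-head formulation, and all shapes are assumptions made for brevity.

import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices."""
    q = x @ w_q                                  # queries
    k = x @ w_k                                  # keys
    v = x @ w_v                                  # values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # scaled dot products, (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                           # each position mixes information from all positions

# Tiny usage example with random data (hypothetical sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                     # 5 tokens, 16-dimensional embeddings
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # (5, 8)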

Published In

Communications of the ACM, Volume 64, Issue 4 (April 2021), 164 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3458337
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Cited By

  • (2024) Advancing Chatbot Conversations: A Review of Knowledge Update Approaches. Journal of the Brazilian Computer Society 30, 1, 55-68. DOI: 10.5753/jbcs.2024.2882. Online publication date: 25-Apr-2024
  • (2024) The impact of ChatGPT on learners in English academic writing: opportunities and challenges in education. Language Learning in Higher Education 14, 1, 41-56. DOI: 10.1515/cercles-2023-0006. Online publication date: 8-May-2024
  • (2024) Large language models (LLMs): survey, technical frameworks, and future challenges. Artificial Intelligence Review 57, 10. DOI: 10.1007/s10462-024-10888-y. Online publication date: 18-Aug-2024
  • (2024) A computational model for assisting individuals with suicidal ideation based on context histories. Universal Access in the Information Society 23, 3, 1447-1466. DOI: 10.1007/s10209-023-00991-2. Online publication date: 1-Aug-2024
  • (2023) Transfer Learning and Analogical Inference: A Critical Comparison of Algorithms, Methods, and Applications. Algorithms 16, 3, Article 146. DOI: 10.3390/a16030146. Online publication date: 7-Mar-2023
  • (2022) CRSAtt: By Capturing Relational Span and Using Attention for Relation Classification. Applied Sciences 12, 21, Article 11068. DOI: 10.3390/app122111068. Online publication date: 1-Nov-2022
  • (2022) Framework for Deep Learning-Based Language Models Using Multi-Task Learning in Natural Language Understanding: A Systematic Literature Review and Future Directions. IEEE Access 10, 17078-17097. DOI: 10.1109/ACCESS.2022.3149798. Online publication date: 2022
  • (2021) Paradigm Shift: The Promise of Deep Learning in Molecular Systems Engineering and Design. Frontiers in Chemical Engineering 3. DOI: 10.3389/fceng.2021.700717. Online publication date: 8-Jun-2021
  • (2021) Inscriptis - A Python-based HTML to text conversion library optimized for knowledge extraction from the Web. Journal of Open Source Software 6, 66, Article 3557. DOI: 10.21105/joss.03557. Online publication date: Oct-2021
  • (2021) A Spam Transformer Model for SMS Spam Detection. IEEE Access 9, 80253-80263. DOI: 10.1109/ACCESS.2021.3081479. Online publication date: 2021
