
Transformers aftermath: current research and rising trends

Published: 22 March 2021

Abstract

Attention, particularly self-attention, is a standard in current NLP literature, but to achieve meaningful models, attention is not enough.
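
To make the abstract's terminology concrete, the sketch below shows the scaled dot-product self-attention computation that underlies the Transformer family this survey covers. It is an illustrative NumPy sketch, not code from the article; the function name, the single-head formulation, and all shapes are assumptions made for brevity.

import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices."""
    q = x @ w_q                                  # queries
    k = x @ w_k                                  # keys
    v = x @ w_v                                  # values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # scaled dot products, (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                           # each position mixes information from all positions

# Tiny usage example with random data (hypothetical sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                     # 5 tokens, 16-dimensional embeddings
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # (5, 8)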

Published In

Communications of the ACM, Volume 64, Issue 4 (April 2021), 164 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3458337
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Cited By

  • (2024) Advancing Chatbot Conversations: A Review of Knowledge Update Approaches. Journal of the Brazilian Computer Society 30, 1, 55-68. DOI: 10.5753/jbcs.2024.2882. Online publication date: 25-Apr-2024
  • (2024) The impact of ChatGPT on learners in English academic writing: opportunities and challenges in education. Language Learning in Higher Education 14, 1, 41-56. DOI: 10.1515/cercles-2023-0006. Online publication date: 8-May-2024
  • (2024) Large language models (LLMs): survey, technical frameworks, and future challenges. Artificial Intelligence Review 57, 10. DOI: 10.1007/s10462-024-10888-y. Online publication date: 18-Aug-2024
  • (2024) A computational model for assisting individuals with suicidal ideation based on context histories. Universal Access in the Information Society 23, 3, 1447-1466. DOI: 10.1007/s10209-023-00991-2. Online publication date: 1-Aug-2024
  • (2023) Transfer Learning and Analogical Inference: A Critical Comparison of Algorithms, Methods, and Applications. Algorithms 16, 3, Article 146. DOI: 10.3390/a16030146. Online publication date: 7-Mar-2023
  • (2022) CRSAtt: By Capturing Relational Span and Using Attention for Relation Classification. Applied Sciences 12, 21, Article 11068. DOI: 10.3390/app122111068. Online publication date: 1-Nov-2022
  • (2022) Framework for Deep Learning-Based Language Models Using Multi-Task Learning in Natural Language Understanding: A Systematic Literature Review and Future Directions. IEEE Access 10, 17078-17097. DOI: 10.1109/ACCESS.2022.3149798. Online publication date: 2022
  • (2021) Paradigm Shift: The Promise of Deep Learning in Molecular Systems Engineering and Design. Frontiers in Chemical Engineering 3. DOI: 10.3389/fceng.2021.700717. Online publication date: 8-Jun-2021
  • (2021) Inscriptis - A Python-based HTML to text conversion library optimized for knowledge extraction from the Web. Journal of Open Source Software 6, 66, Article 3557. DOI: 10.21105/joss.03557. Online publication date: Oct-2021
  • (2021) A Spam Transformer Model for SMS Spam Detection. IEEE Access 9, 80253-80263. DOI: 10.1109/ACCESS.2021.3081479. Online publication date: 2021
