Abstract
Author profiling classifies author characteristics by analyzing how language is shared among people. In this work, we study that task from a low-resource viewpoint: using little or no training data. We explore different zero- and few-shot models based on entailment and evaluate our systems on several profiling tasks in Spanish and English. In addition, we study the effect of both the entailment hypothesis and the size of the few-shot training sample. We find that entailment-based models outperform supervised text classifiers based on XLM-RoBERTa and that we can reach 80% of the accuracy of previous approaches using less than 50% of the training data on average.
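The entailment formulation the abstract refers to casts each candidate label as a natural-language hypothesis (e.g. "This author is a bot.") and asks whether the input text entails it, so a pre-trained NLI model can classify without task-specific training data. The sketch below illustrates only this reduction; the scorer is a keyword-overlap stub standing in for a real NLI cross-encoder (such as one based on XLM-RoBERTa), and the function names and hypothesis template are illustrative, not the paper's.

```python
import math

def entailment_score(premise: str, hypothesis: str) -> float:
    # Stub standing in for an NLI model. A real scorer would return the
    # entailment logit of a cross-encoder for the (premise, hypothesis)
    # pair; here simple keyword overlap serves as a placeholder.
    hyp_words = set(hypothesis.lower().split())
    prem_words = set(premise.lower().split())
    return len(hyp_words & prem_words) / max(len(hyp_words), 1)

def zero_shot_classify(text: str, labels: list[str], template: str) -> dict[str, float]:
    # One hypothesis per label; softmax over the entailment scores gives
    # a label distribution with no task-specific training required.
    scores = [entailment_score(text, template.format(label)) for label in labels]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}

probs = zero_shot_classify(
    "Follow me and I follow back instantly, around the clock!",
    labels=["bot", "human"],
    template="This author is a {}.",
)
```

Changing only the `template` and `labels` adapts the same model to a new profiling task (gender, hate-speech spreading, etc.), which is what makes the entailment hypothesis itself a design choice worth studying.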
Notes
- 1. According to https://app.dimensions.ai/, the term “Author profiling” tripled its frequency in research publications in the last ten years.
- 8. The accuracy of the eRisk SOTA system corresponds to the DCHR metric used in the shared task.
Acknowledgments
The authors gratefully acknowledge the support of the Pro\(^2\)Haters - Proactive Profiling of Hate Speech Spreaders (CDTi IDI-20210776), XAI-DisInfodemics: eXplainable AI for disinformation and conspiracy detection during infodemics (MICIN PLEC2021-007681), and DETEMP - Early Detection of Depression Detection in Social Media (IVACE IMINOD/2021/72) R&D grants.
7 Appendix
The repository at https://tinyurl.com/ZSandFS-author-profiling contains experimental details and results.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chinea-Rios, M., Müller, T., De la Peña Sarracén, G.L., Rangel, F., Franco-Salvador, M. (2022). Zero and Few-Shot Learning for Author Profiling. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_31
DOI: https://doi.org/10.1007/978-3-031-08473-7_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08472-0
Online ISBN: 978-3-031-08473-7
eBook Packages: Computer Science (R0)