Zero and Few-Shot Learning for Author Profiling

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2022)

Abstract

Author profiling classifies author characteristics by analyzing how language is shared among people. In this work, we study that task from a low-resource viewpoint: using little or no training data. We explore different zero- and few-shot models based on entailment and evaluate our systems on several profiling tasks in Spanish and English. In addition, we study the effect of both the entailment hypothesis and the size of the few-shot training sample. We find that entailment-based models outperform supervised text classifiers based on XLM-RoBERTa and that we can reach 80% of the accuracy of previous approaches using less than 50% of the training data on average.
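As a rough illustration of the entailment formulation mentioned above, the sketch below turns each candidate label into a hypothesis sentence and picks the label whose hypothesis is most strongly entailed by the text. It is a minimal sketch and not the authors' implementation: the label names, the hypothesis wording, and the `toy_entailment_score` function (a word-overlap stand-in for a real NLI model) are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical labels and hypothesis sentences (assumptions for illustration;
# the hypotheses actually studied in the paper are not reproduced here).
LABEL_HYPOTHESES: Dict[str, str] = {
    "bot": "This text was written by a bot.",
    "human": "This text was written by a human.",
}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: fraction of hypothesis words found in the premise.

    A real system would return the entailment probability of a trained
    entailment model instead of this word-overlap heuristic.
    """
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().rstrip(".").split()
    hits = sum(1 for word in hypothesis_words if word in premise_words)
    return hits / len(hypothesis_words)


def zero_shot_classify(
    text: str,
    hypotheses: Dict[str, str],
    score: Callable[[str, str], float],
) -> str:
    """Return the label whose hypothesis receives the highest entailment score."""
    scores = {label: score(text, hyp) for label, hyp in hypotheses.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sample = "I am a bot and this text was posted automatically"
    print(zero_shot_classify(sample, LABEL_HYPOTHESES, toy_entailment_score))
```

Because the scorer is passed in as a function, the toy heuristic can be swapped for an actual entailment model without changing the classification logic; changing the sentences in `LABEL_HYPOTHESES` is then one way to probe how the choice of hypothesis affects the prediction.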

Notes

  1. According to https://app.dimensions.ai/, the term “Author profiling” tripled its frequency in research publications in the last ten years.

  2. https://pan.webis.de/.

  3. https://tinyurl.com/st-agg-cluster.

  4. https://tinyurl.com/bart-large-snli-mnli.

  5. https://tinyurl.com/xlm-roberta-large-xnli-anli.

  6. https://tinyurl.com/paraphrase-multilingual-mpnet.

  7. The standard deviation results of Table 5 are omitted due to space. However, this does not affect our analysis. Omitted values can be found in Appendix (Sect. 7).

  8. The accuracy of the eRisk SOTA system corresponds to the DCHR metric used in the shared task.

Acknowledgments

The authors gratefully acknowledge the support of the Pro²Haters - Proactive Profiling of Hate Speech Spreaders (CDTi IDI-20210776), XAI-DisInfodemics: eXplainable AI for disinformation and conspiracy detection during infodemics (MICIN PLEC2021-007681), and DETEMP - Early Detection of Depression Detection in Social Media (IVACE IMINOD/2021/72) R&D grants.

Author information

Corresponding author

Correspondence to Marc Franco-Salvador.

7 Appendix

The repository at https://tinyurl.com/ZSandFS-author-profiling contains experimental details and results.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chinea-Rios, M., Müller, T., De la Peña Sarracén, G.L., Rangel, F., Franco-Salvador, M. (2022). Zero and Few-Shot Learning for Author Profiling. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_31

  • DOI: https://doi.org/10.1007/978-3-031-08473-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08472-0

  • Online ISBN: 978-3-031-08473-7

  • eBook Packages: Computer Science, Computer Science (R0)
