Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre

  • Project Notes
Language Resources and Evaluation

Abstract

Software tools are of vital importance in corpus-based research, but they can also restrict the types of corpora that are supported and the range of analyses that can be performed. For example, corpus analysis tools, as general-purpose software, do not include specific features for processing corpora of theatre plays. The situation is even worse for parallel corpora of theatrical texts, since there is currently a lack of software that allows for both the alignment and the analysis of such corpora. In this contribution, we first outline the peculiarities of theatre texts and suggest three software features to address them: annotation of the structural units of plays, alignment at the utterance level, and concordances and statistics based on the annotated units. Second, we present the specific functionalities of TAligner and ACM-theatre for building and analysing parallel corpora of play texts, showing how the development of these tools opens up new avenues of research.

Data availability

Not applicable.

Code availability

Software application.

Notes

  1. Following Xiao and Yue (2009, p. 240), a parallel corpus is understood as a set of source texts aligned with their translations. From the perspective of corpus tools, parallel corpora are more complex than monolingual or comparable ones, since issues such as alignment need to be considered (Sanjurjo-González, 2018, p. 25). Although the focus of this contribution is on parallel corpora, the adjustments to tools for structural annotation and specific analytical functions may also be useful for monolingual corpora.

  2. TAligner can be accessed at https://addi.ehu.es/handle/10810/42445.

  3. Apart from theatre plays, film and TV scripts also belong to the dramatic field (Esslin, 1990, p. 31) and share these structural peculiarities. The analysis of these text types could therefore also benefit from the advances in analytical tools suggested here.

  4. Theatre-specific features are marked with asterisks.

  5. http://wp.lancs.ac.uk/shakespearelang/project-resources/data/.

  6. http://wp.lancs.ac.uk/shakespearelang/files/2019/08/ESC-First-Folio-Plus-Manual32483.pdf.

  7. http://wp.lancs.ac.uk/shakespearelang/files/2019/08/ESC-Comparative-Plays-Corpus-Manual32481.pdf.

  8. https://lexically.net/wordsmith/support/shakespeare.html.

  9. https://dracor.org/.

  10. http://corpora.lancs.ac.uk/esc-user-service/.

  11. https://ezlinavis.dracor.org/.

  12. Evert (2014) points out that “cwb-align isn't a particularly sophisticated sentence aligner, so it's likely to get some cases wrong”.

  13. Line breaks might be useful for alignment; however, an utterance can span more than one paragraph, and the use of line breaks in plays may be inconsistent. Moreover, AntPConc offers far fewer features than the tool's monolingual version and can be considered a simple parallel concordancer.

  14. The application offers frequency lists, but so far they treat each text as a whole (Sanz-Villar & Andaluz-Pinedo, 2021).

  15. ACTRES Corpus Manager, registration pending.

  16. https://doi.org/10.5281/zenodo.1212303.

  17. A recently developed tool for custom annotation, OpenTagger (Sanjurjo-González & Andaluz-Pinedo, 2020), is planned to be integrated into ACM-theatre in the future.

References

  • Anthony, L. (2013). A critical look at software tools in corpus linguistics. Linguistic Research. https://doi.org/10.17250/khisli.30.2.201308.001

  • Anthony, L. (2014). AntPConc (Version 1.1.0) [Software]. Tokyo: Waseda University. Retrieved April 3, 2020, from http://www.laurenceanthony.net/

  • Archer, D., Wilson, A., & Rayson, P. (2002). Introduction to the USAS category system. Benedict project report. http://ucrel.lancs.ac.uk/usas/usas%20guide.pdf

  • Arrula, G. (2018). Autoitzulpenaren teoria eta praktika Euskal Herrian / Theory and practice of self-translation in the Basque Country (Doctoral dissertation). Bilbao: Universidad del País Vasco. Retrieved April 3, 2020, from http://hdl.handle.net/10810/27983

  • Bandín, E. (2007). Traducción, recepción y censura de teatro clásico inglés en la España de Franco. Estudio descriptivo-comparativo del Corpus TRACEtci (1939–1985) (Doctoral dissertation). Universidad de León. Retrieved April 3, 2020, from https://buleria.unileon.es/handle/10612/1885

  • Culpeper, J. (2014). Keywords and Characterization. An Analysis of Six Characters in Romeo and Juliet. In D. L. Hoover, J. Culpeper, & K. O’Halloran (Eds.), Digital literary studies. Corpus approaches to poetry, prose and drama (pp. 9–33). Routledge.

  • Doval, I., & Sánchez-Nieto, T. (2019). Parallel corpora in focus: An account of current achievements and challenges. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 1–15). John Benjamins.

  • Esslin, M. (1990). The Field of Drama. Methuen.

  • Evert, S., & Hardie, A. (2011). Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In Proceedings of the Corpus Linguistics Conference 2011. University of Birmingham.

  • Evert, S. (2014). [CWB] A question about the aligning using cwb-encoding (CWB mailing list). Retrieved April 3, 2020, from http://liste.sslmit.unibo.it/pipermail/cwb/2014-January/001529.html

  • Evert, S. (2020). The IMS Open Corpus Workbench (CWB) CQP Query Language Tutorial. Retrieved September 2, 2020, from http://cwb.sourceforge.net/files/CQP_Tutorial.pdf

  • Gutiérrez-Lanza, C., Bandín, E., García-González, J. E., & Lobejón-Santos, S. (2015). Desarrollo de software de etiquetado y alineación textual: TRACE Corpus Tagger/Aligner 1.0©. Paper presented at the II Congreso Internacional de Humanidades Digitales Hispánicas: Innovación, globalización e impacto, Madrid, Spain. Retrieved April 3, 2020, from http://hdh2015.linhd.es/ebook/hdh15-gutierrezlanza.xhtml

  • Hardie, A. (2012). CQPweb—Combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics. https://doi.org/10.1075/ijcl.17.3.04har

  • Johansson, S., & Hofland, K. (1994). Towards an English-Norwegian parallel corpus. In U. Fries, G. Tottie, & P. Schneider (Eds.), Creating and Using English Language Corpora (pp. 25–37). Rodopi.

  • Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The sketch engine: Ten years on. Lexicography. https://doi.org/10.1007/s40607-014-0009-9

  • Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. Proceedings of Machine Translation Summit, X(5), 79–86.

  • Lavid, J. (2019). Discourse annotation in the MULTINOT corpus: Issues and challenges. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 159–182). John Benjamins.

  • Marco, J. (2019). Living with parallel corpora: The potentials and limitations of their use in translation research. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 39–56). John Benjamins.

  • McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press.

  • Merino-Álvarez, R. (1992). Rewriting for the Spanish stage. KOINÉ. Annali della Scuola Superiore per Interpreti e Traduttori San Pellegrino, 2(1–2), 283–289.

  • Merino-Álvarez, R. (1994). Traducción, tradición y manipulación. Teatro inglés en España 1950–1990. Universidad de León/Universidad del País Vasco.

  • Merino-Álvarez, R. (2007). La homosexualidad censurada: estudio sobre corpus de teatro TRACEti inglés-español (desde 1960). In R. Merino-Álvarez (Ed.), Traducción y censura en España (1939–1985). Estudios sobre corpus TRACE: cine, narrativa, teatro. Universidad de León/Universidad del País Vasco.

  • Merino-Álvarez, R., & Andaluz-Pinedo, O. (2017). Peter Shaffer en la cultura española. Creneida. Anuario De Literaturas Hispánicas, 5, 239–278.

  • Miller, A. (1955). The Crucible. Penguin Books.

  • Molés-Casés, T., & Oster, U. (2019). Indexation and analysis of a parallel corpus using CQPweb: The COVALT PAR_ES Corpus (EN/FR/DE > ES). In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 197–214). John Benjamins.

  • Oksefjell, S. (1999). A description of the English-Norwegian parallel corpus. International Journal of Corpus Linguistics, 4(2), 197–219.

  • Pérez, M. (2004). Traducciones censuradas de teatro norteamericano en la España de Franco (1939–1963) (Doctoral dissertation). Universidad del País Vasco.

  • Rafalovitch, A., & Dale, R. (2009). United Nations general assembly resolutions: A six-language parallel corpus. MT Summit XII (pp. 292–299). AMTA.

  • Sanjurjo-González, H. (2017a). ACTRES Corpus Manager [Computer software].

  • Sanjurjo-González, H. (2017b). Creación de un framework para el tratamiento de corpus lingüísticos - Development of a framework for corpus linguistic (Doctoral dissertation). Universidad de León, León, Spain. Retrieved April 3, 2020, from https://buleria.unileon.es/handle/10612/6920

  • Sanjurjo-González, H., & Andaluz-Pinedo, O. (2020). OpenTagger: A flexible and user-friendly linguistic tagger. 56th Linguistics Colloquium. http://hdl.handle.net/10810/48683

  • Sanjurjo-González, H. (2018). Creación de un framework para el tratamiento de corpus lingüísticos (Development of a framework for corpus linguistic analysis). Universidad de León, Área de Publicaciones.

  • Sanjurjo-González, H., & Izquierdo, M. (2019). P-ACTRES 2.0: A parallel corpus for cross-linguistic research. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 215–232). John Benjamins.

  • Sanz-Villar, Z. (2015). Unitate fraseologikoen itzulpena: alemana-euskara. Literatur testuen corpusean oinarritutako analisia (Doctoral dissertation). University of the Basque Country UPV/EHU. Retrieved April 3, 2020, from http://hdl.handle.net/10810/15128

  • Sanz-Villar, Z. (2019). An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 233–247). John Benjamins.

  • Sanz-Villar, Z., & Andaluz-Pinedo, O. (2021). TAligner 3.0: a tool to create parallel and multilingual corpora. In J. Lavid, C. Maíz-Arévalo, & J. R. Zamorano-Mansilla (Eds.), Corpora in translation research: recent advances and applications (pp. 126–146). John Benjamins.

  • Scott, M. (2012). WordSmith Tools (Version 6) [Software]. Stroud: Lexical Analysis Software. Retrieved April 3, 2020, from http://www.lexically.net/wordsmith/

  • Stührenberg, M. (2012). The TEI and current standards for structuring linguistic data. An overview. Journal of the Text Encoding Initiative. https://doi.org/10.4000/jtei.523

  • TEI Consortium (2019). Performance Texts. TEI P5: Guidelines for Electronic Text Encoding and Interchange (pp. 234–259). https://tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf

  • Xiao, R., & Yue, M. (2009). Using corpora in translation studies: The state of the art. In P. Baker (Ed.), Contemporary Corpus Linguistics (pp. 237–261). Continuum.

  • Zeldes, A., Ritz, J., Lüdeling, A., & Chiarcos, C. (2009). ANNIS: A search tool for multi-layer annotated corpora. In M. Mahlberg, V. González Díaz, & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference 2009 (pp. 358–362). University of Liverpool.

  • Zubillaga, N. (2013). Alemanetik euskaratutako haur- eta gazte-literatura: zuzeneko nahiz zeharkako itzulpenen azterketa corpus baten bidez (Doctoral dissertation). Universidad del País Vasco. Retrieved April 3, 2020, from http://hdl.handle.net/10810/12431

  • Zubillaga, N., Sanz-Villar, Z., & Uribarri, I. (2015). Building a trilingual parallel corpus to analyse literary translations from German into Basque. In C. Fantinuoli & F. Zanettin (Eds.), New directions in corpus-based translation studies (pp. 71–92). Language Science Press.

Acknowledgements

Research group TRALIMA/ITZULIK, University of the Basque Country UPV/EHU, GIU 16/48, Basque Government consolidated research group, IT1209-19. Research group ACTRES. Part of this study has been supported by the Spanish Agency for Research, Development and Innovation (Ministry of Economy and Competitiveness) [FFI2016-75672-R]. Red de Excelencia CorpusNet, funded by Ministry of Economy and Competitiveness project [FFI2016-81934-RED]. At the time of writing, the co-author Olaia Andaluz-Pinedo is a doctoral student funded by the University of the Basque Country UPV/EHU [PIF]. We would like to thank the reviewers for their useful comments.

Author information

Corresponding author

Correspondence to Olaia Andaluz-Pinedo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Andaluz-Pinedo, O., Sanjurjo-González, H. Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre. Lang Resources & Evaluation 56, 651–671 (2022). https://doi.org/10.1007/s10579-022-09585-5

  • DOI: https://doi.org/10.1007/s10579-022-09585-5
