Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre

Abstract

Software tools are of vital importance in corpus-based research, but they can also restrict the types of corpora supported and the range of analyses that can be performed. For example, corpus analysis tools, as general-purpose software, do not include specific features to process corpora of theatre plays. The situation is even worse for parallel corpora of theatrical texts, since there is currently a lack of software that allows for both the alignment and the analysis of such corpora. In this contribution, we first outline the peculiarities of theatre texts and suggest three software features to address them: annotation of the structural units of plays, alignment at the utterance level, and concordances and statistics using the annotated units. Second, we present the specific functionalities of TAligner and ACM-theatre for building and analysing parallel corpora of play texts, showing how new avenues of research are opening up with the development of these tools.

Data availability

Not applicable.

Code availability

Software application.

Notes

Following Xiao and Yue (2009, p. 240), a parallel corpus is understood as a set of source texts aligned with their translations. From the perspective of corpus tools, parallel corpora are more complex than monolingual or comparable ones, since issues such as alignment need to be considered (Sanjurjo-González, 2018, p. 25). Although this contribution focuses on parallel corpora, the tool adjustments for structural annotation and the specific analytical functions may also be useful for monolingual corpora.
TAligner can be accessed at https://addi.ehu.es/handle/10810/42445.
Apart from theatre plays, film and TV scripts also belong to the dramatic field (Esslin, 1990, p. 31) and share these structural peculiarities. The analysis of these text types could therefore also benefit from the advances in analytical tools suggested here.
Theatre-specific features are marked with asterisks.
http://wp.lancs.ac.uk/shakespearelang/project-resources/data/.
http://wp.lancs.ac.uk/shakespearelang/files/2019/08/ESC-First-Folio-Plus-Manual32483.pdf.
http://wp.lancs.ac.uk/shakespearelang/files/2019/08/ESC-Comparative-Plays-Corpus-Manual32481.pdf.
https://lexically.net/wordsmith/support/shakespeare.html.
https://dracor.org/.
http://corpora.lancs.ac.uk/esc-user-service/.
https://ezlinavis.dracor.org/.
Evert (2014) points out that “cwb-align isn't a particularly sophisticated sentence aligner, so it's likely to get some cases wrong”.
Line breaks might be useful for alignment; however, an utterance can span more than one paragraph, and the use of line breaks in plays may be inconsistent. Moreover, AntPConc offers far fewer features than the tool's monolingual version and can be considered a simple parallel concordancer.
The application offers frequency lists, but so far they take texts as wholes (Sanz-Villar & Andaluz-Pinedo, 2021).
Registration of ACTRES Corpus Manager is pending.
https://doi.org/10.5281/zenodo.1212303.
A recently developed tool for custom annotation, OpenTagger (Sanjurjo-González & Andaluz-Pinedo, 2020), is planned to be integrated into ACM-theatre in the future.

References

Anthony, L. (2013). A critical look at software tools in corpus linguistics. Linguistic Research. https://doi.org/10.17250/khisli.30.2.201308.001
Anthony, L. (2014). AntPConc (Version 1.1.0) [Software]. Tokyo: Waseda University. Retrieved April 3, 2020, from http://www.laurenceanthony.net/
Archer, D., Wilson, A. & Rayson, P. (2002). Introduction to the USAS category system. Benedict project report. http://ucrel.lancs.ac.uk/usas/usas%20guide.pdf
Arrula, G. (2018). Autoitzulpenaren teoria eta praktika Euskal Herrian / Theory and practice of self-translation in the Basque Country (Doctoral dissertation). Bilbao: Universidad del País Vasco. Retrieved April 3, 2020, from http://hdl.handle.net/10810/27983
Bandín, E. (2007). Traducción, recepción y censura de teatro clásico inglés en la España de Franco. Estudio descriptivo-comparativo del Corpus TRACEtci (1939–1985) (Doctoral dissertation). Universidad de León. Retrieved April 3, 2020, from https://buleria.unileon.es/handle/10612/1885
Culpeper, J. (2014). Keywords and Characterization. An Analysis of Six Characters in Romeo and Juliet. In D. L. Hoover, J. Culpeper, & K. O’Halloran (Eds.), Digital literary studies. Corpus approaches to poetry, prose and drama (pp. 9–33). Routledge.
Doval, I., & Sánchez-Nieto, T. (2019). Parallel corpora in focus: An account of current achievements and challenges. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 1–15). John Benjamins.
Esslin, M. (1990). The Field of Drama. Methuen.
Evert, S., & Hardie, A. (2011). Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In Proceedings of the Corpus Linguistics Conference 2011. University of Birmingham.
Evert, S. (2014). [CWB] A question about the aligning using cwb-encoding (CWB mailing list). Retrieved April 3, 2020, from http://liste.sslmit.unibo.it/pipermail/cwb/2014-January/001529.html
Evert, S. (2020). The IMS Open Corpus Workbench (CWB) CQP Query Language Tutorial. Retrieved September 2, 2020, from http://cwb.sourceforge.net/files/CQP_Tutorial.pdf
Gutiérrez-Lanza, C., Bandín, E., García-González, J. E., & Lobejón-Santos, S. (2015). Desarrollo de software de etiquetado y alineación textual: TRACE Corpus Tagger/Aligner 1.0©. Paper presented at the II Congreso Internacional de Humanidades Digitales Hispánicas: Innovación, globalización e impacto, Madrid, Spain. Retrieved April 3, 2020, from http://hdh2015.linhd.es/ebook/hdh15-gutierrezlanza.xhtml
Hardie, A. (2012). CQPweb—Combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics. https://doi.org/10.1075/ijcl.17.3.04har
Johansson, S., & Hofland, K. (1994). Towards an English-Norwegian parallel corpus. In U. Fries, G. Tottie, & P. Schneider (Eds.), Creating and Using English Language Corpora (pp. 25–37). Rodopi.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The sketch engine: Ten years on. Lexicography. https://doi.org/10.1007/s40607-014-0009-9
Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. Proceedings of Machine Translation Summit, X(5), 79–86.
Lavid, J. (2019). Discourse annotation in the MULTINOT corpus: Issues and challenges. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 159–182). John Benjamins.
Marco, J. (2019). Living with parallel corpora: The potentials and limitations of their use in translation research. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 39–56). John Benjamins.
McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press.
Merino-Álvarez, R. (2007). La homosexualidad censurada: estudio sobre corpus de teatro TRACEti inglés-español (desde 1960). In R. Merino-Álvarez (Ed.), Traducción y censura en España (1939–1985). Estudios sobre corpus TRACE: cine, narrativa, teatro. Universidad de León/Universidad del País Vasco.
Merino-Álvarez, R. (1992). Rewriting for the Spanish stage. KOINÉ. Annali della Scuola Superiore per Interpreti e Traduttori San Pellegrino, 2(1–2), 283–289.
Merino-Álvarez, R. (1994). Traducción, tradición y manipulación: Teatro inglés en España 1950–1990. Universidad de León/Universidad del País Vasco.
Merino-Álvarez, R., & Andaluz-Pinedo, O. (2017). Peter Shaffer en la cultura española. Creneida. Anuario De Literaturas Hispánicas, 5, 239–278.
Miller, A. (1955). The Crucible. Penguin Books.
Molés-Casés, T., & Oster, U. (2019). Indexation and analysis of a parallel corpus using CQPweb: The COVALT PAR_ES Corpus (EN/FR/DE > ES). In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 197–214). John Benjamins.
Oksefjell, S. (1999). A description of the English-Norwegian parallel corpus. International Journal of Corpus Linguistics, 4(2), 197–219.
Pérez, M. (2004). Traducciones censuradas de teatro norteamericano en la España de Franco (1939–1963) (Doctoral dissertation). Universidad del País Vasco.
Rafalovitch, A., & Dale, R. (2009). United Nations general assembly resolutions: A six-language parallel corpus. MT Summit XII (pp. 292–299). AMTA.
Sanjurjo-González, H. (2017b). Creación de un framework para el tratamiento de corpus lingüísticos - Development of a framework for corpus linguistic (Doctoral dissertation). Universidad de León, León, Spain. Retrieved April 3, 2020, from https://buleria.unileon.es/handle/10612/6920
Sanjurjo-González, H. (2017a). ACTRES Corpus Manager. [Computer software].
Sanjurjo-González, H., & Andaluz-Pinedo, O. (2020). OpenTagger: A flexible and user-friendly linguistic tagger. 56th Linguistics Colloquium. http://hdl.handle.net/10810/48683
Sanjurjo-González, H. (2018). Creación de un framework para el tratamiento de corpus lingüísticos (Development of a framework for corpus linguistic analysis). Universidad de León, Área de Publicaciones.
Sanjurjo-González, H., & Izquierdo, M. (2019). P-ACTRES 2.0: A parallel corpus for cross-linguistic research. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 215–232). John Benjamins.
Sanz-Villar, Z. (2015). Unitate fraseologikoen itzulpena: alemana-euskara. Literatur testuen corpusean oinarritutako analisia (Doctoral dissertation). University of the Basque Country UPV/EHU. Retrieved April 3, 2020, from http://hdl.handle.net/10810/15128
Sanz-Villar, Z. (2019). An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 233–247). John Benjamins.
Sanz-Villar, Z., & Andaluz-Pinedo, O. (2021). TAligner 3.0: a tool to create parallel and multilingual corpora. In J. Lavid, C. Maíz-Arévalo, & J. R. Zamorano-Mansilla (Eds.), Corpora in translation research: recent advances and applications (pp. 126–146). John Benjamins.
Scott, M. (2012). WordSmith Tools (Version 6) [Software]. Stroud: Lexical Analysis Software. Retrieved April 3, 2020, from http://www.lexically.net/wordsmith/
Stührenberg, M. (2012). The TEI and current standards for structuring linguistic data. An overview. Journal of the Text Encoding Initiative. https://doi.org/10.4000/jtei.523
TEI Consortium (2019). Performance Texts. TEI P5: Guidelines for Electronic Text Encoding and Interchange (pp. 234–259). https://tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf
Xiao, R., & Yue, M. (2009). Using corpora in translation studies: The state of the art. In P. Baker (Ed.), Contemporary Corpus Linguistics (pp. 237–261). Continuum.
Zeldes, A., Lüdeling, A., Julia, R., & Chiarcos, C. (2009). ANNIS: a search tool for multi-layer annotated corpora. In M. Mahlberg, V. González Díaz & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference 2009 (pp. 358–362). University of Liverpool.
Zubillaga, N. (2013). Alemanetik euskaratutako haur- eta gazte-literatura: zuzeneko nahiz zeharkako itzulpenen azterketa corpus baten bidez (Doctoral dissertation). Universidad del País Vasco. Retrieved April 3, 2020, from http://hdl.handle.net/10810/12431
Zubillaga, N., Sanz-Villar, Z., & Uribarri, I. (2015). Building a trilingual parallel corpus to analyse literary translations from German into Basque. In C. Fantinuoli & F. Zanettin (Eds.), New directions in corpus-based translation studies (pp. 71–92). Language Science Press.

Acknowledgements

Research group TRALIMA/ITZULIK, University of the Basque Country UPV/EHU, GIU 16/48, Basque Government consolidated research group, IT1209-19. Research group ACTRES. Part of this study has been supported by the Spanish Agency for Research, Development and Innovation (Ministry of Economy and Competitiveness) [FFI2016-75672-R]. Red de Excelencia CorpusNet, funded by Ministry of Economy and Competitiveness project [FFI2016-81934-RED]. At the time of writing, the co-author Olaia Andaluz-Pinedo is a doctoral student funded by the University of the Basque Country UPV/EHU [PIF]. We would like to thank the reviewers for their useful comments.

Author information

Authors and Affiliations

University of the Basque Country UPV/EHU, Leioa, Spain
Olaia Andaluz-Pinedo
University of Deusto, Bilbao, Spain
Hugo Sanjurjo-González

Corresponding author

Correspondence to Olaia Andaluz-Pinedo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Consent to participate.

Not applicable.

Consent for publication.

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Andaluz-Pinedo, O., Sanjurjo-González, H. Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre. Lang Resources & Evaluation 56, 651–671 (2022). https://doi.org/10.1007/s10579-022-09585-5

Accepted: 31 January 2022
Published: 24 February 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s10579-022-09585-5