DOI: 10.1145/3549737.3549750
research-article

Authorship Attribution in Greek Literature Using Word Adjacencies

Published: 09 September 2022

Abstract

Authorship attribution stems from the idea that one can use a text to derive useful conclusions about its author. It is a rather old idea that has recently gained momentum thanks to major advances in computer science. These advances enable researchers to process large amounts of information in reasonable time and to employ sophisticated algorithms for extracting textual features that may reveal something useful about the author. This paper investigates a method for authorship attribution based on Word Adjacency Networks (WANs). The method was originally proposed for authorship attribution in English corpora by Segarra, Eisen, and Ribeiro [23]. The paper builds upon this method by demonstrating its capability to capture the stylometric patterns of authors in Greek literature. Unlike English, Greek is a strongly synthetic language. The data used in the experiments comprise literary works by 20 Greek authors who belong to two groups. The first group consists of 9 authors (Eftaliotis, Karkavitsas, Kondylakes, Moraitides, Nirvanas, Papadiamantes, Rhoides, Vikelas, and Viziinos). The second group consists of the remaining 11 authors (Delta, Empirikos, Karagatsis, Kastanakis, Kontoglou, Mirivilis, Politis, Prevelakis, Terzakis, Theotokas, and Venezis), and its nucleus is the so-called Generation of the '30s. The problem is formulated mathematically, and the attribution algorithm is tested for a wide range of parameter values. The experimental findings show that authorship attribution with WANs can give satisfactory results as long as there are adequate training data. After fine-tuning the parameters of the method, the total attribution accuracy was found to be in the first group of authors and in the second group of authors. This demonstrates the potential of the method and its applicability to morphologically and syntactically rich languages.
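
The abstract does not spell out the attribution procedure, so the following is only a minimal Python sketch of the general idea behind WAN-based attribution as introduced in [23]: count how often function words follow one another within a short window, normalise the counts into a Markov chain, and attribute an unknown text to the author whose chain is closest in relative entropy. The function names, the `window`, `decay`, and `eps` parameters, and the toy Greek function-word list are assumptions made for illustration and are not taken from the paper. Two simplifications are also assumed: each row is weighted by the share of edge weight leaving its source word rather than by a true stationary distribution, and transitions absent from an author profile receive a small floor probability `eps`.

```python
import math
from collections import defaultdict


def build_wan(tokens, function_words, window=5, decay=0.75):
    """Build a row-normalised word adjacency network from a token list.

    Returns (transitions, source_weights):
      transitions[src][tgt] is the probability of moving from src to tgt,
      source_weights[src] is the share of total edge weight leaving src.
    """
    vocab = set(function_words)
    weights = defaultdict(lambda: defaultdict(float))
    for i, src in enumerate(tokens):
        if src not in vocab:
            continue
        # Look ahead up to `window` tokens; nearer words count more.
        for dist in range(1, window + 1):
            if i + dist >= len(tokens):
                break
            tgt = tokens[i + dist]
            if tgt in vocab:
                weights[src][tgt] += decay ** (dist - 1)

    totals = {src: sum(row.values()) for src, row in weights.items()}
    grand_total = sum(totals.values())
    transitions = {
        src: {tgt: w / totals[src] for tgt, w in row.items()}
        for src, row in weights.items()
    }
    source_weights = {src: t / grand_total for src, t in totals.items()}
    return transitions, source_weights


def relative_entropy(text_wan, text_weights, author_wan, eps=1e-6):
    """Relative entropy of the text chain with respect to an author chain.

    Assumption: transitions missing from the author profile get the floor
    probability `eps` instead of a properly smoothed estimate.
    """
    total = 0.0
    for src, row in text_wan.items():
        author_row = author_wan.get(src, {})
        for tgt, p in row.items():
            q = author_row.get(tgt, eps)
            total += text_weights[src] * p * math.log(p / q)
    return total


def attribute(unknown_tokens, author_tokens, function_words, **wan_kwargs):
    """Return (best_author, scores); the lowest relative entropy wins."""
    text_wan, text_weights = build_wan(unknown_tokens, function_words, **wan_kwargs)
    scores = {}
    for author, tokens in author_tokens.items():
        author_wan, _ = build_wan(tokens, function_words, **wan_kwargs)
        scores[author] = relative_entropy(text_wan, text_weights, author_wan)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical function-word list and toy texts, for illustration only.
    greek_function_words = ["και", "να", "το", "με"]
    profiles = {
        "author_1": "και το σπίτι να το δει με το φως και να".split(),
        "author_2": "να με δει και με το φως να και το".split(),
    }
    unknown = "και το φως να το δει".split()
    print(attribute(unknown, profiles, greek_function_words))
```

In practice the profiles would be built from much longer training texts, which matches the abstract's remark that the approach needs adequate training data.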

References

[1]
Georgios K. Mikros. 2015. Υπολογιστική Υφολογία [Computational Stylistics]. Hellenic Academic Electronic Textbooks and Aids, Hellenic Academic Libraries Link, National Technical University of Athens, Iroon Polytechniou 9, 15780 Zografou. https://repository.kallipos.gr/handle/11419/4860
[2]
Ahmed Abbasi and Hsinchun Chen. 2005. Applying Authorship Analysis to Extremist-Group Web Forum Messages. IEEE Intelligent Systems 20, 5 (Sept. 2005), 67–75. https://doi.org/10.1109/mis.2005.81
[3]
Nikoletta Bassiou and Constantine Kotropoulos. 2011. Long distance bigram models applied to word clustering. Pattern Recognition 44, 1 (Jan. 2011), 145–158. https://doi.org/10.1016/j.patcog.2010.07.006
[4]
John F. Burrows. 1987. Word-Patterns and Story-Shapes: The Statistical Analysis of Narrative Style. Literary and Linguistic Computing 2, 2 (Jan. 1987), 61–70. https://doi.org/10.1093/llc/2.2.61
[5]
Esteban Castillo, Darnes Vilariño, Ofelia Cervantes, and David Pinto. 2015. Author attribution using a graph based representation. In Proceedings of the 2015 International Conference on Electronics, Communications and Computers (CONIELECOMP). IEEE, Cholula, Puebla, Mexico, 135–142. https://doi.org/10.1109/CONIELECOMP.2015.7086940
[6]
Corinna Cortes, Patrick Haffner, and Mehryar Mohri. 2008. A Machine Learning Framework for Spoken-Dialog Classification. In Springer Handbook of Speech Processing. Springer Berlin Heidelberg, Berlin, Heidelberg, 585–596. https://doi.org/10.1007/978-3-540-49127-9_29
[7]
Marc Franco-Salvador, Paolo Rosso, and Manuel Montes y Gómez. 2016. A systematic study of knowledge graph analysis for cross-language plagiarism detection. Information Processing & Management 52, 4 (July 2016), 550–570. https://doi.org/10.1016/j.ipm.2015.12.004
[8]
David L. Hoover. 2003. Frequent Collocations and Authorial Style. Literary and Linguistic Computing 18, 3 (Sept. 2003), 261–286. https://doi.org/10.1093/llc/18.3.261
[9]
Fereshteh Jafariakinabad. 2021. Machine Learning Techniques for Topic Detection and Authorship Attribution in Textual Data. Ph.D. Dissertation. University of Central Florida, Electronic Theses and Dissertations, 2020-. 884. https://stars.library.ucf.edu/etd2020/884
[10]
Patrick Juola. 2007. Authorship Attribution. Foundations and Trends® in Information Retrieval 1, 3 (2007), 233–334. https://doi.org/10.1561/1500000005
[11]
George Kesidis and Jean Walrand. 1993. Relative entropy between Markov transition rate matrices. IEEE Transactions on Information Theory 39, 3 (May 1993), 1056–1057. https://doi.org/10.1109/18.256516
[12]
Solomon Kullback and Richard A. Leibler. 1951. On Information and Sufficiency. The Annals of Mathematical Statistics 22, 1 (1951), 79–86. http://www.jstor.org/stable/2236703
[13]
Cyril Labbé and Dominique Labbé. 2012. Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science? Scientometrics 94, 1 (June 2012), 379–396. https://doi.org/10.1007/s11192-012-0781-y
[14]
Nikos Manousakis. 2020. Prometheus Bound – A Separate Authorial Trace in the Aeschylean Corpus. Trends in Classics – Supplementary Volumes, edited by Franco Montanari and Antonios Rengakos, Vol. 98. Routledge, Berlin, Boston.
[15]
Peter Martin. 1995. Edmond Malone: A literary biography. Cambridge University Press, Cambridge, England.
[16]
Alexander Mehler, Wahed Hemati, Tolga Uslu, and Andy Lücking. 2018. A Multidimensional Model of Syntactic Dependency Trees for Authorship Attribution. In Quantitative Analysis of Dependency Structures, Jingyang Jiang and Haitao Liu (Eds.). De Gruyter Mouton, Berlin, Munich, Boston, 315–348. https://doi.org/10.1515/9783110573565-016
[17]
Thomas C. Mendenhall. 1887. The Characteristic Curves of Composition. Science ns-9, 214s (March 1887), 237–246. https://doi.org/10.1126/science.ns-9.214s.237
[18]
Frederick Mosteller. 1987. A Statistical Study of the Writing Styles of the Authors of “The Federalist” Papers. Proceedings of the American Philosophical Society 131, 2 (1987), 132–140. http://www.jstor.org/stable/986786
[19]
Evangelia Pantraki, Ioannis Tsingalis, and Constantine Kotropoulos. 2022. Cross-lingual transfer learning: A PARAFAC2 approach. Pattern Recognition Letters 159 (2022), 167–173. https://doi.org/10.1016/j.patrec.2022.05.008
[20]
Pervez Rizvi. 2020. Authorship Attribution for Early Modern Plays Using Function Word Adjacency Networks: A Critical View. ANQ: A Quarterly Journal of Short Articles, Notes, and Reviews 33, 4 (2020), 328–331. https://doi.org/10.1080/0895769X.2018.1554473
[21]
Joseph Rudman. 2000. Non-traditional authorship attribution studies: ignis fatuus or Rosetta Stone? Bulletin (Bibliographical Society of Australia and New Zealand) 24, 3 (2000), 163–176. https://search.informit.org/doi/10.3316/ielapa.200105720
[22]
Santiago Segarra, Mark Eisen, Gabriel Egan, and Alejandro Ribeiro. 2020. A Response to Rosalind Barber’s Critique of the Word Adjacency Method for Authorship Attribution. ANQ: A Quarterly Journal of Short Articles, Notes, and Reviews 33, 4 (2020), 332–337. https://doi.org/10.1080/0895769X.2019.1590797
[23]
Santiago Segarra, Mark Eisen, and Alejandro Ribeiro. 2015. Authorship Attribution Through Function Word Adjacency Networks. IEEE Transactions on Signal Processing 63, 20 (Oct. 2015), 5464–5478. https://doi.org/10.1109/tsp.2015.2451111
[24]
Efstathios Stamatatos. 2009. A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology 60, 3 (March 2009), 538–556. https://doi.org/10.1002/asi.21001
[25]
Hans van Halteren, Harald Baayen, Fiona Tweedie, Marco Haverkort, and Anneke Neijt. 2005. New Machine Learning Methods Demonstrate the Existence of a Human Stylome. Journal of Quantitative Linguistics 12, 1 (April 2005), 65–77. https://doi.org/10.1080/09296170500055350

Cited By

  • (2023) A Transformer-Based Approach to Authorship Attribution in Classical Arabic Texts. Applied Sciences 13:12 (7255). https://doi.org/10.3390/app13127255. Online publication date: 18-Jun-2023

Published In

SETN '22: Proceedings of the 12th Hellenic Conference on Artificial Intelligence
September 2022
450 pages
ISBN: 9781450395977
DOI: 10.1145/3549737
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Markov chains
  2. authorship attribution
  3. text annotation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SETN 2022

Article Metrics

  • Downloads (Last 12 months): 23
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 26 Sep 2024