Published September 11, 2017 | Version v1
Dataset
Open
PAN17 Author Identification: Clustering
Creators
- 1. Universität Leipzig
- 2. Bauhaus-Universität Weimar
Description
We provide a collection of up to 50 short documents (paragraphs extracted from larger documents). The task is to identify authorship links and groups of documents by the same author. All documents are single-authored, written in the same language, and belong to the same genre. However, the topic and text length of the documents may vary. The number of distinct authors whose documents are included in the collection is not given.
More information: Link
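The relationship between the two outputs named in the description (groups of documents by the same author, and authorship links) can be illustrated with a minimal sketch. It assumes that an authorship link is an unordered pair of documents placed in the same group. The cluster labels, document names, and the `authorship_links` helper are hypothetical and are not taken from the dataset or from any official PAN tooling.

```python
from itertools import combinations


def authorship_links(clusters):
    """Return the sorted list of unordered document pairs that share a cluster.

    `clusters` maps an arbitrary cluster label to the documents assigned to it.
    """
    links = set()
    for docs in clusters.values():
        # Every pair of documents inside one cluster is treated as a link.
        for first, second in combinations(sorted(docs), 2):
            links.add((first, second))
    return sorted(links)


# Hypothetical clustering: five documents, three assumed authors.
example_clusters = {
    "author_A": ["doc1.txt", "doc4.txt", "doc5.txt"],
    "author_B": ["doc2.txt"],
    "author_C": ["doc3.txt"],
}
example_links = authorship_links(example_clusters)
```

Under this assumption, a cluster of n documents yields n(n-1)/2 links, and a document that sits alone in its cluster yields none. Since the number of distinct authors is not given, the number of clusters has to be estimated as well.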
Files
Name | Size |
---|---|
pan17-author-clustering-test-and-training.zip (md5:bf825e50ccd9581d72ae09345bd4de65) | 961.4 kB |
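The MD5 checksum in the file listing can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are illustrative, and the local path in the usage note is an assumption about where the archive was saved.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "bf825e50ccd9581d72ae09345bd4de65"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file at `path` has the checksum given in the listing."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_listing("pan17-author-clustering-test-and-training.zip")` would return `True` for an uncorrupted copy, assuming the archive was saved under that name in the current directory.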
Additional details
References
- Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso, and Benno Stein. Overview of PAN 2017: Author Identification, Author Profiling, and Author Obfuscation. In Gareth J. F. Jones et al., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Initiative (CLEF 2017), Berlin Heidelberg New York, September 2017. Springer.