Text Reuse Detection in Handwritten Documents

A. V. Grabovoy¹,²,
M. S. Kaprielova¹,²,³,
A. S. Kildyakov¹,
I. O. Potyashin¹,
T. B. Seyil¹,
E. L. Finogeev¹ &
…
Yu. V. Chekhovich¹,³

Abstract

Plagiarism detection in student assignments is becoming increasingly relevant. The rapidly growing popularity of online education and the active expansion of online educational platforms for secondary and high school education create demand for an automatic reuse detection system for handwritten assignments. Existing approaches to this problem cannot search for potential reuse sources in large collections, which significantly limits their applicability. Moreover, real-life data are likely to be low-quality photographs taken with mobile devices. We propose an approach for detecting text reuse in handwritten documents. Each document is an image, and the search is performed over a large collection of potential sources. The proposed method consists of three stages: handwritten text recognition, candidate search, and precise source retrieval. We present experimental results that estimate the quality and latency of our system. Recall reaches 83.3% for higher-quality pictures and 77.4% for lower-quality pictures. The average search time is 3.2 s per document on a CPU. The results show that the system is scalable and can be used in production, where fast reuse detection is needed for hundreds of thousands of student assignments against a large collection of potential reuse sources. All experiments were conducted on the public HWR200 dataset.
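
The abstract names the three stages but does not say how they are implemented. The following Python sketch is only an illustration of how the last two stages could fit together, under assumptions that are not taken from the paper: the recognized text is already available as a string (stage one is not shown), candidates are ranked by the number of shared word shingles found through an inverted index, and the final decision uses shingle containment with an arbitrary threshold. All function names, the shingle size `k`, and the threshold are hypothetical.

```python
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def build_index(sources, k=3):
    """Build an inverted index mapping each shingle to the source ids containing it."""
    index = defaultdict(set)
    for doc_id, text in sources.items():
        for sh in shingles(text, k):
            index[sh].add(doc_id)
    return index


def candidate_search(query_text, index, k=3, top_n=5):
    """Assumed stage two: rank sources by the number of shingles shared with the query."""
    counts = defaultdict(int)
    for sh in shingles(query_text, k):
        for doc_id in index.get(sh, ()):
            counts[doc_id] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top_n]]


def precise_retrieval(query_text, sources, candidates, k=3, threshold=0.3):
    """Assumed stage three: keep candidates whose containment of the query
    (shared shingles divided by query shingles) reaches the threshold."""
    query_shingles = shingles(query_text, k)
    if not query_shingles:
        return []
    results = []
    for doc_id in candidates:
        shared = query_shingles & shingles(sources[doc_id], k)
        containment = len(shared) / len(query_shingles)
        if containment >= threshold:
            results.append((doc_id, containment))
    return results


# Toy collection of potential sources and one recognized (already transcribed) query.
sources = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "an entirely different sentence about handwriting recognition",
}
index = build_index(sources)
query = "the quick brown fox jumps over a fence"
candidates = candidate_search(query, index)
print(precise_retrieval(query, sources, candidates))
```

The inverted index keeps the candidate stage cheap because only sources sharing at least one shingle with the query are touched, and the more expensive comparison runs on the short candidate list alone. Recognition errors would lower the shingle overlap in practice, so a real system would need a more error-tolerant comparison than this exact-match sketch.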

Funding

The work is supported by the Innovation Promotion Fund, project no. 79068, request no. II-208298.

Author information

Authors and Affiliations

Antiplagiat Company, Moscow, Russia
A. V. Grabovoy, M. S. Kaprielova, A. S. Kildyakov, I. O. Potyashin, T. B. Seyil, E. L. Finogeev & Yu. V. Chekhovich
Moscow Institute of Physics and Technology, Moscow, Russia
A. V. Grabovoy & M. S. Kaprielova
Federal Research Center Computer Science and Control, Russian Academy of Sciences, Moscow, Russia
M. S. Kaprielova & Yu. V. Chekhovich

Corresponding authors

Correspondence to A. V. Grabovoy, M. S. Kaprielova, A. S. Kildyakov, I. O. Potyashin, T. B. Seyil, E. L. Finogeev or Yu. V. Chekhovich.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Translated by E. Oborin

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Grabovoy, A.V., Kaprielova, M.S., Kildyakov, A.S. et al. Text Reuse Detection in Handwritten Documents. Dokl. Math. 108 (Suppl 2), S424–S433 (2023). https://doi.org/10.1134/S106456242370120X

Received: 02 September 2023
Revised: 15 September 2023
Accepted: 18 October 2023
Published: 11 March 2024
Issue Date: December 2023
DOI: https://doi.org/10.1134/S106456242370120X
