Abstract
Plagiarism detection is widely used to assess the originality of submitted work. In this paper, we address the problem of predicting similarities among a collection of documents, a task with widespread use in academic institutions. We propose a simple yet effective method for detecting plagiarism that uses a robust word detection and segmentation procedure followed by a convolutional neural network (CNN) and bidirectional long short-term memory (biLSTM) pipeline to extract the text. Our approach also extracts and encodes common patterns, such as scratches in handwriting, to improve accuracy in real-world use cases. The information extracted from multiple documents is then compared using comparison metrics to find the documents that have been plagiarized from a source. Extensive experiments show that this approach may help simplify the examination process and can act as a cheap, viable alternative to many modern approaches for detecting plagiarism in handwritten documents.
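The final comparison step can be illustrated with a minimal sketch. The abstract only says that "comparison metrics" are applied to the extracted text, so everything specific here is an assumption made for illustration: the bag-of-words representation, the choice of cosine similarity, the regular-expression tokenizer, the 0.8 threshold, and all function names. The sketch also assumes the recognition stage has already turned each handwritten document into a plain string.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_counts(text):
    """Lowercase the text and count its word tokens (hypothetical tokenizer)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_similar_pairs(documents, threshold=0.8):
    """Return (name_a, name_b, score) for each pair scoring at or above threshold.

    `documents` maps a document name to its recognized text. The default
    threshold is an illustrative assumption, not a value from the paper.
    """
    vectors = {name: term_counts(text) for name, text in documents.items()}
    flagged = []
    for name_a, name_b in combinations(sorted(vectors), 2):
        score = cosine_similarity(vectors[name_a], vectors[name_b])
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    return flagged
```

Calling `flag_similar_pairs` on a dictionary of recognized texts returns the pairs whose word distributions are close enough to warrant manual review. In practice the threshold would need tuning against recognition errors, which lower the measured similarity of genuinely copied text.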
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Pandey, O., Gupta, I., Mishra, B.S.P. (2020). A Robust Approach to Plagiarism Detection in Handwritten Documents. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2020. Lecture Notes in Computer Science, vol 12510. Springer, Cham. https://doi.org/10.1007/978-3-030-64559-5_54
DOI: https://doi.org/10.1007/978-3-030-64559-5_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64558-8
Online ISBN: 978-3-030-64559-5
eBook Packages: Computer Science (R0)