Segmentation of connected characters in text-based CAPTCHAs for intelligent character recognition

Rafaqat Hussain¹,
Hui Gao¹ &
Riaz Ahmed Shaikh²


Abstract

Over the last few years, CAPTCHAs have become ubiquitous on the internet as a security mechanism for distinguishing humans from spam bots. Text-based CAPTCHAs ask users to recognize distorted text in challenge images. Because they are based on a hard AI problem, they have emerged as a hot research topic in computer vision and machine learning. Contemporary text-based CAPTCHAs rely on the segmentation problem, that is, the decomposition of the image into sub-images of individual characters. This is a challenging task for current OCR programs and has not yet been solved to any great extent. In this paper, we present a novel segmentation and recognition method for text-based CAPTCHAs that uses simple image processing techniques, including thresholding, thinning and pixel count methods, along with an artificial neural network. We attack the popular CCT (Crowded Characters Together) based CAPTCHAs and compare our results with other schemes. Overall, our system achieves a precision of 51.3%, 27.1% and 53.2% on the Taobao, MSN and eBay datasets, which contain 1000, 500 and 1000 CAPTCHAs respectively. The benefits of this research are twofold: by recognizing text-based CAPTCHAs, we not only expose weaknesses in current designs but also find a way to segment and recognize connected characters in images. The proposed algorithm can be used in the digitization of ancient books, handwriting recognition and other similar tasks.
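
The abstract names thresholding and pixel counting without giving details, so the following is only a minimal sketch of how such steps are commonly combined, not the authors' implementation. It assumes dark text on a light background, uses Otsu's method for the threshold, and treats columns with little ink as candidate cut points between touching characters. The function names and the `max_ink` parameter are hypothetical, and the thinning and neural-network stages are omitted.

```python
import numpy as np

def otsu_threshold(gray):
    """Return an Otsu threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    num = (mu_total * omega - mu) ** 2
    denom = omega * (1.0 - omega)
    # Between-class variance; set to 0 where one class is empty.
    sigma_b = np.where(denom > 0, num / np.where(denom > 0, denom, 1.0), 0.0)
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Mark ink pixels as True (assumes dark text on a light background)."""
    return gray <= otsu_threshold(gray)

def split_columns(binary, max_ink=0):
    """Return (start, end) column ranges whose ink count exceeds max_ink.

    max_ink is a hypothetical tuning parameter: raising it lets thin
    bridges between touching characters count as cut points.
    """
    counts = binary.sum(axis=0)              # ink pixels per column
    segments, start = [], None
    for x, c in enumerate(counts):
        if c > max_ink and start is None:
            start = x                        # a segment begins
        elif c <= max_ink and start is not None:
            segments.append((start, x))      # a segment ends (end exclusive)
            start = None
    if start is not None:
        segments.append((start, len(counts)))
    return segments

# Toy image: two blocks of ink joined by a one-pixel-high bridge.
img = np.full((10, 12), 255, dtype=np.uint8)
img[2:8, 1:4] = 0     # first "character"
img[2:8, 6:10] = 0    # second "character"
img[5, 4:6] = 0       # thin connection between them

binary = binarize(img)
print(split_columns(binary))             # bridge keeps the blocks merged
print(split_columns(binary, max_ink=1))  # bridge treated as a cut point
```

With `max_ink=0` the bridge keeps both blocks in a single segment, which illustrates why connected characters defeat a plain blank-column split; allowing one ink pixel per cut column separates them in this toy case. Real CCT CAPTCHAs overlap far more heavily than this, so a fixed column threshold should be read only as a starting point.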

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
Rafaqat Hussain & Hui Gao
Department of Computer Science, Shah Abdul Latif University, Khairpur, 66020, Pakistan
Riaz Ahmed Shaikh

Corresponding author

Correspondence to Rafaqat Hussain.

About this article

Cite this article

Hussain, R., Gao, H. & Shaikh, R.A. Segmentation of connected characters in text-based CAPTCHAs for intelligent character recognition. Multimed Tools Appl 76, 25547–25561 (2017). https://doi.org/10.1007/s11042-016-4151-2

Received: 31 July 2016
Revised: 08 October 2016
Accepted: 11 November 2016
Published: 18 November 2016
Issue Date: December 2017
DOI: https://doi.org/10.1007/s11042-016-4151-2
