Abstract
Recent many password guessing algorithms based on the Probabilistic Context-Free Grammars (PCFGs) model brought significant improvements in password cracking. These algorithms analyzed common semantic patterns (letter semantic patterns, date patterns, keyboard patterns etc.) from passwords and modeled the construction process of passwords by using PCFGs. However, there still left a large fraction of integral segments in passwords which seem no semantics. Can those segments be deeply analyzed and help to make further improvements on password cracking? Motivated by this challenge, this paper employs Byte Pair Encoding (BPE) algorithm for password segmentation, extracting those non-semantical patterns which are frequently used in passwords subconsciously by people. Based on the segmentation, we propose a BPE-PCFGs model to generate password guesses. Furthermore, we also utilize the existing common semantic patterns and BPE patterns to construct a new Rich-BPE-PCFGs password generator. Experimental results on large-scale password sets show that our Rich-BPE-PCFGs model can obtain a 2.36%–37.56% improvement over the original PCFGs model, which is a good complement to existing password guessing algorithms.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Herley, C., Van Oorschot, P.: A research agenda acknowledging the persistence of passwords. IEEE Secur. Priv. 10(1), 28–36 (2012)
Paar, C., Pelzl, J.: Understanding Cryptography: A Textbook for Students and Practitioners. Springer, Heidelberg (2010)
JohntheRipper: Password cracking wordlist. http://www.openwall.com/wordlists/
WikiPedia: Dictionary attack. https://en.wikipedia.org/wiki/Dictionary_attack
Weir, M., Aggarwal, S., De Medeiros, B., Glodek, B.: Password cracking using probabilistic context-free grammars. In: 2009 30th IEEE Symposium on Security and Privacy, pp. 391–405. IEEE (2009)
Veras, R., Collins, C., Thorpe, J.: On semantic patterns of passwords and their security impact. In: NDSS (2014)
Veras, R., Thorpe, J., Collins, C.: Visualizing semantics in passwords: the role of dates. In: Proceedings of the Ninth International Symposium on Visualization for Cyber Security, pp. 88–95. ACM (2012)
Houshmand, S., Aggarwal, S., Flood, R.: Next gen PCFG password cracking. IEEE Trans. Inf. Forensics Secur. 10(8), 1776–1791 (2015)
Li, Z., Han, W., Xu, W.: A large-scale empirical analysis of Chinese web passwords. In: USENIX Security Symposium, pp. 559–574 (2014)
Li, Y., Wang, H., Sun, K.: A study of personal information in human-chosen passwords and its security implications. In: IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications, pp. 1–9. IEEE (2016)
Wang, D., Zhang, Z., Wang, P., Yan, J., Huang, X.: Targeted online password guessing: an underestimated threat. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1242–1254. ACM (2016)
Gage, P.: A new algorithm for data compression. C Users J. 12(2), 23–38 (1994)
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units (2015). arXiv preprint: arXiv:1508.07909
Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., Arikawa, S.: Byte pair encoding: a text compression scheme that accelerates pattern matching. Technical report DOI-TR-161, Department of Informatics, Kyushu University (1999)
Kelley, P.G., Komanduri, S., Mazurek, M.L., Shay, R., Vidas, T., Bauer, L., Christin, N., Cranor, L.F., Lopez, J.: Guess again (and again and again): measuring password strength by simulating password-cracking algorithms. In: 2012 IEEE Symposium on Security and Privacy (SP), pp. 523–537. IEEE (2012)
Zhang, Y., Monrose, F., Reiter, M.K.: The security of modern password expiration: an algorithmic framework and empirical analysis. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, pp. 176–186. ACM (2010)
Bonneau, J.: The science of guessing: analyzing an anonymized corpus of 70 million passwords. In: 2012 IEEE Symposium on Security and Privacy (SP), pp. 538–552. IEEE (2012)
Chou, H.C., Lee, H.C., Yu, H.J., Lai, F.P., Huang, K.H., Hsueh, C.W.: Password cracking based on learned patterns from disclosed passwords. IJICIC 9(2), 821–839 (2013)
Weir, C.M.: Using probabilistic techniques to aid in password cracking attacks. The Florida State University (2010)
Pinyin, S.: Chinese Pinyin names in sogous list (2015). http://pinyin.sogou.com/dict/detail/index/34816
SSA: Popular baby names. U.S. social security administration (2013). http://www.ssa.gov/oact/babynames/limits.html
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wang, X., Wang, D., Chen, X., Xu, R., Shi, J., Guo, L. (2017). Improving Password Guessing Using Byte Pair Encoding. In: Nguyen, P., Zhou, J. (eds) Information Security. ISC 2017. Lecture Notes in Computer Science(), vol 10599. Springer, Cham. https://doi.org/10.1007/978-3-319-69659-1_14
Download citation
DOI: https://doi.org/10.1007/978-3-319-69659-1_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69658-4
Online ISBN: 978-3-319-69659-1
eBook Packages: Computer ScienceComputer Science (R0)