
CHIEv: concurrent hybrid analysis for crawling and modeling of web applications

Published: 20 July 2021

Abstract

Researchers and practitioners in the fields of testing, security assessment and web development who seek to evaluate a given web application often have to rely on the existence of a model of the respective system, which is then used as input to task-specific tools. Such models may include information on HTTP endpoints and their parameters, available user actions/event listeners and required assets. Unfortunately, such models are often unavailable in practice, as only rigorous development practices or manual analysis guarantee their existence and correctness. Crawlers based on static analysis have traditionally been used to extract the required information from existing sites. Regrettably, these tools cannot accurately account for the dynamic behavior introduced by technologies such as JavaScript, which are prevalent on modern sites. While methods based on dynamic analysis exist, they often cannot fully identify event listeners and their effects. In an earlier work, we presented XIEv, an approach for dynamic analysis of web applications that produces an execution trace usable for the extraction of navigation graphs, the identification of bugs at runtime and the enumeration of resources. Compared to existing approaches, it offers improved recognition and selection of event listeners as well as a greater range of observed effects. While the evaluation of our research prototype implementation confirmed the capabilities of XIEv, it was generally outperformed by static crawlers in terms of speed. This work introduces CHIEv, an approach that augments XIEv by enabling concurrent processing and by incorporating the results of a static crawler in real time. Our results indicate a significant increase in performance, particularly when applied to larger sites.
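The abstract does not describe CHIEv's internals, so the following is only an illustrative sketch, not the authors' implementation. It shows one generic way a dynamic crawler can explore pages concurrently while merging the URLs reported by a static crawler into its work queue as soon as they arrive. All names are hypothetical: `hybrid_crawl` is the illustrative entry point, `explore(url)` stands in for the dynamic analysis of a single page (e.g. triggering its event listeners and collecting the resulting URLs), and `static_crawl()` stands in for a static crawler that yields URLs while it is still running.

```python
import queue
import threading
from typing import Callable, Iterable, Set


def hybrid_crawl(
    seeds: Iterable[str],
    explore: Callable[[str], Iterable[str]],
    static_crawl: Callable[[], Iterable[str]],
    workers: int = 4,
) -> Set[str]:
    """Illustrative sketch only (not CHIEv's implementation).

    explore(url)   -- hypothetical stand-in for dynamically analyzing one page
                      and returning the URLs its actions lead to.
    static_crawl() -- hypothetical stand-in for a static crawler that yields
                      URLs while it is still running.
    Returns every URL that was scheduled for dynamic exploration.
    """
    frontier: "queue.Queue[str]" = queue.Queue()
    seen: Set[str] = set()
    lock = threading.Lock()

    def enqueue(url: str) -> None:
        # Deduplicate across both sources so each URL is explored only once.
        with lock:
            if url in seen:
                return
            seen.add(url)
        frontier.put(url)

    def dynamic_worker() -> None:
        # Each worker repeatedly takes one URL and feeds its findings back.
        while True:
            url = frontier.get()
            try:
                for found in explore(url):
                    enqueue(found)
            finally:
                # New URLs are enqueued before this call, so frontier.join()
                # cannot return while discovered work is still pending.
                frontier.task_done()

    def static_feeder() -> None:
        # Merge static results into the shared frontier as they are produced.
        for found in static_crawl():
            enqueue(found)

    for url in seeds:
        enqueue(url)
    feeder = threading.Thread(target=static_feeder)
    feeder.start()
    for _ in range(workers):
        # Daemon threads stay blocked on get() after the crawl finishes and
        # exit together with the interpreter.
        threading.Thread(target=dynamic_worker, daemon=True).start()
    feeder.join()    # all static results have been enqueued ...
    frontier.join()  # ... and every enqueued URL has been explored
    return seen


if __name__ == "__main__":
    # Toy site: "/hidden" is not reachable from "/" by dynamic exploration
    # and is only contributed by the (simulated) static crawler.
    site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": ["/"],
            "/hidden": ["/d"], "/d": []}
    found = hybrid_crawl(["/"], lambda u: site.get(u, []), lambda: ["/hidden"])
    print(sorted(found))  # ['/', '/a', '/b', '/c', '/d', '/hidden']
```

A shared, deduplicated queue is one simple way to let both sources contribute work without exploring a URL twice; an actual hybrid crawler may coordinate browser instances, application state and event listeners quite differently.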

Information

Published In

ACM SIGAPP Applied Computing Review, Volume 21, Issue 1
March 2021
57 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 July 2021
Published in SIGAPP Volume 21, Issue 1

Author Tags

  1. dynamic analysis
  2. hybrid analysis
  3. modeling
  4. web applications
  5. web crawling

Qualifiers

  • Research-article

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 2
Reflects downloads up to 29 Nov 2024
