Abstract
Web automation applications are widely used for purposes such as B2B integration and automated testing of web applications. Most current systems build the automatic web navigation component on the APIs of conventional browsers. This approach suffers from performance problems in intensive web automation tasks that require real-time responses and/or a high degree of parallelism. Other systems create custom browsers that skip some of the tasks of conventional browsers, but they still work like conventional browsers when building the internal representation of web pages. In this paper, we present a complete architecture for a custom browser able to execute web navigation sequences efficiently. The proposed architecture supports novel automatic optimization techniques that can be applied when loading pages and building their internal representation. Tests performed on real web sources show that the reference implementation of the proposed architecture runs significantly faster than other navigation components.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Losada, J., Raposo, J., Pan, A., Montoto, P., Álvarez, M. (2015). A Custom Browser Architecture to Execute Web Navigation Sequences. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_11
DOI: https://doi.org/10.1007/978-3-319-26187-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26186-7
Online ISBN: 978-3-319-26187-4
eBook Packages: Computer Science, Computer Science (R0)