DOI: 10.1145/988672.988743
Article

Characterization of a large web site population with implications for content delivery

Published: 17 May 2004

Abstract

This paper presents a systematic study of the properties of a large number of Web sites hosted by a major ISP. To our knowledge, ours is the first comprehensive study of a large server farm that contains thousands of commercial Web sites. We also perform a simulation analysis to estimate the potential performance benefits of content delivery networks (CDNs) for these Web sites. We make several interesting observations about the current usage of Web technologies and Web site performance characteristics. First, compared with previous client workload studies, the Web server farm workload contains a much higher proportion of uncacheable responses and responses that require mandatory cache validations. A significant reason for this is that cookie use is prevalent among our population, especially among more popular sites. However, we found an indication of widespread indiscriminate usage of cookies, which unnecessarily impedes the use of many content delivery optimizations. We also found that most Web sites do not utilize the cache-control features of the HTTP/1.1 protocol, resulting in suboptimal performance. Moreover, the implicit expiration time in client caches for responses is constrained by the maximum values allowed in the Squid proxy. Finally, our simulation results indicate that most Web sites benefit from the use of a CDN. The amount of the benefit depends on site popularity, and, somewhat surprisingly, a CDN may increase the peak-to-average request ratio at the origin server because the CDN can decrease the average request rate more than the peak request rate.
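The last observation is easier to see with a small worked example. The sketch below is illustrative only: the request rates and the fractions of traffic absorbed by the CDN are hypothetical numbers chosen for the arithmetic, not measurements from the paper, and `peak_to_average` is a helper name introduced here.

```python
def peak_to_average(rates):
    """Return the peak request rate divided by the mean request rate."""
    return max(rates) / (sum(rates) / len(rates))


# Hypothetical origin request rates (requests/s) over four equal intervals.
# These values are assumptions for illustration, not data from the study.
origin_without_cdn = [100, 100, 100, 400]

# Assumption: the CDN absorbs 80% of requests in ordinary intervals but only
# 50% in the busiest one (for example, because the peak is dominated by
# uncacheable or freshly changed content).
absorbed_fraction = [0.8, 0.8, 0.8, 0.5]

origin_with_cdn = [
    rate * (1.0 - absorbed)
    for rate, absorbed in zip(origin_without_cdn, absorbed_fraction)
]

ratio_before = peak_to_average(origin_without_cdn)
ratio_after = peak_to_average(origin_with_cdn)

print(f"peak-to-average without CDN: {ratio_before:.2f}")
print(f"peak-to-average with CDN:    {ratio_after:.2f}")
```

Under these assumed numbers the origin sees less traffic in every interval, yet the ratio rises, because the average falls by a larger factor than the peak does.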

      Published In

      WWW '04: Proceedings of the 13th international conference on World Wide Web
      May 2004
      754 pages
      ISBN:158113844X
      DOI:10.1145/988672
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. content distribution
      2. cookie
      3. http
      4. measurement
      5. performance
      6. web caching
      7. workload characterization

      Qualifiers

      • Article

      Conference

      WWW04
      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 4
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 18 Feb 2025

      Citations

      Cited By

      • (2015) Investigating structure of modern web traffic. 2015 IEEE 16th International Conference on High Performance Switching and Routing (HPSR), pages 1-8. DOI: 10.1109/HPSR.2015.7483102. Online publication date: Jul-2015
      • (2014) Virtual machine consolidation in the wild. Proceedings of the 15th International Middleware Conference, pages 313-324. DOI: 10.1145/2663165.2663316. Online publication date: 8-Dec-2014
      • (2013) Power saving for web servers using proxies. 2013 Sustainable Internet and ICT for Sustainability (SustainIT), pages 1-5. DOI: 10.1109/SustainIT.2013.6685209. Online publication date: Oct-2013
      • (2012) Workload Characterization and Performance Implications of Large-Scale Blog Servers. ACM Transactions on the Web, 6:4, pages 1-26. DOI: 10.1145/2382616.2382619. Online publication date: 1-Nov-2012
      • (2012) A Novel Intermediary Framework for Dynamic Edge Service Composition. Journal of Computer Science and Technology, 27:2, pages 281-297. DOI: 10.1007/s11390-012-1223-2. Online publication date: 5-Mar-2012
      • (2011) On traffic locality and QoE in hybrid CDN-P2P networks. Proceedings of the 44th Annual Simulation Symposium, pages 175-182. DOI: 10.5555/2048370.2048394. Online publication date: 3-Apr-2011
      • (2011) An up-to-date survey in web load balancing. World Wide Web, 14:2, pages 105-131. DOI: 10.1007/s11280-010-0101-5. Online publication date: 1-Mar-2011
      • (2010) CDNsim. ACM Transactions on Modeling and Computer Simulation, 20:2, pages 1-40. DOI: 10.1145/1734222.1734226. Online publication date: 7-May-2010
      • (2010) Dual-Quorum. IEEE Transactions on Dependable and Secure Computing, 7:2, pages 159-174. DOI: 10.1109/TDSC.2008.36. Online publication date: 1-Apr-2010
      • (2010) An automatic HTTP cookie management system. Computer Networks: The International Journal of Computer and Telecommunications Networking, 54:13, pages 2182-2198. DOI: 10.1016/j.comnet.2010.03.006. Online publication date: 1-Sep-2010