Understanding and Combating Memory Bloat in Managed Data-Intensive Systems

Published: 03 January 2018

Abstract

The past decade has witnessed increasing demand for data-driven business intelligence, which has led to the proliferation of data-intensive applications. A managed object-oriented programming language such as Java is often the developer’s choice for implementing such applications, due to its quick development cycle and rich suite of libraries and frameworks. While the use of such languages makes programming easier, their automated memory management comes at a cost. When the managed runtime meets large volumes of input data, memory bloat is significantly magnified and becomes a scalability-prohibiting bottleneck.
This article first studies, analytically and empirically, the impact of bloat on the performance and scalability of large-scale, real-world data-intensive systems. To combat bloat, we design a novel compiler framework, called Facade, that can generate highly efficient data manipulation code by automatically transforming the data path of an existing data-intensive application. The key treatment is that in the generated code, the number of runtime heap objects created for data classes in each thread is (almost) statically bounded, leading to significantly reduced memory management cost and improved scalability. We have implemented Facade and used it to transform seven common applications on three real-world, already well-optimized data processing frameworks: GraphChi, Hyracks, and GPS. Our experimental results are very positive: the generated programs (1) reduced execution time by 3% to 48% and GC time by up to 88×, (2) consumed up to 50% less memory, and (3) scaled to much larger datasets.
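
To make the idea of a bounded number of heap objects per data class concrete, the following is a minimal hand-written sketch, not Facade's generated code. It assumes record fields are kept in a single direct ByteBuffer and that one reusable view object per thread is repositioned over the records. The names FacadeSketch, VertexStore, VertexFacade, and sumRanks, as well as the record layout, are hypothetical and chosen only for illustration.

```java
import java.nio.ByteBuffer;

/** Illustrative only: all names and the record layout are assumptions. */
final class FacadeSketch {

    /** Holds every record in one buffer instead of one heap object per record. */
    static final class VertexStore {
        private static final int ID_OFFSET = 0;    // int field, 4 bytes
        private static final int RANK_OFFSET = 4;  // double field, 8 bytes
        private static final int RECORD_SIZE = 12;

        private final ByteBuffer page;
        private final int capacity;
        private int count;

        VertexStore(int capacity) {
            this.capacity = capacity;
            this.page = ByteBuffer.allocateDirect(capacity * RECORD_SIZE);
        }

        /** Appends a record and returns its index in the store. */
        int add(int id, double rank) {
            if (count == capacity) {
                throw new IllegalStateException("store is full");
            }
            int base = count * RECORD_SIZE;
            page.putInt(base + ID_OFFSET, id);
            page.putDouble(base + RANK_OFFSET, rank);
            return count++;
        }

        int size() {
            return count;
        }
    }

    /** A reusable view: a single instance can visit every record in turn. */
    static final class VertexFacade {
        private final VertexStore store;
        private int base;

        VertexFacade(VertexStore store) {
            this.store = store;
        }

        /** Repositions this view on the record at the given index. */
        VertexFacade pointAt(int index) {
            if (index < 0 || index >= store.count) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            base = index * VertexStore.RECORD_SIZE;
            return this;
        }

        int id() {
            return store.page.getInt(base + VertexStore.ID_OFFSET);
        }

        double rank() {
            return store.page.getDouble(base + VertexStore.RANK_OFFSET);
        }

        void setRank(double rank) {
            store.page.putDouble(base + VertexStore.RANK_OFFSET, rank);
        }
    }

    /** Sums all ranks while allocating exactly one view object. */
    static double sumRanks(VertexStore store) {
        VertexFacade view = new VertexFacade(store);
        double sum = 0.0;
        for (int i = 0; i < store.size(); i++) {
            sum += view.pointAt(i).rank();
        }
        return sum;
    }

    public static void main(String[] args) {
        VertexStore store = new VertexStore(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            store.add(i, 1.0);
        }
        System.out.println("sum of ranks = " + sumRanks(store));
    }
}
```

In this sketch the number of VertexFacade instances depends on how many threads traverse the store, not on how many records it holds, so the garbage collector has far fewer data objects to trace. How Facade itself lays out and manages record memory is described in the full article, not in this abstract.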



      Published In

      ACM Transactions on Software Engineering and Methodology  Volume 26, Issue 4
      October 2017
      128 pages
      ISSN:1049-331X
      EISSN:1557-7392
      DOI:10.1145/3177744

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 03 January 2018
      Accepted: 01 October 2017
      Revised: 01 October 2017
      Received: 01 July 2016
      Published in TOSEM Volume 26, Issue 4


      Author Tags

      1. Big data
      2. managed languages
      3. memory management
      4. performance optimization

      Qualifiers

      • Research-article
      • Research
      • Refereed
