research-article

Public Access

Understanding and Combating Memory Bloat in Managed Data-Intensive Systems

Authors:

Guoqing Xu

ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 26, Issue 4

Article No.: 12, Pages 1 - 41

https://doi.org/10.1145/3162626

Published: 03 January 2018

Abstract

The past decade has witnessed increasing demands on data-driven business intelligence that led to the proliferation of data-intensive applications. A managed object-oriented programming language such as Java is often the developer’s choice for implementing such applications, due to its quick development cycle and rich suite of libraries and frameworks. While the use of such languages makes programming easier, their automated memory management comes at a cost. When the managed runtime meets large volumes of input data, memory bloat is significantly magnified and becomes a scalability-prohibiting bottleneck.

This article first studies, analytically and empirically, the impact of bloat on the performance and scalability of large-scale, real-world data-intensive systems. To combat bloat, we design a novel compiler framework, called Facade, that generates highly efficient data manipulation code by automatically transforming the data path of an existing data-intensive application. The key idea is that, in the generated code, the number of runtime heap objects created for data classes in each thread is (almost) statically bounded, which significantly reduces memory management cost and improves scalability. We have implemented Facade and used it to transform seven common applications on three real-world, already well-optimized data processing frameworks: GraphChi, Hyracks, and GPS. Our experimental results are very positive: the generated programs (1) achieved a 3% to 48% reduction in execution time and up to an 88× reduction in GC time, (2) consumed up to 50% less memory, and (3) scaled to much larger datasets.
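To make the idea of a per-thread bound on data objects more concrete, here is a minimal Java sketch. It is not code generated by Facade. It assumes, purely for illustration, that record fields are packed into one off-heap `java.nio.ByteBuffer` and that a reusable "facade" object gives typed access to whichever record it is currently bound to. The names `FacadeSketch`, `VertexStore`, `VertexFacade`, and `scaleAndSum`, and the 12-byte record layout, are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration (not Facade's output): instead of allocating one heap
// object per vertex record, all record data lives in a single ByteBuffer, and a
// fixed number of reusable "facade" objects are bound to record offsets on demand.
public class FacadeSketch {

    // Assumed layout of one vertex record: int id (4 bytes) + double rank (8 bytes).
    static final int ID_OFFSET = 0;
    static final int RANK_OFFSET = 4;
    static final int RECORD_SIZE = 12;

    // Stores records contiguously; the number of heap objects does not grow with the record count.
    static final class VertexStore {
        private final ByteBuffer data;
        private final int capacity;
        private int count = 0;

        VertexStore(int capacity) {
            this.capacity = capacity;
            this.data = ByteBuffer.allocateDirect(capacity * RECORD_SIZE); // off-heap storage
        }

        // Appends a record and returns its index.
        int add(int id, double rank) {
            if (count == capacity) throw new IllegalStateException("store full");
            int base = count * RECORD_SIZE;
            data.putInt(base + ID_OFFSET, id);
            data.putDouble(base + RANK_OFFSET, rank);
            return count++;
        }

        int size() { return count; }
    }

    // A reusable proxy: it holds only a location (store + offset), never the field values.
    static final class VertexFacade {
        private VertexStore store;
        private int base;

        VertexFacade bind(VertexStore store, int index) {
            this.store = store;
            this.base = index * RECORD_SIZE;
            return this;
        }

        int getId() { return store.data.getInt(base + ID_OFFSET); }
        double getRank() { return store.data.getDouble(base + RANK_OFFSET); }
        void setRank(double r) { store.data.putDouble(base + RANK_OFFSET, r); }
    }

    // One facade per thread, so the number of facade objects is bounded by the thread count.
    static final ThreadLocal<VertexFacade> FACADE = ThreadLocal.withInitial(VertexFacade::new);

    // Scales every rank by a factor and returns the new total, visiting
    // every record through the same facade object.
    static double scaleAndSum(VertexStore store, double factor) {
        VertexFacade v = FACADE.get();
        double sum = 0.0;
        for (int i = 0; i < store.size(); i++) {
            v.bind(store, i);
            v.setRank(v.getRank() * factor);
            sum += v.getRank();
        }
        return sum;
    }

    public static void main(String[] args) {
        VertexStore store = new VertexStore(1000);
        for (int i = 0; i < 1000; i++) {
            store.add(i, 1.0);
        }
        System.out.println(scaleAndSum(store, 0.5)); // prints 500.0
    }
}
```

In this sketch a single `VertexFacade` serves all 1,000 records on the calling thread, whereas a conventional object-per-record design would allocate 1,000 heap objects. A real transformation would likely need several facades of the same type whenever multiple records must be live at once (for example, when comparing two vertices). It would also have to address concerns such as object identity and synchronization, which this sketch ignores.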

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
      2. Source code generation

Published In

ACM Transactions on Software Engineering and Methodology, Volume 26, Issue 4 (October 2017), 128 pages

ISSN: 1049-331X

EISSN: 1557-7392

DOI: 10.1145/3177744

Editor: David S. Rosenblum, National University of Singapore, Singapore

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 January 2018

Accepted: 01 October 2017

Revised: 01 October 2017

Received: 01 July 2016

Published in TOSEM Volume 26, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Office of Naval Research
National Science Foundation

Article Metrics

Total Citations: 5

Total Downloads: 850

Downloads (Last 12 months): 128

Downloads (Last 6 weeks): 27

Reflects downloads up to 13 Nov 2024

Cited By

Thung F, Liu J, Rattanukul P, Maoz S, Toch E, Gao D, Lo D, Rubin J, Lam W, Catolino G, Poshyvanyk D (2024). Towards Speedy Permission-Based Debloating for Android Apps. Proceedings of the IEEE/ACM 11th International Conference on Mobile Software Engineering and Systems, 84-87. Online publication date: 14-Apr-2024.
https://dl.acm.org/doi/10.1145/3647632.3651390

Şener U, Gökalp E, Eren P (2024). CLOUD-QM: a quality model for benchmarking cloud-based enterprise information systems. Software Quality Journal 32:3, 881-920. Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1007/s11219-024-09669-1

Tang Y, Zhou H, Luo X, Chen T, Wang H, Xu Z, Cai Y (2022). XDebloat: Towards Automated Feature-Oriented App Debloating. IEEE Transactions on Software Engineering 48:11, 4501-4520. Online publication date: 1-Nov-2022.
https://doi.org/10.1109/TSE.2021.3120213

Vargemidis D, Gerling K, Spiel K, Abeele V, Geurts L (2020). Wearable Physical Activity Tracking Systems for Older Adults—A Systematic Review. ACM Transactions on Computing for Healthcare 1:4, 1-37. Online publication date: 30-Sep-2020.
https://dl.acm.org/doi/10.1145/3402523

Heo K, Lee W, Pashakhanloo P, Naik M, Lie D, Mannan M, Backes M, Wang X (2018). Effective Program Debloating via Reinforcement Learning. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 380-394. Online publication date: 15-Oct-2018.
https://dl.acm.org/doi/10.1145/3243734.3243838
