Abstract
The study of sublinear algorithms is a recent development in theoretical computer science and discrete mathematics that has significant potential to provide scalable analysis algorithms for massive data. The approaches of sublinear algorithms address the fundamental mathematical problem of understanding global features of a data set using limited resources. However, much of the work in this area is theoretical in nature and has yet to be applied to practical problems. This chapter provides background on sublinear algorithms, and then surveys a series of recent successes in the sublinear analysis of large-scale graphs and the robust generation of color maps for visualization of large physics simulation data. We end the chapter with a discussion of potential research directions.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abbasi, H., Eisenhauer, G., Wolf, M., Schwan, K., Klasky, S.: Just in time: adding value to the IO pipelines of high performance applications with JIT staging. In: Proc. of 20th International Symposium on High Performance Distributed Computing (HPDC’11) (2011)
Arifuzzaman, S.M., Khan, M., Marathe, M.: PATRIC: A parallel algorithm for counting triangles and computing clustering coefficients in massive networks. NDSSL Technical Report 12-042, Network Dynamics and Simulation Science Laboratory, Virginia Polytechnic Institute and State University (2012)
Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: KDD’08, pp. 16–24 (2008)
Bennett, J., Krishnamoorthy, V., Liu, S., Grout, R., Hawkes, E.R., Chen, J.H., Shepherd, J., Pascucci, V., Bremer, P.-T.: Feature-based statistical analysis of combustion simulation data. IEEE Trans. Vis. Comput. Graph. 17(12), 1822–1831 (2011)
Bennett, J.C., Abbasi, H., Bremer, P.-T., Grout, R., Gyulassy, A., Jin, T., Klasky, S., Kolla, H., Parashar, M., Pascucci, V., Pebay, P., Thompson, D., Yu, H., Zhang, F., Chen, J.: Combining in-situ and in-transit processing to enable extreme-scale scientific analysis. In: Hollingsworth, J. (ed.) SC ’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Salt Lake Convention Center, Salt Lake City, 10–16 November 2012, pp. 49:1–49:9, pub-IEEE:adr. IEEE Computer Society Press (2012)
Berry, J., Fostvedt, L., Nordman, D., Phillips, C., Seshadhri, C., Wilson, A.: Why do simple algorithms for triangle enumeration work in the real world? In: Innovations in Theoretical Computer Science (ITCS) (2014)
Boldi, P., Vigna, S.: The web graph framework I: compression techniques. In: Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), Manhattan,, pp. 595–601. ACM Press, New York (2004)
Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th International Conference on World Wide Web. ACM Press, Madrid (2011)
Buriol, L., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., Sohler, C.: Counting triangles in data streams. In: PODS’06, pp. 253–262 (2006)
Burt, R.S.: Structural holes and good ideas. Am. J. Sociol. 110(2), 349–399 (2004)
Chen, J.H., Choudhary, A., de Supinski, B., DeVries, M., Hawkes, E.R., Klasky, S., Liao, W.K., Ma, K.L., Mellor-Crummey, J., Podhorski, N., Sankaran, R., Shende, S., Yoo, C.S.: Terascale direct numerical simulations of turbulent combustion using s3d. Comput. Sci. Discov. 2, 1–31 (2009)
Cohen, J.: Graph twiddling in a MapReduce world. Comput. Sci. Eng. 11, 29–41 (2009)
Coleman, J.S.: Social capital in the creation of human capital. Am. J. Sociol. 94, S95–S120 (1988)
Dubhashi, D., Panconesi, A.: Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, New York (2009)
Eckmann, J.-P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the World Wide Web. PNAS 99(9), 5825–5829 (2002)
Fabian, N., Moreland, K., Thompson, D., Bauer, A.C., Marion, P., Gevecik, B., Rasquin, M., Jansen, K.E.: The paraview coprocessing library: a scalable, general purpose in situ visualization library. In: Proc. of IEEE Symposium on Large Data Analysis and Visualization (LDAV), pp. 89–96 (2011)
Fagiolo, G.: Clustering in complex directed networks. Phys. Rev. E 76, 026107 (2007)
Favre, J.M., Whitlock, B., Meredith, J.S.: Parallel in situ coupling of simulation with a fully featured visualization system. In: Proc. of 11th Eurographics Symposium on Parallel Graphics and Visualization (EGPGV’11) (2011)
Fischer, E.: The art of uninformed decisions: a primer to property testing. Bull. EATCS 75, 97–126 (2001)
Foucault Welles, B., Van Devender, A., Contractor, N.: Is a friend a friend?: Investigating the structure of friendship networks in virtual worlds. In: CHI-EA’10, pp. 4027–4032 (2010)
Goldreich, O., Ron, D.: On testing expansion in bounded-degree graphs. ECCC, TR00-020 (2000)
Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear time. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 99–116 (2010)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
Holland, P.W., Leinhardt, S.: A method for detecting structure in sociometric data. Am. J. Sociol. 76, 492–513 (1970)
Kolda, T.G., Pinar, A., Plantenga, T., Seshadhri, C., Task, C.: Counting triangles in massive graphs with mapreduce. Technical Report 1301.5887, arxiv (2013)
Kolountzakis, M., Miller, G., Peng, R., Tsourakakis, C.: Efficient triangle counting in large graphs via degree-based vertex partitioning. In: WAW’10 (2010)
Mascarenhas, A., Grout, R.W., Bremer, P.-T., Hawkes, E.R., Pascucci, V., Chen, J.H.: Topological feature extraction for comparison of terascale combustion simulation data. In: Pascucci, V., Tricoche, X., Hagen, H., Tierny, J. (eds.) Topological Methods in Data Analysis and Visualization. Mathematics and Visualization, pp. 229–240. Springer, Berlin/Heidelberg (2011)
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: Simple building blocks of complex networks. Science 298(5594), 824–827 (2011)
Ron, D.: Algorithmic and analysis techniques in property testing. Found. Trends Theor. Comput. Sci. 5(2), 73–205 (2009)
Rubinfeld, R.: Sublinear time algorithms. In: International Conference of Mathematicians (2006)
Rubinfeld, R., Sudan, M.: Robust characterization of polynomials with applications to program testing. SIAM J. Comput. 25, 647–668 (1996)
Schank, T., Wagner, D.: Approximating clustering coefficient and transitivity. J. Graph Algorithms Appl. 9, 265–275 (2005)
Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: Experimental and Efficient Algorithms, pp. 606–609. Springer, Berlin (2005)
Seshadhri, C., Pinar, A., Durak, N., Kolda, T.G.: Directed closure measures for networks with reciprocity. arXiv:1302.6220 (2013)
Seshadhri, C., Pinar, A., Kolda, T.G.: Triadic measures on graphs: the power of wedge sampling. In: Proceedings of the SIAM Conference on Data Mining (SDM) (2013)
Seshadhri, C., Pinar, A., Kolda, T.G.: Wedge sampling for computing clustering coefficients and triangle counts on large graphs. arXiv:1309.3321 (2013)
Son, S., Kang, A., Kim, H., Kwon, T., Park, J., Kim, H.: Analysis of context dependence in social interaction networks of a massively multiplayer online role-playing game. PLoS ONE 7(4), e33918 (2012)
Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW’11, pp. 607–614 (2011)
Szell, M., Thurner, S.: Measuring social dynamics in a massive multiplayer online game. Soc. Netw. 32, 313–329 (2010)
Szell, M., Lambiotte, R., Thurner, S.: Multirelational organization of large-scale social networks in an online world. Proc. Natl. Acad. Sci 107, 13636–13641 (2010)
Thompson, D., Bennett, J., Seshadhri, C., Pinar, A.: A provably-robust sampling method for generating colormaps of large data. In: Proceedings of the IEEE Symposium on Large Data Analysis and Visualization (LDAV), Atlanta (2013)
Tsourakakis, C.E.: Fast counting of triangles in large real networks, without counting: algorithms and laws. In: ICDM 2008, pp. 608–617 (2008)
Tsourakakis, C., Drineas, P., Michelakis, E., Koutis, I., Faloutsos, C.: Spectral counting of triangles in power-law networks via element-wise sparsification. In: ASONAM’09, pp. 66–71 (2009)
Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: counting triangles in massive graphs with a coin. In: KDD ’09, pp. 837–846 (2009)
Tsourakakis, C., Kolountzakis, M.N., Miller, G.: Triangle sparsifiers. J. Graph Algorithms Appl. 15, 703–726 (2011)
Vishwanath, V., Hereld, M., Papka, M.E.: Toward simulation-time data analysis and i/o acceleration on leadership-class systems. In: Proc. of IEEE Symposium on Large Data Analysis and Visualization (LDAV) (2011)
Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge (1994)
Watts, D., Strogatz, S.: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998)
Yu, W., Wang, C., Grout, R.W., Chen, J.H., Ma, K.-L.: In-situ visualization for large-scale combustion simulations. IEEE Comput. Graph. Appl. 30, 45–57 (2010)
Acknowledgements
The authors wish to thank Tamara G. Kolda, Madhav Jha, and Todd Plantenga for many helpful discussions and Dr. Jacqueline Chen for access to the combustion use case examples used in this work. This work is supported by the Laboratory Directed Research and Development (LDRD) program of Sandia National Laboratories and the DARPA GRAPHS program. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Seshadhri, C., Pinar, A., Thompson, D., Bennett, J.C. (2015). Sublinear Algorithms for Extreme-Scale Data Analysis. In: Bennett, J., Vivodtzev, F., Pascucci, V. (eds) Topological and Statistical Methods for Complex Data. Mathematics and Visualization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44900-4_3
Download citation
DOI: https://doi.org/10.1007/978-3-662-44900-4_3
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44899-1
Online ISBN: 978-3-662-44900-4
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)