Nothing Special   »   [go: up one dir, main page]

Skip to main content

Sublinear Algorithms for Extreme-Scale Data Analysis

  • Conference paper
  • First Online:
Topological and Statistical Methods for Complex Data

Part of the book series: Mathematics and Visualization ((MATHVISUAL))

Abstract

The study of sublinear algorithms is a recent development in theoretical computer science and discrete mathematics that has significant potential to provide scalable analysis algorithms for massive data. The approaches of sublinear algorithms address the fundamental mathematical problem of understanding global features of a data set using limited resources. However, much of the work in this area is theoretical in nature and has yet to be applied to practical problems. This chapter provides background on sublinear algorithms, and then surveys a series of recent successes in the sublinear analysis of large-scale graphs and the robust generation of color maps for visualization of large physics simulation data. We end the chapter with a discussion of potential research directions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 149.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 199.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 199.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Abbasi, H., Eisenhauer, G., Wolf, M., Schwan, K., Klasky, S.: Just in time: adding value to the IO pipelines of high performance applications with JIT staging. In: Proc. of 20th International Symposium on High Performance Distributed Computing (HPDC’11) (2011)

    Google Scholar 

  2. Arifuzzaman, S.M., Khan, M., Marathe, M.: PATRIC: A parallel algorithm for counting triangles and computing clustering coefficients in massive networks. NDSSL Technical Report 12-042, Network Dynamics and Simulation Science Laboratory, Virginia Polytechnic Institute and State University (2012)

    Google Scholar 

  3. Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: KDD’08, pp. 16–24 (2008)

    Google Scholar 

  4. Bennett, J., Krishnamoorthy, V., Liu, S., Grout, R., Hawkes, E.R., Chen, J.H., Shepherd, J., Pascucci, V., Bremer, P.-T.: Feature-based statistical analysis of combustion simulation data. IEEE Trans. Vis. Comput. Graph. 17(12), 1822–1831 (2011)

    Article  Google Scholar 

  5. Bennett, J.C., Abbasi, H., Bremer, P.-T., Grout, R., Gyulassy, A., Jin, T., Klasky, S., Kolla, H., Parashar, M., Pascucci, V., Pebay, P., Thompson, D., Yu, H., Zhang, F., Chen, J.: Combining in-situ and in-transit processing to enable extreme-scale scientific analysis. In: Hollingsworth, J. (ed.) SC ’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Salt Lake Convention Center, Salt Lake City, 10–16 November 2012, pp. 49:1–49:9, pub-IEEE:adr. IEEE Computer Society Press (2012)

    Google Scholar 

  6. Berry, J., Fostvedt, L., Nordman, D., Phillips, C., Seshadhri, C., Wilson, A.: Why do simple algorithms for triangle enumeration work in the real world? In: Innovations in Theoretical Computer Science (ITCS) (2014)

    Google Scholar 

  7. Boldi, P., Vigna, S.: The web graph framework I: compression techniques. In: Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), Manhattan,, pp. 595–601. ACM Press, New York (2004)

    Google Scholar 

  8. Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th International Conference on World Wide Web. ACM Press, Madrid (2011)

    Google Scholar 

  9. Buriol, L., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., Sohler, C.: Counting triangles in data streams. In: PODS’06, pp. 253–262 (2006)

    Google Scholar 

  10. Burt, R.S.: Structural holes and good ideas. Am. J. Sociol. 110(2), 349–399 (2004)

    Article  Google Scholar 

  11. Chen, J.H., Choudhary, A., de Supinski, B., DeVries, M., Hawkes, E.R., Klasky, S., Liao, W.K., Ma, K.L., Mellor-Crummey, J., Podhorski, N., Sankaran, R., Shende, S., Yoo, C.S.: Terascale direct numerical simulations of turbulent combustion using s3d. Comput. Sci. Discov. 2, 1–31 (2009)

    Article  Google Scholar 

  12. Cohen, J.: Graph twiddling in a MapReduce world. Comput. Sci. Eng. 11, 29–41 (2009)

    Article  Google Scholar 

  13. Coleman, J.S.: Social capital in the creation of human capital. Am. J. Sociol. 94, S95–S120 (1988)

    Article  Google Scholar 

  14. Dubhashi, D., Panconesi, A.: Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, New York (2009)

    Book  MATH  Google Scholar 

  15. Eckmann, J.-P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the World Wide Web. PNAS 99(9), 5825–5829 (2002)

    Article  MathSciNet  Google Scholar 

  16. Fabian, N., Moreland, K., Thompson, D., Bauer, A.C., Marion, P., Gevecik, B., Rasquin, M., Jansen, K.E.: The paraview coprocessing library: a scalable, general purpose in situ visualization library. In: Proc. of IEEE Symposium on Large Data Analysis and Visualization (LDAV), pp. 89–96 (2011)

    Google Scholar 

  17. Fagiolo, G.: Clustering in complex directed networks. Phys. Rev. E 76, 026107 (2007)

    Article  Google Scholar 

  18. Favre, J.M., Whitlock, B., Meredith, J.S.: Parallel in situ coupling of simulation with a fully featured visualization system. In: Proc. of 11th Eurographics Symposium on Parallel Graphics and Visualization (EGPGV’11) (2011)

    Google Scholar 

  19. Fischer, E.: The art of uninformed decisions: a primer to property testing. Bull. EATCS 75, 97–126 (2001)

    MATH  Google Scholar 

  20. Foucault Welles, B., Van Devender, A., Contractor, N.: Is a friend a friend?: Investigating the structure of friendship networks in virtual worlds. In: CHI-EA’10, pp. 4027–4032 (2010)

    Google Scholar 

  21. Goldreich, O., Ron, D.: On testing expansion in bounded-degree graphs. ECCC, TR00-020 (2000)

    Google Scholar 

  22. Gonen, M., Ron, D., Shavitt, Y.: Counting stars and other small subgraphs in sublinear time. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 99–116 (2010)

    Google Scholar 

  23. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)

    Article  MATH  MathSciNet  Google Scholar 

  24. Holland, P.W., Leinhardt, S.: A method for detecting structure in sociometric data. Am. J. Sociol. 76, 492–513 (1970)

    Article  Google Scholar 

  25. Kolda, T.G., Pinar, A., Plantenga, T., Seshadhri, C., Task, C.: Counting triangles in massive graphs with mapreduce. Technical Report 1301.5887, arxiv (2013)

    Google Scholar 

  26. Kolountzakis, M., Miller, G., Peng, R., Tsourakakis, C.: Efficient triangle counting in large graphs via degree-based vertex partitioning. In: WAW’10 (2010)

    Google Scholar 

  27. Mascarenhas, A., Grout, R.W., Bremer, P.-T., Hawkes, E.R., Pascucci, V., Chen, J.H.: Topological feature extraction for comparison of terascale combustion simulation data. In: Pascucci, V., Tricoche, X., Hagen, H., Tierny, J. (eds.) Topological Methods in Data Analysis and Visualization. Mathematics and Visualization, pp. 229–240. Springer, Berlin/Heidelberg (2011)

    Chapter  Google Scholar 

  28. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: Simple building blocks of complex networks. Science 298(5594), 824–827 (2011)

    Article  Google Scholar 

  29. Ron, D.: Algorithmic and analysis techniques in property testing. Found. Trends Theor. Comput. Sci. 5(2), 73–205 (2009)

    Article  MATH  MathSciNet  Google Scholar 

  30. Rubinfeld, R.: Sublinear time algorithms. In: International Conference of Mathematicians (2006)

    Google Scholar 

  31. Rubinfeld, R., Sudan, M.: Robust characterization of polynomials with applications to program testing. SIAM J. Comput. 25, 647–668 (1996)

    Article  MathSciNet  Google Scholar 

  32. Schank, T., Wagner, D.: Approximating clustering coefficient and transitivity. J. Graph Algorithms Appl. 9, 265–275 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  33. Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: Experimental and Efficient Algorithms, pp. 606–609. Springer, Berlin (2005)

    Google Scholar 

  34. Seshadhri, C., Pinar, A., Durak, N., Kolda, T.G.: Directed closure measures for networks with reciprocity. arXiv:1302.6220 (2013)

    Google Scholar 

  35. Seshadhri, C., Pinar, A., Kolda, T.G.: Triadic measures on graphs: the power of wedge sampling. In: Proceedings of the SIAM Conference on Data Mining (SDM) (2013)

    Google Scholar 

  36. Seshadhri, C., Pinar, A., Kolda, T.G.: Wedge sampling for computing clustering coefficients and triangle counts on large graphs. arXiv:1309.3321 (2013)

    Google Scholar 

  37. Son, S., Kang, A., Kim, H., Kwon, T., Park, J., Kim, H.: Analysis of context dependence in social interaction networks of a massively multiplayer online role-playing game. PLoS ONE 7(4), e33918 (2012)

    Article  Google Scholar 

  38. Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW’11, pp. 607–614 (2011)

    Google Scholar 

  39. Szell, M., Thurner, S.: Measuring social dynamics in a massive multiplayer online game. Soc. Netw. 32, 313–329 (2010)

    Article  Google Scholar 

  40. Szell, M., Lambiotte, R., Thurner, S.: Multirelational organization of large-scale social networks in an online world. Proc. Natl. Acad. Sci 107, 13636–13641 (2010)

    Article  Google Scholar 

  41. Thompson, D., Bennett, J., Seshadhri, C., Pinar, A.: A provably-robust sampling method for generating colormaps of large data. In: Proceedings of the IEEE Symposium on Large Data Analysis and Visualization (LDAV), Atlanta (2013)

    Google Scholar 

  42. Tsourakakis, C.E.: Fast counting of triangles in large real networks, without counting: algorithms and laws. In: ICDM 2008, pp. 608–617 (2008)

    Google Scholar 

  43. Tsourakakis, C., Drineas, P., Michelakis, E., Koutis, I., Faloutsos, C.: Spectral counting of triangles in power-law networks via element-wise sparsification. In: ASONAM’09, pp. 66–71 (2009)

    Google Scholar 

  44. Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: counting triangles in massive graphs with a coin. In: KDD ’09, pp. 837–846 (2009)

    Google Scholar 

  45. Tsourakakis, C., Kolountzakis, M.N., Miller, G.: Triangle sparsifiers. J. Graph Algorithms Appl. 15, 703–726 (2011)

    Article  MATH  MathSciNet  Google Scholar 

  46. Vishwanath, V., Hereld, M., Papka, M.E.: Toward simulation-time data analysis and i/o acceleration on leadership-class systems. In: Proc. of IEEE Symposium on Large Data Analysis and Visualization (LDAV) (2011)

    Google Scholar 

  47. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge (1994)

    Book  Google Scholar 

  48. Watts, D., Strogatz, S.: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998)

    Article  Google Scholar 

  49. Yu, W., Wang, C., Grout, R.W., Chen, J.H., Ma, K.-L.: In-situ visualization for large-scale combustion simulations. IEEE Comput. Graph. Appl. 30, 45–57 (2010)

    Google Scholar 

Download references

Acknowledgements

The authors wish to thank Tamara G. Kolda, Madhav Jha, and Todd Plantenga for many helpful discussions and Dr. Jacqueline Chen for access to the combustion use case examples used in this work. This work is supported by the Laboratory Directed Research and Development (LDRD) program of Sandia National Laboratories and the DARPA GRAPHS program. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Janine C. Bennett .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Seshadhri, C., Pinar, A., Thompson, D., Bennett, J.C. (2015). Sublinear Algorithms for Extreme-Scale Data Analysis. In: Bennett, J., Vivodtzev, F., Pascucci, V. (eds) Topological and Statistical Methods for Complex Data. Mathematics and Visualization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44900-4_3

Download citation

Publish with us

Policies and ethics