Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/775047.775059acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

ANF: a fast and scalable tool for data mining in massive graphs

Published: 23 July 2002 Publication History

Abstract

Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly-accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk resident graphs. Finally, we present some of our results from mining large graphs using ANF.

References

[1]
L. A. Adamic. The small world Web. In Proceedings of the European Conf. on Digital Libraries, 1999.]]
[2]
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1--7):107--117, 1998.]]
[3]
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, and R. Stata. Graph structure in the Web. In Proceedings of the 9th International World Wide Web Conference, pages 247--256, 2000.]]
[4]
E. Cohen. Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences, 55(3):441--453, December 1997.]]
[5]
Cook and Holder. Graph-based data mining. ISTA: Intelligent Systems & their applications, 15, 2000.]]
[6]
CORA search engine. http://www.cora.whizbang.com.]]
[7]
P. Domingos and M. Richardson. Mining the network value of customers. In KDD-2001, pages 57--66, 2001.]]
[8]
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, 1999.]]
[9]
P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31:182--209, 1985.]]
[10]
IMDB. http://www.imdb.com.]]
[11]
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PDKK, pages 13--23, 2000.]]
[12]
http://cs.bell-labs.com/who/ches/map/.]]
[13]
S. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. The Web as a graph. In ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 1--10, 2000.]]
[14]
R. J. Lipton and J. F. Naughton. Estimating the size of generalized transitive closures. In Proceedings of 15th International Conference on Very Large Data Bases, pages 315--326, 1989.]]
[15]
M. H. Nodine, M. T. Goodrich, and J. S. Vitter. Blocking for external graph searching. In Proc. ACM PODS Conference (PODS-93), pages 222--232, 1993.]]
[16]
C. R. Palmer, G. Siganos, M. Faloutsos, C. Faloutsos, and P. Gibbons. The connectivity and fault-tolerance of the Internet topology. In Workshop on Network-Related Data Management (NRDM-2001), 2001.]]
[17]
C. R. Palmer and J. G. Steffan. Generating network toplogies that obey power laws. In IEEE Globecom 2000, 2000.]]
[18]
G. Salton and M. MeGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.]]
[19]
http://www.isi.edu/scan/mercator/maps.html.]]
[20]
S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos. A simple conceptual model for the Internet topology. In IEEE Globecom 2001, 2001.]]

Cited By

View all
  • (2024)(Vision Paper) A Vision for Spatio-Causal Situation Awareness, Forecasting, and PlanningACM Transactions on Spatial Algorithms and Systems10.1145/367255610:2(1-42)Online publication date: 1-Jul-2024
  • (2024)A Revisit to Graph Neighborhood Cardinality Estimation2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00242(3125-3137)Online publication date: 13-May-2024
  • (2023)propagate: A Seed Propagation Framework to Compute Distance-Based Metrics on Very Large GraphsMachine Learning and Knowledge Discovery in Databases: Research Track10.1007/978-3-031-43418-1_40(671-688)Online publication date: 17-Sep-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN:158113567X
DOI:10.1145/775047
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 July 2002

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

KDD02
Sponsor:

Acceptance Rates

KDD '02 Paper Acceptance Rate 44 of 307 submissions, 14%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)28
  • Downloads (Last 6 weeks)4
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)(Vision Paper) A Vision for Spatio-Causal Situation Awareness, Forecasting, and PlanningACM Transactions on Spatial Algorithms and Systems10.1145/367255610:2(1-42)Online publication date: 1-Jul-2024
  • (2024)A Revisit to Graph Neighborhood Cardinality Estimation2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00242(3125-3137)Online publication date: 13-May-2024
  • (2023)propagate: A Seed Propagation Framework to Compute Distance-Based Metrics on Very Large GraphsMachine Learning and Knowledge Discovery in Databases: Research Track10.1007/978-3-031-43418-1_40(671-688)Online publication date: 17-Sep-2023
  • (2022)On Sampling Collaborative Filtering DatasetsProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498439(842-850)Online publication date: 11-Feb-2022
  • (2022)Adaptive k-center and diameter estimation in sliding windowsInternational Journal of Data Science and Analytics10.1007/s41060-022-00318-z14:2(155-173)Online publication date: 2-Apr-2022
  • (2022)Degrees of Separation and Diameter in Large GraphsEncyclopedia of Big Data Technologies10.1007/978-3-319-63962-8_59-2(1-7)Online publication date: 15-Jun-2022
  • (2022)Bringing Aggregate Programming Towards the CloudLeveraging Applications of Formal Methods, Verification and Validation. Adaptation and Learning10.1007/978-3-031-19759-8_19(301-317)Online publication date: 17-Oct-2022
  • (2021)A Graph Mining Approach for Ranking and Discovering the Interesting Frequent Subgraph PatternsInternational Journal of Computational Intelligence Systems10.1007/s44196-021-00001-414:1Online publication date: 4-Aug-2021
  • (2020)The flajolet-martin sketch itself preserves differential privacyProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3497365(19561-19572)Online publication date: 6-Dec-2020
  • (2020)ELiteProceedings of the 3rd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)10.1145/3398682.3399164(1-10)Online publication date: 14-Jun-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media