Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Identifying graphs from noisy and incomplete data

Published: 09 November 2010 Publication History

Abstract

There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks, in order to uncover general laws that govern their structure and evolution, and patterns and predictive models to develop better policies and practices. However, a fundamental challenge in dealing with this newly available observational data describing networks is that the data is often of dubious quality -- it is noisy and incomplete -- and before any analysis method can be applied, the data must be cleaned, and missing information inferred. In this paper, we introduce the notion of graph identification, which explicitly models the inference of a "cleaned" output network from a noisy input graph. It is this output network that is appropriate for further analysis. We present an illustrative example and use the example to explore the types of inferences involved in graph identification, as well as the challenges and issues involved in combining those inferences. We then present a simple, general approach to combining the inferences in graph identification and experimentally show the utility of our combined approach and how the performance of graph identification is sensitive to the inter-dependencies among these inferences.

References

[1]
I. Bhattacharya and L. Getoor. Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data, 1:1--36, 2007.
[2]
I. Bhattacharya, S. Godbole, and S. Joshi. Structured entity identification and document categorization: two tasks with one joint model. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 25--33, 2008.
[3]
M. Bilgic and L. Getoor. Effective label acquisition for collective classification. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 43--51, 2008.
[4]
M. Bilgic, G. M. Namata, and L. Getoor. Combining collective classification and link prediction. In ICDMW '07: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, pages 381--386, Washington, DC, USA, 2007. IEEE Computer Society.
[5]
Y. Choi, E. Breck, and C. Cardie. Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 431--439. Association for Computational Linguistics, 2006.
[6]
W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of IJCAI-03 Workshop on Information Integration, pages 73--78, August 2003.
[7]
L. Getoor and C. P. Diehl. Link mining: a survey. SIGKDD Explorations Newsletter, 7:3--12, 2005.
[8]
L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of link structure. Machine Learning, 3:679--707, 2003.
[9]
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery Data, 1(1):2, 2007.
[10]
D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In International Conference on Information and Knowledge Management, November 2003.
[11]
Q. Lu and L. Getoor. Link-based classification. In Proceedings of the International Conference on Machine Learning, 2003.
[12]
L. McDowell, K. M. Gupta, and D. W. Aha. Cautious inference in collective classification. In AAAI, pages 596--601, 2007.
[13]
G. M. Namata, P. Sen, M. Bilgic, and L. Getoor. Collective classification for text classification. In M. Sahami and A. Srivastava, editors, Text Mining: Classification, Clustering, and Applications. Taylor and Francis Group, 2009.
[14]
M. J. Rattigan, M. Maier, and D. Jensen. Exploiting network structure for active inference in collective classification. Technical report, University of Massachusetts Amherst, 2007.
[15]
M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62:107--136, 2006.
[16]
D. Roth, K. Small, and I. Titov. Sequential learning of classifiers for structured prediction problems. In Artificial Intelligence STAT, April 2009.
[17]
D. Roth and W. Yih. A linear programming formulation for global inference in natural language tasks. In Proceedings of CoNLL-2004, pages 1--8. Boston, MA, USA, 2004.
[18]
B. Taskar, A. Pieter, and D. Koller. Discriminative probabilistic models for relational data. In Conf. on Uncertainty in Artificial Intelligence, 2002.
[19]
B. Taskar, M.-F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In Advances in Neural Information Processing Systems, December 2003.
[20]
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, October 1999.

Cited By

View all
  • (2020)Graph GeneratorsACM Computing Surveys10.1145/337944553:2(1-30)Online publication date: 17-Apr-2020
  • (2018)SINE: Scalable Incomplete Network Embedding2018 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM.2018.00089(737-746)Online publication date: Nov-2018
  • (2015)Federated databases and actionable intelligence: using social network analysis to disrupt transnational wildlife trafficking criminal networksSecurity Informatics10.1186/s13388-015-0018-84:1Online publication date: 12-Apr-2015

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGKDD Explorations Newsletter
ACM SIGKDD Explorations Newsletter  Volume 12, Issue 1
June 2010
77 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/1882471
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 November 2010
Published in SIGKDD Volume 12, Issue 1

Check for updates

Author Tags

  1. classification
  2. data mining
  3. entity resolution
  4. link prediction
  5. social networks
  6. statistical relational learning

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 23 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)Graph GeneratorsACM Computing Surveys10.1145/337944553:2(1-30)Online publication date: 17-Apr-2020
  • (2018)SINE: Scalable Incomplete Network Embedding2018 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM.2018.00089(737-746)Online publication date: Nov-2018
  • (2015)Federated databases and actionable intelligence: using social network analysis to disrupt transnational wildlife trafficking criminal networksSecurity Informatics10.1186/s13388-015-0018-84:1Online publication date: 12-Apr-2015

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media