Using unsupervised link discovery methods to find interesting facts and connections in a bibliography dataset

Published: 01 December 2003

Abstract

This paper describes a submission to the Open Task of the 2003 KDD Cup. For this task, contestants were asked to devise their own questions about the HEP-Th bibliography dataset, and the most interesting result would be selected as the winner. Instead of taking a more traditional approach, such as starting with an inspection of the data, formulating questions or hypotheses that interest us, and then devising an analysis and approach to answer them, we took a different route: can we develop a program that automatically finds interesting facts and connections in the data?

To do this, we developed a set of unsupervised link discovery methods that compute interestingness based on notions of "rarity" and "abnormality". Experiments on the HEP-Th dataset show that our approaches can automatically uncover interesting hidden connections (e.g., significant relationships between people) and unexpected facts (e.g., citation loops) without relying on any prior knowledge or training examples. The interestingness of some of our results is self-evident. We verified others by finding supporting evidence on the World Wide Web, which shows that our methods can discover, in an unsupervised way, connections between entities that are genuinely interesting in the real world.
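As a rough illustration of one kind of unexpected fact mentioned above, the sketch below finds citation loops in a list of citation pairs. It assumes that a citation loop means a directed cycle in the citation graph (for example, paper A cites B, B cites C, and C cites A), which is unexpected because papers usually cite earlier work. It uses Tarjan's strongly-connected-components algorithm; it is not the rarity-based method described in the abstract. The function name `citation_loops` and the input format of (citing, cited) ID pairs are hypothetical.

```python
from collections import defaultdict


def citation_loops(citations):
    """Return groups of papers whose citations form a loop.

    `citations` is an iterable of (citing_id, cited_id) pairs; IDs must be
    mutually comparable so the output can be sorted. Each returned group is a
    strongly connected component of the citation graph with more than one
    paper, or a single paper that cites itself. The search is Tarjan's SCC
    algorithm, written iteratively so long citation chains do not hit
    Python's recursion limit.
    """
    graph = defaultdict(list)    # citing paper -> list of cited papers
    nodes = set()
    self_cites = set()
    for src, dst in citations:
        graph[src].append(dst)
        nodes.update((src, dst))
        if src == dst:
            self_cites.add(src)

    index, low = {}, {}          # discovery order and low-link values
    stack, on_stack = [], set()  # papers in components still being built
    loops = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)

    for root in sorted(nodes):
        if root in index:
            continue
        visit(root)
        # Explicit DFS stack of (paper, iterator over its outgoing citations).
        work = [(root, iter(graph[root]))]
        while work:
            node, edges = work[-1]
            for nxt in edges:
                if nxt not in index:
                    visit(nxt)
                    work.append((nxt, iter(graph[nxt])))
                    break  # descend into nxt; resume `edges` later
                if nxt in on_stack:
                    low[node] = min(low[node], index[nxt])
            else:
                # All citations of `node` explored: finish it.
                work.pop()
                if work:
                    parent = work[-1][0]
                    low[parent] = min(low[parent], low[node])
                if low[node] == index[node]:
                    # `node` is the root of a component: pop its members.
                    component = []
                    while True:
                        member = stack.pop()
                        on_stack.discard(member)
                        component.append(member)
                        if member == node:
                            break
                    if len(component) > 1 or node in self_cites:
                        loops.append(sorted(component))
    return loops
```

For example, `citation_loops([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "E")])` returns `[["A", "B", "C"], ["E"]]`: paper D is cited but sits on no cycle, so it is not reported. Tarjan's algorithm runs in time linear in the number of papers plus citations; if an extra dependency is acceptable, networkx's `strongly_connected_components` is an alternative to the hand-written search.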

Published In

ACM SIGKDD Explorations Newsletter, Volume 5, Issue 2
December 2003
202 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/980972

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGKDD Volume 5, Issue 2

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 1
Reflects downloads up to 16 Nov 2024.

Cited By

  • (2021) Loops in publication citation networks. Journal of Information Science 46:6 (837-848). DOI: 10.1177/0165551519871826. Online publication date: 25-Feb-2021.
  • (2014) SpyNetMiner. International Journal of Data Warehousing and Mining 10:1 (32-54). DOI: 10.4018/ijdwm.2014010103. Online publication date: 1-Jan-2014.
  • (2012) Identifying key players in a covert network using behavioral profile. 2012 International Conference on Recent Trends in Information Technology (22-27). DOI: 10.1109/ICRTIT.2012.6206756. Online publication date: Apr-2012.
  • (2012) Behavioral Profile Generation for 9/11 Terrorist Network Using Efficient Selection Strategies. Advances in Computer Science, Engineering & Applications (333-344). DOI: 10.1007/978-3-642-30111-7_32. Online publication date: 2012.
  • (2012) Can intermediary-based science standards crosswalking work? Some evidence from mining the standard alignment tool (SAT). Journal of the American Society for Information Science and Technology 63:9 (1843-1858). DOI: 10.1002/asi.22712. Online publication date: 1-Sep-2012.
  • (2010) Derived types in semantic association discovery. Journal of Intelligent Information Systems 35:2 (213-244). DOI: 10.1007/s10844-009-0094-7. Online publication date: 1-Oct-2010.
  • (2009) Enterprise university as a digital ecosystem: Visual analysis of academic collaboration. 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies (727-732). DOI: 10.1109/DEST.2009.5276678. Online publication date: Jun-2009.
  • (2009) Supporting Strategic Decision Making in an Enterprise University Through Detecting Patterns of Academic Collaboration. Information Systems: Modeling, Development, and Integration (496-507). DOI: 10.1007/978-3-642-01112-2_50. Online publication date: 2009.
  • (2008) Using importance flooding to identify interesting networks of criminal activity. Journal of the American Society for Information Science and Technology 59:13 (2099-2114). DOI: 10.5555/1458620.1458634. Online publication date: 1-Nov-2008.
  • (2008) Discovering and Explaining Abnormal Nodes in Semantic Graphs. IEEE Transactions on Knowledge and Data Engineering 20:8 (1039-1052). DOI: 10.1109/TKDE.2007.190691. Online publication date: 1-Aug-2008.
