DOI: 10.1145/2908131.2908153
research-article

LlamaFur: learning latent category matrix to find unexpected relations in Wikipedia

Published: 22 May 2016

Abstract

Besides finding trends and unveiling typical patterns, modern information retrieval is increasingly interested in the discovery of serendipitous and surprising information. In this work we focus on finding unexpected links in hyperlinked corpora whose documents are assigned to categories. To this end, we determine a latent category matrix that explains common links, using a highly scalable margin-based online learning algorithm that can process graphs with 10^8 links in less than 10 minutes. We show that our method provides better accuracy than all existing text-based techniques, with higher efficiency and while relying on a much smaller amount of information. It also provides higher precision than standard link prediction, especially at low recall levels; the two methods are in fact shown to be orthogonal to each other and can therefore be fruitfully combined.
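
To make the idea of a latent category matrix concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that a link from document u to document v is scored by summing the entries W[c, c'] over every category c of u and every category c' of v, and that W is trained with a passive-aggressive style hinge-loss update. The abstract only says "margin-based online learning", so that specific update rule, as well as the names `score`, `train`, and `doc_cats`, are illustrative assumptions. Each document is assumed to have at least one category, with no duplicate category indices.

```python
import numpy as np

def score(W, cats_u, cats_v):
    """Sum the latent weights over all (category of u, category of v) pairs."""
    return W[np.ix_(cats_u, cats_v)].sum()

def train(pairs, labels, doc_cats, n_cats, epochs=5):
    """Online margin-based training (passive-aggressive style, an assumption).

    pairs:    list of (u, v) document pairs
    labels:   +1 if u links to v, -1 otherwise
    doc_cats: dict mapping a document id to its list of category indices
    """
    W = np.zeros((n_cats, n_cats))
    for _ in range(epochs):
        for (u, v), y in zip(pairs, labels):
            cu, cv = doc_cats[u], doc_cats[v]
            loss = max(0.0, 1.0 - y * score(W, cu, cv))  # hinge loss
            if loss > 0.0:
                # The feature vector is the indicator of the category pairs,
                # so its squared norm is len(cu) * len(cv).
                tau = loss / (len(cu) * len(cv))
                W[np.ix_(cu, cv)] += tau * y
    return W

# Toy corpus: documents 0 and 2 are in category 0, 1 and 3 in category 1,
# and 4 in category 2. Category 0 commonly links to category 1, never to 2.
doc_cats = {0: [0], 1: [1], 2: [0], 3: [1], 4: [2]}
pairs = [(0, 1), (2, 3), (0, 4), (2, 4)]
labels = [1, 1, -1, -1]
W = train(pairs, labels, doc_cats, n_cats=3)
```

Under these assumptions, an existing link whose score is low is one that the learned category matrix does not explain, which is the sense in which it would be flagged as unexpected.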

Cited By

  • (2023) Evidence of Demographic rather than Ideological Segregation in News Discussion on Reddit. Proceedings of the ACM Web Conference 2023, pages 2777-2786. DOI: 10.1145/3543507.3583468. Online publication date: 30-Apr-2023.
  • (2021) Exploding TV Sets and Disappointing Laptops: Suggesting Interesting Content in News Archives Based on Surprise Estimation. Advances in Information Retrieval, pages 254-269. DOI: 10.1007/978-3-030-72113-8_17. Online publication date: 27-Mar-2021.
  • (2018) Towards Recommending Interesting Content in News Archives. Maturity and Innovation in Digital Libraries, pages 142-146. DOI: 10.1007/978-3-030-04257-8_13. Online publication date: 15-Nov-2018.

Published In

WebSci '16: Proceedings of the 8th ACM Conference on Web Science
May 2016
392 pages
ISBN:9781450342087
DOI:10.1145/2908131
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

WebSci '16: ACM Web Science Conference
May 22 - 25, 2016
Hannover, Germany

Acceptance Rates

WebSci '16 Paper Acceptance Rate 13 of 70 submissions, 19%;
Overall Acceptance Rate 245 of 933 submissions, 26%
