research-article

WTF: the who to follow service at Twitter

Authors:

Reza ZadehAuthors Info & Claims

WWW '13: Proceedings of the 22nd international conference on World Wide Web

Pages 505 - 514

https://doi.org/10.1145/2488388.2488433

Published: 13 May 2013 Publication History

Abstract

WTF ("Who to Follow") is Twitter's user recommendation service, which is responsible for creating millions of connections daily between users based on shared interests, common connections, and other related factors. This paper provides an architectural overview and shares lessons we learned in building and running the service over the past few years. Particularly noteworthy was our design decision to process the entire Twitter graph in memory on a single server, which significantly reduced architectural complexity and allowed us to develop and deploy the service in only a few months. At the core of our architecture is Cassovary, an open-source in-memory graph processing engine we built from scratch for WTF. Besides powering Twitter's user recommendations, Cassovary is also used for search, discovery, promoted products, and other services as well. We describe and evaluate a few graph recommendation algorithms implemented in Cassovary, including a novel approach based on a combination of random walks and SALSA. Looking into the future, we revisit the design of our architecture and comment on its limitations, which are presently being addressed in a second-generation system under development.

References

[1]

L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. WSDM, 2011.

Digital Library

[2]

B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized PageRank. VLDB, 2010.

Digital Library

[3]

Y. Bu, B. Howe, M. Balazinska, and M. Ernst. HaLoop: Efficient iterative data processing on large clusters. VLDB, 2010.

Digital Library

[4]

R. Chen, M. Yang, X. Weng, B. Choi, B. He, and X. Li. Improving large graph processing on partitioned graphs in the cloud. SoCC, 2012.

Digital Library

[5]

F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. KDD, 2009.

Digital Library

[6]

J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI, 2004.

Digital Library

[7]

J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: A runtime for iterative MapReduce. MAPREDUCE Workshop, 2010.

Digital Library

[8]

D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards scaling fully personalized PageRank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.

[9]

A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanamurthy, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of MapReduce: The Pig experience. VLDB, 2009.

Digital Library

[10]

M. Hasan and M. Zaki. A survey of link prediction in social networks. In C. Aggarwal, editor, Social Network Data Analytics. Springer, 2011.

[11]

J. Huang, D. Abadi, and K. Ren. Scalable SPARQL querying of large RDF graphs. VLDB, 2011.

Digital Library

[12]

G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed Computing, 48:96--129, 1998.

Digital Library

[13]

J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999.

Digital Library

[14]

R. Kohavi, R. Henne, and D. Sommerfield. Practical guide to controlled experiments on the web: Listen to your customers not to the HiPPO. SIGKDD, 2007.

Digital Library

[15]

G. Lee, J. Lin, C. Liu, A. Lorek, and D. Ryaboy. The unified logging infrastructure for data analytics at Twitter. VLDB, 2012.

Digital Library

[16]

F. Leibert, J. Mannix, J. Lin, and B. Hamadani. Automatic management of partitioned, replicated search services. SoCC, 2011.

Digital Library

[17]

R. Lempel and S. Moran. SALSA: The Stochastic Approach for Link-Structure Analysis. ACM Transactions on Information Systems, 19(2):131--160, 2001.

Digital Library

[18]

J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.

[19]

J. Lin. MapReduce is good enough? If all you have is a hammer, throw away everything that's not a nail! arXiv:1209.2191, 2012.

[20]

J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Morgan & Claypool Publishers, 2010.

Digital Library

[21]

J. Lin and A. Kolcz. Large-scale machine learning at Twitter. SIGMOD, 2012.

Digital Library

[22]

J. Lin and M. Schatz. Design patterns for efficient graph algorithms in MapReduce. MLG Workshop, 2010.

Digital Library

[23]

Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. GraphLab: A new parallel framework for machine learning. UAI, 2010.

Digital Library

[24]

Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. VLDB, 2012.

Digital Library

[25]

G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. SIGMOD, 2010.

Digital Library

[26]

J. Mondal and A. Deshpande. Managing large dynamic graphs efficiently. SIGMOD, 2012.

Digital Library

[27]

C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. SIGMOD, 2008.

Digital Library

[28]

L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University, 1999.

[29]

L. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103--111, 1990.

Digital Library

[30]

R. Zadeh and A. Goel. Dimension independent similarity computation. arXiv:1206.2082, 2012.

[31]

Y. Zhang, Q. Gao, L. Gao, and C. Wang. PrIter: A distributed framework for prioritized iterative computations. SoCC, 2011.

Digital Library

Cited By

Wang HBaeza-Yates RBonchi F(2024)Revisiting Local PageRank Estimation on Undirected Graphs: Simple and OptimalProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671820(3036-3044)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671820
Zhu ZWang KLiu HLi JLuo SBaeza-Yates RBonchi F(2024)Topology-monitorable Contrastive Learning on Dynamic GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671777(4700-4711)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671777
Zhu ZLuo SLin WWang SMo DLi C(2024)Personalized PageRanks over Dynamic Graphs - The Case for Optimizing Quality of Service2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00038(409-422)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00038
Show More Cited By

Index Terms

WTF: the who to follow service at Twitter
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

WTF, GPU! computing twitter's who-to-follow on the GPU
COSN '14: Proceedings of the second ACM conference on Online social networks

In this paper, we investigate the potential of GPUs for performing link structure analysis of social graphs. Specifically, we implement Twitter's WTF ("Who to Follow") recommendation system on a single GPU. Our implementation shows promising results on ...
Energy efficiency of large scale graph processing platforms
UbiComp '16: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct

A number of graph processing platforms have emerged recently as a result of the growing demand on graph data analytics with complex and large-scale graph structured datasets. These platforms have been tailored for iterative graph computations and can ...
Vertex-weighted measures for link prediction in hashtag graphs
ASONAM '19: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Communications on the popular social networking platform, Twitter, can be mapped in terms of a hashtag graph, where vertices correspond to hashtags, and edges correspond to co-occurrences of hashtags within the same distinct tweet. Furthermore, a vertex ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

WWW '13: Proceedings of the 22nd international conference on World Wide Web

May 2013

1628 pages

ISBN:9781450320351

DOI:10.1145/2488388

General Chairs:
Daniel Schwabe
PUC-Rio - Brazil
,
Virgílio Almeida
UFMG - Brazil
,
Hartmut Glaser
CGI.br - Brazil
,
Program Chairs:
Ricardo Baeza-Yates
Yahoo! Labs - Spain & Chile
,
Sue Moon
KAIST - South Korea

Copyright © 2013 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

NICBR: Nucleo de Informatcao e Coordenacao do Ponto BR
CGIBR: Comite Gestor da Internet no Brazil

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

WWW '13

Sponsor:

NICBR
CGIBR

WWW '13: 22nd International World Wide Web Conference

May 13 - 17, 2013

Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

336
Total Citations
View Citations
3,281
Total Downloads

Downloads (Last 12 months)113
Downloads (Last 6 weeks)20

Reflects downloads up to 12 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wang HBaeza-Yates RBonchi F(2024)Revisiting Local PageRank Estimation on Undirected Graphs: Simple and OptimalProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671820(3036-3044)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671820
Zhu ZWang KLiu HLi JLuo SBaeza-Yates RBonchi F(2024)Topology-monitorable Contrastive Learning on Dynamic GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671777(4700-4711)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671777
Zhu ZLuo SLin WWang SMo DLi C(2024)Personalized PageRanks over Dynamic Graphs - The Case for Optimizing Quality of Service2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00038(409-422)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00038
Ortega SMartin-Delgado M(2024)Randomized SearchRank: A semiclassical approach to a quantum search enginePhysical Review Research10.1103/PhysRevResearch.6.0430146:4Online publication date: 4-Oct-2024
https://doi.org/10.1103/PhysRevResearch.6.043014
Muñoz PDíez FBellogín A(2024)Modeling disinformation networks on Twitter: structure, behavior, and impactApplied Network Science10.1007/s41109-024-00610-w9:1Online publication date: 26-Jan-2024
https://doi.org/10.1007/s41109-024-00610-w
Loukil MSfaxi LRobbana R(2024)How to create graphs in hardware-constrained environments? Choosing the best creation approach via machine learning-based predictive modelsInternational Journal of Data Science and Analytics10.1007/s41060-023-00495-5Online publication date: 2-Feb-2024
https://doi.org/10.1007/s41060-023-00495-5
Dutta LMaji GSen S(2024)Mutual Interest-Based Twitter Followee Recommendation Using Latent Dirichlet Allocation Topic ModellingJournal of The Institution of Engineers (India): Series B10.1007/s40031-024-01125-9Online publication date: 9-Aug-2024
https://doi.org/10.1007/s40031-024-01125-9
Li JShomer HMao HZeng SMa YShah NTang JYin DOh ANaumann TGloberson ASaenko KHardt MLevine S(2023)Evaluating graph neural networks for link predictionProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3666291(3853-3866)Online publication date: 10-Dec-2023
https://dl.acm.org/doi/10.5555/3666122.3666291
Hu RDu YHu JLi H(2023)Cross-community shortcut detection based on network representation learning and structural featuresIntelligent Data Analysis10.3233/IDA-21651327:3(709-732)Online publication date: 18-May-2023
https://doi.org/10.3233/IDA-216513
Zhang FJiang MWang S(2023)Efficient Dynamic Weighted Set Sampling and Its ExtensionProceedings of the VLDB Endowment10.14778/3617838.361784017:1(15-27)Online publication date: 4-Dec-2023
https://doi.org/10.14778/3617838.3617840
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents