DOI: 10.1145/2488388.2488392
research-article

Hierarchical geographical modeling of user locations from social media posts

Published: 13 May 2013

Abstract

With the availability of cheap location sensors, geotagging of messages in online social networks is proliferating. For instance, Twitter, Facebook, Foursquare, and Google+ provide these services either explicitly, by letting users choose their location, or implicitly, via a sensor. This paper presents an integrated generative model of location and message content. That is, we provide a model that combines distributions over locations, topics, and user characteristics, where the latter cover both a user's locations and their content preferences. Unlike previous work, which modeled data in a flat, predefined representation, our model automatically infers the hierarchical structure over both content and the size and position of geographical locations. This affords significantly higher accuracy: location uncertainty is reduced by 40% relative to the best previous results [21] achieved on location estimation from Tweets.
We achieve this goal by proposing a new statistical model, the nested Chinese Restaurant Franchise (nCRF), a hierarchical model of tree distributions. Much statistical structure is shared between users; that said, each user has their own distribution over interests and places. The nCRF allows us to capture the following effects: (1) a topic model for Tweets; (2) location-specific topics; (3) a latent distribution of locations; (4) a joint hierarchical model of topics and locations; and (5) personalized preferences over topics and locations within the above model. In doing so, we are able both to obtain accurate estimates of a user's location based on their Tweets and to obtain a detailed estimate of a geographical language model.
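
The abstract does not spell out the nCRF itself, so the following is only a sketch of the two building blocks its name points to: the Chinese restaurant process (CRP) and the nested CRP of Blei et al. [11], in which a CRP draw is made at every node of a tree to pick a root-to-leaf path. It is not the authors' model; in particular it omits the franchise layer that would give each user a personal distribution over a shared tree. The names `crp_draw`, `crp_assignments`, `ncrp_path`, and the concentration parameter `alpha` are illustrative assumptions.

```python
import random


def crp_draw(counts, alpha, rng):
    """Draw one index under a Chinese restaurant process.

    An existing option k is chosen with probability proportional to
    counts[k]; a new option is opened with probability proportional to
    alpha. The counts list is updated in place.
    """
    weights = counts + [alpha]
    k = rng.choices(range(len(weights)), weights=weights)[0]
    if k == len(counts):
        counts.append(1)      # open a new table / child node
    else:
        counts[k] += 1        # join an existing one
    return k


def crp_assignments(n_customers, alpha, seed=0):
    """Seat n_customers sequentially; return (assignments, table sizes)."""
    rng = random.Random(seed)
    counts = []
    assignments = [crp_draw(counts, alpha, rng) for _ in range(n_customers)]
    return assignments, counts


def ncrp_path(tree, depth, alpha, rng):
    """Sample a path of the given depth through a tree (nested CRP).

    `tree` maps a path prefix (tuple of child indices) to the visit
    counts of that node's children; it grows as new branches are opened.
    """
    path = ()
    for _ in range(depth):
        counts = tree.setdefault(path, [])
        path += (crp_draw(counts, alpha, rng),)
    return path


if __name__ == "__main__":
    assignments, sizes = crp_assignments(20, alpha=1.0)
    print("table sizes:", sizes)

    tree, rng = {}, random.Random(1)
    paths = [ncrp_path(tree, depth=3, alpha=0.5, rng=rng) for _ in range(10)]
    print("sampled paths:", paths)
```

In this sketch the tree has unbounded width and grows only where data lands, which is the sense in which a hierarchy over topics or regions can be inferred rather than fixed in advance. Smaller values of `alpha` concentrate draws on already popular branches.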

References

[1] R. Adams, Z. Ghahramani, and M. Jordan. Tree-structured stick breaking for hierarchical data. In Neural Information Processing Systems, pages 19--27, 2010.
[2] A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. Scalable inference in latent variable models. In Web Search and Data Mining (WSDM), 2012.
[3] A. Ahmed, Q. Ho, J. Eisenstein, E. Xing, A. Smola, and C. Teo. Unified analysis of streaming news. In Proceedings of WWW, Hyderabad, India, 2011. IW3C2, Sheridan Printing.
[4] A. Ahmed, Q. Ho, C. H. Teo, J. Eisenstein, A. J. Smola, and E. P. Xing. Online inference for the infinite topic-cluster model: Storylines from streaming text. In AISTATS, 2011.
[5] A. Ahmed, S. Ravi, S. Narayanamurthy, and A. J. Smola. FastEx: Hash clustering with exponential families. In NIPS, 2012.
[6] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed large-scale natural graph factorization. In WWW, 2013.
[7] A. Ahmed and E. P. Xing. Timeline: A dynamic hierarchical Dirichlet process model for recovering birth/death and evolution of topics in text stream. In UAI, 2010.
[8] C. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2:1152--1174, 1974.
[9] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In Neural Information Processing Systems. MIT Press, 2002.
[10] D. Blackwell and J. MacQueen. Ferguson distributions via Polya urn schemes. The Annals of Statistics, 1(2):353--355, 1973.
[11] D. Blei, T. Griffiths, and M. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57(2):1--30, 2010.
[12] Z. Cheng, J. Caverlee, K. Lee, and D. Sui. Exploring millions of footprints in location sharing services. In International AAAI Conference on Weblogs and Social Media, 2011.
[13] E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: User movement in location-based social networks. In Knowledge Discovery and Data Mining, pages 1082--1090, New York, NY, USA, 2011. ACM.
[14] K. Chou, A. Willsky, and A. Benveniste. Multiscale recursive estimation, data fusion, and regularization. IEEE Transactions on Automatic Control, 39(3):464--478, Mar. 1994.
[15] P. J. Cowans. Probabilistic Document Modelling. PhD thesis, University of Cambridge, 2006.
[16] J. Eisenstein, A. Ahmed, and E. Xing. Sparse additive generative models of text. In International Conference on Machine Learning, pages 1041--1048, New York, NY, USA, 2011. ACM.
[17] J. Eisenstein, B. O'Connor, N. A. Smith, and E. Xing. A latent variable model for geographic lexical variation. In Empirical Methods in Natural Language Processing, pages 1277--1287, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
[18] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209--230, 1973.
[19] Q. Hao, R. Cai, C. Wang, R. Xiao, J.-M. Yang, Y. Pang, and L. Zhang. Equip tourists with knowledge mined from travelogues. In Proceedings of WWW, pages 401--410, New York, NY, USA, 2010. ACM.
[20] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1/2):177--196, 2001.
[21] L. Hong, A. Ahmed, S. Gurumurthy, A. Smola, and K. Tsioutsiouliklis. Discovering geographical topics in the Twitter stream. In World Wide Web, 2012.
[22] L. F. James. Coag-frag duality for a class of stable Poisson-Kingman mixtures, 2010.
[23] Q. Mei, C. Liu, H. Su, and C. Zhai. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW, pages 533--542, New York, NY, USA, 2006. ACM.
[24] S. Roller, M. Speriosu, S. Rallapalli, B. Wing, and J. Baldridge. Supervised text-based geolocation using language models on an adaptive grid. In Proceedings of EMNLP, pages 1500--1510, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.
[25] S. Sizov. GeoFolk: Latent spatial semantics in Web 2.0 social media. In Proceedings of WSDM, pages 281--290, New York, NY, USA, 2010. ACM.
[26] Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(576):1566--1581, 2006.
[27] H. Wallach. Structured Topic Models for Language. PhD thesis, University of Cambridge, 2008.
[28] C. Wang, J. Wang, X. Xing, and W. Ma. Mining geographic knowledge using location aware topic model. In Proceedings of the 4th ACM Workshop on Geographical Information Retrieval, pages 65--70, New York, NY, USA, 2007. ACM.
[29] X. Wang, A. McCallum, and X. Wei. Topical N-grams: Phrase and topic discovery, with an application to information retrieval. In International Conference on Data Mining (ICDM), pages 697--702. IEEE Computer Society, 2007.
[30] B. Wing and J. Baldridge. Simple supervised document geolocation with geodesic grids. In Proceedings of ACL, pages 955--964, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
[31] Z. Yin, L. Cao, J. Han, C. Zhai, and T. Huang. Geographical topic discovery and comparison. In World Wide Web, pages 247--256, New York, NY, USA, 2011. ACM.

Published In

WWW '13: Proceedings of the 22nd international conference on World Wide Web
May 2013
1628 pages
ISBN:9781450320351
DOI:10.1145/2488388

Sponsors

  • NICBR: Núcleo de Informação e Coordenação do Ponto BR
  • CGIBR: Comitê Gestor da Internet no Brasil

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. chinese restaurant process
  2. geolocation
  3. non-parametric Bayesian models
  4. topic models
  5. twitter
  6. user profiling

Qualifiers

  • Research-article

Conference

WWW '13: 22nd International World Wide Web Conference
May 13 - 17, 2013
Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 2

Reflects downloads up to 12 Nov 2024

Cited By

  • (2023) MetaGeo: A General Framework for Social User Geolocation Identification With Few-Shot Learning. IEEE Transactions on Neural Networks and Learning Systems 34:11 (8950-8964). DOI: 10.1109/TNNLS.2022.3154204. Online publication date: Nov-2023
  • (2023) Feature Analysis of Regional Behavioral Facilitation Information Based on Source Location and Target People in Disaster. Big Data Analytics and Knowledge Discovery (224-232). DOI: 10.1007/978-3-031-39831-5_21. Online publication date: 10-Aug-2023
  • (2022) Language Modeling on Location-Based Social Networks. ISPRS International Journal of Geo-Information 11:2 (147). DOI: 10.3390/ijgi11020147. Online publication date: 18-Feb-2022
  • (2022) Developing insights from the collective voice of target users in Twitter. Journal of Big Data 9:1. DOI: 10.1186/s40537-022-00611-5. Online publication date: 2-Jun-2022
  • (2022) PGeoTopic: A Distributed Solution for Mining Geographical Topic Models. IEEE Transactions on Knowledge and Data Engineering 34:2 (881-893). DOI: 10.1109/TKDE.2020.2989142. Online publication date: 1-Feb-2022
  • (2021) From Symbols to Embeddings: A Tale of Two Representations in Computational Social Science. Journal of Social Computing 2:2 (103-156). DOI: 10.23919/JSC.2021.0011. Online publication date: Jun-2021
  • (2021) Location Classification Based on Tweets. Proceedings of the 4th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (51-60). DOI: 10.1145/3486635.3491075. Online publication date: 2-Nov-2021
  • (2020) Twitter User Location Inference Based on Representation Learning and Label Propagation. Proceedings of The Web Conference 2020 (2648-2654). DOI: 10.1145/3366423.3380019. Online publication date: 20-Apr-2020
  • (2020) Geolocation using GAT with Multiview Learning. 2020 IEEE International Conference on Smart Data Services (SMDS) (81-88). DOI: 10.1109/SMDS49396.2020.00017. Online publication date: Oct-2020
  • (2020) Multiple-Aspect Attentional Graph Neural Networks for Online Social Network User Localization. IEEE Access 8 (95223-95234). DOI: 10.1109/ACCESS.2020.2993876. Online publication date: 2020