research-article

Inferring the demographics of search users: social data meets search queries

Published: 13 May 2013

Abstract

Knowing users' views and demographic traits offers great potential for personalizing web search results and related services such as query suggestion and query completion. Such signals, however, are often available for only a small fraction of search users, namely those who log in with their social network account and allow its use for personalization of search results. In this paper, we offer a solution to this problem by showing how user demographic traits such as age and gender, and even political and religious views, can be efficiently and accurately inferred from search query histories. This is accomplished in three steps. We first train predictive models on the publicly available myPersonality dataset, which contains users' Facebook Likes and their demographic information. We then match Facebook Likes with search queries using Open Directory Project categories. Finally, we apply the models trained on Facebook Likes to large-scale query logs of a commercial search engine while explicitly taking into account the difference in trait distributions between the two datasets. We find that the accuracy of classifying age and gender, expressed as the area under the ROC curve (AUC), is 77% and 84% respectively for predictions based on Facebook Likes, and degrades only to 74% and 80% when based on search queries. On a US state-by-state basis we find a Pearson correlation of 0.72 for political views between the predicted scores and Gallup data, and 0.54 for affiliation with Judaism between predicted scores and data from the US Religious Landscape Survey. We conclude that it is indeed feasible to infer important demographic data of users from their query history based on labelled Likes data, and we believe that this approach could provide valuable information for personalization and monetization even in the absence of demographic data.
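
The abstract names three quantitative ingredients without spelling them out: a correction for differing trait distributions between the training data and the query logs, the AUC, and the Pearson correlation. The sketch below illustrates each in plain Python. It is not the authors' implementation: the abstract does not say which correction was used, so `adjust_for_prior_shift` shows a standard class-prior re-weighting as an assumption, and all function names and the toy numbers are hypothetical.

```python
from math import sqrt


def adjust_for_prior_shift(p_source, prior_source, prior_target):
    """Re-weight a binary classifier's output for a different class prior.

    p_source is P(y=1 | x) from a model trained on data where
    P(y=1) = prior_source. The result is the corresponding probability
    for a population where P(y=1) = prior_target, assuming only the
    class prior differs (an assumption, not a claim about the paper).
    """
    pos = p_source * prior_target / prior_source
    neg = (1.0 - p_source) * (1.0 - prior_target) / (1.0 - prior_source)
    return pos / (pos + neg)


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example has the
    higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): four users with model scores and true labels.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)

# Move the scores from a 50% positive prior to a 20% positive prior.
shifted = [adjust_for_prior_shift(s, 0.5, 0.2) for s in toy_scores]
```

Because this re-weighting is monotone in the score, it changes calibration but not the ranking of users, so the AUC of `shifted` equals the AUC of `toy_scores`. The same `pearson` function would apply to a state-by-state comparison of predicted scores against survey figures.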


      Information & Contributors

      Information

      Published In

      WWW '13: Proceedings of the 22nd international conference on World Wide Web
      May 2013
      1628 pages
      ISBN:9781450320351
      DOI:10.1145/2488388

      Sponsors

      • NICBR: Núcleo de Informação e Coordenação do Ponto BR
      • CGIBR: Comitê Gestor da Internet no Brasil

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. personalized search
      2. social networks
      3. user demographics

      Qualifiers

      • Research-article

      Conference

      WWW '13
      Sponsor:
      • NICBR
      • CGIBR
      WWW '13: 22nd International World Wide Web Conference
      May 13 - 17, 2013
      Rio de Janeiro, Brazil

      Acceptance Rates

      WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 61
      • Downloads (Last 6 weeks): 7
      Reflects downloads up to 21 Nov 2024

      Citations

      Cited By

      • (2024) Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity. SSRN Electronic Journal. DOI: 10.2139/ssrn.4694565. Online publication date: 2024
      • (2024) Using Social Media as a Source of Real-World Data for Pharmaceutical Drug Development and Regulatory Decision Making. Drug Safety 47:5 (495-511). DOI: 10.1007/s40264-024-01409-5. Online publication date: 6-Mar-2024
      • (2023) Internet search data showed increased interest in supplementary online education during the COVID-19 pandemic, with females showing a greater increase. Frontiers in Education 8. DOI: 10.3389/feduc.2023.1142689. Online publication date: 17-Aug-2023
      • (2023) Promoting the Transparency of AI-Generated Inferences. SSRN Electronic Journal. DOI: 10.2139/ssrn.4595891. Online publication date: 2023
      • (2023) Incentivizing Exploration in Linear Contextual Bandits under Information Gap. Proceedings of the 17th ACM Conference on Recommender Systems (415-425). DOI: 10.1145/3604915.3608794. Online publication date: 14-Sep-2023
      • (2023) Predicting Users' Demographic Features Based on Searched Queries and Installed Apps and Games. 2023 28th International Computer Conference, Computer Society of Iran (CSICC) (01-07). DOI: 10.1109/CSICC58665.2023.10105350. Online publication date: 25-Jan-2023
      • (2022) Methods to Establish Race or Ethnicity of Twitter Users: Scoping Review. Journal of Medical Internet Research 24:4 (e35788). DOI: 10.2196/35788. Online publication date: 29-Apr-2022
      • (2022) Privacy-Preserving Recommendation with Debiased Obfuscaiton. 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (590-597). DOI: 10.1109/TrustCom56396.2022.00086. Online publication date: Dec-2022
      • (2022) Fairness in vulnerable attribute prediction on social media. Data Mining and Knowledge Discovery 36:6 (2194-2213). DOI: 10.1007/s10618-022-00855-y. Online publication date: 17-Sep-2022
      • (2022) Privacy in targeted advertising on mobile devices: a survey. International Journal of Information Security 22:3 (647-678). DOI: 10.1007/s10207-022-00655-x. Online publication date: 24-Dec-2022
