DOI: 10.1145/3492324.3494157
research-article

Linking User Accounts across Social Media Platforms

Published: 13 January 2022

Abstract

Social media analysis across diverse platforms requires an effective method for estimating the likelihood that different accounts belong to the same user. Such a method could support the detection of fake news and other nefarious activities. In this paper, we present an approach to calculate the probability that accounts on different social media platforms belong to the same user. We consider various platform aspects related to user accounts that can be used for user matching, including the selected username, the avatar or profile picture, and the content of platform posts and related metadata such as writing style, account binding, and the use of hyperlinks across platforms. Experimental results show that the approach can determine whether accounts on different platforms belong to the same person with an F1-score of up to 0.937.
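As a rough illustration of two ideas mentioned in the abstract, the sketch below scores username similarity and computes an F1-score for a set of match decisions. It is not the paper's method: the Jaccard coefficient over character bigrams is assumed here as one common string-similarity measure, and the function names `char_bigrams`, `username_similarity`, and `f1_score` are hypothetical.

```python
from __future__ import annotations


def char_bigrams(name: str) -> set[str]:
    """Return the set of lowercase character bigrams in a username."""
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def username_similarity(a: str, b: str) -> float:
    """Jaccard coefficient over character bigrams (assumed measure)."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga and not gb:
        # Names shorter than two characters have no bigrams;
        # fall back to a case-insensitive exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return len(ga & gb) / len(ga | gb)


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Harmonic mean of precision and recall for binary match labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a fuller pipeline, a score like this would be only one of several features (alongside avatar, post content, and writing-style comparisons) fed to a classifier that outputs the match probability.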


Cited By

  • (2024) Distributed Rumor Source Detection via Boosted Federated Learning. IEEE Transactions on Knowledge and Data Engineering, 36:11 (5986-6001). DOI: 10.1109/TKDE.2024.3390238. Online publication date: Nov-2024
  • (2022) A Comparison of Several AI Techniques for Authorship Attribution on Romanian Texts. Mathematics, 10:23 (4589). DOI: 10.3390/math10234589. Online publication date: 3-Dec-2022
  • (2022) Sift Descriptor for Social Media User Accounts Matching. Proceedings of the Sixth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’22), (142-151). DOI: 10.1007/978-3-031-19620-1_14. Online publication date: 31-Oct-2022

      Published In
      BDCAT '21: Proceedings of the 2021 IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies
      December 2021
      133 pages
      ISBN: 9781450391641
      DOI: 10.1145/3492324

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      BDCAT '21
      Acceptance Rates

      Overall Acceptance Rate 27 of 93 submissions, 29%
