Retrieve the Hidden Leaves in the Forest: Prevent Voting Spamming in Zhihu

  • Conference paper
Security and Privacy in Social Networks and Big Data (SocialSec 2019)

Abstract

A growing number of people post their opinions on online social networks, such as commercial product evaluation websites, forums, and crowdsourcing Q&A websites. In practice, most majority-vote schemes cannot reveal the true distribution of opinions because of spam. Many public relations companies can recruit people or use automated commenting bots to promote target products and damage the reputation of their competitors, so the opinions on these websites may not be reliable. Many studies in the literature are devoted to detecting such spam, based on the characteristics of posted content, social relationships, user activity, posting time, etc. We find that most spam detection schemes rely heavily on the experience and preferences of experts. This is dangerous, as it can lead to bias and dictatorship. In this work, we take Zhihu, a popular Chinese Q&A website, as a case study and propose a time-diversity-based voting scheme to reduce the impact of voting spam. We show that the proposed opinion-tolerant system maintains a good balance in the appearance of different opinions.

Notes

  1. "Where does a wise man hide a leaf? In the forest. But what does he do if there is no forest? He grows a forest to hide it in." G.K. Chesterton, The Innocence of Father Brown [17].

References

  1. Amazon. http://www.amazon.com/

  2. Ebay. https://www.ebay.com/

  3. Facebook. https://www.facebook.com/

  4. How answers are ranked in Zhihu? https://zhuanlan.zhihu.com/p/19902495/

  5. Imdb. https://www.imdb.com/

  6. Is GMO harmful to health or environment? https://www.zhihu.com/question/64850604/

  7. QQ International. http://www.imqq.com/English1033.html/

  8. Quora. https://www.quora.com/

  9. Stack overflow. http://stackoverflow.com/

  10. Twitter. https://www.twitter.com/

  11. Wikipedia. https://www.wikipedia.org/

  12. Youtube. https://www.youtube.com/

  13. Zhihu. http://www.zhihu.com/

  14. Arrow, K.: A difficulty in the concept of social welfare. J. Polit. Econ. 58(4), 328–346 (1950)

  15. Chen, C., Wu, K., Srinivasan, V., Zhang, X.: Battling the internet water army: detection of hidden paid posters. In: IEEE/ACM ASONAM, pp. 116–120 (2013)

  16. Chen, Y., Chen, H.: Opinion spam detection in web forum: a real case study. In: WWW, pp. 173–183 (2015)

  17. Chesterton, G.K.: The Innocence of Father Brown. John Lane Company (1911)

  18. Danezis, G., Mittal, P.: SybilInfer: detecting sybil nodes using social networks. In: NDSS, pp. 1–15 (2009)

  19. Ghosh, A., Kale, S., McAfee, P.: Who moderates the moderators? Crowdsourcing abuse detection in user-generated content. In: ACM EC, pp. 167–176 (2011)

  20. Harris, C.G.: Detecting deceptive opinion spam using human computation. In: Workshops at AAAI on AI (2012)

  21. Morris, M.R., Counts, S., Roseway, A., Hoff, A., Schwarz, J.: Tweeting is believing? Understanding microblog credibility perceptions. In: CSCW, pp. 441–450 (2012)

  22. Shi, L., Yu, S., Lou, W., Hou, Y.T.: SybilShield: an agent-aided social network-based sybil defense among multiple communities. In: IEEE INFOCOM, pp. 1034–1042 (2013)

  23. Thomas, K., McCoy, D., Grier, C., Kolcz, A., Paxson, V.: Trafficking fraudulent accounts: the role of the underground market in twitter spam and abuse. In: USENIX Security, pp. 195–210 (2013)

  24. Tuomisto, H.: A consistent terminology for quantifying species diversity? Yes, it does exist. Oecologia 164(4), 853–860 (2010)

  25. Wang, G., Konolige, T., Wilson, C., Wang, X., Zheng, H., Zhao, B.Y.: You are how you click: clickstream analysis for sybil detection. In: USENIX Security, pp. 241–256 (2013)

  26. Wang, G., et al.: Social turing tests: crowdsourcing sybil detection. http://arxiv.org/pdf/1205.3856.pdf (2012)

  27. Wang, G., et al.: Serf and turf: crowdturfing for fun and profit. In: ACM WWW, pp. 679–688 (2012)

  28. Wei, W., Xu, F., Tan, C., Li, Q.: SybilDefender: defend against sybil attacks in large social networks. In: IEEE INFOCOM, pp. 1951–1959 (2012)

  29. Wilson, E.B.: Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. 22(158), 209–212 (1927)

  30. Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network sybils in the wild. ACM Trans. Knowl. Discov. Data 8(1), 1–29 (2014)

  31. Yu, H., Gibbons, P.B., Kaminsky, M., Xiao, F.: SybilLimit: a near-optimal social network defense against sybil attacks. In: IEEE S&P, pp. 3–17 (2008)

  32. Yu, H., Kaminsky, M., Gibbons, P.B., Flaxman, A.: SybilGuard: defending against sybil attacks via social networks. SIGCOMM 36(4), 267–278 (2006)

  33. Zheng, X., Zeng, Z., Chen, Z., Yu, Y., Rong, C.: Detecting spammers on social networks. Neurocomputing 159, 27–34 (2015)


Author information

Correspondence to Jun Zhang or Houda Labiod.

A Definition of Diversity of Visibility

To formally evaluate the diversity in the appearance of answers, we borrow the concept of true diversity from [24]. Consider a set of groups G and suppose that the proportion of each group \(g_i \in G\) is \(q_i\), with the proportions summing to 1. The true diversity of order 1, that is, the exponential of the Shannon entropy of the proportions, is

$$\begin{aligned} D=\exp \left( -\sum _{g_i \in G} q_i \ln (q_i) \right) . \end{aligned}$$
(12)

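A large value of the true diversity indicates a good balance among the proportions of the groups (the species, in the ecological setting of [24]). D ranges from 1, when a single group holds the entire proportion, to |G|, when all groups have equal proportions.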
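As an illustration of Eq. (12), the following minimal Python sketch computes the order-1 true diversity of a list of proportions. The function name `true_diversity` and the example proportions are hypothetical, and normalizing the input so that it sums to 1 is a convenience assumed here rather than part of the definition.

```python
import math


def true_diversity(proportions):
    """Order-1 true diversity: exp of the Shannon entropy of the proportions.

    The input is normalized to sum to 1 (an assumption of this sketch),
    and terms with zero proportion contribute nothing (0 * ln 0 := 0).
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    entropy = 0.0
    for value in proportions:
        if value < 0:
            raise ValueError("proportions must be non-negative")
        if value > 0:
            q = value / total
            entropy -= q * math.log(q)
    return math.exp(entropy)


# Four equally represented groups versus one dominant group.
balanced = true_diversity([0.25, 0.25, 0.25, 0.25])
skewed = true_diversity([0.97, 0.01, 0.01, 0.01])
```

With four equal groups the value equals the number of groups, while the skewed case stays close to 1, which matches the reading of D as an effective number of groups.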

In our case, the diversity depends not only on the number of answers in each group, but also on their positions. Let us denote the position of an answer a under the ranking policy p as L(a,p). The visibility of an answer decreases as it is ranked lower, that is, as its position index grows. We therefore define the visibility index of an answer at position L as \(\lambda ^L\), where \(\lambda \) is the decay factor (\(0< \lambda < 1\), so that visibility decreases with position), under the assumption that visibility decreases exponentially with the ranking. For a set of answers N, we denote by G(N) the set of its groups, which forms a partition of N:

$$\begin{aligned} \nonumber&\forall g \in G(N),\ g \in 2^{N} \\ \nonumber&\forall g_i,g_j \in G(N),\ i \ne j:\ g_i \cap g_j = \varnothing \\&\bigcup _{g \in G(N)} g=N. \end{aligned}$$
(13)

The total visibility index of each group \(g \in G(N)\) is

$$\begin{aligned} V(g,p)=\sum _{a \in g} \lambda ^{L(a,p)}. \end{aligned}$$
(14)

The corresponding visibility ratio of each group \(g \in G(N)\) is

$$\begin{aligned} q(g,p)=\frac{V(g,p)}{\sum _{g' \in G(N)} V(g',p)}. \end{aligned}$$
(15)
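Eqs. (14) and (15) can be sketched in Python as follows. This is only an illustration: the names `visibility_ratios`, `ranking` and `group_of`, the group labels, the 1-indexed positions, and the decay factor 0.5 are all assumptions made for the example and are not taken from the paper.

```python
from collections import defaultdict


def visibility_ratios(ranking, group_of, decay=0.5):
    """Return the visibility ratio q(g, p) of every group.

    ranking:  answer ids ordered from top to bottom under policy p
    group_of: mapping from answer id to its group label
    decay:    the decay factor lambda, assumed to lie in (0, 1)
    """
    visibility = defaultdict(float)
    # L(a, p) is taken to be 1 for the top answer (an assumption).
    for position, answer in enumerate(ranking, start=1):
        visibility[group_of[answer]] += decay ** position  # Eq. (14)
    total = sum(visibility.values())
    return {g: v / total for g, v in visibility.items()}   # Eq. (15)


# Two hypothetical groups of two answers each; the "pro" answers rank first.
group_of = {"a1": "pro", "a2": "pro", "a3": "con", "a4": "con"}
ratios = visibility_ratios(["a1", "a2", "a3", "a4"], group_of)
```

Here the "pro" group collects 0.5 + 0.25 = 0.75 of visibility and the "con" group 0.125 + 0.0625 = 0.1875, giving ratios of 0.8 and 0.2. Shifting every position by a constant (for example, counting from 0 instead of 1) scales all V(g,p) by the same factor, so the ratios do not depend on that choice.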

The diversity of visibility of a set of answers N under the ranking policy p is then defined as the true diversity of the visibility ratios of its groups:

$$\begin{aligned} D_{vis}(N,p)= \exp \left( - \sum _{g \in G(N)} q(g,p) \ln \left( q(g,p) \right) \right) . \end{aligned}$$
(16)
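To see how Eq. (16) reacts to the ranking, the sketch below compares two orderings of the same four answers. It is a toy example under stated assumptions: the function name `diversity_of_visibility`, the two groups, the 1-indexed positions, and the decay factor 0.5 are hypothetical choices, not values from the paper.

```python
import math
from collections import defaultdict


def diversity_of_visibility(ranking, group_of, decay=0.5):
    """D_vis(N, p): order-1 true diversity of the groups' visibility ratios."""
    visibility = defaultdict(float)
    for position, answer in enumerate(ranking, start=1):
        visibility[group_of[answer]] += decay ** position
    total = sum(visibility.values())
    entropy = 0.0
    for v in visibility.values():
        q = v / total  # visibility ratio of one group
        if q > 0:
            entropy -= q * math.log(q)
    return math.exp(entropy)


group_of = {"a1": "pro", "a2": "pro", "a3": "con", "a4": "con"}

# One group occupies the top two positions.
clustered = diversity_of_visibility(["a1", "a2", "a3", "a4"], group_of)
# The two groups alternate in the ranking.
interleaved = diversity_of_visibility(["a1", "a3", "a2", "a4"], group_of)
```

In this toy setting the clustered ranking gives a diversity of visibility of roughly 1.65, while the interleaved ranking gives roughly 1.89, closer to the maximum of 2 for two groups. Both rankings contain the same answers, so the difference comes only from the positions.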

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, J., Labiod, H. (2019). Retrieve the Hidden Leaves in the Forest: Prevent Voting Spamming in Zhihu. In: Meng, W., Furnell, S. (eds) Security and Privacy in Social Networks and Big Data. SocialSec 2019. Communications in Computer and Information Science, vol 1095. Springer, Singapore. https://doi.org/10.1007/978-981-15-0758-8_13

  • DOI: https://doi.org/10.1007/978-981-15-0758-8_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0757-1

  • Online ISBN: 978-981-15-0758-8

  • eBook Packages: Computer Science, Computer Science (R0)
