Abstract
Bipartite graphs are common in many complex systems as they describe a relationship between two different kinds of actors, e.g., genes and proteins, metabolites and enzymes, authors and articles, or products and consumers. A common approach to analyze them is to build a graph between the nodes on one side depending on their relationships with nodes on the other side; this so-called one-mode projection is a crucial step for all further analysis but a systematic approach to it was lacking so far. Here, we present a systematic approach that evaluates the significance of the co-occurrence for each pair of nodes v, w, i.e., the number of common neighbors of v and w. It turns out that this can be seen as a special case of evaluating the interestingness of an association rule in data mining. Based on this connection we show that classic interestingness measures in data mining cannot be applied to evaluate most real-world product-consumer relationship data. We thus introduce generalized interestingness measures for both, one-mode projections of bipartite graphs and data mining and show their robustness and stability by example. We also provide theoretical results that show that the old method cannot even be used as an approximative method. In a last step we show that the new interestingness measures show stable and significant results that result in attractive one-mode projections of bipartite graphs.
Similar content being viewed by others
Notes
This part reproduces parts of the conference version published with IEEE (Zweig 2010). Reproduced with permission.
According to an interestingness measure called conviction.
This is given under the assumption that the occurrence of M in the random graph model is normally distributed.
As long as this does not lead to degree 0 or degree r + 1 (l + 1) in which case nothing is done.
Note that the order was chosen for displaying reasons—none of the data samples directly showed them in this order.
References
Abdi H (2007) The Kendall rank correlation coefficient. In: Encyclopedia of measurement and statistics. Sage, Thousand Oaks
Admiraal R, Handcock MS (2008) Networksis: a package to simulate bipartite graphs with fixed marginals through sequential importance sampling. J Stat Softw 24(8):1–21
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD international conference on management of data 1993, pp 207–216
Alon U (2006) An introduction to systems biology: design principles of biological circuits. Chapman & Hall/CRC
Artzy-Randrup Y, Fleishman SJ, Ben-Tal N, Stone L (2004) Comment on “Network motifs: simple building blocks of complex networks” and “Superfamilies of evolved and designed networks”. Science 305:1107c
Barvinok A (2008) Enumerateing contingency tables via random permanents. Combin Probab Comput 17(1):1–19
Bollobás B (2001) Random graphs, 2nd edn. In: Cambridge studies in advanced mathematics, vol 73. Cambridge University Press, London
Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. In: Proceedings ACM SIGMOD international conference on management of data 1997, pp 255–264
Brualdi RA (1980) Matrices of zeros and ones with fixed row and column vectors. Linear Algebra Appl 33:159–231
Brualdi RA (2006) Algorithms for constructing (0,1)-matrices with prescribed row and column sum vectors. Discrete Math 306:3054–3062
Chen Y, Diaconis P, Holmes SP, Liu JS (2005) Sequential monte carlo methods for statistical analysis of tables. J Am Stat Assoc 100(469):109–120
Cobb GW, Chen YP (2003) An application of Markov chain Monte Carlo to community ecology. Am Math Mon 110:265–288
Dorogovtsev SN, Mendes JF (2003) Evolution of networks. Oxford University Press
Gale D (1957) A theorem on flows in networks. Pac J Math 7:1073–1082
Gionis A, Mannila H, Mielikäinen T, Tsaparas P (2007) Assessing data mining results via swap randomization. ACM Trans Knowl Discov Data 1(3):article no. 14
Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99:7821–7826
Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL (2007) The human disease network. Proc Natl Acad Sci 104:8685–8690
Greenhill C, McKay BD (2008) Asymptotic enumeration of sparse nonnegative integer matrices with specified row and column sums. Adv Appl Math 41(4):459–481
Hipp J, Güntzer U, Nakhaeizadeh G (2000) Algorithms for association rule mining—a general survey and comparison. SIGKDD Explor 2(2):1–58
Holmes RB, Jones LK (1986) On uniform generation of two-way tables with fixed margins and the conditional volume test of Diaconis and Efron. Ann Stat 24(1):64–68
Kendall M (1938) A new measure of rank correlation. Biometrika 30:81–89
Li M, Fan Y, Chen J, Gao L, Di Z, Wu J (2005) Weighted networks of scientific communication: the measurement and topological role of weight. Phys A 350:643–656
Ford LR, Fulkerson DR (1962) Flows in networks. Princeton University Press
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: Simple building blocks of complex networks. Science 298:824–827
Newman ME (2001a) Scientific collaboration networks I. Phys Rev E 64:016,131
Newman ME (2001b) Scientific collaboration networks II: shortest paths, weighted networks, and centrality. Phys Rev E 64:016,132
Newson R (2006) Efficient calculation of jackknife confidence intervals for rank statistics. J Stat Softw 15(1):1–10
Newman ME, Barabási AL, Watts DJ (eds) (2006) The structure and dynamics of networks. Princeton University Press, Princeton
Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
Piatetsky-Shapiro G (1991) Knowledge discovery in databases. Discovery, analysis, and presentation of strong rules. AAAI/MIT Press, pp 229–248
Ravasz E, Somera A, Mongru D, Oltvai Z, Barabási AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551–1553
Raeder T, Chawla NV (2011) Market basket analysis with networks. Soc Netw Anal Min 1
Ryser H (1963) Combinatorial mathematics. In: Carus mathematical monograph, vol 14. Mathematical Association of America, Washington
Vázquez A, Flammini A, Maritan A, Vespignani A (2002) Modeling of protein interaction networks. ComPlexUs 1:38–44
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58:236–244
Wasserman S, Faust K (1999) Social network analysis—methods and applications, revised, reprinted edn. Cambridge University Press, Cambridge
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442
Zhou T, Ren J, Medo M, Zhang YC (2007) Bipartite network projection and personal recommendation. Phys Rev E 76:046,115
Zweig KA (2010) How to forget the second side of the story: a new method for the one-mode projection of bipartite graphs. In: Proceedings of the 2010 international conference on advances in social networks analysis and mining ASONAM 2010, pp 200–207
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zweig, K.A., Kaufmann, M. A systematic approach to the one-mode projection of bipartite graphs. Soc. Netw. Anal. Min. 1, 187–218 (2011). https://doi.org/10.1007/s13278-011-0021-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13278-011-0021-0