Stanford CS224W: Machine Learning with Graphs
Lecture 2.3: Traditional Feature-based Methods: Graph-Level Features
Jure Leskovec, Stanford University, http://cs224w.stanford.edu

Goal: we want features that characterize the structure of an entire graph.

Background: Kernel Methods
= Kernel methods are widely used in traditional ML for graph-level prediction.
= Idea: design kernels instead of feature vectors.
= A quick introduction to kernels:
  * A kernel K(G, G') ∈ R measures the similarity between two data points (here, graphs).
  * The kernel matrix K = (K(G, G'))_{G,G'} must always be positive semidefinite (i.e., it has non-negative eigenvalues).
  * There exists a feature representation φ(·) such that K(G, G') = φ(G)^T φ(G').
  * Once the kernel is defined, off-the-shelf ML models, such as kernel SVMs, can be used to make predictions.

Graph Kernels: measure similarity between two graphs
= Graphlet Kernel [Shervashidze et al., "Efficient graphlet kernels for large graph comparison," AISTATS 2009]
= Weisfeiler-Lehman Kernel [Shervashidze et al., "Weisfeiler-Lehman graph kernels," JMLR 12 (2011)]
= Other kernels have also been proposed in the literature (beyond the scope of this lecture): the random-walk kernel, the shortest-path graph kernel, and many more.

Key Idea: Bag-of-Words (BoW) for a Graph
= Recall: BoW simply uses word counts as features for documents (no ordering considered).
= Naive extension to a graph: regard nodes as words.
= Problem: since both example graphs have 4 red nodes, we get the same feature vector for two different graphs: φ(G1) = φ(G2).
= What if we use a Bag of node degrees instead? Counting nodes of degree 1, 2, and 3:
  * φ(G1) = count(degrees in G1) = [1, 3, 0]
  * φ(G2) = count(degrees in G2) = [0, 2, 2]
  * This obtains different features for different graphs!
= Both the Graphlet Kernel and the Weisfeiler-Lehman (WL) Kernel use a Bag-of-* representation of the graph, where * is more sophisticated than node degrees.

Graphlet Features
= Key idea: count the number of different graphlets in a graph.
= Note: the definition of graphlets here is slightly different from the node-level features. The two differences are:
  * Nodes in graphlets here do not need to be connected (isolated nodes are allowed).
  * The graphlets here are not rooted.
= For k = 3, there are 4 graphlets (g1, g2, g3, g4); for k = 4, there are 11 graphlets.
= Given a graph G and a graphlet list G_k = (g1, g2, ..., g_{n_k}), define the graphlet count vector f_G ∈ R^{n_k} as f_G[i] = #(g_i ⊆ G), for i = 1, 2, ..., n_k.
= Example for k = 3: f_G = (1, 3, 6, 0)^T.
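To make the graphlet count vector concrete, below is a minimal brute-force sketch for k = 3, assuming we classify the induced pattern on every node triple (triangle, 2-edge path, single edge plus an isolated node, or three isolated nodes). The toy graph, function name, and counting convention are illustrative assumptions, so the numbers need not match the slide's example vector; the enumeration over all triples also previews the O(n^k) cost discussed next.

```python
from itertools import combinations

def graphlet_count_k3(nodes, edges):
    """Count the four size-3 graphlet patterns by brute-force enumeration over node triples."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    counts = [0, 0, 0, 0]  # [triangle, 2-edge path, edge + isolated node, three isolated nodes]
    for a, b, c in combinations(nodes, 3):
        num_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        counts[3 - num_edges] += 1  # 3 edges -> triangle, ..., 0 edges -> empty graphlet
    return counts

# Toy usage: a triangle plus one extra node attached by a single edge.
print(graphlet_count_k3([0, 1, 2, 3], [(0, 1), (1, 2), (2, 0), (2, 3)]))
```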
Graphlet Kernel
= Given two graphs G and G', the graphlet kernel is computed as K(G, G') = f_G^T f_{G'}.
= Problem: if G and G' have different sizes, that will greatly skew the value.
= Solution: normalize each feature vector: h_G = f_G / Sum(f_G), and use K(G, G') = h_G^T h_{G'}.
= Limitations: counting graphlets is expensive!
  * Counting size-k graphlets for a graph with n nodes by enumeration takes O(n^k) time.
  * This is unavoidable in the worst case, since the subgraph isomorphism test (judging whether a graph is a subgraph of another graph) is NP-hard.
  * If a graph's node degree is bounded by d, an O(n d^{k-1}) algorithm exists to count all graphlets of size k.
= Can we design a more efficient graph kernel?

Weisfeiler-Lehman (WL) Kernel
= Goal: design an efficient graph feature descriptor φ(G).
= Idea: use neighborhood structure to iteratively enrich the node "vocabulary", a generalized version of the Bag of node degrees (node degrees are one-hop neighborhood information).
= The algorithm that achieves this is color refinement.

Color Refinement (example on two graphs)
= Assign initial colors to all nodes.
= Aggregate neighboring colors: each node collects its own color together with the multiset of its neighbors' colors.
= Hash the aggregated colors: a hash table maps each distinct aggregated color to a new color.
= Repeat the aggregation and hashing steps; after K rounds, a node's color summarizes the structure of its K-hop neighborhood.

WL Graph Features and Kernel Value
= After color refinement, the WL kernel counts the number of nodes with a given color, accumulated over all refinement steps.
= Example, counting colors 1 through 13 on the two graphs:
  * φ(G)  = [6, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 2, 1]
  * φ(G') = [6, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 0, 1]
= The WL kernel value is the inner product of the two color-count vectors: K(G, G') = φ(G)^T φ(G') = 49.

WL Kernel: Complexity
= The WL kernel is computationally efficient: the time complexity of color refinement at each step is linear in #(edges), since it only involves aggregating neighboring colors.
= When computing a kernel value, only the colors that appear in the two graphs need to be tracked; thus #(colors) is at most the total number of nodes.
= Counting colors takes time linear in #(nodes).
= In total, the time complexity is linear in #(edges).
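Below is a minimal sketch of color refinement and the resulting bag-of-colors kernel, assuming a uniform initial color and a signature-to-color dictionary shared between the two graphs (this dictionary plays the role of the hash table on the slides). The function names and toy graphs are illustrative, not the lecture's reference implementation.

```python
from collections import Counter

def wl_color_histograms(adj, num_iters, table):
    """K steps of color refinement; returns the bag-of-colors histogram for one graph.

    adj: dict mapping each node to a list of neighbors.
    table: shared "hash table" mapping color signatures to new color ids.
    """
    colors = {v: 0 for v in adj}          # uniform initial color
    hist = Counter(colors.values())        # colors from step 0 are counted too
    for _ in range(num_iters):
        new_colors = {}
        for v, nbrs in adj.items():
            # Aggregate own color with the sorted multiset of neighbor colors...
            sig = (colors[v], tuple(sorted(colors[u] for u in nbrs)))
            # ...and "hash" the signature into a new color id.
            if sig not in table:
                table[sig] = len(table) + 1
            new_colors[v] = table[sig]
        colors = new_colors
        hist.update(colors.values())
    return hist

def wl_kernel(adj1, adj2, num_iters=2):
    """WL kernel value: inner product of the two color-count vectors."""
    table = {}
    h1 = wl_color_histograms(adj1, num_iters, table)
    h2 = wl_color_histograms(adj2, num_iters, table)
    return sum(h1[c] * h2[c] for c in h1)

# Toy usage on two small graphs given as adjacency lists.
g1 = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
g2 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wl_kernel(g1, g2))
```

Sharing the table between the two graphs is the key design choice here: it guarantees that identical neighborhood signatures in G and G' map to the same color, so the two histograms can be compared by an inner product.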
Graph-Level Features: Summary
= Graphlet Kernel:
  * The graph is represented as a Bag-of-graphlets.
  * Computationally expensive.
= Weisfeiler-Lehman Kernel:
  * Apply a K-step color refinement algorithm to enrich node colors; different colors capture different K-hop neighborhood structures.
  * The graph is represented as a Bag-of-colors.
  * Computationally efficient.
  * Closely related to Graph Neural Networks (as we will see!).

Traditional Graph ML: Summary
= Traditional pipeline: hand-crafted features + ML model.
= Node-level features: node degree, centrality, clustering coefficient, graphlets.
= Link-level features: distance-based features; local/global neighborhood overlap.
= Graph-level features: graphlet kernel, WL kernel.

Stanford CS224W: Machine Learning with Graphs
Lecture 3.1: Node Embeddings
Jure Leskovec, Stanford University, http://cs224w.stanford.edu

Recap: Traditional ML for Graphs
= Given an input graph, extract node-, link-, and graph-level features, then learn a model (SVM, neural network, etc.) that maps features to labels.
= Pipeline: input graph → structured features → learning algorithm → prediction.
= Feature engineering (node-level, edge-level, graph-level features) feeds the downstream prediction task.

Graph Representation Learning
= Graph representation learning alleviates the need to do feature engineering every single time: the features are learned automatically.
= Pipeline: input graph → representation learning (automatically learn the features) → learning algorithm → prediction.
= Goal: efficient, task-independent feature learning for machine learning with graphs: map each node u to a d-dimensional vector z_u ∈ R^d (its feature representation, or embedding).

Why Embeddings?
= Similarity of embeddings between nodes indicates their similarity in the network. For example: both nodes are close to each other (connected by an edge).
= Embeddings encode network information.
= They can potentially be used for many downstream prediction tasks: node classification, link prediction, graph classification, anomalous node detection, clustering, ...
= Example: a 2D embedding of the nodes of Zachary's Karate Club network. [Figure: the input graph and its output embedding in R^2.]

Setup: Encoder and Decoder
= Assume we have a graph G: V is the vertex set; A is the adjacency matrix (assume binary). For simplicity, no node features or extra information is used. Example: V = {1, 2, 3, 4} with its 4x4 binary adjacency matrix A.
= Goal: encode nodes so that similarity in the embedding space (e.g., the dot product) approximates similarity in the graph.
= In other words, ENC(u) and ENC(v) map nodes u and v of the original network into the embedding space, and we want
  similarity(u, v) ≈ z_v^T z_u,
  where similarity(u, v) is defined in the original network and z_v^T z_u is the similarity of the embeddings.
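As a tiny illustration of this goal, the decoder used throughout this lecture is simply a dot product in the embedding space; the vectors below are made-up numbers, not learned embeddings.

```python
import numpy as np

# Toy 2-D embeddings for three nodes (illustrative values only).
z_u = np.array([0.9, 0.2])
z_v = np.array([0.8, 0.3])
z_w = np.array([-0.7, 0.5])

# Dot-product decoder: larger score means the model predicts higher similarity.
print(z_u @ z_v)  # large value: u and v are predicted to be similar
print(z_u @ z_w)  # small/negative value: u and w are predicted to be dissimilar
```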
Learning Node Embeddings
1. The encoder maps nodes to embeddings: ENC(v) = z_v.
2. Define a node similarity function (i.e., a measure of similarity in the original network).
3. The decoder maps a pair of embeddings to a similarity score.
4. Optimize the parameters of the encoder so that similarity(u, v) ≈ z_v^T z_u.

Two Key Components
= Encoder: maps each node to a low-dimensional vector: ENC(v) = z_v, with z_v ∈ R^d (a d-dimensional embedding).
= Similarity function: specifies how the relationships in vector space map to the relationships in the original network: similarity(u, v) ≈ z_v^T z_u. The left-hand side is the similarity of u and v in the original network; the decoder on the right-hand side is the dot product between node embeddings.

"Shallow" Encoding
= Simplest encoding approach: the encoder is just an embedding lookup: ENC(v) = z_v = Z · v, where
  * Z ∈ R^{d×|V|} is the embedding matrix with one column per node; each column is a node embedding (this is what we learn/optimize), and d is the dimension/size of the embeddings;
  * v ∈ I^{|V|} is an indicator vector: all zeroes except a one in the column indicating node v.
= Each node is assigned a unique embedding vector (i.e., we directly optimize the embedding of each node).
= Many methods use this approach: DeepWalk, node2vec.

Framework Summary
= Shallow encoder: embedding lookup.
= Parameters to optimize: Z, which contains the node embeddings z_u for all nodes u ∈ V. (We will cover deep encoders, i.e., GNNs, in Lecture 6.)
= Decoder: based on node similarity.
= Objective: maximize z_v^T z_u for node pairs (u, v) that are similar.
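A minimal sketch of the shallow encoder as a lookup, assuming a small random matrix Z; multiplying Z by the one-hot indicator vector of node v is exactly the same as reading out column v.

```python
import numpy as np

num_nodes, dim = 4, 3                 # |V| = 4 nodes, d = 3 dimensional embeddings
rng = np.random.default_rng(0)

# Z has one column per node; its entries are the parameters we optimize.
Z = rng.normal(size=(dim, num_nodes))

def encode(v):
    """Shallow encoder: multiply Z by the one-hot indicator vector of node v."""
    one_hot = np.zeros(num_nodes)
    one_hot[v] = 1.0
    return Z @ one_hot                # equivalent to simply reading column v: Z[:, v]

z_u, z_v = encode(0), encode(1)
print(z_u @ z_v)                      # dot-product decoder: predicted similarity of nodes 0 and 1
```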
How to Define Node Similarity?
= The key distinction between methods is how they define node similarity. Should two nodes have similar embeddings if they...
  * are linked?
  * share neighbors?
  * have similar "structural roles"?
= We will now learn a node similarity definition that uses random walks, and how to optimize embeddings for such a similarity measure.

Note on Node Embeddings
= This is an unsupervised/self-supervised way of learning node embeddings:
  * We are not utilizing node labels.
  * We are not utilizing node features.
  * The goal is to directly estimate a set of coordinates (i.e., the embedding) for each node so that some aspect of the network structure (captured by DEC) is preserved.
= These embeddings are task-independent: they are not trained for a specific task but can be used for any task.

Stanford CS224W: Machine Learning with Graphs
Lecture 3.2: Random Walk Approaches for Node Embeddings
Jure Leskovec, Stanford University, http://cs224w.stanford.edu

Notation
= Vector z_u: the embedding of node u (what we aim to find).
= Probability P(v | z_u): the (predicted) probability of visiting node v on random walks starting from node u (our model's prediction based on z_u).
= Non-linear functions used to produce predicted probabilities:
  * Softmax function: turns a vector of K real values (model predictions) into K probabilities that sum to 1: σ(z)_i = e^{z_i} / Σ_{j=1}^K e^{z_j}.
  * Sigmoid function: an S-shaped function that turns real values into the range (0, 1), written as S(x) = 1 / (1 + e^{-x}).

Random Walks
= Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random and move to it, and so on. The (random) sequence of points visited this way is a random walk on the graph.

Random-Walk Embeddings
= We want z_u^T z_v ≈ the probability that u and v co-occur on a random walk over the graph.
= Step 1: Estimate the probability P_R(v | u) of visiting node v on a random walk starting from node u, using some random walk strategy R.
= Step 2: Optimize embeddings to encode these random walk statistics: similarity in the embedding space (here: the dot product, i.e., cos(θ)) should encode random walk "similarity": z_u^T z_v ≈ P_R(v | u).

Why Random Walks?
= Expressivity: a flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information. Idea: if a random walk starting from node u visits v with high probability, then u and v are similar (high-order, multi-hop information).
= Efficiency: we do not need to consider all node pairs when training; we only need to consider the pairs that co-occur on random walks.

Unsupervised Feature Learning
= Intuition: find an embedding of nodes in d-dimensional space that preserves similarity.
= Idea: learn node embeddings such that nodes that are nearby in the network are close together in the embedding space.
= Given a node u, how do we define nearby nodes? N_R(u) is the neighbourhood of u obtained by some random walk strategy R.
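A minimal sketch of the simplest strategy R (short, fixed-length, unbiased walks) and of collecting the multiset N_R(u); the toy graph, function names, and walk parameters are illustrative assumptions.

```python
import random

def random_walk(adj, start, length):
    """Unbiased fixed-length random walk: repeatedly move to a uniformly random neighbor."""
    walk = [start]
    for _ in range(length):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

def neighborhood(adj, u, num_walks=10, walk_length=5):
    """N_R(u): multiset of nodes visited on short random walks starting from u."""
    visited = []
    for _ in range(num_walks):
        visited.extend(random_walk(adj, u, walk_length)[1:])  # exclude the start node itself
    return visited

# Toy usage on a small graph given as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(neighborhood(adj, 0))
```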
Feature Learning as Optimization
= Given G = (V, E), our goal is to learn a mapping f: u → R^d with f(u) = z_u.
= Log-likelihood objective:
  max_f Σ_{u∈V} log P(N_R(u) | z_u),
  where N_R(u) is the neighborhood of node u obtained by strategy R.
= Given node u, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood N_R(u).

Random Walk Optimization
1. Run short fixed-length random walks starting from each node u in the graph, using some random walk strategy R.
2. For each node u, collect N_R(u), the multiset* of nodes visited on random walks starting from u. (*N_R(u) can contain repeated elements, since nodes can be visited multiple times on random walks.)
3. Optimize the embeddings: given node u, predict its neighbors N_R(u), i.e.,
   max_f Σ_{u∈V} log P(N_R(u) | z_u)  (a maximum likelihood objective).

= Equivalently, minimize the loss
  L = Σ_{u∈V} Σ_{v∈N_R(u)} −log P(v | z_u).
= Intuition: optimize the embeddings z_u to maximize the likelihood of random walk co-occurrences.
= Parameterize P(v | z_u) with a softmax:
  P(v | z_u) = exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n).
= Why a softmax? We want node v to be the most similar to node u out of all nodes n; intuitively, max_i exp(x_i) ≈ Σ_i exp(x_i).
= Putting it all together:
  L = Σ_{u∈V} Σ_{v∈N_R(u)} −log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) ).
  Optimizing random walk embeddings means finding the embeddings z_u that minimize L.
= But doing this naively is too expensive: the nested sum over all nodes gives O(|V|^2) complexity. The normalization term of the softmax is the culprit... can we approximate it?

Negative Sampling
= Solution: negative sampling. Approximate
  log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) ) ≈ log(σ(z_u^T z_v)) − Σ_{i=1}^k log(σ(z_u^T z_{n_i})),  n_i ~ P_V,
  where σ is the sigmoid function and P_V is a random distribution over nodes.
= Instead of normalizing with respect to all nodes, we just normalize against k random "negative samples" n_i.
= Sample the k negative nodes each with probability proportional to its degree.
= Two considerations for k (the number of negative samples):
  1. A higher k gives more robust estimates.
  2. A higher k corresponds to a higher bias on negative events.
= In practice, k = 5-20.
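A minimal sketch of the per-pair negative-sampling loss, assuming embeddings stored column-wise in a matrix Z and negatives drawn with probability proportional to degree. Note one assumption: the negative term is written in the standard word2vec form log σ(−z_u^T z_n), which is how this approximation is used in practice; the slides state it slightly more loosely.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(Z, u, v, degrees, k=5):
    """Loss for one co-occurring pair (u, v) with k negative samples.

    Z: (d, |V|) embedding matrix (one column per node).
    degrees: node degrees, used to sample negatives proportionally to degree.
    """
    probs = degrees / degrees.sum()
    negatives = rng.choice(len(degrees), size=k, p=probs)
    z_u, z_v = Z[:, u], Z[:, v]
    positive_term = np.log(sigmoid(z_u @ z_v))
    negative_term = sum(np.log(sigmoid(-z_u @ Z[:, n])) for n in negatives)
    return -(positive_term + negative_term)   # minimize the negative log-likelihood

# Toy usage: random embeddings for a 4-node graph with given degrees.
Z = rng.normal(size=(3, 4))
degrees = np.array([2.0, 2.0, 3.0, 1.0])
print(negative_sampling_loss(Z, u=0, v=1, degrees=degrees, k=3))
```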
Optimizing the Objective
= After we have obtained the objective function, how do we optimize (minimize) it?
= Gradient descent: a simple way to minimize L:
  * Initialize z_i at some random value for all i.
  * Iterate until convergence: for all i, compute the derivative ∂L/∂z_i, then take a step against the direction of the derivative: z_i ← z_i − η ∂L/∂z_i, where η is the learning rate.
= Stochastic gradient descent (SGD): instead of evaluating the gradient over all examples, evaluate it for each individual training example:
  * Initialize z_i at some random value for all i.
  * Iterate until convergence, with L^(i) = Σ_{v∈N_R(i)} −log P(v | z_i):
    - Sample a node i; for all j, calculate the derivative ∂L^(i)/∂z_j.
    - For all j, update z_j ← z_j − η ∂L^(i)/∂z_j.

Random Walks: Summary
1. Run short fixed-length random walks starting from each node on the graph.
2. For each node u, collect N_R(u), the multiset of nodes visited on random walks starting from u.
3. Optimize embeddings using stochastic gradient descent:
   L = Σ_{u∈V} Σ_{v∈N_R(u)} −log P(v | z_u)
   (we can efficiently approximate this using negative sampling).

How Should We Randomly Walk?
= So far we have described how to optimize embeddings given a random walk strategy R. What strategies should we use to run these random walks?
= Simplest idea: just run fixed-length, unbiased random walks starting from each node (i.e., DeepWalk from Perozzi et al., 2014).
= The issue is that such a notion of similarity is too constrained. How can we generalize it?
= Reference: Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.

Overview of node2vec
= Goal: embed nodes with similar network neighborhoods close together in the feature space.
= We frame this goal as a maximum likelihood optimization problem, independent of the downstream prediction task.
= Key observation: a flexible notion of the network neighborhood N_R(u) of node u leads to rich node embeddings.
= node2vec develops a biased 2nd-order random walk R to generate the network neighborhood N_R(u) of node u.
= Idea: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016).
= Reference: Grover and Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. KDD.

node2vec: Biased Walks
= Two classic strategies define a neighborhood N_R(u) of a given node u (for walks of length 3, so |N_R(u)| = 3), as illustrated by the sketch below:
  * N_BFS(u) = {s1, s2, s3}: a local, microscopic view (micro-view of the neighbourhood).
  * N_DFS(u) = {s4, s5, s6}: a global, macroscopic view (macro-view of the neighbourhood).
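A toy sketch contrasting the two extremes on a small graph (hypothetical adjacency list and helper names); node2vec itself does not run literal BFS or DFS but interpolates between these behaviours using the parameters introduced next.

```python
from collections import deque

def bfs_neighborhood(adj, u, size=3):
    """Local, BFS-like view: the first `size` nodes discovered in breadth-first order."""
    seen, order, queue = {u}, [], deque([u])
    while queue and len(order) < size:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order[:size]

def dfs_neighborhood(adj, u, size=3):
    """Global, DFS-like view: keep walking outward to an unvisited neighbor."""
    seen, order, node = {u}, [], u
    while len(order) < size:
        nxt = next((n for n in adj[node] if n not in seen), None)
        if nxt is None:
            break
        seen.add(nxt)
        order.append(nxt)
        node = nxt
    return order

# Toy graph: node 0 has two branches, one short and one long.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0, 5], 5: [4, 6], 6: [5]}
print(bfs_neighborhood(adj, 0))  # [1, 4, 2]: nodes close to 0 (micro-view)
print(dfs_neighborhood(adj, 0))  # [1, 2, 3]: a path leading away from 0 (macro-view)
```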
Interpolating BFS and DFS
= node2vec uses a biased, fixed-length random walk R that, given a node u, generates the neighborhood N_R(u).
= Two parameters:
  * Return parameter p: controls returning back to the previous node.
  * In-out parameter q: controls moving outwards (DFS) vs. inwards (BFS); intuitively, q is the "ratio" of BFS vs. DFS.

Biased 2nd-Order Random Walks
= Biased 2nd-order random walks explore network neighborhoods: suppose the walk just traversed edge (s1, w) and is now at w.
= Insight: the neighbors of w can only be at the same distance from s1, farther from s1, or back at s1 itself. Idea: remember where the walk came from.
= The walker came over edge (s1, w) and is at w. Where does it go next? 1/p, 1/q, and 1 are unnormalized probabilities; p and q model the transition probabilities: p is the return parameter, q is the "walk away" parameter.

  Target | Unnormalized prob. | Distance from s1
  s1     | 1/p                | 0
  s2     | 1                  | 1
  s3     | 1/q                | 2
  s4     | 1/q                | 2

= BFS-like walk: a low value of p. DFS-like walk: a low value of q.
= N_R(u) are the nodes visited by the biased walk.

node2vec Algorithm
1. Compute the random walk probabilities.
2. Simulate r random walks of length l starting from each node u.
3. Optimize the node2vec objective using stochastic gradient descent.
= Linear-time complexity; all 3 steps are individually parallelizable.
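A minimal sketch of the biased 2nd-order transition rule, assuming an unweighted graph given as an adjacency list; real implementations typically precompute these probabilities and use alias sampling, and the graph and function names here are illustrative.

```python
import random

def node2vec_step(adj, prev, curr, p=1.0, q=1.0):
    """One step of a biased 2nd-order walk: we arrived at `curr` from `prev`.

    Unnormalized weights: 1/p to return to prev, 1 for neighbors at the same
    distance from prev, and 1/q for neighbors farther away from prev.
    """
    neighbors = adj[curr]
    weights = []
    for nxt in neighbors:
        if nxt == prev:                 # distance 0 from prev: return
            weights.append(1.0 / p)
        elif nxt in adj[prev]:          # distance 1 from prev: stay at the same distance
            weights.append(1.0)
        else:                           # distance 2 from prev: move outward
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(adj, start, length, p=1.0, q=1.0):
    """Simulate one biased walk of the given length starting from `start`."""
    walk = [start, random.choice(adj[start])]   # the first step is unbiased
    while len(walk) < length + 1:
        walk.append(node2vec_step(adj, walk[-2], walk[-1], p, q))
    return walk

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
print(node2vec_walk(adj, 0, length=5, p=0.25, q=2.0))  # low p encourages BFS-like behaviour
```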
Other Random Walk Ideas (not covered in detail here)
= Different kinds of biased random walks: based on node attributes (Dong et al., 2017); based on learned weights (Abu-El-Haija et al., 2017).
= Alternative optimization schemes: directly optimize based on 1-hop and 2-hop random walk probabilities (as in LINE from Tang et al., 2015).
= Network preprocessing techniques: run random walks on modified versions of the original network (e.g., Ribeiro et al. 2017's struc2vec, Chen et al. 2016's HARP).

Summary So Far
= Core idea: embed nodes so that distances in the embedding space reflect node similarities in the original network.
= Different notions of node similarity:
  * Naive: two nodes are similar if they are connected.
  * Neighborhood overlap (covered in Lecture 2).
  * Random walk approaches (covered today).
= So what method should I use? No one method wins in all cases: e.g., node2vec performs better on node classification, while alternative methods perform better on link prediction (Goyal and Ferrara, 2017 survey).
= Random walk approaches are generally more efficient.
= In general: you must choose a definition of node similarity that matches your application!

Stanford CS224W: Machine Learning with Graphs
Lecture 3.3: Embedding Entire Graphs
Jure Leskovec, Stanford University, http://cs224w.stanford.edu

Embedding Entire Graphs
= Goal: we want to embed a subgraph or an entire graph G into a graph embedding z_G.
= Tasks: classifying toxic vs. non-toxic molecules; identifying anomalous graphs.

Simple Idea 1
= Run a standard node embedding technique on the (sub)graph G, then just sum (or average) the node embeddings in the (sub)graph G: z_G = Σ_{v∈G} z_v.
= Used by Duvenaud et al., 2016 to classify molecules based on their graph structure.

Idea 2
= Introduce a "virtual node" to represent the (sub)graph and run a standard node embedding technique.
= Proposed by Li et al., 2016 as a general technique for subgraph embedding.
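A minimal sketch of Idea 1, assuming node embeddings are already available in a matrix Z produced by any node-level method; the function name and toy values are illustrative.

```python
import numpy as np

def graph_embedding(Z, node_ids, average=False):
    """Idea 1: sum (or average) the embeddings of the nodes in the (sub)graph.

    Z: (d, |V|) matrix of node embeddings learned by any node-level method.
    node_ids: the nodes belonging to the (sub)graph G.
    """
    z_g = Z[:, node_ids].sum(axis=1)
    return z_g / len(node_ids) if average else z_g

# Toy usage with random node embeddings for a 5-node graph.
rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 5))
print(graph_embedding(Z, [0, 1, 3]))                 # sum pooling
print(graph_embedding(Z, [0, 1, 3], average=True))   # mean pooling
```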
Approach 3: Anonymous Walk Embeddings
= States in anonymous walks correspond to the index of the first time we visited the node in a random walk (Anonymous Walk Embeddings, ICML 2018).
= Anonymous walks are agnostic to the identity of the nodes visited (hence "anonymous").
= Example, Random Walk 1:
  * Step 1: node A → state 1
  * Step 2: node B → state 2 (different from the node in step 1)
  * Step 3: node C → state 3 (different from the nodes in steps 1 and 2)
  * Step 4: node B → state 2 (the same node as in step 2)
  * Step 5: node C → state 3 (the same node as in step 3)
= Note: Random Walk 2, which visits a different set of nodes in the same revisit pattern, gives the same anonymous walk.
= The number of anonymous walks grows exponentially with their length; for example, there are 5 anonymous walks of length 3: w1 = 111, w2 = 112, w3 = 121, w4 = 122, w5 = 123.

Simple Use of Anonymous Walks
= Simulate anonymous walks w_i of l steps and record their counts.
= Represent the graph as a probability distribution over these walks.
= For example, set l = 3. Then we can represent the graph as a 5-dimensional vector, since there are 5 anonymous walks w_i of length 3: 111, 112, 121, 122, 123.
= z_G[i] = the probability of anonymous walk w_i in G.
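A minimal sketch of converting random walks to their anonymous form and estimating the distribution z_G by sampling; the helper names are hypothetical, walks here are counted over `walk_length` visited nodes, and only the anonymous walks actually observed in the sample appear in the returned dictionary.

```python
import random
from collections import Counter

def to_anonymous(walk):
    """Replace each node by the index of the first position at which it appeared."""
    first_seen = {}
    states = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        states.append(first_seen[node])
    return tuple(states)

def anonymous_walk_embedding(adj, walk_length=3, num_walks=1000, seed=0):
    """z_G: empirical distribution over anonymous walks of a given length."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_walks):
        node = rng.choice(list(adj))
        walk = [node]
        for _ in range(walk_length - 1):
            node = rng.choice(adj[node])
            walk.append(node)
        counts[to_anonymous(walk)] += 1
    return {w: c / num_walks for w, c in counts.items()}

# Toy usage: e.g. walk A-B-C-B-C maps to the anonymous walk (1, 2, 3, 2, 3).
print(to_anonymous(["A", "B", "C", "B", "C"]))
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(anonymous_walk_embedding(adj))
```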
