COS 435, Spring 2006: Problem Sets

Summary of the algorithms for ranking nodes in social networks

We calculate scores on the nodes of a directed graph (the social network).

Notation: We consider a directed graph with n nodes. E is the adjacency matrix, i.e. E[i,j] = 1 if there is an edge from node i to node j in the graph and E[i,j] = 0 otherwise.
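
As a small illustration of this notation, the following Python sketch builds E from a list of edges. The function name adjacency_matrix, the 0-based node numbering, and the 3-node example graph are assumptions made here for illustration; they do not come from the course material.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix E as a list of lists.

    `edges` holds (i, j) pairs meaning "edge from node i to node j".
    Nodes are numbered 0..n-1 here (an assumption of this sketch).
    """
    E = [[0] * n for _ in range(n)]
    for i, j in edges:
        E[i][j] = 1
    return E


# Hypothetical example graph: 0 -> 1, 0 -> 2, 1 -> 2
E = adjacency_matrix(3, [(0, 1), (0, 2), (1, 2)])

# Row sums of E are out-degrees; column sums are in-degrees.
out_degree = [sum(row) for row in E]
in_degree = [sum(E[k][i] for k in range(3)) for i in range(3)]
```

Row i of E lists the nodes that i points to, and column i lists the nodes that point to i, which is why E^T appears in the authority update and E in the hub update.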

The HITS (hubs and authorities) algorithm:
Let a be a vector of the n authority values of the n nodes in the graph and h be a vector of the n hub values.

initialize a = (1, 1, ... , 1)^T and h = (1, 1, ... , 1)^T;
repeat until convergence {

a_new = E^T h;
h_new = E a;
a = normalized a_new;
h = normalized h_new;

}

The normalization divides each vector component by the vector's Euclidean length, i.e. the square root of the sum of the squares of the vector components. Note that this normalization step differs from the one in the reading I assigned in Mining the Web: Discovering Knowledge from Hypertext Data; it instead follows the original paper by Kleinberg.

Written out componentwise, (a_new = E^T h) is the calculation a_new[i] = SUM_k (E[k,i] * h[k]) for each of the n values of i, and (h_new = E a) is the calculation h_new[i] = SUM_k (E[i,k] * a[k]) for each of the n values of i. The summation index k ranges over all n nodes.
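
The HITS loop above can be sketched in plain Python as follows. This is one possible rendering, not a reference implementation: the names hits and normalize, the stopping test (largest componentwise change below tol), the max_iter cap, and the handling of an all-zero vector are assumptions added here, since the text only says "repeat until convergence".

```python
import math


def normalize(v):
    """Divide each component by the vector's Euclidean length."""
    length = math.sqrt(sum(x * x for x in v))
    # Assumption: leave an all-zero vector unchanged to avoid dividing by 0.
    return [x / length for x in v] if length > 0 else list(v)


def hits(E, tol=1e-10, max_iter=1000):
    """Return (a, h), the authority and hub vectors for adjacency matrix E."""
    n = len(E)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(max_iter):
        # a_new[i] = SUM_k E[k][i] * h[k], i.e. a_new = E^T h
        a_new = [sum(E[k][i] * h[k] for k in range(n)) for i in range(n)]
        # h_new[i] = SUM_k E[i][k] * a[k], i.e. h_new = E a (uses the old a)
        h_new = [sum(E[i][k] * a[k] for k in range(n)) for i in range(n)]
        a_new, h_new = normalize(a_new), normalize(h_new)
        change = max(abs(x - y) for x, y in zip(a_new + h_new, a + h))
        a, h = a_new, h_new
        if change < tol:
            break
    return a, h


# Hypothetical example graph: 0 -> 1, 0 -> 2, 1 -> 2
E_example = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
a, h = hits(E_example)
```

In this example node 2 has the most incoming edges and ends up with the largest authority value, while node 0 has the most outgoing edges and ends up with the largest hub value.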

The PageRank algorithm:
Let pr denote the vector of the n PageRank values of the n nodes in the graph, q be the "random jump" parameter, and t_k be the outdegree of vertex k.

initialize pr = (1/n , 1/n, ... , 1/n)^T ;
repeat until convergence {

for i from 1 to n {

pr_new[i] = q/n + (1 - q) * SUM_k (E[k,i] * (pr[k] / t_k));

}
pr = pr_new;

}

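Using this update formula, the components of pr always sum to 1, as one wants if pr represents the probabilities of being at the different vertices, provided every vertex has at least one outgoing edge (t_k >= 1). To see this, sum the update over i: the q/n terms contribute q, and each vertex k passes pr[k] / t_k to each of its t_k successors, so the remaining terms contribute (1 - q) * SUM_k pr[k] = 1 - q. No normalization step is necessary.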
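
The PageRank update can be sketched in plain Python in the same way. The function name pagerank, the default q = 0.15, the stopping test, and the max_iter cap are assumptions chosen for illustration; the text does not fix a value of q or a convergence criterion. The sketch also assumes every vertex has at least one outgoing edge, and raises an error otherwise, because pr[k] / t_k is undefined when t_k = 0.

```python
def pagerank(E, q=0.15, tol=1e-12, max_iter=1000):
    """Return the PageRank vector pr for adjacency matrix E."""
    n = len(E)
    t = [sum(row) for row in E]  # t[k] is the out-degree of vertex k
    if any(d == 0 for d in t):
        raise ValueError("this sketch assumes every vertex has out-degree >= 1")
    pr = [1.0 / n] * n
    for _ in range(max_iter):
        # pr_new[i] = q/n + (1 - q) * SUM_k E[k][i] * pr[k] / t[k]
        pr_new = [
            q / n + (1 - q) * sum(E[k][i] * pr[k] / t[k] for k in range(n))
            for i in range(n)
        ]
        change = max(abs(x - y) for x, y in zip(pr_new, pr))
        pr = pr_new
        if change < tol:
            break
    return pr


# Hypothetical example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
E_example = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
pr = pagerank(E_example)
total = sum(pr)  # stays at 1 up to floating-point rounding
```

No normalization appears anywhere in the loop, yet total remains 1 up to rounding, which is the property described above.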