COS 435, Spring 2006

Summary of the algorithms for ranking nodes in social networks


We calculate scores on the nodes of a directed graph (the social network).

Notation:  We consider a directed graph with n nodes.  E is the adjacency matrix, i.e. E[i,j] = 1 if there is an edge from node i to node j in the graph and E[i,j] = 0 otherwise.

The HITS (hubs and authorities) algorithm: 
Let  a  be a vector of the n authority values of the n nodes in the graph and h be a vector of the n hub values. 

initialize a = (1, 1, ... , 1)^T  and h = (1, 1, ... , 1)^T;
repeat until convergence {
    a_new = E^T h;
    h_new = E a;
    a = normalized a_new;
    h = normalized h_new;
}

The normalization simply divides each vector component by the vector's Euclidean length, i.e. the square root of the sum of the squares of the vector components.  Note that this normalization step differs from the one in the reading I assigned in Mining the Web: Discovering Knowledge from Hypertext Data; it follows the original paper by Kleinberg instead.

Note that (a_new = E^T h) is simply the calculation a_new[i] = SUM_k (E[k,i] * h[k]) for the n values of i, and (h_new = E a) is simply the calculation h_new[i] = SUM_k (E[i,k] * a[k]) for the n values of i.  The index k of each sum ranges over all n nodes.
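
As a minimal sketch, the HITS loop above can be written in plain Python. The function names hits and normalize, the parameters tol and max_iter, and the convergence test (largest change in any component falls below tol) are assumptions made for illustration; the handout only says "repeat until convergence". Nodes are numbered from 0 here, and E is a list of n rows of 0/1 entries.

```python
import math

def normalize(v):
    """Divide each component by the vector's Euclidean length."""
    length = math.sqrt(sum(x * x for x in v))
    # Assumption: leave an all-zero vector unchanged to avoid dividing by 0.
    return [x / length for x in v] if length > 0 else list(v)

def hits(E, tol=1e-10, max_iter=1000):
    """Return (a, h): authority and hub vectors for adjacency matrix E."""
    n = len(E)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(max_iter):
        # a_new[i] = SUM_k E[k][i] * h[k], i.e. a_new = E^T h
        a_new = [sum(E[k][i] * h[k] for k in range(n)) for i in range(n)]
        # h_new[i] = SUM_k E[i][k] * a[k], i.e. h_new = E a (uses the old a)
        h_new = [sum(E[i][k] * a[k] for k in range(n)) for i in range(n)]
        a_new, h_new = normalize(a_new), normalize(h_new)
        # Assumed stopping rule: largest change in any component is below tol.
        change = max(abs(x - y) for x, y in zip(a_new + h_new, a + h))
        a, h = a_new, h_new
        if change < tol:
            break
    return a, h
```

For example, on the three-node graph with edges 0 -> 2 and 1 -> 2, i.e. E = [[0, 0, 1], [0, 0, 1], [0, 0, 0]], node 2 ends up with authority value 1, and nodes 0 and 1 share equal hub values of 1/sqrt(2).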

The PageRank algorithm:
Let pr denote the vector of n PageRank values of the n nodes in the graph, q be the "random jump" parameter, and t_k be the outdegree of vertex k.

initialize pr = (1/n, 1/n, ... , 1/n)^T;
repeat until convergence {
    for i from 1 to n {
        pr_new[i] = q/n + (1 - q) * SUM_k (E[k,i] * (pr[k] / t_k));
    }
    pr = pr_new;
}

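Using this update formula, the components of pr always sum to 1, as one wants if pr represents the probabilities of being at the different vertices, provided every vertex has at least one outgoing edge (so that every outdegree t_k is nonzero).  To see this, sum the update over all i: the q/n terms contribute q in total, and since SUM_i E[k,i] = t_k, the remaining terms contribute (1 - q) * SUM_k pr[k] = 1 - q.  No normalization step is necessary.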
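
The PageRank update above can be sketched in plain Python in the same way. The function name pagerank, the default q = 0.15, the parameters tol and max_iter, and the stopping rule are assumptions for illustration and do not come from the handout. The sketch also assumes every node has at least one outgoing edge and raises an error otherwise, since the update divides by the outdegree. Nodes are numbered from 0.

```python
def pagerank(E, q=0.15, tol=1e-12, max_iter=1000):
    """Return the PageRank vector for adjacency matrix E (list of 0/1 rows)."""
    n = len(E)
    t = [sum(row) for row in E]  # t[k] = outdegree of node k
    if any(tk == 0 for tk in t):
        # Assumption: nodes without outgoing edges are not handled here.
        raise ValueError("every node needs at least one outgoing edge")
    pr = [1.0 / n] * n
    for _ in range(max_iter):
        # pr_new[i] = q/n + (1 - q) * SUM_k E[k][i] * pr[k] / t[k]
        pr_new = [
            q / n + (1 - q) * sum(E[k][i] * pr[k] / t[k] for k in range(n))
            for i in range(n)
        ]
        # Assumed stopping rule: largest change in any component is below tol.
        change = max(abs(x - y) for x, y in zip(pr_new, pr))
        pr = pr_new
        if change < tol:
            break
    return pr
```

On a directed 3-cycle, E = [[0, 1, 0], [0, 0, 1], [1, 0, 0]], every node keeps the value 1/3, and on any graph accepted by this sketch the returned values sum to 1 up to floating-point rounding.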