Network Centrality
Throughout is the adjacency matrix where if links to , and is -th element of vector .
Degree
Count the number of neighbors of a node as the degree centrality.
- Degree centrality does not capture the importance of indirect connections. Not all neighbors are equal.
Eigenvector
When the score is proportional to the sum of neighboring scores, we get
Since has non-negative elements, by the Perron-Frobenius theorem, only the leading eigenvector (of each component) has all elements non-negative and survives the power iteration. Thus, we let be the largest eigenvalue of and the eigenvector centrality
is the eigenvector corresponding to .
- Eigenvector centrality is vacuous for acyclic directed graphs, as the source nodes have zero score, and hence all nodes have zero score as well.
Katz-Bonacich
Extending the eigenvector centrality, the Katz-Bonacich centrality assigns each node an additional constant score. Normalizing this constant score to one gives
where is a discount factor that captures the relative importance of direct and indirect connections. The closed-form solution is
As , aligns with the eigenvector centrality . As , aligns with the degree centrality .
By the Neumann series expansion, the Katz-Bonacich centrality of a node counts the number of distinct walks starting from it, discounted by their length:
- A node with a high Katz-Bonacich centrality increases the centrality of its neighbors by the same amount no matter how many neighbors it has, which may not be realistic in some settings.
PageRank
Extending the Katz-Bonacich centrality, PageRank dilutes the importance contribution of a node to its neighbors by its number of outgoing edges. Intuitively, a page is important if another page points to it, but is less important if that other page points to many pages. Formally,
where is the out-degree matrix lower bounded by , giving
Removing the constant term gives
Note that for undirected graphs and is the leading eigenvector of with eigenvalue . Thus, reduces to the degree centrality for undirected graphs.
For directed graphs, faces the same issue as the eigenvector centrality. From another perspective, we see that is a row-stochastic matrix, which specifies a Markov Chain corresponding to a random walk, and is the stationary distribution. To prevent the Markov chain being trapped in a sink node, the additional constant term helps PageRank jump to a node uniformly at random with probability .
- When assigning PageRank and previous centralities, each node plays two roles: it receives importance from its neighbors and contributes importance to its neighbors. A closer look inspires us to find the most influential and most influenced nodes using separate scores.
Hub and Authority
Consider each node has two roles: hub and authority. A hub node is more important if it points to more important authorities; an authority node is more important if it is pointed to by more important hubs. Associate each node with a hub score and an authority score . Formalize the above relationship with a linear form:
A fixed point solution satisfies
Rmk
counts the number of nodes that point to both and ; counts the number of nodes that are pointed to by both and .
Similar to Eigenvector centrality, if and are irreducible, then only their leading eigenvectors have all elements non-negative and survive the power iteration. Thus, let and and be the leading eigenvectors of and , referred to as the authority and hub centrality, respectively.
Closeness
- Previous centralities are based on the adjacency matrix. Can we propose other centralities based on path or connectivity?
Katz-Bonacich centrality gives an interpretation of counting walks: a node has a high centrality if it can initiate many short walks. In the same spirit, we can assign a high centrality to a node if it can reach many nodes through short paths.
The closeness centrality of a node is the reciprocal of the average shortest path length from it to all other nodes:
However, when the network is not connected, the above expression always gives zero. To fix this issue, we can use a harmonic mean instead:
Betweenness
- We can also assign a high centrality to a node if it is key to connecting other nodes.
The between centrality of a node is the proportion of shortest paths between pairs of nodes that pass through it. Formally, let be the set of shortest paths between nodes and , and be the set of those paths that pass through node . Then the betweenness centrality of node is defined as:
- All previous centralities all have a positive correlation with degree centrality, i.e., measure how well-connected a node is. However, a low-degree node that is distant from other nodes can have a high betweenness centrality.
Random Walk Betweenness
Replacing the shortest path in Betweenness centrality with a random walk gives the random walk betweenness centrality:
where is the probability that a random walk passes through conditioned on it starting from and ending at .