Network Centrality

Throughout, $A$ is the adjacency matrix with $A_{ij} = 1$ if $j$ links to $i$ and $0$ otherwise, $1$ in a matrix expression denotes the all-ones vector, and $C (i)$ is the $i$-th element of a centrality vector $C$.

Degree

The degree centrality of a node is the number of its neighbors (for a directed graph, with the convention above, the number of incoming links):

C_{D} = A 1 .

Degree centrality does not capture the importance of indirect connections. Not all neighbors are equal.
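As a minimal sketch of the formula $C_{D} = A 1$, the snippet below computes row sums of an adjacency matrix stored as a list of lists. The function name `degree_centrality` and the example graph (an undirected path 0 - 1 - 2) are illustrative assumptions, not part of the original note.

```python
def degree_centrality(A):
    """Return C_D = A 1, i.e. the row sums of the adjacency matrix A."""
    return [sum(row) for row in A]


# Hypothetical example: undirected path graph 0 - 1 - 2.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(degree_centrality(path))  # [1, 2, 1]
```

With the convention $A_{ij} = 1$ if $j$ links to $i$, the same function returns in-degrees for a directed graph.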

Eigenvector

When the score is proportional to the sum of neighboring scores, we get

C_{E} = c A C_{E} .

Since $A$ has non-negative elements, by the Perron-Frobenius theorem, only the leading eigenvector (of each component) has all elements non-negative and survives the power iteration. Thus, we let $c = λ_{1}^{- 1}$, where $λ_{1}$ is the largest eigenvalue of $A$, and the eigenvector centrality

C_{E} = v_{1}

is the eigenvector corresponding to $λ_{1}$ .

Eigenvector centrality is vacuous for directed acyclic graphs: the source nodes (those with no incoming links) have zero score, and this zero propagates along the edges, so all nodes have zero score as well.
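The power iteration mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the iteration count, and both example graphs are hypothetical, and plain power iteration is only expected to converge when the leading eigenvalue is strictly dominant (it can oscillate on bipartite graphs, which is why the example includes a triangle).

```python
import math


def matvec(A, x):
    """Multiply the matrix A (list of rows) by the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def eigenvector_centrality(A, iters=200):
    """Power iteration: repeatedly apply A and rescale to unit length."""
    x = [1.0] * len(A)
    for _ in range(iters):
        y = matvec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return y  # every score collapsed to zero, e.g. on an acyclic graph
        x = [v / norm for v in y]
    return x


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to node 0.
paw = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(paw)

# Directed acyclic chain 0 -> 1 -> 2 (A[i][j] = 1 if j links to i).
chain = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
]
print(eigenvector_centrality(chain))  # [0.0, 0.0, 0.0]
```

On the first graph, node 0 receives the highest score and the pendant node 3 the lowest; on the acyclic chain, all scores vanish after a few steps, matching the remark above.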

Katz-Bonacich

Extending the eigenvector centrality, the Katz-Bonacich centrality assigns each node an additional constant score. Normalizing this constant score to one gives

C_{K} = α A C_{K} + 1,

where $α$ is a discount factor that captures the relative importance of direct and indirect connections. For $0 < α < λ_{1}^{- 1}$, the closed-form solution is

C_{K} = (I - α A)^{- 1} 1 .

As $α \to λ_{1}^{- 1}$ , $C_{K}$ aligns with the eigenvector centrality $C_{E} = v_{1}$ . As $α \to 0$ , $C_{K} - 1$ aligns with the degree centrality $C_{D} = A 1$ .
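A minimal sketch of the Katz-Bonacich recursion follows. Instead of inverting $I - α A$, it iterates $C_{K} \leftarrow α A C_{K} + 1$, which reaches the same fixed point when $α < λ_{1}^{- 1}$. The function name, the iteration count, and the example graph are assumptions made for illustration.

```python
def katz_centrality(A, alpha, iters=500):
    """Iterate x <- alpha * A x + 1; converges when alpha < 1 / lambda_1."""
    x = [1.0] * len(A)
    for _ in range(iters):
        x = [alpha * sum(a * v for a, v in zip(row, x)) + 1.0 for row in A]
    return x


# Hypothetical example: undirected path 0 - 1 - 2, whose largest eigenvalue is sqrt(2).
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_centrality(path, alpha=0.5)  # fixed point is [3, 4, 3]

# For a small alpha, (C_K - 1) / alpha is close to the degree vector [1, 2, 1].
small = katz_centrality(path, alpha=1e-3)
rescaled = [(s - 1.0) / 1e-3 for s in small]
```

Solving $(I - α A) C_{K} = 1$ by hand for this path with $α = 0.5$ gives $C_{K} = (3, 4, 3)$, which the iteration approaches.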

By the Neumann series expansion, which converges for $α < λ_{1}^{- 1}$, the Katz-Bonacich centrality of a node counts the walks ending at it (equivalently, starting from it when the graph is undirected), discounted by their length:

C_{K} (i) = \sum_{k = 0}^{\infty} \sum_{j} α^{k} (A^{k})_{ij} = 1 + \sum_{k = 1}^{\infty} α^{k} \sum_{j_{1}, \dots, j_{k}} A_{i j_{1}} A_{j_{1} j_{2}} \dots A_{j_{k - 1} j_{k}} = 1 + \sum_{k = 1}^{\infty} α^{k} \, \# \{\text{length-}k \text{ walks ending at } i\} .
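The walk-counting reading of the series can be checked numerically. The sketch below computes $A^{k} 1$ by repeated multiplication and sums the discounted counts, including the constant $1$ contributed by the $k = 0$ term. The helper name `walk_counts`, the truncation at 60 terms, and the undirected path graph are illustrative assumptions; on an undirected graph, walks ending at a node and walks starting from it are counted identically.

```python
def walk_counts(A, k):
    """Return A^k 1: entry i is the number of length-k walks with endpoint i."""
    counts = [1] * len(A)
    for _ in range(k):
        counts = [sum(a * c for a, c in zip(row, counts)) for row in A]
    return counts


# Hypothetical example: undirected path 0 - 1 - 2.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(walk_counts(path, 1))  # [1, 2, 1]
print(walk_counts(path, 2))  # [2, 2, 2]
print(walk_counts(path, 3))  # [2, 4, 2]

# Discounted sum of walk counts, truncated after 60 terms (k = 0 contributes 1).
alpha = 0.5
truncated = [
    sum(alpha ** k * walk_counts(path, k)[i] for k in range(60))
    for i in range(3)
]
```

For this graph and $α = 0.5$, the truncated sums approach $(3, 4, 3)$, the solution of $(I - α A) C_{K} = 1$.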

A node with a high Katz-Bonacich centrality increases the centrality of its neighbors by the same amount no matter how many neighbors it has, which may not be realistic in some settings.

PageRank

Extending the Katz-Bonacich centrality, PageRank dilutes the importance contribution of a node to its neighbors by its number of outgoing edges. Intuitively, a page is important if another page points to it, but is less important if that other page points to many pages. Formally,

C_{P} = α A D^{- 1} C_{P} + 1,

where $D$ is the diagonal out-degree matrix with entries lower bounded by $1$, i.e., $D_{jj} = \max (1, \sum_{i} A_{ij})$, giving

C_{P} = (I - α A D^{- 1})^{- 1} 1 .

Removing the constant term gives

C_{P}^{'} = α A D^{- 1} C_{P}^{'} .

Note that for undirected graphs $λ_{\max} (A D^{- 1}) \leq 1$ and $C_{P}^{'} = A 1$ is the leading eigenvector of $A D^{- 1}$ with eigenvalue $1$. Thus, $C_{P}^{'}$ reduces to the degree centrality $C_{D}$ for undirected graphs.

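For directed graphs, $C_{P}^{'}$ faces the same issue as the eigenvector centrality. From another perspective, $A D^{- 1}$ is a column-stochastic matrix (the column of every node with at least one outgoing edge sums to one), which specifies a Markov chain corresponding to a random walk, and $C_{P}^{'}$ is its stationary distribution up to scaling. To prevent the Markov chain from being trapped in a sink node, the additional constant term lets PageRank jump to a node chosen uniformly at random. When every node has at least one outgoing edge, normalizing $C_{P}$ to sum to one shows that the walk follows an outgoing link with probability $α$ and jumps with probability $1 - α$, i.e., to each particular node with probability $\frac{1 - α}{n}$.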
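The PageRank recursion $C_{P} = α A D^{- 1} C_{P} + 1$ can be sketched as a fixed-point iteration. The function name, the value $α = 0.85$, the iteration count, and the small directed graph are assumptions chosen for illustration; this is not a production implementation.

```python
def pagerank(A, alpha=0.85, iters=200):
    """Iterate x <- alpha * A D^{-1} x + 1 with A[i][j] = 1 if j links to i."""
    n = len(A)
    # Out-degree of node j is the j-th column sum, floored at 1 for sink nodes.
    out_deg = [max(1, sum(A[i][j] for i in range(n))) for j in range(n)]
    x = [1.0] * n
    for _ in range(iters):
        x = [
            alpha * sum(A[i][j] * x[j] / out_deg[j] for j in range(n)) + 1.0
            for i in range(n)
        ]
    return x


# Hypothetical example with edges 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
web = [
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
]
scores = pagerank(web)
total = sum(scores)
probabilities = [s / total for s in scores]  # rescaled to sum to one
```

In this example node 2, which is linked from both other nodes, ranks first, and node 0 ranks second because it receives all of node 2's undivided contribution. Since every node here has an outgoing edge, the raw scores sum to $n / (1 - α)$.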

When assigning PageRank and previous centralities, each node plays two roles: it receives importance from its neighbors and contributes importance to its neighbors. A closer look inspires us to find the most influential and most influenced nodes using separate scores.

Hub and Authority

Suppose each node plays two roles: hub and authority. A hub is more important if it points to more important authorities; an authority is more important if it is pointed to by more important hubs. Associate each node $i$ with a hub score $C_{H} (i)$ and an authority score $C_{A} (i)$, and formalize the above relationship in linear form:

C_{A} = α A C_{H}, C_{H} = β A^{T} C_{A} .

A fixed point solution satisfies

C_{A} = α β A A^{T} C_{A}, C_{H} = α β A^{T} A C_{H} .

Remark

With the convention $A_{ij} = 1$ if $j$ links to $i$, $(A^{T} A)_{ij} = \sum_{k} A_{ki} A_{kj}$ counts the number of nodes that are pointed to by both $i$ and $j$; $(A A^{T})_{ij} = \sum_{k} A_{ik} A_{jk}$ counts the number of nodes that point to both $i$ and $j$.

Similar to the eigenvector centrality, if $A A^{T}$ and $A^{T} A$ are irreducible, then only their leading eigenvectors have all elements non-negative and survive the power iteration. Thus, let $α β = λ_{\max}^{- 1}$ and $C_{A}$ and $C_{H}$ be the leading eigenvectors of $A A^{T}$ and $A^{T} A$, referred to as the authority and hub centrality, respectively.
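The coupled updates $C_{A} \propto A C_{H}$ and $C_{H} \propto A^{T} C_{A}$ can be sketched as an alternating power iteration, with the constants $α$ and $β$ absorbed into a rescaling to unit length at each step. The function names, the iteration count, and the four-node example are illustrative assumptions.

```python
import math


def normalize(x):
    """Rescale x to unit Euclidean length (leave an all-zero vector unchanged)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


def hits(A, iters=100):
    """Alternate C_A <- A C_H and C_H <- A^T C_A, with A[i][j] = 1 if j links to i."""
    n = len(A)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        auth = normalize([sum(A[i][j] * hub[j] for j in range(n)) for i in range(n)])
        hub = normalize([sum(A[i][j] * auth[i] for i in range(n)) for j in range(n)])
    return auth, hub


# Hypothetical example with edges 0 -> 2, 0 -> 3, 1 -> 3.
links = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
auth, hub = hits(links)
```

Here nodes 0 and 1 act purely as hubs and nodes 2 and 3 purely as authorities: node 3 gets the highest authority score because both hubs point to it, and node 0 gets the highest hub score because it points to both authorities.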

Closeness

Previous centralities are based on the adjacency matrix. Can we propose other centralities based on path or connectivity?

Katz-Bonacich centrality has a walk-counting interpretation: a node has a high centrality if it is an endpoint of many short walks. In the same spirit, we can assign a high centrality to a node if it can reach many nodes through short paths.

The closeness centrality of a node is the reciprocal of the average shortest path length from it to all other nodes:

C_{C} (i) = \frac{n - 1}{\sum_{j \neq i} \mathrm{dist} (i, j)} .

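However, when the network is not connected, $\mathrm{dist} (i, j) = \infty$ for unreachable pairs, so the sum is infinite and the above expression gives zero for every node that cannot reach some other node. To fix this issue, we can average the reciprocal distances instead (with $1 / \infty = 0$), which amounts to using a harmonic mean: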

C_{C}^{'} (i) = \frac{\sum_{j \neq i} \frac{1}{\mathrm{dist} (i, j)}}{n - 1} .
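Both closeness variants can be sketched with a breadth-first search on an unweighted graph. The adjacency-list representation (a dict mapping each node to the nodes it links to), the function names, and the example graphs are assumptions for illustration; the sketch also assumes at least two nodes.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, i):
    """(n - 1) / sum of distances; zero if some node is unreachable from i."""
    n = len(adj)
    dist = bfs_distances(adj, i)
    if len(dist) < n:  # an infinite distance makes the sum infinite
        return 0.0
    return (n - 1) / sum(d for j, d in dist.items() if j != i)


def harmonic_closeness(adj, i):
    """Average of 1 / dist(i, j) over j != i, with unreachable nodes contributing 0."""
    n = len(adj)
    dist = bfs_distances(adj, i)
    return sum(1.0 / d for j, d in dist.items() if j != i) / (n - 1)


# Hypothetical examples: a path 0 - 1 - 2, then the same path plus an isolated node 3.
path = {0: [1], 1: [0, 2], 2: [1]}
split = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(closeness(path, 1))           # 1.0
print(closeness(split, 1))          # 0.0
print(harmonic_closeness(split, 0))  # 0.5
```

Adding the isolated node drives the plain closeness of every node to zero, while the harmonic variant still distinguishes the middle of the path from its ends.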

Betweenness

We can also assign a high centrality to a node if it is key to connecting other nodes.

The betweenness centrality of a node is the proportion of shortest paths between pairs of other nodes that pass through it. Formally, let $S (j, k)$ be the set of shortest paths between nodes $j$ and $k$, and $S (j, k \mid i)$ be the set of those paths that pass through node $i$. Then the betweenness centrality of node $i$ is defined as:

C_{B} (i) = \sum_{i \neq j \neq k} \frac{| S (j, k \mid i) | / | S (j, k) |}{(n - 1) (n - 2) / 2} ,

where the sum runs over unordered pairs $\{j, k\}$ of distinct nodes other than $i$, of which there are $(n - 1) (n - 2) / 2$.

The previous centralities all tend to correlate positively with degree centrality, i.e., they measure how well-connected a node is. However, a low-degree node that is distant from other nodes can still have a high betweenness centrality, for example when it is the only bridge between two clusters.
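The definition can be sketched directly for a small undirected, unweighted graph: a breadth-first search from each node records distances and shortest-path counts, and a shortest path from $j$ to $k$ passes through $i$ exactly when the distances via $i$ add up to the direct distance. The function names, the adjacency-list representation, and the example graph (two triangles joined through node 3) are illustrative assumptions; the sketch assumes at least three nodes and favors clarity over speed.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, source):
    """Return (dist, sigma): distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness(adj, i):
    """Fraction of shortest paths through i, averaged over unordered pairs {j, k}."""
    n = len(adj)
    info = {s: shortest_path_counts(adj, s) for s in adj}
    dist_i, sigma_i = info[i]
    others = [v for v in adj if v != i]
    total = 0.0
    for j, k in combinations(others, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are disconnected
        if i in dist_j and k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
            total += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return total / ((n - 1) * (n - 2) / 2)


# Hypothetical example: triangles {0, 1, 2} and {4, 5, 6} joined through node 3.
bridge = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
```

In this graph node 3 has degree 2, the lowest together with nodes 0, 1, 5, and 6, yet every shortest path between the two triangles passes through it, so its betweenness (9 of 15 pairs) exceeds that of the degree-3 node 2 (8 of 15 pairs).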

Random Walk Betweenness

Replacing the shortest path in Betweenness centrality with a random walk gives the random walk betweenness centrality:

C_{R} (i) = \sum_{i \neq j \neq k} \frac{P (j \to i \to k \mid j \to k)}{(n - 1) (n - 2) / 2},

where $P (j \to i \to k ∣ j \to k)$ is the probability that a random walk passes through $i$ conditioned on it starting from $j$ and ending at $k$ .
