Configuration Model
- Motivation: The basic ER model has a binomial/Poisson degree distribution. How do we generate a network with any degree distribution/tail behavior?
- Model: graphic sequences converging to a degree distribution
- i.e., the empirical degree distribution satisfies $p_k^{(n)} \to p_k$ as $n \to \infty$, so we can use the Local Branching property
- We use $\langle k^m \rangle$ to denote the $m$-th moment of the degree distribution, i.e., $\langle k^m \rangle = \sum_k k^m p_k$
- Basic properties
- Expected excess degree: $\frac{\langle k^2 \rangle}{\langle k \rangle} - 1$
- Local Clustering: $C = \frac{1}{n\langle k\rangle}\left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2$
- Phase transitions
- Giant component: $\langle k^2\rangle > 2\langle k\rangle$
- Small component size: $1 + \frac{\langle k\rangle^2}{2\langle k\rangle - \langle k^2\rangle}$
- Connectivity for power law: $\zeta(\gamma - 2) = 2\zeta(\gamma - 1)$, i.e., $\gamma \approx 3.4788$
- General: $u = g_1(u)$, where $g_1$ is the PGF of the excess degree distribution
- Giant component: $S = 1 - g_0(u)$
- Diameter: $\frac{\ln n}{\ln \kappa}$ when $\kappa = \frac{\langle k^2\rangle}{\langle k\rangle} - 1 > 1$
Degree specification
Given $n$ nodes, we can specify their degrees with three levels:
- a fixed degree sequence $\{d_i\}_{i=1}^n$ such that $\deg(i) = d_i$ (Algorithm 1 Uniform Stub-Matching);
- a degree expectation sequence $\{w_i\}_{i=1}^n$ such that $\mathbb{E}[\deg(i)] = w_i$ (Chung–Lu Model); and
- a degree distribution $p_k$ such that $\Pr[\deg(i) = k] = p_k$ (Algorithm 2 Sampling).
Generally, the configuration model is the family of random graph models that generate networks from any degree specification above. People also categorize the models into microcanonical and canonical (soft) configuration models, referring to the first and second degree specification levels respectively. Chung–Lu Model is the canonical configuration model. Customarily, the configuration model refers to a specific process of generating a random graph with a fixed degree sequence, which is the focus of this note. The properties in this note also hold in expectation when the fixed degree sequence is a realization of a degree distribution (the third specification).
Graphic sequence
Given a degree sequence with $d_1 \ge d_2 \ge \cdots \ge d_n$ WLOG, a graph with this degree sequence exists if and only if $\sum_{i=1}^n d_i$ is even and for all $k \in \{1, \dots, n\}$ (the Erdős–Gallai theorem),
$$\sum_{i=1}^{k} d_i \le k(k - 1) + \sum_{i=k+1}^{n} \min(d_i, k).$$
A degree sequence satisfying these two conditions is called a graphic sequence.
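These two conditions can be checked directly from the sorted sequence; a minimal Python sketch (the helper name `is_graphic` is my own):

```python
def is_graphic(degrees):
    """Check the Erdos-Gallai conditions: even degree sum, plus the
    prefix inequality for every k over the sorted sequence."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2 != 0:
        return False
    n = len(d)
    for k in range(1, n + 1):
        # sum of the k largest degrees vs. k(k-1) plus capped remainder
        if sum(d[:k]) > k * (k - 1) + sum(min(x, k) for x in d[k:]):
            return False
    return True
```

For example, `[3, 3, 3, 3]` is graphic (it is realized by $K_4$), while `[3, 3, 3, 1]` has an even sum but fails the prefix inequality at $k = 2$.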
Algorithm 1: Uniform Stub-Matching
- List all nodes, each node $i$ with $d_i$ stubs (half-edges).
- Uniformly randomly match two stubs and form an edge, and delete the two stubs.
- Repeat step 2 until no stubs are left.
- Possible to generate a multigraph with self-loops and multiple edges.
- Cumbersome to generate and does not scale with $n$.
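The matching step amounts to shuffling a list of half-edges and pairing them off consecutively; a minimal Python sketch of the procedure (function name mine; the output may be a multigraph, as noted above):

```python
import random

def stub_matching(degrees, rng=random):
    """Uniform stub-matching: lay out d_i stubs (half-edges) per node,
    shuffle, and pair consecutive stubs into edges.
    May produce self-loops and multi-edges."""
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    assert len(stubs) % 2 == 0, "degree sum must be even"
    rng.shuffle(stubs)  # a uniform shuffle induces a uniform matching
    return [(stubs[t], stubs[t + 1]) for t in range(0, len(stubs), 2)]

edges = stub_matching([3, 3, 3, 3], random.Random(0))
```

Each node ends up as an endpoint exactly $d_i$ times, so the degree sequence is realized by construction.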
Multi-edges and Self-loops
Under suitable assumptions on the degree sequence and as $n$ grows, we can work directly with the generated multigraph, and show it has essentially the same properties as a randomly selected graph with the same degree sequence; or we can delete self-links and duplicate links in the generated multigraph, show the proportion of deletions is suitably small, and end up with a graph whose degree distribution is close to the specification.
The justification is the following two propositions.
Prop
Consider an infinite degree sequence such that $\bar d_n$ is bounded away from $0$ and $\infty$ and $d_{\max,n}^3 / n \to 0$, where $\bar d_n$ and $d_{\max,n}$ are the average and maximum degree among the first $n$ nodes. Then the probability that a given node is involved in a self-loop or multi-edge tends to $0$ as $n \to \infty$.
Pf
Fix $n$. Let $\bar d$ and $d_{\max}$ be the average and maximum degree in the first $n$ nodes, respectively. Consider an equivalent procedure that first matches all stubs of an arbitrary node $i$, and then moves on to the next node, and so on. In this procedure, the probability of the first edge being a self-loop is at most $\frac{d_{\max}}{n \bar d - 1}$. Then, suppose the first edge goes to node $j$. The probability of the second edge being a self-loop or a multi-edge (linking back to $j$) is at most $\frac{2 d_{\max}}{n \bar d - 3}$. By induction, the probability of node $i$ having no self-loop or multi-edge is at least
$$\prod_{t=1}^{d_i} \left(1 - \frac{t\, d_{\max}}{n \bar d - 2t + 1}\right) \ge 1 - \frac{d_{\max}^3}{n \bar d - 2 d_{\max}} \to 1.$$
The above proposition does not imply that the process will generate no self-links or duplicate links. When one aggregates across many nodes, there will tend to be some duplicate and self-links in this process, except under more extreme assumptions on the degree sequences. We now calculate the expected total number of multi-edges and self-loops.
Prop
The expected total number of self-loops is $\frac{\langle k^2\rangle - \langle k\rangle}{2\langle k\rangle}$ and the expected total number of multi-edges is $\frac{1}{2}\left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2$.
Pf
For node $i$, each of its $\binom{d_i}{2}$ pairs of stubs is matched to each other with probability $\frac{1}{2m - 1}$, so it has a self-link with probability about $\frac{d_i(d_i - 1)}{2(2m - 1)}$. Thus, the expected total number of self-loops is
$$\sum_i \frac{d_i(d_i - 1)}{2(2m - 1)} = \frac{n\left(\langle k^2\rangle - \langle k\rangle\right)}{2(n\langle k\rangle - 1)} \to \frac{\langle k^2\rangle - \langle k\rangle}{2\langle k\rangle}.$$
Note that under the uniform stub-matching process, the probability of a link between two nodes $i$ and $j$ is approximately $\frac{d_i d_j}{2m}$, where $m$ is the total number of edges. Conditioned on $i$ and $j$ being connected by a first link, they form a second link with probability approximately $\frac{(d_i - 1)(d_j - 1)}{2m}$. The product of the two probabilities is the probability of a multi-edge between $i$ and $j$ (more than two links count as one multi-edge). Suppose $2m = n\langle k\rangle$. Then, the expected number of multi-edges is
$$\sum_{i<j} \frac{d_i d_j (d_i - 1)(d_j - 1)}{(2m)^2} \approx \frac{1}{2}\left(\sum_i \frac{d_i(d_i - 1)}{2m}\right)^2 = \frac{1}{2}\left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2,$$
where we assumed that all involved moments exist and grow slower than $n$.
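The self-loop count is easy to check by simulation: match stubs many times and compare the empirical average against the moment formula. A sketch, with an arbitrary degree sequence of my own choosing (half degree-2, half degree-4, so $\langle k\rangle = 3$ and $\langle k^2\rangle = 10$):

```python
import random

def count_self_loops(degrees, rng):
    """One round of uniform stub-matching; return the self-loop count."""
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return sum(stubs[t] == stubs[t + 1] for t in range(0, len(stubs), 2))

rng = random.Random(42)
degrees = [2] * 250 + [4] * 250
n = len(degrees)
k1 = sum(degrees) / n                 # <k>   = 3
k2 = sum(d * d for d in degrees) / n  # <k^2> = 10
predicted = (k2 - k1) / (2 * k1)      # (10 - 3) / 6 = 7/6
trials = 400
observed = sum(count_self_loops(degrees, rng) for _ in range(trials)) / trials
```

With a few hundred trials the Monte Carlo average lands within a few percent of the predicted $7/6 \approx 1.17$ self-loops.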
Algorithm 2: Sampling
- Input: a degree specification of the third type (a degree distribution $p_k$), a size $n$.
- Generate $d_i \overset{\text{i.i.d.}}{\sim} p_k$ for $i = 1, \dots, n$.
- If $\{d_i\}_{i=1}^n$ is not a graphic sequence, go back to step 2.
- Generate a uniform random match of stubs as in Algorithm 1.
- If the generated graph has self-links or duplicate links, go back to step 4.
- Re-sampling introduces correlations across nodes and edges, but these are negligible higher-order effects.
- One can show that as $n$ grows:
- $\{d_i\}$ is a graphic sequence with probability $\to \frac{1}{2}$ (the bottleneck is $\sum_i d_i$ being even),
- the matched multigraph is simple with probability $\to e^{-\nu/2 - \nu^2/4}$, where $\nu = \frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}$,
- the two events are asymptotically independent with high probability, and thus
- the success rate of Algorithm 2 is $\to \frac{1}{2} e^{-\nu/2 - \nu^2/4}$.
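The steps above can be sketched end-to-end with the two rejection loops; a Python sketch (all names mine, and the bounded degree sampler in the usage line is an arbitrary illustrative choice):

```python
import random

def is_graphic(degrees):
    """Erdos-Gallai check: even sum plus the prefix inequalities."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2:
        return False
    n = len(d)
    return all(sum(d[:k]) <= k * (k - 1) + sum(min(x, k) for x in d[k:])
               for k in range(1, n + 1))

def sample_simple_graph(sample_degree, n, rng):
    """Algorithm 2: resample degrees until graphic (steps 2-3),
    then rematch stubs until the result is simple (steps 4-5)."""
    while True:
        degrees = [sample_degree(rng) for _ in range(n)]
        if not is_graphic(degrees):
            continue  # back to step 2
        while True:
            stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
            rng.shuffle(stubs)
            edges = [tuple(sorted((stubs[t], stubs[t + 1])))
                     for t in range(0, len(stubs), 2)]
            no_loops = all(u != v for u, v in edges)
            no_multi = len(set(edges)) == len(edges)
            if no_loops and no_multi:
                return degrees, edges  # a simple graph

degrees, edges = sample_simple_graph(lambda r: r.choice([1, 2, 3]), 40,
                                     random.Random(7))
```

With degrees bounded by 3, the simple-graph acceptance probability stays a constant, so the rejection loops terminate quickly.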
Local Branching
Configuration models including Chung–Lu Model generalize Erdős–Rényi Random Graph to accommodate arbitrary degree distributions. In all these models, the local graph structure is Tree-like, as the probability of an edge linking back to a previously visited node is negligible as $n$ grows.
The local tree structure can be thought of as generated by a Branching process, as all neighbors of a node are generated i.i.d. (not true when correlation exists, see e.g., Problem 3)
Formally, consider a local connected subgraph or a small component of size $s$. Since it's connected, it has at least $s - 1$ links. If the degree distribution (or Excess Degree Distribution) has a constant mean w.r.t. the graph size $n$, the probability of having an additional link among the $s$ nodes is at most
$$\binom{s}{2} \bar p = O\!\left(\frac{s^2}{n}\right),$$
where $\bar p$ (or $\frac{\langle k\rangle}{n}$) is the average probability of a link between two nodes, which is $p$ in ER.
Excess Degree Distribution
To utilize the local branching property, we need to know the degree distribution of a neighbor of a node $i$ to figure out the offspring distribution. Since this neighbor already has one link to node $i$, we are interested in the number of its other neighbors, which is called the excess degree. For the following generations, the number of offspring is specified by the excess degree distribution.
Suppose we have a graph generated by a configuration model with degree distribution $p_k$. According to the process in Algorithm 1 Uniform Stub-Matching, the probability of a node being $i$'s neighbor is proportional to its degree, so the excess degree of a neighbor follows
$$q_k = \frac{(k + 1)\, p_{k+1}}{\langle k \rangle},$$
which is independent of node $i$. Thus, the extinction probability of this branching process depends on $q_k$.
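The reweighting "pick a neighbor with probability proportional to its degree, then drop the link you arrived on" is a one-liner; a minimal sketch (function name mine), shown on a toy two-point distribution:

```python
def excess_degree_distribution(p):
    """Given a degree distribution {k: p_k}, return {k: q_k}, the
    distribution of the number of OTHER neighbors of a node reached
    by following a random link (degree-biased, shifted down by one)."""
    mean_k = sum(k * pk for k, pk in p.items())
    return {k - 1: k * pk / mean_k for k, pk in p.items() if k >= 1}

# half the nodes have degree 1, half degree 3, so <k> = 2;
# a random link lands on a degree-3 node with probability 3/4
q = excess_degree_distribution({1: 0.5, 3: 0.5})
```

Here `q` is `{0: 0.25, 2: 0.75}`: following a link, you reach a degree-3 node three times as often as a degree-1 node, even though they are equally common.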
Random node vs random link
Randomly picking a node from a network vs. randomly following the end of a link, are two very different exercises. The latter is much more likely to find a high degree node.
Friendship paradox
The expected average neighbor degree is hence $\sum_k k \cdot \frac{k p_k}{\langle k\rangle} = \frac{\langle k^2\rangle}{\langle k\rangle} = \langle k\rangle + \frac{\operatorname{Var}(k)}{\langle k\rangle}$, which is no smaller and can be much larger than $\langle k\rangle$.
This gives the following phase transition:
A Giant Component appears if $\frac{\langle k^2\rangle}{\langle k\rangle} - 1 > 1$, i.e., $\langle k^2\rangle > 2\langle k\rangle$ (the Molloy–Reed criterion). When $\langle k^2\rangle < 2\langle k\rangle$, a component has an expected size of $1 + \frac{\langle k\rangle^2}{2\langle k\rangle - \langle k^2\rangle}$.
- When $p_k = e^{-c} c^k / k!$ (Poisson with mean $c$), we have $\langle k\rangle = c$ and $\langle k^2\rangle = c^2 + c$, so the phase transition occurs at $c = 1$, which is consistent with the ER model.
- When $p_k = k^{-\gamma}/\zeta(\gamma)$ follows a Power Law Distribution with exponent $\gamma$, then $\langle k\rangle = \frac{\zeta(\gamma - 1)}{\zeta(\gamma)}$ and $\langle k^2\rangle = \frac{\zeta(\gamma - 2)}{\zeta(\gamma)}$, where $\zeta$ is the Riemann zeta function. Thus, the phase transition occurs at $\zeta(\gamma - 2) = 2\zeta(\gamma - 1)$, which is approximately $\gamma \approx 3.4788$.
- The above approximation applies to both Erdős–Rényi Random Graph and Chung–Lu Model.
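The power-law threshold can be found numerically. A sketch that evaluates $\zeta$ by truncated summation with an integral tail correction and bisects on the Molloy–Reed criterion (assumes a pure power law supported on $k \ge 1$; names mine):

```python
def zeta(s, terms=20_000):
    """Riemann zeta for s > 1: truncated sum plus an integral tail,
    accurate enough for root-finding at this tolerance."""
    return sum(k ** -s for k in range(1, terms + 1)) + terms ** (1 - s) / (s - 1)

def molloy_reed_gap(g):
    # proportional to <k^2> - 2<k> for p_k = k^-g / zeta(g)
    return zeta(g - 2) - 2 * zeta(g - 1)

# bisection: the gap is positive at 3.1 (giant component) and negative at 3.9
lo, hi = 3.1, 3.9
for _ in range(50):
    mid = (lo + hi) / 2
    if molloy_reed_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
gamma_c = (lo + hi) / 2  # threshold exponent, roughly 3.479
```

Note the criterion only makes sense for $\gamma > 3$, where $\langle k^2\rangle$ is finite; below that, the giant component always exists.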
Clustering Coefficient
A corollary of the Local Branching property is a zero clustering coefficient. Intuitively, your neighbors are much more likely to be connected to other nodes than to each other. We formally calculate the expected clustering coefficient without assuming constant moments of the degree distribution:
Consider two neighbors $j$ and $l$ of node $i$. Note that the excess degrees of $j$ and $l$ are independent of $d_i$, and their excess stubs pick partners uniformly from the remaining $\approx 2m$ stubs. Thus,
$$C = \frac{\mathbb{E}[e_j]\,\mathbb{E}[e_l]}{2m} = \left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2 \frac{1}{n\langle k\rangle}.$$
Therefore, if $\langle k^2\rangle$ grows with $n$ (as under heavy-tailed degree distributions), we could have a non-vanishing clustering coefficient.
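A quick way to exercise the clustering formula (helper name mine): for Poisson degrees with mean $c$, the excess-degree factor is $c^2/c = c$, so the whole expression collapses to $c/n$, exactly the ER edge probability:

```python
def expected_clustering(k1, k2, n):
    """Configuration-model clustering: ((<k^2> - <k>)/<k>)^2 / (n <k>),
    with k1 = <k> and k2 = <k^2>."""
    return ((k2 - k1) / k1) ** 2 / (n * k1)

# Poisson with mean c: <k> = c, <k^2> = c^2 + c, so C = c / n
c, n = 3, 1000
C = expected_clustering(c, c * c + c, n)
```

This recovers the familiar fact that ER clustering equals the edge probability $p = c/n$ and vanishes as $n$ grows.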
Diameter
We calculated that the average number of other neighbors of a neighbor is $\kappa = \frac{\langle k^2\rangle}{\langle k\rangle} - 1$. Thus, the number of neighbors at distance $2$ is $\langle k\rangle \kappa$. By induction, we know the number of neighbors at distance $d$ is approximately $\langle k\rangle \kappa^{d-1}$.
For a shortest path from $i$ to $j$, any node $u$ on the path must be at distance $d(i, u)$ of $i$ and $d(u, j)$ of $j$ with $d(i, u) + d(u, j) = d(i, j)$, i.e., the path also contains the shortest paths from $i$ to $u$ and from $u$ to $j$, otherwise we can find a shorter path. Therefore, the distance between $i$ and $j$ is no larger than $d_1 + d_2 + 1$ if there exists a node $u$ at distance $d_1$ of $i$ and a node $v$ at distance $d_2$ of $j$ such that $(u, v)$ is a link. The probability of the existence of such a link is approximately
$$1 - \left(1 - \frac{1}{2m}\right)^{n_{d_1} n_{d_2}} \approx 1 - \exp\!\left(-\frac{\langle k\rangle\, \kappa^{d_1 + d_2 - 2}}{n}\right),$$
where $n_d \approx \langle k\rangle \kappa^{d-1}$ counts the stubs at distance $d$.
Thus, with probability at least $1 - e^{-c}$ for a constant $c$, the distance between two given nodes is bounded by
$$d_1 + d_2 + 1 \approx \frac{\ln(c n / \langle k\rangle)}{\ln \kappa} + O(1) = \frac{\ln n}{\ln \kappa}\,(1 + o(1)).$$
Thus, Small-World Effect holds if $\kappa = \frac{\langle k^2\rangle}{\langle k\rangle} - 1 > 1$.
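The resulting diameter estimate is easy to evaluate numerically (helper name mine; only meaningful in the supercritical regime $\kappa > 1$):

```python
import math

def diameter_estimate(k1, k2, n):
    """ln(n) / ln(kappa), where kappa = <k^2>/<k> - 1 is the mean
    excess degree; valid only when kappa > 1 (small-world regime)."""
    kappa = k2 / k1 - 1
    assert kappa > 1, "no small-world effect when kappa <= 1"
    return math.log(n) / math.log(kappa)

# Poisson degrees with mean c = 2: <k> = 2, <k^2> = 6, so kappa = 2,
# and a million-node network has typical distances around 20 hops
d = diameter_estimate(2, 6, 10**6)
```

The logarithmic growth in $n$ is the quantitative content of the small-world effect: doubling the network adds roughly $1/\ln\kappa$ hops.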