Configuration Model
- Motivation: The basic ER model has a binomial/Poisson degree distribution. How do we generate a network with any degree distribution/tail behavior?
- Model: graphic sequences converging to a degree distribution
- i.e., the empirical degree distribution satisfies $p_k^{(n)} \to p_k$ as $n \to \infty$, so we can use the Local Branching property
- We use $\langle k^m \rangle$ to denote the $m$-th moment of the degree distribution, i.e., $\langle k^m \rangle = \sum_k k^m p_k$
- Basic properties
- Expected excess degree: $\frac{\langle k^2 \rangle}{\langle k \rangle} - 1$
- Local Clustering: $C = \frac{1}{n\langle k\rangle}\left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2$
- Phase transitions
- Giant component: $\langle k^2\rangle > 2\langle k\rangle$
- Small component size: $1 + \frac{\langle k\rangle^2}{2\langle k\rangle - \langle k^2\rangle}$
- Connectivity for power law: $\zeta(\gamma - 2) = 2\zeta(\gamma - 1)$, i.e., $\gamma \approx 3.4788$
- General: $u = g_1(u)$, where $g_1$ is the PGF of the excess degree distribution
- Giant component: $S = 1 - g_0(u)$
- Diameter: $\frac{\ln n}{\ln \kappa}$ when $\kappa = \frac{\langle k^2\rangle}{\langle k\rangle} - 1 > 1$
Degree specification
Given $n$ nodes, we can specify their degrees with three levels:
- a fixed degree sequence $\{d_i\}_{i=1}^n$ such that $\deg(i) = d_i$ (Algorithm 1 Uniform Stub-Matching);
- a degree expectation sequence $\{w_i\}_{i=1}^n$ such that $\mathbb{E}[\deg(i)] = w_i$ (Chung–Lu Model); and
- a degree distribution $p_k$ such that $\Pr[\deg(i) = k] = p_k$ (Algorithm 2 Sampling).
Generally, the configuration model is the family of random graph models that generate networks from any degree specification above. People also categorize the models into microcanonical and canonical (soft) configuration models, referring to the first and second degree specification levels respectively. Chung–Lu Model is the canonical configuration model. Customarily, the configuration model refers to a specific process of generating a random graph with a fixed degree sequence, which is the focus of this note. The properties in this note also hold in expectation when the fixed degree sequence is a realization of a degree distribution (the third specification).
Graphic sequence
Given a degree sequence with $d_1 \ge d_2 \ge \cdots \ge d_n$ WLOG, a graph with this degree sequence exists if and only if $\sum_{i=1}^n d_i$ is even and for all $k \in \{1, \dots, n\}$ (the Erdős–Gallai theorem),
$$\sum_{i=1}^{k} d_i \le k(k - 1) + \sum_{i=k+1}^{n} \min(d_i, k).$$
A degree sequence satisfying these two conditions is called a graphic sequence.
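These two conditions can be checked directly from the sorted sequence; a minimal Python sketch (the helper name `is_graphic` is my own):

```python
def is_graphic(degrees):
    """Check the Erdos-Gallai conditions: even degree sum, plus the
    prefix inequality for every k over the sorted sequence."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2 != 0:
        return False
    n = len(d)
    for k in range(1, n + 1):
        # sum of the k largest degrees vs. k(k-1) plus capped remainder
        if sum(d[:k]) > k * (k - 1) + sum(min(x, k) for x in d[k:]):
            return False
    return True
```

For example, `[3, 3, 3, 3]` is graphic (it is realized by $K_4$), while `[3, 3, 3, 1]` has an even sum but fails the prefix inequality at $k = 2$.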
Algorithm 1: Uniform Stub-Matching
- List all nodes, each node $i$ with $d_i$ stubs (half-edges).
- Uniformly randomly match two stubs and form an edge, and delete the two stubs.
- Repeat step 2 until no stubs are left.
- Possible to generate a multigraph with self-loops and multiple edges.
- Cumbersome to generate and does not scale with $n$.
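The matching step amounts to shuffling a list of half-edges and pairing them off consecutively; a minimal Python sketch of the procedure (function name mine; the output may be a multigraph, as noted above):

```python
import random

def stub_matching(degrees, rng=random):
    """Uniform stub-matching: lay out d_i stubs (half-edges) per node,
    shuffle, and pair consecutive stubs into edges.
    May produce self-loops and multi-edges."""
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    assert len(stubs) % 2 == 0, "degree sum must be even"
    rng.shuffle(stubs)  # a uniform shuffle induces a uniform matching
    return [(stubs[t], stubs[t + 1]) for t in range(0, len(stubs), 2)]

edges = stub_matching([3, 3, 3, 3], random.Random(0))
```

Each node ends up as an endpoint exactly $d_i$ times, so the degree sequence is realized by construction.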
Multi-edges and Self-loops
Under suitable assumptions on the degree sequence and as $n$ grows, we can work directly with the generated multigraph, and show it has essentially the same properties as a randomly selected graph with the same degree sequence; or we can delete self-links and duplicate links in the generated multigraph, show the proportion of deletions is suitably small, and end up with a graph whose degree distribution is close to the specification.
The justification is the following two propositions.
Prop
Consider an infinite degree sequence such that $\bar d_n$ is bounded away from $0$ and $\infty$ and $d_{\max,n}^3 / n \to 0$, where $\bar d_n$ and $d_{\max,n}$ are the average and maximum degree among the first $n$ nodes. Then the probability that a given node is involved in a self-loop or multi-edge tends to $0$ as $n \to \infty$.
Pf
Fix $n$. Let $\bar d$ and $d_{\max}$ be the average and maximum degree in the first $n$ nodes, respectively. Consider an equivalent procedure that first matches all stubs of an arbitrary node $i$, and then moves on to the next node, and so on. In this procedure, the probability of the first edge being a self-loop is at most $\frac{d_{\max}}{n \bar d - 1}$. Then, suppose the first edge goes to node $j$. The probability of the second edge being a self-loop or a multi-edge (linking back to $j$) is at most $\frac{2 d_{\max}}{n \bar d - 3}$. By induction, the probability of node $i$ having no self-loop or multi-edge is at least
$$\prod_{t=1}^{d_i} \left(1 - \frac{t\, d_{\max}}{n \bar d - 2t + 1}\right) \ge 1 - \frac{d_{\max}^3}{n \bar d - 2 d_{\max}} \to 1.$$
The above proposition does not imply that the process will generate no self-links or duplicate links. When one aggregates across many nodes, there will tend to be some duplicate and self-links in this process, except under more extreme assumptions on the degree sequences. We now calculate the expected total number of multi-edges and self-loops.
Prop
The expected total number of self-loops is $\frac{\langle k^2\rangle - \langle k\rangle}{2\langle k\rangle}$ and the expected total number of multi-edges is $\frac{1}{2}\left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2$.
Pf
For node $i$, each of its $\binom{d_i}{2}$ pairs of stubs is matched to each other with probability $\frac{1}{2m - 1}$, so it has a self-link with probability about $\frac{d_i(d_i - 1)}{2(2m - 1)}$. Thus, the expected total number of self-loops is
$$\sum_i \frac{d_i(d_i - 1)}{2(2m - 1)} = \frac{n\left(\langle k^2\rangle - \langle k\rangle\right)}{2(n\langle k\rangle - 1)} \to \frac{\langle k^2\rangle - \langle k\rangle}{2\langle k\rangle}.$$
Note that under the uniform stub-matching process, the probability of a link between two nodes $i$ and $j$ is approximately $\frac{d_i d_j}{2m}$, where $m$ is the total number of edges. Conditioned on $i$ and $j$ being connected by a first link, they form a second link with probability approximately $\frac{(d_i - 1)(d_j - 1)}{2m}$. The product of the two probabilities is the probability of a multi-edge between $i$ and $j$ (more than two links count as one multi-edge). Suppose $2m = n\langle k\rangle$. Then, the expected number of multi-edges is
$$\sum_{i<j} \frac{d_i d_j (d_i - 1)(d_j - 1)}{(2m)^2} \approx \frac{1}{2}\left(\sum_i \frac{d_i(d_i - 1)}{2m}\right)^2 = \frac{1}{2}\left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2,$$
where we assumed that all involved moments exist and grow slower than $n$.
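The self-loop count is easy to check by simulation: match stubs many times and compare the empirical average against the moment formula. A sketch, with an arbitrary degree sequence of my own choosing (half degree-2, half degree-4, so $\langle k\rangle = 3$ and $\langle k^2\rangle = 10$):

```python
import random

def count_self_loops(degrees, rng):
    """One round of uniform stub-matching; return the self-loop count."""
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return sum(stubs[t] == stubs[t + 1] for t in range(0, len(stubs), 2))

rng = random.Random(42)
degrees = [2] * 250 + [4] * 250
n = len(degrees)
k1 = sum(degrees) / n                 # <k>   = 3
k2 = sum(d * d for d in degrees) / n  # <k^2> = 10
predicted = (k2 - k1) / (2 * k1)      # (10 - 3) / 6 = 7/6
trials = 400
observed = sum(count_self_loops(degrees, rng) for _ in range(trials)) / trials
```

With a few hundred trials the Monte Carlo average lands within a few percent of the predicted $7/6 \approx 1.17$ self-loops.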
Algorithm 2: Sampling
- Input: a degree specification of the third type (a degree distribution $p_k$), a size $n$.
- Generate $d_i \overset{\text{i.i.d.}}{\sim} p_k$ for $i = 1, \dots, n$.
- If $\{d_i\}_{i=1}^n$ is not a graphic sequence, go back to step 2.
- Generate a uniform random match of stubs as in Algorithm 1.
- If the generated graph has self-links or duplicate links, go back to step 4.
- Re-sampling introduces correlations across nodes and edges, but these are negligible higher-order effects.
- One can show that as $n$ grows:
- $\{d_i\}$ is a graphic sequence with probability $\to \frac{1}{2}$ (the bottleneck is $\sum_i d_i$ being even),
- the matched multigraph is simple with probability $\to e^{-\nu/2 - \nu^2/4}$, where $\nu = \frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}$,
- the two events are asymptotically independent with high probability, and thus
- the success rate of Algorithm 2 is $\to \frac{1}{2} e^{-\nu/2 - \nu^2/4}$.
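The steps above can be sketched end-to-end with the two rejection loops; a Python sketch (all names mine, and the bounded degree sampler in the usage line is an arbitrary illustrative choice):

```python
import random

def is_graphic(degrees):
    """Erdos-Gallai check: even sum plus the prefix inequalities."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2:
        return False
    n = len(d)
    return all(sum(d[:k]) <= k * (k - 1) + sum(min(x, k) for x in d[k:])
               for k in range(1, n + 1))

def sample_simple_graph(sample_degree, n, rng):
    """Algorithm 2: resample degrees until graphic (steps 2-3),
    then rematch stubs until the result is simple (steps 4-5)."""
    while True:
        degrees = [sample_degree(rng) for _ in range(n)]
        if not is_graphic(degrees):
            continue  # back to step 2
        while True:
            stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
            rng.shuffle(stubs)
            edges = [tuple(sorted((stubs[t], stubs[t + 1])))
                     for t in range(0, len(stubs), 2)]
            no_loops = all(u != v for u, v in edges)
            no_multi = len(set(edges)) == len(edges)
            if no_loops and no_multi:
                return degrees, edges  # a simple graph

degrees, edges = sample_simple_graph(lambda r: r.choice([1, 2, 3]), 40,
                                     random.Random(7))
```

With degrees bounded by 3, the simple-graph acceptance probability stays a constant, so the rejection loops terminate quickly.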
Local Branching
Configuration models including Chung–Lu Model generalize Erdős–Rényi Random Graph to accommodate arbitrary degree distributions. In all these models, the local graph structure is Tree-like, as the probability of an edge linking back to a previously visited node is negligible as $n$ grows.
The local tree structure can be thought of as generated by a Branching process, as all neighbors of a node are generated i.i.d. (not true when correlation exists, see e.g., Problem 3)
Formally, consider a local connected subgraph or a small component of size $s$. Since it's connected, it has at least $s - 1$ links. If the degree distribution (or Excess Degree Distribution) has a constant mean w.r.t. the graph size $n$, the probability of having an additional link among the $s$ nodes is at most
$$\binom{s}{2} \bar p = O\!\left(\frac{s^2}{n}\right),$$
where $\bar p$ (or $\frac{\langle k\rangle}{n}$) is the average probability of a link between two nodes, which is $p$ in ER.
Excess Degree Distribution
To utilize the local branching property, we need to know the degree distribution of a neighbor of a node $i$ to figure out the offspring distribution. Since this neighbor already has one link to node $i$, we are interested in the number of its other neighbors, which is called the excess degree. For the following generations, the number of offspring is specified by the excess degree distribution.
Suppose we have a graph generated by a configuration model with degree distribution $p_k$. According to the process in Algorithm 1 Uniform Stub-Matching, the probability of a node being $i$'s neighbor is proportional to its degree, so the excess degree of a neighbor follows
$$q_k = \frac{(k + 1)\, p_{k+1}}{\langle k \rangle},$$
which is independent of node $i$. Thus, the extinction probability of this branching process depends on $q_k$.
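The reweighting "pick a neighbor with probability proportional to its degree, then drop the link you arrived on" is a one-liner; a minimal sketch (function name mine), shown on a toy two-point distribution:

```python
def excess_degree_distribution(p):
    """Given a degree distribution {k: p_k}, return {k: q_k}, the
    distribution of the number of OTHER neighbors of a node reached
    by following a random link (degree-biased, shifted down by one)."""
    mean_k = sum(k * pk for k, pk in p.items())
    return {k - 1: k * pk / mean_k for k, pk in p.items() if k >= 1}

# half the nodes have degree 1, half degree 3, so <k> = 2;
# a random link lands on a degree-3 node with probability 3/4
q = excess_degree_distribution({1: 0.5, 3: 0.5})
```

Here `q` is `{0: 0.25, 2: 0.75}`: following a link, you reach a degree-3 node three times as often as a degree-1 node, even though they are equally common.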
Random node vs random link
Randomly picking a node from a network vs. randomly following the end of a link, are two very different exercises. The latter is much more likely to find a high degree node.
Friendship paradox
The expected average neighbor degree is hence $\sum_k k \cdot \frac{k p_k}{\langle k\rangle} = \frac{\langle k^2\rangle}{\langle k\rangle} = \langle k\rangle + \frac{\operatorname{Var}(k)}{\langle k\rangle}$, which is no smaller and can be much larger than $\langle k\rangle$.
This gives the following phase transition:
A Giant Component appears if $\frac{\langle k^2\rangle}{\langle k\rangle} - 1 > 1$, i.e., $\langle k^2\rangle > 2\langle k\rangle$ (the Molloy–Reed criterion). When $\langle k^2\rangle < 2\langle k\rangle$, a component has an expected size of $1 + \frac{\langle k\rangle^2}{2\langle k\rangle - \langle k^2\rangle}$.
- When $p_k = e^{-c} c^k / k!$ (Poisson with mean $c$), we have $\langle k\rangle = c$ and $\langle k^2\rangle = c^2 + c$, so the phase transition occurs at $c = 1$, which is consistent with the ER model.
- When $p_k = k^{-\gamma}/\zeta(\gamma)$ follows a Power Law Distribution with exponent $\gamma$, then $\langle k\rangle = \frac{\zeta(\gamma - 1)}{\zeta(\gamma)}$ and $\langle k^2\rangle = \frac{\zeta(\gamma - 2)}{\zeta(\gamma)}$, where $\zeta$ is the Riemann zeta function. Thus, the phase transition occurs at $\zeta(\gamma - 2) = 2\zeta(\gamma - 1)$, which is approximately $\gamma \approx 3.4788$.
- The above approximation applies to both Erdős–Rényi Random Graph and Chung–Lu Model.
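The power-law threshold can be found numerically. A sketch that evaluates $\zeta$ by truncated summation with an integral tail correction and bisects on the Molloy–Reed criterion (assumes a pure power law supported on $k \ge 1$; names mine):

```python
def zeta(s, terms=20_000):
    """Riemann zeta for s > 1: truncated sum plus an integral tail,
    accurate enough for root-finding at this tolerance."""
    return sum(k ** -s for k in range(1, terms + 1)) + terms ** (1 - s) / (s - 1)

def molloy_reed_gap(g):
    # proportional to <k^2> - 2<k> for p_k = k^-g / zeta(g)
    return zeta(g - 2) - 2 * zeta(g - 1)

# bisection: the gap is positive at 3.1 (giant component) and negative at 3.9
lo, hi = 3.1, 3.9
for _ in range(50):
    mid = (lo + hi) / 2
    if molloy_reed_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
gamma_c = (lo + hi) / 2  # threshold exponent, roughly 3.479
```

Note the criterion only makes sense for $\gamma > 3$, where $\langle k^2\rangle$ is finite; below that, the giant component always exists.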
Clustering Coefficient
A corollary of the Local Branching property is a zero clustering coefficient. Intuitively, your neighbors are much more likely to be connected to other nodes than to each other. We formally calculate the expected clustering coefficient without assuming constant moments of the degree distribution:
Consider two neighbors $j$ and $l$ of node $i$. Note that the excess degrees of $j$ and $l$ are independent of $d_i$, and their excess stubs pick partners uniformly from the remaining $\approx 2m$ stubs. Thus,
$$C = \frac{\mathbb{E}[e_j]\,\mathbb{E}[e_l]}{2m} = \left(\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\right)^2 \frac{1}{n\langle k\rangle}.$$
Therefore, if $\langle k^2\rangle$ grows with $n$ (as under heavy-tailed degree distributions), we could have a non-vanishing clustering coefficient.
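A quick way to exercise the clustering formula (helper name mine): for Poisson degrees with mean $c$, the excess-degree factor is $c^2/c = c$, so the whole expression collapses to $c/n$, exactly the ER edge probability:

```python
def expected_clustering(k1, k2, n):
    """Configuration-model clustering: ((<k^2> - <k>)/<k>)^2 / (n <k>),
    with k1 = <k> and k2 = <k^2>."""
    return ((k2 - k1) / k1) ** 2 / (n * k1)

# Poisson with mean c: <k> = c, <k^2> = c^2 + c, so C = c / n
c, n = 3, 1000
C = expected_clustering(c, c * c + c, n)
```

This recovers the familiar fact that ER clustering equals the edge probability $p = c/n$ and vanishes as $n$ grows.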
Diameter
We calculated that the average number of other neighbors of a neighbor is $\kappa = \frac{\langle k^2\rangle}{\langle k\rangle} - 1$. Thus, the number of neighbors at distance $2$ is $\langle k\rangle \kappa$. By induction, we know the number of neighbors at distance $d$ is approximately $\langle k\rangle \kappa^{d-1}$.
For a shortest path from $i$ to $j$, any node $u$ on the path must be at distance $d(i, u)$ of $i$ and $d(u, j)$ of $j$ with $d(i, u) + d(u, j) = d(i, j)$, i.e., the path also contains the shortest paths from $i$ to $u$ and from $u$ to $j$, otherwise we can find a shorter path. Therefore, the distance between $i$ and $j$ is no larger than $d_1 + d_2 + 1$ if there exists a node $u$ at distance $d_1$ of $i$ and a node $v$ at distance $d_2$ of $j$ such that $(u, v)$ is a link. The probability of the existence of such a link is approximately
$$1 - \left(1 - \frac{1}{2m}\right)^{n_{d_1} n_{d_2}} \approx 1 - \exp\!\left(-\frac{\langle k\rangle\, \kappa^{d_1 + d_2 - 2}}{n}\right),$$
where $n_d \approx \langle k\rangle \kappa^{d-1}$ counts the stubs at distance $d$.
Thus, with probability at least $1 - e^{-c}$ for a constant $c$, the distance between two given nodes is bounded by
$$d_1 + d_2 + 1 \approx \frac{\ln(c n / \langle k\rangle)}{\ln \kappa} + O(1) = \frac{\ln n}{\ln \kappa}\,(1 + o(1)).$$
Thus, Small-World Effect holds if $\kappa = \frac{\langle k^2\rangle}{\langle k\rangle} - 1 > 1$.
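The resulting diameter estimate is easy to evaluate numerically (helper name mine; only meaningful in the supercritical regime $\kappa > 1$):

```python
import math

def diameter_estimate(k1, k2, n):
    """ln(n) / ln(kappa), where kappa = <k^2>/<k> - 1 is the mean
    excess degree; valid only when kappa > 1 (small-world regime)."""
    kappa = k2 / k1 - 1
    assert kappa > 1, "no small-world effect when kappa <= 1"
    return math.log(n) / math.log(kappa)

# Poisson degrees with mean c = 2: <k> = 2, <k^2> = 6, so kappa = 2,
# and a million-node network has typical distances around 20 hops
d = diameter_estimate(2, 6, 10**6)
```

The logarithmic growth in $n$ is the quantitative content of the small-world effect: doubling the network adds roughly $1/\ln\kappa$ hops.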