Assortativity
Homophily
tendency of individuals to associate disproportionately with others who are similar (e.g. ethnicity, age, profession, religion, etc) to themselves.
- Lazarsfeld, P.F. “Friendship as a social process: A substantive and methodological analysis.” Freedom and Control in Modern Society (1954)
The tendency of individuals to associate with others who are similar to themselves is known as homophily or assortative mixing. On the other hand, disassortative mixing, the tendency of individuals to associate with others who are unlike them, is also widely observed in information, technological, and biological networks [@newman2018Networks].
Modularity
To talk about assortativity, we need to specify the characteristic that defines the similarity between nodes. When such characteristics are unknown, the Unsupervised Learning task of grouping nodes into clusters is called community detection.
The idea of a community is that nodes in the same community have a higher probability of being connected than those in different communities. To capture this idea, modularity of a division is defined as
where is the set of edges within communities, and places all edges uniformly at random. A large modularity means a low entropy, which further suggests a strong community structure.
More formally, suppose returns the community index of node . Then,
If we break the network into stubs and reconnect the stubs uniformly at random, the expected number of edges within communities is
Thus,
Let . The matrix form of modularity is
Modularity Maximization
Modularity can serve as an objective function for community detection. The discrete optimization problem is NP-hard in general. Consider a partition into two communities with index and . Then,
Note that . Thus, for this binary partition, we have
recovering the ^spectral-partition with the graph Laplacian replaced by the modularity matrix . However, here we can directly let the relaxed real-valued be the leading eigenvector of the modularity matrix, without degenerating to the trivial case where all nodes are in the same community. This method is also referred to as the spectral modularity maximization.
Assortative Mixing
Now suppose nodes have a scalar characteristic. If nodes with similar values tend to be connected together more often than those with different values, then the network is considered assortatively mixed according to that characteristic.
To see if links are formed based on a scalar characteristic , we calculate the covariance between the characteristic values of the two endpoints of a randomly chosen edge:
where the marginal mean is defined as
Thus,
which recovers ^eq-mod by replacing the community indicator with the characteristic value. In matrix form, we have
Note that
We define the assortativity coefficient as the correlation:
means perfect assortative mixing, means no assortative mixing, and means perfect disassortative mixing.
Degree Correlation
When the characteristic is the degree, associative mixing gives core-periphery structures, where high-degree nodes tend to stick together to form a core surrounded by a periphery of low-degree nodes, commonly observed in social networks. Disassortative mixing by degree tend to give hub-and-spoke structures, where high-degree nodes are hubs that connect many low-degree nodes that connect to other hubs. Replacing with the degree vector gives the assortativity coefficient by degree.