Graphs and linear algebra
Formally, an unweighted graph is $G = (V, E)$, where $E \subseteq V \times V$. Informally, $V$ consists of things we want to model and $E$ represents the relations between them. It is a very flexible representation: we use graphs to represent friendships between people, wires between routers, citations between papers, links between objects in a data structure, and many other things. When the bare topology of the relationships does not provide enough modeling power, we might also consider including functions on $V$ or $E$ corresponding to different attributes. The most common case is a scalar weight function assigned to each edge that corresponds to the importance of the relation: in a social network, for example, maybe a close and active friendship has more weight than a casual acquaintance. We use relationship information encoded in graphs to reason about logical groupings (whether we call them communities or clusters), about power relations and influence, and about dynamic processes like the spread of rumors or disease.
Matrix methods for network analysis rely on a sort of pun: we encode the network as a matrix, translate the question into linear algebraic terms, and commence to compute. There are many possible matrices associated with a graph, and we use them to reason about different things. We consider three interpretations of these network matrices:
A matrix may represent a linear map between different spaces, typically mapping vertex properties to edge properties, or vice-versa. Examples include the discrete gradient operator and the edge sum operator.
A matrix may represent an operator mapping the space of functions over the vertices (or over edges) to itself. Examples include transition matrices for random walks defined on the graph.
A matrix may represent a quadratic form mapping functions on the vertices (or edges) into scalars. Often the quadratic form has an easy-to-interpret meaning for special inputs; for example, the quadratic form for the combinatorial Laplacian counts cut edges in graph partitioning.
Adjacency and degrees
If we identify vertices in the graph with indices $1, \ldots, n$, then the (unweighted) adjacency matrix $A$ has entries
$$a_{ij} = \begin{cases} 1, & (i,j) \in E, \\ 0, & \text{otherwise.} \end{cases}$$
For graphs in which edges have positive weights, we sometimes use the weighted adjacency matrix, with $a_{ij}$ giving the edge weight for $(i,j) \in E$.
The degree of a node is the sum of the weights on the incident edges. When the graph is directed, the in-degree and out-degree may differ; for the moment, we will stick with the undirected case. We let $d$ denote the vector of weighted node degrees, and let $D = \operatorname{diag}(d)$ denote the matrix in which the weighted node degrees appear on the diagonal.
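As a concrete point of reference, here is a minimal sketch (using numpy, with a made-up five-node graph) of how $A$, $d$, and $D$ are assembled from an edge list:

```python
import numpy as np

# Hypothetical undirected 5-node graph, given as an edge list (0-indexed).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n = 5

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = 1.0   # unit weights; use w_ij here for a weighted graph
    A[j, i] = 1.0   # symmetric, since the graph is undirected

d = A.sum(axis=1)   # weighted degree vector
D = np.diag(d)      # degree matrix
print(d)            # [2. 2. 3. 2. 1.]
```

The later snippets in this section reuse this same construction on other small example graphs.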
The adjacency matrix and the degree matrices are building blocks for several other matrices, but they are also useful on their own. First, as a linear operator, the adjacency matrix accumulates values from neighbors; that is,
$$(Ax)_i = \sum_{j \in N(i)} x_j,$$
where $N(i)$ is the neighborhood of $i$. If $x_j$ is the number of paths of length $k$ leading from a starting point to node $j$, then $(Ax)_i$ is the number of such paths ending at neighbors of node $i$ — that is, the total number of paths of length $k+1$ to node $i$. Therefore,
$$(A^k)_{ij} = \text{number of paths of length } k \text{ between nodes } i \text{ and } j.$$
We use this formula in many ways; for example, it lets us write the number of triangles in an undirected graph (closed cycles of length three) as
$$\#\{\text{triangles}\} = \frac{1}{6} \operatorname{tr}(A^3),$$
where we divide by six because each triangle is counted once for each of its three vertices and each of the two directions of traversal. We also know, for example, how to approximate the number of long paths between $i$ and $j$ in terms of the dominant eigenvector (and associated eigenvalue) of $A$.
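A quick numerical check of the triangle-counting identity (a sketch with numpy; the four-node example graph is arbitrary):

```python
import numpy as np

# Small undirected graph with exactly one triangle 0-1-2, plus a pendant edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# (A^k)_{ij} counts length-k walks between i and j.
A3 = np.linalg.matrix_power(A, 3)

# Each triangle is traversed from each of its 3 vertices in 2 directions,
# so tr(A^3) counts every triangle 6 times.
print(np.trace(A3) / 6)   # 1.0
```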
The matrix $A$ also defines a quadratic form that is useful for counting edges. Let $x$ be the indicator for a subset of vertices $S$, i.e. $x_i = 1$ for $i \in S$ and $x_i = 0$ otherwise. Then
$$x^T A x = \sum_{i \in S} \sum_{j \in S} a_{ij};$$
i.e. $x^T A x$ is the total (directed) edge weight between nodes in $S$, or twice the total undirected edge weight. If $x$ is an indicator then $x^T x = |S|$, and so
$$\frac{x^T A x}{x^T x} = \text{mean (weighted) degree within } S.$$
If $S$ is a clique, the mean degree within $S$ is $|S| - 1$; therefore
$$|S| - 1 \leq \lambda_{\max}(A),$$
since $\lambda_{\max}(A)$ is the largest possible value for the Rayleigh quotient $x^T A x / x^T x$. Hence, the maximum clique size $\omega(G)$ has the bound
$$\omega(G) \leq 1 + \lambda_{\max}(A).$$
This is an example of a result in spectral graph theory, i.e. the study of graphs in terms of eigenvalues and eigenvectors of associated matrices. In fact, another continuous optimization problem due to Motzkin and Straus gives the clique number exactly:
$$1 - \frac{1}{\omega(G)} = \max \left\{ x^T A x : x \geq 0, \ \sum_i x_i = 1 \right\}.$$
The optimization is now carried out over the simplex rather than over the Euclidean unit ball used in the spectral bound.
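To see the spectral bound in action, here is a short sketch (numpy again; the graph, a 4-clique with a short tail attached, is a made-up example):

```python
import numpy as np

# Hypothetical graph: a 4-clique {0,1,2,3} plus a path 3-4-5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

lam_max = np.linalg.eigvalsh(A)[-1]   # largest eigenvalue of symmetric A
print(1.0 + lam_max)                  # upper bound on the clique number
# The true maximum clique has size 4; the printed bound is somewhat above 4.
```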
If $x$ is an indicator for a set $S$, we can also use the quadratic form
$$x^T D x = \sum_{i \in S} d_i,$$
the total weighted degree of the nodes in $S$. Therefore,
$$x^T (D - A) x = x^T D x - x^T A x = \text{total weight of edges between } S \text{ and the rest of the graph}.$$
We will see more of the combinatorial Laplacian matrix $L = D - A$ shortly.
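A numerical illustration of the two quadratic forms (a sketch; the graph and the set $S$ are arbitrary):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))

x = np.zeros(n)
x[[0, 1, 2]] = 1.0          # indicator of S = {0, 1, 2}

print(x @ A @ x)            # 6.0: twice the 3 undirected edges inside S
print(x @ (D - A) @ x)      # 1.0: the single edge (2,3) leaving S
```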
Random walks and normalized adjacency
Now consider a random walk on a graph $G$, i.e. a Markov process where the walker location at time $t+1$ is chosen randomly from among the neighbors of the previous location, with probability determined by the edge weights. Using the properties of conditional probability, and the rule for randomly choosing a neighbor, gives
$$P(X_{t+1} = i) = \sum_j P(X_{t+1} = i \,|\, X_t = j)\, P(X_t = j) = \sum_j \frac{a_{ij}}{d_j}\, P(X_t = j).$$
Letting $\pi^{(t)}$ be the column vector whose entries $\pi^{(t)}_i$ represent the probability that $X_t = i$, and similarly with $\pi^{(t+1)}$, we write the equation for conditional probability concisely as
$$\pi^{(t+1)} = P \pi^{(t)}, \qquad P = A D^{-1}.$$
The matrix $P$ is the transition matrix for the random walk Markov chain. Powers of $P$ have an interpretation similar to that of powers of $A$, but rather than counting length-$k$ paths, $P^k$ computes probabilities:
$$(P^k)_{ij} = P(X_{t+k} = i \,|\, X_t = j).$$
Assuming the graph is connected and aperiodic, the matrix $P$ has a unique eigenvalue at 1, and all other eigenvalues are inside the unit circle. In this case,
$$\lim_{k \to \infty} P^k \pi^{(0)} = \pi^*,$$
where $\pi^*$ is a probability vector representing the stationary distribution for the Markov chain. In the undirected case, the stationary distribution is rather simple: $\pi^* = d / \sum_i d_i$. Things are more interesting for directed graphs.
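A small sanity check of the transition matrix and the stationary distribution (a sketch; the toy graph contains a triangle, so the chain is aperiodic):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)

P = A / d                        # P = A D^{-1}: column j is scaled by 1/d_j
pi = np.zeros(n)
pi[0] = 1.0                      # start the walker at node 0

for _ in range(200):             # iterate pi <- P pi
    pi = P @ pi

print(pi)                        # approaches d / sum(d) = [0.2 0.2 0.3 0.2 0.1]
```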
While the eigenvalue at $1$ and the associated stationary distribution are particularly interesting, the other eigenvalues and vectors are also interesting. In particular, suppose $P = V \Lambda V^{-1}$ is diagonalizable, and consider
$$P^k \pi^{(0)} - \pi^* = V \tilde{\Lambda}^k V^{-1} \pi^{(0)}, \qquad \|P^k \pi^{(0)} - \pi^*\| \leq \kappa(V)\, |\lambda_2|^k\, \|\pi^{(0)}\|,$$
where $\kappa(V)$ is the condition number of the eigenvector matrix, $\tilde{\Lambda}$ is the diagonal matrix of eigenvalues with the eigenvalue at one replaced by zero, and $|\lambda_2|$ is the maximum modulus of all eigenvalues other than the eigenvalue at one. Therefore, the asymptotic rate of convergence of the Markov chain, also known as the mixing rate, is determined by the second-largest eigenvalue modulus of $P$.
To understand the mixing rate in more detail (in the undirected case), it is helpful to consider the closely related normalized adjacency matrix
$$\bar{A} = D^{-1/2} A D^{-1/2}.$$
Note that $P = A D^{-1} = D^{1/2} \bar{A} D^{-1/2}$, so $P$ and $\bar{A}$ are similar and share the same eigenvalues. The eigenvalues of $\bar{A}$ are critical points of the Rayleigh quotient
$$\frac{y^T \bar{A} y}{y^T y};$$
substituting $y = D^{1/2} x$, we have
$$\frac{y^T \bar{A} y}{y^T y} = \frac{x^T A x}{x^T D x}.$$
If $x$ is an indicator for a set $S$, this last expression represents the fraction of edges incident on $S$ that are to other nodes in $S$. If $x$ is $+1$ on $S$ and $-1$ on $\bar{S}$, then $x^T D x = \sum_i d_i$, and $x^T A x$ is $\sum_i d_i$ minus four times the total weight $\operatorname{cut}(S, \bar{S})$ of the undirected edges between $S$ and $\bar{S}$; hence,
$$\frac{x^T A x}{x^T D x} = 1 - \frac{4 \operatorname{cut}(S, \bar{S})}{\sum_i d_i}.$$
If we restrict to the case where the same total edge weight is incident on $S$ and on $\bar{S}$, then $x$ is $D$-orthogonal to the all-ones vector, and so this Rayleigh quotient is a lower bound on the eigenvalue of $\bar{A}$ closest to one (the second largest). Thus, spectral analysis lets us bound the mixing rate in terms of the normalized cut size $\operatorname{cut}(S, \bar{S}) / \sum_i d_i$.
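The bound is easy to check numerically (a sketch; the graph below, two triangles joined by a bridge, is an invented example with an obvious small cut):

```python
import numpy as np

# Two 3-cliques joined by the single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)

Abar = A / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]   # D^{-1/2} A D^{-1/2}
lam2 = np.linalg.eigvalsh(Abar)[-2]    # eigenvalue closest to (but below) one

x = np.array([1, 1, 1, -1, -1, -1.0])  # +1 on S = {0,1,2}, -1 on the rest
rho = (x @ A @ x) / (x @ (d * x))      # x^T A x / x^T D x
print(rho, lam2)                       # rho <= lam2 < 1: small cut, slow mixing
```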
Discrete gradients and the Laplacian
For an unweighted graph, the discrete gradient is a matrix $B \in \mathbb{R}^{m \times n}$ in which each row represents an edge $(i,j)$ by $(e_i - e_j)^T$. If $x$ is an indicator for a set $S$, then $Bx$ is nonzero ($\pm 1$) only on edges between $S$ and $\bar{S}$; hence,
$$\|Bx\|^2 = x^T B^T B x = \#\{\text{edges between } S \text{ and } \bar{S}\}.$$
We can rewrite this as $x^T L x$ where
$$L = B^T B = \sum_{(i,j) \in E} (e_i - e_j)(e_i - e_j)^T.$$
Each term in this sum contributes one to the $(i,i)$ and $(j,j)$ entries through the $e_i e_i^T$ and $e_j e_j^T$ products; the cross terms fill in $-1$ at the $(i,j)$ and $(j,i)$ entries. Putting everything together, we have
$$L = B^T B = D - A.$$
The matrix $L$ is known as the combinatorial Laplacian, or sometimes simply as the Laplacian. The same construction holds in the weighted case, where it corresponds to the quadratic form
$$x^T L x = \sum_{(i,j) \in E} w_{ij} (x_i - x_j)^2,$$
where $w_{ij}$ is the weight of the edge. In either case, the smallest eigenvalue of $L$ is zero, corresponding to an eigenvector of all ones. The multiplicity of the zero eigenvalue is equal to the number of connected components in the graph; assuming there is only one connected component, the second smallest eigenvalue $\lambda_2$ is a lower bound on $x^T L x / x^T x$ for any $x$ orthogonal to the all-ones vector; if we choose $x$ to be a $\pm 1$ vector indicating the split between equal-size sets $S$ and $\bar{S}$, then $x^T L x$ also gives four times the number of edges cut by the partitioning. Hence, $n \lambda_2 / 4$ is a lower bound on the minimal bisector size.
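The following sketch (again with an arbitrary small graph) builds $B$, forms $L = B^T B$, and evaluates the bisection bound:

```python
import numpy as np

# Two 3-cliques joined by a bridge; the natural bisection cuts one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
B = np.zeros((len(edges), n))
for k, (i, j) in enumerate(edges):
    B[k, i], B[k, j] = 1.0, -1.0     # row of the discrete gradient: (e_i - e_j)^T

L = B.T @ B                          # combinatorial Laplacian, equals D - A
lam2 = np.linalg.eigvalsh(L)[1]      # second smallest eigenvalue

x = np.array([1, 1, 1, -1, -1, -1.0])   # equal-size split S = {0,1,2}
print(x @ L @ x / 4)                 # 1.0: edges cut by this bisection
print(n * lam2 / 4)                  # lower bound on the minimal bisector size
```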
The combinatorial Laplacian can also be interpreted as a linear operator, and in this guise it plays a role as the generator for a continuous-time random walk: the walker crosses each incident edge at a rate given by the edge weight, so the time spent at node $i$ between steps is an independent exponential random variable with mean $1/d_i$. In this case, we have
$$\frac{d\pi}{dt} = -L \pi, \qquad \pi(t) = \exp(-tL)\, \pi(0).$$
The matrix $\exp(-tL)$ is known as the heat kernel on the graph because it can also describe continuous-time diffusion of heat on a graph.
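A brief numerical look at the heat kernel (a sketch; it uses scipy.linalg.expm for the matrix exponential, and the graph is the same sort of toy example as before):

```python
import numpy as np
from scipy.linalg import expm

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

pi0 = np.zeros(n)
pi0[0] = 1.0                       # all probability mass starts at node 0

for t in [0.1, 1.0, 10.0]:
    pi_t = expm(-t * L) @ pi0      # heat kernel applied to the initial density
    print(t, pi_t.sum(), pi_t)     # total mass stays 1; pi_t spreads as t grows
```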
The normalized Laplacian is $\bar{L} = D^{-1/2} L D^{-1/2} = I - \bar{A}$; the eigenvalues of the normalized Laplacian can also be expressed as critical points of the generalized Rayleigh quotient
$$\frac{x^T L x}{x^T D x}.$$
If $x$ is a vector which is $+1$ on the set $S$ and $-1$ on $\bar{S}$, this Rayleigh quotient equals $4 \operatorname{cut}(S, \bar{S}) / \sum_i d_i$, i.e. twice the fraction of the total edge weight that goes between $S$ and $\bar{S}$.
Discrete sums and the signless Laplacian
The discrete sum operator is a matrix $C \in \mathbb{R}^{m \times n}$ where each row corresponds to an edge $(i,j)$ and has the form $(e_i + e_j)^T$. The signless Laplacian is
$$Q = C^T C = D + A.$$
The signless Laplacian is positive semi-definite; but unlike the combinatorial Laplacian, it may or may not have a null vector. If there is a null vector $x$ for the signless Laplacian, then we must have $x_i = -x_j$ whenever $(i,j) \in E$; this implies that $x$ indicates bipartite structure: the set $S_+$ of nodes with positive elements can have edges to the set $S_-$ of nodes with negative elements, but neither $S_+$ nor $S_-$ may have any other edges. There has been some work on the spectral theory for the signless Laplacian, but it is generally much less used than the combinatorial Laplacian.
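A small check of the bipartiteness criterion (a sketch with two made-up graphs, one bipartite and one not):

```python
import numpy as np

def smallest_signless_eig(edges, n):
    """Smallest eigenvalue of the signless Laplacian Q = D + A."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    Q = np.diag(A.sum(axis=1)) + A
    return np.linalg.eigvalsh(Q)[0]

# A 4-cycle is bipartite: Q has a null vector (smallest eigenvalue ~ 0).
print(smallest_signless_eig([(0, 1), (1, 2), (2, 3), (3, 0)], 4))
# A triangle is not bipartite: the smallest eigenvalue is strictly positive (1.0).
print(smallest_signless_eig([(0, 1), (1, 2), (2, 0)], 3))
```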
Modularity matrix
Suppose we want to find a tight cluster in our graph. One approach is to look for sets of nodes that make $x^T A x$ large; but large relative to what? For an undirected graph, we use the quadratic form $x^T A x$ to count internal edges in a set. But a set may have many internal edges purely as an accident of having many high-degree nodes: high-degree nodes are highly likely to connect to each other simply because they are highly likely to connect to anyone! Hence, we would like a reference model so that we can see whether the edge density in a subgraph is “unusually” high relative to that reference.
The simplest reference model that takes into account the effect of degree distribution is sometimes called the configuration model. In the configuration model, we prescribe the vector $d$ of expected node degrees. We then add $m = \frac{1}{2} \sum_i d_i$ edges, each of which attaches endpoints chosen independently with probability proportional to the degrees, so that a given edge is $(i,j)$ with probability $d_i d_j / (2m)^2$; self-loops and repeated edges are allowed. With this construction, the expected adjacency is
$$E[A] = \frac{1}{2m} d d^T,$$
which has the correct expected degree distribution. The modularity matrix is defined to be
$$M = A - \frac{1}{2m} d d^T,$$
and if $x$ is an indicator for $S$, the quadratic form $x^T M x$ indicates the number of “excess edges” within $S$ compared to what we would predict in the configuration model.
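The modularity quadratic form is easy to evaluate directly (a sketch; the two-clique graph and the candidate set $S$ are invented for illustration):

```python
import numpy as np

# Two 3-cliques joined by a bridge; S = {0, 1, 2} is a plausible community.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
two_m = d.sum()                      # 2m = total degree

M = A - np.outer(d, d) / two_m       # modularity matrix A - d d^T / (2m)

x = np.zeros(n)
x[[0, 1, 2]] = 1.0                   # indicator for S
print(x @ M @ x)                     # 2.5 > 0: S is denser than the reference
```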
And many more
The list of graph matrices that we have discussed is by no means exhaustive. We will see a few more examples this week, including the heat kernel matrix and the PageRank matrix. But there are many more besides; for example, recent work by Leskovec and Benson used a motif adjacency matrix in which the $(i,j)$ entry represents the number of triangles in the graph involving the edge $(i,j)$. And one can define ever more exotic matrix representations. However, the adjacency, Laplacian, and their close neighbors suffice for many applications.
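For the triangle motif specifically, the motif adjacency matrix has a simple closed form (a sketch, assuming the triangle-motif variant described above; the graph is a toy example):

```python
import numpy as np

# Toy graph: one triangle 0-1-2 plus a pendant edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Triangle-motif adjacency: entry (i,j) counts triangles containing edge (i,j).
# (A @ A)[i, j] counts common neighbors of i and j; masking by A keeps edges only.
M = A * (A @ A)
print(M)    # the three triangle edges get 1; the pendant edge (2,3) gets 0
```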