拉普拉斯嵌入法與譜分群¶

Spectral embedding and clustering

Creative Commons License
This work by Jephian Lin is licensed under a Creative Commons Attribution 4.0 International License.

$\newcommand{\trans}{^\top} \newcommand{\adj}{^{\rm adj}} \newcommand{\cof}{^{\rm cof}} \newcommand{\inp}[2]{\left\langle#1,#2\right\rangle} \newcommand{\dunion}{\mathbin{\dot\cup}} \newcommand{\bzero}{\mathbf{0}} \newcommand{\bone}{\mathbf{1}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\tr}{\operatorname{tr}} \newcommand{\nul}{\operatorname{null}} \newcommand{\rank}{\operatorname{rank}} %\newcommand{\ker}{\operatorname{ker}} \newcommand{\range}{\operatorname{range}} \newcommand{\Col}{\operatorname{Col}} \newcommand{\Row}{\operatorname{Row}} \newcommand{\spec}{\operatorname{spec}} \newcommand{\vspan}{\operatorname{span}} \newcommand{\Vol}{\operatorname{Vol}} \newcommand{\sgn}{\operatorname{sgn}} \newcommand{\idmap}{\operatorname{id}} \newcommand{\am}{\operatorname{am}} \newcommand{\gm}{\operatorname{gm}} \newcommand{\mult}{\operatorname{mult}} \newcommand{\iner}{\operatorname{iner}}$

Main idea¶

Sometimes a graph can have some "clusters" such that the vertices in each cluster are strongly related, while vertices from different clusters are weakly related.
The Laplacian matrix is good at finding these clusters.

Let $G$ be a graph on $n$ vertices and $L$ its Laplacian matrix.
Let $\lambda_1 \leq \cdots \leq \lambda_n$ be the eigenvalues.
We know that $\nul(L)$ is the largest $k$ such that $\lambda_k = 0$, and it is also the number of connected components of $G$.
Similarly, we may pick a threshold $\epsilon > 0$ and regard the eigenvalues with $\lambda < \epsilon$ as nearly zero.
Then the largest $k$ such that $\lambda_k < \epsilon$ can be interpreted as the number of clusters.
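
As a quick illustration, the sketch below (assuming Sage; the threshold $\epsilon = 1/2$ is an arbitrary choice for this toy graph) joins two copies of $K_5$ by a single edge and counts the eigenvalues below the threshold.
In [ ]:
### sketch: count the near-zero Laplacian eigenvalues
g = graphs.CompleteGraph(5).disjoint_union(graphs.CompleteGraph(5))
g.relabel()
g.add_edge(4, 5)  # join the two cliques by a single edge
eigs = sorted(g.laplacian_matrix().eigenvalues())
eps = 1/2  # an arbitrary threshold for this toy graph
print("number of clusters:", sum(1 for lam in eigs if lam < eps))  # expect 2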

Let $\{\bv_1,\ldots, \bv_n\}$ be the corresponding orthonormal eigenbasis.
We know that we may take $\bv_1 = \frac{1}{\sqrt{n}}\bone$.
The unit eigenvector $\bv_2$ corresponding to $\lambda_2$ is called the Fiedler vector.

One may partition a graph by its Fiedler vector as follows:

  • Let $(\bv_2)_i$ be the $i$-th entry of the Fiedler vector $\bv_2$.
  • Let $N_+ = \{ i : (\bv_2)_i > 0\}$.
  • Let $N_- = \{ i : (\bv_2)_i < 0\}$.
  • Let $N_0 = \{ i : (\bv_2)_i = 0\}$.
  • Then the graph can be partitioned either by $(N_+\cup N_0, N_-)$ or $(N_+, N_-\cup N_0)$.
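
A minimal sketch of this procedure (assuming Sage and SciPy; the tolerance eps is a hypothetical cutoff that absorbs floating-point noise):
In [ ]:
### sketch: partition a path by its Fiedler vector
import numpy as np
from scipy.linalg import eigh

g = graphs.PathGraph(6)
L = np.array(g.laplacian_matrix(), dtype=float)
vals, vecs = eigh(L, subset_by_index=(1, 1))  # the second smallest eigenpair
fvec = vecs[:, 0]                             # the Fiedler vector
eps = 1e-8  # tolerance for floating-point zeros
Np = [i for i in g.vertices() if fvec[i] > eps]
Nm = [i for i in g.vertices() if fvec[i] < -eps]
N0 = [i for i in g.vertices() if abs(fvec[i]) <= eps]
print(Np, Nm, N0)  # one half of the path is positive, the other negative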

Let $G$ be a graph and $W$ a subset of vertices.
Then the induced subgraph of $G$ on $W$, denoted by $G[W]$, is the graph on the vertex set $W$ whose edges are exactly the edges of $G$ with both endpoints in $W$.
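
For example, Sage's built-in subgraph computes exactly $G[W]$:
In [ ]:
### sketch: the induced subgraph of a 5-cycle on W = {0,1,2}
g = graphs.CycleGraph(5)
W = [0, 1, 2]
h = g.subgraph(W)  # keeps exactly the edges of g with both endpoints in W
print(h.edges(labels=False))  # [(0, 1), (1, 2)]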

Fiedler's partition theorem (simple version)¶

Let $G$ be a connected graph and $L$ its Laplacian matrix.
Assume the multiplicity of the second smallest eigenvalue $\lambda_2$ is $1$.
Let $\bv = (v_0, \ldots, v_{n-1})$ be an eigenvector of $L$ with respect to $\lambda_2$.
Let $N_+ = \{ i : v_i > 0\}$.
Let $N_- = \{ i : v_i < 0\}$.
Let $N_0 = \{ i : v_i = 0\}$.
Then $G[N_+]$ and $G[N_0\cup N_-]$ are connected subgraphs of $G$.
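
The sketch below (assuming Sage and SciPy) checks the conclusion on two triangles joined by an edge, a graph whose $\lambda_2$ is simple:
In [ ]:
### sketch: verify Fiedler's partition theorem on two joined triangles
import numpy as np
from scipy.linalg import eigh

g = Graph([(0,1), (0,2), (1,2), (2,3), (3,4), (3,5), (4,5)])
L = np.array(g.laplacian_matrix(), dtype=float)
vals, vecs = eigh(L, subset_by_index=(1, 1))
v = vecs[:, 0]
eps = 1e-8  # tolerance for floating-point zeros
Np = [i for i in g.vertices() if v[i] > eps]
Nrest = [i for i in g.vertices() if v[i] <= eps]  # N0 union N-
print(g.subgraph(Np).is_connected(), g.subgraph(Nrest).is_connected())  # True True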

From this point of view, the Fiedler vector assigns a number to each vertex, so it can be viewed as a feature vector.
Indeed, one may use $\bv_3, \bv_4, \ldots$ as new feature vectors.
This leads to the idea of the spectral embedding of a graph.

Algorithm (spectral embedding)¶

Input: a graph $G$ on $n$ vertices, and the desired dimension $k$
Output: an $n\times k$ data matrix $Y$

  1. Let $L = L(G)$.
  2. Find a diagonal matrix $D$, whose diagonal entries are $\lambda_1 \leq \cdots \leq \lambda_n$, and an orthogonal matrix $Q$ such that $L = QDQ\trans$.
  3. Let $Y$ be the $n\times k$ matrix composed of the $2$-nd to the $(k+1)$-th columns of $Q$.
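
A minimal implementation sketch (assuming SciPy; spectral_embedding is a hypothetical helper name, and subset_by_index=(1, k) selects the eigenvectors of $\lambda_2, \ldots, \lambda_{k+1}$):
In [ ]:
### sketch: spectral embedding
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(g, k):
    """Return the n x k matrix formed by the 2nd to (k+1)-th columns of Q."""
    L = np.array(g.laplacian_matrix(), dtype=float)
    vals, vecs = eigh(L, subset_by_index=(1, k))
    return vecs

Y = spectral_embedding(graphs.CycleGraph(10), 2)
print(Y.shape)  # (10, 2)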

The importance of spectral embedding is that it provides a way to transform graph data into a data matrix.
Thus, one may use clustering algorithms, such as $k$-means, to cluster the vertices of the graph.
The combination of spectral embedding and $k$-means is called spectral clustering.
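
Putting the pieces together, here is a sketch of spectral clustering (assuming the spectral_embedding sketch above and the k_means function loaded from the end of this note):
In [ ]:
### sketch: spectral clustering = spectral embedding + k-means
g = graphs.CompleteGraph(5).disjoint_union(graphs.CompleteGraph(5))
g.relabel()
g.add_edge(4, 5)  # two cliques joined by a single edge
Y = spectral_embedding(g, 2)    # sketch defined above
y_new, centers = k_means(Y, 2)  # k_means from the end of this note
print(y_new)  # vertices 0--4 and 5--9 should receive different labels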

Finally, all the theories in 614 and 615 work for weighted graphs, where each edge $\{i,j\}$ is assigned a positive weight $w_{ij}$.
The Laplacian matrix of a weighted graph has $-w_{ij}$ on the $i,j$-entry for each edge $\{i,j\}$ and

$$ \sum_{ \substack{j \neq i\\\{j,i\}\in E(G)} } w_{ij} $$

on the $i,i$-entry.
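
For example (a sketch assuming Sage, whose laplacian_matrix accepts weighted=True and then builds the matrix from the edge labels):
In [ ]:
### sketch: the weighted Laplacian of a triangle with weights 2, 3, 5
g = Graph([(0, 1, 2), (1, 2, 3), (0, 2, 5)])  # (i, j, w_ij)
print(g.laplacian_matrix(weighted=True))
# [ 7 -2 -5]
# [-2  5 -3]
# [-5 -3  8]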

Side stories¶

  • $k$-means
  • image segmentation

Experiments¶

Exercise 1¶

執行以下程式碼。

Run the code below.
In [ ]:
### code
set_random_seed(0)
print_ans = False

k = choice([2,3,4])
sizes = [choice([4,5,6]) for i in range(k)]
n = sum(sizes)
psum = 0
cuts = [0]
for s in sizes:
    psum += s
    cuts.append(psum)

g = graphs.PathGraph(n)
g.set_pos(None)
for a,b in zip(cuts[:-1], cuts[1:]):
    for i in range(a, b - 1):
        for j in range(i + 1, b):
            g.add_edge(i,j)
    
g.show(figsize=(5,5), title="$G$")
L = g.laplacian_matrix()
eigs = L.eigenvalues()
eigs.sort()
print("eigenvalues:", eigs)

if print_ans:
    print("I would guess G has %s clusters."%k)
Exercise 1(a)¶

觀察畫出來的圖,你認為 $G$ 有幾個群?

Observe the given graph. Based on your intuition, how many clusters are there in $G$?
Exercise 1(b)¶

觀察 $L(G)$ 的特徵值,你認為 $G$ 有幾群?

Observe the eigenvalues of $L(G)$. Based on your intuition, how many clusters are there in $G$?

Exercises¶

Exercise 2¶

逐行解釋以下程式碼。

Explain the code below line by line.
In [ ]:
### code
import numpy as np
from scipy.linalg import eigh

g = graphs.PathGraph(11)
L = np.array(g.laplacian_matrix())
eigs,vecs = eigh(L, subset_by_index=(1,1))
fvec = vecs.T[0]
eps = 0.00001
Np = [i for i in g.vertices() if fvec[i] > eps]
Nm = [i for i in g.vertices() if fvec[i] < -eps]
Nz = [i for i in g.vertices() if abs(fvec[i]) < eps]

g.show(vertex_colors={"red": Np, "green": Nm, "white": Nz})
Exercise 3¶

逐行解釋以下程式碼。

Explain the code below line by line.
In [ ]:
### code
import numpy as np
from scipy.linalg import eigh
import matplotlib.pyplot as plt
%matplotlib inline

g = graphs.CycleGraph(10)
n = g.order()
L = np.array(g.laplacian_matrix())
vals,vecs = eigh(L, subset_by_index=(1,2))
x,y = vecs.T

plt.axis("equal")
### plot points
plt.scatter(x, y, s=50, zorder=3)
### add vertex labels
for i in range(n):
    plt.annotate(i, (x[i], y[i]), zorder=4)
### add lines
for i,j in g.edges(labels=False):
    plt.plot([x[i],x[j]], [y[i],y[j]], 'c')
Exercise 4¶

逐行解釋以下程式碼。

Explain the code below line by line.
In [ ]:
### code
import numpy as np
from scipy.linalg import eigh
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
%matplotlib notebook

g = graphs.CubeGraph(3)
g.relabel()
n = g.order()
L = np.array(g.laplacian_matrix())
vals,vecs = eigh(L, subset_by_index=(1,3))
x,y,z = vecs.T

ax = plt.axes(projection='3d')
ax.set_xlim(-1,1)
ax.set_ylim(-1,1)
ax.set_zlim(-1,1)

### plot points
ax.scatter(x, y, z, s=50, zorder=3)
### add vertex labels
for i in range(n):
    ax.text(x[i], y[i], z[i], i, zorder=4)
### add lines
for i,j in g.edges(labels=False):
    ax.plot([x[i],x[j]], [y[i],y[j]], [z[i],z[j]], 'c')
Exercise 5¶

令 $G$ 為一 $n$ 個點的圖、$L = L(G)$。
令 $Y$ 為一 $n\times k$ 矩陣,其各列為 $\bx_1, \ldots, \bx_n\in\mathbb{R}^k$。

Let $G$ be a graph on $n$ vertices and $L = L(G)$. Let $Y$ be an $n\times k$ matrix whose rows are $\bx_1, \ldots, \bx_n\in\mathbb{R}^k$.
Exercise 5(a)¶

說明

$$ \tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2. $$

Show that $$ \tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2. $$
Exercise 5(b)¶

說明 $\bone\trans Y = \bzero$ 表示 $\bx_1, \ldots, \bx_n$ 的中心點在原點。

Explain why $\bone\trans Y = \bzero$ means $\bx_1, \ldots, \bx_n$ are centered at the origin.
Exercise 5(c)¶

說明 $Y\trans Y = I$ 表示 $Y$ 的各特徵向量(行向量)
彼此互相垂直、而且各自的變異數均為 $1$。

Explain why $Y\trans Y = I$ means the feature vectors (columns) of $Y$ are mutually orthogonal and each has variance $1$.
Exercise 5(d)¶

解釋拉普拉斯嵌入法做出來的 $Y$,就是以下最佳化問題的解:

minimize $\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2$,
subject to $\bone\trans Y = \bzero$, $Y\trans Y = I$

Show that the $Y$ from the spectral embedding is the solution to the optimization problem: minimize $\tr(Y\trans LY) = \sum_{\{i,j\}\in E(G)} \|\bx_i - \bx_j\|^2$, subject to $\bone\trans Y = \bzero$, $Y\trans Y = I$
Exercise 6¶

先執行(讀入)筆記最下方的所有函數,並執行以下程式碼。
查找資源解釋什麼是 $k$-means 分群演算法。

Run the code at the end of this note to load the functions, then run the code below. Search online and explain what the $k$-means clustering algorithm is.
In [ ]:
### code
%matplotlib inline
X = make_blobs()
y_new, centers = k_means(X, 3)
plt.scatter(*X.T, c=y_new)
Exercise 7¶

各種分群演算法都可以拿來將圖片的像素分群,整體的效果等同在將圖像分割。
先執行(讀入)筆記最下方的所有函數,並執行以下程式碼。
逐行解釋以下程式碼。

Any clustering algorithm can be used to partition the pixels of an image; the effect is equivalent to image segmentation. Run the code at the end of this note to load the functions, then run the code below. Explain the code line by line.
In [ ]:
### original image
import numpy as np
from PIL import Image

r = 10
img = Image.open('incrediville-side.jpg')
x,y = img.size
img = img.resize((x // r, y // r))

img
In [ ]:
### image segmentation by the k-means clustering algorithm
import matplotlib.pyplot as plt
k = 3

arr = np.array(img, dtype=float) / 256
m,n,r = arr.shape
N = m * n
arr = arr.reshape(N, r)
y_new, centers = k_means(arr, k)

fig,axs = plt.subplots(1, 3, figsize=np.array([3*n,m])/100)
for i in range(k):
    axs[i].axis('off')
    axs[i].imshow((y_new == i).reshape(m,n), cmap='binary')

本次練習所需的函數¶

Functions used in the exercises

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
def make_blobs(N=150, k=3, d=2, seed=None):
    """
    Input:
        N: an integer, number of samples
        k: an integer, number of blobs
        d: an integer, dimension of the space
    Output:
        a dataset X of shape (N, d)
    """
    np.random.seed(seed)
    X = np.random.randn(N,d)
    blob_size = N // k
    centers = np.random.randn(k, d) * 3
    for i in range(k):
        left = blob_size * i
        right = blob_size * (i+1) if i != k-1 else N
        X[left:right] += centers[i]
    return X
In [ ]:
def dist_mtx(X, Y=None):
    """Return the distance matrix between rows of X and rows of Y
    
    Input:  
        X: an array of shape (N,d)
        Y: an array of shape (M,d)
            if None, Y = X
           
    Output:
        the matrix [d_ij] where d_ij is the distance between  
        the i-th row of X and the j-th row of Y
    """
    if isinstance(Y, np.ndarray):
        pass
    elif Y is None:
        Y = X.copy()
    else:
        raise TypeError("Y should be a NumPy array or None") 
    X_col = X[:, np.newaxis, :]
    Y_row = Y[np.newaxis, :, :]
    diff = X_col - Y_row
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    return dist
In [ ]:
def k_means(X, k, init="random"):
    """k-means clustering algorithm
    
    Input:  
        X: an array of shape (N,d)  
            rows for samples and columns for features
        k: number of clusters
        init: "random" or an array of shape (k,d)
            if "random", k points are chosen randomly from X as the initial cluster centers  
            if an array, the array is used as the initial cluster centers
        
    Output:
        (y_new, centers)
        y_new: an array of shape (N,)  
            that records the labels in (0, ..., k-1) of each sample 
        centers: an array of shape (k,d)  
            that records the cluster centers
            
    Example:
        mu = np.array([3,3])
        cov = np.eye(2)
        X = np.vstack([np.random.multivariate_normal(mu, cov, 100), 
                       np.random.multivariate_normal(-mu, cov, 100)])
        y_new,centers = k_means(X, 2)
    """
    N,d = X.shape
    
    ### initialize y and center
    if isinstance(init, np.ndarray):
        centers = init.copy()
    elif init == "random":
        inds = np.random.choice(np.arange(N), k, replace=False)
        centers = X[inds, :]
    else:
        raise TypeError("init can only be a NumPy array or 'random'")

    dist = dist_mtx(X, centers)
    y_new = dist.argmin(axis=1)
    
    while True:        
        ### compute the new centers
        for i in range(k):
            mask = (y_new == i)
            centers[i] = X[mask].mean(axis=0)
        
        ### generate the new y_new
        dist = dist_mtx(X, centers)
        y_last = y_new.copy()
        y_new = dist.argmin(axis=1)
        
        if np.all(y_last == y_new):
            break

    return y_new, centers