605-Singular-value-decomposition

$\newcommand{\trans}{^\top} \newcommand{\adj}{^{\rm adj}} \newcommand{\cof}{^{\rm cof}} \newcommand{\inp}[2]{\left\langle#1,#2\right\rangle} \newcommand{\dunion}{\mathbin{\dot\cup}} \newcommand{\bzero}{\mathbf{0}} \newcommand{\bone}{\mathbf{1}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\tr}{\operatorname{tr}} \newcommand{\nul}{\operatorname{null}} \newcommand{\rank}{\operatorname{rank}} %\newcommand{\ker}{\operatorname{ker}} \newcommand{\range}{\operatorname{range}} \newcommand{\Col}{\operatorname{Col}} \newcommand{\Row}{\operatorname{Row}} \newcommand{\spec}{\operatorname{spec}} \newcommand{\vspan}{\operatorname{span}} \newcommand{\Vol}{\operatorname{Vol}} \newcommand{\sgn}{\operatorname{sgn}} \newcommand{\idmap}{\operatorname{id}} \newcommand{\am}{\operatorname{am}} \newcommand{\gm}{\operatorname{gm}} \newcommand{\mult}{\operatorname{mult}} \newcommand{\iner}{\operatorname{iner}}$

In [ ]:

from lingeo import random_int_list

Continuing the introduction of the singular value decomposition in 314, this section will provide the theoretical foundation of it.

Singular value decomposition¶

Let $A$ be an $m\times n$ matrix.
Then there are orthonormal bases $\alpha$ and $\beta$ of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, such that

$$ [f_A]_\alpha^\beta = \Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}, $$

where $\sigma_1\geq\cdots\geq\sigma_r$ and $\operatorname{diag}(\sigma_1,\ldots,\sigma_r)$ is the diagonal matrix with the given diagonal entries.

That is, there are $m\times m$ and $n\times n$ orthogonal matrices $U$ and $V$ such that $U^\top AV = \Sigma$ and $A = U\Sigma V^\top$.

Recall that $AB$ and $BA$ have the same set of nonzero eigenvalues.
(See 506-6.)
The values $\sigma_1 \geq \cdots \geq \sigma_r$ are called the singular values of $A$.

They are the (positive) square roots of the nonzero eigenvalues of $A\trans A$.
They are the (positive) square roots of the nonzero eigenvalues of $AA\trans$.
They are positive.
There are $r = \rank(A)$ of them.

Indeed, the columns of $V$ form an orthonormal eigenbasis of $A\trans A$,
while the columns of $U$ form an orthonormal eigenbasis of $AA\trans$.

The singular value decomposition of an $m\times n$ matirx can be found by the following steps:

Compute an orthonormal eigenbasis $\alpha$ of $A\trans A$.
Order the eigenvectors in $\alpha$ by the corresponding eigenvalues in the non-increasing order.
Let $\alpha_1$ and $\alpha_2$ be the sets of eigenvectors in $\alpha$ that correspond to positive and zero eigenvalues, respectively.
Let $\lambda_1 \geq \cdots \geq \lambda_r$ be the positive eigenvalues and $\sigma_i = \sqrt{\lambda_i}$ for $i = 1,\ldots, r$.
Let $\beta_1 = \{\frac{1}{\sigma_i}A\bv: \bv \in \alpha_1\}$.
Let $\beta_0$ be an orthonormal basis of $\ker(AA\trans)$.
Let $\beta = \beta_1 \cup \beta_0$.

Thus, the desired eigenbasis are found.

For the construction of the matrices.

Construct $V$ by using $\alpha$ as the columns vectors.
Construct $U$ by using $\beta$ as the columns vectors.
The singular values are the (positive) square roots of nonzero eigenvalues of $A\trans A$ (or $AA\trans$).

Side stories¶

$AB$ and $BA$ have the same nonzero eigenvalues
image compression
Moore–Penrose pseudo inverse (TBD)

Exercise 1¶

執行以下程式碼。
令 $\alpha = \{\bv_1, \ldots, \bv_n\}$ 為 $\mathbb{R}^n$ 中的垂直標準基、
$\beta = \{\bu_1, \ldots, \bu_m\}$ 為 $\mathbb{R}^m$ 中的垂直標準基。
已知一線性函數 $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ 具有 $[f]_\alpha^\beta = \Sigma$ 的性質。

Run the code below. Let $\alpha = \{\bv_1, \ldots, \bv_n\}$ be an orthonormal basis of $\mathbb{R}^n$, $\beta = \{\bu_1, \ldots, \bu_m\}$ an orthonormal basis of $\mathbb{R}^m$. It is known that $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is a linear function with $[f]_\alpha^\beta = \Sigma$.

In [ ]:

### code
set_random_seed(0)
print_ans = False

r = choice([1,2,3])
while True:
    sigs = list(map(lambda k: abs(k), random_int_list(r)))
    sigs.sort(reverse=True)
    if sigs[-1] > 0:
        break

m,n = 3,4
Sigma = zero_matrix(m,n)
for i in range(r):
    Sigma[i,i] = sigs[i]

cs = random_int_list(n)
    
print("m,n = %s,%s"%(m, n))
pretty_print(LatexExpr(r"\Sigma ="), Sigma)
print("c_1, ..., c_n =", cs)

if print_ans:
    rs = [Sigma[i,i] for i in range(m)]
    cvs = " + ".join(r"(%s)\mathbf{v}_{%s}"%(cs[i], i + 1) for i in range(n))
    fcvs = " + ".join(r"(%s)\mathbf{u}_{%s}"%(rs[i] * cs[i], i + 1) for i in range(m))
    pretty_print(LatexExpr(cvs + "=" + fcvs))
    print("kernel of f is the span of v%s ~ vn"%(r+1))
    print("range of f is the span of u1 ~ u%s"%(r))

Exercise 1(a)¶

將 $f(c_1\bv_1 + \cdots + c_n\bv_n)$ 寫成 $\beta$ 的線性組合。

Write $f(c_1\bv_1 + \cdots + c_n\bv_n)$ as a linear combination of $\beta$.

Exercise 1(b)¶

用 $\alpha$ 中的向量來描述 $\ker(f)$。

Use vectors in $\alpha$ to describe $\ker(f)$.

Exercise 1(c)¶

用 $\beta$ 中的向量來描述 $\range(f)$。

Use vectors in $\beta$ to describe $\range(f)$.

Exercise 2¶

求以下矩陣的奇異值分解。

For each of the following matrices, find its singular value decomposition.

Exercise 2(a)¶

$$ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}. $$

Exercise 2(b)¶

$$ A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}. $$

Exercise 3¶

依照以下步驟求出

$$ A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{bmatrix} $$

的奇異值分解。

Use the given instructions to find the singular value decomposition of $$ A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{bmatrix}. $$

Exercise 3(a)¶

求出 $A\trans A$ 的所有特徵值 $\lambda_1 \geq \cdots \geq \lambda_4$ 及其對應的一組垂直單位長的特徵基底 $\alpha = \{\bv_1,\ldots, \bv_4\}$。

Find the eigenvalues $\lambda_1 \geq \cdots \geq \lambda_4$ of $A\trans A$ and the corresponding orthonormal eigenbasis $\alpha = \{\bv_1,\ldots, \bv_4\}$.

Exercise 3(b)¶

求出 $AA\trans$ 的所有特徵值 $\mu_1 \geq \mu_2$ 及其對應的一組垂直單位長的特徵基底 $\beta = \{\bu_1, \bu_2\}$。

Find the eigenvalues $\mu_1 \geq \mu_2$ of $AA\trans$ and the corresponding orthonormal eigenbasis $\beta = \{\bu_1, \bu_2\}$.

Exercise 3(c)¶

令 $\sigma_1 = \sqrt{\lambda_1}$、$\sigma_2 = \sqrt{\lambda_2}$。
判斷以下敘述是否正確：

$\lambda_1 = \mu_1$ 且 $\lambda_2 = \mu_2$。
$A\bv_1 = \sigma\bu_1$ 且 $A\bv_2 = \sigma_2\bu_2$。
$A\bv_1$ 是 $AA\trans$ 的特徵向量且特徵值為 $\lambda_1$。
$A\bv_2$ 是 $AA\trans$ 的特徵向量且特徵值為 $\lambda_2$。
$A\bv_1$ 的長度為 $\sigma_1$、$A\bv_2$ 的長度為 $\sigma_2$。

Let $\sigma_1 = \sqrt{\lambda_1}$、$\sigma_2 = \sqrt{\lambda_2}$. Determine if the following statements are true or false: 1. $\lambda_1 = \mu_1$ and $\lambda_2 = \mu_2$. 2. $A\bv_1 = \sigma\bu_1$ and $A\bv_2 = \sigma_2\bu_2$. 3. $A\bv_1$ is an eigenvector of $AA\trans$ with respect to the eigenvalue $\lambda_1$, while $A\bv_2$ is an eigenvector of $AA\trans$ with respect to the eigenvalue $\lambda_2$. 3. The length of $A\bv_1$ is $\sigma_1$, and the length of $A\bv_2$ is $\sigma_2$.

Exercise 3(d)¶

將 $\beta$ 做適當的修正使得 $\alpha$、$\beta$ 可以將 $A$ 奇異值分解。

Modify your $\beta$ so that $\alpha$ and $\beta$ lead to the singular decomposition of $A$.

Exercise 4¶

在 506-6 我們用連續性論證來說明 $AB$ 和 $BA$ 有相同的非零特徵值集合。
這裡提供另一種方式來證明。

In 506-6, we used the continuity argument to show that $AB$ and $BA$ have the same set of nonzero eigenvalues. Here we introduce an alternative proof.

Exercise 4(a)¶

令

$$ X = \begin{bmatrix} AB & A \\ O & O \end{bmatrix}, \quad Y = \begin{bmatrix} O & A \\ O & BA \end{bmatrix}. $$

參考 408 寫出「把 $X$ 的上區塊左乘 $B$ 加到下區塊」的區塊基本矩陣 $E$。
並驗證 $EXE^{-1} = Y$。

Let $$ X = \begin{bmatrix} AB & A \\ O & O \end{bmatrix}, \quad Y = \begin{bmatrix} O & A \\ O & BA \end{bmatrix}. $$ See 408 and write down the block elementary matrix $E$ for the operation of "adding $B$ times the upper block row to the lower block row" in $X$. Then verify that $EXE^{-1} = Y$.

Exercise 4(b)¶

證明 $AB$ 和 $BA$ 有相同的非零特徵值集合。

Prove that $AB$ and $BA$ have the same set of nonzero eigenvalues.

Exercise 5¶

執行以下程式碼，並用文字解釋輸出的各項是什麼。

Run the code below. Then explain the output in your own words.

In [ ]:

### code
import numpy as np

arr = np.array([
    [1,1,1],
    [1,1,1]
])
np.linalg.svd(arr)

Exercise 6¶

以下小題解釋如何利用奇異值分解進行影像壓縮（去除雜訊）。

The following exercises demonstrate an image compression (or noise removal) technique through the singular value decomposition.

Exercise 6(a)¶

執行以下程式碼。選用不同的 k 來看看結果有什麼差異。
用文字敘述 k 對結果的影響、
並選一個 k 是你能接受的最小值。

Run the code below. Set `k` to be different values and observe the output. Explain the effect of `k` to the output in your own words. Pick the smallest `k` so that you feel the image is still clear.

In [ ]:

### original image
import numpy as np
from PIL import Image

r = 10
img = Image.open('incrediville-side.jpg')
x,y = img.size
img = img.resize((x // r, y // r)).convert("L")

img

In [ ]:

### compressed image
k = 10

arr = np.array(img)
u,s,vh = np.linalg.svd(arr)
arr_compressed = (u[:,:k] * s[:k]).dot(vh[:k,:])

img_compressed = Image.fromarray(arr_compressed.astype('uint8'), 'L')
img_compressed

Exercise 6(b)¶

令 $A$ 為一 $m\times n$ 矩陣、且 $\rank(A) = r$。
已知 $A = U\Sigma V\trans$ 為其奇異值分解。
令 $\bu_1,\ldots,\bu_{m}$ 為 $U$ 的各行向量、
而 $\bv_1, \ldots, \bv_{n}$ 為 $V$ 的各行向量。

說明 $A$ 可寫成

$$ A = \sum_{i=1}^{r} \sigma_i \bu_i \bv_i\trans. $$

因此對任意 $k \leq r$ 而言，$A' = \sum_{i=1}^{k} \sigma_i \bu_i \bv_i\trans$ 都可視為 $A$ 的一個逼近。

Let $A$ be an $m\times n$ matrix and $\rank(A) = r$. It is known that $A = U\Sigma V\trans$ is the singular value decomposition. Let $\bu_1,\ldots,\bu_{m}$ be the columns of $U$ and $\bv_1, \ldots, \bv_{n}$ the columns of $V$. Show that $A$ can be written as $$ A = \sum_{i=1}^{r} \sigma_i \bu_i \bv_i\trans. $$ Therefore, $A' = \sum_{i=1}^{k} \sigma_i \bu_i \bv_i\trans$ can be viewed as an approximation of $A$ for any $k \leq r$.

Exercise 6(c)¶

在矩陣世界裡，我們定義長度平方為 $\|X\|^2 = \tr(X\trans X)$。
因此也可以計算兩矩陣 $X$ 和 $Y$ 的距離為 $\|X - Y\|$。

驗證對每一個 $i$ 都有 $\|\bu_i\bv_i\trans\| = 1$，
並說明 $\|A - A'\|^2 = \sum_{i = k+1}^r \sigma_i^2$。

（因此我們刪除小的奇異值是合理的，並不會改變 $A$ 太多。）

For matrices , we may define the norm by $\|X\|^2 = \tr(X\trans X)$. Consequently, this defines the distance between two matrices $X$ and $Y$ by $\|X - Y\|$. Verify that $\|\bu_i\bv_i\trans\| = 1$ for each $i$. Then show that $\|A - A'\|^2 = \sum_{i = k+1}^r \sigma_i^2$. (Therefore, it is fine to remove some small singular values, as it does not change $A$ too much.)

Exercise 6(d)¶

令 x 為電腦儲存一個浮點數所需的容量。
說明一張 $m\times n$ 畫素的灰階圖片大約佔 $mn$ x 的容量、
而給定 $k$ 經過壓縮後的圖片大約佔 $mk + nk + k$ x 的容量。

Let `x` be the unit representing the space required for a computer to store a float number. Explain that a grayscale image with $m\times n$ pixels occupies a space of about $mn$ `x` , while the compressed image by a given $k$ requires a space of about $mk + nk + k$ `x` to store it.

Exercise 7¶

以下練習討論一個矩陣 $A$ 的摩爾–彭若斯廣義反矩陣（Moore–Penrose pseudoinverse） 。
記作 $A^\dagger$。

The following exercises studies the **Moore–Penrose pseudoinverse** of a matrix $A$, which is denoted by $A^\dagger$.

Exercise 7(a)¶

若 $\sigma_1, \ldots, \sigma_r$ 為非零實數，且

$$ \Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}, $$

則

$$ \Sigma^\dagger = \begin{bmatrix} \operatorname{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}) & O_{r,m-r} \\ O_{n-r,r} & O_{n-r,m-r} \end{bmatrix}. $$

（注意零矩陣的大小。）

求

$$ \Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} $$

的 $\Sigma^\dagger$。

Let $\sigma_1, \ldots, \sigma_r$ be nonzero real numbers and $$ \Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}. $$ Then $$ \Sigma^\dagger = \begin{bmatrix} \operatorname{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}) & O_{r,m-r} \\ O_{n-r,r} & O_{n-r,m-r} \end{bmatrix}. $$ (Be aware of the sizes of the matrics.) For $$ \Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, $$ find $\Sigma^\dagger$.

Exercise 7(b)¶

若 $A$ 的奇異值分解為 $U\Sigma V\trans$，
則 $A^\dagger = V\Sigma^\dagger U\trans$。

求

$$ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ \end{bmatrix} $$

的 $A^\dagger$。

If the singular value decomposition of $A$ is $U\Sigma V\trans$, then $A^\dagger = V\Sigma^\dagger U\trans$. For $$ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ \end{bmatrix}, $$ find $A^\dagger$.

Exercise 8¶

令 $A$ 為一 $m\times n$ 矩陣。
以下練習建立奇異值分解的理論基礎。

Let $A$ be an $m\times n$ matrix. The following exercises provide theoretical details about the singular value decomposition.

Exercise 8(a)¶

令 $\bv$ 為 $A\trans A$ 的特徵向量且 $A\trans A\bv = \lambda\bv$。
證明 $A\bv$ 滿足 $AA\trans (A\bv) = \lambda(A\bv)$。

Let $\bv$ be an eigenvector of $A\trans A$ and $A\trans A\bv = \lambda\bv$. Show that $A\bv$ has the property that $AA\trans (A\bv) = \lambda(A\bv)$.

Exercise 8(b)¶

令 $\bv$ 為 $A\trans A$ 的特徵向量且 $A\trans A\bv = \lambda\bv$。
證明對任意向量 $\bu\in\mathbb{R}^n$，都有 $\inp{A\bv}{A\bu} = \lambda\inp{\bv}{\bu}$。

Let $\bv$ be an eigenvector of $A\trans A$ and $A\trans A\bv = \lambda\bv$. Show that $\inp{A\bv}{A\bu} = \lambda\inp{\bv}{\bu}$ for any $\bu\in\mathbb{R}^n$.

Exercise 8(c)¶

藉由以上性質證明：

$\lambda \geq 0$。
$A\bv$ 是 $AA\trans$ 的特徵向量，對應的特徵值為 $\lambda$，且長度為 $\sqrt{\lambda}$。
若 $\alpha_1$ 為 $\Row(A\trans A)$ 的一組垂直標準特徵基底，
則 $\beta_1 = \{\frac{1}{\sigma_i}A\bv: \bv \in \alpha_1\}$ 為 $\Row(AA\trans)$ 的一組垂直標準特徵基底。

Based on the previous exercises, prove the following properties: 1. $\lambda \geq 0$. 2. $A\bv$ is an eigenvector of $AA\trans$ with respect to $\lambda$ of length $\sqrt{\lambda}$. 3. If $\alpha_1$ is an orthonormal basis of $\Row(A\trans A)$, then $\beta_1 = \{\frac{1}{\sigma_i}A\bv: \bv \in \alpha_1\}$ is an orthonormal basis of $\Row(AA\trans)$.

奇異值分解¶

Main idea¶

Singular value decomposition¶

Side stories¶

Experiments¶

Exercise 1¶

Exercise 1(a)¶

Exercise 1(b)¶

Exercise 1(c)¶

Exercises¶

Exercise 2¶

Exercise 2(a)¶

Exercise 2(b)¶

Exercise 3¶

Exercise 3(a)¶

Exercise 3(b)¶

Exercise 3(c)¶

Exercise 3(d)¶

Exercise 4¶

Exercise 4(a)¶

Exercise 4(b)¶

Exercise 5¶

Exercise 6¶

Exercise 6(a)¶

Exercise 6(b)¶

Exercise 6(c)¶

Exercise 6(d)¶

Exercise 7¶

Exercise 7(a)¶

Exercise 7(b)¶

Exercise 8¶

Exercise 8(a)¶

Exercise 8(b)¶

Exercise 8(c)¶