612-Covariance-matrix

$\newcommand{\trans}{^\top} \newcommand{\adj}{^{\rm adj}} \newcommand{\cof}{^{\rm cof}} \newcommand{\inp}[2]{\left\langle#1,#2\right\rangle} \newcommand{\dunion}{\mathbin{\dot\cup}} \newcommand{\bzero}{\mathbf{0}} \newcommand{\bone}{\mathbf{1}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\tr}{\operatorname{tr}} \newcommand{\nul}{\operatorname{null}} \newcommand{\rank}{\operatorname{rank}} %\newcommand{\ker}{\operatorname{ker}} \newcommand{\range}{\operatorname{range}} \newcommand{\Col}{\operatorname{Col}} \newcommand{\Row}{\operatorname{Row}} \newcommand{\spec}{\operatorname{spec}} \newcommand{\vspan}{\operatorname{span}} \newcommand{\Vol}{\operatorname{Vol}} \newcommand{\sgn}{\operatorname{sgn}} \newcommand{\idmap}{\operatorname{id}} \newcommand{\am}{\operatorname{am}} \newcommand{\gm}{\operatorname{gm}} \newcommand{\mult}{\operatorname{mult}} \newcommand{\iner}{\operatorname{iner}}$

In [ ]:

from lingeo import random_int_list

Let $\bx = (x_1, \ldots, x_N)$ be a collection of $N$ numbers.
The mean of $\bx$ is

$$ \mu = \frac{1}{N}(x_1 + \cdots + x_N), $$

which can also be computed by $\frac{1}{N}\inp{\bx}{\bone}$.

The variance of $\bx$ is

$$ \sigma^2 = \frac{1}{N-1}\left[(x_1 - \mu)^2 + \cdots + (x_N - \mu)^2\right], $$

which can also be computed by $\frac{1}{N-1}\inp{\bx -\mu\bone}{\bx - \mu\bone}$.

That is, one may shift the data and replace $\bx$ with $\bx - \mu\bone$.
The shifted data is centered at the origin (its mean is $0$),
so its variance is simply $\frac{1}{N-1}\inp{\bx}{\bx}$.
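The formulas above can be checked numerically. The following is a minimal sketch using NumPy; the data values are made up for illustration, and the variable names are assumptions rather than part of the original notes.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([2.0, 4.0, 4.0, 5.0, 10.0])
N = len(x)
one = np.ones(N)

# Mean as an inner product with the all-ones vector: (1/N) <x, 1>.
mu = np.dot(x, one) / N

# Shift the data so that it is centered at the origin.
x_centered = x - mu * one

# Variance as an inner product of the centered data with itself.
var = np.dot(x_centered, x_centered) / (N - 1)

print("mean =", mu)
print("variance =", var)
```

The result should agree with `np.var(x, ddof=1)`, where `ddof=1` selects the $N-1$ denominator used here.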

Let $\bx = (x_1, \ldots, x_N)$ and $\by = (y_1, \ldots, y_N)$ be two collections of numbers.
Let $\mu_\bx$ and $\mu_\by$ be the means of $\bx$ and $\by$, respectively.

The covariance of $\bx$ and $\by$ is

$$ \frac{1}{N-1}[(x_1 - \mu_\bx)(y_1 - \mu_\by) + \cdots + (x_N - \mu_\bx)(y_N - \mu_\by)]. $$

Similarly, the covariance can be computed by $\frac{1}{N-1}\inp{\bx - \mu_\bx\bone}{\by - \mu_\by\bone}$,
and the covariance of $\bx$ with itself is the variance of $\bx$.
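As a small illustration of the inner-product formula for the covariance, here is a sketch in NumPy. The two data vectors are hypothetical and serve only as an example.

```python
import numpy as np

# Hypothetical paired data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
N = len(x)
one = np.ones(N)

# Means of x and y.
mu_x = np.dot(x, one) / N
mu_y = np.dot(y, one) / N

# Covariance as an inner product of the two centered vectors.
cov_xy = np.dot(x - mu_x * one, y - mu_y * one) / (N - 1)

# The covariance of x with itself is the variance of x.
cov_xx = np.dot(x - mu_x * one, x - mu_x * one) / (N - 1)

print("cov(x, y) =", cov_xy)
print("cov(x, x) =", cov_xx)
```

For comparison, `np.cov(x, y)` returns the $2\times 2$ matrix whose off-diagonal entry is the same covariance.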

As mentioned in 606, data is often stored in a matrix such that each row represents a sample while each column represents a feature.
When $X$ is such a matrix of dimension $N\times d$, the rows are called the sample vectors while the columns are called the feature vectors.

Each feature vector can be centered at $0$ in two steps.

Let $\mu\trans = \frac{1}{N}\bone\trans X$ be the vector that records the mean of each feature vector.
Replace $X$ with $X - \bone\mu\trans = X - \frac{1}{N}JX = (I - \frac{1}{N}J)X$, where $J$ is the $N\times N$ all-ones matrix.

Once each feature is centered at $0$, the matrix $C = \frac{1}{N-1}X\trans X$ is called the covariance matrix, whose $i,j$-entry is the covariance between the $i$-th and the $j$-th feature.
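The centering step and the covariance matrix can be sketched in NumPy as follows. The data matrix is hypothetical, and the names `centering` and `X_centered` are assumptions introduced for this example.

```python
import numpy as np

# Hypothetical N x d data matrix: rows are samples, columns are features.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
N, d = X.shape

# Centering matrix I - (1/N) J, where J is the N x N all-ones matrix.
J = np.ones((N, N))
centering = np.eye(N) - J / N

# Each column of the centered matrix has mean 0.
X_centered = centering @ X

# Covariance matrix of the features, of size d x d.
C = X_centered.T @ X_centered / (N - 1)

print(C)
```

The output should match `np.cov(X, rowvar=False)`, where `rowvar=False` tells NumPy that the columns, not the rows, are the features. For large $N$ it is cheaper to subtract the column means directly, for example `X - X.mean(axis=0)`, than to form the $N\times N$ centering matrix.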

Side stories¶

`np.random.multivariate_normal`

Exercise 1¶

Run the code below.

In [ ]:

### code
set_random_seed(0)
print_ans = False  # set to True to print the answers

n = 5
# Resample until both column sums are divisible by n, so both means are integers.
while True:
    X = matrix(n, random_int_list(2*n))  # n x 2 data matrix
    if sum(X.transpose()[0]) % n == 0 and sum(X.transpose()[1]) % n == 0:
        break

pretty_print(LatexExpr("X ="), X)

if print_ans:
    # Row vector of column means: (1/n) * 1^T X
    mu = (1/n) * ones_matrix(1,n) * X
    print("means of x and y:", mu)
    # Center each column: X - (1/n) J X
    X_shifted = X - (1/n) * ones_matrix(n,n) * X
    print("covariance matrix =")
    pretty_print(LatexExpr(r"\frac{1}{%s}"%(n-1)), X_shifted.transpose() * X_shifted)

Exercise 1(a)¶

Let $\bx$ and $\by$ be the two columns of $X$. Find the mean and the variance of $\bx$, the mean and the variance of $\by$, and the covariance of $\bx$ and $\by$.

Exercise 1(b)¶

Regard the columns of $X$ as features. Find their covariance matrix.

Exercise 2¶

Show that the covariance matrix of any data matrix is positive semidefinite.

Exercise 3¶

Let $\bx$ and $\by$ be two features with mean $0$. Let $X$ be the data matrix whose columns are $\bx$ and $\by$. Given its covariance matrix $C = \frac{1}{N-1}X\trans X$, use $C$ to find the variance of $\bx + \by$.

Exercise 4¶

Run the code below. Each figure shows a set of $2$-dimensional data points. In each figure, the $x$-coordinates of the points form a feature vector $\bx$, and the $y$-coordinates of the points form a feature vector $\by$.

In [ ]:

### code
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12,4))
axs = fig.subplots(1, 3)
mu  = np.array([0,0])  # both features have mean 0
# one covariance matrix for each of the three figures
covs = [np.array([[5,0],[0,1]]), 
        np.array([[1,0.9], [0.9,1]]), 
        np.array([[1,-0.9], [-0.9,1]])]
for i in range(3):
    axs[i].set_xlim(-5,5)
    axs[i].set_ylim(-5,5)
    # draw 100 samples; data has shape (100, 2)
    data = np.random.multivariate_normal(mu, covs[i], (100,))
    # data.T has rows x and y
    axs[i].scatter(*data.T)

Exercise 4(a)¶

Observe the figures and determine which dataset has the largest variance of $\bx$.

Exercise 4(b)¶

For each figure, determine whether the covariance of $\bx$ and $\by$ is positive, negative, or zero.

Exercise 5¶

Search online and explain the usage of `np.random.multivariate_normal`.

Exercise 6¶

Let $\bx$ and $\by$ be two feature vectors of length $N$ with means $\mu_\bx$ and $\mu_\by$, respectively. The following exercises give alternative formulas for the covariance and the variance.

Exercise 6(a)¶

Show that the covariance of $\bx$ and $\by$ can also be written as

$$ \frac{1}{N-1}(\inp{\bx}{\by} - N\mu_\bx\mu_\by). $$

Exercise 6(b)¶

Show that the variance of $\bx$ can also be written as

$$ \frac{1}{N-1}(\|\bx\|^2 - N\mu_\bx^2). $$
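Before attempting the proofs, it may help to check the two alternative formulas numerically. The sketch below is only a sanity check on randomly generated data, not a proof; the seed, the sample size, and the variable names are assumptions made for this example.

```python
import numpy as np

# Hypothetical random data with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
N = 50
x = rng.normal(size=N)
y = rng.normal(size=N)

mu_x = x.mean()
mu_y = y.mean()

# Definitions: inner products of the centered vectors.
cov_def = np.dot(x - mu_x, y - mu_y) / (N - 1)
var_def = np.dot(x - mu_x, x - mu_x) / (N - 1)

# Alternative formulas: no centering of the vectors is needed.
cov_alt = (np.dot(x, y) - N * mu_x * mu_y) / (N - 1)
var_alt = (np.dot(x, x) - N * mu_x**2) / (N - 1)

print(np.isclose(cov_def, cov_alt), np.isclose(var_def, var_alt))
```

Both comparisons are expected to hold up to floating-point rounding.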
