0.Conclusion

To conclude, Gram and Kernel matrices are both symmetric matrices, so they are eligible for Eigen-decomposition, i.e., $K = V \Lambda V^\top$, where $\lambda_i$ is an Eigenvalue and $v_i$ is its Eigenvector, $V = [v_1, \dots, v_n]$, and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. They are positive semi-definite matrices, which means $\lambda_i \ge 0$ for all $i$ and $x^\top K x \ge 0$ for any vector $x$. If they are not singular, i.e., $\det(K) \ne 0$, they are positive definite.


1.Gram Matrix and Kernel Matrix

Given a set of vectors $S = \{x_1, \dots, x_n\}$, the Gram matrix is defined as the $n \times n$ matrix $G$ whose elements are $G_{ij} = \langle x_i, x_j \rangle$. The notation $\langle \cdot, \cdot \rangle$ denotes the inner product [1]. If a kernel function $\kappa$ is utilised to evaluate the inner products in a feature space with feature map $\phi$, the Gram matrix has entries:

$G_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = \kappa(x_i, x_j)$.

This matrix is often referred to as the Kernel matrix $K$.

It is obvious that the Kernel matrix is symmetric, since $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = \langle \phi(x_j), \phi(x_i) \rangle = K_{ji}$ by the symmetry of the inner product.
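To make the definitions concrete, here is a minimal NumPy sketch (the toy data and the choice of a Gaussian/RBF kernel are my own, purely illustrative):

    import numpy as np

    # Toy data: n = 4 vectors in R^3 (hypothetical example data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 3))

    # Gram matrix: G_ij = <x_i, x_j>.
    G = X @ X.T

    # Kernel matrix for the Gaussian (RBF) kernel
    # kappa(x, y) = exp(-||x - y||^2 / 2), whose feature map phi is implicit.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / 2)

    assert np.allclose(G, G.T) and np.allclose(K, K.T)   # both symmetric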


2.Symmetric Matrix: Eigenvalues and Eigenvectors

2.1 Orthogonality

Because the Gram/Kernel matrix is symmetric, its Eigenvectors corresponding to different Eigenvalues are orthogonal. This is because, if $\lambda_1 \ne \lambda_2$ are two different Eigenvalues corresponding to Eigenvectors $v_1, v_2$ of a symmetric matrix $A$, we have:

$\lambda_1 v_1^\top v_2 = (A v_1)^\top v_2 = v_1^\top A^\top v_2 = v_1^\top A v_2 = \lambda_2 v_1^\top v_2$,

and therefore $(\lambda_1 - \lambda_2)\, v_1^\top v_2 = 0$, which forces $v_1^\top v_2 = 0$, i.e., the two Eigenvectors are orthogonal.

For an Eigenvalue that appears multiple times, one can always find an orthonormal basis of Eigenvectors for the corresponding Eigenspace, e.g., through Gram-Schmidt. Overall, since the Eigenspaces of different Eigenvalues are mutually orthogonal, the Eigenvectors can always be chosen to form an orthonormal set.
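As a quick numerical illustration (a sketch assuming NumPy; numpy.linalg.eigh is the standard routine for symmetric matrices):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(5, 5))
    A = (A + A.T) / 2                    # symmetrise a random matrix

    lam, V = np.linalg.eigh(A)           # Eigenvalues and Eigenvectors of A

    # The Eigenvectors (columns of V) form an orthonormal set: V^T V = I.
    assert np.allclose(V.T @ V, np.eye(5))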

2.2 Deflation

With this property in mind, we can further explore the deflation of a symmetric matrix $A$. Given an Eigenvalue-Eigenvector pair $(\lambda, v)$ of the matrix $A$, the deflation is the transformation:

$\tilde{A} = A - \lambda v v^\top$.

Note that $v$ is normalised, i.e., $v^\top v = 1$, and therefore:

$\tilde{A} v = A v - \lambda v (v^\top v) = \lambda v - \lambda v = 0 = 0 \cdot v$,

which says the deflation keeps $v$ as an Eigenvector but reduces the corresponding Eigenvalue to 0. Moreover, any other Eigenvector $v'$ orthogonal to $v$ satisfies $\tilde{A} v' = A v'$, so the remaining Eigenpairs are unchanged.
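The following sketch (a toy symmetric matrix of my own) verifies that deflation zeroes out the chosen Eigenvalue and leaves the rest of the spectrum intact:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(4, 4))
    A = (A + A.T) / 2

    lam, V = np.linalg.eigh(A)
    l, v = lam[-1], V[:, -1]             # largest Eigenpair; v is normalised

    A_def = A - l * np.outer(v, v)       # deflation: A - lambda v v^T

    assert np.allclose(A_def @ v, 0)     # v now has Eigenvalue 0
    # The remaining Eigenvalues are unchanged (compare as sorted multisets).
    new_lam = np.linalg.eigvalsh(A_def)
    assert np.allclose(np.sort(new_lam), np.sort(np.append(lam[:-1], 0.0)))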

2.3 Decomposition

By repeatedly finding the Eigenvector corresponding to the largest positive (or smallest negative) Eigenvalue and then deflating, we can always obtain an orthonormal set of Eigenvectors that decomposes the original symmetric matrix. More formally, the Eigen-decomposition of the matrix $A$ is formulated as:

$A = V \Lambda V^\top = \sum_{i=1}^{n} \lambda_i v_i v_i^\top$.

This is derived from the fact that $A V = V \Lambda$, i.e., $A = V \Lambda V^{-1} = V \Lambda V^\top$, where the columns of $V = [v_1, \dots, v_n]$ are the orthonormal Eigenvectors such that $V^\top V = V V^\top = I$, and $\Lambda$ is a diagonal matrix with $\Lambda_{ii} = \lambda_i$. We assume that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$. Also, it follows that:

$V^{-1} = V^\top$, and $V^\top A V = \Lambda$.
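Numerically, the decomposition can be checked as follows (a NumPy sketch; the Eigenpairs are reordered so that $\lambda_1 \ge \dots \ge \lambda_n$ as assumed above):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(4, 4))
    A = (A + A.T) / 2

    lam, V = np.linalg.eigh(A)
    lam, V = lam[::-1], V[:, ::-1]       # sort Eigenvalues in decreasing order

    # A = V Lambda V^T = sum_i lambda_i v_i v_i^T, and V^T A V = Lambda.
    assert np.allclose(A, V @ np.diag(lam) @ V.T)
    assert np.allclose(A, sum(l * np.outer(v, v) for l, v in zip(lam, V.T)))
    assert np.allclose(V.T @ A @ V, np.diag(lam))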


3.Positive Semi-definite Matrix!

A matrix is positive semi-definite if its Eigenvalues are all non-negative. A symmetric matrix is not necessarily positive semi-definite. However, Gram and Kernel matrices are positive semi-definite (and not merely because they are symmetric).

Another way of saying a matrix $A$ is positive semi-definite is that, for any vector $x$, it has: $x^\top A x \ge 0$.

To prove that Gram and Kernel matrices are positive semi-definite, consider the general case of a Kernel matrix:

$K_{ij} = \kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ for $i, j = 1, \dots, n$.

Therefore, for any vector $z$, we have:

$z^\top K z = \sum_{i=1}^{n} \sum_{j=1}^{n} z_i z_j \langle \phi(x_i), \phi(x_j) \rangle = \Big\langle \sum_{i=1}^{n} z_i \phi(x_i), \sum_{j=1}^{n} z_j \phi(x_j) \Big\rangle = \Big\| \sum_{i=1}^{n} z_i \phi(x_i) \Big\|^2 \ge 0$.

We also have the proposition that a matrix $A$ is positive semi-definite if and only if $A = B^\top B$ for some real matrix $B$, i.e., $A_{ij} = \langle b_i, b_j \rangle$, where $b_i$ denotes the $i$-th column of $B$.
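Both characterisations are easy to verify numerically. In the sketch below (toy data of my own choosing), the Gram matrix plays the role of $B^\top B$ with $B = X^\top$:

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(5, 3))          # 5 data points in R^3
    G = X @ X.T                          # Gram matrix, i.e. B^T B with B = X^T

    # All Eigenvalues are non-negative (up to floating-point error).
    assert np.all(np.linalg.eigvalsh(G) >= -1e-10)

    # z^T G z = ||X^T z||^2 >= 0 for any vector z.
    z = rng.normal(size=5)
    assert np.isclose(z @ G @ z, np.linalg.norm(X.T @ z) ** 2)
    assert z @ G @ z >= 0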


4.Determinant and Trace

4.1 About the Determinant

The determinant of a symmetric matrix $A$ is the product of the Eigenvalues of the matrix. If the matrix is positive definite, the determinant is strictly positive. If the matrix is positive semi-definite, the determinant can be zero, namely when the matrix is singular.

It also has:

$\det(A) = \det(V \Lambda V^\top) = \det(V)\,\det(\Lambda)\,\det(V^\top) = \det(\Lambda) = \prod_{i=1}^{n} \lambda_i$.
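A one-line numerical check of this identity (NumPy sketch, same toy setup as earlier):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.normal(size=(4, 4))
    A = (A + A.T) / 2

    # det(A) equals the product of the Eigenvalues.
    assert np.isclose(np.linalg.det(A), np.prod(np.linalg.eigvalsh(A)))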

4.2 About the Trace

For the trace of a symmetric matrix $A$, we have $\mathrm{tr}(A) = \sum_{i=1}^{n} A_{ii} = \sum_{i=1}^{n} \lambda_i$. In other words, for Gram and Kernel matrices, it has:

$\mathrm{tr}(G) = \sum_{i=1}^{n} \langle x_i, x_i \rangle = \sum_{i=1}^{n} \|x_i\|^2 = \sum_{i=1}^{n} \lambda_i$,

$\mathrm{tr}(K) = \sum_{i=1}^{n} \kappa(x_i, x_i) = \sum_{i=1}^{n} \|\phi(x_i)\|^2 = \sum_{i=1}^{n} \lambda_i$.
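And the corresponding check for the trace (same toy Gram matrix setup as before):

    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.normal(size=(4, 3))
    G = X @ X.T

    # tr(G) = sum of Eigenvalues = sum of squared norms of the data points.
    assert np.isclose(np.trace(G), np.sum(np.linalg.eigvalsh(G)))
    assert np.isclose(np.trace(G), np.sum(np.linalg.norm(X, axis=1) ** 2))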


5.Singular Matrix?

If a matrix $A$ is singular, it means that there is a non-trivial solution $x \ne 0$ for the equation:

$A x = 0$.

And this shows equivalently that:

  • If a matrix is singular, its column vectors are linearly dependent and span a space of dimension less than the number of columns;
  • If a matrix is singular, it has at least one Eigenvalue equal to 0.

On the other hand:

  • If a matrix is non-singular, its column vectors are linearly independent and span a space of dimension equal to the number of columns;
  • If a matrix is non-singular, due to the fact that its columns span a space of dimension equal to the number of columns, e.g., for an $n \times n$ matrix $A$ it has $\mathrm{rank}(A) = n$ and then $\det(A) \ne 0$, we are able to find the multiplicative inverse of the matrix, i.e., $A^{-1}$ such that $A A^{-1} = A^{-1} A = I$.

Note that we cannot conclude whether the Gram and Kernel matrices are singular or not from their definitions alone. Consider a dataset that contains two identical data points: the Gram/Kernel matrix will then have two identical columns, so its columns are linearly dependent and the matrix is singular. Therefore, singularity is a data-dependent, application-specific property.
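The duplicate-data-point case is easy to reproduce (a NumPy sketch with made-up data):

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.normal(size=3)
    X = np.stack([x, x, rng.normal(size=3)])   # two identical data points

    G = X @ X.T
    # Two identical rows/columns: G is singular.
    assert np.linalg.matrix_rank(G) < G.shape[0]
    assert np.isclose(np.linalg.det(G), 0)
    assert np.isclose(np.linalg.eigvalsh(G)[0], 0)   # a zero Eigenvalue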


[1] John Shawe-Taylor, Nello Cristianini, Kernel methods for pattern analysis, Cambridge University Press, 2004.