21. Matrix Diagonalisation

21.1. Motivation

Recall that a diagonal matrix is a square matrix with zeros everywhere except the main diagonal. Multiplying a vector by a diagonal matrix D is easy: just multiply each entry of the vector by the element in the corresponding position of the diagonal:

\begin{split}\begin{pmatrix}d_1&&\\&\ddots&\\&&d_n\end{pmatrix}\begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}=\begin{pmatrix}d_1x_1\\\vdots\\d_nx_n\end{pmatrix}.\end{split}

In fact, for diagonal matrices we immediately have the eigenvalues and eigenvectors! The eigenvalues are the diagonal entries d_i and the eigenvectors are the standard coordinate vectors e_i:

De_i = d_ie_i.

Multiplying by a diagonal matrix is easy because any vector is a linear sum of coordinate vectors. The diagonal entries of D act separately on each of the components of the vector x.

Dx = D(x_1e_1 + \cdots + x_ne_n) = x_1De_1 + \cdots + x_nDe_n = (x_1d_1)e_1 + \cdots + (x_nd_n)e_n.
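This is easy to check numerically. Below is a minimal NumPy sketch (the diagonal entries and the vector are illustrative values): the matrix product Dx agrees with elementwise scaling.

```python
import numpy as np

d = np.array([2.0, 3.0, 5.0])   # diagonal entries d_1, ..., d_n (illustrative)
x = np.array([1.0, 4.0, 7.0])   # an arbitrary vector x

D = np.diag(d)                  # build the diagonal matrix D

# The matrix product Dx equals the elementwise product (d_1 x_1, ..., d_n x_n)
print(D @ x)   # [ 2. 12. 35.]
print(d * x)   # [ 2. 12. 35.]
```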

What about general (non-diagonal) square matrices? If x is an eigenvector of a square matrix A and λ its eigenvalue, then we have

Ax = \lambda x.

So on its eigenvectors, A behaves just like a diagonal matrix. Instead of writing x as a sum of coordinate vectors e_i, we write it as a sum of eigenvectors v_i:

x = a_1v_1 + \cdots + a_nv_n,

Then

Ax = a_1\lambda_1v_1 + \cdots + a_n\lambda_nv_n.

We just need to find the coefficients a_i. It turns out that these are obtained by multiplying x by the inverse of the matrix of eigenvectors, as we will see in the next section.

21.2. Diagonalisation A = X\Lambda X^{-1}

Matrix Diagonalisation

Let A be an (n×n) matrix with n eigenvectors vi and corresponding eigenvalues λi. Let

\begin{split}X = \begin{pmatrix}|&&|\\v_1&\cdots&v_n\\|&&|\end{pmatrix}\end{split}

be the matrix of eigenvectors and

\begin{split}\Lambda = \begin{pmatrix}\lambda_1&&\\&\ddots&\\&&\lambda_n\end{pmatrix}\end{split}

the diagonal matrix of eigenvalues. If X is invertible, then

X^{-1}AX = \Lambda

and

A = X\Lambda X^{-1}.

Note that we use capital lambda Λ to represent the matrix of eigenvalues.

To see why this result holds, note that X^{-1}AX = \Lambda and A = X\Lambda X^{-1} are both exactly equivalent to:

AX = X\Lambda.

The left hand side AX is A applied to each of the eigenvectors:

\begin{split}AX = A\begin{pmatrix}|&&|\\v_1&\cdots&v_n\\|&&|\end{pmatrix} = \begin{pmatrix}|&&|\\\lambda_1v_1&\cdots&\lambda_nv_n\\|&&|\end{pmatrix}\end{split}

because Av_i = \lambda_iv_i.

The right hand side X\Lambda is the eigenvectors times the eigenvalues:

\begin{split}X\Lambda = \begin{pmatrix}|&&|\\v_1&\cdots&v_n\\|&&|\end{pmatrix}\begin{pmatrix}\lambda_1&&\\&\ddots&\\&&\lambda_n\end{pmatrix} = \begin{pmatrix}|&&|\\\lambda_1v_1&\cdots&\lambda_nv_n\\|&&|\end{pmatrix}.\end{split}

Thus we see that AX=XΛ.

Example

Diagonalise the (2×2) matrix

\begin{split}A = \begin{pmatrix}0&-1\\-1&0\end{pmatrix}.\end{split}

Solution

This matrix represents a reflection in the line y=-x, so we can immediately write down two eigenvalues and corresponding eigenvectors:

\begin{split}\lambda_1=1,\ v_1=\begin{pmatrix}1\\-1\end{pmatrix}, \qquad \lambda_2=-1,\ v_2=\begin{pmatrix}1\\1\end{pmatrix}.\end{split}

Therefore we can write the matrix of eigenvectors X and matrix of eigenvalues Λ:

\begin{split}X = \begin{pmatrix}1&1\\-1&1\end{pmatrix}, \quad \Lambda = \begin{pmatrix}1&0\\0&-1\end{pmatrix}.\end{split}

To complete the diagonalisation we need to calculate X1:

\begin{split}X^{-1} = \frac{1}{\det X}\begin{pmatrix}1&-1\\1&1\end{pmatrix} = \frac{1}{2}\begin{pmatrix}1&-1\\1&1\end{pmatrix} = \begin{pmatrix}\frac{1}{2}&-\frac{1}{2}\\\frac{1}{2}&\frac{1}{2}\end{pmatrix}.\end{split}

Then

(21.1)\begin{split}A = X\Lambda X^{-1} = \begin{pmatrix}1 & 1\\-1&1\end{pmatrix}\begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}\frac{1}{2}&-\frac{1}{2}\\\frac{1}{2}&\frac{1}{2}\end{pmatrix}.\end{split}
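As a numerical sanity check of this factorisation, the following NumPy sketch multiplies out X\Lambda X^{-1} and recovers A:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [-1.0, 0.0]])
X = np.array([[1.0, 1.0],
              [-1.0, 1.0]])          # columns are the eigenvectors v_1, v_2
L = np.diag([1.0, -1.0])             # the eigenvalue matrix Lambda

print(X @ L @ np.linalg.inv(X))      # recovers A
```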

The matrix X^{-1}, the inverse of the matrix of eigenvectors, is also called the change of basis matrix. Multiplying a vector x by X^{-1} transforms it from standard coordinates e_1, \ldots, e_n to eigenvector coordinates v_1, \ldots, v_n.

Suppose x is written as a sum of eigenvectors with coefficients a_i:

\begin{split}x = a_1v_1 + \cdots + a_nv_n = X\begin{pmatrix}a_1\\\vdots\\a_n\end{pmatrix}\end{split}

then left-multiplying by X^{-1} gives the vector of coefficients a_i:

\begin{split}\begin{pmatrix}a_1\\\vdots\\a_n\end{pmatrix}= X^{-1}x.\end{split}
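In code, rather than forming X^{-1} explicitly, the coefficients are usually found by solving the linear system Xa = x. A short sketch, using the eigenvectors from the example above and an illustrative vector x:

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [-1.0, 1.0]])          # columns are the eigenvectors v_1, v_2
x = np.array([3.0, 1.0])             # an illustrative vector

a = np.linalg.solve(X, x)            # the coefficients a_i, i.e. X^{-1} x
print(a)                             # [1. 2.]
print(X @ a)                         # reconstructs x = a_1 v_1 + a_2 v_2
```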

Note

Diagonalisation is not unique

1. If we write the eigenvalues and eigenvectors in a different order, we get a different matrix X. Swapping v_1 and v_2 we get another way to write (21.1):

\begin{split}A = \begin{pmatrix}1 & 1\\1&-1\end{pmatrix}\begin{pmatrix}-1&0\\0&1\end{pmatrix}\begin{pmatrix}\frac{1}{2}&\frac{1}{2}\\\frac{1}{2}&-\frac{1}{2}\end{pmatrix}.\end{split}

However it is important that the order of the eigenvalues is the same as the order of the eigenvectors. If you swap the eigenvectors, you must remember to also swap the eigenvalues!

2. Any eigenvector can be multiplied by a constant. For example replacing v_1 by 2v_1 in (21.1):

\begin{split}A = \begin{pmatrix}2 & 1\\-2&1\end{pmatrix}\begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}\frac{1}{4}&-\frac{1}{4}\\\frac{1}{2}&\frac{1}{2}\end{pmatrix}.\end{split}

3. The eigenvalues are unique, although different diagonalisations may result in a different order.

Exercise 21.1

Diagonalise the matrix

\begin{split}A = \begin{pmatrix}\frac{1}{2}&\frac{3}{2}\\\frac{3}{2}&\frac{1}{2}\end{pmatrix}.\end{split}

21.3. About Diagonalisation

Not all matrices can be diagonalised. To diagonalise a matrix, the matrix of eigenvectors X must be invertible. In this section we will investigate conditions under which this is the case.

Recall that if v is an eigenvector then so is any nonzero multiple av. Suppose a (2 \times 2) matrix has eigenvector v. Then the vector 2v is also an eigenvector, but the eigenvector matrix

\begin{split}X = \begin{pmatrix}|&|\\v&2v\\|&|\end{pmatrix}\end{split}

is not invertible since the second column is a multiple of the first (and therefore its determinant is zero).

We can extend this idea to (3 \times 3) matrices. Suppose we have two independent eigenvectors v_1 and v_2 and a third eigenvector v_3 such that v_3 = av_1 + bv_2 for some scalars a and b. Then the matrix of eigenvectors

\begin{split}X = \begin{pmatrix}|&|&|\\v_1&v_2&v_3\\|&|&|\end{pmatrix}\end{split}

is not invertible. To extend this idea to the general case, we need to introduce the important concept of linear independence.

21.4. Linear Independence

A set of vectors is linearly dependent if one vector can be written as a linear sum of the other vectors.

Definition

Let v_1, \ldots, v_n \in \mathbb{R}^m be a set of vectors.

The vectors are linearly independent if the equation

a_1v_1 + \cdots + a_nv_n = 0

has only the trivial solution

a_1 = \cdots = a_n = 0.

Otherwise, we say the vectors are linearly dependent.

Exercise 21.2

Show that a set of vectors is linearly dependent if (and only if) one of the vectors is a linear sum of the others.

If we have n linearly dependent vectors v_i \in \mathbb{R}^n then it is easy to show that the matrix

\begin{split}X = \begin{pmatrix}|&&|\\v_1&\cdots &v_n\\|&&|\end{pmatrix}\end{split}

is not invertible. Therefore we can extend the invertible matrix theorem with two more equivalent conditions for invertibility:

Invertible Matrix Theorem (II)

Let A be an (n \times n) matrix. Then the following statements are equivalent:

  1. A is invertible.

  2. \mathrm{det}(A) \neq 0.

  3. A has n pivots.

  4. The null space of A is 0.

  5. Ax=b has a unique solution for every b \in \mathbb{R}^n.

  6. The columns of A are linearly independent.

  7. The rows of A are linearly independent.
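In practice, linear independence of the columns can be checked numerically via the rank. A sketch (np.linalg.matrix_rank computes the rank from the singular values, up to a tolerance; the vectors below are illustrative):

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = 2.0 * v1 + 3.0 * v2             # deliberately a linear sum of v1 and v2

X = np.column_stack([v1, v2, v3])

# Rank less than n means the columns are linearly dependent,
# so X is not invertible (conditions 1, 2 and 6 fail together)
print(np.linalg.matrix_rank(X))      # 2, not 3
print(np.linalg.det(X))              # 0 (up to rounding)
```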

21.5. More Eigenvalues

Theorem

Eigenvectors v_1, \ldots, v_n corresponding to distinct eigenvalues are linearly independent.

An (n \times n) matrix with n distinct eigenvalues is diagonalisable.

To prove this, suppose that v_1 and v_2 are linearly dependent eigenvectors of the matrix A with distinct eigenvalues \lambda_1 and \lambda_2, so that

(21.2)v_2 = av_1

for some a\neq 0.

Multiply by A:

\lambda_2v_2 = a\lambda_1v_1

then divide by \lambda_2 (since the eigenvalues are distinct, at least one of them is nonzero; relabelling if necessary, we may assume \lambda_2 \neq 0):

(21.3)v_2 = \frac{a\lambda_1}{\lambda_2}v_1.

Comparing (21.2) and (21.3), and using the fact that a \neq 0 and v_1 \neq 0, we see that \lambda_1/\lambda_2=1 and so

\lambda_1 = \lambda_2.

We have shown that two linearly dependent eigenvectors must have identical eigenvalues. We will not show it here, but it is not difficult to extend this to the general case: eigenvectors from distinct eigenvalues are linearly independent.

We can conclude that if an (n \times n) matrix has n distinct eigenvalues then it has n linearly independent eigenvectors. By the invertible matrix theorem, its matrix of eigenvectors X is invertible, and therefore the matrix is diagonalisable.

Example

Determine the characteristic equation of each of the following matrices and identify which are diagonalisable:

\begin{split}A = \begin{pmatrix}0&-1\\-1&0\end{pmatrix}, \quad B=\begin{pmatrix}1&1\\1&1\end{pmatrix}, \quad C=\begin{pmatrix}1&1\\0&1\end{pmatrix}.\end{split}

Solution

A has characteristic polynomial \lambda^2 - 1 = (\lambda-1)(\lambda+1). It has two distinct eigenvalues \lambda_1=1,\lambda_2=-1 and therefore two independent eigenvectors. It is diagonalisable and we already calculated A = X\Lambda X^{-1} (21.1).

B has characteristic polynomial \lambda^2-2\lambda = \lambda(\lambda - 2). It has two distinct eigenvalues \lambda_1=0,\lambda_2=2 and therefore has two independent eigenvectors and is diagonalisable.

C has characteristic polynomial \lambda^2-2\lambda+1 = (\lambda-1)^2. It has a single eigenvalue \lambda_1=1. To determine if C is diagonalisable, we need to check if there are two independent eigenvectors in the eigenspace of \lambda_1.

\begin{split}\mathrm{Null}(C - \lambda_1I) = \mathrm{Null}\begin{pmatrix}0&1\\0&0\end{pmatrix} = \mathrm{span}\left\{\begin{pmatrix}1\\0\end{pmatrix}\right\}.\end{split}

C has only one linearly independent eigenvector and therefore it is not diagonalisable.
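We can see the same failure numerically. The sketch below uses np.linalg.eig, which still returns two eigenvector columns for C, but they are (numerically) parallel, so the eigenvector matrix is singular:

```python
import numpy as np

C = np.array([[1.0, 1.0],
              [0.0, 1.0]])

evals, X = np.linalg.eig(C)          # columns of X are (unit) eigenvectors
print(evals)                         # [1. 1.], a repeated eigenvalue
print(np.linalg.det(X))              # essentially zero: X is not invertible
```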

Exercise 21.3

  1. Diagonalise the matrix B in the example above.

  2. Find a diagonalisable matrix with only one distinct eigenvalue.

Attention

Diagonalisability is not related to invertibility. Non-invertible matrices can be diagonalisable, as in B above. Likewise, non-diagonalisable matrices can be invertible, as in C above.

21.6. Algebraic and Geometric Multiplicity

The eigenvalues of an (n \times n) square matrix are the roots of its characteristic polynomial f(\lambda). We have already seen that if there are n distinct roots then there are n linearly independent eigenvectors and the matrix is diagonalisable. For example, a (3 \times 3) matrix with the following characteristic polynomial is diagonalisable:

f(\lambda) = (\lambda-1)(\lambda-2)(\lambda-3)

If there are fewer than n distinct roots then at least one of the roots must be repeated. For example, the characteristic polynomial

(21.4)f(\lambda) =(\lambda-2)(\lambda-1)^2

results in two distinct eigenvalues \lambda_1=2 and \lambda_2=1. \lambda_2 is a repeated root with multiplicity 2, since the factor (\lambda-1) appears twice in the polynomial. To determine whether the matrix is diagonalisable, we need to determine how many independent eigenvectors there are in the eigenspace of each of the eigenvalues.

Theorem

Let A be a square matrix and \lambda an eigenvalue of A. Then,

geometric multiplicity of \lambda \leq algebraic multiplicity of \lambda

where the algebraic multiplicity of \lambda is its multiplicity as a root of the characteristic polynomial of A and the geometric multiplicity of \lambda is the number of independent eigenvectors in the eigenspace of \lambda.

This means that for (21.4) there could be one or two independent eigenvectors in the eigenspace of \lambda_2. If there are two, then the matrix is diagonalisable; if only one then it is not diagonalisable. To check, we have to calculate the null space of A-\lambda_2I.
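By rank-nullity, the geometric multiplicity is \mathrm{dim}\,\mathrm{Null}(A-\lambda I) = n - \mathrm{rank}(A-\lambda I), which is easy to compute. A sketch, using an illustrative matrix whose characteristic polynomial is (21.4) (up to sign):

```python
import numpy as np

# An illustrative matrix with eigenvalues 2, 1, 1
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

n = A.shape[0]
lam = 1.0                            # the repeated eigenvalue

# geometric multiplicity = dim Null(A - lam*I) = n - rank(A - lam*I)
geo = n - np.linalg.matrix_rank(A - lam * np.eye(n))
print(geo)   # 1 < 2 (the algebraic multiplicity), so A is not diagonalisable
```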

21.7. Matrix Powers A^k

The factorisation A = X\Lambda X^{-1} is useful for computing powers of A, because each X^{-1} cancels with the X that follows it:

\begin{split}\begin{align*}A^2 &= X\Lambda X^{-1}X\Lambda X^{-1} = X\Lambda I\Lambda X^{-1}=X\Lambda^2X^{-1}\\ A^3 & = X\Lambda^2X^{-1}X\Lambda X^{-1} = X\Lambda^3X^{-1}\end{align*}\end{split}

and so on. Because \Lambda is a diagonal matrix, its powers \Lambda^k are easy to calculate.

Theorem

Let A be a diagonalisable square matrix with A = X\Lambda X^{-1} and k\in \mathbb{N}. Then

\begin{split}A^k = X\Lambda^k X^{-1} = X\begin{pmatrix}\lambda_1^k&&\\&\ddots&\\&&\lambda_n^k \end{pmatrix}X^{-1}.\end{split}
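A quick NumPy sketch of this: compute the eigendecomposition, raise the eigenvalues to the k-th power, and compare with repeated multiplication. The example matrix is B from Section 21.5, which has distinct eigenvalues 0 and 2:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])               # diagonalisable: eigenvalues 0 and 2

evals, X = np.linalg.eig(A)              # columns of X are eigenvectors
k = 5

Ak = X @ np.diag(evals**k) @ np.linalg.inv(X)

print(np.round(Ak, 10))                  # [[16. 16.] [16. 16.]]
print(np.linalg.matrix_power(A, k))      # the same, by repeated multiplication
```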

21.8. Complex Eigenvalues

We have seen that from a square matrix A we can calculate the characteristic polynomial f(\lambda). For an (n \times n) matrix the polynomial has degree n:

f(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0

where a_i are real numbers.

By the fundamental theorem of algebra, f(\lambda) can be factorised into n factors \lambda - \lambda_i (some of which may be repeated):

f(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n).

The roots \lambda_i are the eigenvalues of the matrix.

In this section we consider the case where some of the roots are not real numbers.

21.8.1. Rotations in 2D

Let A be the matrix of an anticlockwise rotation by \pi/2 around the origin:

\begin{split}A = \begin{pmatrix}0&-1\\1&0\end{pmatrix}.\end{split}

The characteristic polynomial is \det(A-\lambda I), which equals

\begin{split}\begin{vmatrix}-\lambda&-1\\1&-\lambda\end{vmatrix} = \lambda^2+1.\end{split}

The polynomial \lambda^2+1 does not have real roots. Its roots are \pm i where i is the imaginary number \sqrt{-1}:

\lambda^2+1 = (\lambda+i)(\lambda-i)

resulting in two complex eigenvalues \lambda_1 = i and \lambda_2= -i.

We also find that the eigenvectors contain the imaginary number i:

\begin{split}A - \lambda_1I = \begin{pmatrix}-i&-1\\1&-i\end{pmatrix} \underrightarrow{\mathrm{~RREF~}} \begin{pmatrix}1&-i\\0&0\end{pmatrix}\end{split}

and hence the eigenvector corresponding to the eigenvalue \lambda_1 = i is

\begin{split}v_1 = \begin{pmatrix}i\\1\end{pmatrix}.\end{split}

Likewise the eigenvector corresponding to the eigenvalue \lambda_2 = -i is

\begin{split}v_2 = \begin{pmatrix}-i\\1\end{pmatrix}.\end{split}

Example

Find the eigenvalues of an anticlockwise rotation by an angle \theta.

Solution

The matrix

\begin{split}A = \begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix}\end{split}

represents an anticlockwise rotation by an angle \theta. The characteristic polynomial f(\lambda) is given by

\begin{split}\begin{align*}\det(A-\lambda I) &= \begin{vmatrix}\cos\theta-\lambda&-\sin\theta\\\sin\theta&\cos\theta-\lambda\end{vmatrix}\\ &= (\cos\theta - \lambda)^2+\sin^2\theta.\end{align*}\end{split}

Setting this to zero and solving for \lambda gives eigenvalues

\lambda = \cos\theta\pm i\sin\theta = e^{\pm i\theta}.
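NumPy handles this case automatically: np.linalg.eig switches to complex arithmetic when the eigenvalues are not real. A sketch with an illustrative angle:

```python
import numpy as np

theta = np.pi / 3                        # an illustrative angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

evals, evecs = np.linalg.eig(A)          # complex eigenvalues and eigenvectors
print(evals)                             # approximately exp(i*theta), exp(-i*theta)
print(np.exp(1j * theta), np.exp(-1j * theta))
```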

Exercise 21.4

Find the complex eigenvectors corresponding to the two complex eigenvalues of

\begin{split}A = \begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix}.\end{split}

21.9. Trace and Determinant

Calculating eigenvalues is (in general) a difficult problem. However, in some cases we can use some ‘tricks’ to help find them.

Definition

The trace of a matrix is the sum of the diagonal entries. Given a matrix A with entries a_{ij}:

\mathrm{tr}(A) = a_{11} + \cdots +a_{nn}.

Theorem

Let A be an (n \times n) matrix with eigenvalues \lambda_1, \ldots, \lambda_n, counted with multiplicity. Then

\lambda_1 + \cdots + \lambda_n = \mathrm{tr}(A)

and

\lambda_1\lambda_2 \cdots \lambda_n = \det(A).

The sum of the eigenvalues is the sum of the diagonal entries of A. The product of the eigenvalues is the determinant of A.
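Both identities are easy to check numerically (a sketch with an illustrative matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])          # an illustrative matrix

evals = np.linalg.eigvals(A)

print(np.sum(evals), np.trace(A))        # both equal 9
print(np.prod(evals), np.linalg.det(A))  # both equal 18 (up to rounding)
```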

Example

Calculate the determinant of the matrix

\begin{split}A = \begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix}.\end{split}

Solution

This matrix is clearly not invertible, and so has zero determinant and at least one zero eigenvalue.

In fact there are two independent eigenvectors in the zero eigenspace (check this!). This means that we have \lambda_1=\lambda_2=0. To find the third eigenvalue, use the identity

\lambda_1 + \lambda_2 + \lambda_3 = \mathrm{tr}(A)

to determine that \lambda_3=3.

Exercise 21.5

Show that the eigenvalues of a triangular matrix are its diagonal entries.

21.10. Solutions to Exercises

Solution to Exercise 21.1

First find the eigenvalues and eigenvectors of A. The characteristic polynomial is given by:

f(\lambda) = \det(A-\lambda I) = \lambda^2-\lambda-2 = (\lambda+1)(\lambda-2)

therefore the two eigenvalues are \lambda_1 = -1 and \lambda_2=2.

To find the corresponding eigenvectors, calculate the nullspace of A - \lambda I for each eigenvalue. For \lambda_1:

\begin{split}A + I = \begin{pmatrix}\frac{3}{2}&\frac{3}{2}\\\frac{3}{2}&\frac{3}{2}\end{pmatrix} \underrightarrow{\mathrm{~RREF~}} \begin{pmatrix}1&1\\0&0\end{pmatrix}\end{split}

Therefore v_1 = \begin{pmatrix}-1\\1\end{pmatrix} is an eigenvector corresponding to eigenvalue \lambda_1=-1.

For \lambda_2:

\begin{split}A - 2I = \begin{pmatrix}-\frac{3}{2}&\frac{3}{2}\\\frac{3}{2}&-\frac{3}{2}\end{pmatrix} \underrightarrow{\mathrm{~RREF~}} \begin{pmatrix}1&-1\\0&0\end{pmatrix}\end{split}

Therefore v_2 = \begin{pmatrix}1\\1\end{pmatrix} is an eigenvector corresponding to eigenvalue \lambda_2=2.

The matrix of eigenvectors is

\begin{split}X = \begin{pmatrix}|&|\\v_1&v_2\\|&|\end{pmatrix} = \begin{pmatrix}-1&1\\1&1\end{pmatrix}\end{split}

and its inverse

\begin{split}X^{-1} = \begin{pmatrix}-\frac{1}{2}&\frac{1}{2}\\\frac{1}{2}&\frac{1}{2}\end{pmatrix}.\end{split}

The matrix of eigenvalues is

\begin{split}\Lambda = \begin{pmatrix}-1&0\\0&2\end{pmatrix}\end{split}

and so

\begin{split}A = X\Lambda X^{-1} = \begin{pmatrix}-1&1\\1&1\end{pmatrix}\begin{pmatrix}-1&0\\0&2\end{pmatrix}\begin{pmatrix}-\frac{1}{2}&\frac{1}{2}\\\frac{1}{2}&\frac{1}{2}\end{pmatrix}.\end{split}

Solution to Exercise 21.3

1. Eigenvalues are \lambda_1 = 0, \lambda_2=2.

B-\lambda_1I = \begin{pmatrix}1&1\\1&1\end{pmatrix}\underrightarrow{\mathrm{~RREF~}}\begin{pmatrix}1&1\\0&0\end{pmatrix}

Therefore v_1 = \begin{pmatrix}1\\-1\end{pmatrix}.

B-\lambda_2I = \begin{pmatrix}-1&1\\1&-1\end{pmatrix}\underrightarrow{\mathrm{~RREF~}}\begin{pmatrix}1&-1\\0&0\end{pmatrix}

Therefore v_2 = \begin{pmatrix}1\\1\end{pmatrix}.

\begin{split}X = \begin{pmatrix}1&1\\-1&1\end{pmatrix}\end{split}

and

\begin{split}X^{-1} = \begin{pmatrix}\frac{1}{2}&-\frac{1}{2}\\\frac{1}{2}&\frac{1}{2}\end{pmatrix}.\end{split}
\begin{split}B = X\Lambda X^{-1} = \begin{pmatrix}1&1\\-1&1\end{pmatrix}\begin{pmatrix}0&0\\0&2\end{pmatrix}\begin{pmatrix}\frac{1}{2}&-\frac{1}{2}\\\frac{1}{2}&\frac{1}{2}\end{pmatrix}.\end{split}

The eigenvalue matrix \Lambda has a zero on the diagonal because one of the eigenvalues is zero.

2. A matrix of the form

\begin{split}\begin{pmatrix}a&0\\0&a\end{pmatrix}\end{split}

has characteristic polynomial

(a-\lambda)^2

and therefore a single repeated eigenvalue \lambda=a.

But it is diagonalisable: it is already diagonal, and is diagonalised by the identity matrix X = I. In fact, every vector v \in \mathbb{R}^2 is an eigenvector of this matrix.