Eigenstuff

A non-zero vector v is an eigenvector of a square matrix A with eigenvalue \lambda \in \mathbb{C} if

A v = \lambda v

An eigenvector is a representative from a vector subspace in which any vector is simply stretched (scalar multiplied) by a factor \lambda when multiplied by A. So it makes more sense to think of "eigenspaces" rather than eigenvectors, but that terminology is less universal.
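
It can help to check such statements numerically. Here is a minimal sketch using NumPy's np.linalg.eig (the matrix is just an illustrative choice, not one from the examples below), verifying A v = \lambda v for each computed pair:

```python
import numpy as np

# Illustrative matrix (not from the examples below), used only to check the definition.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are (approximate) eigenvectors

for k in range(len(eigvals)):
    lam, v = eigvals[k], eigvecs[:, k]
    # A v and lambda v should agree up to floating-point roundoff.
    print(lam, np.allclose(A @ v, lam * v))
```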


Eigenstuff Example

It is relatively easy to visualize the behavior of 2 by 2 matrix transformations, and these exhibit most of what can happen in higher dimensions. Consider the particular example of a matrix

A = \left(\begin{array}{rr} -3.0 & 3.0 \\ 1.0 & 2.0 \end{array}\right)

and its associated transformation of 2-d vectors x \rightarrow Ax. We can visualize this mapping by looking at what happens to points on the unit circle. These have been colored with a loop in color space in the figure below. The image of these points is an ellipse, with each image point shown colored in the same way as its preimage. The eigenlines (spans of each eigenvector) are shown in gray. In this case, the eigenvalues are \frac{-1 \pm \sqrt{37}}{2} \approx -3.54, 2.54. By inspecting the colors of the domain and image circles, can you see which of the eigenlines corresponds to the negative eigenvalue?

Eigenspaces
Image of a unit circle and its eigenlines.
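
The eigenvalues quoted above are easy to reproduce numerically; a minimal NumPy sketch (the ordering of the computed eigenvalues is not guaranteed):

```python
import numpy as np

A = np.array([[-3.0, 3.0],
              [ 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                                          # approximately [-3.54, 2.54]
print((-1 - np.sqrt(37)) / 2, (-1 + np.sqrt(37)) / 2)   # the exact formula, for comparison

# Each column of eigvecs spans one of the eigenlines shown in gray in the figure.
print(eigvecs)
```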

The first time most people learn about eigenstuff (my general term for eigenvalues, eigenvectors, and eigenspaces), they are taught to compute the eigenvalues from the characteristic equation

\det ( A - \lambda I) = 0

and then for each eigenvalue \lambda we can compute the kernel (nullspace) of A - \lambda I to find the eigenvectors.

This process is a numerically horrible way to compute the eigenstuff of larger matrices, because of the possible loss of precision in computing the eigenvalues as roots of a polynomial. We will consider some other methods later in this section.

For now we will work through this method on a simple example.

Eigenvalue example

We will use the characteristic equation method to compute the eigenvalues and eigenvectors of the matrix

A = \left(\begin{array}{rrr} 2 & 4 & 4 \\ -2 & -2 & -5 \\ 0 & 0 & 3 \end{array}\right)

The characteristic equation is

\det (A - \lambda I) = \det \left(\begin{array}{rrr} 2 - \lambda & 4 & 4 \\ -2 & -2- \lambda & -5 \\ 0 & 0 & 3- \lambda \end{array}\right)

Expanding along the third row, which has only one nonzero entry, gives

= (3 - \lambda ) \det \left ( \begin{array}{cc} 2 - \lambda & 4 \\ -2 & -2 - \lambda \end{array} \right ) = (3 - \lambda) \left( (2 - \lambda)(-2 - \lambda) + 8 \right)

= (3 - \lambda)(\lambda^2 + 4) = 0

so the eigenvalues are 3, 2i, and -2i.
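
A quick numerical cross-check of this example (a minimal NumPy sketch; eigvals returns the eigenvalues in no particular order):

```python
import numpy as np

A = np.array([[ 2.0,  4.0,  4.0],
              [-2.0, -2.0, -5.0],
              [ 0.0,  0.0,  3.0]])

# NumPy works over the complex numbers here, so the conjugate pair +/- 2i appears directly.
print(np.linalg.eigvals(A))   # approximately [3, 2j, -2j], in some order
```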

Gershgorin circle theorem

The Gershgorin circle theorem restricts the location of the eigenvalues of an n by n matrix A; the weakest form of the theorem states that all of the eigenvalues of A must lie in the union of n disks in the complex plane, where the ith disk is centered at the diagonal element A_{i,i} and has radius \sum_{j \neq i } |A_{i,j}|. We will show the proof of this theorem and then some ways it can be strengthened.

To prove it, consider an eigenvector v of A with eigenvalue \lambda. Eigenvectors are only defined up to nonzero scalar multiples, so without loss of generality we can assume that we have scaled v so that its largest entry in absolute magnitude is 1: i.e. v_i = 1 for some index i, and |v_j| \le 1 for all j. Now consider the ith entry of the relation Av = \lambda v:

(Av)_i = \sum_{j=1}^n A_{i,j} v_j = \lambda v_i = \lambda

Now we can split up the left-hand side of that equality to

\sum_{j=1}^n A_{i,j} v_j = A_{i,i} + \sum_{j\neq i} A_{i,j} v_j = A_{i,i} + z

We can now bound the magnitude of z (which is possibly complex):

|z| = | \sum_{j\neq i} A_{i,j} v_j | \le \sum_{j\neq i} |A_{i,j}| |v_j | \le \sum_{j\neq i} |A_{i,j}|

and we see that it must be less than or equal to the sum of the magnitudes of the non-diagonal entries of row i of A. This means that \lambda = A_{i,i} + z must be in a disk in the complex plane around the point A_{i,i} with that radius.

We can strengthen the theorem in two ways. The first is to note that the eigenvalues of A and A^T are the same, so we can form the disks from column sums as well as row sums; the eigenvalues must lie in both of the resulting regions. The second is to exploit the fact that the eigenvalues are continuous functions of the matrix entries, and consider the homotopy (1-t) D + t A from a diagonal matrix D, which has the same diagonal as A, to the given matrix A. This implies that every connected component of the union of Gershgorin disks contains a number of eigenvalues equal to the number of disks in that component (counting eigenvalues with their algebraic multiplicity).

Gershgorin circle theorem example

In the example below the matrix is

A = \left(\begin{array}{rrrr} 10 & 1 & 0 & 1 \\ -1 & 8 & 0 & 1 \\ 1 & 0 & 3 & 0 \\ -1 & 0 & 2 & 0 \end{array}\right)

and the disks with radii determined from the row sums are shaded blue, while the disks from the column sums are shaded light red. The actual eigenvalues are indicated with small black points.

Gershgorin example
Gershgorin theorem example.
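
A small NumPy sketch of the row-sum version of this check for the matrix above (plotting omitted; only the disk centers, radii, and eigenvalues are computed):

```python
import numpy as np

A = np.array([[10.0,  1.0,  0.0,  1.0],
              [-1.0,  8.0,  0.0,  1.0],
              [ 1.0,  0.0,  3.0,  0.0],
              [-1.0,  0.0,  2.0,  0.0]])

# Row-sum Gershgorin disks: center A[i,i], radius = sum of |off-diagonal entries| in row i.
centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)
for c, r in zip(centers, radii):
    print(f"disk centered at {c} with radius {r}")

# Every eigenvalue must lie in at least one of these disks (and in a column-sum disk too).
print(np.linalg.eigvals(A))
```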

Normality and the Schur factorization

A matrix which satisfies the property A^* A = A A^{*}, where A^{*} is the complex-conjugate transpose of A, is called normal. The class of normal matrices includes real-symmetric matrices (A^{T} = A), real-antisymmetric matrices (A^{T} = -A), real orthogonal matrices (A^{T} = A^{-1}), unitary matrices (A^{*} = A^{-1}), complex Hermitian matrices (A^{*} = A), and complex anti-Hermitian matrices (A^{*} = -A). Many of these arise frequently in applications. A very important property of normal matrices is:

A matrix A is unitarily diagonalizable if and only if it is normal.

(A is unitarily diagonalizable if there exists a unitary matrix Q such that Q^{*} A Q = D where D is a diagonal matrix.)

Here is an example of a matrix which is normal, but is not symmetric or orthogonal or any of the other common sub-types mentioned above:

\left(\begin{array}{rrr} 2 & -1 & 0 \\ 0 & 2 & -1 \\ 1 & 0 & 2 \end{array}\right)

When A is normal it is easier to compute its eigenvalues. In the general case, the best we can do is the Schur factorization:

A = Q T Q^{*}

in which Q is unitary and T is upper-triangular. If we can compute a Schur factorization (or something close to it) of A, then the diagonal entries of T will be the eigenvalues.
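
As an illustration, here is a minimal SciPy sketch (scipy.linalg.schur) applied to the normal example matrix above; since that matrix is normal, the triangular factor T should come out numerically diagonal, with the eigenvalues on its diagonal:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, -1.0,  0.0],
              [0.0,  2.0, -1.0],
              [1.0,  0.0,  2.0]])

# Normality check: A* A = A A* (A is real, so A* is just the transpose here).
print(np.allclose(A.T @ A, A @ A.T))        # True

# Complex Schur form A = Q T Q*, with Q unitary and T upper-triangular.
T, Q = schur(A, output='complex')
print(np.allclose(Q @ T @ Q.conj().T, A))   # True, up to roundoff
print(np.diag(T))                           # the eigenvalues of A
# Because A is normal, the off-diagonal part of T is zero up to roundoff.
print(np.allclose(T, np.diag(np.diag(T))))
```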

Companion matrices

To any monic polynomial

p(x) = a_0 + a_1 x + \ldots + a_{n-1} x^{n-1} + x^n

we can associate a matrix, the companion matrix:

\mathcal{C}_{p(x)} = \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & -a_0 \\ 1 & 0 & 0 & \dots & \dots & -a_1\\ 0 & 1 & 0 & \dots & \dots & -a_2 \\ 0 & 0 & \ddots & & & \vdots \\ \vdots & \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots & \dots & 1 & -a_{n-1} \end{pmatrix}

whose eigenvalues are the roots of p(x). This means polynomial roots can be found from an eigenvalue computation. It also implies the bad news that finding eigenvalues is at least as difficult as computing the roots of polynomials. Thanks to the work of Ruffini, Galois, and Abel in the early 19th century we know there is no finite elementary formula for the roots of polynomials of degree 5 or greater, so we cannot usually expect to find eigenvalues exactly. Thus every general eigenvalue/eigenvector algorithm is iterative and approximate.
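
For example, here is a small NumPy sketch that builds the companion matrix of p(x) = x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3) in the layout above and recovers its roots as eigenvalues. (NumPy's own np.roots takes essentially this route.)

```python
import numpy as np

# p(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3), so a0 = -6, a1 = 11, a2 = -6.
a = np.array([-6.0, 11.0, -6.0])
n = len(a)

# Companion matrix in the layout above: ones on the subdiagonal, -a_i down the last column.
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -a

print(C)
print(np.linalg.eigvals(C))   # approximately [1, 2, 3], in some order
```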

Rayleigh quotients

The Rayleigh quotient gives an estimate of an eigenvalue of a matrix A from an approximate eigenvector v; if v is an exact eigenvector, R(v) is exactly the corresponding eigenvalue:

R(v) = \frac{v^* A v}{v^* v}
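
A minimal NumPy sketch of this formula (the 2 by 2 matrix and the perturbation are just illustrative choices):

```python
import numpy as np

def rayleigh_quotient(A, v):
    """R(v) = (v* A v) / (v* v), using the conjugate transpose of v."""
    return (v.conj() @ A @ v) / (v.conj() @ v)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Perturb an exact eigenvector slightly; the Rayleigh quotient is still a good estimate.
lam, V = np.linalg.eig(A)
v_approx = V[:, 0] + 1e-3 * np.array([1.0, -1.0])

print(lam[0])                           # the exact eigenvalue
print(rayleigh_quotient(A, v_approx))   # a close estimate of lam[0]
```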

The QR method for eigenvalues

The most basic QR method for finding eigenvalues is very simple. First we set A^{(0)} = A. At the kth step of the algorithm we compute a QR factorization A^{(k-1)} = Q^{(k)} R^{(k)}, and then set A^{(k)} = R^{(k)} Q^{(k)}. If you are lucky, this will converge to an upper-triangular A^{(k)}, or a diagonal A^{(k)} if A is normal.
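
A minimal sketch of this unshifted iteration in NumPy, run on a symmetric (hence normal) test matrix so that the iterates approach a diagonal matrix:

```python
import numpy as np

def qr_iteration(A, steps=100):
    """Unshifted QR iteration: factor A_k = Q R, then set A_{k+1} = R Q."""
    Ak = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q            # R Q = Q^T (Q R) Q, so every iterate is similar to A
    return Ak

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0]])

print(np.round(qr_iteration(A), 6))    # nearly diagonal
print(np.linalg.eigvals(A))            # the same values, for comparison
```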

The element of luck can be removed with some further techniques, such as shifting: at each step we factor A^{(k-1)} - \mu^{(k)} I = Q^{(k)} R^{(k)} for some choice of scalar \mu^{(k)}, and then set A^{(k)} = R^{(k)} Q^{(k)} + \mu^{(k)} I.

Upper Hessenberg form

Before using the QR algorithm or its variants, the problem is made easier by first transforming a matrix A into its upper Hessenberg form with a unitary transformation:

A = Q H Q^{*}

where H is an upper Hessenberg matrix, meaning it is almost upper-triangular: H_{i,j} = 0 if i > j + 1. This can be done with Householder reflections.
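
In practice this reduction is available as a library routine; here is a minimal sketch using scipy.linalg.hessenberg on the 3 by 3 matrix from the eigenvalue example above:

```python
import numpy as np
from scipy.linalg import hessenberg

A = np.array([[ 2.0,  4.0,  4.0],
              [-2.0, -2.0, -5.0],
              [ 0.0,  0.0,  3.0]])

# H is upper Hessenberg (H[i, j] = 0 for i > j + 1) and A = Q H Q^T with Q orthogonal.
H, Q = hessenberg(A, calc_q=True)
print(np.round(H, 6))
print(np.allclose(Q @ H @ Q.T, A))   # True: the similarity class, and hence the eigenvalues, are preserved
```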


Example of Computing Hessenberg Form

We will compute the upper Hessenberg form of the matrix

A = \left(\begin{array}{rrr} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 3 \end{array}\right)

using Householder reflections. Unlike in the computation of a QR factorization, we preserve the similarity class of the matrix by applying each reflection on both the left and the right. We use a reflection Q of the form

Q = \left(\begin{array}{rr} 1 & 0 \\ 0 & Q_1 \end{array}\right)

where Q_1 reflects all but the first entry of the first column onto a multiple of the first coordinate vector (in this case (1,0)^T). The portion of the column vector we want is (1,1)^T, and we want to reflect it to (\sqrt{2},0)^T. The difference between these is v = (\sqrt{2}-1,-1)^T, and

Q_1 = I - 2 \frac{v \, v^T}{v^T v} = \frac{1}{\sqrt{2}}\left(\begin{array}{rr} 1 & 1 \\ 1 & -1 \end{array}\right)

Then we obtain:

Q A Q^T = \left(\begin{array}{rrr} 1 & \sqrt{2} & 0 \\ \sqrt{2} & 4 & -1 \\ 0 & -1 & 0 \end{array} \right)

In this example, our starting matrix A was symmetric, so QAQ^T is both upper- and lower-Hessenberg (i.e. tridiagonal).
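
The arithmetic above can be checked numerically; a minimal NumPy sketch that builds Q from the reflector and forms Q A Q^T:

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0]])

# Householder reflector sending (1, 1)^T to (sqrt(2), 0)^T, as constructed above.
v = np.array([np.sqrt(2) - 1.0, -1.0])
Q1 = np.eye(2) - 2.0 * np.outer(v, v) / (v @ v)

# Embed Q1 in the lower-right block so that the first row and column are left alone.
Q = np.eye(3)
Q[1:, 1:] = Q1

# Should match [[1, sqrt(2), 0], [sqrt(2), 4, -1], [0, -1, 0]] up to roundoff.
print(np.round(Q @ A @ Q.T, 10))
```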


Eigenstuff Exercises