# Invariant Factors and Elementary Divisors

### 3.01 Elementary transformations.

By an elementary transformation of a matric polynomial a(λ) = ||a_{ij}|| is meant one of the following operations on the rows or columns.

Type I. The operation of adding to a row (column) a different row (column) multiplied by a scalar polynomial θ(λ).

Type II. The operation of interchanging two rows (columns).

Type III. The operation of multiplying a row (column) by a constant k ≠ 0.

These transformations can be performed algebraically as follows.

*Type I.* Let

P_{ij} = 1 + θ(λ)e_{ij} (i ≠ j),

θ(λ) being a scalar polynomial; then |P_{ij}| = 1 and

$P_{ij} a = \sum \limits_{p,q} a_{pq} e_{pq} + \theta \sum \limits_q a_{jq} e_{iq}$

which is the matrix derived from a(λ) by adding θ times the jth row to the ith. The corresponding operation on the columns is equivalent to forming the product aP_{ji}.

*Type II.* Let Q_{ij} be the matrix

Q_{ij} = 1 - e_{ii} - e_{jj} + e_{ij} + e_{ji} (i ≠ j)

that is, Q_{ij} is the matrix derived from the identity matrix by inserting 1 in place of 0 in the coefficients of e_{ij} and e_{ji} and 0 in place of 1 in the coefficients of e_{ii} and e_{jj}; then |Q_{ij}| = -1 and

$Q_{ij} a = \sum \limits_{p,q} a_{pq} e_{pq} - \sum \limits_q a_{iq} e_{iq} - \sum \limits_q a_{jq} e_{jq} + \sum \limits_q a_{jq} e_{iq} + \sum \limits_q a_{iq} e_{jq}$

that is, Q_{ij}a is derived from a by interchanging the ith and jth rows. Similarly aQ_{ij} is obtained by interchanging the ith and jth columns.

Since any permutation can be effected by a succession of transpositions, the corresponding transformation in the rows (columns) of a matrix can be produced by a succession of transformations of Type II.

*Type III.* This transformation is effected on the rth row (column) by multiplying on the left (right) by R = 1 + (k - 1)e_{rr}; it is used only when it is convenient to make the leading coefficient in some term equal to 1.

The inverses of the matrices used in these transformations are

P^{-1}_{ij} = 1 - θe_{ij}, Q^{-1}_{ij} = Q_{ij}, R^{-1} = 1 + (k^{-1} - 1)e_{rr};

these inverses are elementary transformations. The transverses are also elementary since P'_{ij} = P_{ji} and Q_{ij} and R are symmetric.
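As a numeric sketch (not in the original), the three transformation matrices and the determinant and inverse identities above can be checked directly; here θ(λ) is frozen at a fixed scalar value and indices are 0-based, so P below adds θ times the third row to the first:

```python
import numpy as np

n = 3
I = np.eye(n)

def E(i, j):
    """Matrix unit e_ij: 1 in row i, column j, zero elsewhere."""
    m = np.zeros((n, n))
    m[i, j] = 1.0
    return m

theta, k = 5.0, 2.0          # theta stands in for theta(lambda) at a fixed value
P = I + theta * E(0, 2)      # Type I: left-multiplication adds theta * (row 3) to row 1
Q = I - E(0, 0) - E(1, 1) + E(0, 1) + E(1, 0)   # Type II: interchange rows 1 and 2
R = I + (k - 1.0) * E(2, 2)  # Type III: multiply row 3 by k

A = np.arange(1.0, 10.0).reshape(3, 3)   # arbitrary sample matrix

assert np.allclose((P @ A)[0], A[0] + theta * A[2])
assert np.allclose((Q @ A)[[0, 1]], A[[1, 0]])
assert np.allclose((R @ A)[2], k * A[2])

# determinants and inverses as stated in the text
assert np.isclose(np.linalg.det(P), 1.0)
assert np.isclose(np.linalg.det(Q), -1.0)
assert np.allclose(np.linalg.inv(P), I - theta * E(0, 2))
assert np.allclose(np.linalg.inv(Q), Q)
assert np.allclose(np.linalg.inv(R), I + (1.0 / k - 1.0) * E(2, 2))
```

The corresponding column operations are obtained by multiplying on the right instead of the left.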

A matric polynomial b(λ) which is derived from a(λ) by a sequence of elementary transformations is said to be equivalent to a(λ); every such polynomial has the form p(λ)a(λ)q(λ) where p and q are products of elementary transformations. Since the inverse of an elementary transformation is elementary, a(λ) is also equivalent to b(λ). Further, the inverses of p and q are polynomials so that these are what we have already called elementary polynomials; we shall see later that every elementary polynomial can be derived from 1 by a sequence of elementary transformations.

In the following sections we require two lemmas whose proofs are almost immediate.

**Lemma 1.** * The rank of a matrix is not altered by an elementary transformation. *

For if |P| ≠ 0, AP and PA have the same rank as A (§1.10).

**Lemma 2.** * The highest common factor of the coordinates of a matric polynomial is not altered by an elementary transformation.*

This follows immediately from the definition of elementary transformations.

### 3.02 The normal form of a matrix.

The theorem we shall prove in this section is as follows.

**Theorem 1.** * If a(λ) is a matric polynomial of rank r, it can be reduced by elementary transformations to a diagonal matrix*

$(1) \ \ \ \ \ \ \sum \limits_{i=1}^r \alpha_i(\lambda) e_{ii} = \begin{matrix} \alpha_1(\lambda) & & & & & & \\ & \alpha_2(\lambda) & & & & & \\ & & \ddots & & & & \\ & & & \alpha_r(\lambda) & & & \\ & & & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 0 \end{matrix} = P(\lambda) a(\lambda) Q(\lambda),$

*where the coefficient of the highest power of λ in each polynomial α_{i}(λ) is 1, α_{i} is a factor of α_{i+1},...., α_{r} (i = 1, 2,...., r - 1), and P(λ), Q(λ) are elementary polynomials.*

We shall first show that, if the coordinate of a(λ) of minimum degree m, say a_{pq}, is not a factor of every other coordinate, then a(λ) is equivalent to a matrix in which the degree of the coordinate of minimum degree is less than m.

Suppose that a_{pq} is not a factor of a_{pi} for some i; then we may set a_{pi} = βa_{pq} + a'_{pi} where β is integral and a'_{pi} is not 0 and is of lower degree than m. Subtracting β times the qth column from the ith we have an equivalent matrix in which the coordinate (p, i) is a'_{pi}, whose degree is less than m. The same reasoning applies if a_{pq} is not a factor of every coordinate a_{iq} in the qth column.

After a finite number of such steps we arrive at a matrix in which a coordinate of minimum degree, say k_{pq}, is a factor of all the coordinates which lie in the same row or column, but is possibly not a factor of some other coordinate k_{ij}. When this is so, let k_{pj} = βk_{pq}, k_{iq} = γk_{pq} where β and γ are integral. If we now add (1 — β) times the qth column to the jth, (p, j) and (i, j) become respectively

k'_{pj} = k_{pj} + (1 - β)k_{pq} = k_{pq}, k'_{ij} = k_{ij} + (1 - β)k_{iq} = k_{ij} + (1 - β)γk_{pq}.

Here either the degree of k'_{ij} is less than that of k_{pq}, or k'_{pj} has the minimum degree and is not a factor of k'_{ij} which lies in the same column, and hence the minimum degree can be lowered as above.

The process just described can be repeated so long as the coordinate of lowest degree is not a factor of every other coordinate and, since each step lowers the minimum degree, we derive in a finite number of steps a matrix ||b'_{ij}|| which is equivalent to a(λ) and in which the coordinate of minimum degree is in fact a divisor of every other coordinate; and further we may suppose that b'_{11} = a_{1}(λ) is a coordinate of minimum degree and set b'_{1i} = γ_{i}b'_{11}, b'_{j1} = δ_{j}b'_{11}. Subtracting γ_{i} times the first column from the ith and then δ_{j} times the first row from the jth (i, j = 2, 3,...., n) all the coordinates in the first row and column except b'_{11} become 0, and we have an equivalent matrix in the form

$(2) \ \ \ \ \ \ \ \ \begin{matrix} \alpha_1(\lambda) & 0 & 0 & ... & 0 \\ 0 & b_{22} & b_{23} & ... & b_{2n} \\ 0 & b_{32} & b_{33} & ... & b_{3n} \\ . & . & . & ... & . \\ 0 & b_{n2} & b_{n3} & ... & b_{nn} \end{matrix}$

in which α_{1} is a factor of every b_{ij}. The coefficient of the highest power of λ in α_{1} may be made 1 by a transformation of type III.

The theorem now follows readily by induction. For, assuming it is true for matrices of order n — 1, the matrix of this order formed by the b's in (2) can be reduced to the diagonal matrix

$\begin{matrix} \alpha_2(\lambda) & & & & & & & & & & \\ & \alpha_3(\lambda) & & & & & & & & & \\ & & . & & & & & & & & \\ & & & . & & & & & & & \\ & & & & . & & & & & & \\ & & & & & \alpha_s(\lambda) & & & & & \\ & & & & & & 0 & & & & \\ & & & & & & & . & & & \\ & & & & & & & & . & & \\ & & & & & & & & & . & \\ & & & & & & & & & & 0 \end{matrix}$

where the α's satisfy the conditions of the theorem and each has α_{1} as a factor (§3.01, Lemma 2). Moreover, the elementary transformations by which this reduction is carried out correspond to transformations affecting the last n — 1 rows and columns alone in (2) and, because of the zeros in the first row and column, these transformations when applied to (2) do not affect its first row and column; also, since elementary transformations do not affect the rank (§3.01, Lemma 1), s equals r and a(λ) has therefore been reduced to the form required by the theorem.

The theorem is clearly true for matrices of order 1 and hence is true for any order.

*Corollary.* A matric polynomial whose determinant is independent of λ and is not 0, that is, an elementary polynomial, can be derived from 1 by a finite number of elementary transformations.

The polynomials α_{i} are called the *invariant factors* of a(λ).

### 3.03 Determinantal and invariant factors.

The *determinantal factor* of the sth order, D_{s}, of a matric polynomial a(λ) is defined as the highest common factor of all minors of order s, the coefficient of the highest power of λ being taken as 1. An elementary transformation of type I either leaves a given minor unaltered or changes it into the sum of that minor and a multiple of another of the same order, and a transformation of type II simply permutes the minors of a given order among themselves, while one of type III merely multiplies a minor by a constant different from 0. Hence equivalent matrices have the same determinantal factors. Bearing this in mind we see immediately from the form of (1) that the determinantal factors of a(λ) are given by

D_{s} = α_{1}α_{2} ···· α_{s} (s = 1, 2,...., r), D_{s} = 0 (s > r),

so that

α_{s} = D_{s}/D_{s-1} (s = 1, 2,...., r), where D_{0} = 1.

The invariant factors are therefore known when the determinantal factors are given, and vice versa.
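These relations can be checked mechanically. The sketch below (my own illustration, using sympy) computes D_s as the monic highest common factor of all s-rowed minors of λ — b for a sample 3 × 3 constant matrix b, then recovers the invariant factors as the quotients D_s/D_{s−1}:

```python
from functools import reduce
from itertools import combinations
import sympy as sp

lam = sp.symbols('lambda')

b = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])
a = lam * sp.eye(3) - b          # the matric polynomial lambda - b

def det_factor(m, s):
    """D_s: monic highest common factor of all minors of order s
    (assumes m is non-singular, so no D_s vanishes)."""
    n = m.rows
    minors = [m[list(rows), list(cols)].det()
              for rows in combinations(range(n), s)
              for cols in combinations(range(n), s)]
    d = reduce(sp.gcd, minors)
    return sp.Poly(d, lam).monic().as_expr()

D = [sp.Integer(1)] + [det_factor(a, s) for s in (1, 2, 3)]
alphas = [sp.factor(sp.cancel(D[s] / D[s - 1])) for s in (1, 2, 3)]
# for this b: alpha_1 = 1, alpha_2 = lambda - 2, alpha_3 = (lambda - 2)**2
```

Each α_s divides the next, as Theorem 1 requires.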

The definitions of this and the preceding sections have all been made relative to the fundamental basis. But we have seen in §1.08 that, if a_{1} is the matrix with the same array of coordinates as a but relative to another basis, then there exists a non-singular constant matrix b such that a = b^{-1}a_{1}b, so that a and a_{1} are equivalent matrices. In terms of the new basis a_{1} has the same invariant factors as a does in terms of the old and a, being equivalent to a_{1}, has therefore the same invariant factors in terms of the new basis as it has in the old. Hence the invariant and determinantal factors of a matric polynomial are independent of the (constant) basis in terms of which its coordinates are expressed.

The results of this section may be summarized as follows.

**Theorem 2.** * Two matric polynomials are equivalent if, and only if, they have the same invariant factors.*

### 3.04 Non-singular linear polynomials.

In the case of linear polynomials Theorem 2 can be made more precise as follows.

**Theorem 3.** * If aλ + b and cλ + d are non-singular linear polynomials which have the same invariant factors, and if |c| ≠ 0, there exist non-singular constant matrices p and q such that
p(aλ + b)q = cλ + d. *

We have seen in Theorem 2 that there exist elementary polynomials P(λ), Q(λ) such that

(3) cλ + d = P(λ)(aλ + b)Q(λ).

Since |c| ≠ 0, we can employ the division transformation to find matric polynomials p_{1}, q_{1} and constant matrices p, q such that

P(λ) = (cλ + d)p_{1} + p, Q(λ) = q_{1}(cλ + d) + q.

Using this in (3) we have

(4) cλ + d = p(aλ + b)q + (cλ + d)p_{1}(aλ + b)Q + P(aλ + b)q_{1}(cλ + d) - (cλ + d)p_{1}(aλ + b)q_{1}(cλ + d)

and, since from (3)

(aλ + b)Q = P^{-1}(cλ + d), P(aλ + b) = (cλ + d)Q^{-1},

we may write in place of (4)

(5) p(aλ + b)q = [1 - (cλ + d)(p_{1}P^{-1} + Q^{-1}q_{1} - p_{1}(aλ + b)q_{1})](cλ + d)

= [1 - (cλ + d)R](cλ + d)

where R = p_{1}P^{-1} + Q^{-1}q_{1} - p_{1}(aλ + b)q_{1}, which is integral in λ since P and Q are elementary. If R ≠ 0, then, since |c| ≠ 0, the degree of the right side of (5) is at least 2, whereas the degree of the left side is only 1; hence R = 0 so that (5) gives p(aλ + b)q = cλ + d. Since cλ + d is not singular, neither p nor q can be singular, and hence the theorem is proved.


If |c| = 0 but |cλ + d| ≠ 0, there exist values λ_{1} ≠ 0, μ_{1} such that |cλ_{1} + dμ_{1}| ≠ 0 and, if we make the transformation

(6) λ = λ_{1}α, μ = μ_{1}α + β,

aλ + bμ, cλ + dμ become a_{1}α + b_{1}β, c_{1}α + d_{1}β where a_{1} = aλ_{1} + bμ_{1}, c_{1} = cλ_{1} + dμ_{1}, and therefore |c_{1}| ≠ 0. Further, when aλ + bμ and cλ + dμ have the same invariant factors, this is also true of a_{1}α + b_{1}β and c_{1}α + d_{1}β. Since |c_{1}| ≠ 0, the proof of Theorem 3 is applicable, so that there are constant non-singular matrices p, q for which p(a_{1}α + b_{1}β)q = c_{1}α + d_{1}β, and on reversing the substitution (6) we have

p(aλ + bμ)q = cλ + dμ.

Theorem 3 can therefore be extended as follows.

**Theorem 4.** * If the non-singular polynomials aλ + bμ, cλ + dμ have the same invariant factors, there exist non-singular constant matrices p, q such that p(aλ + bμ)q = cλ + dμ.*

An important particular case of Theorem 3 arises when the polynomials have the form λ — b, λ — d. For if p(λ — b)q = λ — d, on equating coefficients we have pq = 1, pbq = d; hence b = p^{-1}dp, that is, b and d are similar. Conversely, if b and d are similar, then λ — b and λ — d are equivalent, and hence we have the following theorem.

**Theorem 5.** * Two constant matrices b, d are similar if, and only if, λ — b and λ — d have the same invariant factors.*
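A quick numeric illustration (mine, with an arbitrarily chosen p and d): if b = p^{-1}dp, then p(λ — b)q = λ — d with q = p^{-1}, which is exactly the coefficient comparison pq = 1, pbq = d made above:

```python
import numpy as np

rng = np.random.default_rng(2)

d = np.array([[1.0, 1.0],
              [0.0, 3.0]])
p = rng.standard_normal((2, 2)) + 2.0 * np.eye(2)   # generically non-singular
b = np.linalg.inv(p) @ d @ p                        # b is similar to d by construction

q = np.linalg.inv(p)
# p(lambda - b)q = lambda*pq - pbq, so equating coefficients of lambda:
assert np.allclose(p @ q, np.eye(2))
assert np.allclose(p @ b @ q, d)
```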

### 3.05 Elementary divisors.

If D = |aλ + b| is not identically zero and if λ_{1}, λ_{2},...., λ_{s} are its distinct roots, say

D = (λ - λ_{1})^{ν1}(λ - λ_{2})^{ν2}.....(λ - λ_{s})^{νs},

then the invariant factors of aλ + b, being factors of D, have the form

α_{1} = (λ - λ_{1})^{ν11}(λ - λ_{2})^{ν12}.....(λ - λ_{s})^{ν1s}

α_{2} = (λ - λ_{1})^{ν21}(λ - λ_{2})^{ν22}.....(λ - λ_{s})^{ν2s}

...................................................................................

α_{i} = (λ - λ_{1})^{νi1}(λ - λ_{2})^{νi2}.....(λ - λ_{s})^{νis}

...................................................................................

α_{n} = (λ - λ_{1})^{νn1}(λ - λ_{2})^{νn2}.....(λ - λ_{s})^{νns}

where $\sum \limits_{j=1}^{n} v_{ji} = v_i$ and, since α_{j} is a factor of α_{j+1},

(8) ν_{1i} ≤ ν_{2i} ≤ ..... ≤ ν_{ni} (i = 1, 2,....., s).

Such of the factors (λ — λ_{j})^{νij} as are not constants, that is, those for which ν_{ij} > 0, are called the *elementary divisors* of aλ + b. The elementary divisors of λ — b are also called the elementary divisors of b. When all the exponents ν_{ij} which are not 0 equal 1, b is said to have *simple* elementary divisors.

For some purposes the degrees of the elementary divisors are of more importance than the divisors themselves and, when this is the case, they are indicated by writing

(9) [(ν_{n1}, ν_{n-1,1},...., ν_{11}),(ν_{n2}, ν_{n-1,2},...., ν_{12}).....]

where exponents belonging to the same linear factor are in the same parenthesis, zero exponents being omitted; (9) is sometimes called the characteristic of aλ + b. If a root, say λ_{1}, is zero, it is convenient to indicate this by writing ν^{0}_{i1} in place of ν_{i1}.

The maximum degree of |aλ + b| is n and therefore $\sum \limits_{i,j} v_{ij} \leq n$ where the equality sign holds only when |a| ≠ 0.

The modifications necessary when the homogeneous polynomial aλ + bμ is taken in place of aλ + b are obvious and are left to the reader.

### 3.06 Matrices with given elementary divisors.

The direct investigation of the form of a matrix with given elementary divisors is somewhat tedious. It can be carried out in a variety of ways; but, since the form once found is easily verified, we shall here state this form and give the verification, merely saying in passing that it is suggested by the results of §2.07 together with a study of a matrix whose reduced characteristic function is (λ — λ_{1})^{ν}.

**Theorem 6.** * If λ_{1}, λ_{2},...., λ_{s} are any constants, not necessarily all different, and ν_{1}, ν_{2},...., ν_{s} are positive integers whose sum is n, and if a_{i} is the array of ν_{i} rows and columns given by*

(10) $\begin{matrix} \lambda_i & 1 & 0 & ... & 0 & 0 \\ 0 & \lambda_i & 1 & ... & 0 & 0 \\ . & . & . & ... & . & . \\ . & . & . & ... & . & . \\ 0 & 0 & 0 & ... & \lambda_i & 1 \\ 0 & 0 & 0 & ... & 0 & \lambda_i \end{matrix}$

where each coordinate on the main diagonal equals λ_{i}, those on the parallel on its right are 1, and the remaining ones are 0, and if a is the matrix of n rows and columns given by

$(11) \ \ \ \ \ \ \ a = \begin{Vmatrix} a_1 & & & & & \\ & a_2 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & a_s \end{Vmatrix}$

composed of blocks of terms defined by (10) arranged so that the main diagonal of each lies on the main diagonal of a, the other coordinates being 0, then λ — a has the elementary divisors

(12) (λ - λ_{1})^{ν1}, (λ - λ_{2})^{ν2}, ..., (λ - λ_{s})^{νs}
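The construction in Theorem 6 is mechanical enough to script. The sketch below (an illustration, not from the text) assembles the matrix a of (11) from a list of pairs (λ_i, ν_i):

```python
import numpy as np

def block(lam_i, nu_i):
    """The nu_i-by-nu_i array (10): lam_i on the diagonal, 1's just above it."""
    return lam_i * np.eye(nu_i) + np.eye(nu_i, k=1)

def canonical(divisors):
    """Assemble the block-diagonal matrix a of (11) from pairs (lam_i, nu_i)."""
    blocks = [block(l, v) for l, v in divisors]
    n = sum(b.shape[0] for b in blocks)
    a = np.zeros((n, n))
    pos = 0
    for b in blocks:
        v = b.shape[0]
        a[pos:pos + v, pos:pos + v] = b
        pos += v
    return a

# elementary divisors (lambda-2)^2, (lambda-2), (lambda-3)
a = canonical([(2.0, 2), (2.0, 1), (3.0, 1)])

# the nilpotent part of the lambda = 2 blocks has index 2:
N = a[:3, :3] - 2.0 * np.eye(3)
assert np.linalg.matrix_rank(N) == 1
assert np.allclose(np.linalg.matrix_power(N, 2), 0)
```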

In addition to using a_{i} to denote the block given in (10) we shall also use it for the matrix having this block in the position indicated in (11) and zeros elsewhere. In the same way, if ƒ_{i} is a block with ν_{i} rows and columns with 1's in the main diagonal and zeros elsewhere, we may also use ƒ_{i} for the corresponding matrix of order n. We can then write

λ - a = ∑(λƒ_{i} - a_{i}), ƒ_{i}a = a_{i} = aƒ_{i}, ∑ƒ_{i} = 1.

The block of terms corresponding to λƒ_{i} — a_{i} has then the form

$(13) \ \ \ \ \ \ \ \begin{matrix} \lambda - \lambda_i & -1 & & & \\ & \lambda - \lambda_i & -1 & & \\ & & \ddots & \ddots & \\ & & & \lambda - \lambda_i & -1 \\ & & & & \lambda - \lambda_i \end{matrix} \ \ (v_i\ rows\ and\ columns)$

where only the non-zero terms are indicated. The determinant of these ν_{i} rows and columns is (λ — λ_{i})^{νi} and this determinant has a first minor equal to ±1; the invariant factors of λƒ_{i} — a_{i}, regarded as a matrix of order ν_{i}, are therefore 1, 1,....., 1, (λ — λ_{i})^{νi} and hence it can be reduced by elementary transformations to the diagonal form

$\begin{matrix} (\lambda - \lambda_i)^{v_i} & & & & & \\ & 1 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & 1 \end{matrix}$

If we apply the same elementary transformations to the corresponding rows and columns of λ — a, the effect is the same as regards the block of terms λƒ_{i} — a_{i} (corresponding to a_{i} in (11)) since all the other coordinates in the rows and columns which contain elements of this block are 0; moreover these transformations do not affect the remaining blocks λƒ_{j}- — a_{j}- (j ≠ i) nor any 0 coordinate. Carrying out this process for i = 1, 2,...., s and permuting rows and columns, if necessary, we arrive at the form

$\begin{matrix} (\lambda - \lambda_1)^{v_1} & & & & & & & & & & \\ & (\lambda - \lambda_2)^{v_2} & & & & & & & & & \\ & & . & & & & & & & & \\ & & & . & & & & & & & \\ & & & & . & & & & & & \\ & & & & & (\lambda - \lambda_s)^{v_s} & & & & & \\ & & & & & & 1 & & & & \\ & & & & & & & . & & & \\ & & & & & & & & . & & \\ & & & & & & & & & . & \\ & & & & & & & & & & 1 \end{matrix}$

Suppose now that the notation is so arranged that

λ_{1} = λ_{2} = ..... = λ_{p} = α, ν_{1} ≥ ν_{2} ≥ ..... ≥ ν_{p},

but λ_{i} ≠ α for i > p. The nth determinantal factor D_{n} then contains (λ — α) to the power $\sum \limits_1^p v_i$ exactly. Each minor of order n — 1 contains at least p — 1 of the factors

(14) (λ - α)^{ν1}, (λ - α)^{ν2},....., (λ - α)^{νp}

and in one of them the highest power (λ — α)^{ν1} is lacking; hence D_{n-1} contains (λ — α) to exactly the power $\sum \limits_2^p v_i$ and hence the nth invariant factor α_{n} contains it to exactly the ν_{1}th power. Similarly the minors of order n — 2 each contain at least p — 2 of the factors (14) and one lacks the two factors of highest degree; hence (λ — α) is contained in D_{n-2} to exactly the power $\sum \limits_3^p v_i$ and in α_{n-1} to the power ν_{2}. Continuing in this way we see that (14) gives the elementary divisors of a which are powers of (λ — α) and, treating the other roots in the same way, we see that the complete list of elementary divisors is given by (12) as required by the theorem.

### 3.07

If A is a matrix with the same elementary divisors as a, it follows from Theorem 5 that there is a matrix P such that A = PaP^{-1} and hence, if we choose in place of the fundamental basis (e_{1}, e_{2},....., e_{n}) the basis (Pe_{1}, Pe_{2},...., Pe_{n}), it follows from Theorem 6 of chapter 1 that (11) gives the form of A relative to the new basis. This form is called the canonical form of A. It follows immediately from this that

$(15) \ \ \ \ \ \ \ \ P^{-1} A^k P = \begin{Vmatrix} a_1^k & & & & & \\ & a_2^k & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & a_s^k \end{Vmatrix}$

where a^{k}_{i} is the block of terms derived by forming the kth power of a_{i} regarded as a matrix of order ν_{i}.
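Since a_i = λ_i·1 + N, where N is the superdiagonal block of 1's and the two summands commute, a_i^k expands binomially: the jth superdiagonal of a_i^k carries C(k, j)λ_i^{k−j}. A small numeric check of this (mine, with hypothetical values λ_i = 2, ν_i = 4, k = 5):

```python
import numpy as np
from math import comb

lam_i, v, k = 2.0, 4, 5
N = np.eye(v, k=1)                       # nilpotent part of the block (10)
a_i = lam_i * np.eye(v) + N

# binomial expansion: a_i^k = sum_j C(k, j) lam_i^(k-j) N^j,
# where N^j has 1's on the j-th superdiagonal and N^v = 0
expected = sum(comb(k, j) * lam_i**(k - j) * np.linalg.matrix_power(N, j)
               for j in range(v))
assert np.allclose(np.linalg.matrix_power(a_i, k), expected)
```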

Since D_{n} equals |λ — a|, it is the characteristic function of a (or A) and, since D_{n-1} is the highest common factor of the first minors, it follows from Theorem 3 of chapter 2 that α_{n} is the reduced characteristic function.

If we add ƒ's together in groups, each group consisting of all the ƒ's that correspond to the same value of λ_{i}, we get a set of idempotent matrices, say φ_{1}, φ_{2},...., φ_{r}, corresponding to the distinct roots of a, say α_{1}, α_{2},...., α_{r}. These are the principal idempotent elements of a; for (i) aφ_{i} = φ_{i}a, (ii) (a — α_{i})φ_{i} is nilpotent, (iii) ∑φ_{i} = ∑ƒ_{i} = 1 and φ_{i}φ_{j} = 0 (i ≠ j) so that the conditions of §2.11 are satisfied.

When the same root α_{i} occurs in several elementary divisors, the corresponding ƒ's are called *partial idempotent elements* of a; they are not unique as is seen immediately by taking a = 1.

If α is one of the roots of A, the form of A — α is sometimes important. Suppose that λ_{1} = λ_{2} = ..... = λ_{p} = α, λ_{i} ≠ α (i > p) and set

b_{i} = a_{i} — αƒ_{i}

the corresponding array in the ith block of a — α (cf. (10), (11)) being

$(16) \ \ \ \ \ \ \ \ \ \ \begin{matrix} \lambda_i - \alpha & 1 & & & \\ & \lambda_i - \alpha & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda_i - \alpha & 1 \\ & & & & \lambda_i - \alpha \end{matrix}$

In the case of the first p blocks λ_{i} — α = 0 and the corresponding b_{1}, b_{2},...., b_{p} are nilpotent, the index of b_{i} being ν_{i} and, assuming ν_{1} ≥ ν_{2} ≥ .... ≥ ν_{p} as before, (A — α)^{k} has the form

$P^{-1} (A - \alpha)^k P = \begin{Vmatrix} b_1^k & & & & & \\ & b_2^k & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & b_s^k \end{Vmatrix}$

or, when k ≥ ν_{1},

$(17) \ \ \ \ \ \ \ P^{-1} (A - \alpha)^k P = \begin{Vmatrix} 0 & & & & & & & & & \\ & . & & & & & & & & \\ & & . & & & & & & & \\ & & & . & & & & & & \\ & & & & 0 & & & & & \\ & & & & & b_{p+1}^k & & & & \\ & & & & & & . & & & \\ & & & & & & & . & & \\ & & & & & & & & . & \\ & & & & & & & & & b_s^k \end{Vmatrix} $

Since none of the diagonal coordinates of b_{p+1},....., b_{s} are 0, the rank of (A — α)^{k}, when k ≥ ν_{1}, is exactly n — $\sum \limits_1^p v_i = \sum \limits_{p+1}^s v_i$ and the nullspace of (A — α)^{k} is then the same as that of (A — α)^{ν1}. Hence, if there exists a vector z such that (A — α)^{k}z = 0 but (A — α)^{k-1} z ≠ 0, then (i) k ≤ ν_{1}, (ii) z lies in the nullspace of (A — α)^{ν1}.
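The rank statement can be seen numerically. In this sketch (mine) α = 2 occurs in blocks with ν = 2 and ν = 1, and λ₃ = 3 in one block of order 1, so the rank of (A — α)^k falls until k = ν₁ = 2 and then stays at ∑_{p+1}^s ν_i = 1:

```python
import numpy as np

# canonical matrix (11): blocks (2; nu=2), (2; nu=1), (3; nu=1)
a = np.array([[2., 1., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 3.]])
alpha = 2.0

ranks = [np.linalg.matrix_rank(np.linalg.matrix_power(a - alpha * np.eye(4), k))
         for k in range(5)]
# ranks == [4, 2, 1, 1, 1]: stabilizes at k = nu_1 = 2 with value 1
```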

### 3.08 Invariant vectors.

If A is a matrix with the elementary divisors given in the statement of Theorem 6, then λ — A is equivalent to λ — a and by Theorem 5 there is a non-singular matrix P such that A = PaP^{-1}. If we denote the unit vectors corresponding to the rows and columns of a_{i} in (10) by e^{i}_{1}, e^{i}_{2},...., e^{i}_{νi} and set

$(18) \ \ \ \ \ x_j^i = \left\{ \begin{array}{l l} P e_j^i & (j = 1, 2, ... , v_i; \ i = 1, 2, ... , s) \\ 0 & (j < 1 \ or \ > v_i; \ or \ i < 1 \ or > s)\\ \end{array} \right.$

then

ae^{i}_{1} = λ_{i}e^{i}_{1}, ae^{i}_{2} = λ_{i}e^{i}_{2} + e^{i}_{1},....., ae^{i}_{νi} = λ_{i}e^{i}_{νi} + e^{i}_{νi-1}

and hence

(19) Ax^{i}_{j} = λ_{i}x^{i}_{j} + x^{i}_{j-1}, (j = 1, 2,...., ν_{i}; i = 1, 2,....., s).

The vectors x^{i}_{j} are called a set of *invariant vectors* of A.
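Relation (19) is easy to confirm for a single block: with A = PaP^{-1} the columns of P are the invariant vectors x_j. A sketch (mine, with λ_i = 2, ν_i = 3 and an arbitrary invertible P):

```python
import numpy as np

rng = np.random.default_rng(0)

# a is one block (10) with lambda_i = 2, nu_i = 3
a = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [0., 0., 2.]])
P = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # generically invertible
A = P @ a @ np.linalg.inv(P)

x = [P[:, j] for j in range(3)]    # x_j = P e_j, the invariant vectors
lam = 2.0

# (19): A x_1 = lam x_1, and A x_j = lam x_j + x_{j-1} for j > 1
assert np.allclose(A @ x[0], lam * x[0])
for j in (1, 2):
    assert np.allclose(A @ x[j], lam * x[j] + x[j - 1])
```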

The matrix A can be expressed in terms of its invariant vectors as follows. We have from (10)

$a_i = \sum \limits_j (\lambda_i e_j^i + e_{j-1}^i ) S e_j^i = \sum \limits_j e_j^i S (\lambda_i e_j^i + e_{j +1}^i )$

and hence, if

(20) y^{i}_{j} = (P')^{-1}e^{i}_{j} = (PP')^{-1}x^{i}_{j},

then

$(21) \ \ \ \ \ \ \ \ \ \ A = \sum \limits_{i,j} (\lambda_i x_j^i + x_{j-1}^i ) S y_j^i = \sum \limits_{i,j} x_j^i S (\lambda_i y_j^i + y_{j +1}^i )$

where it should be noted that the y's form a system reciprocal to the x's and that each of these systems forms a basis of the vector space since |P| ≠ 0.

If we form the transverse of A, we have from (21)

$(22) \ \ \ \ \ \ \ \ \ \ A' = \sum \limits_{i,j} (\lambda_i y_j^i + y_{j +1}^i ) S x_j^i$

so that the invariant vectors of A' are obtained by forming the system reciprocal to the x's and inverting the order in each group of vectors corresponding to a given elementary divisor; thus

A'y^{i}_{νi} = λ_{i}y^{i}_{νi}, A'y^{i}_{νi-1} = λ_{i}y^{i}_{νi-1} + y^{i}_{νi},...., A'y^{i}_{1} = λ_{i}y^{i}_{1} + y^{i}_{2}.

A matrix A and its transverse clearly have the same elementary divisors and are therefore similar. The matrix which transforms A into A' can be given explicitly as follows. Let q_{i} be the symmetric array

$\begin{matrix} 0 & 0 & ... & 0 & 1 \\ 0 & 0 & ... & 1 & 0 \\ . & . & ... & . & . \\ . & . & ... & . & . \\ 0 & 1 & ... & 0 & 0 \\ 1 & 0 & ... & 0 & 0 \end{matrix} (v_i \ rows \ and \ columns)$

It is easily seen that q_{i}a_{i} = a'_{i}q_{i} and hence, if Q is the matrix

$Q = \begin{Vmatrix} q_1 & & & & & \\ & q_2 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & q_s \end{Vmatrix}$

we have Qa = a'Q, and a short calculation gives A' = R^{-1}AR where R is the symmetric matrix

(23) R = PQ^{-1}P' = PQP'.

If the elementary divisors of A are simple, then Q = 1 and R = PP'.
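The identities q_i a_i = a'_i q_i and A' = R^{-1}AR with R = PQP' can be checked numerically; the sketch below (mine) uses a single block of order 4:

```python
import numpy as np

v = 4
a_i = 2.0 * np.eye(v) + np.eye(v, k=1)   # block (10) with lambda_i = 2
q_i = np.fliplr(np.eye(v))               # symmetric array of 1's on the anti-diagonal

assert np.allclose(q_i @ a_i, a_i.T @ q_i)   # q_i a_i = a_i' q_i
assert np.allclose(q_i @ q_i, np.eye(v))     # q_i is its own inverse

rng = np.random.default_rng(1)
P = rng.standard_normal((v, v)) + 3.0 * np.eye(v)   # generically non-singular
A = P @ a_i @ np.linalg.inv(P)
R = P @ q_i @ P.T                        # (23): R = P Q P'
assert np.allclose(R, R.T)               # R is symmetric
assert np.allclose(A.T, np.linalg.inv(R) @ A @ R)   # A' = R^{-1} A R
```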

If the roots λ_{i} of the elementary divisors (12) are all different, the nullity of (A — λ_{i}) is 1, and hence x^{i}_{1} is unique up to a scalar multiplier. But the remaining x^{i}_{j} are not unique. In fact, if the x's denote one choice of the invariant vectors, we may take in place of x^{i}_{j}

z^{i}_{j} = k^{i}_{1}x^{i}_{j} + k^{i}_{2}x^{i}_{j-1} + .... + k^{i}_{j}x^{i}_{1} (j = 1, 2,...., ν_{i})

where the k's are any constant scalars subject to the condition k^{i}_{1} ≠ 0. Suppose now that λ_{1} = λ_{2} = ..... = λ_{p} = α, λ_{i} ≠ α (i > p) and ν_{1} ≥ ν_{2} ≥ .... ≥ ν_{p}. A set of vectors z_{1}, z_{2},....., z_{k} is a *chain* of invariant vectors *belonging to the exponent k* if

(24) z_{i} = (A - α)^{k-i}z_{k} (i = 1, 2,...,k), (A - α)^{k}z_{k} = 0.

It is also convenient to set z_{i} = 0 for i < 1 or > k. We have already seen that k ≤ ν_{1} and that z_{k} lies in the nullspace of (A — α)^{ν1}; and from (17) it is seen that the nullspace of (A — α)^{ν1} has the basis (x^{i}_{j}; j = 1, 2,..., ν_{i}, i = 1, 2,...., p).

Since z_{k} belongs to the nullspace of (A — α)^{ν1}, we may set

$(25) \ \ \ \ \ \ \ \ \ z_k = \sum \limits_{i=1}^p \sum \limits_{j=1}^{v_i} \xi_{ij} x_j^i$

and therefore by repeated application of (19) with λ_{i} = α

$(26) \ \ \ \ \ \ \ \ \ (A - \alpha)^r z_k = \sum \limits_{i,j} \xi_{ij} x_{j-r}^i$

From this it follows that, in order that (A — α)^{k}z_{k} = 0, only values of j which are less than or equal to k can actually occur in (25), and in order that (A — α)^{k-1}z_{k} ≠ 0 at least one ξ_{ik} must be different from 0; hence

(27) $z_k = \sum \limits_i (\xi_{ik} x_k^i + \xi_{i,k-1} x_{k-1}^i + ...) \\ z_{k-1} = \sum \limits_i (\xi_{ik} x_{k-1}^i + \xi_{i,k-1} x_{k-2}^i + ...) \\ ........................................................................................\\ z_1 = \sum \limits_i \xi_{ik} x_1^i$

Finally, if we impose the restriction that z_{k} does not belong to any chain pertaining to an exponent greater than k, it is necessary and sufficient that k be one of the numbers ν_{1}, ν_{2},...., ν_{p} and that no value of i corresponding to an exponent greater than k occur in (27).

### 3.09

The actual determination of the vectors x^{i}_{j} can be carried out by the processes of §3.02 and §3.04 or alternatively as follows. Suppose that the first s_{1} of the exponents ν_{i} equal n_{1}, the next s_{2} equal n_{2}, and so on, and finally the last s_{q} equal n_{q}. Let R_{1} be the nullspace of (A — α)^{n1} and R'_{1} the nullspace of (A — α)^{n1-1}; then R_{1} contains R'_{1}. If M_{1} is a space complementary to R'_{1} in R_{1}, then for any vector x in M_{1} we have (A — α)^{r}x = 0 only when r ≥ n_{1}. Also, if x_{1}, x_{2},....., x_{m1} is a basis of M_{1}, the vectors

(28) (A - α)^{r}x_{i} (r = 0,1,....,n_{1} - 1)

are linearly independent; for, if

$\sum \limits_{r=0}^{n_1 - 1} \sum \limits_i \xi_{ir} (A - \alpha)^r x_i = 0,$

some ξ_{ir} being different from 0; then, if s is the least value of r for which some ξ_{is} ≠ 0, multiplying by (A — α)^{n1-s-1} we have

$(A - \alpha)^{n_1 - 1} \sum \limits_i \xi_{is} x_i = 0$

which is only possible if every ξ_{is} = 0, since x_{1}, x_{2},...., x_{m1} form a basis of M_{1} and (A — α)^{n1-1}x = 0 for no vector x ≠ 0 of M_{1}. The space defined by (28) clearly lies in R_{1}; we shall denote it by ℘_{1}. If we set R_{1} = R_{2} + ℘_{1} where R_{2} is complementary to ℘_{1} in R_{1}, then R_{2} contains all vectors which are members of sets belonging to the exponents n_{2}, n_{3},.... but not lying in sets with the exponent n_{1}.

We now set R_{2} = R'_{2} + M_{2}, where R'_{2} is the subspace of vectors x in R_{2} such that (A — α)^{n2-1}x = 0. As before the elements of M_{2} generate sets with exponent n_{2} but are not members of sets with higher exponents; and by a repetition of this process we can determine step by step the sets of invariant vectors corresponding to each exponent n_{i}.