Invariant Factors and Elementary Divisors
3.01 Elementary transformations.
By an elementary transformation of a matric polynomial a(λ) = ||aij|| is meant one of the following operations on the rows or columns.
Type I. The operation of adding to a row (column) a different row (column) multiplied by a scalar polynomial θ(λ).
Type II. The operation of interchanging two rows (columns).
Type III. The operation of multiplying a row (column) by a constant k ≠ 0.
These transformations can be performed algebraically as follows.
Type I. Let
Pij = 1 + θ(λ)eij (i ≠ j),
θ(λ) being a scalar polynomial; then |Pij| = 1 and
$P_{ij} a = \sum \limits_{p,q} a_{pq} e_{pq} + \theta \sum \limits_q a_{jq} e_{iq}$
which is the matrix derived from a(λ) by adding θ times the jth row to the ith. The corresponding operation on the columns is equivalent to forming the product aPji.
Type II. Let Qij be the matrix
Qij = 1 - eii - ejj + eij + eji (i ≠ j);
that is, Qij is the matrix derived from the identity matrix by inserting 1 in place of 0 in the coefficients of eij and eji and 0 in place of 1 in the coefficients of eii and ejj; then |Qij| = -1 and
$Q_{ij} a = \sum \limits_{p,q} a_{pq} e_{pq} - \sum \limits_q a_{iq} e_{iq} - \sum \limits_q a_{jq} e_{jq} + \sum \limits_q a_{jq} e_{iq} + \sum \limits_q a_{iq} e_{jq}$
that is, Qija is derived from a by interchanging the ith and jth rows. Similarly aQij is obtained by interchanging the ith and jth columns.
Since any permutation can be effected by a succession of transpositions, the corresponding transformation in the rows (columns) of a matrix can be produced by a succession of transformations of Type II.
Type III. This transformation is effected on the rth row (column) by multiplying on the left (right) by R = 1 + (k - 1)err; it is used only when it is convenient to make the leading coefficient in some term equal to 1.
The inverses of the matrices used in these transformations are
$P_{ij}^{-1} = 1 - \theta e_{ij}, \qquad Q_{ij}^{-1} = Q_{ij}, \qquad R^{-1} = 1 + (k^{-1} - 1)e_{rr};$
these inverses are elementary transformations. The transverses are also elementary since P'ij = Pji and Qij and R are symmetric.
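The following SymPy sketch builds the three matrices just described from the matrix units e_ij and checks their action; the dimension (n = 3) and the particular indices are illustrative choices of ours, not fixed by the text.

```python
# Elementary matrices of types I-III built from matrix units e_ij.
import sympy as sp

theta, k = sp.symbols('theta k')
n = 3

def e(i, j):
    """Matrix unit e_ij: 1 in position (i, j), zeros elsewhere."""
    m = sp.zeros(n)
    m[i, j] = 1
    return m

a = sp.Matrix(sp.MatrixSymbol('a', n, n))   # generic entries a[i, j]
P = sp.eye(n) + theta * e(0, 2)             # type I
Q = sp.eye(n) - e(0, 0) - e(1, 1) + e(0, 1) + e(1, 0)   # type II
R = sp.eye(n) + (k - 1) * e(2, 2)           # type III

assert P.det() == 1 and Q.det() == -1 and R.det() == k
print(P * a)    # row 1 plus theta times row 3
print(Q * a)    # rows 1 and 2 interchanged
print(a * R)    # column 3 multiplied by k
```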
A matric polynomial b(λ) which is derived from a(λ) by a sequence of elementary transformations is said to be equivalent to a(λ); every such polynomial has the form p(λ)a(λ)q(λ) where p and q are products of elementary transformations. Since the inverse of an elementary transformation is elementary, a(λ) is also equivalent to b(λ). Further, the inverses of p and q are polynomials so that these are what we have already called elementary polynomials; we shall see later that every elementary polynomial can be derived from 1 by a sequence of elementary transformations.
In the following sections we require two lemmas whose proofs are almost immediate.
Lemma 1. The rank of a matrix is not altered by an elementary transformation.
For if |P| ≠ 0, AP and PA have the same rank as A (§1.10).
Lemma 2. The highest common factor of the coordinates of a matric polynomial is not altered by an elementary transformation.
This follows immediately from the definition of elementary transformations.
3.02 The normal form of a matrix.
The theorem we shall prove in this section is as follows.
Theorem 1. If a(λ) is a matric polynomial of rank r, it can be reduced by elementary transformations to a diagonal matrix
$(1) \ \ \ \ \ \ \sum \limits_{i=1}^{r} \alpha_i(\lambda) e_{ii} = \begin{matrix} \alpha_1(\lambda) & & & & & & & & & & \\ & \alpha_2(\lambda) & & & & & & & & & \\ & & . & & & & & & & & \\ & & & . & & & & & & & \\ & & & & . & & & & & & \\ & & & & & \alpha_r(\lambda) & & & & & \\ & & & & & & 0 & & & & \\ & & & & & & & . & & & \\ & & & & & & & & . & & \\ & & & & & & & & & . & \\ & & & & & & & & & & 0 \end{matrix} = P(\lambda) a(\lambda) Q(\lambda)$,
where the coefficient of the highest power of λ in each polynomial αi(λ) is 1, each αi is a factor of αi+1,...., αr (i = 1, 2,...., r - 1), and P(λ), Q(λ) are elementary polynomials.
We shall first show that, if the coordinate of a(λ) of minimum degree m, say apq, is not a factor of every other coordinate, then a(λ) is equivalent to a matrix in which the degree of the coordinate of minimum degree is less than m.
Suppose that apq is not a factor of api for some i; then we may set api = βapq + a'pi, where β is integral and a'pi is not 0 and is of lower degree than m. Subtracting β times the qth column from the ith, we have an equivalent matrix in which the coordinate (p, i) is a'pi, whose degree is less than m. The same reasoning applies if apq is not a factor of some coordinate aiq in the qth column.
After a finite number of such steps we arrive at a matrix in which a coordinate of minimum degree, say kpq, is a factor of all the coordinates which lie in the same row or column, but is possibly not a factor of some other coordinate kij. When this is so, let kpj = βkpq, kiq = γkpq where β and γ are integral. If we now add (1 — β) times the qth column to the jth, (p, j) and (i, j) become respectively
$k'_{pj} = k_{pj} + (1 - \beta)k_{pq} = k_{pq}, \qquad k'_{ij} = k_{ij} + (1 - \beta)k_{iq} = k_{ij} + (1 - \beta)\gamma k_{pq}.$
Here either the degree of k'ij is less than that of kpq, or k'pj has the minimum degree and is not a factor of k'ij which lies in the same column, and hence the minimum degree can be lowered as above.
The process just described can be repeated so long as the coordinate of lowest degree is not a factor of every other coordinate and, since each step lowers the minimum degree, we derive in a finite number of steps a matrix ||b'ij|| which is equivalent to a(λ) and in which the coordinate of minimum degree is in fact a divisor of every other coordinate; further, we may suppose that b'11 = α1(λ) is a coordinate of minimum degree and set b'1i = γib'11, b'j1 = δjb'11. Subtracting γi times the first column from the ith and then δj times the first row from the jth (i, j = 2, 3,...., n), all the coordinates in the first row and column except b'11 become 0, and we have an equivalent matrix in the form
$(2) \ \ \ \ \ \ \ \ \begin{matrix} \alpha_1(\lambda) & 0 & 0 & ... & 0 \\ 0 & b_{22} & b_{23} & ... & b_{2n} \\ 0 & b_{32} & b_{33} & ... & b_{3n} \\ . & . & . & ... & . \\ 0 & b_{n2} & b_{n3} & ... & b_{nn} \end{matrix}$
in which α1 is a factor of every bij. The coefficient of the highest power of λ in α1 may be made 1 by a transformation of type III.
The theorem now follows readily by induction. For, assuming it is true for matrices of order n — 1, the matrix of this order formed by the b's in (2) can be reduced to the diagonal matrix
$\begin{matrix} \alpha_2(\lambda) & & & & & & & & & & \\ & \alpha_3(\lambda) & & & & & & & & & \\ & & . & & & & & & & & \\ & & & . & & & & & & & \\ & & & & . & & & & & & \\ & & & & & \alpha_s(\lambda) & & & & & \\ & & & & & & 0 & & & & \\ & & & & & & & . & & & \\ & & & & & & & & . & & \\ & & & & & & & & & . & \\ & & & & & & & & & & 0 \end{matrix}$
where the α's satisfy the conditions of the theorem and each has α1 as a factor (§3.01, Lemma 2). Moreover, the elementary transformations by which this reduction is carried out correspond to transformations affecting the last n - 1 rows and columns alone in (2) and, because of the zeros in the first row and column, these transformations when applied to (2) do not affect its first row and column; also, since elementary transformations do not affect the rank (§3.01, Lemma 1), s equals r and a(λ) has therefore been reduced to the form required by the theorem.
The theorem is clearly true for matrices of order 1 and hence is true for any order.
Corollary. A matric polynomial whose determinant is independent of λ and is not 0, that is, an elementary polynomial, can be derived from 1 by the product of a finite number of elementary transformations.
The polynomials αi are called the invariant factors of a(λ).
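The proof of Theorem 1 is constructive, and the reduction can be carried out mechanically. The following sketch (in SymPy; the function names and the example matrix are ours) follows it literally: a coordinate of least degree is brought to the pivot by transformations of Type II, the division transformation clears its row and column (Type I), the k_ij step of the proof is applied when the pivot fails to divide a later coordinate, and the pivot is finally made monic (Type III).

```python
import sympy as sp

lam = sp.symbols('lambda')

def min_entry(M, k):
    """Position of a nonzero coordinate of least degree in M[k:, k:],
    or None if that trailing block is zero."""
    best = None
    for i in range(k, M.rows):
        for j in range(k, M.cols):
            v = sp.expand(M[i, j])
            if v != 0 and (best is None or sp.degree(v, lam) < best[0]):
                best = (sp.degree(v, lam), i, j)
    return best

def normal_form(M):
    """Reduce a matric polynomial to the diagonal form (1) of Theorem 1
    by the elementary transformations of Sec. 3.01."""
    M = sp.Matrix(M)
    for k in range(min(M.rows, M.cols)):
        while True:
            p = min_entry(M, k)
            if p is None:
                return M                          # rank is k; the rest is 0
            M.row_swap(k, p[1]); M.col_swap(k, p[2])           # type II
            for r in range(k + 1, M.rows):                     # type I, rows
                q = sp.div(M[r, k], M[k, k], lam)[0]
                M.row_op(r, lambda v, c: sp.expand(v - q * M[k, c]))
            for c in range(k + 1, M.cols):                     # type I, columns
                q = sp.div(M[k, c], M[k, k], lam)[0]
                M.col_op(c, lambda v, r: sp.expand(v - q * M[r, k]))
            if any(M[r, k] != 0 for r in range(k + 1, M.rows)) or \
               any(M[k, c] != 0 for c in range(k + 1, M.cols)):
                continue                          # minimum degree has dropped
            bad = [i for i in range(k + 1, M.rows) for j in range(k + 1, M.cols)
                   if sp.rem(M[i, j], M[k, k], lam) != 0]
            if not bad:
                break
            M.row_op(k, lambda v, c: sp.expand(v + M[bad[0], c]))  # the k_ij step
        M.row_op(k, lambda v, c: sp.expand(v / sp.LC(M[k, k], lam)))  # type III
    return M

a = sp.Matrix([[lam, 1, 0], [0, lam, 1], [0, 0, lam]])
print(normal_form(a))          # diag(1, 1, lambda**3)
```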
3.03 Determinantal and invariant factors.
The determinantal factor of the sth order, Ds, of a matric polynomial a(λ) is defined as the highest common factor of all minors of order s, the coefficient of the highest power of λ being taken as 1. An elementary transformation of type I either leaves a given minor unaltered or changes it into the sum of that minor and a multiple of another of the same order, and a transformation of type II simply permutes the minors of a given order among themselves, while one of type III merely multiplies a minor by a constant different from 0. Hence equivalent matrices have the same determinantal factors. Bearing this in mind we see immediately from the form of (1) that the determinantal factors of a(λ) are given by
$D_s = \alpha_1 \alpha_2 \cdots \alpha_s \ \ (s = 1, 2, \ldots, r), \qquad D_s = 0 \ \ (s > r),$
so that
$\alpha_s = D_s / D_{s-1} \qquad (s = 1, 2, \ldots, r; \ D_0 = 1).$
The invariant factors are therefore known when the determinantal factors are given, and vice versa.
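As a computation, Ds may be found directly as the highest common factor of the s-rowed minors, and the invariant factors then recovered from αs = Ds/Ds-1; a SymPy sketch follows (the example matrix is ours, and it can be checked against the reduction sketch above).

```python
import itertools
import sympy as sp

lam = sp.symbols('lambda')

def determinantal_factors(M):
    """D_1, ..., D_n, each made monic; D_s = 0 once s exceeds the rank."""
    out = []
    for s in range(1, min(M.rows, M.cols) + 1):
        minors = [M[list(r), list(c)].det()
                  for r in itertools.combinations(range(M.rows), s)
                  for c in itertools.combinations(range(M.cols), s)]
        g = sp.gcd_list(minors)
        out.append(sp.monic(g, lam) if g != 0 else sp.S.Zero)
    return out

a = sp.Matrix([[lam, 1, 0], [0, lam, 1], [0, 0, lam]])
D = determinantal_factors(a)
alphas = [D[0]] + [sp.cancel(D[s] / D[s - 1]) for s in range(1, len(D))]
print(D, alphas)        # [1, 1, lambda**3] and [1, 1, lambda**3]
```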
The definitions of this and the preceding sections have all been made relative to the fundamental basis. But we have seen in §1.08 that, if a1 is the matrix with the same array of coordinates as a but relative to another basis, then there exists a non-singular constant matrix b such that $a = b^{-1} a_1 b$, so that a and a1 are equivalent matrices. In terms of the new basis a1 has the same invariant factors as a has in terms of the old and a, being equivalent to a1, has therefore the same invariant factors in terms of the new basis as it has in the old. Hence the invariant and determinantal factors of a matric polynomial are independent of the (constant) basis in terms of which its coordinates are expressed.
The results of this section may be summarized as follows.
Theorem 2. Two matric polynomials are equivalent if, and only if, they have the same invariant factors.
3.04 Non-singular linear polynomials.
In the case of linear polynomials Theorem 2 can be made more precise as follows.
Theorem 3. If aλ + b and cλ + d are non-singular linear polynomials which have the same invariant factors, and if |c| ≠ 0, there exist non-singular constant matrices p and q such that
p(aλ + b)q = cλ + d.
We have seen in Theorem 2 that there exist elementary polynomials P(λ), Q(λ) such that
(3) cλ + d = P(λ)(aλ + b)Q(λ).
Since |c| ≠ 0, we can employ the division transformation to find matric polynomials p1, q1 and constant matrices p, q such that
P(λ) = (cλ + d)p1 + p, Q(λ) = q1(cλ + d) + q.
Using this in (3) we have
$(4) \ \ \ \ \ \ c\lambda + d = p(a\lambda + b)q + (c\lambda + d)p_1(a\lambda + b)Q + P(a\lambda + b)q_1(c\lambda + d) - (c\lambda + d)p_1(a\lambda + b)q_1(c\lambda + d)$
and, since from (3)
$(a\lambda + b)Q = P^{-1}(c\lambda + d), \qquad P(a\lambda + b) = (c\lambda + d)Q^{-1},$
we may write in place of (4)
$(5) \ \ \ \ \ \ p(a\lambda + b)q = [1 - (c\lambda + d)(p_1 P^{-1} + Q^{-1} q_1 - p_1(a\lambda + b)q_1)](c\lambda + d) = [1 - (c\lambda + d)R](c\lambda + d)$
where $R = p_1 P^{-1} + Q^{-1} q_1 - p_1(a\lambda + b)q_1$, which is integral in λ since P and Q are elementary. If R ≠ 0, then, since |c| ≠ 0, the degree of the right side of (5) is at least 2, whereas the degree of the left side is only 1; hence R = 0, so that (5) gives p(aλ + b)q = cλ + d. Since cλ + d is not singular, neither p nor q can be singular, and hence the theorem is proved.
If |c| = 0 but |cλ + d| ≠ 0, there exist values λ1 ≠ 0, μ1 such that |cλ1 + dμ1| ≠ 0 and, if we make the transformation
(6) λ = λ1α, μ = μ1α + β,
aλ + bμ, cλ + dμ become a1α + b1β, c1α + d1β where a1 = aλ1 + bμ1, c1 = cλ1 + dμ1, and therefore |c1| ≠ 0. Further, when aλ + bμ and cλ + dμ have the same invariant factors, this is also true of a1α + b1β and c1α + d1β. Since |c1| ≠ 0, the proof of Theorem 3 is applicable, so that there are constant non-singular matrices p, q for which p(a1α + b1β)q = c1α + d1β, and on reversing the substitution (6) we have
p(aλ + bμ)q = cλ + dμ.
Theorem 3 can therefore be extended as follows.
Theorem 4. If the non-singular polynomials aλ + bμ, cλ + dμ have the same invariant factors, there exist non-singular constant matrices p, q such that p(aλ + bμ)q = cλ + dμ.
An important particular case of Theorem 3 arises when the polynomials have the form λ — b, λ — d. For if p(λ — b)q = λ — d, on equating coefficients we have pq = 1, pbq = d; hence b = p-1dp, that is, b and d are similar. Conversely, if b and d are similar, then λ — b and λ — d are equivalent, and hence we have the following theorem.
Theorem 5. Two constant matrices b, d are similar if, and only if, λ — b and λ — d have the same invariant factors.
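Theorem 5 gives a practical similarity test: compute the invariant factors of λ — b and λ — d and compare. In the sketch below the helper repeats the Ds/Ds-1 rule of §3.03; the matrices are our own examples. Here b and d (the transverse of b) have the same invariant factors and are similar, while a matrix with the same characteristic function but different invariant factors is not.

```python
import itertools
import sympy as sp

lam = sp.symbols('lambda')

def invariant_factors(m):
    """alpha_s = D_s / D_{s-1} for a non-singular matric polynomial m."""
    n = m.rows
    D = [sp.S.One]
    for s in range(1, n + 1):
        minors = [m[list(r), list(c)].det()
                  for r in itertools.combinations(range(n), s)
                  for c in itertools.combinations(range(n), s)]
        D.append(sp.monic(sp.gcd_list(minors), lam))
    return [sp.factor(sp.cancel(D[s] / D[s - 1])) for s in range(1, n + 1)]

b = sp.Matrix([[2, 1], [0, 2]])
d = sp.Matrix([[2, 0], [1, 2]])     # the transverse of b
c = sp.Matrix([[2, 0], [0, 2]])     # same characteristic function, yet...
print(invariant_factors(lam * sp.eye(2) - b))   # [1, (lambda - 2)**2]
print(invariant_factors(lam * sp.eye(2) - d))   # [1, (lambda - 2)**2]: similar to b
print(invariant_factors(lam * sp.eye(2) - c))   # [lambda - 2, lambda - 2]: not similar
```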
3.05 Elementary divisors.
If D = |aλ + b| is not identically zero and if λ1, λ2,...., λs are its distinct roots, say
$D = (\lambda - \lambda_1)^{\nu_1}(\lambda - \lambda_2)^{\nu_2} \cdots (\lambda - \lambda_s)^{\nu_s},$
then the invariant factors of aλ + b, being factors of D, have the form
$(7) \ \ \ \ \ \ \alpha_i = (\lambda - \lambda_1)^{\nu_{i1}}(\lambda - \lambda_2)^{\nu_{i2}} \cdots (\lambda - \lambda_s)^{\nu_{is}} \qquad (i = 1, 2, \ldots, n),$
where $\sum \limits_{j=1}^{n} \nu_{ji} = \nu_i$ and, since αj is a factor of αj+1,
(8) ν1i ≤ ν2i ≤ ..... ≤ νni (i = 1, 2,....., s).
Such of the factors (λ — λj)νij as are not constants, that is, those for which νij > 0, are called the elementary divisors of aλ + b. The elementary divisors of λ — b are also called the elementary divisors of b. When all the exponents νij which are not 0 equal 1, b is said to have simple elementary divisors.
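Computationally, the elementary divisors are obtained by factoring each invariant factor into its powers of the distinct linear factors; a short sketch follows (the list of invariant factors below is an invented example).

```python
import sympy as sp

lam = sp.symbols('lambda')

def elementary_divisors(alphas):
    """Split each invariant factor into its powers of (lam - root)."""
    divs = []
    for a in alphas:
        for root, mult in sp.roots(sp.Poly(a, lam)).items():
            divs.append((lam - root)**mult)
    return divs

alphas = [sp.S.One, lam - 1, (lam - 1)**2 * (lam + 2)]
print(elementary_divisors(alphas))
# [lambda - 1, (lambda - 1)**2, lambda + 2]
```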
For some purposes the degrees of the elementary divisors are of more importance than the divisors themselves and, when this is the case, they are indicated by writing
$(9) \ \ \ \ \ \ [(\nu_{n1}, \nu_{n-1,1}, \ldots, \nu_{11}), (\nu_{n2}, \nu_{n-1,2}, \ldots, \nu_{12}), \ldots]$
where exponents belonging to the same linear factor are in the same parenthesis, zero exponents being omitted; (9) is sometimes called the characteristic of aλ + b. If a root, say λ1, is zero, it is convenient to indicate this by writing $\nu_{i1}^{0}$ in place of $\nu_{i1}$.
The maximum degree of |aλ + b| is n and therefore $\sum \limits_{i,j} \nu_{ij} \leq n$, where the equality sign holds only when |a| ≠ 0.
The modifications necessary when the homogeneous polynomial aλ + bμ is taken in place of aλ + b are obvious and are left to the reader.
3.06 Matrices with given elementary divisors.
The direct investigation of the form of a matrix with given elementary divisors is somewhat tedious. It can be carried out in a variety of ways; but, since the form once found is easily verified, we shall here state this form and give the verification, merely saying in passing that it is suggested by the results of §2.07 together with a study of a matrix whose reduced characteristic function is (λ — λ1)ν.
Theorem 6. If λ1, λ2,...., λs are any constants, not necessarily all different, and ν1, ν2,...., νs are positive integers whose sum is n, and if ai is the array of νi rows and columns given by
(10) $\begin{matrix} \lambda_i & 1 & 0 & ... & 0 & 0 \\ 0 & \lambda_i & 1 & ... & 0 & 0 \\ . & . & . & ... & . & . \\ . & . & . & ... & . & . \\ 0 & 0 & 0 & ... & \lambda_i & 1 \\ 0 & 0 & 0 & ... & 0 & \lambda_i \end{matrix}$
where each coordinate on the main diagonal equals λi, those on the parallel on its right are 1, and the remaining ones are 0, and if a is the matrix of n rows and columns given by
$(11) \ \ \ \ \ \ \ a = \begin{Vmatrix} a_1 & & & & & \\ & a_2 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & a_s \end{Vmatrix}$
composed of blocks of terms defined by (10) arranged so that the main diagonal of each lies on the main diagonal of a, the other coordinates being 0, then λ — a has the elementary divisors
(12) (λ - λ1)ν1, (λ - λ2)ν2, ..., (λ - λs)νs
In addition to using ai to denote the block given in (10) we shall also use it for the matrix having this block in the position indicated in (11) and zeros elsewhere. In the same way, if ƒi is a block with νi rows and columns with 1's in the main diagonal and zeros elsewhere, we may also use ƒi for the corresponding matrix of order n. We can then write
λ - a = ∑(λƒi - ai), ƒia = ai = aƒi, ∑ƒi = 1.
The block of terms corresponding to λƒi — ai has then the form
$(13) \ \ \ \ \ \ \ \begin{matrix} \lambda - \lambda_i & -1 & & & \\ & \lambda - \lambda_i & -1 & & \\ & & \cdot & \cdot & \\ & & & \cdot & -1 \\ & & & & \lambda - \lambda_i \end{matrix} \ \ (\nu_i\ rows\ and\ columns)$
where only the non-zero terms are indicated. The determinant of these νi rows and columns is (λ — λi)νi and this determinant has a first minor equal to ±1; the invariant factors of λƒi — ai, regarded as a matrix of order νi, are therefore 1, 1,....., 1, (λ — λi)νi and hence it can be reduced by elementary transformations to the diagonal form
$\begin{matrix} (\lambda - \lambda_i)^{v_i} & & & & & \\ & 1 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & 1 \end{matrix}$
If we apply the same elementary transformations to the corresponding rows and columns of λ — a, the effect is the same as regards the block of terms λƒi — ai (corresponding to ai in (11)) since all the other coordinates in the rows and columns which contain elements of this block are 0; moreover these transformations do not affect the remaining blocks λƒj — aj (j ≠ i) nor any 0 coordinate. Carrying out this process for i = 1, 2,...., s and permuting rows and columns, if necessary, we arrive at the form
$\begin{matrix} (\lambda - \lambda_1)^{\nu_1} & & & & & & & & & & \\ & (\lambda - \lambda_2)^{\nu_2} & & & & & & & & & \\ & & . & & & & & & & & \\ & & & . & & & & & & & \\ & & & & . & & & & & & \\ & & & & & (\lambda - \lambda_s)^{\nu_s} & & & & & \\ & & & & & & 1 & & & & \\ & & & & & & & . & & & \\ & & & & & & & & . & & \\ & & & & & & & & & . & \\ & & & & & & & & & & 1 \end{matrix}$
Suppose now that the notation is so arranged that
λ1 = λ2 = ..... = λp = α, ν1 ≥ ν2 ≥ ..... ≥ νp,
but λi ≠ α for i > p. The nth determinantal factor Dn then contains (λ — α) to the power $\sum \limits_1^p \nu_i$ exactly. Each minor of order n — 1 contains at least p — 1 of the factors
(14) (λ - α)ν1, (λ - α)ν2,....., (λ - α)νp
and in one the highest power (λ — α)ν1 is lacking; hence Dn-1 contains (λ — α) to exactly the power $\sum \limits_2^p \nu_i$ and hence the nth invariant factor αn contains it to exactly the ν1th power. Similarly the minors of order n — 2 each contain at least p — 2 of the factors (14) and one lacks the two factors of highest degree; hence (λ — α) is contained in Dn-2 to exactly the power $\sum \limits_3^p \nu_i$ and in αn-1 to the power ν2. Continuing in this way we see that (14) gives the elementary divisors of a which are powers of (λ — α) and, treating the other roots in the same way, we see that the complete list of elementary divisors is given by (12), as required by the theorem.
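Theorem 6 can also be exercised in the opposite direction: given the list of roots and exponents, build the blocks (10), assemble (11), and check the result. A sketch (the block-builder names and the example are ours):

```python
import sympy as sp

lam = sp.symbols('lambda')

def block(lam_i, nu_i):
    """The nu_i-rowed array (10): lam_i on the diagonal, 1 just above it."""
    return sp.Matrix(nu_i, nu_i,
                     lambda r, c: lam_i if r == c else (1 if c == r + 1 else 0))

def canonical(divisors):
    """The direct sum (11) of the blocks (10)."""
    return sp.diag(*[block(li, ni) for li, ni in divisors])

a = canonical([(2, 2), (2, 1), (-1, 3)])    # (lam-2)**2, lam-2, (lam+1)**3
m = lam * sp.eye(a.rows) - a
print(sp.factor(m.det()))                   # (lambda - 2)**3*(lambda + 1)**3
```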
3.07
If A is a matrix with the same elementary divisors as a, it follows from Theorem 5 that there is a matrix P such that A = PaP-1 and hence, if we choose in place of the fundamental basis (e1, e2,....., en) the basis (Pe1, Pe2,...., Pen), it follows from Theorem 6 of chapter 1 that (11) gives the form of A relative to the new basis. This form is called the canonical form of A. It follows immediately from this that
$(15) \ \ \ \ \ \ \ \ P^{-1} A^k P = \begin{Vmatrix} a_1^k & & & & & \\ & a_2^k & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & a_s^k \end{Vmatrix}$
where $a_i^k$ is the block of terms derived by forming the kth power of ai regarded as a matrix of order νi.
Since Dn equals |λ — a|, it is the characteristic function of a (or A) and, since Dn-1 is the highest common factor of the first minors, it follows from Theorem 3 of chapter 2 that αn is the reduced characteristic function.
If we add ƒ's together in groups, each group consisting of all the ƒ's that correspond to the same value of λi, we get a set of idempotent matrices, say φ1, φ2,...., φr, corresponding to the distinct roots of a, say α1, α2,...., αr. These are the principal idempotent elements of a; for (i) aφi = φia, (ii) (a — αi)φi is nilpotent, (iii) ∑φi = ∑ƒi = 1 and φiφj = 0 (i ≠ j), so that the conditions of §2.11 are satisfied.
When the same root αi occurs in several elementary divisors, the corresponding ƒ's are called partial idempotent elements of a; they are not unique as is seen immediately by taking a = 1.
If α is one of the roots of A, the form of A — α is sometimes important. Suppose that λ1 = λ2 = ..... = λp = α, λi ≠ α (i > p) and set
bi = ai — αƒi
the corresponding array in the ith block of a — α (cf. (10), (11)) being
$(16) \ \ \ \ \ \ \ \ \ \ \begin{matrix} \lambda_i - \alpha & 1 & & & \\ & \lambda_i - \alpha & 1 & & \\ & & \cdot & \cdot & \\ & & & \cdot & 1 \\ & & & & \lambda_i - \alpha \end{matrix}$
In the case of the first p blocks λi — α = 0 and the corresponding b1, b2,...., bp are nilpotent, the index of bi being νi and, assuming ν1 ≥ ν2 ≥ .... ≥ νp as before, (A — α)k has the form
$P^{-1} (A - \alpha)^k P = \begin{Vmatrix} b_1^k & & & & & \\ & b_2^k & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & b_s^k \end{Vmatrix}$
or, when k ≥ ν1,
$(17) \ \ \ \ \ \ \ P^{-1} (A - \alpha)^k P = \begin{Vmatrix} 0 & & & & & & & & & \\ & . & & & & & & & & \\ & & . & & & & & & & \\ & & & . & & & & & & \\ & & & & 0 & & & & & \\ & & & & & b_{p+1}^k & & & & \\ & & & & & & . & & & \\ & & & & & & & . & & \\ & & & & & & & & . & \\ & & & & & & & & & b_s^k \end{Vmatrix} $
Since none of the diagonal coordinates of bp+1,....., bs are 0, the rank of (A — α)k, when k ≥ ν1, is exactly $n - \sum \limits_1^p \nu_i = \sum \limits_{p+1}^s \nu_i$, and the nullspace of (A — α)k is then the same as that of (A — α)ν1. Hence, if there exists a vector z such that (A — α)kz = 0 but (A — α)k-1z ≠ 0, then (i) k ≤ ν1, (ii) z lies in the nullspace of (A — α)ν1.
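The rank statement just proved is easy to observe numerically: the ranks of successive powers of A — α fall until k = ν1 and are constant thereafter. A small sketch (the matrix is ours):

```python
import sympy as sp

# nu_1 = 2, nu_2 = 1 for the root 2; one block for the root 3
a = sp.diag(sp.Matrix([[2, 1], [0, 2]]), sp.Matrix([[2]]), sp.Matrix([[3]]))
alpha, n = 2, 4
N = a - alpha * sp.eye(n)
print([(N**k).rank() for k in range(4)])
# [4, 2, 1, 1]: constant from k = nu_1 = 2 on, equal to n - (2 + 1) = 1
```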
3.08 Invariant vectors.
If A is a matrix with the elementary divisors given in the statement of Theorem 6, then λ — A is equivalent to λ — a and by Theorem 5 there is a non-singular matrix P such that A = PaP-1. If we denote the unit vectors corresponding to the rows and columns of ai in (10) by ei1, ei2,...., eiνi and set
$(18) \ \ \ \ \ x_j^i = \left\{ \begin{array}{ll} P e_j^i & (j = 1, 2, \ldots, \nu_i; \ i = 1, 2, \ldots, s) \\ 0 & (j < 1 \ \text{or} \ j > \nu_i; \ \text{or} \ i < 1 \ \text{or} \ i > s) \end{array} \right.$
then
$a e_1^i = \lambda_i e_1^i, \quad a e_2^i = \lambda_i e_2^i + e_1^i, \ \ldots, \ a e_{\nu_i}^i = \lambda_i e_{\nu_i}^i + e_{\nu_i - 1}^i$
and hence
$(19) \ \ \ \ \ A x_j^i = \lambda_i x_j^i + x_{j-1}^i \qquad (j = 1, 2, \ldots, \nu_i; \ i = 1, 2, \ldots, s).$
The vectors xij are called a set of invariant vectors of A.
The matrix A can be expressed in terms of its invariant vectors as follows. We have from (10)
$a_i = \sum \limits_j (\lambda_i e_j^i + e_{j-1}^i) S e_j^i = \sum \limits_j e_j^i S (\lambda_i e_j^i + e_{j+1}^i)$
and hence, if
$(20) \ \ \ \ \ \ y_j^i = (P')^{-1} e_j^i = (P P')^{-1} x_j^i,$
then
$(21) \ \ \ \ \ \ \ \ \ \ A = \sum \limits_{i,j} (\lambda_i x_j^i + x_{j-1}^i ) S y_j^i = \sum \limits_{i,j} x_j^i S (\lambda_i y_j^i + y_{j +1}^i )$
where it should be noted that the y's form a system reciprocal to the x's and that each of these systems forms a basis of the vector space since |P| ≠ 0.
If we form the transverse of A, we have from (21)
$(22) \ \ \ \ \ \ \ \ \ \ A' = \sum \limits_{i,j} (\lambda_i y_j^i + y_{j +1}^i ) S x_j^i$
so that the invariant vectors of A' are obtained by forming the system reciprocal to the x's and inverting the order in each group of vectors corresponding to a given elementary divisor; thus
$A' y_{\nu_i}^i = \lambda_i y_{\nu_i}^i, \quad A' y_{\nu_i - 1}^i = \lambda_i y_{\nu_i - 1}^i + y_{\nu_i}^i, \ \ldots, \ A' y_1^i = \lambda_i y_1^i + y_2^i.$
A matrix A and its transverse clearly have the same elementary divisors and are therefore similar. The matrix which transforms A into A' can be given explicitly as follows. Let qi be the symmetric array
$\begin{matrix} 0 & 0 & ... & 0 & 1 \\ 0 & 0 & ... & 1 & 0 \\ . & . & ... & . & . \\ . & . & ... & . & . \\ 0 & 1 & ... & 0 & 0 \\ 1 & 0 & ... & 0 & 0 \end{matrix} (v_i \ rows \ and \ columns)$
It is easily seen that qiai = a'iqi and hence, if Q is the matrix
$Q = \begin{Vmatrix} q_1 & & & & & \\ & q_2 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & q_s \end{Vmatrix}$
we have Qa = a'Q, and a short calculation gives A' = R-1AR where R is the symmetric matrix
$(23) \ \ \ \ \ \ R = P Q^{-1} P' = P Q P'.$
If the elementary divisors of A are simple, then Q = 1 and R = PP'.
If the roots λi of the elementary divisors (12) are all different, the nullity of (A — λi) is 1, and hence xi1 is unique up to a scalar multiplier. But the remaining xij are not unique. In fact, if the x's denote one choice of the invariant vectors, we may take in place of xij
$z_j^i = k_{i1} x_j^i + k_{i2} x_{j-1}^i + \cdots + k_{ij} x_1^i \qquad (j = 1, 2, \ldots, \nu_i)$
where the k's are any constant scalars subject to the condition ki1 ≠ 0. Suppose now that λ1 = λ2 = ..... = λp = α, λi ≠ α (i > p) and ν1 ≥ ν2 ≥ .... ≥ νp, and let zk be a vector such that (A — α)kzk = 0 but (A — α)k-1zk ≠ 0; set
$(24) \ \ \ \ \ z_i = (A - \alpha)^{k-i} z_k \qquad (i = 1, 2, \ldots, k).$
It is also convenient to set zi = 0 for i ≤ 0 or i > k. We have already seen that k ≤ ν1 and that zk lies in the nullspace of (A — α)ν1; and from (17) it is seen that the nullspace of (A — α)ν1 has the basis (xij; j = 1, 2,..., νi; i = 1, 2,...., p).
Since zk belongs to the nullspace of (A — α)ν1, we may set
$(25) \ \ \ \ \ \ \ \ \ z_k = \sum \limits_{i=1}^p \sum \limits_{j=1}^{v_i} \xi_{ij} x_j^i$
and therefore by repeated application of (19) with λi = α
$(26) \ \ \ \ \ \ (A - \alpha)^r z_k = \sum \limits_{i,j} \xi_{ij} x_{j-r}^i.$
From this it follows that, in order that (A — α)kzk = 0, only values of j which are less than or equal to k can actually occur in (25), and, in order that (A — α)k-1zk ≠ 0, at least one ξik must be different from 0; hence
(27) $z_k = \sum \limits_i (\xi_{ik} x_k^i + \xi_{i,k-1} x_{k-1}^i + ...) \\ z_{k-1} = \sum \limits_i (\xi_{ik} x_{k-1}^i + \xi_{i,k-1} x_{k-2}^i + ...) \\ ........................................................................................\\ z_1 = \sum \limits_i \xi_{ik} x_1^i$
Finally, if we impose the restriction that zk does not belong to any chain pertaining to an exponent greater than k, it is necessary and sufficient that k be one of the numbers ν1, ν2,...., νp and that no value of i corresponding to an exponent greater than k occur in (27).
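A chain (24) in miniature, checked against (19); the matrix and the choice of zk below are our own illustration.

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]])
alpha, k = 2, 3
N = A - alpha * sp.eye(3)
zk = sp.Matrix([0, 0, 1])          # (A - alpha)**3 zk = 0, (A - alpha)**2 zk != 0
chain = [N**(k - i) * zk for i in range(1, k + 1)]      # z_1, z_2, z_3
for i, z in enumerate(chain, start=1):
    prev = chain[i - 2] if i >= 2 else sp.zeros(3, 1)
    assert A * z == alpha * z + prev                    # (19) with lam_i = alpha
print([list(z) for z in chain])    # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```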
3.09
The actual determination of the vectors xij can be carried out by the processes of §3.02 and §3.04 or alternatively as follows. Suppose that the first s1 of the exponents νi equal n1, the next s2 equal n2, and so on, and finally the last sq equal nq. Let R1 be the nullspace of (A — α)n1 and R'1 the nullspace of (A — α)n1-1; then R1 contains R'1. If M1 is a space complementary to R'1 in R1, then for any vector x in M1 we have (A — α)rx = 0 only when r ≥ n1. Also, if x1, x2,....., xm1 is a basis of M1, the vectors
$(28) \ \ \ \ \ \ (A - \alpha)^r x_i \qquad (r = 0, 1, \ldots, n_1 - 1)$
are linearly independent; for, if
$\sum \limits_{r=s}^{n_1 - 1} \sum \limits_i \xi_{ir} (A - \alpha)^r x_i = 0,$
the coefficients ξis not all being 0, then multiplying by (A — α)n1-s-1 we have
$(A - \alpha)^{n_1 - 1} \sum \limits_i \xi_{is} x_i = 0$
which is only possible if every ξis = 0, since x1, x2,...., xm1 form a basis of M1 and (A — α)n1-1x = 0 for no vector x ≠ 0 of M1. The space defined by (28) clearly lies in R1; we shall denote it by ℘1. If we set R1 = R2 + ℘1 where R2 is complementary to ℘1 in R1, then R2 contains all vectors which are members of sets belonging to the exponents n2, n3,.... but not lying in sets with the exponent n1.
We now set R2 = R'2 + M2, where R'2 is the subspace of vectors x in R2 such that (A — α)n2-1x = 0. As before the elements of M2 generate sets with exponent n2 but are not members of sets with higher exponents; and by a repetition of this process we can determine step by step the sets of invariant vectors corresponding to each exponent ni.
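The procedure lends itself directly to computation, since nullspaces are found by rank reduction. A closing sketch (the matrix and names are ours): R1 and R'1 are computed as nullspaces, and a vector of R1 outside R'1 heads a set (28) of full length n1.

```python
import sympy as sp

A = sp.diag(sp.Matrix([[2, 1], [0, 2]]), sp.Matrix([[2]]), sp.Matrix([[3]]))
alpha, n1 = 2, 2
N = A - alpha * sp.eye(4)
R1 = (N**n1).nullspace()            # here the basis e1, e2, e3
R1p = (N**(n1 - 1)).nullspace()     # here the basis e1, e3
x = next(v for v in R1 if N * v != sp.zeros(4, 1))   # in R1 but not in R1'
print([list((N**r * x).T) for r in range(n1)])       # the set (28) headed by x
```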