Invariant Factors and Elementary Divisors
3.01 Elementary transformations.
By an elementary transformation of a matric polynomial a(λ) = ||aij|| is meant one of the following operations on the rows or columns.
Type I. The operation of adding to a row (column) a different row (column) multiplied by a scalar polynomial θ(λ).
Type II. The operation of interchanging two rows (columns).
Type III. The operation of multiplying a row (column) by a constant k ≠ 0.
These transformations can be performed algebraically as follows.
Type I. Let
Pij = 1 + θ(λ)eij (i ≠ j),
θ(λ) being a scalar polynomial; then |Pij| = 1 and
$P_{ij} a = \sum \limits_{p,q} a_{pq} e_{pq} + \theta \sum \limits_q a_{jq} e_{iq}$
which is the matrix derived from a(λ) by adding θ times the jth row to the ith. The corresponding operation on the columns is equivalent to forming the product aPji.
Type II. Let Qij be the matrix
Qij = 1 - eii - ejj + eij + eji (i ≠ j);
that is, Qij is the matrix derived from the identity matrix by inserting 1 in place of 0 in the coefficients of eij and eji and 0 in place of 1 in the coefficients of eii and ejj; then |Qij| = -1 and
$Q_{ij} a = \sum \limits_{p,q} a_{pq} e_{pq} - \sum \limits_q a_{iq} e_{iq} - \sum \limits_q a_{jq} e_{jq} + \sum \limits_q a_{jq} e_{iq} + \sum \limits_q a_{iq} e_{jq}$
that is, Qija is derived from a by interchanging the ith and jth rows. Similarly aQij is obtained by interchanging the ith and jth columns.
Since any permutation can be effected by a succession of transpositions, the corresponding transformation in the rows (columns) of a matrix can be produced by a succession of transformations of Type II.
Type III. This transformation is effected on the rth row (column) by multiplying on the left (right) by R = 1 + (k - 1)err; it is used only when it is convenient to make the leading coefficient in some term equal to 1.
The inverses of the matrices used in these transformations are
$P_{ij}^{-1} = 1 - \theta e_{ij}, \qquad Q_{ij}^{-1} = Q_{ij}, \qquad R^{-1} = 1 + (k^{-1} - 1)e_{rr};$
these inverses are elementary transformations. The transverses are also elementary since P'ij = Pji and Qij and R are symmetric.
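The following SymPy sketch builds the three matrices just described from the matrix units e_ij and checks their action; the dimension (n = 3) and the particular indices are illustrative choices of ours, not fixed by the text.

```python
# Elementary matrices of types I-III built from matrix units e_ij.
import sympy as sp

theta, k = sp.symbols('theta k')
n = 3

def e(i, j):
    """Matrix unit e_ij: 1 in position (i, j), zeros elsewhere."""
    m = sp.zeros(n)
    m[i, j] = 1
    return m

a = sp.Matrix(sp.MatrixSymbol('a', n, n))   # generic entries a[i, j]
P = sp.eye(n) + theta * e(0, 2)             # type I
Q = sp.eye(n) - e(0, 0) - e(1, 1) + e(0, 1) + e(1, 0)   # type II
R = sp.eye(n) + (k - 1) * e(2, 2)           # type III

assert P.det() == 1 and Q.det() == -1 and R.det() == k
print(P * a)    # row 1 plus theta times row 3
print(Q * a)    # rows 1 and 2 interchanged
print(a * R)    # column 3 multiplied by k
```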
A matric polynomial b(λ) which is derived from a(λ) by a sequence of elementary transformations is said to be equivalent to a(λ); every such polynomial has the form p(λ)a(λ)q(λ) where p and q are products of elementary transformations. Since the inverse of an elementary transformation is elementary, a(λ) is also equivalent to b(λ). Further, the inverses of p and q are polynomials so that these are what we have already called elementary polynomials; we shall see later that every elementary polynomial can be derived from 1 by a sequence of elementary transformations.
In the following sections we require two lemmas whose proofs are almost immediate.
Lemma 1. The rank of a matrix is not altered by an elementary transformation.
For if |P| ≠ 0, AP and PA have the same rank as A (§1.10).
Lemma 2. The highest common factor of the coordinates of a matric polynomial is not altered by an elementary transformation.
This follows immediately from the definition of elementary transformations.
3.02 The normal form of a matrix.
The theorem we shall prove in this section is as follows.
Theorem 1. If a(λ) is a matric polynomial of rank r, it can be reduced by elementary transformations to a diagonal matrix
$(1) \ \ \ \ \ \ \sum \limits_{i=1}^{r} \alpha_i(\lambda) e_{ii} = \begin{matrix} \alpha_1(\lambda) & & & & & & & & & & \\ & \alpha_2(\lambda) & & & & & & & & & \\ & & . & & & & & & & & \\ & & & . & & & & & & & \\ & & & & . & & & & & & \\ & & & & & \alpha_r(\lambda) & & & & & \\ & & & & & & 0 & & & & \\ & & & & & & & . & & & \\ & & & & & & & & . & & \\ & & & & & & & & & . & \\ & & & & & & & & & & 0 \end{matrix} = P(\lambda) a(\lambda) Q(\lambda)$,
where the coefficient of the highest power of λ in each polynomial αi(λ) is 1, each αi is a factor of αi+1,...., αr (i = 1, 2,...., r - 1), and P(λ), Q(λ) are elementary polynomials.
We shall first show that, if the coordinate of a(λ) of minimum degree m, say apq, is not a factor of every other coordinate, then a(λ) is equivalent to a matrix in which the degree of the coordinate of minimum degree is less than m.
Suppose that apq is not a factor of api for some i; then we may set api = βapq + a'pi, where β is integral and a'pi is not 0 and is of lower degree than m. Subtracting β times the qth column from the ith, we have an equivalent matrix in which the coordinate (p, i) is a'pi, whose degree is less than m. The same reasoning applies if apq is not a factor of some coordinate aiq in the qth column.
After a finite number of such steps we arrive at a matrix in which a coordinate of minimum degree, say kpq, is a factor of all the coordinates which lie in the same row or column, but is possibly not a factor of some other coordinate kij. When this is so, let kpj = βkpq, kiq = γkpq where β and γ are integral. If we now add (1 — β) times the qth column to the jth, (p, j) and (i, j) become respectively
$k'_{pj} = k_{pj} + (1 - \beta)k_{pq} = k_{pq}, \qquad k'_{ij} = k_{ij} + (1 - \beta)k_{iq} = k_{ij} + (1 - \beta)\gamma k_{pq}.$
Here either the degree of k'ij is less than that of kpq, or k'pj has the minimum degree and is not a factor of k'ij which lies in the same column, and hence the minimum degree can be lowered as above.
The process just described can be repeated so long as the coordinate of lowest degree is not a factor of every other coordinate and, since each step lowers the minimum degree, we derive in a finite number of steps a matrix ||b'ij|| which is equivalent to a(λ) and in which the coordinate of minimum degree is in fact a divisor of every other coordinate; further, we may suppose that b'11 = α1(λ) is a coordinate of minimum degree and set b'1i = γib'11, b'j1 = δjb'11. Subtracting γi times the first column from the ith and then δj times the first row from the jth (i, j = 2, 3,...., n), all the coordinates in the first row and column except b'11 become 0, and we have an equivalent matrix in the form
$(2) \ \ \ \ \ \ \ \ \begin{matrix} \alpha_1(\lambda) & 0 & 0 & ... & 0 \\ 0 & b_{22} & b_{23} & ... & b_{2n} \\ 0 & b_{32} & b_{33} & ... & b_{3n} \\ . & . & . & ... & . \\ 0 & b_{n2} & b_{n3} & ... & b_{nn} \end{matrix}$
in which α1 is a factor of every bij. The coefficient of the highest power of λ in α1 may be made 1 by a transformation of type III.
The theorem now follows readily by induction. For, assuming it is true for matrices of order n — 1, the matrix of this order formed by the b's in (2) can be reduced to the diagonal matrix
$\begin{matrix} \alpha_2(\lambda) & & & & & & & & & & \\ & \alpha_3(\lambda) & & & & & & & & & \\ & & . & & & & & & & & \\ & & & . & & & & & & & \\ & & & & . & & & & & & \\ & & & & & \alpha_s(\lambda) & & & & & \\ & & & & & & 0 & & & & \\ & & & & & & & . & & & \\ & & & & & & & & . & & \\ & & & & & & & & & . & \\ & & & & & & & & & & 0 \end{matrix}$
where the α's satisfy the conditions of the theorem and each has α1 as a factor (§3.01, Lemma 2). Moreover, the elementary transformations by which this reduction is carried out correspond to transformations affecting the last n - 1 rows and columns alone in (2) and, because of the zeros in the first row and column, these transformations when applied to (2) do not affect its first row and column; also, since elementary transformations do not affect the rank (§3.01, Lemma 1), s equals r and a(λ) has therefore been reduced to the form required by the theorem.
The theorem is clearly true for matrices of order 1 and hence is true for any order.
Corollary. A matric polynomial whose determinant is independent of λ and is not 0, that is, an elementary polynomial, can be derived from 1 by the product of a finite number of elementary transformations.
The polynomials αi are called the invariant factors of a(λ).
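The proof of Theorem 1 is constructive, and the reduction can be carried out mechanically. The following sketch (in SymPy; the function names and the example matrix are ours) follows it literally: a coordinate of least degree is brought to the pivot by transformations of Type II, the division transformation clears its row and column (Type I), the k_ij step of the proof is applied when the pivot fails to divide a later coordinate, and the pivot is finally made monic (Type III).

```python
import sympy as sp

lam = sp.symbols('lambda')

def min_entry(M, k):
    """Position of a nonzero coordinate of least degree in M[k:, k:],
    or None if that trailing block is zero."""
    best = None
    for i in range(k, M.rows):
        for j in range(k, M.cols):
            v = sp.expand(M[i, j])
            if v != 0 and (best is None or sp.degree(v, lam) < best[0]):
                best = (sp.degree(v, lam), i, j)
    return best

def normal_form(M):
    """Reduce a matric polynomial to the diagonal form (1) of Theorem 1
    by the elementary transformations of Sec. 3.01."""
    M = sp.Matrix(M)
    for k in range(min(M.rows, M.cols)):
        while True:
            p = min_entry(M, k)
            if p is None:
                return M                          # rank is k; the rest is 0
            M.row_swap(k, p[1]); M.col_swap(k, p[2])           # type II
            for r in range(k + 1, M.rows):                     # type I, rows
                q = sp.div(M[r, k], M[k, k], lam)[0]
                M.row_op(r, lambda v, c: sp.expand(v - q * M[k, c]))
            for c in range(k + 1, M.cols):                     # type I, columns
                q = sp.div(M[k, c], M[k, k], lam)[0]
                M.col_op(c, lambda v, r: sp.expand(v - q * M[r, k]))
            if any(M[r, k] != 0 for r in range(k + 1, M.rows)) or \
               any(M[k, c] != 0 for c in range(k + 1, M.cols)):
                continue                          # minimum degree has dropped
            bad = [i for i in range(k + 1, M.rows) for j in range(k + 1, M.cols)
                   if sp.rem(M[i, j], M[k, k], lam) != 0]
            if not bad:
                break
            M.row_op(k, lambda v, c: sp.expand(v + M[bad[0], c]))  # the k_ij step
        M.row_op(k, lambda v, c: sp.expand(v / sp.LC(M[k, k], lam)))  # type III
    return M

a = sp.Matrix([[lam, 1, 0], [0, lam, 1], [0, 0, lam]])
print(normal_form(a))          # diag(1, 1, lambda**3)
```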
3.03 Determinantal and invariant factors.
The determinantal factor of the sth order, Ds, of a matric polynomial a(λ) is defined as the highest common factor of all minors of order s, the coefficient of the highest power of λ being taken as 1. An elementary transformation of type I either leaves a given minor unaltered or changes it into the sum of that minor and a multiple of another of the same order, and a transformation of type II simply permutes the minors of a given order among themselves, while one of type III merely multiplies a minor by a constant different from 0. Hence equivalent matrices have the same determinantal factors. Bearing this in mind we see immediately from the form of (1) that the determinantal factors of a(λ) are given by
$D_s = \alpha_1 \alpha_2 \cdots \alpha_s \ \ (s = 1, 2, \ldots, r), \qquad D_s = 0 \ \ (s > r),$
so that
$\alpha_s = D_s / D_{s-1} \qquad (s = 1, 2, \ldots, r; \ D_0 = 1).$
The invariant factors are therefore known when the determinantal factors are given, and vice versa.
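As a computation, Ds may be found directly as the highest common factor of the s-rowed minors, and the invariant factors then recovered from αs = Ds/Ds-1; a SymPy sketch follows (the example matrix is ours, and it can be checked against the reduction sketch above).

```python
import itertools
import sympy as sp

lam = sp.symbols('lambda')

def determinantal_factors(M):
    """D_1, ..., D_n, each made monic; D_s = 0 once s exceeds the rank."""
    out = []
    for s in range(1, min(M.rows, M.cols) + 1):
        minors = [M[list(r), list(c)].det()
                  for r in itertools.combinations(range(M.rows), s)
                  for c in itertools.combinations(range(M.cols), s)]
        g = sp.gcd_list(minors)
        out.append(sp.monic(g, lam) if g != 0 else sp.S.Zero)
    return out

a = sp.Matrix([[lam, 1, 0], [0, lam, 1], [0, 0, lam]])
D = determinantal_factors(a)
alphas = [D[0]] + [sp.cancel(D[s] / D[s - 1]) for s in range(1, len(D))]
print(D, alphas)        # [1, 1, lambda**3] and [1, 1, lambda**3]
```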
The definitions of this and the preceding sections have all been made relative to the fundamental basis. But we have seen in §1.08 that, if a1 is the matrix with the same array of coordinates as a but relative to another basis, then there exists a non-singular constant matrix b such that $a = b^{-1} a_1 b$, so that a and a1 are equivalent matrices. In terms of the new basis a1 has the same invariant factors as a has in terms of the old and a, being equivalent to a1, has therefore the same invariant factors in terms of the new basis as it has in the old. Hence the invariant and determinantal factors of a matric polynomial are independent of the (constant) basis in terms of which its coordinates are expressed.
The results of this section may be summarized as follows.
Theorem 2. Two matric polynomials are equivalent if, and only if, they have the same invariant factors.
3.04 Non-singular linear polynomials.
In the case of linear polynomials Theorem 2 can be made more precise as follows.
Theorem 3. If aλ + b and cλ + d are non-singular linear polynomials which have the same invariant factors, and if |c| ≠ 0, there exist non-singular constant matrices p and q such that
p(aλ + b)q = cλ + d.
We have seen in Theorem 2 that there exist elementary polynomials P(λ), Q(λ) such that
(3) cλ + d = P(λ)(aλ + b)Q(λ).
Since |c| ≠ 0, we can employ the division transformation to find matric polynomials p1, q1 and constant matrices p, q such that
P(λ) = (cλ + d)p1 + p, Q(λ) = q1(cλ + d) + q.
Using this in (3) we have
$(4) \ \ \ \ \ \ c\lambda + d = p(a\lambda + b)q + (c\lambda + d)p_1(a\lambda + b)Q + P(a\lambda + b)q_1(c\lambda + d) - (c\lambda + d)p_1(a\lambda + b)q_1(c\lambda + d)$
and, since from (3)
$(a\lambda + b)Q = P^{-1}(c\lambda + d), \qquad P(a\lambda + b) = (c\lambda + d)Q^{-1},$
we may write in place of (4)
$(5) \ \ \ \ \ \ p(a\lambda + b)q = [1 - (c\lambda + d)(p_1 P^{-1} + Q^{-1} q_1 - p_1(a\lambda + b)q_1)](c\lambda + d) = [1 - (c\lambda + d)R](c\lambda + d)$
where $R = p_1 P^{-1} + Q^{-1} q_1 - p_1(a\lambda + b)q_1$, which is integral in λ since P and Q are elementary. If R ≠ 0, then, since |c| ≠ 0, the degree of the right side of (5) is at least 2, whereas the degree of the left side is only 1; hence R = 0, so that (5) gives p(aλ + b)q = cλ + d. Since cλ + d is not singular, neither p nor q can be singular, and hence the theorem is proved.
If |c| = 0 but |cλ + d| ≠ 0, there exist values λ1 ≠ 0, μ1 such that |cλ1 + dμ1| ≠ 0 and, if we make the transformation
(6) λ = λ1α, μ = μ1α + β,
aλ + bμ, cλ + dμ become a1α + b1β, c1α + d1β where a1 = aλ1 + bμ1, c1 = cλ1 + dμ1, and therefore |c1| ≠ 0. Further, when aλ + bμ and cλ + dμ have the same invariant factors, this is also true of a1α + b1β and c1α + d1β. Since |c1| ≠ 0, the proof of Theorem 3 is applicable, so that there are constant non-singular matrices p, q for which p(a1α + b1β)q = c1α + d1β, and on reversing the substitution (6) we have
p(aλ + bμ)q = cλ + dμ.
Theorem 3 can therefore be extended as follows.
Theorem 4. If the non-singular polynomials aλ + bμ, cλ + dμ have the same invariant factors, there exist non-singular constant matrices p, q such that p(aλ + bμ)q = cλ + dμ.
An important particular case of Theorem 3 arises when the polynomials have the form λ — b, λ — d. For if p(λ — b)q = λ — d, on equating coefficients we have pq = 1, pbq = d; hence b = p-1dp, that is, b and d are similar. Conversely, if b and d are similar, then λ — b and λ — d are equivalent, and hence we have the following theorem.
Theorem 5. Two constant matrices b, d are similar if, and only if, λ — b and λ — d have the same invariant factors.
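Theorem 5 gives a practical similarity test: compute the invariant factors of λ — b and λ — d and compare. In the sketch below the helper repeats the Ds/Ds-1 rule of §3.03; the matrices are our own examples. Here b and d (the transverse of b) have the same invariant factors and are similar, while a matrix with the same characteristic function but different invariant factors is not.

```python
import itertools
import sympy as sp

lam = sp.symbols('lambda')

def invariant_factors(m):
    """alpha_s = D_s / D_{s-1} for a non-singular matric polynomial m."""
    n = m.rows
    D = [sp.S.One]
    for s in range(1, n + 1):
        minors = [m[list(r), list(c)].det()
                  for r in itertools.combinations(range(n), s)
                  for c in itertools.combinations(range(n), s)]
        D.append(sp.monic(sp.gcd_list(minors), lam))
    return [sp.factor(sp.cancel(D[s] / D[s - 1])) for s in range(1, n + 1)]

b = sp.Matrix([[2, 1], [0, 2]])
d = sp.Matrix([[2, 0], [1, 2]])     # the transverse of b
c = sp.Matrix([[2, 0], [0, 2]])     # same characteristic function, yet...
print(invariant_factors(lam * sp.eye(2) - b))   # [1, (lambda - 2)**2]
print(invariant_factors(lam * sp.eye(2) - d))   # [1, (lambda - 2)**2]: similar to b
print(invariant_factors(lam * sp.eye(2) - c))   # [lambda - 2, lambda - 2]: not similar
```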
3.05 Elementary divisors.
If D = |aλ + b| is not identically zero and if λ1, λ2,...., λs are its distinct roots, say
$D = (\lambda - \lambda_1)^{\nu_1}(\lambda - \lambda_2)^{\nu_2} \cdots (\lambda - \lambda_s)^{\nu_s},$
then the invariant factors of aλ + b, being factors of D, have the form
$(7) \ \ \ \ \ \ \alpha_i = (\lambda - \lambda_1)^{\nu_{i1}}(\lambda - \lambda_2)^{\nu_{i2}} \cdots (\lambda - \lambda_s)^{\nu_{is}} \qquad (i = 1, 2, \ldots, n),$
where $\sum \limits_{j=1}^{n} \nu_{ji} = \nu_i$ and, since αj is a factor of αj+1,
(8) ν1i ≤ ν2i ≤ ..... ≤ νni (i = 1, 2,....., s).
Such of the factors (λ — λj)νij as are not constants, that is, those for which νij > 0, are called the elementary divisors of aλ + b. The elementary divisors of λ — b are also called the elementary divisors of b. When all the exponents νij which are not 0 equal 1, b is said to have simple elementary divisors.
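Computationally, the elementary divisors are obtained by factoring each invariant factor into its powers of the distinct linear factors; a short sketch follows (the list of invariant factors below is an invented example).

```python
import sympy as sp

lam = sp.symbols('lambda')

def elementary_divisors(alphas):
    """Split each invariant factor into its powers of (lam - root)."""
    divs = []
    for a in alphas:
        for root, mult in sp.roots(sp.Poly(a, lam)).items():
            divs.append((lam - root)**mult)
    return divs

alphas = [sp.S.One, lam - 1, (lam - 1)**2 * (lam + 2)]
print(elementary_divisors(alphas))
# [lambda - 1, (lambda - 1)**2, lambda + 2]
```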
For some purposes the degrees of the elementary divisors are of more importance than the divisors themselves and, when this is the case, they are indicated by writing
$(9) \ \ \ \ \ \ [(\nu_{n1}, \nu_{n-1,1}, \ldots, \nu_{11}), (\nu_{n2}, \nu_{n-1,2}, \ldots, \nu_{12}), \ldots]$
where exponents belonging to the same linear factor are in the same parenthesis, zero exponents being omitted; (9) is sometimes called the characteristic of aλ + b. If a root, say λ1, is zero, it is convenient to indicate this by writing $\nu_{i1}^{0}$ in place of $\nu_{i1}$.
The maximum degree of |aλ + b| is n and therefore $\sum \limits_{i,j} \nu_{ij} \leq n$, where the equality sign holds only when |a| ≠ 0.
The modifications necessary when the homogeneous polynomial aλ + bμ is taken in place of aλ + b are obvious and are left to the reader.
3.06 Matrices with given elementary divisors.
The direct investigation of the form of a matrix with given elementary divisors is somewhat tedious. It can be carried out in a variety of ways; but, since the form once found is easily verified, we shall here state this form and give the verification, merely saying in passing that it is suggested by the results of §2.07 together with a study of a matrix whose reduced characteristic function is (λ — λ1)ν.
Theorem 6. If λ1, λ2,...., λs are any constants, not necessarily all different, and ν1, ν2,...., νs are positive integers whose sum is n, and if ai is the array of νi rows and columns given by
(10) $\begin{matrix} \lambda_i & 1 & 0 & ... & 0 & 0 \\ 0 & \lambda_i & 1 & ... & 0 & 0 \\ . & . & . & ... & . & . \\ . & . & . & ... & . & . \\ 0 & 0 & 0 & ... & \lambda_i & 1 \\ 0 & 0 & 0 & ... & 0 & \lambda_i \end{matrix}$
where each coordinate on the main diagonal equals λi, those on the parallel on its right are 1, and the remaining ones are 0, and if a is the matrix of n rows and columns given by
$(11) \ \ \ \ \ \ \ a = \begin{Vmatrix} a_1 & & & & & \\ & a_2 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & a_s \end{Vmatrix}$
composed of blocks of terms defined by (10) arranged so that the main diagonal of each lies on the main diagonal of a, the other coordinates being 0, then λ — a has the elementary divisors
(12) (λ - λ1)ν1, (λ - λ2)ν2, ..., (λ - λs)νs
In addition to using ai to denote the block given in (10) we shall also use it for the matrix having this block in the position indicated in (11) and zeros elsewhere. In the same way, if ƒi is a block with νi rows and columns with 1's in the main diagonal and zeros elsewhere, we may also use ƒi for the corresponding matrix of order n. We can then write
λ - a = ∑(λƒi - ai), ƒia = ai = aƒi, ∑ƒi = 1.
The block of terms corresponding to λƒi — ai has then the form
$(13) \ \ \ \ \ \ \ \begin{matrix} \lambda - \lambda_i & -1 & & & \\ & \lambda - \lambda_i & -1 & & \\ & & \cdot & \cdot & \\ & & & \cdot & -1 \\ & & & & \lambda - \lambda_i \end{matrix} \ \ (\nu_i\ rows\ and\ columns)$
where only the non-zero terms are indicated. The determinant of these νi rows and columns is (λ — λi)νi and this determinant has a first minor equal to ±1; the invariant factors of λƒi — ai, regarded as a matrix of order νi, are therefore 1, 1,....., 1, (λ — λi)νi and hence it can be reduced by elementary transformations to the diagonal form
$\begin{matrix} (\lambda - \lambda_i)^{v_i} & & & & & \\ & 1 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & 1 \end{matrix}$
If we apply the same elementary transformations to the corresponding rows and columns of λ — a, the effect is the same as regards the block of terms λƒi — ai (corresponding to ai in (11)) since all the other coordinates in the rows and columns which contain elements of this block are 0; moreover these transformations do not affect the remaining blocks λƒj — aj (j ≠ i) nor any 0 coordinate. Carrying out this process for i = 1, 2,...., s and permuting rows and columns, if necessary, we arrive at the form
$\begin{matrix} (\lambda - \lambda_1)^{\nu_1} & & & & & & & & & & \\ & (\lambda - \lambda_2)^{\nu_2} & & & & & & & & & \\ & & . & & & & & & & & \\ & & & . & & & & & & & \\ & & & & . & & & & & & \\ & & & & & (\lambda - \lambda_s)^{\nu_s} & & & & & \\ & & & & & & 1 & & & & \\ & & & & & & & . & & & \\ & & & & & & & & . & & \\ & & & & & & & & & . & \\ & & & & & & & & & & 1 \end{matrix}$
Suppose now that the notation is so arranged that
λ1 = λ2 = ..... = λp = α, ν1 ≥ ν2 ≥ ..... ≥ νp,
but λi ≠ α for i > p. The nth determinantal factor Dn then contains (λ — α) to the power $\sum \limits_1^p \nu_i$ exactly. Each minor of order n — 1 contains at least p — 1 of the factors
(14) (λ - α)ν1, (λ - α)ν2,....., (λ - α)νp
and in one the highest power (λ — α)ν1 is lacking; hence Dn-1 contains (λ — α) to exactly the power $\sum \limits_2^p \nu_i$ and hence the nth invariant factor αn contains it to exactly the ν1th power. Similarly the minors of order n — 2 each contain at least p — 2 of the factors (14) and one lacks the two factors of highest degree; hence (λ — α) is contained in Dn-2 to exactly the power $\sum \limits_3^p \nu_i$ and in αn-1 to the power ν2. Continuing in this way we see that (14) gives the elementary divisors of a which are powers of (λ — α) and, treating the other roots in the same way, we see that the complete list of elementary divisors is given by (12), as required by the theorem.
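Theorem 6 can also be exercised in the opposite direction: given the list of roots and exponents, build the blocks (10), assemble (11), and check the result. A sketch (the block-builder names and the example are ours):

```python
import sympy as sp

lam = sp.symbols('lambda')

def block(lam_i, nu_i):
    """The nu_i-rowed array (10): lam_i on the diagonal, 1 just above it."""
    return sp.Matrix(nu_i, nu_i,
                     lambda r, c: lam_i if r == c else (1 if c == r + 1 else 0))

def canonical(divisors):
    """The direct sum (11) of the blocks (10)."""
    return sp.diag(*[block(li, ni) for li, ni in divisors])

a = canonical([(2, 2), (2, 1), (-1, 3)])    # (lam-2)**2, lam-2, (lam+1)**3
m = lam * sp.eye(a.rows) - a
print(sp.factor(m.det()))                   # (lambda - 2)**3*(lambda + 1)**3
```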
3.07
If A is a matrix with the same elementary divisors as a, it follows from Theorem 5 that there is a matrix P such that A = PaP-1 and hence, if we choose in place of the fundamental basis (e1, e2,....., en) the basis (Pe1, Pe2,...., Pen), it follows from Theorem 6 of chapter 1 that (11) gives the form of A relative to the new basis. This form is called the canonical form of A. It follows immediately from this that
$(15) \ \ \ \ \ \ \ \ P^{-1} A^k P = \begin{Vmatrix} a_1^k & & & & & \\ & a_2^k & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & a_s^k \end{Vmatrix}$
where $a_i^k$ is the block of terms derived by forming the kth power of ai regarded as a matrix of order νi.
Since Dn equals |λ — a|, it is the characteristic function of a (or A) and, since Dn-1 is the highest common factor of the first minors, it follows from Theorem 3 of chapter 2 that αn is the reduced characteristic function.
If we add ƒ's together in groups, each group consisting of all the ƒ's that correspond to the same value of λi, we get a set of idempotent matrices, say φ1, φ2,...., φr, corresponding to the distinct roots of a, say α1, α2,...., αr. These are the principal idempotent elements of a; for (i) aφi = φia, (ii) (a — αi)φi is nilpotent, (iii) ∑φi = ∑ƒi = 1 and φiφj = 0 (i ≠ j), so that the conditions of §2.11 are satisfied.
When the same root αi occurs in several elementary divisors, the corresponding ƒ's are called partial idempotent elements of a; they are not unique as is seen immediately by taking a = 1.
If α is one of the roots of A, the form of A — α is sometimes important. Suppose that λ1 = λ2 = ..... = λp = α, λi ≠ α (i > p) and set
bi = ai — αƒi
the corresponding array in the ith block of a — α (cf. (10), (11)) being
$(16) \ \ \ \ \ \ \ \ \ \ \begin{matrix} \lambda_i - \alpha & 1 & & & \\ & \lambda_i - \alpha & 1 & & \\ & & \cdot & \cdot & \\ & & & \cdot & 1 \\ & & & & \lambda_i - \alpha \end{matrix}$
In the case of the first p blocks λi — α = 0 and the corresponding b1, b2,...., bp are nilpotent, the index of bi being νi and, assuming ν1 ≥ ν2 ≥ .... ≥ νp as before, (A — α)k has the form
$P^{-1} (A - \alpha)^k P = \begin{Vmatrix} b_1^k & & & & & \\ & b_2^k & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & b_s^k \end{Vmatrix}$
or, when k ≥ ν1,
$(17) \ \ \ \ \ \ \ P^{-1} (A - \alpha)^k P = \begin{Vmatrix} 0 & & & & & & & & & \\ & . & & & & & & & & \\ & & . & & & & & & & \\ & & & . & & & & & & \\ & & & & 0 & & & & & \\ & & & & & b_{p+1}^k & & & & \\ & & & & & & . & & & \\ & & & & & & & . & & \\ & & & & & & & & . & \\ & & & & & & & & & b_s^k \end{Vmatrix} $
Since none of the diagonal coordinates of bp+1,....., bs are 0, the rank of (A — α)k, when k ≥ ν1, is exactly $n - \sum \limits_1^p \nu_i = \sum \limits_{p+1}^s \nu_i$, and the nullspace of (A — α)k is then the same as that of (A — α)ν1. Hence, if there exists a vector z such that (A — α)kz = 0 but (A — α)k-1z ≠ 0, then (i) k ≤ ν1, (ii) z lies in the nullspace of (A — α)ν1.
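The rank statement just proved is easy to observe numerically: the ranks of successive powers of A — α fall until k = ν1 and are constant thereafter. A small sketch (the matrix is ours):

```python
import sympy as sp

# nu_1 = 2, nu_2 = 1 for the root 2; one block for the root 3
a = sp.diag(sp.Matrix([[2, 1], [0, 2]]), sp.Matrix([[2]]), sp.Matrix([[3]]))
alpha, n = 2, 4
N = a - alpha * sp.eye(n)
print([(N**k).rank() for k in range(4)])
# [4, 2, 1, 1]: constant from k = nu_1 = 2 on, equal to n - (2 + 1) = 1
```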
3.08 Invariant vectors.
If A is a matrix with the elementary divisors given in the statement of Theorem 6, then λ — A is equivalent to λ — a and by Theorem 5 there is a non-singular matrix P such that A = PaP-1. If we denote the unit vectors corresponding to the rows and columns of ai in (10) by ei1, ei2,...., eiνi and set
$(18) \ \ \ \ \ x_j^i = \left\{ \begin{array}{ll} P e_j^i & (j = 1, 2, \ldots, \nu_i; \ i = 1, 2, \ldots, s) \\ 0 & (j < 1 \ \text{or} \ j > \nu_i; \ \text{or} \ i < 1 \ \text{or} \ i > s) \end{array} \right.$
then
$a e_1^i = \lambda_i e_1^i, \quad a e_2^i = \lambda_i e_2^i + e_1^i, \ \ldots, \ a e_{\nu_i}^i = \lambda_i e_{\nu_i}^i + e_{\nu_i - 1}^i$
and hence
$(19) \ \ \ \ \ A x_j^i = \lambda_i x_j^i + x_{j-1}^i \qquad (j = 1, 2, \ldots, \nu_i; \ i = 1, 2, \ldots, s).$
The vectors xij are called a set of invariant vectors of A.
The matrix A can be expressed in terms of its invariant vectors as follows. We have from (10)
$a_i = \sum \limits_j (\lambda_i e_j^i + e_{j-1}^i) S e_j^i = \sum \limits_j e_j^i S (\lambda_i e_j^i + e_{j+1}^i)$
and hence, if
$(20) \ \ \ \ \ \ y_j^i = (P')^{-1} e_j^i = (P P')^{-1} x_j^i,$
then
$(21) \ \ \ \ \ \ \ \ \ \ A = \sum \limits_{i,j} (\lambda_i x_j^i + x_{j-1}^i ) S y_j^i = \sum \limits_{i,j} x_j^i S (\lambda_i y_j^i + y_{j +1}^i )$
where it should be noted that the y's form a system reciprocal to the x's and that each of these systems forms a basis of the vector space since |P| ≠ 0.
If we form the transverse of A, we have from (21)
$(22) \ \ \ \ \ \ \ \ \ \ A' = \sum \limits_{i,j} (\lambda_i y_j^i + y_{j +1}^i ) S x_j^i$
so that the invariant vectors of A' are obtained by forming the system reciprocal to the x's and inverting the order in each group of vectors corresponding to a given elementary divisor; thus
$A' y_{\nu_i}^i = \lambda_i y_{\nu_i}^i, \quad A' y_{\nu_i - 1}^i = \lambda_i y_{\nu_i - 1}^i + y_{\nu_i}^i, \ \ldots, \ A' y_1^i = \lambda_i y_1^i + y_2^i.$
A matrix A and its transverse clearly have the same elementary divisors and are therefore similar. The matrix which transforms A into A' can be given explicitly as follows. Let qi be the symmetric array
$\begin{matrix} 0 & 0 & ... & 0 & 1 \\ 0 & 0 & ... & 1 & 0 \\ . & . & ... & . & . \\ . & . & ... & . & . \\ 0 & 1 & ... & 0 & 0 \\ 1 & 0 & ... & 0 & 0 \end{matrix} (v_i \ rows \ and \ columns)$
It is easily seen that qiai = a'iqi and hence, if Q is the matrix
$Q = \begin{Vmatrix} q_1 & & & & & \\ & q_2 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & q_s \end{Vmatrix}$
we have Qa = a'Q, and a short calculation gives A' = R-1AR where R is the symmetric matrix
$(23) \ \ \ \ \ \ R = P Q^{-1} P' = P Q P'.$
If the elementary divisors of A are simple, then Q = 1 and R = PP'.
If the roots λi of the elementary divisors (12) are all different, the nullity of (A — λi) is 1, and hence xi1 is unique up to a scalar multiplier. But the remaining xij are not unique. In fact, if the x's denote one choice of the invariant vectors, we may take in place of xij
$z_j^i = k_{i1} x_j^i + k_{i2} x_{j-1}^i + \cdots + k_{ij} x_1^i \qquad (j = 1, 2, \ldots, \nu_i)$
where the k's are any constant scalars subject to the condition ki1 ≠ 0. Suppose now that λ1 = λ2 = ..... = λp = α, λi ≠ α (i > p) and ν1 ≥ ν2 ≥ .... ≥ νp, and let zk be a vector such that (A — α)kzk = 0 but (A — α)k-1zk ≠ 0; set
$(24) \ \ \ \ \ z_i = (A - \alpha)^{k-i} z_k \qquad (i = 1, 2, \ldots, k).$
It is also convenient to set zi = 0 for i ≤ 0 or i > k. We have already seen that k ≤ ν1 and that zk lies in the nullspace of (A — α)ν1; and from (17) it is seen that the nullspace of (A — α)ν1 has the basis (xij; j = 1, 2,..., νi; i = 1, 2,...., p).
Since zk belongs to the nullspace of (A — α)ν1, we may set
$(25) \ \ \ \ \ \ \ \ \ z_k = \sum \limits_{i=1}^p \sum \limits_{j=1}^{v_i} \xi_{ij} x_j^i$
and therefore by repeated application of (19) with λi = α
$(26) \ \ \ \ \ \ (A - \alpha)^r z_k = \sum \limits_{i,j} \xi_{ij} x_{j-r}^i.$
From this it follows that, in order that (A — α)kzk = 0, only values of j which are less than or equal to k can actually occur in (25), and, in order that (A — α)k-1zk ≠ 0, at least one ξik must be different from 0; hence
(27) $z_k = \sum \limits_i (\xi_{ik} x_k^i + \xi_{i,k-1} x_{k-1}^i + ...) \\ z_{k-1} = \sum \limits_i (\xi_{ik} x_{k-1}^i + \xi_{i,k-1} x_{k-2}^i + ...) \\ ........................................................................................\\ z_1 = \sum \limits_i \xi_{ik} x_1^i$
Finally, if we impose the restriction that zk does not belong to any chain pertaining to an exponent greater than k, it is necessary and sufficient that k be one of the numbers ν1, ν2,...., νp and that no value of i corresponding to an exponent greater than k occur in (27).
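A chain (24) in miniature, checked against (19); the matrix and the choice of zk below are our own illustration.

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]])
alpha, k = 2, 3
N = A - alpha * sp.eye(3)
zk = sp.Matrix([0, 0, 1])          # (A - alpha)**3 zk = 0, (A - alpha)**2 zk != 0
chain = [N**(k - i) * zk for i in range(1, k + 1)]      # z_1, z_2, z_3
for i, z in enumerate(chain, start=1):
    prev = chain[i - 2] if i >= 2 else sp.zeros(3, 1)
    assert A * z == alpha * z + prev                    # (19) with lam_i = alpha
print([list(z) for z in chain])    # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```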
3.09
The actual determination of the vectors xij can be carried out by the processes of §3.02 and §3.04 or alternatively as follows. Suppose that the first s1 of the exponents νi equal n1, the next s2 equal n2, and so on, and finally the last sq equal nq. Let R1 be the nullspace of (A — α)n1 and R'1 the nullspace of (A — α)n1-1; then R1 contains R'1. If M1 is a space complementary to R'1 in R1, then for any vector x in M1 we have (A — α)rx = 0 only when r ≥ n1. Also, if x1, x2,....., xm1 is a basis of M1, the vectors
$(28) \ \ \ \ \ \ (A - \alpha)^r x_i \qquad (r = 0, 1, \ldots, n_1 - 1)$
are linearly independent; for, if
$\sum \limits_{r=s}^{n_1 - 1} \sum \limits_i \xi_{ir} (A - \alpha)^r x_i = 0,$
the coefficients ξis not all being 0, then multiplying by (A — α)n1-s-1 we have
$(A - \alpha)^{n_1 - 1} \sum \limits_i \xi_{is} x_i = 0$
which is only possible if every ξis = 0, since x1, x2,...., xm1 form a basis of M1 and (A — α)n1-1x = 0 for no vector x ≠ 0 of M1. The space defined by (28) clearly lies in R1; we shall denote it by ℘1. If we set R1 = R2 + ℘1 where R2 is complementary to ℘1 in R1, then R2 contains all vectors which are members of sets belonging to the exponents n2, n3,.... but not lying in sets with the exponent n1.
We now set R2 = R'2 + M2, where R'2 is the subspace of vectors x in R2 such that (A — α)n2-1x = 0. As before the elements of M2 generate sets with exponent n2 but are not members of sets with higher exponents; and by a repetition of this process we can determine step by step the sets of invariant vectors corresponding to each exponent ni.
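The procedure lends itself directly to computation, since nullspaces are found by rank reduction. A closing sketch (the matrix and names are ours): R1 and R'1 are computed as nullspaces, and a vector of R1 outside R'1 heads a set (28) of full length n1.

```python
import sympy as sp

A = sp.diag(sp.Matrix([[2, 1], [0, 2]]), sp.Matrix([[2]]), sp.Matrix([[3]]))
alpha, n1 = 2, 2
N = A - alpha * sp.eye(4)
R1 = (N**n1).nullspace()            # here the basis e1, e2, e3
R1p = (N**(n1 - 1)).nullspace()     # here the basis e1, e3
x = next(v for v in R1 if N * v != sp.zeros(4, 1))   # in R1 but not in R1'
print([list((N**r * x).T) for r in range(n1)])       # the set (28) headed by x
```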