Algebraic Operations with Matrices. The Characteristic equation
The following elementary considerations enable us to carry over a number of results of ordinary scalar algebra into the algebra of matrices. Suppose ƒ(λ1, λ2,..., λr), g(λ1, λ2,..., λr) are integral algebraic functions of the scalar variables λi with scalar coefficients, and suppose that
ƒ(λ1, λ2,...., λr) = g(λ1, λ2,...., λr)
is an algebraic identity; then, when ƒ(λ1,...., λr) — g(λ1,...., λr) is reduced to the standard form of a polynomial, the coefficients of the various powers of the λ's are zero. In carrying out this reduction no properties of the λ's are used other than those which state that they obey the laws of scalar multiplication and addition: if then we replace λ1, λ2,..., λr by commutative matrices x1, x2,..., xr, the reduction to the form 0 is still valid step by step and hence
ƒ(x1, x2,..., xr) = g(x1, x2,..., xr).
An elementary example of this is
(1 - x2) = (1 - x)(1 + x)
or, when xy = yx,
x2 - y2 = (x - y)(x + y).
Here, if xy ≠ yx, the reader should notice that the analogue of the algebraic identity becomes
x2 — y2 = x(x + y) — (x + y)y,
which may also be written x2 — y2 = (x — y)(x + y) + (yx — xy).
2.02 Matric polynomials in a scalar variable.
By a matric polynomial in a scalar variable λ is meant a matriλ that can be expressed in the form
(1) P(λ) = p0λr + p1λr+1 + ... + pr (p0 ≠ 0),
where p0, p1, ..., pr are constant matrices. The coordinates of P(λ) are scalar polynomials in λ and hence, if
(2) Q(λ) = q0λs + q1λs-1 + ... + qs (q0 ≠ 0)
is also a matric polynomial, P(λ) = Q(λ) if, and only if, r = s and the coefficients of corresponding powers of λ are equal, that is, pi = qi (i = 1, 2, ..., r). If |q0| ≠ 0, the degree of the product P(λ)Q(λ) (or Q(λ)P(λ)) is eλactly r + s since the coefficient of the highest power λr+s which occurs in the product is p0q0 (or q0p0) which cannot be 0 if p0 ≠ 0 and |q0| ≠ 0. If, however, both | p0| and |q0| are 0, the degree of the product may well be less than r + s, as is seen from the examples
(e11λ + 1)(e22λ + 1) = e11e22λ2 + (e11+ e22)λ + 1 = (e11 + e22)λ + 1,
Another noteworthy difference between matric and scalar polynomials is that, when the determinant of a matric polynomial is a constant different from 0, its inverse is also a matric polynomial: for instance
(e12λ + 1)-1 = -e12λ + 1,
[(e12 + e23)λ + 1]-1 = e13λ2 - (e12 + e23)λ + 1.
We shall call such polynomials elementary polynomials.
2.03 The division transformation.
The greater part of the theory of the division transformation can be extended from ordinary algebra to the algebra of matrices; the main precaution that must be taken is that it must not be assumed that every element of the algebra has an inverse and that due allowance must be made for the peculiarities introduced by the lack of commutativity in multiplication.
Theorem 1. P(λ) and Q(λ) are the polynomials defined by (1) and (2), and if |q0| ≠ 0, there eλist unique polynomials S(λ), R(λ), S1(λ), R1(λ), of which S and S1 if not zero, are of degree r — s and the degrees of R and R1 are s — 1 at'most, such that
P(λ) ≡ S(λ)Q(λ) + R(λ) ≡ Q(λ)S1(λ) + R1(λ).
If r < s, we may take S1 = S = 0 and R1 = R = P; in so far as the existence of these polynomials is concerned the theorem is therefore true in this case. We shall now assume as a basis for a proof by induction that the theorem is true for polynomials of degree less than r and that r ≤ s. Since |q0| ≠ 0, q-10 eλists and, as in ordinary scalar division, we have
P(λ) - p0q-10λr-sQ(λ) = (p1 - p0q-10q1)λr-1 + ... = P1(λ).
Since the degree of P1 is less than r, we have by hypothesis P1(λ) = P2(λ)Q(λ) + R(λ), the degrees of P2 and R being less, respectively, than r — s and s; hence
P(λ) = (p0q-10λr-s + P2(λ))Q(λ) + R(λ) = S(λ)Q(λ) + R(λ)
as required by the theorem. The existence of the right hand quotient and remainder follows in the same way.
It remains to prove the uniqueness of S and R. Suppose, if possible, that P = SQ + R = TQ + U where R and S are as above and T, U are polynomials the degree of U being less than s; then (S — T)Q = U — R. If S — T ≠ 0, then, since |q0| ≠ 0, the degree of the polynomial (S — T)Q is at least as great as that of Q and is therefore greater than the degree of U — R. It follows immediately that S — T = 0, and hence also U — R = 0; which completes the proof of the theorem.
If Q is a scalar polynomial, that is, if its coefficients q are scalars, then S = S1, R = R1 and, if the division is eλact, then Q(λ) is a factor of each of the coordinates of P(λ).
Theorem 2. If the matric polynomial (1) is divided on the right by λ — a, the remainder is
p0ar + p1ar-1 + .... + pr
arp0 + ar-1p1 + .... + pr.
As in ordinary algebra the proof follows immediately from the identity
λs - as = (λ - a)(λs-1 + λs-2a + .... + as-1)
in which the order of the factors is immaterial since λ is a scalar.
If P(λ) is a scalar polynomial, the right and left remainders are the same and are conveniently denoted by P(a).
Theorem 1 of the preceding section holds true as regards the existence of S, S1, R, R1 and the degree of R, R1 even when |q0| = 0 provided |Q(λ)| ≠ 0. Suppose the rank of q0 is t < n; then by §1.10 it has the form or, say, where h and k are non-singular matrices for which hei = αi, k'ei = βi (i = 1, 2, ...., t). If , then
(3) Q1 = (c1λ + 1)h-1Q
is a polynomial whose degree is not higher than the degree s of Q since c1h-1q0 = 0 so that the term in λs+1 is absent. Now, if η = |h-1|, then
|Q1| = |c1λ + 1||h-1||Q| = (1 + λ)n-tη|Q|,
so that the degree of |Q1| is greater than that of |Q| by n — t. If the leading coefficient of Q1 is singular, this process may be repeated, and so on, giving Q1, Q2,...., where the degree of |Qi| is greater than that of |Qi-1|. But the degree of each Qi is less than or equal to s and the degree of the determinant of a polynomial of the sth degree cannot exceed ns. Hence at some stage the leading coefficient of, say, Qj is not singular and, from the law of formation (3) of the successive Q's, we have Qi(λ) = H(λ)Q(λ), where H(λ) is a matric polynomial.
By Theorem 1, Qj taking the place of Q, we can find S* and R, the latter of degree s — 1 at most, such that
P(λ) = S*(λ)H(λ)Q(λ) + R(λ) = S(λ)Q(λ) + R(λ).
The theorem is therefore true even if |q0| = 0 except that the quotient and remainder are not necessarily unique and the degree of S may be greater than r — s, as is shown by taking P = λ2 — 1, Q = e11λ + 1, when we have
P = (e22λ2 + e11λ - 1)Q = (e22λ2 + e11λ - 1 + e12)Q - e12
2.05 The characteristic equation.
If λ is a matrix, the scalar polynomial
(4) ƒ(λ) = |λ - x| = λn + a1λn-1 + ... + an
is called the characteristic function corresponding to x. We have already seen (§1.05 (15)) that the product of a matriλ and its adjoint equals its determinant; hence
(λ - x) adj (λ - x) = |λ - x| = ƒ(λ).
It follows that the polynomial ƒ(λ) is exactly divisible by λ - x so that by the remainder theorem (§2.03, Theorem 2)
(5) ƒ(x) = 0.
As a simple example of this we may take . Here
ƒ(λ - α)(λ - δ) - βγ = λ2 - (α + δ)λ αδ - βγ
The following theorem is an important extension of this result.
Theorem 3. ƒ(λ) = |λ — x| and θ(λ) is the highest common factor of the first minors of |λ — x|, and if
(6) φ(λ) = ƒ(λ)/θ(λ),
the leading coefficient of θ(λ) being 1 (and therefore also that of φ(λ)), then
(i) φ(x) = 0;
(ii) if ψ(λ) is any scalar polynomial such that ψ(x) = 0, then φ(λ) is a factor of ψ(λ), that is, φ(λ) is the scalar polynomial of lowest degree and with leading coefficient 1 such that φ(λ) = 0;
(iii) every foot of ƒ(λ) is a root of φ(λ).
The coordinates of adj(λ — x) are the first minors of |λ — x| and therefore by hypothesis [adj(λ — x)]/θ(λ) is integral; also
hence φ(x) = 0 by the remainder theorem.
If ψ(λ) is any scalar polynomial for which ψ(x) = 0, we can find scalar polynomials M(λ), N(λ) such that M(λ)φ(λ) + N(λ)φ(λ) = ζ(λ), where ζ(λ) is the highest common factor of ψ and φ. Substituting x for λ in this scalar identity and using φ(x) = 0 = ψ(x) we have ζ(λ) = 0; if, therefore, ψ(x) = 0 is a scalar equation of lowest degree satisfied by x, we must have ψ(λ) = ζ(λ), apart from a constant factor, so that ψ(λ) is a factor of φ(λ), say
(7) φ(λ) = h(λ)ψ(λ).
Since ψ(x) = 0, λ — x is a factor of ψ(λ), say ψ(λ) = (λ — x)g(λ), where g is a matric polynomial; hence
and this cannot be integral unless h(λ) is a constant in view of the fact that θ(λ) is the highest common factor of the coordinates of adj(λ — x); it follows that ψ(λ) differs from φ(λ) by at most a constant factor.
A repetition of the first part of this argument shows that, if ψ(x) = 0 is any scalar equation satisfied by x, then φ(λ) is a factor of ψ(λ).
It remains to show that every root of ƒ(λ) is a root of φ(λ). If λ1 is any root of ƒ(λ) = |λ — x|, then from φ(λ) = g(λ)(λ — x) we have
φ(λ1) = g(λ1)(λ1 - x)
so that the determinant, [φ(λ1)]n, of the scalar matrix φ(λ1) equals |g(λ1)| |λ1 — x|, which vanishes since |λ1 — x| = ƒ(λ1). This is only possible if φ(λ1) = 0, that is, if every root of ƒ(λ) is also a root of φ(λ).
The roots of ƒ(λ) are also called the roots of x, φ(λ) is called the reduced characteristic function of x, and φ(x) = 0 the reduced equation of x.
A few simple results are conveniently given at this point although they are for the most part merely particular cases of later theorems. If g(λ) is a scalar polynomial, then on dividing by φ(λ), whose degree we shall denote by ν, we may set g(λ) = q(λ)φ(λ) + r(λ), where q and r are polynomials the degree of r being less than ν. Replacing λ by x in this identity and remembering that φ(x) = 0, we have g(x) = r(x), that is, any polynomial can be replaced by an equivalent polynomial of degree less than ν.
If g(λ) is a scalar polynomial which is a factor of φ(λ), say φ(λ) = h(λ)g(λ), then 0 = φ(x) = h(x)g(x). It follows that |g(x)| = 0; for if this were not so, we should have h(x) = [g(x)]-1φ(x) = 0, whereas x can satisfy no scalar equation of lower degree than φ. Hence, if g(λ) is a scalar polynomial which has a factor in common with φ(x), then g(x) is singular.
If a scalar polynomial g(λ) has no factor in common with φ(λ), there exist scalar polynomials M(λ), N(λ) such that M(λ)g(λ) + N(λ)<p(λ) ≡ 1. Hence M(x)g(x) = 1, or [g(x)]-1 = M(x). It follows immediately that any finite rational function of x with scalar coefficients can be expressed as a scalar polynomial in x of degree ν — 1 at most. It should be noticed carefully however that, if x is a variable matrix, the coefficients of the reduced polynomial will in general contain the variable coordinates of x and will not be integral in these unless the original function is integral. It follows also that g(x) is singular only when g(λ) has a factor in common with φ(λ).
Finally we may notice here that similar matrices have the same reduced equation; for, if g is a scalar polynomial, g(y-1xy) = y-1g(x)y. As a particular case of this we have that xy and yx have the same reduced equation if, say, y is non-singular; for xy = y-1•yx•y. If both x and y are singular, it can be shown that xy and yx have the same characteristic equation, but not necessarily the same reduced equation as is seen from the example x = e12, y = e22
2.07 Matrices with distinct roots.
Because of its importance and comparative simplicity we shall investigate the form of a matrix all of whose roots are different before considering the general case. Let(8) ƒ(λ) = |λ - | = (λ - λ1)(λ - λ2) .... (λ - λn)
where no two roots are equal and set
By the Lagrange interpolation formula hence
(10) ƒ1(x) + ƒ2(x) + .... + ƒn(x) = 1.
Further, ƒ(λ) is a factor of ƒi(λ)ƒj(λ) (i ≠ j) so that
(11) ƒi(x)ƒj(x) = 0 (i ≠ j);
hence multiplying (10) by fi(λ) and using (11) we have
(12) [ƒi(x)]2 = ƒi(x).
Again, (λ - λi)ƒi(λ) = ƒ(λ)/ƒ'(λi); hence (x - λi)ƒi(x) = 0, that is,
(13) xƒi(x) = λiƒi(x),
whence, summing with regard to i and using (10), we have
(14) x = λ1ƒ1(x) + λ2ƒ2(x) + ....... + λnƒn(x).
If we form xr from (14), r being a positive integer, it is immediately seen from (11) and (12), or from the Lagrange interpolation formula, that
(15) xr = λr1ƒ1 + λr2ƒ2 + .... + λrnƒn,
where ƒi stands for ƒi(x), and it is easily verified by actual multiplication that, if no root is 0,
x-1 = λ-11ƒ1 + λ-12ƒ2 + .... + λ-1nƒn
so that (15) holds for negative powers also. The matrices ƒi are linearly independent. For if ∑γiƒi = 0, then
0 = ƒi∑γiƒi = γiƒ2i = γiƒi
whence every γj = 0 seeing that in the case we are considering ƒ(λ) is itself the reduced characteristic function so that ƒj(x) ≠ 0.
From these results we have that, if g(λ) is any scalar rational function whose denominator has no factor in common with φ(λ), then
(16) g(x) = g(λ1)ƒ1 + g(λ2)ƒ2 + .... + g(λn)ƒn.
It follows from this that the roots of g(x) are g(λi) (i = 1, 2,....,n). For setting y = g(x), μi = g(λi), we have as above
ψ(y) = ∑ψ(μi)ƒi,
ψ(λ) being a scalar polynomial. Now ψ(y)ƒi = ψ(μi)ƒi hence, if ψ(y) = 0, then also ψ(μi) = 0 (i = 1, 2,....., n); and conversely. Hence if the notation is so chosen that μ1, μ2,...., μr are the distinct values of μi the reduced characteristic function of y = g(z) is
If the determinant |λ - x| = ƒ(λ) is expanded in powers of λ, it is easily seen that the coefficient ar of λn-r is (—1)r times the sum of the principal minors of x of order r; this coefficient is therefore a homogeneous polynomial of degree r in the coordinates of x. In particular, —a1 is the sum of the coordinates in the main diagonal: this sum is called the trace of x and is denoted by tr x.
If y is an arbitrary matrix, μ a scalar variable, and z = x + μy, the coefficients of the characteristic equation of z, say
(17) zn + b1zn-1 + .... + bn = 0,
are polynomials in μ of the form
(18) bs = as0 + μas1 + .... + μsass, (as0 = as, a00 = 1)
and the powers of z are also polynomials in μ, say
where is obtained by multiplying s x's and t y's together in every possible way and adding the terms so obtained, e.g.,
If we substitute (18) and (19) in (17) and arrange according to powers of μ, then, since μ is an independent variable, the coefficients of its several powers must be zero. This leads to a series of'relations connecting x and y of the form
(s = 0,1,2,...)
where aij are the coefficients defined in (18) and is replaced by 0 when j > s. In particular, if s = 1,
which, when xy = yx, becomes
ƒ'(x)y = -(a11xn-1 + .... + an1) = g(x).
When x has no repeated roots, ƒ'(λ) has no root in common with ƒ(λ) and ƒ'(x) has an inverse (cf. §2.06) so that y = g(x)/ƒ'(x) which can be expressed as a scalar polynomial in x; and conversely every such polynomial is commutative with x. We therefore have the following theorem:
Theorem 4. If x has no multiple roots, the only matrices commutative with it are scalar polynomials in x.
2.09 Matrices with multiple roots.
We shall now extend the main results of §2.07 to matrices whose roots are not necessarily simple. Suppose in the first place that x has only one distinct root and that its reduced characteristic function is φ(λ) = (λ — λ1)ν, and set
ηi1 = ηi = (x - λ1)i = (x - λ1)ηi-1 (i = 1, 2,, ν - 1);
ην1 = 0, xην-1 = λ1ην-1, xηi = λ1ηi + ηi+1 (i = 1, 2,....,ν — 2)
and xs = (λ1 + η1)s = λs1 + sλs-11η1 + + ....
where the binomial expansion is cut short with the term ην-11 since ην1 = 0. Again, if g(λ) is any scalar polynomial, then
It follows immediately that, if g(s)(λ) is the first derivative of g(λ) which is not 0 when λ = λ1 and (k — 1)s < ν ≤ ks, then the reduced equation of g(x) is
[g(x) - g(λ1]k = 0.
It should be noted that the first ν — 1 powers of η1 are linearly independent since φ(λ) is the reduced characteristic function of x.
We shall now suppose that x has more than one root. Let the reduced characteristic function be
(22) hi(λ) = φ(λ)/(λ - λi)νi.
We can determine two scalar polynomials, Mi(λ) and Ni(λ), of degrees not exceeding νi — 1 and ν — νi - 1, respectively, such that
Mi(λ)hi(λ) + (λ - λi)νiNi(λ) ≡ 1, Mi(λi) ≠ 0.
If we set
(23) φi(λ) = Mi(λ)hi(λ),
then 1 — ∑φi(λ) is exactly divisible by φ(λ) and, being of degree ν — 1 at most, must be identically 0; hence
Again, from (22) and (23), φ(λ) is a factor of φi(λ)φj(λ) (i ≠ j) arid hence on multiplying (24) by φi(λ) we have
(25) [φi(λ)]2 ≡ φi(λ), φi(λ)φj(λ) ≡ 0, mod φ(λ) (i ≠ j).
Further, if gr(x) is a scalar polynomial, then
where R has the form ∑Ci(λ)(λ — λi)νiφi(λ), Ci being a polynomial, so that R vanishes when x is substituted for x.
If we put x for λ in (23) and set φi for φi(x), then (24) and (25) show that
It follows as in §2.07 that the matrices φi are linearly independent and none is zero, since φi(λi) ≠ 0 so that φ(λ) is not a factor of φi(lambda;), which would be the case were φi(x) = 0. We now put x for λ in (26) and set
(28) ηi = (x - λi)φi (i = 1, 2,...., r).
Since the νith power of (λ — λi)jφi(λ) is the first which has φ(λ) as a factor, ηi is a nilpotent matrix of index νi (cf. §1.05) and, remembering that φ2i = φi, we have
(29) ηji = (x — λi)jφi ≠ 0 (j < νi), ηiφi = ηi = φiηi,
(30) xφi = λiφi + ηi, xηji = λiηji + ηj+1i
equation (26) therefore becomes
and in particular
(i) xψi = ψix,
(33) (ii) (x - λi)ψi is nilpotent,
then ψi = φi (i = 1, 2,...., r). For let θij = φiψj; from (i) θij also equals ψjφi. From (ii) and (28)
ηi = xφi — λiφi, ξj = xψj - λjψj
are both nilpotent and, since φi and ηi are polynomials in x, they are commutative with ψj and therefore with ξj; also
xθij = λiθij + (x — λi)φiψj = λiθij + ηiψj
= λjθij + (x - λj)φiψj = λjθij + ξjφi.
Hence (λi — λj)θij = ξjφi - ηiψj. But if μ is the greater of the indices of ξj and ηi then, since all the matrices concerned are commutative, each term of (ξjφi — ηiψj)2μ contains ξμj or ημi as a factor and is therefore 0. If θij ≠ 0, this is impossible when i ≠ j since θij a is idempotent and &lambdea;i — λj ≠ 0. Hence φiψj = 0 when i ≠ j and from (iii)
ψj = ψj∑φi = ψjφj = φj∑ψi = φi
which proves the uniqueness of the φ's.
We shall now determine the reduced equation of g(x). If we set gi for g(x)φi, then
= g(λi)φi + ζi,
say, and if si is the order of the first derivative in (34) which is not 0, then ζi is a nilpotent matrix whose index ki is given by ki = 1 < νi/si ≤ ki.
If Φ(λ) is a scalar polynomial, and γi = g(λi),
so that Φ(g(x)) = 0 if, and only if, g(λi) is a root of Φ(λ) of multiplicity ki. Hence, if
Ψ(λ) = Π[λ - g(λi]ki
where when two or more values of i give the same value of g(λi), only that one is to be taken for which ki is greatest, then Ψ(λ) is the reduced characteristic function of g(x). As a part of this result we have the following theorem.
g(λ1), g(λ2),...., g(λr).
If the roots g(λi) are all distinct, the principal idempotent elements of g(x) are the same as those of x; for condition (33) of §2.11 as applied to g(x) are satisfied by φi (i = 1, 2, ...., r), and these conditions were shown to characterize the principal idempotent elements completely.
2.13 The square roof of a matrix.
Although the general question of functions of a matrix will not be taken up till a later chapter, it is convenient to give here one determination of the square root of a matrix x.
If α and β are scalars, α ≠ 0, and (α + β)1/2 is expanded formally in a Taylor series,
then, if , it follows that
(35) S2ν = α + β + αTν
where Tν is a polynomial in β/α which contains no power of β/α lower than the νth. If a and b are commutative matrices and a is the square of a known non-singular matrix a1/2, then (35) being an algebraic identity in α and β remains true when a and b are put in their place.
If xi = λiφi + ηi is the matrix defined in §2.11 (32), then so long as λi ≠ 0, we may set α = λiφi, β = ηi since λiφi = (λ1/2iφi)2; and in this case the Taylor series terminates since ηνii = 0, that is, Tνi = 0 and the square of the terminating series for (λiφi + ηi)1/2 in powers of ηi equals λiφi + ηi. It follows immediately from (32) and (27) that, if x is a matrix no one of whose roots i$ 0, the square of the matrix
If the reduced equation of x has no multiple roots, (36) becomes
(37) x1/2 = ∑λ1/2iφi
and this is valid even if one of the roots is 0. If, however, 0 is a multiple root of the reduced equation, x may have no square root as, for example, the matrix .
Formula (36) gives 2r determinations of x1/2 but we shall see later that an infinity of determinations is possible in certain cases.
2.14 Reducible matrices.
If x = x1 + x2 is the direct sum of x1 and x2 and e1, e2 are the corresponding idempotent elements, that is,
eix = xi = xi, eiej = 0 (i ≠ j; i, j = 1, 2),
then xr = xr1 + xr2 (r ≥ 2) and we may set as before 1 = x0 = x01 + x02 = e1 + e2. Hence, if ƒ(λ) = λm + b1λm-1 + .... + bm is any scalar polynomial, we have
ƒ(x) = e1ƒ(x1) + e2ƒ(x2) = f(x1) + f(x2) - bm,
and if g(λ) is a second scalar polynomial
ƒ(x)g(x) = e1f(x1)g(x1) + e2ƒ(x2)g(x2).
Now if ƒi(λ) is the reduced characteristic function of xi regarded as a matrix in the space determined by ei, then the reduced characteristic function of xi as a matrix in the original fundamental space is clearly λƒi(λ) unless λ is a factor of ƒi(λ) in which case it is simply ƒi(λ). Further the reduced characteristic function of x = x1 + x2 is clearly the least common multiple of ƒ1(λ) and ƒ2(λ); for if
ψ(λ) = ƒ1(λ)g1(λ) = ƒ2(λ)g2(λ)
ψ(x1 + x2) = e1ψ(x1 + e2ψ(x2)
= e1ƒ1(x1)g1(x1) + e2ƒ2(x2)g2(x2) = 0.