# Algebraic Operations with Matrices. The Characteristic Equation

### 2.01 Identities.

The following elementary considerations enable us to carry over a number of results of ordinary scalar algebra into the algebra of matrices. Suppose ƒ(λ_{1}, λ_{2},..., λ_{r}), g(λ_{1}, λ_{2},..., λ_{r}) are integral algebraic functions of the scalar variables λ_{i} with scalar coefficients, and suppose that

ƒ(λ_{1}, λ_{2},...., λ_{r}) = g(λ_{1}, λ_{2},...., λ_{r})

is an algebraic identity; then, when ƒ(λ_{1},...., λ_{r}) — g(λ_{1},...., λ_{r}) is reduced to the standard form of a polynomial, the coefficients of the various powers of the λ's are zero. In carrying out this reduction no properties of the λ's are used other than those which state that they obey the laws of scalar multiplication and addition: if then we replace λ_{1}, λ_{2},..., λ_{r} by commutative matrices x_{1}, x_{2},..., x_{r}, the reduction to the form 0 is still valid step by step and hence

ƒ(x_{1}, x_{2},..., x_{r}) = g(x_{1}, x_{2},..., x_{r}).

An elementary example of this is

(1 - x^{2}) = (1 - x)(1 + x)

or, when xy = yx,

x^{2} - y^{2} = (x - y)(x + y).

Here, if xy ≠ yx, the reader should notice that the analogue of the algebraic identity becomes

x^{2} — y^{2} = x(x + y) — (x + y)y,

which may also be written x^{2} — y^{2} = (x — y)(x + y) + (yx — xy).
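The failure and repair of the difference-of-squares identity is easy to check numerically; the following is a minimal sketch using NumPy, with matrix values chosen only for illustration:

```python
import numpy as np

# Two matrices chosen not to commute (hypothetical example values).
x = np.array([[1, 2], [3, 4]])
y = np.array([[0, 1], [1, 0]])

lhs = x @ x - y @ y
# The scalar factorization fails when xy != yx ...
assert not np.array_equal(lhs, (x - y) @ (x + y))
# ... but the noncommutative analogues above hold:
assert np.array_equal(lhs, x @ (x + y) - (x + y) @ y)
assert np.array_equal(lhs, (x - y) @ (x + y) + (y @ x - x @ y))
```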

### 2.02 Matric polynomials in a scalar variable.

By a matric polynomial in a scalar variable λ is meant a matrix that can be expressed in the form

(1) P(λ) = p_{0}λ^{r} + p_{1}λ^{r-1} + ... + p_{r} (p_{0} ≠ 0),

where p_{0}, p_{1}, ..., p_{r} are constant matrices. The coordinates of P(λ) are scalar polynomials in λ and hence, if

(2) Q(λ) = q_{0}λ^{s} + q_{1}λ^{s-1} + ... + q_{s} (q_{0} ≠ 0)

is also a matric polynomial, P(λ) = Q(λ) if, and only if, r = s and the coefficients of corresponding powers of λ are equal, that is, p_{i} = q_{i} (i = 0, 1, ..., r). If |q_{0}| ≠ 0, the degree of the product P(λ)Q(λ) (or Q(λ)P(λ)) is exactly r + s since the coefficient of the highest power λ^{r+s} which occurs in the product is p_{0}q_{0} (or q_{0}p_{0}) which cannot be 0 if p_{0} ≠ 0 and |q_{0}| ≠ 0. If, however, both |p_{0}| and |q_{0}| are 0, the degree of the product may well be less than r + s, as is seen from the examples

(e_{11}λ + 1)(e_{22}λ + 1) = e_{11}e_{22}λ^{2} + (e_{11}+ e_{22})λ + 1 = (e_{11} + e_{22})λ + 1,

$\begin{Vmatrix} \lambda & 1 \\ \lambda & 1 \end{Vmatrix} \ \ \begin{Vmatrix} 1 & -1 \\ -\lambda & \lambda \end{Vmatrix} = 0$

Another noteworthy difference between matric and scalar polynomials is that, when the determinant of a matric polynomial is a constant different from 0, its inverse is also a matric polynomial: for instance

(e_{12}λ + 1)^{-1} = -e_{12}λ + 1,

[(e_{12} + e_{23})λ + 1]^{-1} = e_{13}λ^{2} - (e_{12} + e_{23})λ + 1.

We shall call such polynomials elementary polynomials.
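The two inverses above can be verified numerically; in the sketch below the helper `e` building matrix units e_{ij} is our own notation, not part of the text:

```python
import numpy as np

def e(i, j, n=3):
    """Matrix unit e_ij: 1 in row i, column j (1-indexed), zero elsewhere."""
    m = np.zeros((n, n))
    m[i - 1, j - 1] = 1
    return m

I = np.eye(3)
lam = 5.0  # any scalar value of the variable

# (e_12 λ + 1)^{-1} = -e_12 λ + 1, since e_12^2 = 0.
P = e(1, 2) * lam + I
assert np.allclose(P @ (-e(1, 2) * lam + I), I)

# [(e_12 + e_23) λ + 1]^{-1} = e_13 λ^2 - (e_12 + e_23) λ + 1.
Q = (e(1, 2) + e(2, 3)) * lam + I
Qinv = e(1, 3) * lam**2 - (e(1, 2) + e(2, 3)) * lam + I
assert np.allclose(Q @ Qinv, I)
```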

### 2.03 The division transformation.

The greater part of the theory of the division transformation can be extended from ordinary algebra to the algebra of matrices; the main precaution that must be taken is that it must not be assumed that every element of the algebra has an inverse and that due allowance must be made for the peculiarities introduced by the lack of commutativity in multiplication.

**Theorem 1.** *If P(λ) and Q(λ) are the polynomials defined by (1) and (2), and if |q_{0}| ≠ 0, there exist unique polynomials S(λ), R(λ), S_{1}(λ), R_{1}(λ), of which S and S_{1}, if not zero, are of degree r - s and the degrees of R and R_{1} are s - 1 at most, such that*

P(λ) ≡ S(λ)Q(λ) + R(λ) ≡ Q(λ)S_{1}(λ) + R_{1}(λ).

If r < s, we may take S_{1} = S = 0 and R_{1} = R = P; in so far as the existence of these polynomials is concerned the theorem is therefore true in this case. We shall now assume as a basis for a proof by induction that the theorem is true for polynomials of degree less than r and that r ≥ s. Since |q_{0}| ≠ 0, q^{-1}_{0} exists and, as in ordinary scalar division, we have

P(λ) - p_{0}q^{-1}_{0}λ^{r-s}Q(λ) = (p_{1} - p_{0}q^{-1}_{0}q_{1})λ^{r-1} + ... = P_{1}(λ).

Since the degree of P_{1} is less than r, we have by hypothesis P_{1}(λ) = P_{2}(λ)Q(λ) + R(λ), the degrees of P_{2} and R being less, respectively, than r — s and s; hence

P(λ) = (p_{0}q^{-1}_{0}λ^{r-s} + P_{2}(λ))Q(λ) + R(λ) = S(λ)Q(λ) + R(λ)

as required by the theorem. The existence of the right hand quotient and remainder follows in the same way.

It remains to prove the uniqueness of S and R. Suppose, if possible, that P = SQ + R = TQ + U where R and S are as above and T, U are polynomials the degree of U being less than s; then (S — T)Q = U — R. If S — T ≠ 0, then, since |q_{0}| ≠ 0, the degree of the polynomial (S — T)Q is at least as great as that of Q and is therefore greater than the degree of U — R. It follows immediately that S — T = 0, and hence also U — R = 0; which completes the proof of the theorem.

If Q is a scalar polynomial, that is, if its coefficients q are scalars, then S = S_{1}, R = R_{1} and, if the division is exact, then Q(λ) is a factor of each of the coordinates of P(λ).
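The division transformation of Theorem 1 can be sketched in code. The helper `right_divide` below is our own illustration, not library code: it carries out the induction step p_{0}q^{-1}_{0}λ^{r-s}Q(λ) literally, storing a matric polynomial as a list of coefficient matrices, highest power first, and the identity P = SQ + R is then checked at several scalar values of λ:

```python
import numpy as np

def mat_poly_eval(coeffs, lam):
    """Evaluate a matric polynomial (coefficient matrices, highest power
    first) at a scalar lam, by Horner's rule."""
    out = np.zeros_like(coeffs[0], dtype=float)
    for c in coeffs:
        out = out * lam + c
    return out

def right_divide(P, Q):
    """Return (S, R) with P(lam) = S(lam)Q(lam) + R(lam), deg R < deg Q,
    assuming Q's leading coefficient is non-singular (as in Theorem 1)."""
    P = [c.astype(float) for c in P]
    s = len(Q) - 1
    q0_inv = np.linalg.inv(Q[0])
    S = []
    while len(P) - 1 >= s:
        c = P[0] @ q0_inv              # p0 q0^{-1}, coefficient of lam^{deg P - s}
        S.append(c)
        for i in range(len(Q)):        # subtract (c lam^{deg P - s}) Q(lam)
            P[i] = P[i] - c @ Q[i]
        P = P[1:]                      # leading coefficient is now zero
    return S, P                        # remaining P is the remainder R

# Check the identity at a few scalar values of lam.
rng = np.random.default_rng(0)
P = [rng.integers(-3, 4, (2, 2)).astype(float) for _ in range(4)]   # degree 3
Q = [np.array([[1.0, 1.0], [0.0, 1.0]]),                            # |q0| != 0
     rng.integers(-3, 4, (2, 2)).astype(float)]                     # degree 1
S, R = right_divide(P, Q)
for lam in (0.0, 1.0, 2.5, -3.0):
    assert np.allclose(mat_poly_eval(P, lam),
                       mat_poly_eval(S, lam) @ mat_poly_eval(Q, lam)
                       + mat_poly_eval(R, lam))
```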

**Theorem 2.** *If the matric polynomial (1) is divided on the right by λ - a, the remainder is p_{0}a^{r} + p_{1}a^{r-1} + .... + p_{r}.*

As in ordinary algebra the proof follows immediately from the identity

λ^{s} - a^{s} = (λ - a)(λ^{s-1} + λ^{s-2}a + .... + a^{s-1})

in which the order of the factors is immaterial since λ is a scalar.

If P(λ) is a scalar polynomial, the right and left remainders are the same and are conveniently denoted by P(a).

### 2.04

Theorem 1 of the preceding section holds true as regards the existence of S, S_{1}, R, R_{1} and the degree of R, R_{1} even when |q_{0}| = 0 provided |Q(λ)| ≠ 0. Suppose the rank of q_{0} is t < n; then by §1.10 it has the form $\sum\limits_{1}^{t} \alpha_i \beta_i'$ or, say, $h(\sum\limits_1^t e_{ii})k$ where h and k are non-singular matrices for which he_{i} = α_{i}, k'e_{i} = β_{i} (i = 1, 2, ...., t). If $c_1 = \sum \limits_{t+1}^n e_{ii}$, then

(3) Q_{1} = (c_{1}λ + 1)h^{-1}Q

is a polynomial whose degree is not higher than the degree s of Q since c_{1}h^{-1}q_{0} = 0 so that the term in λ^{s+1} is absent. Now, if η = |h^{-1}|, then

|Q_{1}| = |c_{1}λ + 1||h^{-1}||Q| = (1 + λ)^{n-t}η|Q|,

so that the degree of |Q_{1}| is greater than that of |Q| by n - t. If the leading coefficient of Q_{1} is singular, this process may be repeated, and so on, giving Q_{1}, Q_{2},...., where the degree of |Q_{i}| is greater than that of |Q_{i-1}|. But the degree of each Q_{i} is less than or equal to s and the degree of the determinant of a polynomial of the sth degree cannot exceed ns. Hence at some stage the leading coefficient of, say, Q_{j} is not singular and, from the law of formation (3) of the successive Q's, we have Q_{j}(λ) = H(λ)Q(λ), where H(λ) is a matric polynomial.

By Theorem 1, with Q_{j} taking the place of Q, we can find S* and R, the latter of degree s - 1 at most, such that

P(λ) = S*(λ)H(λ)Q(λ) + R(λ) = S(λ)Q(λ) + R(λ).

The theorem is therefore true even if |q_{0}| = 0 except that the quotient and remainder are not necessarily unique and the degree of S may be greater than r — s, as is shown by taking P = λ^{2} — 1, Q = e_{11}λ + 1, when we have

P = (e_{22}λ^{2} + e_{11}λ - 1)Q = (e_{22}λ^{2} + e_{11}λ - 1 + e_{12})Q - e_{12}.

### 2.05 The characteristic equation.

If x is a matrix, the scalar polynomial

(4) ƒ(λ) = |λ - x| = λ^{n} + a_{1}λ^{n-1} + ... + a_{n}

is called the *characteristic function* corresponding to x. We have already seen (§1.05 (15)) that the product of a matrix and its adjoint equals its determinant; hence

(λ - x) adj (λ - x) = |λ - x| = ƒ(λ).

It follows that the polynomial ƒ(λ) is exactly divisible by λ - x so that by the remainder theorem (§2.03, Theorem 2)

(5) ƒ(x) = 0.

As a simple example of this we may take $x = \begin{Vmatrix} \alpha & \beta \\ \gamma & \delta \end{Vmatrix}$. Here

ƒ(λ) = (λ - α)(λ - δ) - βγ = λ^{2} - (α + δ)λ + αδ - βγ

and

$f(x) = \begin{Vmatrix} \alpha^2 + \beta \gamma & \alpha \beta + \beta \delta \\ \gamma \alpha + \delta \gamma & \gamma \beta + \delta^2 \end{Vmatrix} - (\alpha + \delta) \begin{Vmatrix} \alpha & \beta \\ \gamma & \delta \end{Vmatrix} + (\alpha\delta - \beta\gamma) \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix} = 0$
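The vanishing of ƒ(x) is easy to check numerically, both for this 2×2 example and, using NumPy's `np.poly` for the coefficients of |λ - x|, for any square matrix; a sketch with hand-chosen values:

```python
import numpy as np

# Numerical check of f(x) = 0 for the 2x2 example above.
a, b, c, d = 2.0, -1.0, 3.0, 5.0
x = np.array([[a, b], [c, d]])
I = np.eye(2)
fx = x @ x - (a + d) * x + (a * d - b * c) * I
assert np.allclose(fx, 0)

# The same holds for any square matrix, using the coefficients of |lam - x|.
y = np.arange(9.0).reshape(3, 3)
coeffs = np.poly(y)          # characteristic polynomial, highest power first
fy = sum(co * np.linalg.matrix_power(y, len(coeffs) - 1 - i)
         for i, co in enumerate(coeffs))
assert np.allclose(fy, 0, atol=1e-6)
```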

The following theorem is an important extension of this result.

**Theorem 3.** *If ƒ(λ) = |λ - x| and θ(λ) is the highest common factor of the first minors of |λ - x|, and if
(6) φ(λ) = ƒ(λ)/θ(λ),
the leading coefficient of θ(λ) being 1 (and therefore also that of φ(λ)), then
(i) φ(x) = 0;
(ii) if ψ(λ) is any scalar polynomial such that ψ(x) = 0, then φ(λ) is a factor of ψ(λ), that is, φ(λ) is the scalar polynomial of lowest degree and with leading coefficient 1 such that φ(x) = 0;
(iii) every root of ƒ(λ) is a root of φ(λ).*

The coordinates of adj(λ — x) are the first minors of |λ — x| and therefore by hypothesis [adj(λ — x)]/θ(λ) is integral; also

$\frac{adj(\lambda - x)}{\theta(\lambda)} \ (\lambda - x) = \frac{f(\lambda)}{\theta(\lambda)} = \phi(\lambda);$

hence φ(x) = 0 by the remainder theorem.

If ψ(λ) is any scalar polynomial for which ψ(x) = 0, we can find scalar polynomials M(λ), N(λ) such that M(λ)ψ(λ) + N(λ)φ(λ) = ζ(λ), where ζ(λ) is the highest common factor of ψ and φ. Substituting x for λ in this scalar identity and using φ(x) = 0 = ψ(x) we have ζ(x) = 0; if, therefore, ψ(x) = 0 is a scalar equation of lowest degree satisfied by x, we must have ψ(λ) = ζ(λ), apart from a constant factor, so that ψ(λ) is a factor of φ(λ), say

(7) φ(λ) = h(λ)ψ(λ).

Since ψ(x) = 0, λ — x is a factor of ψ(λ), say ψ(λ) = (λ — x)g(λ), where g is a matric polynomial; hence

$\psi(\lambda) =\frac{\phi(\lambda)}{h(\lambda)} = \frac{f(\lambda)}{h(\lambda)\theta(\lambda)} = (\lambda - x)g(\lambda)$

Hence

$g(\lambda) = \frac{f(\lambda)}{\theta(\lambda) h(\lambda) (\lambda - x)} = \frac{adj(\lambda - x)}{\theta(\lambda)h(\lambda)}$

and this cannot be integral unless h(λ) is a constant in view of the fact that θ(λ) is the highest common factor of the coordinates of adj(λ — x); it follows that ψ(λ) differs from φ(λ) by at most a constant factor.

A repetition of the first part of this argument shows that, if ψ(x) = 0 is any scalar equation satisfied by x, then φ(λ) is a factor of ψ(λ).

It remains to show that every root of ƒ(λ) is a root of φ(λ). If λ_{1} is any root of ƒ(λ) = |λ — x|, then from φ(λ) = g(λ)(λ — x) we have

φ(λ_{1}) = g(λ_{1})(λ_{1} - x)

so that the determinant, [φ(λ_{1})]^{n}, of the scalar matrix φ(λ_{1}) equals |g(λ_{1})| |λ_{1} - x|, which vanishes since |λ_{1} - x| = ƒ(λ_{1}) = 0. This is only possible if φ(λ_{1}) = 0, that is, if every root of ƒ(λ) is also a root of φ(λ).

The roots of ƒ(λ) are also called the *roots* of x, φ(λ) is called the *reduced characteristic function* of x, and φ(x) = 0 the *reduced equation* of x.
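The distinction between ƒ(λ) and the reduced function φ(λ) shows up numerically: when θ(λ) ≠ 1, a polynomial of lower degree than ƒ already annihilates x. A sketch with hand-chosen matrices (values assumed for illustration):

```python
import numpy as np

I = np.eye(3)
# x has the repeated root 2 with two independent directions,
# so phi(lam) = (lam - 2)(lam - 3) of degree 2 already annihilates it:
x = np.diag([2.0, 2.0, 3.0])
assert np.allclose((x - 2 * I) @ (x - 3 * I), 0)

# For a non-diagonalizable matrix with the same f(lam) = (lam-2)^2 (lam-3),
# theta = 1 and phi = f: the degree-2 product no longer vanishes.
j = np.array([[2.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
assert not np.allclose((j - 2 * I) @ (j - 3 * I), 0)
assert np.allclose((j - 2 * I) @ (j - 2 * I) @ (j - 3 * I), 0)
```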

### 2.06

A few simple results are conveniently given at this point although they are for the most part merely particular cases of later theorems. If g(λ) is a scalar polynomial, then on dividing by φ(λ), whose degree we shall denote by ν, we may set g(λ) = q(λ)φ(λ) + r(λ), where q and r are polynomials the degree of r being less than ν. Replacing λ by x in this identity and remembering that φ(x) = 0, we have g(x) = r(x), that is, any polynomial can be replaced by an equivalent polynomial of degree less than ν.

If g(λ) is a scalar polynomial which is a factor of φ(λ), say φ(λ) = h(λ)g(λ), then 0 = φ(x) = h(x)g(x). It follows that |g(x)| = 0; for if this were not so, we should have h(x) = [g(x)]^{-1}φ(x) = 0, whereas x can satisfy no scalar equation of lower degree than φ. Hence, if g(λ) is a scalar polynomial which has a factor in common with φ(λ), then g(x) is singular.

If a scalar polynomial g(λ) has no factor in common with φ(λ), there exist scalar polynomials M(λ), N(λ) such that M(λ)g(λ) + N(λ)φ(λ) ≡ 1. Hence M(x)g(x) = 1, or [g(x)]^{-1} = M(x). It follows immediately that any finite rational function of x with scalar coefficients can be expressed as a scalar polynomial in x of degree ν - 1 at most. It should be noticed carefully however that, if x is a variable matrix, the coefficients of the reduced polynomial will in general contain the variable coordinates of x and will not be integral in these unless the original function is integral. It follows also that g(x) is singular only when g(λ) has a factor in common with φ(λ).
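In particular x^{-1} itself is such a polynomial: since ƒ(x) = 0, a non-singular x satisfies x^{-1} = -(x^{n-1} + a_{1}x^{n-2} + .... + a_{n-1})/a_{n}. A numerical sketch, taking the coefficients from NumPy's `np.poly` (an assumption of this illustration, not part of the text):

```python
import numpy as np

# x^{-1} as a polynomial in x of degree n - 1, via the characteristic
# equation f(x) = 0 (valid whenever a_n != 0, i.e. x is non-singular).
x = np.array([[2.0, 1.0], [1.0, 3.0]])
a = np.poly(x)               # [1, a_1, ..., a_n]
n = x.shape[0]
inv = -(np.linalg.matrix_power(x, n - 1)
        + sum(a[k] * np.linalg.matrix_power(x, n - 1 - k)
              for k in range(1, n))) / a[n]
assert np.allclose(inv, np.linalg.inv(x))
```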

Finally we may notice here that similar matrices have the same reduced equation; for, if g is a scalar polynomial, g(y^{-1}xy) = y^{-1}g(x)y. As a particular case of this we have that xy and yx have the same reduced equation if, say, y is non-singular; for xy = y^{-1}•yx•y. If both x and y are singular, it can be shown that xy and yx have the same characteristic equation, but not necessarily the same reduced equation as is seen from the example x = e_{12}, y = e_{22}.

### 2.07 Matrices with distinct roots.

Because of its importance and comparative simplicity we shall investigate the form of a matrix all of whose roots are different before considering the general case. Let

(8) ƒ(λ) = |λ - x| = (λ - λ_{1})(λ - λ_{2}) .... (λ - λ_{n})

where no two roots are equal and set

$(9) \ \ \ \ \ \ \ f_i (\lambda) = \frac{(\lambda - \lambda_1) ... (\lambda - \lambda_{i -1})(\lambda - \lambda_{i + 1}) ...(\lambda - \lambda_n)}{(\lambda_i - \lambda_1) ... (\lambda_i - \lambda_{i -1})(\lambda_i - \lambda_{i + 1}) ...(\lambda_i - \lambda_n)} = \frac{f(\lambda)}{f'(\lambda_i)(\lambda - \lambda_i)}$

By the Lagrange interpolation formula $\sum \limits_i f_i (\lambda) = 1;$ hence

(10) ƒ_{1}(x) + ƒ_{2}(x) + .... + ƒ_{n}(x) = 1.

Further, ƒ(λ) is a factor of ƒ_{i}(λ)ƒ_{j}(λ) (i ≠ j) so that

(11) ƒ_{i}(x)ƒ_{j}(x) = 0 (i ≠ j);

hence, multiplying (10) by ƒ_{i}(x) and using (11), we have

(12) [ƒ_{i}(x)]^{2} = ƒ_{i}(x).

Again, (λ - λ_{i})ƒ_{i}(λ) = ƒ(λ)/ƒ'(λ_{i}); hence (x - λ_{i})ƒ_{i}(x) = 0, that is,

(13) xƒ_{i}(x) = λ_{i}ƒ_{i}(x),

whence, summing with regard to i and using (10), we have

(14) x = λ_{1}ƒ_{1}(x) + λ_{2}ƒ_{2}(x) + ....... + λ_{n}ƒ_{n}(x).

If we form x^{r} from (14), r being a positive integer, it is immediately seen from (11) and (12), or from the Lagrange interpolation formula, that

(15) x^{r} = λ^{r}_{1}ƒ_{1} + λ^{r}_{2}ƒ_{2} + .... + λ^{r}_{n}ƒ_{n},

where ƒ_{i} stands for ƒ_{i}(x), and it is easily verified by actual multiplication that, if no root is 0,

x^{-1} = λ^{-1}_{1}ƒ_{1} + λ^{-1}_{2}ƒ_{2} + .... + λ^{-1}_{n}ƒ_{n}

so that (15) holds for negative powers also. The matrices ƒ_{i} are linearly independent. For if ∑γ_{j}ƒ_{j} = 0, then

0 = ƒ_{i}∑γ_{j}ƒ_{j} = γ_{i}ƒ^{2}_{i} = γ_{i}ƒ_{i},

whence every γ_{i} = 0 seeing that in the case we are considering ƒ(λ) is itself the reduced characteristic function so that ƒ_{i}(x) ≠ 0.

From these results we have that, if g(λ) is any scalar rational function whose denominator has no factor in common with φ(λ), then

(16) g(x) = g(λ_{1})ƒ_{1} + g(λ_{2})ƒ_{2} + .... + g(λ_{n})ƒ_{n}.

It follows from this that the roots of g(x) are g(λ_{i}) (i = 1, 2,....,n). For setting y = g(x), μ_{i} = g(λ_{i}), we have as above

ψ(y) = ∑ψ(μ_{i})ƒ_{i},

ψ(λ) being a scalar polynomial. Now ψ(y)ƒ_{i} = ψ(μ_{i})ƒ_{i}; hence, if ψ(y) = 0, then also ψ(μ_{i}) = 0 (i = 1, 2,....., n); and conversely. Hence if the notation is so chosen that μ_{1}, μ_{2},...., μ_{r} are the distinct values of μ_{i}, the reduced characteristic function of y = g(x) is $\prod \limits_1^r (\lambda - \mu_i)$.
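Formulas (9)-(16) can be exercised numerically. The sketch below builds the matrices ƒ_i(x) for a 2×2 matrix with distinct roots and checks (10), (11), (12), (14) and an instance of (16); the eigenvalue computation is NumPy's, an assumption of the illustration:

```python
import numpy as np

x = np.array([[0.0, 1.0], [-2.0, 3.0]])     # roots 1 and 2, distinct
roots = np.sort(np.linalg.eigvals(x).real)
I = np.eye(2)

def f_i(i):
    """The Lagrange polynomial f_i of (9), evaluated at the matrix x."""
    out = I.copy()
    for j, lam in enumerate(roots):
        if j != i:
            out = out @ (x - lam * I) / (roots[i] - lam)
    return out

fs = [f_i(i) for i in range(len(roots))]
assert np.allclose(sum(fs), I)                                # (10)
assert np.allclose(fs[0] @ fs[1], 0)                          # (11)
assert np.allclose(fs[0] @ fs[0], fs[0])                      # (12)
assert np.allclose(x, sum(l * f for l, f in zip(roots, fs)))  # (14)
# (16) with g(lam) = lam^3 + 1:
assert np.allclose(np.linalg.matrix_power(x, 3) + I,
                   sum((l**3 + 1) * f for l, f in zip(roots, fs)))
```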

### 2.08

If the determinant |λ - x| = ƒ(λ) is expanded in powers of λ, it is easily seen that the coefficient a_{r} of λ^{n-r} is (-1)^{r} times the sum of the principal minors of x of order r; this coefficient is therefore a homogeneous polynomial of degree r in the coordinates of x. In particular, -a_{1} is the sum of the coordinates in the main diagonal: this sum is called the *trace* of x and is denoted by tr x.

If y is an arbitrary matrix, μ a scalar variable, and z = x + μy, the coefficients of the characteristic equation of z, say

(17) z^{n} + b_{1}z^{n-1} + .... + b_{n} = 0,

are polynomials in μ of the form

(18) b_{s} = a_{s0} + μa_{s1} + .... + μ^{s}a_{ss}, (a_{s0} = a_{s}, a_{00} = 1)

and the powers of z are also polynomials in μ, say

$(19) \ \ \ \ \ \ \ \ z^r = x^r + \mu \begin{Bmatrix} x & y \\ r-1 & 1 \end{Bmatrix} + \mu^2 \begin{Bmatrix} x & y \\ r-2 & 2 \end{Bmatrix} + ... + \mu^r y^r$

where $\begin{Bmatrix} x & y \\ s & t \end{Bmatrix}$ is obtained by multiplying s x's and t y's together in every possible way and adding the terms so obtained, e.g.,

$\begin{Bmatrix} x & y \\ 2 & 1 \end{Bmatrix} = x^2 y + xyx + yx^2$

If we substitute (18) and (19) in (17) and arrange according to powers of μ, then, since μ is an independent variable, the coefficients of its several powers must be zero. This leads to a series of relations connecting x and y of the form

$(20) \ \ \ \ \ \ \ \ \ \sum \limits_{i,j} a_{ij} \begin{Bmatrix} x & y \\ n - s - i + j & s - j \end{Bmatrix} = 0$ (s = 0,1,2,...)

where a_{ij} are the coefficients defined in (18) and $\begin{Bmatrix} x & y \\ n - s - i + j & s - j \end{Bmatrix}$ is replaced by 0 when j > s. In particular, if s = 1,

$\begin{Bmatrix} x & y \\ n - 1 & 1 \end{Bmatrix} + a_1 \begin{Bmatrix} x & y \\ n - 2 & 1 \end{Bmatrix} + ... + a_{n-1}y + a_{11}x^{n-1} + ... + a_{n1} = 0$

which, when xy = yx, becomes

ƒ'(x)y = -(a_{11}x^{n-1} + .... + a_{n1}) = g(x).

When x has no repeated roots, ƒ'(λ) has no root in common with ƒ(λ) and ƒ'(x) has an inverse (cf. §2.06) so that y = g(x)/ƒ'(x) which can be expressed as a scalar polynomial in x; and conversely every such polynomial is commutative with x. We therefore have the following theorem:

**Theorem 4.** *If x has no multiple roots, the only matrices commutative with it are scalar polynomials in x.*
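Theorem 4 can be checked by counting: the solutions of xy = yx form the null space of the map y ↦ xy - yx, and for x with distinct roots that null space should have dimension n, exactly the span of 1, x, ...., x^{n-1}. A sketch using the Kronecker-product (vec) form of the commutator, an identity assumed from standard matrix calculus:

```python
import numpy as np

x = np.array([[0.0, 1.0], [-2.0, 3.0]])     # distinct roots 1 and 2
n = x.shape[0]
I = np.eye(n)
# With column-stacking vec: vec(xy - yx) = (I (x) x - x^T (x) I) vec(y).
K = np.kron(I, x) - np.kron(x.T, I)
nullity = n * n - np.linalg.matrix_rank(K)
# The commutant has dimension n, matching span{1, x, ..., x^{n-1}}.
assert nullity == n
```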

### 2.09 Matrices with multiple roots.

We shall now extend the main results of §2.07 to matrices whose roots are not necessarily simple. Suppose in the first place that x has only one distinct root and that its reduced characteristic function is φ(λ) = (λ — λ_{1})^{ν}, and set

η^{i}_{1} = η_{i} = (x - λ_{1})^{i} = (x - λ_{1})η_{i-1} (i = 1, 2, ...., ν - 1);

then

η^{ν}_{1} = 0, xη_{ν-1} = λ_{1}η_{ν-1}, xη_{i} = λ_{1}η_{i} + η_{i+1} (i = 1, 2,....,ν — 2)

and
x^{s} = (λ_{1} + η_{1})^{s} = λ^{s}_{1} + sλ^{s-1}_{1}η_{1} + ${s \choose 2}\lambda_1^{s-2}\eta_1^2$ + ....

where the binomial expansion is cut short with the term η^{ν-1}_{1} since η^{ν}_{1} = 0. Again, if g(λ) is any scalar polynomial, then

$g(x) = g(\lambda_1 + \eta_1) = g(\lambda_1) + g'(\lambda_1) \eta_1 + ... + \frac{g^{(v-1)}(\lambda_1) }{(v-1)!} \eta_1^{(v-1)}$

It follows immediately that, if g^{(s)}(λ) is the first derivative of g(λ) which is not 0 when λ = λ_{1} and (k — 1)s < ν ≤ ks, then the reduced equation of g(x) is

[g(x) - g(λ_{1})]^{k} = 0.

It should be noted that the first ν — 1 powers of η_{1} are linearly independent since φ(λ) is the reduced characteristic function of x.
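The terminating Taylor expansion for g(x) can be verified on a matrix with the single root λ_1 = 2 and ν = 3, taking g(λ) = λ^4 (values chosen for illustration):

```python
import numpy as np
from math import factorial

# x = lam1 + eta1 with eta1 nilpotent of index 3; g(x) equals the Taylor
# series of g about lam1, truncated at eta1^2.
lam1, nu = 2.0, 3
x = lam1 * np.eye(nu) + np.diag([1.0, 1.0], 1)   # single root, index 3
eta = x - lam1 * np.eye(nu)

derivs = [lam1**4, 4 * lam1**3, 12 * lam1**2]     # g, g', g'' at lam1
taylor = sum(d / factorial(k) * np.linalg.matrix_power(eta, k)
             for k, d in enumerate(derivs))
assert np.allclose(taylor, np.linalg.matrix_power(x, 4))
```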

### 2.10

We shall now suppose that x has more than one root. Let the reduced characteristic function be

$(21) \ \ \ \ \ \ \varphi(\lambda) = \prod \limits_{i=1}^r (\lambda - \lambda_i)^{v_i} \ \ \ \ \ (\sum v_i = v, \ r > 1)$

and set

(22) h_{i}(λ) = φ(λ)/(λ - λ_{i})^{νi}.

We can determine two scalar polynomials, M_{i}(λ) and N_{i}(λ), of degrees not exceeding ν_{i} — 1 and ν — ν_{i} - 1, respectively, such that

M_{i}(λ)h_{i}(λ) + (λ - λ_{i})^{νi}N_{i}(λ) ≡ 1, M_{i}(λ_{i}) ≠ 0.

If we set

(23) φ_{i}(λ) = M_{i}(λ)h_{i}(λ),

then 1 — ∑φ_{i}(λ) is exactly divisible by φ(λ) and, being of degree ν — 1 at most, must be identically 0; hence

(24) $\sum \limits_1^r \varphi_i (\lambda) = 1$

Again, from (22) and (23), φ(λ) is a factor of φ_{i}(λ)φ_{j}(λ) (i ≠ j) and hence on multiplying (24) by φ_{i}(λ) we have

(25) [φ_{i}(λ)]^{2} ≡ φ_{i}(λ), φ_{i}(λ)φ_{j}(λ) ≡ 0, mod φ(λ) (i ≠ j).

Further, if g(λ) is a scalar polynomial, then

$(26) \ \ \ \ \ \ \ \ g(\lambda) = \sum \limits_1^r g(\lambda) \varphi_i (\lambda)$

$= \sum \limits_1^r [g(\lambda_i) + g'(\lambda_i)(\lambda - \lambda_i) + ... + \frac{g^{(v_i-1)}(\lambda_i)}{(v_i-1)!}(\lambda - \lambda_i)^{(v_i-1)}] \varphi_i(\lambda) + R$

where R has the form ∑C_{i}(λ)(λ - λ_{i})^{νi}φ_{i}(λ), C_{i} being a polynomial, so that R vanishes when x is substituted for λ.

### 2.11

If we put x for λ in (23) and set φ_{i} for φ_{i}(x), then (24) and (25) show that

$(27) \ \ \ \ \ \ \ \ \varphi_i^2 = \varphi_i, \ \varphi_i \varphi_j = 0 \ \ (i \neq j), \ \ \sum \limits_1^r \varphi_i = 1$

It follows as in §2.07 that the matrices φ_{i} are linearly independent and none is zero, since φ_{i}(λ_{i}) ≠ 0 so that φ(λ) is not a factor of φ_{i}(λ), which would be the case were φ_{i}(x) = 0. We now put x for λ in (26) and set

(28) η_{i} = (x - λ_{i})φ_{i} (i = 1, 2,...., r).

Since the ν_{i}th power of (λ - λ_{i})φ_{i}(λ) is the first which has φ(λ) as a factor, η_{i} is a nilpotent matrix of index ν_{i} (cf. §1.05) and, remembering that φ^{2}_{i} = φ_{i}, we have

(29) η^{j}_{i} = (x — λ_{i})^{j}φ_{i} ≠ 0 (j < ν_{i}), η_{i}φ_{i} = η_{i} = φ_{i}η_{i},

(30) xφ_{i} = λ_{i}φ_{i} + η_{i}, xη^{j}_{i} = λ_{i}η^{j}_{i} + η^{j+1}_{i};

equation (26) therefore becomes

(31) $g(x) = \sum \limits_1^r [g(\lambda_i)\varphi_i + g'(\lambda_i) \eta_i + ... + \frac{g^{(v_i-1)}(\lambda_i) }{(v_i-1)!} \eta_i^{(v_i-1)}]$

and in particular

$(32) \ \ \ \ \ \ x = \sum \limits_1^r (\lambda_i \varphi_i + \eta_i) = \sum x_i$

The matrices φ_{i} and η_{i} are called the *principal idempotent* and *nilpotent elements* of x corresponding to the root λ_{i}. The matrices φ_{i} are uniquely determined by the following conditions: if ψ_{i} (i = 1, 2,...., r) are any matrices such that

(i) xψ_{i} = ψ_{i}x,

(33) (ii) (x - λ_{i})ψ_{i} is nilpotent,

(iii) $\sum \limits_i \psi_i = 1, \ \ \ \psi_i^2 = \psi_i \neq 0$,

then ψ_{i} = φ_{i} (i = 1, 2,...., r). For let θ_{ij} = φ_{i}ψ_{j}; from (i) θ_{ij} also equals ψ_{j}φ_{i}. From (ii) and (28)

η_{i} = xφ_{i} - λ_{i}φ_{i}, ξ_{j} = xψ_{j} - λ_{j}ψ_{j}

are both nilpotent and, since φ_{i} and η_{i} are polynomials in x, they are commutative with ψ_{j} and therefore with ξ_{j}; also

xθ_{ij} = λ_{i}θ_{ij} + (x - λ_{i})φ_{i}ψ_{j} = λ_{i}θ_{ij} + η_{i}ψ_{j}

= λ_{j}θ_{ij} + (x - λ_{j})φ_{i}ψ_{j} = λ_{j}θ_{ij} + ξ_{j}φ_{i}.

Hence (λ_{i} - λ_{j})θ_{ij} = ξ_{j}φ_{i} - η_{i}ψ_{j}. But if μ is the greater of the indices of ξ_{j} and η_{i} then, since all the matrices concerned are commutative, each term of (ξ_{j}φ_{i} - η_{i}ψ_{j})^{2μ} contains ξ^{μ}_{j} or η^{μ}_{i} as a factor and is therefore 0. If θ_{ij} ≠ 0, this is impossible when i ≠ j since θ_{ij} is idempotent and λ_{i} - λ_{j} ≠ 0. Hence φ_{i}ψ_{j} = 0 when i ≠ j and from (iii)

ψ_{j} = ψ_{j}∑φ_{i} = ψ_{j}φ_{j} = φ_{j}∑ψ_{i} = φ_{j},

which proves the uniqueness of the φ's.
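For a concrete matrix with φ(λ) = (λ - 1)^2(λ - 3), the construction of §2.10 gives (working the M_i out by hand for this example) φ_1(λ) = -(λ + 1)(λ - 3)/4 and φ_2(λ) = (λ - 1)^2/4; the sketch below checks (27), (28) and (32) numerically:

```python
import numpy as np

x = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
I = np.eye(3)
p1 = -(x + I) @ (x - 3 * I) / 4        # phi_1(x)
p2 = (x - I) @ (x - I) / 4             # phi_2(x)
assert np.allclose(p1 + p2, I)         # (27): sum is 1
assert np.allclose(p1 @ p1, p1)        # (27): idempotent
assert np.allclose(p1 @ p2, 0)         # (27): mutually orthogonal
eta1 = (x - I) @ p1                    # (28)
assert np.allclose(eta1 @ eta1, 0)     # nilpotent of index nu_1 = 2
assert np.allclose(x, 1 * p1 + eta1 + 3 * p2)   # (32)
```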

### 2.12

We shall now determine the reduced equation of g(x). If we set g_{i} for g(x)φ_{i}, then

$(34) \ \ \ \ \ \ \ \ g_i = g(\lambda_i)\varphi_i + g'(\lambda_i) \eta_i + ... + \frac{g^{(v_i-1)}(\lambda_i) }{(v_i-1)!} \eta_i^{(v_i-1)}$

= g(λ_{i})φ_{i} + ζ_{i},

say, and if s_{i} is the order of the first derivative in (34) which is not 0, then ζ_{i} is a nilpotent matrix whose index k_{i} is given by k_{i} - 1 < ν_{i}/s_{i} ≤ k_{i}.

If Φ(λ) is a scalar polynomial, and γ_{i} = g(λ_{i}),

$\Phi(g(x)) = \sum \Phi(g_i)\varphi_i = \sum [\Phi(\gamma_i)\varphi_i + \Phi'(\gamma_i)\zeta_i + ... + \frac{\Phi^{(k_i-1)}(\gamma_i) }{(k_i-1)!} \zeta_i^{(k_i-1)}]$

so that Φ(g(x)) = 0 if, and only if, g(λ_{i}) is a root of Φ(λ) of multiplicity k_{i} at least. Hence, if

Ψ(λ) = Π[λ - g(λ_{i})]^{ki}

where when two or more values of i give the same value of g(λ_{i}), only that one is to be taken for which k_{i} is greatest, then Ψ(λ) is the reduced characteristic function of g(x). As a part of this result we have the following theorem.

**Theorem 5.** *If g(λ) is a scalar polynomial and x a matrix whose distinct roots are λ_{1}, λ_{2},...., λ_{r}, the roots of the matrix g(x) are g(λ_{1}), g(λ_{2}),...., g(λ_{r}).*
If the roots g(λ_{i}) are all distinct, the principal idempotent elements of g(x) are the same as those of x; for the conditions (33) of §2.11, as applied to g(x), are satisfied by φ_{i} (i = 1, 2, ...., r), and these conditions were shown to characterize the principal idempotent elements completely.

### 2.13 The square root of a matrix.

Although the general question of functions of a matrix will not be taken up till a later chapter, it is convenient to give here one determination of the square root of a matrix x.

If α and β are scalars, α ≠ 0, and (α + β)^{1/2} is expanded formally in a Taylor series,

$(\alpha + \beta)^{\frac{1}{2}} = \alpha^{\frac{1}{2}} \sum \limits_0^\infty \delta_r (\frac{\beta}{\alpha})^r$

then, if $S_\nu = \alpha^{\frac{1}{2}} \sum \limits_0^{\nu-1} \delta_r (\frac{\beta}{\alpha})^r$, it follows that

(35) S^{2}_{ν} = α + β + αT_{ν}

where T_{ν} is a polynomial in β/α which contains no power of β/α lower than the νth. If a and b are commutative matrices and a is the square of a known non-singular matrix a^{1/2}, then (35) being an algebraic identity in α and β remains true when a and b are put in their place.

If x_{i} = λ_{i}φ_{i} + η_{i} is the matrix defined in §2.11 (32), then so long as λ_{i} ≠ 0, we may set α = λ_{i}φ_{i}, β = η_{i} since λ_{i}φ_{i} = (λ^{1/2}_{i}φ_{i})^{2}; and in this case the Taylor series terminates since η^{νi}_{i} = 0, that is, T_{νi} = 0 and the square of the terminating series for (λ_{i}φ_{i} + η_{i})^{1/2} in powers of η_{i} equals λ_{i}φ_{i} + η_{i}. It follows immediately from (32) and (27) that, if x is a matrix no one of whose roots is 0, the square of the matrix

$(36) x^{\frac{1}{2}} = \sum \limits_1^r \lambda_i^{\frac{1}{2}} [ \varphi_i + \frac{1}{2} \lambda_i^{-1} \eta_i - ... + (-1)^{(v_i - 2)} \frac{(2v_i - 4)!}{2^{(2v_i - 3)}(v_i - 2)!(v_i - 1)!}(\frac{\eta_i}{\lambda_i})^{v_i - 1} ]$

is x.

If the reduced equation of x has no multiple roots, (36) becomes

(37) x^{1/2} = ∑λ^{1/2}_{i}φ_{i}

and this is valid even if one of the roots is 0. If, however, 0 is a multiple root of the reduced equation, x may have no square root as, for example, the matrix $\begin{Vmatrix} 0 & 1 \\ 0 & 0 \end{Vmatrix}$.

Formula (36) gives 2^{r} determinations of x^{1/2} but we shall see later that an infinity of determinations is possible in certain cases.
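The case ν_i = 2 of (36) reads x^{1/2} = λ^{1/2}(φ + η/2λ) on a component λφ + η, and its square can be checked directly; a sketch with matrix values chosen for illustration:

```python
import numpy as np

# x = 4·1 + eta with eta nilpotent of index 2; the series (36) terminates:
# x^{1/2} = sqrt(4) (1 + eta/(2·4)), and squaring recovers x exactly
# since eta^2 = 0.
x = np.array([[4.0, 1.0], [0.0, 4.0]])
lam = 4.0
eta = x - lam * np.eye(2)
root = np.sqrt(lam) * (np.eye(2) + eta / (2 * lam))
assert np.allclose(root @ root, x)

# By contrast [[0,1],[0,0]] has no square root: y^2 = e_12 would give
# y^4 = 0, hence (y being 2x2 nilpotent) y^2 = 0, a contradiction.
```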

### 2.14 Reducible matrices.

If x = x_{1} + x_{2} is the direct sum of x_{1} and x_{2} and e_{1}, e_{2} are the corresponding idempotent elements, that is,

e_{i}x = x_{i} = xe_{i}, e_{i}e_{j} = 0 (i ≠ j; i, j = 1, 2),

then x^{r} = x^{r}_{1} + x^{r}_{2} (r ≥ 2) and we may set as before 1 = x^{0} = x^{0}_{1} + x^{0}_{2} = e_{1} + e_{2}. Hence, if ƒ(λ) = λ^{m} + b_{1}λ^{m-1} + .... + b_{m} is any scalar polynomial, we have

ƒ(x) = e_{1}ƒ(x_{1}) + e_{2}ƒ(x_{2}) = f(x_{1}) + f(x_{2}) - b_{m},

and if g(λ) is a second scalar polynomial

ƒ(x)g(x) = e_{1}f(x_{1})g(x_{1}) + e_{2}ƒ(x_{2})g(x_{2}).

Now if ƒ_{i}(λ) is the reduced characteristic function of x_{i} regarded as a matrix in the space determined by e_{i}, then the reduced characteristic function of x_{i} as a matrix in the original fundamental space is clearly λƒ_{i}(λ) unless λ is a factor of ƒ_{i}(λ) in which case it is simply ƒ_{i}(λ). Further the reduced characteristic function of x = x_{1} + x_{2} is clearly the least common multiple of ƒ_{1}(λ) and ƒ_{2}(λ); for if

ψ(λ) = ƒ_{1}(λ)g_{1}(λ) = ƒ_{2}(λ)g_{2}(λ)

then

ψ(x_{1} + x_{2}) = e_{1}ψ(x_{1}) + e_{2}ψ(x_{2})

= e_{1}ƒ_{1}(x_{1})g_{1}(x_{1}) + e_{2}ƒ_{2}(x_{2})g_{2}(x_{2}) = 0.
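The least-common-multiple statement can be checked on a small direct sum; a sketch with hand-chosen blocks:

```python
import numpy as np

# Direct sum of diag(1, 2) and diag(2, 3): the blocks have reduced
# functions (lam-1)(lam-2) and (lam-2)(lam-3), and their l.c.m.
# (lam-1)(lam-2)(lam-3) annihilates x although neither factor does.
x = np.diag([1.0, 2.0, 2.0, 3.0])
I = np.eye(4)
lcm = (x - I) @ (x - 2 * I) @ (x - 3 * I)
assert np.allclose(lcm, 0)
assert not np.allclose((x - I) @ (x - 2 * I), 0)
assert not np.allclose((x - 2 * I) @ (x - 3 * I), 0)
```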