Algebraic Operations with Matrices. The Characteristic equation

2.01 Identities.

The following elementary considerations enable us to carry over a number of results of ordinary scalar algebra into the algebra of matrices. Suppose ƒ(λ₁, λ₂,..., λ_r), g(λ₁, λ₂,..., λ_r) are integral algebraic functions of the scalar variables λ_i with scalar coefficients, and suppose that
ƒ(λ₁, λ₂,...., λ_r) = g(λ₁, λ₂,...., λ_r)
is an algebraic identity; then, when ƒ(λ₁,...., λ_r) — g(λ₁,...., λ_r) is reduced to the standard form of a polynomial, the coefficients of the various powers of the λ's are zero. In carrying out this reduction no properties of the λ's are used other than those which state that they obey the laws of scalar multiplication and addition: if then we replace λ₁, λ₂,..., λ_r by commutative matrices x₁, x₂,..., x_r, the reduction to the form 0 is still valid step by step and hence
ƒ(x₁, x₂,..., x_r) = g(x₁, x₂,..., x_r).

An elementary example of this is
         (1 - x²) = (1 - x)(1 + x)
or, when xy = yx,
         x² - y² = (x - y)(x + y).
Here, if xy ≠ yx, the reader should notice that the analogue of the algebraic identity becomes
         x² - y² = x(x + y) - (x + y)y,
which may also be written x² - y² = (x - y)(x + y) + (yx - xy).
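The non-commutative form of the identity can be checked numerically. The following is a minimal sketch in plain Python; the matrices `x` and `y` and the helper names are illustrative choices, not taken from the text.

```python
# Check x^2 - y^2 = (x - y)(x + y) + (yx - xy) for two non-commuting matrices.

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

x = [[1, 2], [3, 4]]   # illustrative values; xy != yx for this pair
y = [[0, 1], [1, 1]]

lhs = sub(matmul(x, x), matmul(y, y))          # x^2 - y^2
naive = matmul(sub(x, y), add(x, y))           # (x - y)(x + y)
commutator = sub(matmul(y, x), matmul(x, y))   # yx - xy
corrected = add(naive, commutator)
```

Here `lhs` agrees with `corrected` but not with `naive`, since the scalar identity is only valid for commutative matrices.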

2.02 Matric polynomials in a scalar variable.

By a matric polynomial in a scalar variable λ is meant a matrix that can be expressed in the form
(1) P(λ) = p₀λ^r + p₁λ^r-1 + ... + p_r (p₀ ≠ 0),
where p₀, p₁, ..., p_r are constant matrices. The coordinates of P(λ) are scalar polynomials in λ and hence, if
(2) Q(λ) = q₀λ^s + q₁λ^s-1 + ... + q_s (q₀ ≠ 0)
is also a matric polynomial, P(λ) = Q(λ) if, and only if, r = s and the coefficients of corresponding powers of λ are equal, that is, p_i = q_i (i = 0, 1, ..., r). If |q₀| ≠ 0, the degree of the product P(λ)Q(λ) (or Q(λ)P(λ)) is exactly r + s since the coefficient of the highest power λ^r+s which occurs in the product is p₀q₀ (or q₀p₀) which cannot be 0 if p₀ ≠ 0 and |q₀| ≠ 0. If, however, both |p₀| and |q₀| are 0, the degree of the product may well be less than r + s, as is seen from the examples
(e₁₁λ + 1)(e₂₂λ + 1) = e₁₁e₂₂λ² + (e₁₁+ e₂₂)λ + 1 = (e₁₁ + e₂₂)λ + 1,
$\begin{Vmatrix} \lambda & 1 \\ \lambda & 1 \end{Vmatrix} \ \ \begin{Vmatrix} 1 & -1 \\ -\lambda & \lambda \end{Vmatrix} = 0$

Another noteworthy difference between matric and scalar polynomials is that, when the determinant of a matric polynomial is a constant different from 0, its inverse is also a matric polynomial: for instance
(e₁₂λ + 1)^-1 = -e₁₂λ + 1,
[(e₁₂ + e₂₃)λ + 1]^-1 = e₁₃λ² - (e₁₂ + e₂₃)λ + 1.
We shall call such polynomials elementary polynomials.
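Both phenomena, the drop in degree and the polynomial inverse, can be verified by multiplying coefficient lists. The sketch below is only an illustration: it stores a matric polynomial as a list of coefficient matrices with the lowest power of λ first, and the helper names (`unit`, `poly_mul`, `trim`) are assumptions of this example.

```python
# Matric polynomials as coefficient lists, lowest power of lambda first.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def zeros(n):
    return [[0] * n for _ in range(n)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def unit(n, i, j):
    """Matrix unit e_ij (1-indexed) of order n."""
    m = zeros(n)
    m[i - 1][j - 1] = 1
    return m

def poly_mul(p, q):
    """Product of two matric polynomials, keeping the order of the factors."""
    n = len(p[0])
    out = [zeros(n) for _ in range(len(p) + len(q) - 1)]
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] = add(out[i + j], matmul(pi, qj))
    return out

def trim(p):
    """Drop vanishing top coefficients so that len(p) - 1 is the degree."""
    p = list(p)
    while len(p) > 1 and p[-1] == zeros(len(p[-1])):
        p.pop()
    return p

# (e11*lam + 1)(e22*lam + 1): both leading coefficients are singular.
drop = trim(poly_mul([identity(2), unit(2, 1, 1)],
                     [identity(2), unit(2, 2, 2)]))

# [(e12 + e23)*lam + 1] times e13*lam^2 - (e12 + e23)*lam + 1.
nil = add(unit(3, 1, 2), unit(3, 2, 3))
neg_nil = [[-v for v in row] for row in nil]
unit_product = trim(poly_mul([identity(3), nil],
                             [identity(3), neg_nil, unit(3, 1, 3)]))
```

The first product has degree 1 rather than 2, and the second reduces to the constant polynomial 1.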

2.03 The division transformation.

The greater part of the theory of the division transformation can be extended from ordinary algebra to the algebra of matrices; the main precaution that must be taken is that it must not be assumed that every element of the algebra has an inverse and that due allowance must be made for the peculiarities introduced by the lack of commutativity in multiplication.

Theorem 1. If P(λ) and Q(λ) are the polynomials defined by (1) and (2), and if |q₀| ≠ 0, there exist unique polynomials S(λ), R(λ), S₁(λ), R₁(λ), of which S and S₁, if not zero, are of degree r - s and the degrees of R and R₁ are s - 1 at most, such that
P(λ) ≡ S(λ)Q(λ) + R(λ) ≡ Q(λ)S₁(λ) + R₁(λ).

If r < s, we may take S₁ = S = 0 and R₁ = R = P; in so far as the existence of these polynomials is concerned the theorem is therefore true in this case. We shall now assume as a basis for a proof by induction that the theorem is true for polynomials of degree less than r and that r ≥ s. Since |q₀| ≠ 0, q₀^-1 exists and, as in ordinary scalar division, we have
P(λ) - p₀q₀^-1λ^r-sQ(λ) = (p₁ - p₀q₀^-1q₁)λ^r-1 + ... = P₁(λ).
Since the degree of P₁ is less than r, we have by hypothesis P₁(λ) = P₂(λ)Q(λ) + R(λ), the degrees of P₂ and R being less, respectively, than r - s and s; hence
P(λ) = (p₀q₀^-1λ^r-s + P₂(λ))Q(λ) + R(λ) = S(λ)Q(λ) + R(λ)
as required by the theorem. The existence of the right hand quotient and remainder follows in the same way.

It remains to prove the uniqueness of S and R. Suppose, if possible, that P = SQ + R = TQ + U where R and S are as above and T, U are polynomials the degree of U being less than s; then (S — T)Q = U — R. If S — T ≠ 0, then, since |q₀| ≠ 0, the degree of the polynomial (S — T)Q is at least as great as that of Q and is therefore greater than the degree of U — R. It follows immediately that S — T = 0, and hence also U — R = 0; which completes the proof of the theorem.
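The inductive step of the existence proof is in effect an algorithm for division on the right. The following sketch, restricted to matrices of order 2 and using exact rational arithmetic, is one possible rendering of it; the function names and the sample polynomials are assumptions made for illustration only. Coefficient lists are written with the highest power of λ first.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv2(m):
    """Inverse of a non-singular matrix of order 2 via its adjoint."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def right_divide(p, q):
    """Return (S, R) with P = S*Q + R, assuming |q0| != 0."""
    q0_inv = inv2(q[0])
    rem, quot = list(p), []
    while len(rem) >= len(q):
        c = matmul(rem[0], q0_inv)      # next coefficient of the quotient
        quot.append(c)
        for k, qk in enumerate(q):      # subtract c * lambda^(...) * Q
            rem[k] = sub(rem[k], matmul(c, qk))
        rem.pop(0)                      # the leading term has been cancelled
    return quot, rem

def recombine(s, q, r):
    """Coefficients of S*Q + R (assumes S is not the zero polynomial)."""
    n = len(q[0])
    out = [[[0] * n for _ in range(n)] for _ in range(len(s) + len(q) - 1)]
    for i, si in enumerate(s):
        for j, qj in enumerate(q):
            out[i + j] = add(out[i + j], matmul(si, qj))
    offset = len(out) - len(r)
    for k, rk in enumerate(r):
        out[offset + k] = add(out[offset + k], rk)
    return out

# Illustrative data: P of degree 2, Q of degree 1 with |q0| = 1.
P = [[[1, 2], [0, 1]], [[0, 1], [1, 0]], [[3, 0], [0, 3]]]
Q = [[[2, 1], [1, 1]], [[1, 0], [2, 1]]]
S, R = right_divide(P, Q)
```

Division on the left follows by multiplying with `q0_inv` and the coefficients of Q on the other side.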

If Q is a scalar polynomial, that is, if its coefficients q are scalars, then S = S₁, R = R₁ and, if the division is exact, then Q(λ) is a factor of each of the coordinates of P(λ).

Theorem 2. If the matric polynomial (1) is divided on the right by λ — a, the remainder is
p₀a^r + p₁a^r-1 + .... + p_r and, if it is divided on the left, the remainder is
a^rp₀ + a^r-1p₁ + .... + p_r.

As in ordinary algebra the proof follows immediately from the identity
λ^s - a^s = (λ - a)(λ^s-1 + λ^s-2a + .... + a^s-1)
in which the order of the factors is immaterial since λ is a scalar.

If P(λ) is a scalar polynomial, the right and left remainders are the same and are conveniently denoted by P(a).
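That the two remainders of Theorem 2 may differ when the coefficients are matrices is easily seen on a small case. The sketch below evaluates both by Horner's scheme; the polynomial P(λ) = λ² + e₁₂λ and the matrix a = e₂₁ are illustrative choices, and the function names are assumptions of this example.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def right_remainder(coeffs, a):
    """p0*a^r + p1*a^(r-1) + ... + pr, coefficients highest power first."""
    acc = coeffs[0]
    for p in coeffs[1:]:
        acc = add(matmul(acc, a), p)    # (...)*a + p keeps a on the right
    return acc

def left_remainder(coeffs, a):
    """a^r*p0 + a^(r-1)*p1 + ... + pr."""
    acc = coeffs[0]
    for p in coeffs[1:]:
        acc = add(matmul(a, acc), p)    # a*(...) + p keeps a on the left
    return acc

identity = [[1, 0], [0, 1]]
e12 = [[0, 1], [0, 0]]
e21 = [[0, 0], [1, 0]]
zero = [[0, 0], [0, 0]]

P = [identity, e12, zero]               # P(lam) = lam^2 + e12*lam
right = right_remainder(P, e21)         # a^2 + e12*a = e11
left = left_remainder(P, e21)           # a^2 + a*e12 = e22
```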

2.04

Theorem 1 of the preceding section holds true as regards the existence of S, S₁, R, R₁ and the degree of R, R₁ even when |q₀| = 0 provided |Q(λ)| ≠ 0. Suppose the rank of q₀ is t < n; then by §1.10 it has the form $\sum\limits_{1}^{t} \alpha_i S \beta_i$ or, say, $h(\sum\limits_1^t e_{ii})k$ where h and k are non-singular matrices for which he_i = α_i, k'e_i = β_i (i = 1, 2, ...., t). If $c_1 = \sum \limits_{t+1}^n e_{ii}$, then
(3) Q₁ = (c₁λ + 1)h^-1Q
is a polynomial whose degree is not higher than the degree s of Q since c₁h^-1q₀ = 0 so that the term in λ^s+1 is absent. Now, if η = |h^-1|, then
|Q₁| = |c₁λ + 1||h^-1||Q| = (1 + λ)^n-tη|Q|,
so that the degree of |Q₁| is greater than that of |Q| by n - t. If the leading coefficient of Q₁ is singular, this process may be repeated, and so on, giving Q₁, Q₂,...., where the degree of |Q_i| is greater than that of |Q_i-1|. But the degree of each Q_i is less than or equal to s and the degree of the determinant of a polynomial of the sth degree cannot exceed ns. Hence at some stage the leading coefficient of, say, Q_j is not singular and, from the law of formation (3) of the successive Q's, we have Q_j(λ) = H(λ)Q(λ), where H(λ) is a matric polynomial.

By Theorem 1, Q_j taking the place of Q, we can find S* and R, the latter of degree s — 1 at most, such that
P(λ) = S*(λ)H(λ)Q(λ) + R(λ) = S(λ)Q(λ) + R(λ).
The theorem is therefore true even if |q₀| = 0 except that the quotient and remainder are not necessarily unique and the degree of S may be greater than r — s, as is shown by taking P = λ² — 1, Q = e₁₁λ + 1, when we have
P = (e₂₂λ² + e₁₁λ - 1)Q = (e₂₂λ² + e₁₁λ - 1 + e₁₂)Q - e₁₂

2.05 The characteristic equation.

If x is a matrix, the scalar polynomial
(4)         ƒ(λ) = |λ - x| = λⁿ + a₁λ^n-1 + ... + a_n
is called the characteristic function corresponding to x. We have already seen (§1.05 (15)) that the product of a matrix and its adjoint equals its determinant; hence
         (λ - x) adj (λ - x) = |λ - x| = ƒ(λ).
It follows that the polynomial ƒ(λ) is exactly divisible by λ - x so that by the remainder theorem (§2.03, Theorem 2)
(5)              ƒ(x) = 0.

As a simple example of this we may take $x = \begin{Vmatrix} \alpha & \beta \\ \gamma & \delta \end{Vmatrix}$. Here
ƒ(λ) = (λ - α)(λ - δ) - βγ = λ² - (α + δ)λ + αδ - βγ
and
$f(x) = \begin{Vmatrix} \alpha^2 + \beta \gamma & \alpha \beta + \beta \delta \\ \gamma \alpha + \delta \gamma & \gamma \beta + \delta^2 \end{Vmatrix} - (\alpha + \delta) \begin{Vmatrix} \alpha & \beta \\ \gamma & \delta \end{Vmatrix} + (\alpha\delta - \beta\gamma) \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix} = 0$

The following theorem is an important extension of this result.

Theorem 3. If ƒ(λ) = |λ - x| and θ(λ) is the highest common factor of the first minors of |λ - x|, and if
(6)         φ(λ) = ƒ(λ)/θ(λ),
the leading coefficient of θ(λ) being 1 (and therefore also that of φ(λ)), then
     (i) φ(x) = 0;
     (ii) if ψ(λ) is any scalar polynomial such that ψ(x) = 0, then φ(λ) is a factor of ψ(λ), that is, φ(λ) is the scalar polynomial of lowest degree and with leading coefficient 1 such that φ(x) = 0;
     (iii) every root of ƒ(λ) is a root of φ(λ).

The coordinates of adj(λ — x) are the first minors of |λ — x| and therefore by hypothesis [adj(λ — x)]/θ(λ) is integral; also
$\frac{adj(\lambda - x)}{\theta(\lambda)} \ (\lambda - x) = \frac{f(\lambda)}{\theta(\lambda)} = \phi(\lambda);$
hence φ(x) = 0 by the remainder theorem.

If ψ(λ) is any scalar polynomial for which ψ(x) = 0, we can find scalar polynomials M(λ), N(λ) such that M(λ)ψ(λ) + N(λ)φ(λ) = ζ(λ), where ζ(λ) is the highest common factor of ψ and φ. Substituting x for λ in this scalar identity and using φ(x) = 0 = ψ(x) we have ζ(x) = 0; if, therefore, ψ(x) = 0 is a scalar equation of lowest degree satisfied by x, we must have ψ(λ) = ζ(λ), apart from a constant factor, so that ψ(λ) is a factor of φ(λ), say
(7) φ(λ) = h(λ)ψ(λ).
Since ψ(x) = 0, λ — x is a factor of ψ(λ), say ψ(λ) = (λ — x)g(λ), where g is a matric polynomial; hence

$\psi(\lambda) =\frac{\phi(\lambda)}{h(\lambda)} = \frac{f(\lambda)}{h(\lambda)\theta(\lambda)} = (\lambda - x)g(\lambda)$
Hence
$g(\lambda) = \frac{f(\lambda)}{\theta(\lambda) h(\lambda) (\lambda - x)} = \frac{adj(\lambda - x)}{\theta(\lambda)h(\lambda)}$
and this cannot be integral unless h(λ) is a constant in view of the fact that θ(λ) is the highest common factor of the coordinates of adj(λ — x); it follows that ψ(λ) differs from φ(λ) by at most a constant factor.

A repetition of the first part of this argument shows that, if ψ(x) = 0 is any scalar equation satisfied by x, then φ(λ) is a factor of ψ(λ).

It remains to show that every root of ƒ(λ) is a root of φ(λ). If λ₁ is any root of ƒ(λ) = |λ — x|, then from φ(λ) = g(λ)(λ — x) we have
φ(λ₁) = g(λ₁)(λ₁ - x)
so that the determinant, [φ(λ₁)]ⁿ, of the scalar matrix φ(λ₁) equals |g(λ₁)| |λ₁ - x|, which vanishes since |λ₁ - x| = ƒ(λ₁) = 0. This is only possible if φ(λ₁) = 0, that is, if every root of ƒ(λ) is also a root of φ(λ).

The roots of ƒ(λ) are also called the roots of x, φ(λ) is called the reduced characteristic function of x, and φ(x) = 0 the reduced equation of x.
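The distinction between the characteristic and the reduced characteristic function may be illustrated on two matrices of order 3 which both have ƒ(λ) = (λ - 1)²(λ - 2). For the diagonal matrix the first minors have the common factor θ(λ) = λ - 1, while for the second matrix θ(λ) = 1. The sketch below only checks which scalar equations are satisfied; the two matrices and the helper `poly_at` are choices made for this example.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at(coeffs, x):
    """Value at the matrix x of a scalar polynomial (highest power first)."""
    n = len(x)
    acc = [[0] * n for _ in range(n)]
    for c in coeffs:
        acc = matmul(acc, x)
        acc = [[acc[i][j] + (c if i == j else 0) for j in range(n)]
               for i in range(n)]
    return acc

zero = [[0] * 3 for _ in range(3)]
x_diag = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
x_jordan = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]

char = [1, -4, 5, -2]   # (lam - 1)^2 (lam - 2) = lam^3 - 4 lam^2 + 5 lam - 2
quad = [1, -3, 2]       # (lam - 1)(lam - 2)   = lam^2 - 3 lam + 2

char_diag = poly_at(char, x_diag)       # zero for both matrices, by (5)
char_jordan = poly_at(char, x_jordan)
quad_diag = poly_at(quad, x_diag)       # zero: phi has degree 2 here
quad_jordan = poly_at(quad, x_jordan)   # not zero: phi = f for this matrix
```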

2.06

A few simple results are conveniently given at this point although they are for the most part merely particular cases of later theorems. If g(λ) is a scalar polynomial, then on dividing by φ(λ), whose degree we shall denote by ν, we may set g(λ) = q(λ)φ(λ) + r(λ), where q and r are polynomials the degree of r being less than ν. Replacing λ by x in this identity and remembering that φ(x) = 0, we have g(x) = r(x), that is, any polynomial can be replaced by an equivalent polynomial of degree less than ν.

If g(λ) is a scalar polynomial which is a factor of φ(λ), say φ(λ) = h(λ)g(λ), then 0 = φ(x) = h(x)g(x). It follows that |g(x)| = 0; for if this were not so, we should have h(x) = [g(x)]^-1φ(x) = 0, whereas x can satisfy no scalar equation of lower degree than φ. Hence, if g(λ) is a scalar polynomial which has a factor in common with φ(λ), then g(x) is singular.

If a scalar polynomial g(λ) has no factor in common with φ(λ), there exist scalar polynomials M(λ), N(λ) such that M(λ)g(λ) + N(λ)φ(λ) ≡ 1. Hence M(x)g(x) = 1, or [g(x)]^-1 = M(x). It follows immediately that any finite rational function of x with scalar coefficients can be expressed as a scalar polynomial in x of degree ν - 1 at most. It should be noticed carefully however that, if x is a variable matrix, the coefficients of the reduced polynomial will in general contain the variable coordinates of x and will not be integral in these unless the original function is integral. It follows also that g(x) is singular only when g(λ) has a factor in common with φ(λ).

Finally we may notice here that similar matrices have the same reduced equation; for, if g is a scalar polynomial, g(y^-1xy) = y^-1g(x)y. As a particular case of this we have that xy and yx have the same reduced equation if, say, y is non-singular; for xy = y^-1•yx•y. If both x and y are singular, it can be shown that xy and yx have the same characteristic equation, but not necessarily the same reduced equation as is seen from the example x = e₁₂, y = e₂₂.

2.07 Matrices with distinct roots.

Because of its importance and comparative simplicity we shall investigate the form of a matrix all of whose roots are different before considering the general case. Let

(8)          ƒ(λ) = |λ - x| = (λ - λ₁)(λ - λ₂) .... (λ - λ_n)
where no two roots are equal and set
$(9) \ \ \ \ \ \ \ f_i (\lambda) = \frac{(\lambda - \lambda_1) ... (\lambda - \lambda_{i -1})(\lambda - \lambda_{i + 1}) ...(\lambda - \lambda_n)}{(\lambda_i - \lambda_1) ... (\lambda_i - \lambda_{i -1})(\lambda_i - \lambda_{i + 1}) ...(\lambda_i - \lambda_n)} = \frac{f(\lambda)}{(\lambda - \lambda_i) f'(\lambda_i)}$
By the Lagrange interpolation formula $\sum \limits_i f_i (\lambda) = 1;$ hence
(10)          ƒ₁(x) + ƒ₂(x) + .... + ƒ_n(x) = 1.
Further, ƒ(λ) is a factor of ƒ_i(λ)ƒ_j(λ) (i ≠ j) so that
(11)          ƒ_i(x)ƒ_j(x) = 0      (i ≠ j);
hence multiplying (10) by ƒ_i(x) and using (11) we have
(12)          [ƒ_i(x)]² = ƒ_i(x).
Again, (λ - λ_i)ƒ_i(λ) = ƒ(λ)/ƒ'(λ_i); hence (x - λ_i)ƒ_i(x) = 0, that is,
(13)          xƒ_i(x) = λ_iƒ_i(x),
whence, summing with regard to i and using (10), we have
(14)          x = λ₁ƒ₁(x) + λ₂ƒ₂(x) + ....... + λ_nƒ_n(x).

If we form x^r from (14), r being a positive integer, it is immediately seen from (11) and (12), or from the Lagrange interpolation formula, that
(15)          x^r = λ₁^rƒ₁ + λ₂^rƒ₂ + .... + λ_n^rƒ_n,
where ƒ_i stands for ƒ_i(x), and it is easily verified by actual multiplication that, if no root is 0,
         x^-1 = λ₁^-1ƒ₁ + λ₂^-1ƒ₂ + .... + λ_n^-1ƒ_n
so that (15) holds for negative powers also. The matrices ƒ_i are linearly independent. For if ∑γ_iƒ_i = 0, then
         0 = ƒ_j∑γ_iƒ_i = γ_jƒ_j² = γ_jƒ_j
whence every γ_j = 0 seeing that in the case we are considering ƒ(λ) is itself the reduced characteristic function so that ƒ_j(x) ≠ 0.

From these results we have that, if g(λ) is any scalar rational function whose denominator has no factor in common with φ(λ), then
(16) g(x) = g(λ₁)ƒ₁ + g(λ₂)ƒ₂ + .... + g(λ_n)ƒ_n.
It follows from this that the roots of g(x) are g(λ_i) (i = 1, 2,....,n). For setting y = g(x), μ_i = g(λ_i), we have as above
ψ(y) = ∑ψ(μ_i)ƒ_i,
ψ(λ) being a scalar polynomial. Now ψ(y)ƒ_i = ψ(μ_i)ƒ_i; hence, if ψ(y) = 0, then also ψ(μ_i) = 0 (i = 1, 2,....., n); and conversely. Hence if the notation is so chosen that μ₁, μ₂,...., μ_r are the distinct values of μ_i, the reduced characteristic function of y = g(x) is $\prod \limits_1^r (\lambda - \mu_i)$.
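Relations (10) to (15) lend themselves to a direct numerical check. The sketch below forms the matrices ƒ_i(x) from the product in (9) for a triangular matrix whose roots 1, 2, 4 can be read off its diagonal; the matrix, the use of exact fractions and the helper names are assumptions of this illustration.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, m):
    return [[c * v for v in row] for row in m]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def lagrange_idempotents(x, roots):
    """f_i(x) = prod over j != i of (x - lam_j) / (lam_i - lam_j)."""
    n = len(x)
    result = []
    for i, li in enumerate(roots):
        f = identity(n)
        for j, lj in enumerate(roots):
            if j != i:
                shifted = add(x, scale(-lj, identity(n)))   # x - lam_j
                f = matmul(f, scale(Fraction(1, li - lj), shifted))
        result.append(f)
    return result

x = [[1, 1, 0], [0, 2, 1], [0, 0, 4]]   # triangular, so the roots are 1, 2, 4
roots = [1, 2, 4]
fs = lagrange_idempotents(x, roots)

zero = [[0] * 3 for _ in range(3)]
total = zero
spectral_cube = zero
for lam, f in zip(roots, fs):
    total = add(total, f)                                    # equation (10)
    spectral_cube = add(spectral_cube, scale(lam ** 3, f))   # (15), r = 3
x_cubed = matmul(x, matmul(x, x))
```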

2.08

If the determinant |λ - x| = ƒ(λ) is expanded in powers of λ, it is easily seen that the coefficient a_r of λ^n-r is (-1)^r times the sum of the principal minors of x of order r; this coefficient is therefore a homogeneous polynomial of degree r in the coordinates of x. In particular, -a₁ is the sum of the coordinates in the main diagonal: this sum is called the trace of x and is denoted by tr x.
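As an illustration, the coefficients a_r may be computed from the principal minors and the characteristic equation ƒ(x) = 0 of §2.05 then verified. The following sketch uses a cofactor expansion for the determinants, which is adequate only for small orders; the sample matrix and the function names are assumptions of this example.

```python
from itertools import combinations

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def char_coeffs(x):
    """[1, a_1, ..., a_n] with a_r = (-1)^r * (sum of principal minors of order r)."""
    n = len(x)
    coeffs = [1]
    for r in range(1, n + 1):
        total = sum(det([[x[i][j] for j in idx] for i in idx])
                    for idx in combinations(range(n), r))
        coeffs.append((-1) ** r * total)
    return coeffs

def poly_at(coeffs, x):
    """Value at the matrix x of a scalar polynomial (highest power first)."""
    n = len(x)
    acc = [[0] * n for _ in range(n)]
    for c in coeffs:
        acc = matmul(acc, x)
        acc = [[acc[i][j] + (c if i == j else 0) for j in range(n)]
               for i in range(n)]
    return acc

x = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]   # illustrative matrix
coeffs = char_coeffs(x)
trace = sum(x[i][i] for i in range(3))
f_of_x = poly_at(coeffs, x)
```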

If y is an arbitrary matrix, μ a scalar variable, and z = x + μy, the coefficients of the characteristic equation of z, say
(17)          zⁿ + b₁z^n-1 + .... + b_n = 0,
are polynomials in μ of the form
(18)          b_s = a_s0 + μa_s1 + .... + μ^sa_ss,      (a_s0 = a_s, a₀₀ = 1)
and the powers of z are also polynomials in μ, say
$(19) \ \ \ \ \ \ \ \ z^r = x^r + \mu \begin{Bmatrix} x & y \\ r-1 & 1 \end{Bmatrix} + \mu^2 \begin{Bmatrix} x & y \\ r-2 & 2 \end{Bmatrix} + ... + \mu^r y^r$
where $\begin{Bmatrix} x & y \\ s & t \end{Bmatrix}$ is obtained by multiplying s x's and t y's together in every possible way and adding the terms so obtained, e.g.,
         $\begin{Bmatrix} x & y \\ 2 & 1 \end{Bmatrix} = x^2 y + xyx + yx^2$
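The bracket symbol satisfies a simple recurrence: every product of s x's and t y's begins either with x or with y. The sketch below uses this recurrence (an assumption of the illustration rather than a formula of the text) to compute the symbol and to check the expansion (19) for r = 3; the matrices and the value of μ are arbitrary.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, m):
    return [[c * v for v in row] for row in m]

def bracket(x, y, s, t):
    """Sum of all products containing s factors x and t factors y."""
    n = len(x)
    if s == 0 and t == 0:
        return [[int(i == j) for j in range(n)] for i in range(n)]
    total = [[0] * n for _ in range(n)]
    if s > 0:   # products whose first factor is x
        total = add(total, matmul(x, bracket(x, y, s - 1, t)))
    if t > 0:   # products whose first factor is y
        total = add(total, matmul(y, bracket(x, y, s, t - 1)))
    return total

x = [[1, 2], [0, 1]]
y = [[0, 1], [1, 0]]
mu, r = 2, 3

z = add(x, scale(mu, y))
z_cubed = matmul(z, matmul(z, z))

expansion = [[0, 0], [0, 0]]
for t in range(r + 1):
    expansion = add(expansion, scale(mu ** t, bracket(x, y, r - t, t)))

xx = matmul(x, x)
by_hand = add(add(matmul(xx, y), matmul(x, matmul(y, x))), matmul(y, xx))
```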

If we substitute (18) and (19) in (17) and arrange according to powers of μ, then, since μ is an independent variable, the coefficients of its several powers must be zero. This leads to a series of relations connecting x and y of the form
$(20) \ \ \ \ \ \ \ \ \ \sum \limits_{i,j} a_{ij} \begin{Bmatrix} x & y \\ n - s - i + j & s - j \end{Bmatrix} = 0$ (s = 0,1,2,...)
where a_ij are the coefficients defined in (18) and $\begin{Bmatrix} x & y \\ n - s - i + j & s - j \end{Bmatrix}$ is replaced by 0 when j > s. In particular, if s = 1,
$\begin{Bmatrix} x & y \\ n - 1 & 1 \end{Bmatrix} + a_1 \begin{Bmatrix} x & y \\ n - 2 & 1 \end{Bmatrix} + ... + a_{n-1}y + a_{11}x^{n-1} + ... + a_{n1} = 0$
which, when xy = yx, becomes
ƒ'(x)y = -(a₁₁x^n-1 + .... + a_n1) = g(x).
When x has no repeated roots, ƒ'(λ) has no root in common with ƒ(λ) and ƒ'(x) has an inverse (cf. §2.06) so that y = g(x)/ƒ'(x) which can be expressed as a scalar polynomial in x; and conversely every such polynomial is commutative with x. We therefore have the following theorem:

Theorem 4. If x has no multiple roots, the only matrices commutative with it are scalar polynomials in x.

2.09 Matrices with multiple roots.

We shall now extend the main results of §2.07 to matrices whose roots are not necessarily simple. Suppose in the first place that x has only one distinct root and that its reduced characteristic function is φ(λ) = (λ — λ₁)^ν, and set
η₁^i = η_i = (x - λ₁)^i = (x - λ₁)η_i-1      (i = 1, 2,...., ν - 1);
then
η₁^ν = 0,      xη_ν-1 = λ₁η_ν-1,      xη_i = λ₁η_i + η_i+1      (i = 1, 2,...., ν - 2)
and      x^s = (λ₁ + η₁)^s = λ₁^s + sλ₁^s-1η₁ + ${s \choose 2}\lambda_1^{s-2}\eta_1^2$ + ....
where the binomial expansion is cut short with the term η₁^ν-1 since η₁^ν = 0. Again, if g(λ) is any scalar polynomial, then
     $g(x) = g(\lambda_1 + \eta_1) = g(\lambda_1) + g'(\lambda_1) \eta_1 + ... + \frac{g^{(\nu-1)}(\lambda_1) }{(\nu-1)!} \eta_1^{\nu-1}$
It follows immediately that, if g^(s)(λ) is the first derivative of g(λ) which is not 0 when λ = λ₁ and (k - 1)s < ν ≤ ks, then the reduced equation of g(x) is
         [g(x) - g(λ₁)]^k = 0.

It should be noted that the first ν — 1 powers of η₁ are linearly independent since φ(λ) is the reduced characteristic function of x.
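The terminating Taylor expansion of g(x) can be compared with a direct evaluation. In the sketch below x is taken, by way of illustration, to be a matrix of order 3 with the single root λ₁ = 2 and ν = 3, and g(λ) = λ⁴ - λ + 3; these choices and the helper names are assumptions of the example.

```python
from fractions import Fraction
from math import factorial

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, m):
    return [[c * v for v in row] for row in m]

def poly_eval(coeffs, t):
    """Scalar Horner evaluation, coefficients highest power first."""
    acc = 0
    for c in coeffs:
        acc = acc * t + c
    return acc

def poly_deriv(coeffs):
    n = len(coeffs) - 1
    return [c * (n - k) for k, c in enumerate(coeffs[:-1])]

def poly_at(coeffs, x):
    n = len(x)
    acc = [[0] * n for _ in range(n)]
    for c in coeffs:
        acc = matmul(acc, x)
        acc = [[acc[i][j] + (c if i == j else 0) for j in range(n)]
               for i in range(n)]
    return acc

lam1, nu = 2, 3
x = [[2, 1, 0], [0, 2, 1], [0, 0, 2]]
eta = [[x[i][j] - (lam1 if i == j else 0) for j in range(3)] for i in range(3)]
g = [1, 0, 0, -1, 3]                     # g(lam) = lam^4 - lam + 3

direct = poly_at(g, x)

taylor = [[0] * 3 for _ in range(3)]
power = [[int(i == j) for j in range(3)] for i in range(3)]   # eta^0
deriv = g
for k in range(nu):                      # terms up to eta^(nu - 1)
    coeff = Fraction(poly_eval(deriv, lam1), factorial(k))
    taylor = add(taylor, scale(coeff, power))
    power = matmul(power, eta)
    deriv = poly_deriv(deriv)
```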

2.10

We shall now suppose that x has more than one root. Let the reduced characteristic function be
$(21) \ \ \ \ \ \ \varphi(\lambda) = \prod \limits_{i=1}^r (\lambda - \lambda_i)^{v_i} \ \ \ \ \ (\sum v_i = v, \ r > 1)$
and set
(22)          h_i(λ) = φ(λ)/(λ - λ_i)^ν_i.
We can determine two scalar polynomials, M_i(λ) and N_i(λ), of degrees not exceeding ν_i — 1 and ν — ν_i - 1, respectively, such that
M_i(λ)h_i(λ) + (λ - λ_i)^ν_iN_i(λ) ≡ 1,      M_i(λ_i) ≠ 0.
If we set
(23)          φ_i(λ) = M_i(λ)h_i(λ),
then 1 — ∑φ_i(λ) is exactly divisible by φ(λ) and, being of degree ν — 1 at most, must be identically 0; hence
(24)          $\sum \limits_1^r \varphi_i (\lambda) = 1$
Again, from (22) and (23), φ(λ) is a factor of φ_i(λ)φ_j(λ) (i ≠ j) and hence on multiplying (24) by φ_i(λ) we have
(25)      [φ_i(λ)]² ≡ φ_i(λ),      φ_i(λ)φ_j(λ) ≡ 0, mod φ(λ)      (i ≠ j).
Further, if g(λ) is a scalar polynomial, then
$(26) \ \ \ \ \ \ \ \ g(\lambda) = \sum \limits_1^r g(\lambda) \varphi_i (\lambda)$
       $= \sum \limits_1^r [g(\lambda_i) + g'(\lambda_i)(\lambda - \lambda_i) + ... + \frac{g^{(v_i-1)}(\lambda_i)}{(v_i-1)!}(\lambda - \lambda_i)^{(v_i-1)}] \varphi_i(\lambda) + R$
where R has the form ∑C_i(λ)(λ - λ_i)^ν_iφ_i(λ), C_i being a polynomial, so that R vanishes when x is substituted for λ.

2.11

If we put x for λ in (23) and set φ_i for φ_i(x), then (24) and (25) show that
$(27) \ \ \ \ \ \ \ \ \varphi_i^2 = \varphi_i, \ \varphi_i \varphi_j = 0 \ \ (i \neq j), \ \ \sum \limits_1^r \varphi_i = 1$
It follows as in §2.07 that the matrices φ_i are linearly independent and none is zero, since φ_i(λ_i) ≠ 0 so that φ(λ) is not a factor of φ_i(λ), which would be the case were φ_i(x) = 0. We now put x for λ in (26) and set
(28)          η_i = (x - λ_i)φ_i      (i = 1, 2,...., r).
Since the ν_ith power of (λ - λ_i)φ_i(λ) is the first which has φ(λ) as a factor, η_i is a nilpotent matrix of index ν_i (cf. §1.05) and, remembering that φ_i² = φ_i, we have
(29)          η_i^j = (x - λ_i)^jφ_i ≠ 0 (j < ν_i),      η_iφ_i = η_i = φ_iη_i,
(30)          xφ_i = λ_iφ_i + η_i,      xη_i^j = λ_iη_i^j + η_i^j+1
equation (26) therefore becomes
(31)      $g(x) = \sum \limits_1^r [g(\lambda_i)\varphi_i + g'(\lambda_i) \eta_i + ... + \frac{g^{(v_i-1)}(\lambda_i) }{(v_i-1)!} \eta_i^{(v_i-1)}]$
and in particular
$(32) \ \ \ \ \ \ x = \sum \limits_1^r (\lambda_i \varphi_i + \eta_i) = \sum x_i$
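A small worked case may make the construction concrete. For the matrix x used below the reduced characteristic function is φ(λ) = (λ - 1)²(λ - 2), so that h₁(λ) = λ - 2 and h₂(λ) = (λ - 1)²; solving the identity for M_i by hand gives M₁(λ) = -λ and M₂(λ) = 1, whence φ₁(λ) = -λ(λ - 2) and φ₂(λ) = (λ - 1)². The sketch checks (27), (28), (31) with g(λ) = λ³, and (32); the example and the helper names are assumptions of this illustration.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, m):
    return [[c * v for v in row] for row in m]

def shift(x, lam):
    """The matrix x - lam."""
    return [[x[i][j] - (lam if i == j else 0) for j in range(len(x))]
            for i in range(len(x))]

one = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
zero = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
x = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]      # roots 1 (nu_1 = 2) and 2 (nu_2 = 1)

phi1 = scale(-1, matmul(x, shift(x, 2)))   # phi_1(x) = -x(x - 2)
phi2 = matmul(shift(x, 1), shift(x, 1))    # phi_2(x) = (x - 1)^2

eta1 = matmul(shift(x, 1), phi1)           # (28): eta_i = (x - lam_i) phi_i
eta2 = matmul(shift(x, 2), phi2)

# (32): x = sum of lam_i*phi_i + eta_i
rebuilt = add(add(phi1, eta1), add(scale(2, phi2), eta2))

# (31) with g(lam) = lam^3: g(1) = 1, g'(1) = 3, g(2) = 8
g_of_x = add(add(phi1, scale(3, eta1)), scale(8, phi2))
x_cubed = matmul(x, matmul(x, x))
```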

The matrices φ_i and η_i are called the principal idempotent and nilpotent elements of x corresponding to the root λ_i. The matrices φ_i are uniquely determined by the following conditions: if ψ_i (i = 1, 2,...., r) are any matrices such that
       (i) xψ_i = ψ_ix,
(33)       (ii) (x - λ_i)ψ_i is nilpotent,
      (iii) $\sum \limits_i \psi_i = 1, \ \ \ \psi_i^2 = \psi_i \neq 0$,
then ψ_i = φ_i (i = 1, 2,...., r). For let θ_ij = φ_iψ_j; from (i) θ_ij also equals ψ_jφ_i. From (ii) and (28)
         η_i = xφ_i — λ_iφ_i,      ξ_j = xψ_j - λ_jψ_j
are both nilpotent and, since φ_i and η_i are polynomials in x, they are commutative with ψ_j and therefore with ξ_j; also
     xθ_ij = λ_iθ_ij + (x — λ_i)φ_iψ_j = λ_iθ_ij + η_iψ_j
         = λ_jθ_ij + (x - λ_j)φ_iψ_j = λ_jθ_ij + ξ_jφ_i.
Hence (λ_i - λ_j)θ_ij = ξ_jφ_i - η_iψ_j. But if μ is the greater of the indices of ξ_j and η_i then, since all the matrices concerned are commutative, each term of (ξ_jφ_i - η_iψ_j)^2μ contains ξ_j^μ or η_i^μ as a factor and is therefore 0. If θ_ij ≠ 0, this is impossible when i ≠ j since θ_ij is idempotent and λ_i - λ_j ≠ 0. Hence φ_iψ_j = 0 when i ≠ j and from (iii)
         ψ_j = ψ_j∑φ_i = ψ_jφ_j = φ_j∑ψ_i = φ_j
which proves the uniqueness of the φ's.

2.12

We shall now determine the reduced equation of g(x). If we set g_i for g(x)φ_i, then
$(34) \ \ \ \ \ \ \ \ g_i = g(\lambda_i)\varphi_i + g'(\lambda_i) \eta_i + ... + \frac{g^{(v_i-1)}(\lambda_i) }{(v_i-1)!} \eta_i^{(v_i-1)}$
= g(λ_i)φ_i + ζ_i,
say, and if s_i is the order of the first derivative in (34) which is not 0, then ζ_i is a nilpotent matrix whose index k_i is given by k_i - 1 < ν_i/s_i ≤ k_i.

If Φ(λ) is a scalar polynomial, and γ_i = g(λ_i),
$\Phi(g(x)) = \sum \Phi(g_i)\varphi_i = \sum [\Phi(\gamma_i)\varphi_i + \Phi'(\gamma_i)\zeta_i + ... + \frac{\Phi^{(k_i-1)}(\gamma_i) }{(k_i-1)!} \zeta_i^{k_i-1}]$
so that Φ(g(x)) = 0 if, and only if, g(λ_i) is a root of Φ(λ) of multiplicity at least k_i. Hence, if
Ψ(λ) = Π[λ - g(λ_i)]^k_i
where when two or more values of i give the same value of g(λ_i), only that one is to be taken for which k_i is greatest, then Ψ(λ) is the reduced characteristic function of g(x). As a part of this result we have the following theorem.

Theorem 5. If g(λ) is a scalar polynomial and x a matrix whose distinct roots are λ₁, λ₂,...., λ_r, the roots of the matrix g(x) are
g(λ₁), g(λ₂),...., g(λ_r).

If the roots g(λ_i) are all distinct, the principal idempotent elements of g(x) are the same as those of x; for conditions (33) of §2.11 as applied to g(x) are satisfied by φ_i (i = 1, 2, ...., r), and these conditions were shown to characterize the principal idempotent elements completely.

2.13 The square root of a matrix.

Although the general question of functions of a matrix will not be taken up till a later chapter, it is convenient to give here one determination of the square root of a matrix x.

If α and β are scalars, α ≠ 0, and (α + β)^1/2 is expanded formally in a Taylor series,
$(\alpha + \beta)^{\frac{1}{2}} = \alpha^{\frac{1}{2}} \sum \limits_{r=0}^{\infty} \delta_r (\frac{\beta}{\alpha})^r$
then, if $S_\nu = \alpha^{\frac{1}{2}} \sum \limits_{r=0}^{\nu-1} \delta_r (\frac{\beta}{\alpha})^r$, it follows that
(35) S_ν² = α + β + αT_ν

where T_ν is a polynomial in β/α which contains no power of β/α lower than the νth. If a and b are commutative matrices and a is the square of a known non-singular matrix a^1/2, then (35) being an algebraic identity in α and β remains true when a and b are put in their place.

If x_i = λ_iφ_i + η_i is the matrix defined in §2.11 (32), then so long as λ_i ≠ 0, we may set α = λ_iφ_i, β = η_i since λ_iφ_i = (λ_i^1/2φ_i)²; and in this case the Taylor series terminates since η_i^ν_i = 0, that is, T_{ν_i} = 0 and the square of the terminating series for (λ_iφ_i + η_i)^1/2 in powers of η_i equals λ_iφ_i + η_i. It follows immediately from (32) and (27) that, if x is a matrix no one of whose roots is 0, the square of the matrix
$(36) x^{\frac{1}{2}} = \sum \limits_1^r \lambda_i^{\frac{1}{2}} [ \varphi_i + \frac{1}{2} \lambda_i^{-1} \eta_i - ... + (-1)^{(v_i - 2)} \frac{(2v_i - 4)!}{2^{(2v_i - 3)}(v_i - 2)!(v_i - 1)!}(\frac{\eta_i}{\lambda_i})^{v_i - 1} ]$
is x.
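Formula (36) can be tried on a matrix with a single root. In the sketch below x has the root λ₁ = 4 with ν₁ = 3, so that φ₁ = 1 and η₁ = x - 4, and the determination λ₁^1/2 = 2 is chosen; the binomial coefficients δ_r are generated by the usual recurrence. The example matrix and all names are assumptions of this illustration.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, m):
    return [[c * v for v in row] for row in m]

lam, sqrt_lam, nu = 4, 2, 3
x = [[4, 1, 0], [0, 4, 1], [0, 0, 4]]

# eta / lam, where eta = x - lam is nilpotent of index nu
eta_over_lam = [[Fraction(x[i][j] - (lam if i == j else 0), lam)
                 for j in range(3)] for i in range(3)]

root = [[0] * 3 for _ in range(3)]
power = [[int(i == j) for j in range(3)] for i in range(3)]
delta = Fraction(1)                       # delta_0 = 1
for r in range(nu):                       # the series stops at eta^(nu - 1)
    root = add(root, scale(sqrt_lam * delta, power))
    power = matmul(power, eta_over_lam)
    delta = delta * (Fraction(1, 2) - r) / (r + 1)   # delta_(r+1) from delta_r

square = matmul(root, root)
```

Taking `sqrt_lam = -2` instead gives the second determination furnished by (36) for this matrix.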

If the reduced equation of x has no multiple roots, (36) becomes
(37) x^1/2 = ∑λ_i^1/2φ_i
and this is valid even if one of the roots is 0. If, however, 0 is a multiple root of the reduced equation, x may have no square root as, for example, the matrix $\begin{Vmatrix} 0 & 1 \\ 0 & 0 \end{Vmatrix}$.

Formula (36) gives 2^r determinations of x^1/2 but we shall see later that an infinity of determinations is possible in certain cases.

2.14 Reducible matrices.

If x = x₁ + x₂ is the direct sum of x₁ and x₂ and e₁, e₂ are the corresponding idempotent elements, that is,
     e_ix = x_i = xe_i,      e_ie_j = 0      (i ≠ j; i, j = 1, 2),
then x^r = x₁^r + x₂^r (r ≥ 2) and we may set as before 1 = x⁰ = x₁⁰ + x₂⁰ = e₁ + e₂. Hence, if ƒ(λ) = λ^m + b₁λ^m-1 + .... + b_m is any scalar polynomial, we have
ƒ(x) = e₁ƒ(x₁) + e₂ƒ(x₂) = f(x₁) + f(x₂) - b_m,
and if g(λ) is a second scalar polynomial
ƒ(x)g(x) = e₁f(x₁)g(x₁) + e₂ƒ(x₂)g(x₂).
Now if ƒ_i(λ) is the reduced characteristic function of x_i regarded as a matrix in the space determined by e_i, then the reduced characteristic function of x_i as a matrix in the original fundamental space is clearly λƒ_i(λ) unless λ is a factor of ƒ_i(λ) in which case it is simply ƒ_i(λ). Further the reduced characteristic function of x = x₁ + x₂ is clearly the least common multiple of ƒ₁(λ) and ƒ₂(λ); for if
         ψ(λ) = ƒ₁(λ)g₁(λ) = ƒ₂(λ)g₂(λ)
then
     ψ(x₁ + x₂) = e₁ψ(x₁) + e₂ψ(x₂)
           = e₁ƒ₁(x₁)g₁(x₁) + e₂ƒ₂(x₂)g₂(x₂) = 0.