Matrices and Vectors

1.01 Linear transformations and vectors.

In a set of linear equations
η'1 = a11η1 + a12η2 + ..... + a1nηn
η'2 = a21η1 + a22η2 + ..... + a2nηn
η'n = an1η1 + an2η2 + ..... + annηn
$(1) \ \ \ \ \ \ \underline{\eta'_i = \sum \limits_{i=1}^n a_{ij}\eta_i \ \ \ \ \ \ (i = 1, 2, . . . ,n)}$

the quantities η12,...., ηn may be regarded as the coordinates of a point P in n-space and the point P'(η'1, η'2,....,η'n) is then said to be derived from P by the linear homogeneous transformation (1). Or, in place of regarding the η's as the coordinates of a point we may look on them as the components of a vector y and consider (1) as defining an operation which transforms y into a new vector y'. We shall be concerned here with the properties of such transformations, sometimes considered abstractly as entities in themselves, and sometimes in conjunction with vectors.

To prevent misconceptions as to their meaning we shall now define a few terms which are probably already familiar to the reader. By a scalar or number we mean an element of the field in which all coefficients of transformations and vectors are supposed to lie; unless otherwise stated the reader may assume that a scalar is an ordinary number real or complex.

A vector of order n is defined as a set of n scalars (ξ1, ξ2,...., ξn) given in a definite order. This set, regarded as a single entity, is denoted by a single symbol, say x, and we write
x = (ξ1, ξ2,.....,ξn).
The scalars ξ1, ξ2,....., ξn are called the coordinates or components of the vector. If y = (η12,.....,ηn) is also a vector, we say that x = y if, and only if, corresponding coordinates are equal, that is, ξi = ηI, (i = 1, 2,......, n). The vector
z = (ζ1, ζ2,...., ζn) = (ξ1 + η1, ξ2 + η2,....., ξn + ηn)
is called the sum of x and y and is written x + y; it is easily seen that the operation of addition so defined is commutative and associative, and it has a unique inverse if we agree to write 0 for the vector (0, 0,...., 0).

If ρ is a scalar, we shall write
ρx = xρ = (ρξi, ρξ2,...., ρξn).
This is the only kind of multiplication we shall use regularly in connection with vectors.

1.02 Linear dependence.

In this section we shall express in terms of vectors the familiar notions of linear dependence. If x1, x2,....., xr are vectors and ω1, ω2,...., ωr scalars, any vector of the form
(2)              x = ω1x1 + ω2x2 + ..... + ωrxr
is said to be linearly dependent on x1, x2,...., xr; and these vectors are called linearly independent if an equation which is reducible to the form
             0 = ω1x1 + ω2x2 + ..... + ωrxr
can only be true when each ωi = 0. Geometrically the r vectors determine an r-dimensional subspace of the original n-space and, if x1, x2,...., xr are taken as the coordinate axes, ω1, ω2,...., ωr in (2) are the coordinates of x.

We shall call the totality of vectors x of the form (2) the linear set or subspace (x1, x2,...., xr) and, when x1, x2,...., xr are linearly independent, they are said to form a basis of the set. The number of elements in a basis of a set is called the order of the set.

Suppose now that (x1, x2,...., xr), (y1, y2,...., ys) are bases of the same linear set and assume s ≥ r. Since the x's form a basis, each y can be expressed in the form
(3)              yi = ai1x1 + ai2x2 + .... + airxr (i = 1, 2,...., s)
and, since the y'a form a basis, we may set
             xi = bi1y1 + bi2y2 + ..... + bisys (i = 1, 2,...., r)
and therefore from (3)
(4)          $y_i = \sum \limits_{j=1}^r a_{ij}x_j = \sum \limits_{j=1}^r a_{ij} \sum \limits_{k=1}^s b_{jk}y_k = \sum \limits_{k=1}^s c_{ik}y_k,$
where $c_{ik} = \sum \limits_{j=1}^r a_{ij}b_{jk}$ which may also be written
(5)              $c_{ik} = \sum \limits_{j=1}^s a_{ij}b_{jk} \ \ \ \ \ (i = 1,2,...,s)$
if we agree to set aij = 0 when j > r. Since the y's are linearly independent, (4) can only hold true if cii = 1, cik = 0 (i ≠ k) so that the determinant |cik| = 1. But from the rule for forming the product of two determinants it follows from (5) that |cik| = | aik||bik| which implies (i) that |aik| ≠ 0 and (ii) that r = s, since otherwise |aik| contains the column ai,r+1 each element of which is 0. The order of a set is therefore independent of the basis chosen to represent it.

It follows readily from the theory of linear equations (or from §1.11 below) that, if |aij| ≠ 0 in (3), then these equations can be solved for the x's in terms of the y'sy so that the conditions established above are sufficient as well as necessary in order that the y's shall form a basis.

If ei denotes the vector whose i-th coordinate is 1 and whose other coordinates are 0, we see immediately that we may write
         x = ξ1e1 + ξ2e2 + .... + ξnen
in place of x = (ξ1, ξ2,..., ξn). Hence e1, e2,...., en form a basis of our n-space. We shall call this the fundamental basis and the individual vectors ei the fundamental unit vectors.

If x1, x2,....., xr(r < n) is a basis of a subspace of order r, we can always find n — r vectors xr+1,...., xn such that x1, x2,...., xn is a basis of the fundamental space. For, if xr+1 is any vector not lying in (x1, x2,...., xr), there cannot be any relation
ω1x1 + ω2x2 + .... + ωrxr + ωr+1xr+1 = 0
in which ωr+1 ≠ 0 (in fact every ω must be 0) and hence the order of (x1, x2,...., xr,xr+1) is r + 1. Since the order of (e1, e2,..., en) is n, a repetition of this process leads to a basis x1, x2,..., xr,...., xn of order n after a finite number of steps; a suitably chosen ei may be taken for xr+1. The (n — r)-space (xr+1,...., xn) is said to be complementary to (x1, x2,..., xr); it is of course not unique.

1.03 Linear vector functions and matrices.

The set of linear equations given in §1.01, namely,
(6)          $\underline{\eta'_i = \sum \limits_{j=1}^n a_{ij}\eta_j}$
define the vector y' = (η'1, η'2,.....,η'n) as a linear homogeneous function of the coordinates of y = (η1, η2,..., ηn) and in accordance with the usual functional notation it is natural to write y' = A(y); it is usual to omit the brackets and we therefore set in place of (6)
y' = Ay.

The function or operator A when regarded as a single entity is called a matrix; it is completely determined, relatively to the fundamental basis, when the n2 numbers aij are known, in much the same way as the vector y is determined by its coordinates. We call the aij the coordinates of A and write
(7)          $A = \begin{Vmatrix}a_{11} & a_{12} & . . . & a_{1n} \\ a_{21} & a_{22} & . . . & a_{2n} \\ . . & . . & . . . & . \\ . . & . . & . . . & . \\ a_{n1} & a_{n2} & . . . & a_{nn} \end{Vmatrix}$
or, when convenient, A = ||aij||. It should be noted that in aij the first suffix denotes the row in which the coordinate occurs while the second gives the column.

If B = ||bij|| is a second matrix, y" = A(By) is a vector which is a linear vector homogeneous function of y, and from (6) we have
         $\eta''_i = \sum \limits_{p=1}^n a_{ip} \sum \limits_{p=1}^n b_{pj} \eta_j = \sum \limits_{j=1}^n d_{ij} \eta_j,$
(8)          $d_{ij} = \sum \limits_{p=1}^n a_{ip} b_{pj}$
The matrix D = ||dij|| is called the product of A into B and is written AB. The form of (8) should be carefully noted; in it each element of the i-th row of A is multiplied into the corresponding element of the j-th column of B and the terms so formed are added. Since the rows and columns are not interchangeable, AB is in general different from BA; for instance
         $\begin{Vmatrix}1 & 0 \\ 2 & 1\end{Vmatrix} \ \ \begin{Vmatrix} a & b \\ c & d\end{Vmatrix} = \begin{Vmatrix} a & b \\ 2a + c & 2b + d\end{Vmatrix} \\ \begin{Vmatrix} a & b \\ c & d\end{Vmatrix} \ \ \begin{Vmatrix}1 & 0 \\ 2 & 1\end{Vmatrix} = \begin{Vmatrix} a + 2b & b \\ c + 2d & d\end{Vmatrix}$

The product defined by (8) is associative; for if C = ||cij||, the element in the i-th row and j-th column of (AB)C is
         $\sum \limits_{q=1}^n (\sum \limits_{p=1}^n a_{ip} b_{pq}) c_{qj} = \sum \limits_{p=1}^n a_{ip} (\sum \limits_{q=1}^n b_{pq} c_{qj})$
and the term on the right is the (i, j) coordinate of A(BC).

If we add the vectors Ay and By, we get a vector whose i-th coordinate is (cf. (6))
         $\eta'_i = \sum \limits_{j=1}^n a_{ij}\eta_j + \sum \limits_{j=1}^n b_{ij}\eta_j + \sum \limits_{j=1}^n c_{ij}\eta_j$
where cij = aij + bij. Hence Ay + By may be written Cy where C = ||cij||. We define C to be the sum of A and B and write C = A + B; two matrices are then added by adding corresponding coordinates just as in the case of vectors. It follows immediately from the definition of sum and product that
         A + B = B + A,

         (A + B) + C = A + (B + C),

         A(B + C) = AB + AC, (B + C)A = BA + CA,

         A(x + y) = Ax + Ay,
A, B, C being any matrices and x, y vectors. Also, if k is a scalar and we set y' = Ay, y" = ky', then
         y" = ky' = kA(y) = A(ky)
or in terms of the coordinates
         $\eta''_i = \sum \limits_{i} ka_{ij}\eta_j$
Hence kA may be interpreted as the matrix derived from A by multiplying each coordinate of A by k.

On the analogy of the unit vectors e» we now define the fundamental unit matrices eij (i, j = 1, 2,..., n). Here eij is the matrix whose coordinates are all 0 except the one in the i-th row and j-th column whose value is 1. Corresponding to the form ∑ξiei for a vector we then have
(9)              $A = \sum \limits_{i,j = 1}^n a_{ij} e_{ij}$
Also from the definition of multiplication in (8)
(10)          eijejk = eik, eijepq = 0, (j ≠ p)
a set of relations which might have been made the basis of the definition of the product of two matrices. It should be noted that it follows from the definition of eij that
(11)          eijej = ei, eijek = 0 (j ≠ k),
(12)          $A e_k = \sum \limits_{i,j} a_{ij} e_{ij} e_k = \sum \limits_{i} a_{ik} e_i$
Hence the coordinates of Aek are the coordinates of A that lie in the k-th column.

1.04 Scalar matrices.

If k is a scalar, the matrix K defined by Ky = ky is called a scalar matrix; from (1) it follows that, if K = ||kij||, then kii = k (i = 1, 2,...., n), kij = 0 (i ≠ j). The scalar matrix for which k = 1 is called the identity matrix of order n; it is commonly denoted by I but, for reasons explained below, we shall here usually denote it by 1, or by 1n if it is desired to indicate the order. When written at length we have
$I_n = \begin{Vmatrix} 1 & & & & & \\ & 1 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & 1 \end{Vmatrix}, \ \ K = \begin{Vmatrix} k & & & & & \\ & k & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & k \end{Vmatrix}$

A convenient notation for the coordinates of the identity matrix was introduced by Kronecker: if δij is the numerical function of the integers i, j defined by
(13)          δii = 1,      δij = 0 (i ≠ j)
then 1n = ||δij||. We shall use this Kronecker delta function in future without further comment.

Theorem 1. Every matrix is commutative with a scalar matrix.

Let k be the scalar and K = ||kij|| = ||kδijij|| is any matrix, then from the definition of multiplication
$KA = || \sum \limits_p k_{ip} a_{pj}|| = || \sum \limits_p k \delta_{ip} a_{pj}|| = ||k a_{ij}|| \\ AK = || \sum \limits_p a_{ip} k_{pj}|| = || \sum \limits_p k a_{ip} \delta_{pj}|| = ||k a_{ij}||$
so that AK = KA.

If k and h are two scalars and K, H the corresponding scalar matrices, then K + H and KH are the scalar matrices corresponding to k + h and kh. Hence the one-to-one correspondence between scalars and scalar matrices is maintained under the operations of addition and multiplication, that is, the two sets are simply isomorphic with respect to these operations. So long therefore as we are concerned only with matrices of given order, there is no confusion introduced if we replace each scalar by its corresponding scalar matrix, just as in the theory of ordinary complex numbers, (a, b) = a + bi, the set of numbers of the form (a, 0) is identified with the real continuum. We shall therefore as a rule denote ||δij|| by 1 and ||kδij|| by k.

1.05 Powers of a matrix; adjoint matrices.

Positive integral powers of A = ||aij|| are readily defined by induction; thus
A2 = A • A, A3 = A • A2, ...., Am = A • Am-1
With this definition it is clear that ArAs = Ar+s for any positive integers r, s. Negative powers, however, require more careful consideration.

Let the determinant formed from the array of coefficients of a matrix be denoted by
|A| = det.A
and let αqp be the cofactor of apq in A, so that from the properties of determinants
(14)          $\sum \limits_p a_{ip} \alpha_{pj} = |A| \delta_{ij} = \sum \limits_p \alpha_{ip} a_{pj} \ \ \ \ (i,j = 1, 2, ..., n)$
The matrix ||αij|| is called the adjoint of A and is denoted by adj A. In this notation (14) may be written
(15)          A (adj A) = |A| = (adj A)A,
so that a matrix and its adjoint are commutative.

If |A| ≠ 0, we define A-1 by
(16)          A-1 = |A|-1 adj A.
Negative integral powers are then defined by A-r = (A-1)r; evidently A-r = (Ar)-1. We also set A0 = 1, but it will appear later that a different interpretation must be given when |A| = 0. Since AB • B-1A-1 = A • BB-1 • A-1 = AA-1 = 1, the reciprocal of the product AB is
(AB)-1 = B-1A-1

If A and B are matrices, the rule for multiplying determinants, when stated in our notation, becomes
|AB| = |A||B|.
In particular, if AB = 1, then |A||B| = 1; hence, if |A| = 0, there is no matrix B such that AB = 1 or BA = 1. The reader should notice that, if k is a scalar matrix of order n, then |k| = kn.

If A = 0, A is said to be singular; if A ≠ 0, A is regular or non-singular. When A is regular, A-1 is the only solution of AX = 1 or of XA = 1. For, if AX = 1, then
A-1 = A-1 • 1 = A-1AX = X.
If AX = 0, then either X = 0 or A is singular; for, if A-1 exists,
0 = A-1Ax = X.

If A2 = A ≠ 0, then A is said to be idempotent, for example e11 and $\begin{Vmatrix} 4 & -2 \\ 6 & -3 \end{Vmatrix}$ are idempotent. A matrix a power of which is 0 is called nilpotent. If the lowest power of A which is 0 is Ar, r is called the index of A; for example, if A = e12 + e23 + e34, then
A2 = e13 + e24, A3 = e14, A4 = 0,
so that the index of A in this case is 4.

1.06 The transverse of a matrix.

If A = ||aij|| the matrix ||a'ij|| in which a'ij = aij is called the transverse of A and is denoted by A'. For instance the transverse of
$\begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{Vmatrix} \ is \ \begin{Vmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{Vmatrix}$
The transverse, then, is obtained by the interchange of corresponding rows and columns. It must be carefully noted that this definition is relative to a particular set of fundamental units and, if these are altered, the transverse must also be changed.

Theorem 2. The transverse of a sum is the sum of the transverses of the separate terms, and the transverse of a product, is the product of the transverses of the separate factors in the reverse order.

The proof of the first part of the theorem h immediate and is left to the reader. To prove the second it is sufficient to consider two factors. Let A = ||aij||, B = ||bij||, C = AB = ||cij|| and, as above, set a'ij = aji, b'ij = bji,c'ij = cji, then
theorem 2 $c'_{ij} = c_{ji} = \sum \limits_p a_{jp} b_{pi} = \sum \limits_p b'_{ip} a'_{pj}$
(AB)' = C' = B'A'.
The proof for any number of factors follows by induction.

If A = A', A is said to be symmetric and, if A = —A', it is called skew-symmetric or skew. A scalar matrix k is symmetric and the transverse of kA is kA'.

Theorem 3. Every matrix can be expressed uniquely as the sum of a symmetric and a skew matrix.

For if A = B + C, B' = B, C' = -C, then A' = B' + C' = B - C and therefore
B = (A + A')/2, C = (A - A')/2.
Conversely 2A = (A + A') + (A — A') and A + A' is symmetric, A — A' skew.

1.07 Bilinear forms.

A scalar bilinear form in two variable vectors, x = ∑ξiei, y = ∑ηiei, is a function of the form
(17)          $A(x,y) = \sum \limits_{i,j = 1}^n a_{ij} \xi_i \eta_j$
There is therefore a one-to-one correspondence between such forms and matrices, A = ||aij|| corresponding to A(x, y). The special form for which A = ||δij|| = 1 is of very frequent occurrence and we shall denote it by S; it is convenient to omit the brackets and write simply
(18)          Sxy = ξ1η1 + ξ2η2 + ..... + ξnηn
and, because of the manner in which it appears in vector analysis, we shall call it the scalar of xy. Since S is symmetric, Sxy = Syx.

The function (17) can be conveniently expressed in terms of A and S; for we may write A(x, y) in the form
$A(x,y) = \sum \limits_{i = 1}^n \xi_i (\sum \limits_{j = 1}^n a_{ij} \eta_j) = Sx Ay$
It may also be written
$\sum \limits_{j = 1}^n (\sum \limits_{i = 1}^n a_{ij} \xi_i) \eta_j = S A'xy = Sy A'x;$
(19)          SxAy = SyA'x,
so that the form (17) is unaltered when x and y are interchanged if at the same time A is changed into A'. This gives another proof of Theorem 2. For
         Sx(AB)'y = SyABx = SBxA'y = SxB'A'y,
which gives (AB)' = B'A' since x and y are independent variables.

1.0.8 Change of basis.

We shall now investigate more closely the effect of a change in the fundamental basis on the coordinates of a vector or matrix. If f1, f2,...,fn is a basis of our n-space, we have seen (§1.02) that the f's are linearly independent. Let
$(20) \ \ \ \ f_i = \sum \limits_{j = 1}^n p_{ji} e_j = P e_i, \ \ \ (i = 1,2,...,n)$
         P = ||pij||.
Since the f's form a basis, the e's are linearly expressible in terms of them, say
$(21) \ \ \ \ e_i = \sum \limits_{j = 1}^n q_{ji} f_i,$
and, if Q = ||qij||, this may be written
$(22) \ \ \ \ e_i = \sum \limits_j q_{ji} \sum \limits_k p_{kj} e_k = PQ e_i \ \ \ \ (i = 1, 2, 3,..., n)$
Hence PQ = 1, which is only possible if |P| ≠ 0, Q = P-1.

Conversely, if |P| ≠ 0, Q = P-1, and fi = Pei as in (20), then (22) holds and therefore also (21), that is, the e's, and therefore also any vector x, are linearly expressible in terms of the f's. We have therefore the following theorem.

Theorem 4. If fi = Pei (i = 1, 2,...., n), the vectors fi form a basis if, and only if |P| ≠ 0.

If we have fewer than n vectors, say f1, f2, ....,fr, we have seen in 1.02 that we can choose fr+1,...., fn so that f1, f2,...., fn form a basis. Hence

Theorem 5. If f1,f2,....,fr are linearly independent, there exists at least one non-singular matrix P such that Pei = fi; (i = 1, 2,...., r).

We shall now determine how the form Sxy which was defined relatively to the fundamental basis, is altered by a change of basis. As above let
(23)          fi = Pei,      ei = P-1fi = Qfi,      |P| ≠ 0,          (i = 1, 2,...., n)
be a basis and
         x = ∑ξiei = ∑ξ'ifi,      y = ∑ηiei = ∑η'ifi
variable vectors; then from (23)
         x = Q∑ξifi = P∑ξ'iei,      y = Q∑ηifi = P∑η'iei
         ∑ξ'iei = P-1x = Qx,      ∑η'iei = Qy.
Let us set temporarily Sexy for Sxy and also put Sfxy = ∑ξ'iη'i, the corresponding form with reference to the new basis; then
(24)          Sfxy = SeQxQy = SexQ'Qy          Sexy = SfPxPy.

Consider now a matrix A = ||aij|| defined relatively to the fundamental basis and let A1 be the matrix which has the same coordinates when expressed in terms of the new basis as A has in the old. From the definition of A and from ξi = SeejX we have
$Ax = \sum \limits_{i,j} a_{ij} \xi_j e_i = \sum \limits_{i,j} a_{ij} e_i S_e e_j x$
and hence
(25)          A1 = ∑aijξ'ifi = ∑ aijfiSffjx = ∑aijQ-1eiSeQfjQx = Q-1∑aijeiSsejQx = Q-1AQx
We have therefore, remembering that Q = P-1,

Theorem 6. If fi = Pei; (i = 1, 2,...., n) is a basis and A any matrix, the matrix PAP-1 has the same coordinates when expressed in terms of this basis as A has in terms of the fundamental basis.

The matrix Q-1AQ is said to be similar to A and to be the transform of A by Q. Obviously the transform of a product (sum) is the product (sum) of the transforms of the individual factors (terms) with the order unaltered. For instance Q-1ABQ = Q-1AQ • Q-1BQ.

Theorem 6 gives the transformation of the'matric units eij defined in §1.03 which corresponds to the vector transformation (23); the result is that, if fij is the unit in the new system corresponding to eij, then
         fij = PeijP-1
which is readily verified by setting
         A = eij = eiSeej( ),      A1 = fij = fiSffi( )
in (25). The effect of the change of basis on the form of the transverse is found as follows. Let A* be defined by
         SfxAy = SfyA*x;
         SfyA*x = SfxAy = SeQxQAy = SexQ'QAy = SeQy(Q')A'Q'Qx
             := Sfy(Q'Q)A'Q'Qx.
(26)          A* = (Q'Q)A'Q'Q.

1.09 Reciprocal and orthogonal bases.

With the same notation as in the previous section we have Sffifj = 0 (i ≠ j), Sffifj = 1. Hence
         δij = Sffifj = SeQf,sub>iQfj = SefiQ'Qfj.
If, therefore, we set
(27)          f'iQ'Qfi      (j= 1,2, .... n),
we have, on omitting the subscript e in Se,
(28)          Sfif'j = δij      (i,j = 1,2,...., n).
Since |Q'Q| ≠ 0, the vectors f'1, f'2,..., f'n form a basis which we say is reciprocal to f1, f2,.....,fn. This definition is of course relative to the fundamental basis since it depends on the function S but, apart from this the basis (f'i) is uniquely defined when the basis (fi) is given since the vectors fi determine P and Q = P-1.

The relation between (f'i) and (fi) is a reciprocal one; for
f'j = Q'Qfj = Q'QPej = Q'ej,
and, if R = (Q')-1 we have fj = R'Rf'j.

If only the set (f1, f2,...., fr) is supposed given originally, and this set of linearly independent vectors is extended by fr+1,...., fn to form a basis of the n-space, then f'r+1,...., f'n individually depend on the choice of fr+1,...., fn. But (28) shows that, if Sfix = 0 (i = 1, 2,...., r), then x belongs to the linear set (f'r+1,....,f'n); hence this linear set is uniquely determined although the individual members of its basis are not. We may therefore without ambiguity call ℑ' = (f'r+1,...., f'n) reciprocal to ℑ = (f1,f2,...., fr); ℑ' is then the set of all vectors x for which Sxy = 0 whenever y belongs to ℑ.

In a later chapter we shall require the following lemma.

Lemma 1. If (f1, f2,...., fr) and (f'r+1,.....,f'n) are reciprocal, so also are (B-1f1, B-1f2,...., B-1fr) and (B'f'r+1, B'f'r+2,....., B'f'n) wtare B is any non-singular matrix.

For SB'f'itB-1fj = Sf'iBB-1fj, = Sf'ifj = δij.

Reciprocal bases have a close connection with reciprocal or inverse matrices in terms of which they might have been defined. If P is non-singular and Pei = fi as above, then P = ∑fiSei( ) and, if Q = ∑eiSf'i( ), then
         PQ = ∑eiSf'ifjSej( ) = ∑δijejSej( ) = 1
so that Q = P-1.

If QQ' = 1, the bases (fi) and (f'i) are identical and Sfifj= δij for all i and j; the basis is then said to be orthogonal as is also the matrix Q. The inverse of an orthogonal matrix and the product of two or more orthogonal matrices are orthogonal; for, if RR' = 1,
         (RQ)(RQ)' = RQQ'R' = RR' = 1.

Suppose that h1, h2,....., hr are real vectors which are linearly independent and for which Shihj = δij (i ≠ j); since hi is real, we have Shihi ≠ 0. If r < n, we can always find a real vector x which is not in the linear set (h1,....., hr) and, if we put
$h_{r + 1} = x - \sum \limits_1^r \frac{h_i S h_i x}{S h_i h_i}$
then hr+1 ≠ 0 and Shihr+1 = 0 (i = 1, 2,......, r). Hence we can extend the original set to form a basis of the fundamental n-space. If we set fi = hi/(Shihi)i then Sfifj = δij even when i = j, this modified basis is called an orthogonal basis of the set.

If the vectors hi are not necessarily real, it is not evident that x can be chosen so that Shr+1hr+1 ≠ 0 when Shihi ≠ 0 (i = 1, 2,...., r). This may be shown as follows. In the first place we cannot have Syhr+1 = 0 for every y, and hence Shr+1hr+1 ≠ 0 when r = n — 1. Suppose now that for every choice of x we have Shr+1hr+1 = 0; we can then choose a basis hr+1,...., hn supplementary to h1,...., hr such that Shihi = 0 (i = r + 1,...., n) and Shihj = 0 (i = r + 1, ...., n; j = 1, 2,....., r). Since we cannot have Shr+1hi = 0 for every hi of the basis of the n-space, this scalar must be different from 0 for some value of i > r, say r + k. If we then put h'r+1 = hr+1 + hr+k; in place of hr+1, we have Shih'r+1 = 0 (i = 1, 2,...., r) as before and also
         Sh'r+1h'r+1 = Shr+1hr+1 + Shr+khr+k + 2Shr+1hr+k
             = 2Shr+1hr+k ≠ 0.
We can therefore extend the basis in the manner indicated for real vectors even when the vectors are complex.

When complex coordinates are in question the following lemma is useful; it contains the case discussed above when the vectors used are real.

Lemma 2. When a linear set of order r is given, it is always possible to choose a basis g1, g2,...., gn of the fundamental space such that g1,...., gr is a basis of the given set and such that Sgig^j = δij where g^j is the vector whose coordinates are the conjugates of the coordinates of gj when expressed in terms of the fundamental basis.

The proof is a slight modification of the one already given for the real case. Suppose that g1,....., gs are chosen so that Sgig^i = δij (i, j = 1, 2,....,s) and such that (g1,...., gs) lies in the given set when s < r and when s > r, then g1,...., gr is a basis of this set. We now put
         $g'_{s + 1} = x - \sum \limits_1^s \frac{g_i S g_i x}{S \bar{g}_i g_i}$
which is not 0 provided x is not in (g1,...., gs) and, if s < r, will lie in the given set provided x does. We may then put
         gs+1 = g's+1/(Sg's+1g^s+1)1/2
and the lemma follows readily, by induction.

If U is the matrix ∑eiSgi, then U^ = ∑eiSg^i and
(29)          UU^' = 1.
Such a matrix is called a unitary matrix and the basis g1, g2,....., gn is called a unitary basis. A real unitary matrix is of course orthogonal.

1.10 The rank of a matrix.

Let A = ||aij|| be a matrix and set (cf. (12) §1.03)
         hi = Aei = ajiej;
then, if
         x = ∑ξiei = ∑eiSeix
is any vector, we have
         Ax = A∑eiSeix = ∑AeiSeix
$(30) \ \ \ \ \ Ax = \sum \limits_1^n h_i S e_i x$

Any expression of the form $Ax = \sum \limits_1^m a_i S b_i x$, where ai, bi are constant vectors, is a linear homogeneous vector function of x. Here (30) shows that it is never necessary to take m > n, but it is sometimes convenient to do so. When we are interested mainly in the matrix and not in x, we may write A = ∑aiSbi( ) or, omitting the brackets, merely
(31)          A = ∑aiSbi.
It follows readily from the definition of the transverse that
(32)          A' = ∑biSai.

No matter what vector x is, Ax, being equal to ∑aiSbix is linearly dependent on a1, a2,..., am or, if the form (30) is used, on h1, h2,...., hn. When |A| ≠ 0, we have seen in Theorem 4 that the h's are linearly independent but, if A is singular, there are linear relations connecting them, and the order of the linear set (a1, a2,...., am) is less than n.

Suppose in (31) that the a'a are not linearly independent, say
         as = α1a1 + α2a2 + ..... + αs-1as-1,
then on substituting this value of as in (31) we have
     $A = a_1 S (b_1 + \alpha_1 b_s) + ... + a_{s-1} S (b_{s-1} + \alpha_{s-1} b_s) + \sum \limits_{s+1}^m a_i S b_i,$
an expression similar to (31) but having at least one term less. A similar reduction can be carried out if the b's are not linearly independent. After a finite number of repetitions of this process we shall finally reach a form
$(33) \ \ \ \ \ \ A =\sum \limits_1^r c_i S d_i$
in which c1, c2,..., cr are linearly independent and also d1, d2,..., dr. The integer r is called the rank of A.

It is clear that the value of r is independent of the manner in which the reduction to the form (33) is carried out since it is the order of the linear set (Ae1, Ae2,...., Aen). We shall, however, give a proof of this which inci-dently yields some important information regarding the nature of A.

Suppose that by any method we have arrived at two forms of A
         $A =\sum \limits_1^r c_i S d_i = \sum \limits_1^s p_i S q_i,$
where (c1, c2,...., cr) and (d1, d2,...., dr) are spaces of order r and (p1, p2,....,ps), (q1, q2,...., qs) spaces of order s, and let (c'r+1, c'r+2,..., c'n),...., (q's+1, q's+2,...., q'n) be the corresponding reciprocal spaces, Then
         $A q'_j = \sum \limits_1^s p_i S q_i q'_j = p_j \ \ \ \ \ \ \ (j = 1, 2, ..., s)$
and also Aq'j = ∑ ciSdiq'j. Hence each pj lies in (c1, c2,...., cr). Similarly each ci lies in (p1, p2,..., ps) so that these two subspaces are the same and, in particular, their orders are equal, that is, r = s. The same discussion with A' in place of A shows that (d1, d2,...., dr) and (q1, q2,...., qs) are the same. We shall call the spaces ℘l = (c1, c2,...., cr), ℘r = (d1, d2,...., dr) the left and right grounds of A, and the total space ℘ = (c1,...., cr, d1,...., dr) will be called the (total) ground of A.

If x is any vector in the subspace Rr = (d'r+1, d'r+2,..., d'n) reciprocal to ℘r, then Ax = 0 since Sdid'j = 0 (i ≠ j). Conversely, if
         0 = Ax = ∑ ciSdix,
each multiplier Sdix must be 0 since the c's are linearly independent; hence every solution of Ax = 0 lies in Rr. Similarly every solution of A'x = 0 lies in Rl = (c'r+1, c'r+2,...., c'n). We call Rr and Rl the right and left nullspaces of A; their order, n — r, is called the nullity of A.

We may summarize these results as follows.

Theorem 7. If a matrix A is expressed in the form $\sum \limits_1^r a_i S b_i$, where ℘l = (a1, a2,...., ar) and ℘r = (b1, b2,...., br) define spaces of order r, then, no matter how the reduction to this form is carried out, the spaces ℘r and ℘l are always the same. Further, if Rl and Rr are the spaces of order n — r reciprocal to ℘l and ℘r, respectively, every solution of Ax = 0 lies in Rr and every solution of A'x = 0 in Rl.

The following theorem is readily deduced from Theorem 7 and its proof is left to the reader.

Theorem 8. If A, B are matrices of rank r, s, the rank of A + B is not greater than r + s and the rank of AB is not greater than the smaller of r and s.

1.11 Linear dependence.

The definition of the rank of a matrix in the preceding section was made in terms of the linear dependence of vectors associated with the matrix. In this section we consider briefly the theory of linear dependence introducing incidentally a notation which we shall require later.

Let $x_i = \sum \limits_{j=1}^n \xi_{ij} e_j$ (i = 1, 2,...., r; r ≤ n) be a set of r vectors. From the rectangular array of their coordinates
         ξ11 ξ12 ...... ξ1n
         ξ21 ξ22 ...... ξ2n
(34)         .......................................................
         ξr1 ξr2 ...... ξrn
there can be formed n!/r!(n — r)! different determinants of order r by choosing r columns out of (34), these columns being taken in their natural order. If these determinants are arranged in some definite order, we may regard them as the coordinates of a vector in space of order n!/r!(n — r)! and, when this is done, we shall denote this vector by
(35)          |x1x2.....xr|
and call it a pure vector of grade r. It follows from this definition that |x1x2....xr| has many of the properties of a determinant; its sign is changed if two x's are interchanged, it vanishes when two x's are equal and, if λ and μ are scalars,
(36)      |(λx1 + μx'1)x2....xr| = λ|x1x2....xr| + μ|x'1x2.....xr|.

If we replace the x'a in (35) by r different units ei1, ei2,...., eir, the result is clearly not 0: we thus obtain ${n \choose r}$ vectors which we shall call the fundamental unit vectors of grade r; and any linear combination of these units, say
         ∑ ξ|ei1ei1....eir|,
is called a vector of grade r. It should be noticed that not every vector is a pure vector except when r equals 1 or n.

If we replace xi by ∑ ξijej in (35), we get
         |x1x2....xr| = ∑ ξ1j1ξ2j2.....ξrjr|ej1ej2....ejr|
where the summation extends over all permutations j1, j2,...., jr of 1, 2,...., n taken r at a time. This summation may be effected by grouping together the sets j1, j2,...., jr which are permutations of the same combination i1, i2,...., ir, whose members may be taken to be arranged in natural order, and then summing these partial sums over all possible combinations i1, i2,...., ir. Taking the first step only we have
$\sum \xi_{1j_1} \xi_{2j_2} ... \xi_{rj_r} | e_{j_1} e_{j_2} ... e_{j_r} | = \sum \delta^{i_1 ... i_r}_{j_1 ... j_r} \xi_{1j_1} ... \xi_{rj_r} | e_{i_1} e_{i_2} ... e_{i_r}|$
where $\delta^{i_1 ... i_r}_{j_1 ... j_r}$ is the sign corresponding to the permutations ${i_1 i_2 ... i_r \choose j_1 j_2 ... j_r}$ and this equals |ξ1i1.....ξrir||ei1....eir|. We have therefore
$(37) \ \ \ \ \ \ | x_1 x_2 ... x_r | = \sum \limits_{(i)}^* |\xi_{1i_1} \xi_{2i_2} . . . \xi_{ri_r}| \ |e_{i_1} e_{i_2} . . . e_{i_r}|$
where the asterisk on ∑ indicates that the sum is taken over all r-combinations of 1, 2, ...., n each combination being arranged in natural order.

Theorem 9 |x1x2....xr| = 0 if, and only if, x1, x2,..., xr are linearly dependent.

The first part of this theorem is an immediate consequence of (36). To prove the converse it is sufficient to show that, if |x1x2....xr-1| ≠ 0, then there exist scalars α1, α2,...., αr-1 such that
         xr = α1x1 + α2x2 + ..... + αr-1xr-1.

Let $x_i = \sum \limits_j \xi_{ij} e_j$. Since |x1x2....xr-1| ≠ 0, at least one of its coordinates is not 0, and for convenience we may suppose without loss of generality that
(38)          |ξ11ξ22.....ξr-1,r-1| ≠ 0.
Since |x1x2....xr| = 0, all its coordinates equal 0 and in particular
         |ξ11ξ22....ξr-1,r-1| = 0      (i = 1, 2,...., n).
If we expand this determinant according to the elements of its last column, we get a relation of the form
         β1ξri + β2ξ1i + ...... + βrξr-1,i = 0
where the β's are independent of i and β1 ≠ 0 by (38). Hence we may write
(39)      ξri = α1ξ1i + .... + αr-1ξr-1,i      (i = 1, 2,...., n)
the α's being independent of i. Multiplying (39) by ei and summing with regard to i, we have
         xr = α1x1 + .... + αr-1xr-1,
which proves the theorem.

If (a1, a2,...., am) is a linear set of order r, then some set of r a's form a basis, that is, are linearly independent while each of the other a's is linearly dependent on them. By a change of notation, if necessary, we may take a1, a2,...., ar as this basis and write
$(40) \ \ \ \ \ \ a_{r+i} = \sum \limits_{j=1}^r \beta_{ij} a_j, \ \ \ \ \ \ \ (i = 1, 2, ... , m-r)$
We shall now discuss the general form of all linear relations among the a's in terms of the special relations (40); and in doing so we may assume the order of the space to be equal to or greater than m since we may consider any given space as a subspace of one of arbitrarily higher dimensionality.

$(41) \ \ \ \ \ \ \ \ \sum \limits_1^m r_j a_j = 0$
be a relation connecting the a's and set
         $c = \sum \limits_1^m r_j e_j$
Then (40), considered as a special case of (41), corresponds to settmg for c
$(42) \ \ \ \ \ \ c_i = - \sum \limits_{j=1}^r \beta_{ij} e_j + e_{r+i}, \ \ \ \ \ \ \ (i = 1, 2, ... , m-r)$
and there is clearly no linear relation connecting ihese vectors so that they define a linear set of order m — r. Using (40) in (41) we have
         $\sum \limits_{j=1}^r (\gamma_j + \sum \limits_{i=1}^{m-r} \gamma_{r+i} \beta_{ij} ) a_j = 0$
and, since a1, a2,...., ar are linearly independent, we have
         $j = - \sum \limits_{i=1}^{m-r} \beta_{ij} \gamma_{r+i}$
$(43) \ \ \ \ \ \ \ c = \sum \limits_1^m \gamma_j e_j = - \sum \limits_{i=1}^{m-r} \gamma_{r+i} \sum \limits_{j=1}^{r} \beta_{ij} e_j + \sum \limits_{i=1}^{m-r} \gamma_{r+i} e_{r+i} = \sum \limits_{i=1}^{m-r} \gamma_{r+i} c_i,$
so that c is linearly dependent on c1, c2,...., cm-r. Conversely, on retracing these steps in the reverse order we see that, if c is linearly dependent on these vectors, so that γr+i (i = 1, 2,...., m — r) are known, then from (43) the γj (j = 1, 2,...., r) are defined in such a way that $c = \sum \limits_1^m \gamma_j e_j \ and \ \sum \limits_1^m \gamma_j a_j = 0$. We have therefore the following theorem.

Theorem 10. If a1, a2,...., am is a linear set of order r, there exist m — r linear relations $\sum \limits_{j=1}^m \gamma_{ij} a_j = 0$ (i = 1, 2,...., m — r) such that (i) the vectors $c_i = \sum \limits_{j=1}^m \gamma_{ij} e_j$ are linearly independent and (ii) if ∑ γjaj = 0 is any linear relation connecting the a's, and if c = ∑ γjej, then c belongs to the linear set (c1, c2,...., cm—r).

This result can be translated immediately in terms concerning the solution of a system of ordinary linear equations or in terms of matrices. If $a_j = \sum \limits_i a_{ji} e_i$, then (41) may be written
         a11γ1 + a21γ2 + ...... + am1γm = 0
(44)          ...........................................................................................
         a1nγ1 + a2nγ2 + ...... + amnγm = 0
a system of linear homogeneous equations in the unknowns γ1, γ2,...., γm. Hence (44) has solutions for which some γi ≠ 0 if, and only if, the rank r of the array
$(45) \ \ \ \ \ \ \ \ \ \ \begin{matrix} a_{11} & a_{21} & ... & a_{m1} \\ a_{12} & a_{22} & ... & a_{m2} \\ . & . & ... & . \\ a_{1n} & a_{2n} & ... & a_{mn} \end{matrix}$
is less then m and, when this condition is satisfied, every solution is linearly dependent on the set of m — r solutions given by (42) which are found by the method given in the discussion of Theorem 9.

Again, if we make (45) a square array by the introduction of columns or rows of zeros and set A = ||aij||, c = ∑ γiei, then (41) becomes A'c = 0 and Theorem 10 may therefore be interpreted as giving the properties of the nullspace of A' which were derived in §1.10.

Contact email:
Follow us on   Twitter   Facebook
  Math10 Banners  
Copyright © 2005 - 2022