Matrices and Vectors

1.01 Linear transformations and vectors.

In a set of linear equations
$\begin{matrix} \eta'_1 = a_{11}\eta_1 + a_{12}\eta_2 + \dots + a_{1n}\eta_n \\ \eta'_2 = a_{21}\eta_1 + a_{22}\eta_2 + \dots + a_{2n}\eta_n \\ \dots \\ \eta'_n = a_{n1}\eta_1 + a_{n2}\eta_2 + \dots + a_{nn}\eta_n \end{matrix}$
or
$(1) \ \ \ \ \ \ \underline{\eta'_i = \sum \limits_{j=1}^n a_{ij}\eta_j \ \ \ \ \ \ (i = 1, 2, . . . ,n)}$

the quantities η₁,η₂,...., η_n may be regarded as the coordinates of a point P in n-space and the point P'(η'₁, η'₂,....,η'_n) is then said to be derived from P by the linear homogeneous transformation (1). Or, in place of regarding the η's as the coordinates of a point we may look on them as the components of a vector y and consider (1) as defining an operation which transforms y into a new vector y'. We shall be concerned here with the properties of such transformations, sometimes considered abstractly as entities in themselves, and sometimes in conjunction with vectors.
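As a small illustration of (1), the following sketch applies a transformation to a vector using plain Python lists. The function name `apply` and the numerical values are our own choices for illustration and are not part of the text.

```python
def apply(a, y):
    """Return y' with y'[i] = sum over j of a[i][j] * y[j], as in (1)."""
    return [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in a]

# Illustrative values only; they do not come from the text.
A = [[1, 2],
     [3, 4]]
y = [5, 6]
y_prime = apply(A, y)  # components 1*5 + 2*6 and 3*5 + 4*6
```

Each component of y' is formed from one row of coefficients, which is the row-by-vector rule implicit in (1).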

To prevent misconceptions as to their meaning we shall now define a few terms which are probably already familiar to the reader. By a scalar or number we mean an element of the field in which all coefficients of transformations and vectors are supposed to lie; unless otherwise stated the reader may assume that a scalar is an ordinary number real or complex.

A vector of order n is defined as a set of n scalars (ξ₁, ξ₂,...., ξ_n) given in a definite order. This set, regarded as a single entity, is denoted by a single symbol, say x, and we write
x = (ξ₁, ξ₂,.....,ξ_n).
The scalars ξ₁, ξ₂,....., ξ_n are called the coordinates or components of the vector. If y = (η₁,η₂,.....,η_n) is also a vector, we say that x = y if, and only if, corresponding coordinates are equal, that is, ξ_i = η_i (i = 1, 2,......, n). The vector
z = (ζ₁, ζ₂,...., ζ_n) = (ξ₁ + η₁, ξ₂ + η₂,....., ξ_n + η_n)
is called the sum of x and y and is written x + y; it is easily seen that the operation of addition so defined is commutative and associative, and it has a unique inverse if we agree to write 0 for the vector (0, 0,...., 0).

If ρ is a scalar, we shall write
ρx = xρ = (ρξ₁, ρξ₂,...., ρξ_n).
This is the only kind of multiplication we shall use regularly in connection with vectors.

1.02 Linear dependence.

In this section we shall express in terms of vectors the familiar notions of linear dependence. If x₁, x₂,....., x_r are vectors and ω₁, ω₂,...., ω_r scalars, any vector of the form
(2) x = ω₁x₁ + ω₂x₂ + ..... + ω_rx_r
is said to be linearly dependent on x₁, x₂,...., x_r; and these vectors are called linearly independent if an equation which is reducible to the form
0 = ω₁x₁ + ω₂x₂ + ..... + ω_rx_r
can only be true when each ω_i = 0. Geometrically the r vectors determine an r-dimensional subspace of the original n-space and, if x₁, x₂,...., x_r are taken as the coordinate axes, ω₁, ω₂,...., ω_r in (2) are the coordinates of x.

We shall call the totality of vectors x of the form (2) the linear set or subspace (x₁, x₂,...., x_r) and, when x₁, x₂,...., x_r are linearly independent, they are said to form a basis of the set. The number of elements in a basis of a set is called the order of the set.

Suppose now that (x₁, x₂,...., x_r), (y₁, y₂,...., y_s) are bases of the same linear set and assume s ≥ r. Since the x's form a basis, each y can be expressed in the form
(3)              y_i = a_i1x₁ + a_i2x₂ + .... + a_irx_r (i = 1, 2,...., s)
and, since the y's form a basis, we may set
             x_i = b_i1y₁ + b_i2y₂ + ..... + b_isy_s (i = 1, 2,...., r)
and therefore from (3)
(4)          $y_i = \sum \limits_{j=1}^r a_{ij}x_j = \sum \limits_{j=1}^r a_{ij} \sum \limits_{k=1}^s b_{jk}y_k = \sum \limits_{k=1}^s c_{ik}y_k,$
where $c_{ik} = \sum \limits_{j=1}^r a_{ij}b_{jk}$ which may also be written
(5)              $c_{ik} = \sum \limits_{j=1}^s a_{ij}b_{jk} \ \ \ \ \ (i, k = 1,2,...,s)$
if we agree to set a_ij = 0 when j > r. Since the y's are linearly independent, (4) can only hold true if c_ii = 1, c_ik = 0 (i ≠ k) so that the determinant |c_ik| = 1. But from the rule for forming the product of two determinants it follows from (5) that |c_ik| = | a_ik||b_ik| which implies (i) that |a_ik| ≠ 0 and (ii) that r = s, since otherwise |a_ik| contains the column a_i,r+1 each element of which is 0. The order of a set is therefore independent of the basis chosen to represent it.

It follows readily from the theory of linear equations (or from §1.11 below) that, if |a_ij| ≠ 0 in (3), then these equations can be solved for the x's in terms of the y's, so that the conditions established above are sufficient as well as necessary in order that the y's shall form a basis.

If e_i denotes the vector whose i-th coordinate is 1 and whose other coordinates are 0, we see immediately that we may write
x = ξ₁e₁ + ξ₂e₂ + .... + ξ_ne_n
in place of x = (ξ₁, ξ₂,..., ξ_n). Hence e₁, e₂,...., e_n form a basis of our n-space. We shall call this the fundamental basis and the individual vectors e_i the fundamental unit vectors.

If x₁, x₂,....., x_r (r < n) is a basis of a subspace of order r, we can always find n - r vectors x_r+1,...., x_n such that x₁, x₂,...., x_n is a basis of the fundamental space. For, if x_r+1 is any vector not lying in (x₁, x₂,...., x_r), there cannot be any relation
ω₁x₁ + ω₂x₂ + .... + ω_rx_r + ω_r+1x_r+1 = 0
in which ω_r+1 ≠ 0 (in fact every ω must be 0) and hence the order of (x₁, x₂,...., x_r,x_r+1) is r + 1. Since the order of (e₁, e₂,..., e_n) is n, a repetition of this process leads to a basis x₁, x₂,..., x_r,...., x_n of order n after a finite number of steps; a suitably chosen e_i may be taken for x_r+1. The (n - r)-space (x_r+1,...., x_n) is said to be complementary to (x₁, x₂,..., x_r); it is of course not unique.
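The extension process just described can be sketched in code. This is a minimal sketch, assuming the given vectors are linearly independent and have rational coordinates; the helper names `rank` and `extend_basis` are ours. Each fundamental unit vector e_i is tried in turn and kept only if it does not lie in the set already obtained.

```python
from fractions import Fraction

def rank(rows):
    """Order of the linear set spanned by the given vectors (Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extend_basis(vectors, n):
    """Extend linearly independent vectors to a basis of n-space using unit vectors."""
    basis = [list(v) for v in vectors]
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        # Keep e_i only if it is not linearly dependent on the current set.
        if rank(basis + [e_i]) > len(basis):
            basis.append(e_i)
    return basis

extended = extend_basis([[1, 1, 0]], 3)
```

Here the unit vector e₂ is skipped because it already lies in (x₁, e₁), which also shows that the complementary space depends on the choices made.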

1.03 Linear vector functions and matrices.

The set of linear equations given in §1.01, namely,
(6) $\underline{\eta'_i = \sum \limits_{j=1}^n a_{ij}\eta_j}$
defines the vector y' = (η'₁, η'₂,.....,η'_n) as a linear homogeneous function of the coordinates of y = (η₁, η₂,..., η_n) and in accordance with the usual functional notation it is natural to write y' = A(y); it is usual to omit the brackets and we therefore set in place of (6)
y' = Ay.

The function or operator A when regarded as a single entity is called a matrix; it is completely determined, relatively to the fundamental basis, when the n² numbers a_ij are known, in much the same way as the vector y is determined by its coordinates. We call the a_ij the coordinates of A and write
(7) $A = \begin{Vmatrix}a_{11} & a_{12} & . . . & a_{1n} \\ a_{21} & a_{22} & . . . & a_{2n} \\ . . & . . & . . . & . \\ . . & . . & . . . & . \\ a_{n1} & a_{n2} & . . . & a_{nn} \end{Vmatrix}$
or, when convenient, A = ||a_ij||. It should be noted that in a_ij the first suffix denotes the row in which the coordinate occurs while the second gives the column.

If B = ||b_ij|| is a second matrix, y" = A(By) is a vector which is a linear homogeneous vector function of y, and from (6) we have
         $\eta''_i = \sum \limits_{p=1}^n a_{ip} \sum \limits_{j=1}^n b_{pj} \eta_j = \sum \limits_{j=1}^n d_{ij} \eta_j,$
where
(8)          $d_{ij} = \sum \limits_{p=1}^n a_{ip} b_{pj}$
The matrix D = ||d_ij|| is called the product of A into B and is written AB. The form of (8) should be carefully noted; in it each element of the i-th row of A is multiplied into the corresponding element of the j-th column of B and the terms so formed are added. Since the rows and columns are not interchangeable, AB is in general different from BA; for instance
         $\begin{Vmatrix}1 & 0 \\ 2 & 1\end{Vmatrix} \ \ \begin{Vmatrix} a & b \\ c & d\end{Vmatrix} = \begin{Vmatrix} a & b \\ 2a + c & 2b + d\end{Vmatrix} \\ \begin{Vmatrix} a & b \\ c & d\end{Vmatrix} \ \ \begin{Vmatrix}1 & 0 \\ 2 & 1\end{Vmatrix} = \begin{Vmatrix} a + 2b & b \\ c + 2d & d\end{Vmatrix}$
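The rule (8) and the failure of commutativity can be checked numerically. The following is a sketch only: `matmul` is our own helper, and the second matrix takes the illustrative values a = 3, b = 4, c = 5, d = 6 in the example above.

```python
def matmul(a, b):
    """Product by rule (8): d[i][j] = sum over p of a[i][p] * b[p][j]."""
    n = len(b)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(len(b[0]))]
            for i in range(len(a))]

L = [[1, 0],
     [2, 1]]
M = [[3, 4],   # a, b
     [5, 6]]   # c, d

LM = matmul(L, M)  # rows (a, b) and (2a + c, 2b + d)
ML = matmul(M, L)  # rows (a + 2b, b) and (c + 2d, d)
```

The two products differ in three of their four coordinates, in agreement with the symbolic forms displayed above.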

The product defined by (8) is associative; for if C = ||c_ij||, the element in the i-th row and j-th column of (AB)C is
$\sum \limits_{q=1}^n (\sum \limits_{p=1}^n a_{ip} b_{pq}) c_{qj} = \sum \limits_{p=1}^n a_{ip} (\sum \limits_{q=1}^n b_{pq} c_{qj})$
and the term on the right is the (i, j) coordinate of A(BC).

If we add the vectors Ay and By, we get a vector whose i-th coordinate is (cf. (6))
         $\eta'_i = \sum \limits_{j=1}^n a_{ij}\eta_j + \sum \limits_{j=1}^n b_{ij}\eta_j = \sum \limits_{j=1}^n c_{ij}\eta_j$
where c_ij = a_ij + b_ij. Hence Ay + By may be written Cy where C = ||c_ij||. We define C to be the sum of A and B and write C = A + B; two matrices are then added by adding corresponding coordinates just as in the case of vectors. It follows immediately from the definition of sum and product that
         A + B = B + A,

         (A + B) + C = A + (B + C),

         A(B + C) = AB + AC, (B + C)A = BA + CA,

         A(x + y) = Ax + Ay,
A, B, C being any matrices and x, y vectors. Also, if k is a scalar and we set y' = Ay, y" = ky', then
         y" = ky' = kA(y) = A(ky)
or in terms of the coordinates
         $\eta''_i = \sum \limits_{j} ka_{ij}\eta_j$
Hence kA may be interpreted as the matrix derived from A by multiplying each coordinate of A by k.

On the analogy of the unit vectors e_i we now define the fundamental unit matrices e_ij (i, j = 1, 2,..., n). Here e_ij is the matrix whose coordinates are all 0 except the one in the i-th row and j-th column whose value is 1. Corresponding to the form ∑ξ_ie_i for a vector we then have
(9)              $A = \sum \limits_{i,j = 1}^n a_{ij} e_{ij}$
Also from the definition of multiplication in (8)
(10)          e_ije_jk = e_ik, e_ije_pq = 0, (j ≠ p)
a set of relations which might have been made the basis of the definition of the product of two matrices. It should be noted that it follows from the definition of e_ij that
(11)          e_ije_j = e_i, e_ije_k = 0 (j ≠ k),
(12)          $A e_k = \sum \limits_{i,j} a_{ij} e_{ij} e_k = \sum \limits_{i} a_{ik} e_i$
Hence the coordinates of Ae_k are the coordinates of A that lie in the k-th column.
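Relations (10) and (12) are easy to verify on a small case. In this sketch the helper names are ours, indices are 0-based (so `unit_matrix(0, 1, n)` stands for e₁₂), and the matrix A is an arbitrary illustration.

```python
def matmul(a, b):
    """Product of two square matrices by rule (8)."""
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def apply(a, y):
    """The vector Ay."""
    return [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in a]

def unit_matrix(i, j, n):
    """Fundamental unit matrix with a 1 in row i, column j (0-based) and 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def unit_vector(k, n):
    """Fundamental unit vector with a 1 in position k (0-based)."""
    return [1 if r == k else 0 for r in range(n)]

n = 3
zero = [[0] * n for _ in range(n)]
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

same_inner = matmul(unit_matrix(0, 1, n), unit_matrix(1, 2, n))  # inner suffixes agree
diff_inner = matmul(unit_matrix(0, 1, n), unit_matrix(2, 0, n))  # inner suffixes differ
column_1 = apply(A, unit_vector(1, n))                           # A applied to a unit vector
```

The first product is again a unit matrix, the second vanishes, and A applied to a unit vector picks out the corresponding column of A.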

1.04 Scalar matrices.

If k is a scalar, the matrix K defined by Ky = ky is called a scalar matrix; from (1) it follows that, if K = ||k_ij||, then k_ii = k (i = 1, 2,...., n), k_ij = 0 (i ≠ j). The scalar matrix for which k = 1 is called the identity matrix of order n; it is commonly denoted by I but, for reasons explained below, we shall here usually denote it by 1, or by 1_n if it is desired to indicate the order. When written at length we have
$1_n = \begin{Vmatrix} 1 & & & & & \\ & 1 & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & 1 \end{Vmatrix}, \ \ K = \begin{Vmatrix} k & & & & & \\ & k & & & & \\ & & . & & & \\ & & & . & & \\ & & & & . & \\ & & & & & k \end{Vmatrix}$

A convenient notation for the coordinates of the identity matrix was introduced by Kronecker: if δ_ij is the numerical function of the integers i, j defined by
(13) δ_ii = 1, δ_ij = 0 (i ≠ j)
then 1_n = ||δ_ij||. We shall use this Kronecker delta function in future without further comment.

Theorem 1. Every matrix is commutative with a scalar matrix.

Let k be the scalar and K = ||k_ij|| = ||kδ_ij|| the corresponding scalar matrix. If A = ||a_ij|| is any matrix, then from the definition of multiplication
$KA = || \sum \limits_p k_{ip} a_{pj}|| = || \sum \limits_p k \delta_{ip} a_{pj}|| = ||k a_{ij}|| \\ AK = || \sum \limits_p a_{ip} k_{pj}|| = || \sum \limits_p k a_{ip} \delta_{pj}|| = ||k a_{ij}||$
so that AK = KA.

If k and h are two scalars and K, H the corresponding scalar matrices, then K + H and KH are the scalar matrices corresponding to k + h and kh. Hence the one-to-one correspondence between scalars and scalar matrices is maintained under the operations of addition and multiplication, that is, the two sets are simply isomorphic with respect to these operations. So long therefore as we are concerned only with matrices of given order, there is no confusion introduced if we replace each scalar by its corresponding scalar matrix, just as in the theory of ordinary complex numbers, (a, b) = a + bi, the set of numbers of the form (a, 0) is identified with the real continuum. We shall therefore as a rule denote ||δ_ij|| by 1 and ||kδ_ij|| by k.

1.05 Powers of a matrix; adjoint matrices.

Positive integral powers of A = ||a_ij|| are readily defined by induction; thus
A² = A • A, A³ = A • A², ...., A^m = A • A^(m-1).
With this definition it is clear that A^r A^s = A^(r+s) for any positive integers r, s. Negative powers, however, require more careful consideration.

Let the determinant formed from the array of coefficients of a matrix be denoted by
|A| = det.A
and let α_qp be the cofactor of a_pq in A, so that from the properties of determinants
(14) $\sum \limits_p a_{ip} \alpha_{pj} = |A| \delta_{ij} = \sum \limits_p \alpha_{ip} a_{pj} \ \ \ \ (i,j = 1, 2, ..., n)$
The matrix ||α_ij|| is called the adjoint of A and is denoted by adj A. In this notation (14) may be written
(15) A (adj A) = |A| = (adj A)A,
so that a matrix and its adjoint are commutative.

If |A| ≠ 0, we define A^-1 by
(16) A^-1 = |A|^-1 adj A.
Negative integral powers are then defined by A^-r = (A^-1)^r; evidently A^-r = (A^r)^-1. We also set A⁰ = 1, but it will appear later that a different interpretation must be given when |A| = 0. Since AB • B^-1A^-1 = A • BB^-1 • A^-1 = AA^-1 = 1, the reciprocal of the product AB is
(AB)^-1 = B^-1A^-1
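Definitions (14) to (16) translate directly into a short computation. The following is a sketch under our own naming (`det`, `adjoint`, `inverse`), using cofactor expansion, which is adequate for small orders only; the matrix A is an arbitrary illustration with determinant 25.

```python
from fractions import Fraction

def minor(a, i, j):
    """The array left after deleting row i and column j of a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by expansion along the first row; the empty array has determinant 1."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def adjoint(a):
    """adj A: the coordinate in row i, column j is the cofactor of a[j][i]."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)] for i in range(n)]

def inverse(a):
    """A^-1 = |A|^-1 adj A, defined only when |A| is not 0."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(x) / d for x in row] for row in adjoint(a)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]
scalar_25 = [[25, 0, 0], [0, 25, 0], [0, 0, 25]]  # the scalar matrix |A|
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Multiplying A by its adjoint on either side gives the scalar matrix |A|, as (15) asserts, and dividing by |A| gives the reciprocal.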

If A and B are matrices, the rule for multiplying determinants, when stated in our notation, becomes
|AB| = |A||B|.
In particular, if AB = 1, then |A||B| = 1; hence, if |A| = 0, there is no matrix B such that AB = 1 or BA = 1. The reader should notice that, if k is a scalar matrix of order n, then |k| = kⁿ.

If |A| = 0, A is said to be singular; if |A| ≠ 0, A is regular or non-singular. When A is regular, A^-1 is the only solution of AX = 1 or of XA = 1. For, if AX = 1, then
A^-1 = A^-1 • 1 = A^-1AX = X.
If AX = 0, then either X = 0 or A is singular; for, if A^-1 exists,
0 = A^-1AX = X.

If A² = A ≠ 0, then A is said to be idempotent, for example e₁₁ and $\begin{Vmatrix} 4 & -2 \\ 6 & -3 \end{Vmatrix}$ are idempotent. A matrix a power of which is 0 is called nilpotent. If the lowest power of A which is 0 is A^r, r is called the index of A; for example, if A = e₁₂ + e₂₃ + e₃₄, then
A² = e₁₃ + e₂₄, A³ = e₁₄, A⁴ = 0,
so that the index of A in this case is 4.
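Both examples can be confirmed by direct multiplication. In this sketch `matmul` and `nilpotency_index` are our own helpers; the search stops at the n-th power because a nilpotent matrix of order n always satisfies Aⁿ = 0.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotency_index(a):
    """Lowest r with a^r = 0, or None if no power up to the n-th vanishes."""
    n = len(a)
    zero = [[0] * n for _ in range(n)]
    power = a
    for r in range(1, n + 1):
        if power == zero:
            return r
        power = matmul(power, a)
    return None

E = [[4, -2],
     [6, -3]]          # the idempotent example from the text

N = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]     # e12 + e23 + e34
```

The idempotent matrix E reproduces itself on squaring, so no power of it vanishes, while N has index 4.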

1.06 The transverse of a matrix.

If A = ||a_ij|| the matrix ||a'_ij|| in which a'_ij = a_ji is called the transverse of A and is denoted by A'. For instance the transverse of
$\begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{Vmatrix} \ is \ \begin{Vmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{Vmatrix}$
The transverse, then, is obtained by the interchange of corresponding rows and columns. It must be carefully noted that this definition is relative to a particular set of fundamental units and, if these are altered, the transverse must also be changed.

Theorem 2. The transverse of a sum is the sum of the transverses of the separate terms, and the transverse of a product is the product of the transverses of the separate factors in the reverse order.

The proof of the first part of the theorem is immediate and is left to the reader. To prove the second it is sufficient to consider two factors. Let A = ||a_ij||, B = ||b_ij||, C = AB = ||c_ij|| and, as above, set a'_ij = a_ji, b'_ij = b_ji, c'_ij = c_ji; then
$c'_{ij} = c_{ji} = \sum \limits_p a_{jp} b_{pi} = \sum \limits_p b'_{ip} a'_{pj}$
whence
(AB)' = C' = B'A'.
The proof for any number of factors follows by induction.

If A = A', A is said to be symmetric and, if A = -A', it is called skew-symmetric or skew. A scalar matrix k is symmetric and the transverse of kA is kA'.

Theorem 3. Every matrix can be expressed uniquely as the sum of a symmetric and a skew matrix.

For if A = B + C, B' = B, C' = -C, then A' = B' + C' = B - C and therefore
B = (A + A')/2, C = (A - A')/2.
Conversely 2A = (A + A') + (A - A') and A + A' is symmetric, A - A' skew.
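The decomposition of Theorem 3 may be sketched as follows. The helper names are ours, exact rational arithmetic is used so that the halves are represented exactly, and the matrix A is an arbitrary illustration.

```python
from fractions import Fraction

def transverse(a):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*a)]

def symmetric_skew_parts(a):
    """Return (B, C) with B = (A + A')/2 symmetric and C = (A - A')/2 skew."""
    t = transverse(a)
    n = len(a)
    b = [[Fraction(a[i][j] + t[i][j]) / 2 for j in range(n)] for i in range(n)]
    c = [[Fraction(a[i][j] - t[i][j]) / 2 for j in range(n)] for i in range(n)]
    return b, c

A = [[1, 2],
     [4, 3]]
B, C = symmetric_skew_parts(A)
negated_C = [[-x for x in row] for row in C]
total = [[B[i][j] + C[i][j] for j in range(2)] for i in range(2)]
```

For this A the symmetric part has both off-diagonal coordinates equal to 3 and the skew part has zeros on the diagonal.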

1.07 Bilinear forms.

A scalar bilinear form in two variable vectors, x = ∑ξ_ie_i, y = ∑η_ie_i, is a function of the form
(17) $A(x,y) = \sum \limits_{i,j = 1}^n a_{ij} \xi_i \eta_j$
There is therefore a one-to-one correspondence between such forms and matrices, A = ||a_ij|| corresponding to A(x, y). The special form for which A = ||δ_ij|| = 1 is of very frequent occurrence and we shall denote it by S; it is convenient to omit the brackets and write simply
(18) Sxy = ξ₁η₁ + ξ₂η₂ + ..... + ξ_nη_n
and, because of the manner in which it appears in vector analysis, we shall call it the scalar of xy. Since S is symmetric, Sxy = Syx.

The function (17) can be conveniently expressed in terms of A and S; for we may write A(x, y) in the form
$A(x,y) = \sum \limits_{i = 1}^n \xi_i (\sum \limits_{j = 1}^n a_{ij} \eta_j) = Sx Ay$
It may also be written
$\sum \limits_{j = 1}^n (\sum \limits_{i = 1}^n a_{ij} \xi_i) \eta_j = S A'xy = Sy A'x;$
hence
(19) SxAy = SyA'x,
so that the form (17) is unaltered when x and y are interchanged if at the same time A is changed into A'. This gives another proof of Theorem 2. For
Sx(AB)'y = SyABx = SBxA'y = SxB'A'y,
which gives (AB)' = B'A' since x and y are independent variables.
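Relation (19) is readily checked on numbers. In the sketch below `scalar`, `apply` and `transverse` are our own helper names and the values of A, x, y are arbitrary illustrations.

```python
def scalar(x, y):
    """Sxy: the sum of products of corresponding coordinates, as in (18)."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

def apply(a, y):
    """The vector Ay."""
    return [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in a]

def transverse(a):
    return [list(col) for col in zip(*a)]

A = [[1, 2],
     [3, 4]]
x = [1, -1]
y = [2, 5]

left = scalar(x, apply(A, y))               # SxAy
right = scalar(y, apply(transverse(A), x))  # SyA'x
```

Both sides reduce to the same double sum (17), which is why interchanging x and y must be accompanied by passing from A to A'.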

1.08 Change of basis.

We shall now investigate more closely the effect of a change in the fundamental basis on the coordinates of a vector or matrix. If f₁, f₂,...,f_n is a basis of our n-space, we have seen (§1.02) that the f's are linearly independent. Let
$(20) \ \ \ \ f_i = \sum \limits_{j = 1}^n p_{ji} e_j = P e_i, \ \ \ (i = 1,2,...,n)$
P = ||p_ij||.
Since the f's form a basis, the e's are linearly expressible in terms of them, say
$(21) \ \ \ \ e_i = \sum \limits_{j = 1}^n q_{ji} f_j,$
and, if Q = ||q_ij||, this may be written
$(22) \ \ \ \ e_i = \sum \limits_j q_{ji} \sum \limits_k p_{kj} e_k = PQ e_i \ \ \ \ (i = 1, 2, 3,..., n)$
Hence PQ = 1, which is only possible if |P| ≠ 0, Q = P^-1.

Conversely, if |P| ≠ 0, Q = P^-1, and f_i = Pe_i as in (20), then (22) holds and therefore also (21), that is, the e's, and therefore also any vector x, are linearly expressible in terms of the f's. We have therefore the following theorem.

Theorem 4. If f_i = Pe_i (i = 1, 2,...., n), the vectors f_i form a basis if, and only if |P| ≠ 0.

If we have fewer than n vectors, say f₁, f₂, ....,f_r, we have seen in §1.02 that we can choose f_r+1,...., f_n so that f₁, f₂,...., f_n form a basis. Hence

Theorem 5. If f₁,f₂,....,f_r are linearly independent, there exists at least one non-singular matrix P such that Pe_i = f_i (i = 1, 2,...., r).

We shall now determine how the form Sxy which was defined relatively to the fundamental basis, is altered by a change of basis. As above let
(23)          f_i = Pe_i,      e_i = P^-1f_i = Qf_i,      |P| ≠ 0,          (i = 1, 2,...., n)
be a basis and
         x = ∑ξ_ie_i = ∑ξ'_if_i,      y = ∑η_ie_i = ∑η'_if_i
variable vectors; then from (23)
         x = Q∑ξ_if_i = P∑ξ'_ie_i,      y = Q∑η_if_i = P∑η'_ie_i
and
         ∑ξ'_ie_i = P^-1x = Qx,      ∑η'_ie_i = Qy.
Let us set temporarily S_exy for Sxy and also put S_fxy = ∑ξ'_iη'_i, the corresponding form with reference to the new basis; then
(24)          S_fxy = S_eQxQy = S_exQ'Qy,          S_exy = S_fPxPy.

Consider now a matrix A = ||a_ij|| defined relatively to the fundamental basis and let A₁ be the matrix which has the same coordinates when expressed in terms of the new basis as A has in the old. From the definition of A and from ξ_j = S_ee_jx we have
$Ax = \sum \limits_{i,j} a_{ij} \xi_j e_i = \sum \limits_{i,j} a_{ij} e_i S_e e_j x$
and hence
(25) A₁x = ∑a_ijξ'_jf_i = ∑a_ijf_iS_ff_jx = ∑a_ijQ^-1e_iS_eQf_jQx = Q^-1∑a_ije_iS_ee_jQx = Q^-1AQx.
We have therefore, remembering that Q = P^-1,

Theorem 6. If f_i = Pe_i (i = 1, 2,...., n) is a basis and A any matrix, the matrix PAP^-1 has the same coordinates when expressed in terms of this basis as A has in terms of the fundamental basis.

The matrix Q^-1AQ is said to be similar to A and to be the transform of A by Q. Obviously the transform of a product (sum) is the product (sum) of the transforms of the individual factors (terms) with the order unaltered. For instance Q^-1ABQ = Q^-1AQ • Q^-1BQ.
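Theorem 6 can be illustrated on a case of order 2. The sketch below assumes integer coordinates and uses our own helpers, including a closed-form inverse valid only for order 2; P and A are arbitrary illustrations. It checks that A₁ = PAP^-1 sends each f_j into ∑ a_ij f_i, which is what having the coordinates a_ij relative to the new basis means.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def apply(a, y):
    return [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in a]

def inverse_2x2(p):
    """Reciprocal of a matrix of order 2 with integer coordinates."""
    (a, b), (c, d) = p
    determinant = a * d - b * c
    if determinant == 0:
        raise ValueError("P must be non-singular")
    return [[Fraction(d, determinant), Fraction(-b, determinant)],
            [Fraction(-c, determinant), Fraction(a, determinant)]]

P = [[1, 1],
     [0, 1]]
A = [[1, 2],
     [3, 4]]
n = 2

f = [[P[i][j] for i in range(n)] for j in range(n)]   # f_j = P e_j, the j-th column of P
A1 = matmul(matmul(P, A), inverse_2x2(P))             # P A P^-1

# The vector sum over i of a_ij f_i, for each j.
expected = [[sum(A[i][j] * f[i][k] for i in range(n)) for k in range(n)]
            for j in range(n)]
actual = [apply(A1, f[j]) for j in range(n)]
```

The agreement of `actual` and `expected` is the statement of the theorem for this particular P and A.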

Theorem 6 gives the transformation of the matric units e_ij defined in §1.03 which corresponds to the vector transformation (23); the result is that, if f_ij is the unit in the new system corresponding to e_ij, then
         f_ij = Pe_ijP^-1
which is readily verified by setting
         A = e_ij = e_iS_ee_j( ),      A₁ = f_ij = f_iS_ff_j( )
in (25). The effect of the change of basis on the form of the transverse is found as follows. Let A* be defined by
         S_fxAy = S_fyA*x;
then
         S_fyA*x = S_fxAy = S_eQxQAy = S_exQ'QAy = S_eQy(Q')^-1A'Q'Qx
             = S_fy(Q'Q)^-1A'Q'Qx.
Hence
(26)          A* = (Q'Q)^-1A'Q'Q.

1.09 Reciprocal and orthogonal bases.

With the same notation as in the previous section we have S_ff_if_j = 0 (i ≠ j), S_ff_if_i = 1. Hence
         δ_ij = S_ff_if_j = S_eQf_iQf_j = S_ef_iQ'Qf_j.
If, therefore, we set
(27)          f'_j = Q'Qf_j      (j = 1, 2,...., n),
we have, on omitting the subscript e in S_e,
(28)          Sf_if'_j = δ_ij      (i,j = 1,2,...., n).
Since |Q'Q| ≠ 0, the vectors f'₁, f'₂,..., f'_n form a basis which we say is reciprocal to f₁, f₂,.....,f_n. This definition is of course relative to the fundamental basis since it depends on the function S but, apart from this the basis (f'_i) is uniquely defined when the basis (f_i) is given since the vectors f_i determine P and Q = P^-1.

The relation between (f'_i) and (f_i) is a reciprocal one; for
f'_j = Q'Qf_j = Q'QPe_j = Q'e_j,
and, if R = (Q')^-1 we have f_j = R'Rf'_j.
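A reciprocal basis is easily computed in a small case. In this sketch the helper names are ours, P is an arbitrary illustration, and its reciprocal Q is entered by hand rather than computed. Since f'_j = Q'e_j, the reciprocal vectors are simply the rows of Q; the sketch also forms them from (27) for comparison.

```python
def scalar(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

def apply(a, y):
    return [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in a]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def transverse(a):
    return [list(col) for col in zip(*a)]

P = [[1, 1],
     [0, 1]]
Q = [[1, -1],
     [0, 1]]    # reciprocal of P, entered by hand

f = transverse(P)                  # f_j = P e_j: the columns of P
f_recip = [list(row) for row in Q] # f'_j = Q' e_j: the rows of Q

# The same vectors obtained from (27), f'_j = Q'Q f_j.
QtQ = matmul(transverse(Q), Q)
via_27 = [apply(QtQ, f_j) for f_j in f]

# Table of the scalars S f_i f'_j, which should be the Kronecker delta by (28).
gram = [[scalar(f[i], f_recip[j]) for j in range(2)] for i in range(2)]
```

The table `gram` is the identity, and the two ways of forming the reciprocal vectors agree.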

If only the set (f₁, f₂,...., f_r) is supposed given originally, and this set of linearly independent vectors is extended by f_r+1,...., f_n to form a basis of the n-space, then f'_r+1,...., f'_n individually depend on the choice of f_r+1,...., f_n. But (28) shows that, if Sf_ix = 0 (i = 1, 2,...., r), then x belongs to the linear set (f'_r+1,....,f'_n); hence this linear set is uniquely determined although the individual members of its basis are not. We may therefore without ambiguity call ℑ' = (f'_r+1,...., f'_n) reciprocal to ℑ = (f₁,f₂,...., f_r); ℑ' is then the set of all vectors x for which Sxy = 0 whenever y belongs to ℑ.

In a later chapter we shall require the following lemma.

Lemma 1. If (f₁, f₂,...., f_r) and (f'_r+1,.....,f'_n) are reciprocal, so also are (B^-1f₁, B^-1f₂,...., B^-1f_r) and (B'f'_r+1, B'f'_r+2,....., B'f'_n) where B is any non-singular matrix.

For SB'f'_iB^-1f_j = Sf'_iBB^-1f_j = Sf'_if_j = δ_ij.

Reciprocal bases have a close connection with reciprocal or inverse matrices in terms of which they might have been defined. If P is non-singular and Pe_i = f_i as above, then P = ∑f_iSe_i( ) and, if Q = ∑e_iSf'_i( ), then
QP = ∑e_iSf'_if_jSe_j( ) = ∑δ_ije_iSe_j( ) = 1
so that Q = P^-1.

If QQ' = 1, the bases (f_i) and (f'_i) are identical and Sf_if_j= δ_ij for all i and j; the basis is then said to be orthogonal as is also the matrix Q. The inverse of an orthogonal matrix and the product of two or more orthogonal matrices are orthogonal; for, if RR' = 1,
(RQ)(RQ)' = RQQ'R' = RR' = 1.

Suppose that h₁, h₂,....., h_r are real vectors which are linearly independent and for which Sh_ih_j = 0 (i ≠ j); since h_i is real, we have Sh_ih_i ≠ 0. If r < n, we can always find a real vector x which is not in the linear set (h₁,....., h_r) and, if we put
$h_{r + 1} = x - \sum \limits_1^r \frac{h_i S h_i x}{S h_i h_i}$
then h_r+1 ≠ 0 and Sh_ih_r+1 = 0 (i = 1, 2,......, r). Hence we can extend the original set to form a basis of the fundamental n-space. If we set f_i = h_i/(Sh_ih_i)^1/2, then Sf_if_j = δ_ij even when i = j; this modified basis is called an orthogonal basis of the set.
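The extension step for real vectors may be sketched as follows, assuming rational coordinates so that the arithmetic is exact. The helper names and the sample vectors are ours; the final normalization by (Sh_ih_i)^1/2 is omitted because it generally leaves the rational field.

```python
from fractions import Fraction

def scalar(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

def extend_orthogonal(hs, x):
    """Return x minus, for each h in hs, the vector h * (S h x) / (S h h)."""
    new = [Fraction(c) for c in x]
    for h in hs:
        coeff = Fraction(scalar(h, x)) / scalar(h, h)
        new = [c - coeff * h_c for c, h_c in zip(new, h)]
    return new

h1 = [1, 1, 0]
h2 = extend_orthogonal([h1], [1, 0, 0])      # x chosen outside the set (h1)
h3 = extend_orthogonal([h1, h2], [0, 0, 1])  # x chosen outside the set (h1, h2)
```

Each new vector has zero scalar with all the earlier ones, so after n steps the set is a basis of mutually orthogonal vectors.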

If the vectors h_i are not necessarily real, it is not evident that x can be chosen so that Sh_r+1h_r+1 ≠ 0 when Sh_ih_i ≠ 0 (i = 1, 2,...., r). This may be shown as follows. In the first place we cannot have Syh_r+1 = 0 for every y, and hence Sh_r+1h_r+1 ≠ 0 when r = n - 1. Suppose now that for every choice of x we have Sh_r+1h_r+1 = 0; we can then choose a basis h_r+1,...., h_n supplementary to h₁,...., h_r such that Sh_ih_i = 0 (i = r + 1,...., n) and Sh_ih_j = 0 (i = r + 1, ...., n; j = 1, 2,....., r). Since we cannot have Sh_r+1h_i = 0 for every h_i of the basis of the n-space, this scalar must be different from 0 for some value of i > r, say r + k. If we then put h'_r+1 = h_r+1 + h_r+k in place of h_r+1, we have Sh_ih'_r+1 = 0 (i = 1, 2,...., r) as before and also
Sh'_r+1h'_r+1 = Sh_r+1h_r+1 + Sh_r+kh_r+k + 2Sh_r+1h_r+k
= 2Sh_r+1h_r+k ≠ 0.
We can therefore extend the basis in the manner indicated for real vectors even when the vectors are complex.

When complex coordinates are in question the following lemma is useful; it contains the case discussed above when the vectors used are real.

Lemma 2. When a linear set of order r is given, it is always possible to choose a basis g₁, g₂,...., g_n of the fundamental space such that g₁,...., g_r is a basis of the given set and such that Sg_iḡ_j = δ_ij where ḡ_j is the vector whose coordinates are the conjugates of the coordinates of g_j when expressed in terms of the fundamental basis.

The proof is a slight modification of the one already given for the real case. Suppose that g₁,....., g_s are chosen so that Sg_iḡ_j = δ_ij (i, j = 1, 2,...., s) and such that (g₁,...., g_s) lies in the given set when s ≤ r and, when s > r, g₁,...., g_r is a basis of this set. We now put
$g'_{s + 1} = x - \sum \limits_1^s \frac{g_i S \bar{g}_i x}{S \bar{g}_i g_i}$
which is not 0 provided x is not in (g₁,...., g_s) and, if s < r, will lie in the given set provided x does. We may then put
g_s+1 = g'_s+1/(Sg'_s+1ḡ'_s+1)^1/2
and the lemma follows readily by induction.

If U is the matrix ∑e_iSg_i, then Ū = ∑e_iSḡ_i and
(29) UŪ' = 1.
Such a matrix is called a unitary matrix and the basis g₁, g₂,....., g_n is called a unitary basis. A real unitary matrix is of course orthogonal.

1.10 The rank of a matrix.

Let A = ||a_ij|| be a matrix and set (cf. (12) §1.03)
         h_i = Ae_i = ∑_j a_jie_j;
then, if
         x = ∑ξ_ie_i = ∑e_iSe_ix
is any vector, we have
         Ax = A∑e_iSe_ix = ∑Ae_iSe_ix
or
$(30) \ \ \ \ \ Ax = \sum \limits_1^n h_i S e_i x$

Any expression of the form $Ax = \sum \limits_1^m a_i S b_i x$, where a_i, b_i are constant vectors, is a linear homogeneous vector function of x. Here (30) shows that it is never necessary to take m > n, but it is sometimes convenient to do so. When we are interested mainly in the matrix and not in x, we may write A = ∑a_iSb_i( ) or, omitting the brackets, merely
(31) A = ∑a_iSb_i.
It follows readily from the definition of the transverse that
(32) A' = ∑b_iSa_i.

No matter what vector x is, Ax, being equal to ∑a_iSb_ix is linearly dependent on a₁, a₂,..., a_m or, if the form (30) is used, on h₁, h₂,...., h_n. When |A| ≠ 0, we have seen in Theorem 4 that the h's are linearly independent but, if A is singular, there are linear relations connecting them, and the order of the linear set (a₁, a₂,...., a_m) is less than n.

Suppose in (31) that the a's are not linearly independent, say
a_s = α₁a₁ + α₂a₂ + ..... + α_s-1a_s-1,
then on substituting this value of a_s in (31) we have
$A = a_1 S (b_1 + \alpha_1 b_s) + ... + a_{s-1} S (b_{s-1} + \alpha_{s-1} b_s) + \sum \limits_{s+1}^m a_i S b_i,$
an expression similar to (31) but having at least one term less. A similar reduction can be carried out if the b's are not linearly independent. After a finite number of repetitions of this process we shall finally reach a form
$(33) \ \ \ \ \ \ A =\sum \limits_1^r c_i S d_i$
in which c₁, c₂,..., c_r are linearly independent and also d₁, d₂,..., d_r. The integer r is called the rank of A.

It is clear that the value of r is independent of the manner in which the reduction to the form (33) is carried out since it is the order of the linear set (Ae₁, Ae₂,...., Ae_n). We shall, however, give a proof of this which incidentally yields some important information regarding the nature of A.

Suppose that by any method we have arrived at two forms of A
$A =\sum \limits_1^r c_i S d_i = \sum \limits_1^s p_i S q_i,$
where (c₁, c₂,...., c_r) and (d₁, d₂,...., d_r) are spaces of order r and (p₁, p₂,....,p_s), (q₁, q₂,...., q_s) spaces of order s, and let (c'_r+1, c'_r+2,..., c'_n),...., (q'_s+1, q'_s+2,...., q'_n) be the corresponding reciprocal spaces. Then
$A q'_j = \sum \limits_1^s p_i S q_i q'_j = p_j \ \ \ \ \ \ \ (j = 1, 2, ..., s)$
and also Aq'_j = ∑ c_iSd_iq'_j. Hence each p_j lies in (c₁, c₂,...., c_r). Similarly each c_i lies in (p₁, p₂,..., p_s) so that these two subspaces are the same and, in particular, their orders are equal, that is, r = s. The same discussion with A' in place of A shows that (d₁, d₂,...., d_r) and (q₁, q₂,...., q_s) are the same. We shall call the spaces ℘_l = (c₁, c₂,...., c_r), ℘_r = (d₁, d₂,...., d_r) the left and right grounds of A, and the total space ℘ = (c₁,...., c_r, d₁,...., d_r) will be called the (total) ground of A.

If x is any vector in the subspace R_r = (d'_r+1, d'_r+2,..., d'_n) reciprocal to ℘_r, then Ax = 0 since Sd_id'_j = 0 (i ≠ j). Conversely, if
0 = Ax = ∑ c_iSd_ix,
each multiplier Sd_ix must be 0 since the c's are linearly independent; hence every solution of Ax = 0 lies in R_r. Similarly every solution of A'x = 0 lies in R_l = (c'_r+1, c'_r+2,...., c'_n). We call R_r and R_l the right and left nullspaces of A; their order, n - r, is called the nullity of A.

We may summarize these results as follows.

Theorem 7. If a matrix A is expressed in the form $\sum \limits_1^r a_i S b_i$, where ℘_l = (a₁, a₂,...., a_r) and ℘_r = (b₁, b₂,...., b_r) define spaces of order r, then, no matter how the reduction to this form is carried out, the spaces ℘_r and ℘_l are always the same. Further, if R_l and R_r are the spaces of order n - r reciprocal to ℘_l and ℘_r, respectively, every solution of Ax = 0 lies in R_r and every solution of A'x = 0 in R_l.

The following theorem is readily deduced from Theorem 7 and its proof is left to the reader.

Theorem 8. If A, B are matrices of rank r, s, the rank of A + B is not greater than r + s and the rank of AB is not greater than the smaller of r and s.
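The inequalities of Theorem 8 can be observed on a small example. In this sketch the rank is computed as the order of the linear set (Ae₁,...., Ae_n), that is, of the columns of A, by elimination over the rationals; the helper names and the matrices are our own illustrations and the check is of course no substitute for the proof.

```python
from fractions import Fraction

def rank(a):
    """Order of the linear set spanned by the columns of a."""
    m = [[Fraction(v) for v in col] for col in zip(*a)]  # rows of m are columns of a
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

A = [[1, 2],
     [2, 4]]   # second row is twice the first
B = [[0, 1],
     [0, 0]]

r, s = rank(A), rank(B)
rank_sum = rank(add(A, B))
rank_product = rank(matmul(A, B))
```

Here the sum attains the bound r + s while the product attains the bound given by the smaller of r and s.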

1.11 Linear dependence.

The definition of the rank of a matrix in the preceding section was made in terms of the linear dependence of vectors associated with the matrix. In this section we consider briefly the theory of linear dependence introducing incidentally a notation which we shall require later.

Let $x_i = \sum \limits_{j=1}^n \xi_{ij} e_j$ (i = 1, 2,...., r; r ≤ n) be a set of r vectors. From the rectangular array of their coordinates
         ξ₁₁ ξ₁₂ ...... ξ_1n
         ξ₂₁ ξ₂₂ ...... ξ_2n
(34)         .......................................................
         ξ_r1 ξ_r2 ...... ξ_rn
there can be formed n!/r!(n - r)! different determinants of order r by choosing r columns out of (34), these columns being taken in their natural order. If these determinants are arranged in some definite order, we may regard them as the coordinates of a vector in space of order n!/r!(n - r)! and, when this is done, we shall denote this vector by
(35)          |x₁x₂.....x_r|
and call it a pure vector of grade r. It follows from this definition that |x₁x₂....x_r| has many of the properties of a determinant; its sign is changed if two x's are interchanged, it vanishes when two x's are equal and, if λ and μ are scalars,
(36)      |(λx₁ + μx'₁)x₂....x_r| = λ|x₁x₂....x_r| + μ|x'₁x₂.....x_r|.

If we replace the x's in (35) by r different units e_i₁, e_i₂,...., e_{i_r}, the result is clearly not 0: we thus obtain ${n \choose r}$ vectors which we shall call the fundamental unit vectors of grade r; and any linear combination of these units, say
∑ ξ_{i₁i₂....i_r}|e_i₁e_i₂....e_{i_r}|,
is called a vector of grade r. It should be noticed that not every vector is a pure vector except when r equals 1 or n.

If we replace x_i by ∑ ξ_ije_j in (35), we get
|x₁x₂....x_r| = ∑ ξ_1j₁ξ_2j₂.....ξ_{rj_r}|e_j₁e_j₂....e_{j_r}|
where the summation extends over all permutations j₁, j₂,...., j_r of 1, 2,...., n taken r at a time. This summation may be effected by grouping together the sets j₁, j₂,...., j_r which are permutations of the same combination i₁, i₂,...., i_r, whose members may be taken to be arranged in natural order, and then summing these partial sums over all possible combinations i₁, i₂,...., i_r. Taking the first step only we have
$\sum \xi_{1j_1} \xi_{2j_2} ... \xi_{rj_r} | e_{j_1} e_{j_2} ... e_{j_r} | = \sum \delta^{i_1 ... i_r}_{j_1 ... j_r} \xi_{1j_1} ... \xi_{rj_r} | e_{i_1} e_{i_2} ... e_{i_r}|$
where $\delta^{i_1 ... i_r}_{j_1 ... j_r}$ is the sign corresponding to the permutation ${i_1 i_2 ... i_r \choose j_1 j_2 ... j_r}$, and the sum on the right equals |ξ_1i₁.....ξ_{ri_r}||e_i₁....e_{i_r}|. We have therefore
$(37) \ \ \ \ \ \ | x_1 x_2 ... x_r | = \sum \limits_{(i)}^* |\xi_{1i_1} \xi_{2i_2} . . . \xi_{ri_r}| \ |e_{i_1} e_{i_2} . . . e_{i_r}|$
where the asterisk on ∑ indicates that the sum is taken over all r-combinations of 1, 2, ...., n each combination being arranged in natural order.
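The grouping step that leads to (37) can be checked directly: for a fixed combination, the signed sum of the products ξ_1j₁ξ_2j₂.....ξ_{rj_r} over all its permutations should agree with the determinant formed from the corresponding columns. The following Python sketch does this for one sample array; the function names and the data are our own, and columns are numbered from 0.

```python
from itertools import combinations, permutations


def det(m):
    """Determinant by expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def sign(perm):
    """Sign of an arrangement relative to the natural order of its members."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def coordinate_by_permutations(xs, combo):
    """Signed sum of products over all permutations of one combination of columns."""
    total = 0
    for perm in permutations(combo):
        term = sign(perm)
        for row, col in zip(xs, perm):
            term *= row[col]
        total += term
    return total


def coordinate_by_determinant(xs, combo):
    """The coefficient of the unit |e_i1 ... e_ir| as it appears in (37)."""
    return det([[x[c] for c in combo] for x in xs])


# Sample array with r = 3, n = 4 (assumed for illustration).
xs = [[1, 2, 3, 4], [0, 1, 5, 2], [2, 0, 1, 3]]
agree = all(coordinate_by_permutations(xs, combo) == coordinate_by_determinant(xs, combo)
            for combo in combinations(range(4), 3))
print(agree)
```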

Theorem 9. |x₁x₂....x_r| = 0 if, and only if, x₁, x₂,..., x_r are linearly dependent.

The first part of this theorem is an immediate consequence of (36) together with the vanishing of |x₁x₂....x_r| when two x's are equal. To prove the converse it is sufficient to show that, if |x₁x₂....x_r| = 0 while |x₁x₂....x_r-1| ≠ 0, then there exist scalars α₁, α₂,...., α_r-1 such that
x_r = α₁x₁ + α₂x₂ + ..... + α_r-1x_r-1.

Let $x_i = \sum \limits_j \xi_{ij} e_j$. Since |x₁x₂....x_r-1| ≠ 0, at least one of its coordinates is not 0, and for convenience we may suppose without loss of generality that
(38)          |ξ₁₁ξ₂₂.....ξ_r-1,r-1| ≠ 0.
Since |x₁x₂....x_r| = 0, all its coordinates equal 0 and in particular
         |ξ₁₁ξ₂₂....ξ_r-1,r-1ξ_ri| = 0      (i = 1, 2,...., n).
If we expand this determinant according to the elements of its last column, we get a relation of the form
         β₁ξ_ri + β₂ξ_1i + ...... + β_rξ_r-1,i = 0
where the β's are independent of i and β₁ ≠ 0 by (38). Hence we may write
(39)      ξ_ri = α₁ξ_1i + .... + α_r-1ξ_r-1,i      (i = 1, 2,...., n)
the α's being independent of i. Multiplying (39) by e_i and summing with regard to i, we have
         x_r = α₁x₁ + .... + α_r-1x_r-1,
which proves the theorem.
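The argument above is constructive, and the α's can be computed from the cofactors of the last column. The Python sketch below follows that route under the same assumption as (38), namely that the leading determinant of order r − 1 is not 0; the function names and the sample vectors are ours, and the relation it returns is valid only when |x₁x₂....x_r| = 0, which the example checks by rebuilding x_r.

```python
from fractions import Fraction


def det(m):
    """Determinant by expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def dependence_coefficients(xs):
    """Scalars alpha with x_r = alpha_1 x_1 + ... + alpha_{r-1} x_{r-1}.

    Assumes |x_1 ... x_r| = 0 and that the determinant of the first r - 1
    coordinates of x_1, ..., x_{r-1} is not 0, as in (38).
    """
    r = len(xs)
    lead = [list(x[:r - 1]) for x in xs]  # first r - 1 coordinates of every x
    # Cofactor of the k-th element of the last column; it does not depend on i.
    cofactors = [(-1) ** (k + r - 1) * det(lead[:k] + lead[k + 1:]) for k in range(r)]
    if cofactors[-1] == 0:
        raise ValueError("leading determinant of order r - 1 vanishes")
    return [-Fraction(c) / Fraction(cofactors[-1]) for c in cofactors[:-1]]


# Sample vectors (assumed for illustration), with x3 = 2*x1 - x2.
x1 = [1, 0, 2, 1]
x2 = [0, 1, 1, 3]
x3 = [2, -1, 3, -1]

alphas = dependence_coefficients([x1, x2, x3])
rebuilt = [alphas[0] * a + alphas[1] * b for a, b in zip(x1, x2)]
print(alphas, rebuilt == x3)
```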

If (a₁, a₂,...., a_m) is a linear set of order r, then some set of r a's forms a basis, that is, these r a's are linearly independent while each of the other a's is linearly dependent on them. By a change of notation, if necessary, we may take a₁, a₂,...., a_r as this basis and write
$(40) \ \ \ \ \ \ a_{r+i} = \sum \limits_{j=1}^r \beta_{ij} a_j, \ \ \ \ \ \ \ (i = 1, 2, ... , m-r)$
We shall now discuss the general form of all linear relations among the a's in terms of the special relations (40); and in doing so we may assume the order of the space to be equal to or greater than m since we may consider any given space as a subspace of one of arbitrarily higher dimensionality.

Let
$(41) \ \ \ \ \ \ \ \ \sum \limits_1^m \gamma_j a_j = 0$
be a relation connecting the a's and set
         $c = \sum \limits_1^m \gamma_j e_j$.
Then (40), considered as a special case of (41), corresponds to setting for c
$(42) \ \ \ \ \ \ c_i = - \sum \limits_{j=1}^r \beta_{ij} e_j + e_{r+i}, \ \ \ \ \ \ \ (i = 1, 2, ... , m-r)$
and there is clearly no linear relation connecting these vectors so that they define a linear set of order m − r. Using (40) in (41) we have
         $\sum \limits_{j=1}^r (\gamma_j + \sum \limits_{i=1}^{m-r} \gamma_{r+i} \beta_{ij} ) a_j = 0$
and, since a₁, a₂,...., a_r are linearly independent, we have
         $\gamma_j = - \sum \limits_{i=1}^{m-r} \beta_{ij} \gamma_{r+i}, \ \ \ \ \ \ \ (j = 1, 2, ... , r)$
whence
$(43) \ \ \ \ \ \ \ c = \sum \limits_1^m \gamma_j e_j = - \sum \limits_{i=1}^{m-r} \gamma_{r+i} \sum \limits_{j=1}^{r} \beta_{ij} e_j + \sum \limits_{i=1}^{m-r} \gamma_{r+i} e_{r+i} = \sum \limits_{i=1}^{m-r} \gamma_{r+i} c_i,$
so that c is linearly dependent on c₁, c₂,...., c_m-r. Conversely, on retracing these steps in the reverse order we see that, if c is linearly dependent on these vectors, so that γ_r+i (i = 1, 2,...., m − r) are known, then from (43) the γ_j (j = 1, 2,...., r) are defined in such a way that $c = \sum \limits_1^m \gamma_j e_j$ and $\sum \limits_1^m \gamma_j a_j = 0$. We have therefore the following theorem.

Theorem 10. If a₁, a₂,...., a_m is a linear set of order r, there exist m − r linear relations $\sum \limits_{j=1}^m \gamma_{ij} a_j = 0$ (i = 1, 2,...., m − r) such that (i) the vectors $c_i = \sum \limits_{j=1}^m \gamma_{ij} e_j$ are linearly independent and (ii) if ∑ γ_ja_j = 0 is any linear relation connecting the a's, and if c = ∑ γ_je_j, then c belongs to the linear set (c₁, c₂,...., c_m−r).
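The construction behind Theorem 10 can be followed on a small example. In the Python sketch below the basis a₁, a₂ and the coefficients β_ij of (40) are chosen by us for illustration; the code forms the vectors c_i of (42), then takes arbitrary values of γ_r+i, determines the remaining γ_j as in the text, and confirms (43) and (41). The function names are hypothetical.

```python
def relation_vectors(beta):
    """The vectors c_i of (42): c_i = -(beta_i1 e_1 + ... + beta_ir e_r) + e_{r+i}."""
    count = len(beta)  # this is m - r
    return [[-b for b in beta[i]] + [1 if k == i else 0 for k in range(count)]
            for i in range(count)]


def combine(coeffs, vectors):
    """Linear combination: sum of coeffs[k] * vectors[k], coordinate by coordinate."""
    return [sum(c * v[k] for c, v in zip(coeffs, vectors))
            for k in range(len(vectors[0]))]


# Assumed data: r = 2, m = 4, with a3 = a1 + a2 and a4 = 2*a1 - a2.
basis = [[1, 0, 0], [0, 1, 0]]
beta = [[1, 1], [2, -1]]
a = basis + [combine(row, basis) for row in beta]  # a1, a2, a3, a4

cs = relation_vectors(beta)  # c1, c2

# Choose gamma_{r+i} freely, then gamma_j = -sum_i beta_ij * gamma_{r+i} for j <= r.
tail = [3, 5]
head = [-sum(beta[i][j] * tail[i] for i in range(len(tail))) for j in range(2)]
gamma = head + tail

print(cs)
print(gamma == combine(tail, cs))   # (43): c is a combination of the c_i
print(combine(gamma, a))            # (41): the relation among the a's
```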

This result can be translated immediately in terms concerning the solution of a system of ordinary linear equations or in terms of matrices. If $a_j = \sum \limits_i a_{ji} e_i$, then (41) may be written
$(44) \ \ \ \ \ \ \ \ \ \ \begin{matrix} a_{11}\gamma_1 + a_{21}\gamma_2 + ... + a_{m1}\gamma_m = 0 \\ . \ \ . \ \ . \ \ . \ \ . \ \ . \ \ . \ \ . \\ a_{1n}\gamma_1 + a_{2n}\gamma_2 + ... + a_{mn}\gamma_m = 0 \end{matrix}$
a system of linear homogeneous equations in the unknowns γ₁, γ₂,...., γ_m. Hence (44) has solutions for which some γ_i ≠ 0 if, and only if, the rank r of the array
$(45) \ \ \ \ \ \ \ \ \ \ \begin{matrix} a_{11} & a_{21} & ... & a_{m1} \\ a_{12} & a_{22} & ... & a_{m2} \\ . & . & ... & . \\ a_{1n} & a_{2n} & ... & a_{mn} \end{matrix}$
is less than m and, when this condition is satisfied, every solution is linearly dependent on the set of m − r solutions given by (42) which are found by the method given in the discussion of Theorem 9.
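In practice a set of m − r independent solutions of (44) is often found by row reduction rather than by the determinants used in the discussion of Theorem 9. The Python sketch below uses that substituted method, working over exact fractions; the function name `null_space_basis` and the sample vectors are our own, and the rows passed to it are the rows of the array (45), that is, the i-th row holds the i-th coordinates of a₁, a₂,...., a_m.

```python
from fractions import Fraction


def null_space_basis(rows):
    """Independent solutions gamma of the homogeneous system rows . gamma = 0."""
    mat = [[Fraction(v) for v in row] for row in rows]
    unknowns = len(mat[0])
    pivots = []  # pivot column of each reduced row, in order
    for col in range(unknowns):
        start = len(pivots)
        pivot = next((i for i in range(start, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue
        mat[start], mat[pivot] = mat[pivot], mat[start]
        lead = mat[start][col]
        mat[start] = [v / lead for v in mat[start]]
        # Clear this column in every other row (reduced row echelon form).
        for i in range(len(mat)):
            if i != start and mat[i][col] != 0:
                factor = mat[i][col]
                mat[i] = [a - factor * b for a, b in zip(mat[i], mat[start])]
        pivots.append(col)
    basis = []
    for free in (c for c in range(unknowns) if c not in pivots):
        solution = [Fraction(0)] * unknowns
        solution[free] = Fraction(1)
        for row_index, pivot_col in enumerate(pivots):
            solution[pivot_col] = -mat[row_index][free]
        basis.append(solution)
    return basis


# Assumed data: m = 4 vectors of order n = 3 forming a linear set of order r = 2.
a = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [2, -1, 0]]
array_45 = [[a_j[i] for a_j in a] for i in range(3)]  # rows of the array (45)

solutions = null_space_basis(array_45)
print(len(solutions))  # number of independent solutions, to compare with m - r
print(solutions)
```

With the basis a₁, a₂ and the free unknowns γ₃, γ₄, each solution obtained in this way has the shape of a vector c_i in (42).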

Again, if we make (45) a square array by the introduction of columns or rows of zeros and set A = ||a_ij||, c = ∑ γ_ie_i, then (41) becomes A'c = 0 and Theorem 10 may therefore be interpreted as giving the properties of the nullspace of A' which were derived in §1.10.