# Linear Algebra: Advanced Topics

## Changing Basis

:::{.proposition title="Changing to the standard basis"}
The transition matrix from a given basis $\mathcal{B} = \ts{\vector b_i}_{i=1}^n$ to the standard basis is given by
\[
A \da
\begin{bmatrix}
\vertbar & \vertbar & & \vertbar \\
\vector b_1 & \vector b_2 & \cdots & \vector b_n \\
\vertbar & \vertbar & & \vertbar \\
\end{bmatrix}
,\]
and the transition matrix from the standard basis to $\mathcal{B}$ is $A\inv$.
:::

## Orthogonal Matrices

Given a notion of orthogonality for vectors, we can extend this to matrices. A square matrix $Q$ is said to be orthogonal iff $QQ^T = Q^TQ = I$. For rectangular matrices, we have the following characterizations:
\[
QQ^T = I \implies &\text{The rows of } Q \text{ are orthonormal,} \\
Q^TQ = I \implies &\text{The columns of } Q \text{ are orthonormal.}
\]

To remember which condition is which, just recall that matrix multiplication $AB$ takes the inner product between the **rows** of $A$ and the **columns** of $B$. So if, for example, we want to inspect whether or not the columns of $Q$ are orthogonal, we should let $B=Q$ in the above formulation -- then we just note that the rows of $Q^T$ are indeed the columns of $Q$, so $Q^TQ$ computes the inner products between all pairs of the columns of $Q$ and stores them in a matrix.

## Projections

:::{.remark}
A projection $P$ induces a decomposition
\[
\dom(P) = \ker(P) \oplus \ker(P)\perp
.\]
:::

\todo[inline]{Check! Domain or range..?}

Distance from a point $\vector p$ to a line $\vector a + t\vector b$: let $\vector w = \vector p - \vector a$; then the distance is $\norm{\vector w - P(\vector w, \vector b)}$, where $P(\vector w, \vector b)$ denotes the projection of $\vector w$ onto $\vector b$.

:::{.proposition title="Projection onto range"}
\[
\projection_{\range(A)}(\vector x) = A(A^t A)^{-1} A^t \vector x
.\]

Mnemonic:
\[
P \approx {A A^t \over A^t A}
.\]
:::

With an inner product in hand and a notion of orthogonality, we can define a notion of **orthogonal projection** of one vector onto another, and more generally of a vector onto a subspace spanned by multiple vectors.

### Projection Onto a Vector

Say we have two vectors $\vector x$ and $\vector y$, and we want to define "the component of $\vector x$ that lies along $\vector y$", which we'll call $\vector p$. We can work out what the formula should be using a simple model: we notice that whatever $\vector p$ is, it will lie in the direction of $\vector y$, and thus $\vector p = \lambda \hat{\vector y}$ for some scalar $\lambda$, where in fact $\lambda = \norm{\vector p}$ since $\norm{\hat{\vector y}} = 1$. We will find that $\lambda = \inner{\vector x}{\hat{\vector y}}$, and so
\[
\vector p = \inner{\vector x}{\hat{\vector y}}\hat{\vector y} = \frac{\inner{\vector x}{\vector y}}{\inner{\vector y}{\vector y}} \vector y
.\]

Notice that we can then form a "residual" vector $\vector r = \vector x - \vector p$, which should satisfy $\vector r \perp \vector p$. If we were to let $\lambda$ vary as a function of a parameter $t$ (making $\vector r$ a function of $t$ as well), we would find that this particular choice minimizes $\norm{\vector r(t)}$.
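As a quick numerical sanity check, here is a minimal sketch (assuming `numpy` is available; the vectors are made up for illustration) of the projection formula above, the orthogonality of the residual, and the minimization claim:

```python
import numpy as np

# Example vectors (made up for illustration).
x = np.array([3.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 2.0])

# p = (<x, y> / <y, y>) y, the component of x along y.
p = (x @ y) / (y @ y) * y
r = x - p                      # residual

print(p)                       # projection of x onto y
print(r @ p)                   # ~0: residual is orthogonal to the projection

# Among all multiples lambda * y-hat, this choice of lambda minimizes the residual norm.
y_hat = y / np.linalg.norm(y)
for lam in np.linspace(-2, 4, 13):
    assert np.linalg.norm(x - lam * y_hat) >= np.linalg.norm(r) - 1e-12
```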
### Projection Onto a Subspace

In general, supposing one has a subspace $S = \mathrm{span}\theset{\vector y_1, \vector y_2, \cdots, \vector y_n}$ and (importantly!) the $\vector y_i$ are orthogonal, then the projection $\vector p$ of $\vector x$ onto $S$ is given by the sum of the projections onto each basis vector, yielding
\[
\vector p = \sum_{i=1}^n \frac{\inner{\vector x}{\vector y_i}}{\inner{\vector y_i}{\vector y_i}} \vector y_i = \sum_{i=1}^n \inner{\vector x}{\hat{\vector y_i}} \hat{\vector y_i}
.\]

> Note: this is part of why having an orthogonal basis is desirable!

Letting $A = [\vector y_1, \vector y_2, \cdots]$, then the following matrix projects vectors onto $S$, expressing them in terms of the basis $\vector y_i$[^projmatrix]:
$$
\tilde P_A = (A^TA)^{-1}A^T,
$$
while this matrix performs the projection and expresses it in terms of the standard basis:
$$
P_A = A(A^TA)^{-1}A^T.
$$

Equation of a plane: given a point $\vector p_0$ on a plane and a normal vector $\vector n$, any vector $\vector x$ on the plane satisfies
$$
\inner{\vector x - \vector p_0}{\vector n} = 0.
$$

To find the distance between a point $\vector a$ and a plane, we need only project $\vector a - \vector p_0$ onto the line spanned by the normal $\vector n$. Writing $\hat{\vector n} = \vector n / \norm{\vector n}$, this gives
\[
d = \left\lvert \inner{\vector a - \vector p_0}{\hat{\vector n}} \right\rvert
.\]

One important property of projections is that for any vector $\vector v$ and for any subspace $S$, we have $\vector v - P_S(\vector v) \in S\perp$. Moreover, if $\vector v \in S\perp$, then $P_S(\vector v)$ must be zero. This follows by noting that in the projection formula above, every inner product appearing in the sum vanishes, by definition of $\vector v \in S\perp$, and so the projection is zero.

### Least Squares

:::{.proposition title="Normal Equations"}
$\vector x$ is a least squares solution to $A\vector x = \vector b$ iff
\[
A^t A \vector x = A^t \vector b
.\]
:::

\todo[inline]{Derivation of normal equations.}

The general setup here is that we would like to solve $A\vector x = \vector b$ for $\vector x$, where $\vector b$ is not in fact in the range of $A$. We thus settle for a unique "best" solution $\tilde{\vector x}$ such that the error $\norm{A\tilde{\vector x} - \vector b}$ is minimized.

Geometrically, the solution is given by projecting $\vector b$ onto the column space of $A$. To see why this is the case, define the residual vector $\vector r = A\tilde{\vector x} - \vector b$. We then seek to minimize $\norm{\vector r}$, which happens exactly when $\vector r \perp \im A$. But this happens exactly when $\vector r \in (\im A)\perp$, which, by the fundamental subspaces theorem, is equivalent to $\vector r \in \ker A^T$. From this, we get the equations
$$
A^T \vector r = \vector 0 \\
\implies A^T(A \tilde{\vector x} - \vector b) = \vector 0 \\
\implies A^TA\tilde{\vector x} = A^T \vector b,
$$
where the last line is known as the **normal equations**.

If $A$ is an $m\times n$ matrix and is of full rank, so it has $n$ linearly independent columns, then one can show that $A^T A$ is nonsingular, and we thus arrive at the least-squares solution
$$
\tilde{\vector x} = (A^TA)^{-1}A^T \vector b \qed
$$
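Here is a hedged numerical sketch (assuming `numpy`; the matrix $A$ and vector $\vector b$ are made up for illustration) of the projection matrix $P_A = A(A^TA)^{-1}A^T$ and the normal-equations solution; the explicit inverse is formed only to mirror the formula, and the result is compared against `np.linalg.lstsq`:

```python
import numpy as np

# Example data (made up for illustration): A is 4x2 with full column rank.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Projection onto the column space of A, expressed in the standard basis.
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Least-squares solution from the normal equations A^T A x = A^T b
# (solving the system is preferable to explicitly inverting A^T A).
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

print(np.allclose(P @ b, A @ x_ls))          # projecting b gives A @ x_ls
print(np.allclose(A.T @ (A @ x_ls - b), 0))  # residual lies in ker(A^T)
print(np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0]))  # matches lstsq
```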
The normal equations can also be derived explicitly using calculus applied to matrices, vectors, and inner products. This requires the use of the following formulas:
\[
\dd{}{\vector x} \inner{\vector x}{\vector a} &= \vector a \\
\dd{}{\vector x} \inner{\vector x}{A\vector x} &= (A+A^T)\vector x
\]
as well as the adjoint formula
\[
\inner{A\vector x}{\vector x} = \inner{\vector x}{A^T \vector x}
.\]
From these, by letting $A=I$, we can derive
\[
\dd{}{\vector x} \norm{\vector x}^2 = \dd{}{\vector x} \inner{\vector x}{\vector x} = 2\vector x
.\]
The derivation proceeds by solving the equation
\[
\dd{}{\vector x} \norm{\vector b - A\vector x}^2 = \vector 0
.\]

## Normal Forms

:::{.remark}
Every square matrix over an algebraically closed field (such as the complex numbers) is similar to a matrix in Jordan canonical form.
:::

[^projmatrix]: For a derivation of this formula, see the section on least-squares approximations.

## Decompositions

### The QR Decomposition

Gram-Schmidt is often used to find an orthonormal basis for, say, the range of some matrix $A$. With a small modification to this algorithm, we can write $A = QR$, where $R$ is upper triangular and $Q$ has orthonormal columns.

Why is this useful? One reason is that this also allows for a particularly simple expression of least-squares solutions. If $A=QR$ and $A$ has full column rank, then $R$ is invertible, and a bit of algebraic manipulation shows that
\[
\tilde{\vector x} = R^{-1}Q^T\vector b
.\]

How does it work? You simply perform Gram-Schmidt on the columns $\vector x_1, \vector x_2, \cdots$ of $A$ to obtain the orthonormal vectors $\theset{\vector u_i}$, then
$$Q = [\vector u_1, \vector u_2, \cdots ].$$
The entries of $R$ can then be written as
\[
r_{ij} =
\begin{cases}
\inner{\vector u_i}{\vector x_j}, & i\leq j, \\
0, & \text{else}.
\end{cases}
\]
Explicitly, this yields the matrix
\[
R =
\begin{bmatrix}
\inner{\vector u_1}{\vector x_1} & \inner{\vector u_1}{\vector x_2} & \inner{\vector u_1}{\vector x_3} & \cdots \\
0 & \inner{\vector u_2}{\vector x_2} & \inner{\vector u_2}{\vector x_3} & \cdots \\
0 & 0 & \inner{\vector u_3}{\vector x_3} & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
\end{bmatrix}
\]

\todo[inline]{Explain shortcut for diagonal.}
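As a concrete check, here is a minimal `numpy` sketch (reusing made-up example data; `np.linalg.qr` returns a reduced factorization with orthonormal columns in $Q$, possibly with different sign choices than classical Gram-Schmidt) showing that the QR-based least-squares solution $R^{-1}Q^T\vector b$ agrees with the normal-equations solution:

```python
import numpy as np

# Example data (made up for illustration): A has full column rank.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Reduced QR: Q is 4x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A)

print(np.allclose(Q.T @ Q, np.eye(2)))  # columns of Q are orthonormal
print(np.allclose(A, Q @ R))            # A = QR

# Least-squares solution x = R^{-1} Q^T b, computed by solving R x = Q^T b.
x_qr = np.linalg.solve(R, Q.T @ b)

# Agrees with the normal-equations solution (A^T A) x = A^T b.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x_qr, x_ne))
```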
# Appendix: Lists of things to know

> Textbook: Leon, *Linear Algebra with Applications*

## Topics

- 1.6: Partitioned Matrices
- 3.5: Change of Basis
- 4.1: Linear Transformations
- 4.2: Matrix Representations
- 4.3: Similarity
- *Exam 1*
- 5.1: Scalar Product in $\RR^n$
- 5.2: Orthogonal Subspaces
- 5.3: Least Squares
- 5.4: Inner Product Spaces
- 5.5: Orthonormal Sets
- 5.6: Gram-Schmidt
- 6.1: Eigenvalues and Eigenvectors
- *Exam 2*
- 6.2: Systems of Linear Differential Equations
- 6.3: Diagonalization
- 6.6: Quadratic Forms
- 6.7: Positive Definite Matrices
- 6.5: Singular Value Decomposition
- 7.7: The Moore-Penrose Pseudo-Inverse
- *Final Exam*

## Definitions

- System of equations
- Homogeneous system
- Consistent/inconsistent system
- Matrix
- Matrix equation (i.e. $A \vector x = \vector b$)
- Inverse matrix
- Singular matrix
- Determinant
- Trace
- Rank
- Elementary row operation
- Row equivalence
- Pivot
- Row Echelon Form
- Reduced Row Echelon Form
- Gaussian elimination
- Block matrix
- Vector space
- Vector subspace
- Linear transformation
- Span
- Linear independence
- Basis
- Change of basis
- Dimension
- Row space
- Column space
- Image
- Null space
- Kernel
- Direct sum
- Projection
- Orthogonal subspaces
- Orthogonal complement
- Normal equations
- Least-squares solution
- Orthonormal
- Eigenvalue
- Eigenvector
- Characteristic polynomial
- Similarity
- Diagonalizable
- Inner product
- Bilinearity
- Multilinearity
- Defective
- Singular value decomposition
- QR factorization
- Gram-Schmidt process
- Spectral theorem
- Symmetric matrix
- Orthogonal matrix
- Positive-definite
- Quadratic form

## Lower-division review

- Systems of linear equations
  - Consistent vs. inconsistent
  - Possibilities for solutions
  - Geometric interpretation
- Matrix Inverses
  - Detecting if a matrix is singular
  - Computing the inverse
  - Formula for the 2x2 case
  - Augment with the identity
  - Cramer's Rule
- Vector Spaces
  - Definition in terms of closures
  - Span
  - Linear Independence
  - Subspaces and the subspace test
  - Basis
- Common Computations
  - Reduction to RREF
  - Eigenvalues and eigenvectors
  - Basis for the column space
  - Basis for the nullspace
  - Basis for the eigenspace
  - Construct a matrix from a given linear map
  - Construct a change of basis matrix
  - Construct the matrix of a projection onto a subspace
  - Convert a basis to an orthonormal basis

## Things to compute

- Construct a matrix representing a linear map
  - With respect to the standard basis in both the domain and range
  - With respect to a nonstandard basis in the range
  - With respect to a nonstandard basis in the domain
  - With respect to nonstandard bases in both the domain and range
- Construct a change of basis matrix
- Check that a map is a linear transformation
- Compute the following spaces of a matrix and their orthogonal complements:
  - Row space
  - Column space
  - Null space
- Compute the shortest distance between a point and a plane
- Compute the least squares solution to a linear system
- Prove that something is a vector space
- Prove that a map is an inner product
- Compute determinants
- Compute the RREF of a matrix
- Compute characteristic polynomials, eigenvalues, and eigenvectors
- Diagonalize a matrix
- Solve a system of ODEs arising from tank mixing
- Compute the singular value decomposition of a matrix
- Compute the rank and nullity of a matrix
- Convert a set of vectors to a basis
- Convert a basis to an orthonormal basis
- Determine if a matrix is diagonalizable
- Compute the matrix for a projection onto a subspace
- Find the QR factorization of a matrix

## Things to prove

- Prove facts about block matrices
- Prove facts about injective linear maps
- Prove facts about similar matrices
- Prove facts about orthogonal spaces and orthogonal complements
- Prove facts about inner products
- Prove facts about orthonormal sets
- Prove facts about eigenvalues/eigenvectors
- Understand when a matrix can be diagonalized
- Prove facts about diagonalizable matrices
- Prove facts about the orthogonal decomposition theorem