Differential Calculus

Big Theorems / Tools:

\[\begin{align*} \frac{\partial}{\partial x} \int_a^x f(t) dt = f(x) \end{align*}\]

\[\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(x, t) dt - \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t) dt &= f(x, t) \cdot \frac{dt}{dx} \bigg\rvert_{t=a(x)}^{t=b(x)} \\ &= f(x, b(x))\cdot b'(x) - f(x, a(x))\cdot a'(x) .\end{align*}\]

If \(f(x,t) = f(t)\) doesn’t depend on \(x\), then \({\frac{\partial f}{\partial x}\,} = 0\) and the second integral vanishes:

\[\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(t) dt &= f(b(x))\cdot b'(x) - f(a(x))\cdot a'(x) \end{align*}\]
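A quick numeric sanity check of this formula in plain Python (the integrand \(f(t) = t^2\) and the limits \(a(x) = x\), \(b(x) = x^2\) are illustrative choices):

```python
def F(x):
    # F(x) = ∫_{a(x)}^{b(x)} t^2 dt with a(x) = x, b(x) = x^2,
    # computed from the antiderivative t^3/3
    a, b = x, x * x
    return (b**3 - a**3) / 3

def leibniz_rhs(x):
    # f(b(x))·b'(x) - f(a(x))·a'(x) with f(t) = t^2
    a, b = x, x * x
    return (b**2) * (2 * x) - (a**2) * 1

x0, h = 1.3, 1e-6
numeric = (F(x0 + h) - F(x0 - h)) / (2 * h)  # central difference
assert abs(numeric - leibniz_rhs(x0)) < 1e-5
```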

Note that you can recover the original FTC by taking \[\begin{align*} a(x) &= c \\ b(x) &= x \\ f(x,t) &= f(t) .\end{align*}\]

\[\begin{align*} \frac{\partial}{\partial x} \int_{1}^{x} f(x, t) dt = \int_{1}^{x} \frac{\partial}{\partial x} f(x, t) dt + f(x, x) \end{align*}\]
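As a hand check with the illustrative integrand \(f(x,t) = xt\), both sides agree:

\[\begin{align*} \frac{\partial}{\partial x} \int_1^x xt~dt = \frac{\partial}{\partial x} \frac{x(x^2-1)}{2} = \frac{3x^2-1}{2}, \qquad \int_1^x t~dt + f(x,x) = \frac{x^2-1}{2} + x^2 = \frac{3x^2-1}{2} .\end{align*}\]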


\[\begin{align*} f \in C^0([a,b]) \text{ and differentiable on } (a,b) &\implies \exists p\in (a,b): f(b) - f(a) = f'(p)(b-a) \\ f \in C^0([a,b]) &\implies \exists p\in [a,b]: \int_a^b f(x)~dx = f(p)(b-a) .\end{align*}\]


\[\begin{align*} \lim_{x\to {\{\text{pt}\}}} f(x) = \lim_{x\to {\{\text{pt}\}}} g(x) \in \left\{{0, \pm \infty}\right\}, && \forall x \in I, g'(x) \neq 0, && \lim_{x\to{\{\text{pt}\}}} \frac{ f'(x)}{\ g'(x)} \text{ exists}, \\ \end{align*}\]

Then it is necessarily the case that \[\begin{align*} \lim _ { x \rightarrow {\{\text{pt}\}}} \frac { f ( x ) } { g ( x ) } = \lim _ { x \rightarrow {\{\text{pt}\}}} \frac { f ^ { \prime } ( x ) } { g ^ { \prime } ( x ) }. \end{align*}\]
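For example, applying the rule twice:

\[\begin{align*} \lim_{x\to 0} \frac{1 - \cos x}{x^2} = \lim_{x\to 0} \frac{\sin x}{2x} = \lim_{x\to 0} \frac{\cos x}{2} = \frac{1}{2} .\end{align*}\]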

Note that the rule directly handles \(\frac{0}{0}\) and \(\frac{\infty}{\infty}\); the remaining indeterminate forms can be rewritten to reach one of these: \[\begin{align*} 0 \cdot \infty, \quad 0^{0}, \quad \infty^{0}, \quad 1^{\infty}, \quad \infty-\infty .\end{align*}\]

For \(0\cdot \infty\), can rewrite as \({0 \over {1\over \infty}} = {0\over 0},\) or alternatively \({\infty \over {1\over 0}} = {\infty \over \infty}.\)

For \(1^\infty, \infty^0,\) and \(0^0\), set \[\begin{align*} L \mathrel{\vcenter{:}}=\lim f^g \implies \ln L = \lim g \ln(f) \end{align*}\] to recover \(\infty\cdot 0\), \(0 \cdot \infty\), or \(0\cdot(-\infty)\) respectively, each an indeterminate product.
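For example, for the \(0^0\) form \(\lim_{x\to 0^+} x^x\):

\[\begin{align*} \ln L = \lim_{x\to 0^+} x \ln x = \lim_{x\to 0^+} \frac{\ln x}{1/x} = \lim_{x\to 0^+} \frac{1/x}{-1/x^2} = \lim_{x\to 0^+} (-x) = 0 \implies L = e^0 = 1 .\end{align*}\]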

\[\begin{align*} T(a, x) &= \sum _ { n = 0 } ^ { \infty } \frac { f ^ { ( n ) } ( a ) } { n ! } ( x - a ) ^ { n } \\ &= f ( a ) + f'(a)( x - a ) + \frac { 1 } { 2 }f ^ { \prime \prime } ( a ) ( x - a ) ^ { 2 } \\ & \quad \quad + \frac { 1} { 6 } f ^ { \prime \prime \prime } ( a ) ( x - a ) ^ { 3 } + \frac{1}{24}f^{(4)}(a)(x-a)^4 + ~\cdots \end{align*}\] There is a bound on the error (the Lagrange form of the remainder): \[\begin{align*} {\left\lvert {f(x) - T_k(a,x)} \right\rvert} \leq \max_{\xi \text{ between } a \text{ and } x} {\left\lvert { \frac{f^{(k+1)}(\xi)}{(k+1)!} } \right\rvert} \cdot {\left\lvert {x - a} \right\rvert}^{k+1} \end{align*}\] where \(T_k(a, x) = \sum _ { n = 0 } ^ { k } \frac { f ^ { ( n ) } ( a ) } { n ! } ( x - a ) ^ { n }\) is the \(k\)th truncation.
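A numeric check in plain Python that the truncation error is controlled by the Lagrange remainder \(\max|f^{(k+1)}|\,|x-a|^{k+1}/(k+1)!\) (the choice \(f = \exp\), \(a = 0\) is illustrative):

```python
import math

def taylor_exp(x, k):
    # k-th Taylor truncation of e^x at a = 0: sum of x^n / n!
    return sum(x**n / math.factorial(n) for n in range(k + 1))

x, k = 0.5, 3
err = abs(math.exp(x) - taylor_exp(x, k))
# for f = exp, max |f^{(k+1)}| on [0, x] is e^x
bound = math.exp(x) / math.factorial(k + 1) * abs(x) ** (k + 1)
assert err <= bound
```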

Approximating change: \(\Delta y \approx f'(x) \Delta x\)


Tools for finding limits

How to find \(\lim_{x\to a} f(x)\), in order of difficulty:

Be careful: limits may not exist! For example, \(\lim_{x\to 0} \frac{1}{x}\) does not exist, since the one-sided limits are \(\pm \infty\).


For rational functions, polynomial long division exposes the asymptotic behavior as \(x \to \pm\infty\): \[\begin{align*} f(x) = \frac{p(x)}{q(x)} = r(x) + \frac{s(x)}{t(x)} \sim r(x) \end{align*}\]



\[\begin{align*} {\frac{\partial }{\partial x}\,}(f\circ g) = (f' \circ g) \cdot g' \end{align*}\]

\[\begin{align*} {\frac{\partial }{\partial x}\,} f\cdot g =f'\cdot g + g' \cdot f \end{align*}\]

\[\begin{align*} {\frac{\partial }{\partial x}\,} \frac{f(x)}{g(x)} = \frac{f'g - g'f}{g^2} \end{align*}\]

Mnemonic: Low d-high minus high d-low

\[\begin{align*} {\frac{\partial f^{-1}}{\partial x}\,}(f(x_0)) = \frac{1}{f'(x_0)} \end{align*}\]
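A numeric check in plain Python (the monotone cubic \(f(x) = x^3 + x\) and the bisection inverse are illustrative):

```python
def f(x):
    return x**3 + x  # strictly increasing, hence invertible

def f_inv(y, lo=-10.0, hi=10.0):
    # invert f by bisection (valid since f is monotone on [lo, hi])
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x0 = 1.0
fp = 3 * x0**2 + 1            # f'(x0) = 4
h = 1e-5
# numeric derivative of f^{-1} at y = f(x0) = 2
numeric = (f_inv(f(x0) + h) - f_inv(f(x0) - h)) / (2 * h)
assert abs(numeric - 1 / fp) < 1e-4
```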

Implicit Differentiation

\[\begin{align*} \frac{d}{dx} f(x) = f'(x), \qquad \frac{d}{dx} f(y) = f'(y)\,\frac{dy}{dx} \end{align*}\] Differentiating both sides of a defining equation this way, one can often solve for \({\frac{\partial y}{\partial x}\,}\).

General series of steps: identify the unknown rate (say \(y_t\)), relate the quantities by an equation, differentiate the equation implicitly with respect to \(t\), then solve for \(y_t\).

Example: ladder sliding down wall

Integral Calculus

Average Values

\[\begin{align*} \mu_f = \frac{1}{b-a}\int_a^b f(t) dt \end{align*}\]

Apply MVT to \(F(x)\).

Area Between Curves

Area in polar coordinates: \[\begin{align*} A = \int_{\theta_1}^{\theta_2} \frac{1}{2}r^2(\theta) ~d\theta \end{align*}\]

Solids of Revolution

\[\begin{align*} \text{Disks} && V = \int \pi r(t)^2 ~dt \\ \text{Cylindrical shells} && V = \int 2\pi r(t)h(t) ~dt .\end{align*}\]
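A numeric check in plain Python: both formulas recover the volume \(4\pi/3\) of the unit sphere (the midpoint-sum discretizations are illustrative):

```python
import math

def disk_volume(n=20000):
    # disks along x: cross-section radius r(x) = sqrt(1 - x^2), x in [-1, 1]
    dx = 2.0 / n
    return sum(math.pi * (1 - x * x) * dx
               for x in (-1 + (i + 0.5) * dx for i in range(n)))

def shell_volume(n=20000):
    # shells of radius r and height h(r) = 2*sqrt(1 - r^2), r in [0, 1]
    dr = 1.0 / n
    return sum(2 * math.pi * r * 2 * math.sqrt(1 - r * r) * dr
               for r in ((i + 0.5) * dr for i in range(n)))

exact = 4 * math.pi / 3
assert abs(disk_volume() - exact) < 1e-3
assert abs(shell_volume() - exact) < 1e-3
```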

Arc Lengths

\[\begin{align*} L &= \int ~ds && ds = \sqrt{dx^2 + dy^2} \\ &= \int_{x_0}^{x_1}\sqrt{1 + \left({\frac{\partial y}{\partial x}\,}\right)^2}~dx \\ &= \int_{y_0}^{y_1}\sqrt{\left({\frac{\partial x}{\partial y}\,}\right)^2 + 1}~dy \end{align*}\]
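A numeric check of the arc-length formula for the illustrative curve \(y = x^2\) on \([0,1]\), whose exact length is \(\sqrt{5}/2 + \operatorname{asinh}(2)/4\):

```python
import math

def arc_length(dfdx, x0, x1, n=100000):
    # L = ∫ sqrt(1 + (dy/dx)^2) dx, approximated by a midpoint sum
    dx = (x1 - x0) / n
    return sum(math.sqrt(1 + dfdx(x0 + (i + 0.5) * dx) ** 2) * dx
               for i in range(n))

L = arc_length(lambda x: 2 * x, 0.0, 1.0)   # y = x^2, so dy/dx = 2x
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
assert abs(L - exact) < 1e-6
```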

\[\begin{align*} SA = \int 2 \pi r(x) ~ds \end{align*}\]

Center of Mass

Given a density \(\rho(\mathbf x)\) of an object \(R\), the \(x_i\) coordinate is given by \[\begin{align*} x_i = \frac {\displaystyle\int_R x_i\rho(x) ~dx} {\displaystyle\int_R \rho(x)~dx} \end{align*}\]

Big List of Integration Techniques

Given \(f(x)\), we want to find an antiderivative \(F(x) = \int f\) satisfying \(\frac{\partial}{\partial x}F(x) = f(x)\)

Integration by Parts:

The standard form: \[\begin{align*} \int u dv = uv - \int v du \end{align*}\]

Shoelace Method

| Derivatives | Integrals | Signs | Result |
|---|---|---|---|
| \(u\) | \(v\) | NA | NA |
| \(u'\) | \(\int v\) | \(+\) | \(u\int v\) |
| \(u''\) | \(\int\int v\) | \(-\) | \(-u'\int\int v\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |

Fill out until one column is zero (alternate signs). Get the result column by multiplying diagonally, then sum down the column.
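For example, \(\int x^2 \sin x ~dx\) with \(u = x^2\), \(v = \sin x\): the derivative column is \(x^2, 2x, 2, 0\) and the integral column is \(\sin x, -\cos x, -\sin x, \cos x\), giving

\[\begin{align*} \int x^2 \sin x ~dx = -x^2\cos x + 2x \sin x + 2\cos x + C .\end{align*}\]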

Differentiating under the integral

\[\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(x, t) dt - \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t) dt &= f(x, t)\,\frac{dt}{dx} \bigg\rvert_{t=a(x)}^{t=b(x)} \\ &= f(x, b(x))~b'(x) - f(x, a(x))~a'(x) \end{align*}\]

Let \(F(x)\) be an antiderivative and compute \(F'(x)\) using the chain rule.

\[\begin{align*} \int_0^{\pi/2} \sin \theta~d\theta = \int_0^{\pi/2} \cos \theta~d\theta = 1 \end{align*}\]

Partial Fractions

Trigonometric Substitution

Other small but useful facts: \[\begin{align*} \int_0^{2\pi} \sin \theta~d\theta = \int_0^{2\pi} \cos \theta~d\theta = 0 .\end{align*}\]


Vector Calculus

Notation: \[\begin{align*} \mathbf{v}, \mathbf{a}, \cdots && \text{vectors in }{\mathbb{R}}^n \\ \mathbf{R}, \mathbf{A}, \cdots && \text{matrices} \\ \mathbf{r}(t) && \text{A parameterized curve }\mathbf{r}: {\mathbb{R}}\to {\mathbb{R}}^n \\ \\ \widehat{\mathbf{v}} && {\mathbf{v} \over {\left\lVert {\mathbf{v}} \right\rVert}} .\end{align*}\]

Plane Geometry

\[\begin{align*} \mathbf{v} = [x, y] \in {\mathbb{R}}^2 \implies m = \frac{y}{x} .\end{align*}\]

\[\begin{align*} \mathbf{R}_\theta = \left[ \begin{array} { l l } { \cos \theta } & { - \sin \theta } \\ { \sin \theta } & { \cos \theta } \end{array} \right] \implies \mathbf{R}_{\frac{\pi}{2}} = \left[ \begin{array} { l l } { 0 } & { - 1 } \\ { 1 } & { 0 } \end{array}\right] .\end{align*}\]

\[\begin{align*} \mathbf{R}_{\frac{\pi}{2}} \mathbf{x} \mathrel{\vcenter{:}}= \mathbf{R}_{\frac{\pi}{2}} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix} \in {\mathbb{R}}\mathbf{x}^\perp .\end{align*}\] Thus if a planar line is defined by the span of \({\left[ {x, y} \right]}\) and a slope of \(m = y/x\), a normal vector is given by the span of \({\left[ {-y, x} \right]}\) of slope \(-{1 \over m} = -x/y\).

Given \(\mathbf{v}\), the rotated vector \(\mathbf{R}_{\frac{\pi}{2}} \mathbf{v}\) is orthogonal to \(\mathbf{v}\), so this can be used to obtain normals and other orthogonal vectors in the plane.

There is a direct way to write down a vector orthogonal to any given vector: \[\begin{align*} \mathbf{v} = [a,b,c] \implies \mathbf{y} \mathrel{\vcenter{:}}= \begin{cases} [c, c, -(a+b)] & c \neq 0 \text{ or } a + b \neq 0, \\ [-(b+c), a, a] & \text{else} \end{cases} \in {\mathbb{R}}\mathbf{v}^\perp .\end{align*}\]


For a subspace given by a single vector \(\mathbf{a}\): \[\begin{align*} \mathrm{proj}_\mathbf{a}( \textcolor{Aquamarine}{\textbf{x}} ) = {\left\langle {\textcolor{Aquamarine}{\textbf{x}} },~{\widehat{\mathbf{a}}} \right\rangle}\widehat{\mathbf{a}} \hspace{8em} \mathrm{proj}_{\mathbf{a}}^\perp(\textcolor{Aquamarine}{\textbf{x}}) = \textcolor{Aquamarine}{\textbf{x}} - \mathrm{proj}_\mathbf{a}(\textcolor{Aquamarine}{\textbf{x}}) = \textcolor{Aquamarine}{\textbf{x}} - {\left\langle {\textcolor{Aquamarine}{\textbf{x}}},~{\widehat{\mathbf{a}}} \right\rangle}\widehat{\mathbf{a}} \end{align*}\]
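A quick check in plain Python that the orthogonal component is in fact orthogonal (the vectors are illustrative):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(x, a):
    # proj_a(x) = <x, a-hat> a-hat = (<x, a> / <a, a>) a
    c = dot(x, a) / dot(a, a)
    return [c * ai for ai in a]

x, a = [3.0, 1.0, 2.0], [1.0, 1.0, 0.0]
p = proj(x, a)
r = [xi - pi for xi, pi in zip(x, p)]   # orthogonal component
assert abs(dot(r, a)) < 1e-12           # residual is perpendicular to a
```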

In general, for a subspace \(\operatorname{colspace}(A) = \operatorname{span}\left\{{\mathbf{a}_1, \cdots, \mathbf{a}_n}\right\}\),

\[\begin{align*} \mathrm{proj}_A(\textcolor{Aquamarine}{\textbf{x}}) = \sum_{i=1}^n {\left\langle {\textcolor{Aquamarine}{\textbf{x}} },~{\widehat{\mathbf{a}}_i} \right\rangle}\widehat{\mathbf{a}}_i = A(A^T A)^{-1}A^T\textcolor{Aquamarine}{\textbf{x}} \end{align*}\] Note that the sum formula requires the \(\mathbf{a}_i\) to be orthonormal, while the matrix formula only requires the columns of \(A\) to be linearly independent.


\[\begin{align*} \text{General Equation} && Ax + By + C = 0 \\ \\ \text{Parametric Equation} && \mathbf{r}(t) = t\mathbf{x} + \mathbf{b} .\end{align*}\]

Characterized by an equation in inner products: \[\begin{align*} \mathbf{y} \in L \iff {\left\langle {\mathbf{y}},~{\mathbf{n}} \right\rangle} = 0 \end{align*}\]

Given \(\mathbf{p}_0, \mathbf{p}_1\), take \(\mathbf{x} = \mathbf{p}_1 - \mathbf{p}_0\) and \(\mathbf{b} = \mathbf{p}_i\) for either \(i\): \[\begin{align*} \mathbf{r}(t) = t(\mathbf{p}_1 - \mathbf{p}_0) + \mathbf{p}_0 = t\mathbf{p}_1 + (1-t) \mathbf{p}_0 .\end{align*}\]

If a line \(L\) is given by \[\begin{align*} \mathbf{r}(t) = t {\left[ {x_1, x_2, x_3} \right]} + {\left[ {p_1, p_2, p_3} \right]} ,\end{align*}\] then \[\begin{align*} (x, y, z) \in L \iff \frac{x-p_1}{x_1} = \frac{y-p_{2}}{x_2} = \frac{z-p_{3}}{x_3} .\end{align*}\]

The symmetric equation of the line through \({\left[ {2,1,-3} \right]}\) and \({\left[ {1,4,-3} \right]}\) is given by \[\begin{align*} \frac{x-2}{-1}=\frac{y-1}{3}, \qquad z = -3 ,\end{align*}\] since the direction vector \({\left[ {-1, 3, 0} \right]}\) has a zero \(z\)-component, so \(z\) is constant along the line.

Tangent Lines / Planes

Key idea: just need a point and a normal vector, and the gradient is normal to level sets.

For any locus \(f(\mathbf{x}) = 0\), we have \[\begin{align*} \mathbf{x} \in T_f(\mathbf{p}) \iff {\left\langle {\nabla f(\mathbf{p})},~{\mathbf{x}-\mathbf{p}} \right\rangle} = 0 .\end{align*}\]

Normal Lines

Key idea: the gradient is normal.

To find a normal line, you just need a single point \(\mathbf{p}\) and a normal vector \(\mathbf{n}\); then \[\begin{align*} L = \left\{{\mathbf{x} \mathrel{\Big|}\mathbf{x} = \mathbf{p} + t\mathbf{n}}\right\} .\end{align*}\]


\[\begin{align*} \text{General Equation} && A x + B y + C z + D = 0 \\ \\ \text{Parametric Equation} &&\mathbf{y}(t,s) = t\mathbf{x}_1 + s\mathbf{x}_2 + \mathbf{b} .\end{align*}\]

Characterized by an equation in inner products: \[\begin{align*} \mathbf{y} \in P \iff {\left\langle {\mathbf{y} - \mathbf{p}_0},~{\mathbf{n}} \right\rangle} = 0 \end{align*}\]

Determined by a point \(\mathbf{p}_0\) and a normal vector \(\mathbf{n}\)

Given \(\mathbf{v}_0, \mathbf{v}_1\), set \(\mathbf{n} = \mathbf{v}_0 \times\mathbf{v}_1\).

Finding a Normal Vector

Distance from origin to plane

Distance from point to plane


\[\begin{align*} \mathbf{r}(t) = [x(t), y(t), z(t)] .\end{align*}\]

Tangent line to a curve

We have an equation for the tangent vector at each point: \[\begin{align*} \widehat{ \mathbf{T} }(t) = \widehat{\mathbf{r}'}(t) ,\end{align*}\] so we can write \[\begin{align*} \mathbf{L}_{T}(t) = \mathbf{r}(t_0) + t \widehat{ \mathbf{T}}(t_0) \mathrel{\vcenter{:}}=\mathbf{r}(t_0) + t \widehat{\mathbf{r}'}(t_0) .\end{align*}\]

Normal line to a curve

Special case: planar graphs of functions

Suppose \(y = f(x)\). Set \(g(x, y) = f(x) - y\), then \[\begin{align*} \nabla g = [f_x(x), -1]\implies m = -\frac{1}{f_x(x)} \end{align*}\]

Minimal Distances

Fix a point \(\mathbf{p}\). Key idea: find a subspace and project onto it.

Key equations: projection and orthogonal projection of \(\mathbf{b}\) onto \(\mathbf{a}\): \[\begin{align*} \mathrm{proj}_\mathbf{a}(\mathbf{b}) = {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle}\mathbf{\widehat{a}} \hspace{8em} \mathrm{proj}_{\mathbf{a}}^\perp(\mathbf{b}) = \mathbf{b} - \mathrm{proj}_\mathbf{a}(\mathbf{b}) = \mathbf{b} - {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle}\widehat{\mathbf{a}} \end{align*}\]

Point to plane

\[\begin{align*} d = {\left\lVert {\mathrm{proj}_{\mathbf{n}}(\mathbf{q} - \mathbf{p})} \right\rVert} = {\left\lVert {{\left\langle {\mathbf{q} - \mathbf{p}},~{\widehat{\mathbf{n}}} \right\rangle} \widehat{\mathbf{n}}} \right\rVert} = {\left\lvert {\left\langle {\mathbf{q} - \mathbf{p}},~{\widehat{\mathbf{n}}} \right\rangle} \right\rvert} .\end{align*}\]

Origin to plane

Special case: if \(\mathbf{p} = \mathbf{0}\), \[\begin{align*} d = {\left\lVert {\mathrm{proj}_{\mathbf{n}}(\mathbf{q})} \right\rVert} = {\left\lVert {{\left\langle {\mathbf{q}},~{\widehat{\mathbf{n}}} \right\rangle} \widehat{\mathbf{n}}} \right\rVert} = {\left\lvert {\left\langle {\mathbf{q}},~{\widehat{\mathbf{n}}} \right\rangle} \right\rvert} .\end{align*}\]

Point to line

Point to curve

Line to line

Given \(\mathbf{r}_1(t) = \mathbf{p}_1 + t\mathbf{v}_1\) and \(\mathbf{r}_2(t) = \mathbf{p}_2 + t\mathbf{v}_2\), let \(d\) be the desired distance.


\[\begin{align*} S = \left\{{(x,y,z) \mathrel{\Big|}f(x,y, z) = 0}\right\} \hspace{10em} z = f(x,y) \end{align*}\]

Tangent plane to a surface

Surfaces of revolution

Multivariable Calculus

Given a function \(f: {\mathbb{R}}^n \to {\mathbb{R}}\), let \(S_k \mathrel{\vcenter{:}}=\left\{{\mathbf{p}\in {\mathbb{R}}^n ~{\text{s.t.}}~f(\mathbf{p}) = k}\right\}\) denote the level set for \(k\in {\mathbb{R}}\). Then \[\begin{align*} \nabla f(\mathbf{p}) \in S_k^\perp .\end{align*}\]


\[\begin{align*} \mathbf{v} &= [v_1, v_2, \cdots] && \text{a vector} \\ \\ \mathbf{e}_i &= [0, 0, \cdots, \overbrace{1}^{i \text{th term}}, \cdots, 0] && \text{the } i \text{th standard basis vector} \\ \\ \phi: {\mathbb{R}}^n &\to {\mathbb{R}} && \text{a functional on } {\mathbb{R}}^n\\ \phi(x_1, x_2, \cdots) &= \cdots && \\ \\ \mathbf{F}: {\mathbb{R}}^n &\to {\mathbb{R}}^n && \text{a multivariable function}\\ \mathbf{F}(x_1,x_2,\cdots) &= [\mathbf{F}_1(x_1, x_2, \cdots), \mathbf{F}_2(x_1, x_2, \cdots), \cdots, \mathbf{F}_n(x_1, x_2, \cdots)] \end{align*}\]

Partial Derivatives

For a functional \(f:{\mathbb{R}}^n\to {\mathbb{R}}\), the partial derivative of \(f\) with respect to \(x_i\) is \[\begin{align*} {\frac{\partial f}{\partial x_i}\,}(\mathbf p) \mathrel{\vcenter{:}}=\lim_{h\to 0}\frac{f(\mathbf p + h\mathbf e_i) - f(\mathbf p)}{h} \end{align*}\]

\[\begin{align*} f: {\mathbb{R}}^2 &\to {\mathbb{R}}\\ {\frac{\partial f}{\partial x}\,}(x_0,y_0) &= \lim_{h \to 0} \frac{f(x_0+h, y_0) - f(x_0,y_0)}{h} \end{align*}\]

General Derivatives

A function \(f: {\mathbb{R}}^n \to {\mathbb{R}}^m\) is differentiable iff there exists a linear transformation \(D_f: {\mathbb{R}}^n \to {\mathbb{R}}^m\) such that the following limit exists \[\begin{align*} \lim _ { \mathbf x \rightarrow \mathbf{p} } \frac { \left\| f (\mathbf x ) - f (\mathbf{p} ) - D_f (\mathbf x - \mathbf{p} ) \right\| } { \| \mathbf x - \mathbf{p} \| } = 0 .\end{align*}\]

\(D_f\) is the “best linear approximation” to \(f\).

When \(f\) is differentiable, \(D_f\) can be given in coordinates by
\[\begin{align*} (D_f)_{ij} = {\frac{\partial f_i}{\partial x_j}\,} \end{align*}\]

This yields the Jacobian of \(f\): \[\begin{align*} D_f(\mathbf{p}) = \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \nabla f_1(\mathbf{p}) & \nabla f_2(\mathbf{p}) & \cdots & \nabla f_m(\mathbf{p}) \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex} \end{bmatrix}^T = \left[ \begin{array} { c c c c } { \frac { \partial f _ { 1 } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { 1 } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \ldots } & { \frac { \partial f _ { 1 } } { \partial x _ { n } } ( \mathbf{p} ) } \\ { \frac { \partial f _ { 2 } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { 2 } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \dots } & { \frac { \partial f _ { 2 } } { \partial x _ { n } } ( \mathbf{p} ) } \\ { \vdots } & { \vdots } & { \ddots } & { \vdots } \\ { \frac { \partial f _ { m } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { m } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \cdots } & { \frac { \partial f _ { m } } { \partial x _ { n } } ( \mathbf{p} ) } \end{array} \right]. \end{align*}\]
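A numeric sanity check comparing a central-difference Jacobian to the analytic one (the map \(F(x,y) = [x^2 y, \sin y]\) is illustrative):

```python
import math

def F(v):
    x, y = v
    return [x * x * y, math.sin(y)]

def jacobian_fd(F, p, h=1e-6):
    # (D_F)_{ij} = ∂F_i/∂x_j, approximated by central differences
    n, m = len(p), len(F(p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up = p[:]; up[j] += h
        dn = p[:]; dn[j] -= h
        Fu, Fd = F(up), F(dn)
        for i in range(m):
            J[i][j] = (Fu[i] - Fd[i]) / (2 * h)
    return J

p = [1.0, 2.0]
J = jacobian_fd(F, p)
exact = [[4.0, 1.0], [0.0, math.cos(2.0)]]   # [[2xy, x^2], [0, cos y]]
for i in range(2):
    for j in range(2):
        assert abs(J[i][j] - exact[i][j]) < 1e-5
```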

This is equivalent to stacking the gradients \(\nabla f_i(\mathbf{p})\) of the component functions as the rows of \(D_f(\mathbf{p})\).

For a function \(f: {\mathbb{R}}^n \to {\mathbb{R}}\), the Hessian is a generalization of the second derivative, and is given in coordinates by \[\begin{align*} (H_f)_{ij} = {\frac{\partial^2 f}{\partial x_i \partial x_j}\,} \end{align*}\]

Explicitly, we have \[\begin{align*} H_f(\mathbf{p}) = \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ {\frac{\partial \nabla f}{\partial x_1}\,}(\mathbf{p}) & {\frac{\partial \nabla f}{\partial x_2}\,}(\mathbf{p}) & \cdots & {\frac{\partial \nabla f}{\partial x_n}\,}(\mathbf{p}) \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex} \end{bmatrix}^T = \left[ \begin{array} { c c c } { \frac { \partial ^ { 2 } f } { \partial x _ { 1 } \partial x _ { 1 } } ( \mathbf { p } ) } & { \dots } & { \frac { \partial ^ { 2 } f } { \partial x _ { 1 } \partial x _ { n } } ( \mathbf { p } ) } \\ { \vdots } & { \ddots } & { \vdots } \\ { \frac { \partial ^ { 2 } f } { \partial x _ { n } \partial x _ { 1 } } ( \mathbf { p } ) } & { \cdots } & { \frac { \partial ^ { 2 } f } { \partial x _ { n } \partial x _ { n } } ( \mathbf { p } ) } \end{array} \right]. \end{align*}\]

Mnemonic: make matrix with \(\nabla f\) as the columns, and then differentiate variables left to right.

The Chain Rule

Write out tree of dependent variables:

Then sum each possible path.

Let subscripts denote which variables are held constant, then \[\begin{align*} \left({\frac{\partial z}{\partial x}\,}\right)_y &= \left({\frac{\partial z}{\partial x}\,}\right)_{u,y,v} \\ & + \left({\frac{\partial z}{\partial v}\,}\right)_{x,y,u} \left({\frac{\partial v}{\partial x}\,}\right)_y \\ & + \left({\frac{\partial z}{\partial u}\,}\right)_{x,y,v} \left({\frac{\partial u}{\partial x}\,}\right)_{v,y} \\ & + \left({\frac{\partial z}{\partial u}\,}\right)_{x,y,v} \left({\frac{\partial u}{\partial v}\,}\right)_{x,y} \left({\frac{\partial v}{\partial x}\,}\right)_y \end{align*}\]


Let \(z = f(x,y)\); then to approximate near \(\mathbf{p}_0 = {\left[ {x_0, y_0} \right]}\), \[\begin{align*} f(\textcolor{Aquamarine}{\textbf{x}}) &\approx f(\mathbf{p}_0) + {\left\langle {\nabla f (\mathbf{p}_0)},~{\textcolor{Aquamarine}{\textbf{x}} - \mathbf{p}_0} \right\rangle} \\ \implies f(x,y) &\approx f(\mathbf{p}_0) + f_x(\mathbf{p}_0)(x-x_0) + f_y(\mathbf{p}_0)(y-y_0) .\end{align*}\]


Classifying Critical Points

Critical points of \(f\) are the points \(\mathbf{p}\) at which the derivative vanishes: \[\begin{align*} \operatorname{crit}(f) = \left\{{\mathbf{p}\in {\mathbb{R}}^n ~{\text{s.t.}}~D_f({\mathbf p}) = 0}\right\} \end{align*}\]

Compute \[\begin{align*} {\left\lvert {H_f(\mathbf p)} \right\rvert} \mathrel{\vcenter{:}}=\left| \begin{array} { l l } { f _ { x x } } & { f _ { x y } } \\ { f _ { y x } } & { f _ { y y } } \end{array} \right| ({ \mathbf p }) \end{align*}\] then check by cases:

- \({\left\lvert {H_f(\mathbf p)} \right\rvert} > 0\) and \(f_{xx}(\mathbf p) > 0\): local minimum.
- \({\left\lvert {H_f(\mathbf p)} \right\rvert} > 0\) and \(f_{xx}(\mathbf p) < 0\): local maximum.
- \({\left\lvert {H_f(\mathbf p)} \right\rvert} < 0\): saddle point.
- \({\left\lvert {H_f(\mathbf p)} \right\rvert} = 0\): the test is inconclusive.
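A minimal sketch of the two-variable second-derivative test as code (the inputs are the second partials at the critical point; the examples are illustrative):

```python
def classify(fxx, fyy, fxy):
    # second-derivative test via det H = fxx*fyy - fxy^2
    det = fxx * fyy - fxy * fxy
    if det > 0:
        return "local min" if fxx > 0 else "local max"
    if det < 0:
        return "saddle"
    return "inconclusive"

# f(x,y) = x^2 + y^2 at (0,0): fxx = fyy = 2, fxy = 0
assert classify(2, 2, 0) == "local min"
# f(x,y) = x^2 - y^2 at (0,0): a saddle
assert classify(2, -2, 0) == "saddle"
```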

What’s really going on?

Lagrange Multipliers

The setup: \[\begin{align*} \text{Optimize } f(\mathbf x) &\quad \text{subject to } g(\mathbf x) = c \\ \implies \nabla f &= \lambda \nabla g \end{align*}\]

1. Use this formula to obtain a system of equations in the components of \(\mathbf{x}\) and the parameter \(\lambda\).

2. Use \(\lambda\) to obtain a relation involving only components of \(\mathbf{x}\).

3. Substitute relations back into the constraint to obtain a collection of critical points.

4. Evaluate \(f\) at the critical points to find the max/min.
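A worked example of this recipe (the objective and constraint are illustrative): optimize \(f(x,y) = xy\) subject to \(g(x,y) = x + y = 2\). Then

\[\begin{align*} \nabla f = [y, x] = \lambda[1,1] = \lambda \nabla g \implies x = y = \lambda \implies 2x = 2 \implies (x,y) = (1,1), \quad f(1,1) = 1 ,\end{align*}\] which is the maximum of \(f\) along the constraint line (compare \(f(0,2) = 0\)).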

Change of Variables

For \(g: {\mathbb{R}}^n \to {\mathbb{R}}^n\) injective and \(C^1\) on a region \(R\), \[\begin{align*} \int _ { g ( R ) } f ( \mathbf { x } ) ~d V = \int _ { R } (f \circ g) ( \mathbf { x } ) \cdot {\left\lvert {\det D_g ( \mathbf { x })} \right\rvert} ~d V \end{align*}\]

Vector Calculus


\(R\) is a region, \(S\) is a surface, \(V\) is a solid.

\[\begin{align*} \oint _ { \partial S } \mathbf { F } \cdot d \mathbf { r } = \oint _ { \partial S } [\mathbf{F}_1, \mathbf{F}_2, \mathbf{F}_3] \cdot [dx, dy, dz] = \oint_{{\partial}S} \mathbf{F}_1dx + \mathbf{F}_2dy + \mathbf{F}_3dz \end{align*}\]

The main vector operators \[\begin{align*} \nabla: ({\mathbb{R}}^n \to {\mathbb{R}}) &\to ({\mathbb{R}}^n \to {\mathbb{R}}^n) \\ \phi &\mapsto \nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \phi}{\partial x_i} ~\mathbf{e}_i \\ \\ \text{} \mathrm{div}(\mathbf{F}): ({\mathbb{R}}^n \to {\mathbb{R}}^n) &\to ({\mathbb{R}}^n \to {\mathbb{R}}) \\ \mathbf{F} &\mapsto \nabla \cdot \mathbf{F} \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \mathbf{F}_i}{\partial x_i} \\ \\ \text{} \mathrm{curl}(\mathbf{F}): ({\mathbb{R}}^3 \to {\mathbb{R}}^3) &\to ({\mathbb{R}}^3 \to {\mathbb{R}}^3) \\ \mathbf{F} &\mapsto \nabla \times\mathbf{F} \\ \\ \text{} \end{align*}\] Some terminology: \[\begin{align*} \text{Scalar Field} && \phi:&~ X \to {\mathbb{R}}\\ \text{Vector Field} && \mathbf{F}:&~ X\to {\mathbb{R}}^n\\ \text{Gradient Field} && \mathbf{F}:&~ X \to {\mathbb{R}}^n \mathrel{\Big|}\exists \phi: X\to {\mathbb{R}}\mathrel{\Big|}\nabla \phi = F \end{align*}\]

\[\begin{align*} \mathbf x \cdot \mathbf y = {\left\langle {\mathbf x},~{\mathbf y} \right\rangle} = \sum_{i=1}^n {x_i y_i} = x_1y_1 + x_2y_2 + \cdots && \text{inner/dot product} \\ {\left\lVert {\mathbf x} \right\rVert} = \sqrt{{\left\langle {\mathbf x},~{\mathbf x} \right\rangle}} = \sqrt{\sum_{i=1}^n x_i^2} = \sqrt{x_1^2 + x_2^2 + \cdots} && \text{norm} \\ \mathbf a \times\mathbf b = \mathbf{\widehat{n}} {\left\lVert {\mathbf a} \right\rVert}{\left\lVert {\mathbf b} \right\rVert}\sin\theta_{\mathbf a,\mathbf b} = \left| \begin{array}{ccc} \mathbf{\widehat{x}} & \mathbf{\widehat{y}} & \mathbf{\widehat{z}} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{array}\right| && \text{cross product} \\ \\ D_\mathbf{u}(\phi) = \nabla \phi \cdot \mathbf{\widehat{u}} && \text{directional derivative} \\ \\ \nabla \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial}{\partial x_i} \mathbf{e}_i = \left[\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \cdots, \frac{\partial}{\partial x_n}\right] && \text{del operator} \\ \\ \nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \phi}{\partial x_i} ~\mathbf{e}_i = {\left[ { \frac{\partial \phi}{\partial x_1}, \frac{\partial \phi}{\partial x_2}, \cdots, \frac{\partial \phi}{\partial x_n}} \right]} && \text{gradient} \\ \\ \Delta \phi \mathrel{\vcenter{:}}=\nabla\cdot\nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial^2 \phi}{\partial x_i^2} = \frac{\partial^2 \phi}{\partial x_1^2} + \frac{\partial^2 \phi}{\partial x_2^2} + \cdots + \frac{\partial^2 \phi}{\partial x_n^2} && \text{Laplacian} \\ \\ \nabla \cdot \mathbf{F} \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \mathbf{F}_i}{\partial x_i} = \frac{\partial \mathbf{F}_1}{\partial x_1} + \frac{\partial \mathbf{F}_2}{\partial x_2} + \cdots + \frac{\partial \mathbf{F}_n}{\partial x_n} && \text{divergence} \\ \\ \nabla \times \mathbf { F } = \left| \begin{array} { c c c } { \mathbf { e }_1 } & { \mathbf { e }_2 } & { \mathbf { e }_3 } \\ { \frac { \partial } { \partial x } } & { \frac { \partial } { \partial y } } & { \frac { \partial } { \partial z } } \\ { \mathbf{F} _ { 1 } } & { \mathbf{F} _ { 2 } } & { \mathbf{F} _ { 3 } } \end{array} \right| = [\mathbf{F}_{3y} - \mathbf{F}_{2z}, \mathbf{F}_{1z}- \mathbf{F}_{3x}, \mathbf{F}_{2x} -\mathbf{F}_{1y}] && \text{curl} \\ \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot d \mathbf { S } = \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot \mathbf { n } ~dS && \text{surface integral} \end{align*}\]

Big Theorems

Stokes’ and Consequences

\[\begin{align*} \oint _ { \partial S } \mathbf { F } \cdot d \mathbf { r } = \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot d \mathbf { S } .\end{align*}\]

Note that if \(S\) is a closed surface, so \({\partial}S = \emptyset\), this integral vanishes.

\[\begin{align*} \oint _ { {\partial}R } ( L ~d x + M ~d y ) = \iint _ { R } \left( \frac { \partial M } { \partial x } - \frac { \partial L } { \partial y } \right) d x d y .\end{align*}\]
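A numeric check of Green's theorem in plain Python for the illustrative field \(\mathbf{F} = [-y, x]\) on the unit disk, where \(\frac{\partial M}{\partial x} - \frac{\partial L}{\partial y} = 2\), so the double integral is \(2 \cdot \pi = 2\pi\):

```python
import math

def line_integral(n=10000):
    # ∮ L dx + M dy around the unit circle with L = -y, M = x
    total = 0.0
    dt = 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t) * dt, math.cos(t) * dt
        total += (-y) * dx + x * dy
    return total

assert abs(line_integral() - 2 * math.pi) < 1e-6
```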

Recovering Green’s Theorem from Stokes’ Theorem:

Let \(\mathbf{F} = [L, M, 0]\), then \(\nabla\times\mathbf{F} = [0, 0, \frac{\partial M}{\partial x} - \frac{\partial L}{\partial y}]\)

\[\begin{align*} \iint_ { \partial V } \mathbf { F } \cdot d \mathbf { S } = \iiint _ { V } ( \nabla \cdot \mathbf { F } ) ~d V .\end{align*}\]

Directional Derivatives

\[\begin{align*} D_{\mathbf{v}} f(\mathbf{p}) \mathrel{\vcenter{:}}=\frac{d}{dt} f(\mathbf{p} + t\mathbf{v}) \Big|_{t=0} .\end{align*}\]

Note: many conventions require \(\mathbf{v}\) to be a unit vector. With the definition above, \(D_{c\mathbf{v}}f = c\,D_{\mathbf{v}}f\), so normalize \(\mathbf{v}\) when a rate of change per unit length is wanted.

Suppose \(f:{\mathbb{R}}^n\to {\mathbb{R}}\) and \(\mathbf{v}\in {\mathbb{R}}^n\). Then \[\begin{align*} D_{\mathbf{v}}f(\mathbf{p}) = {\left\langle {\nabla f(\mathbf{p})},~{\mathbf{v}} \right\rangle} .\end{align*}\]

We first use the fact that we can find \(L\), the best linear approximation to \(f\): \[\begin{align*} L(\mathbf{x}) &\mathrel{\vcenter{:}}= f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{x} - \mathbf{p}) \\ \\ D_{\mathbf{v}}f(\mathbf{p}) &= D_{\mathbf{v}} L(\mathbf{p}) \\ &= \lim_{t\to 0} {L(\mathbf{p} + t\mathbf{v}) - L(\mathbf{p}) \over t}\\ &= \lim_{t\to 0} { f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{p} + t\mathbf{v} - \mathbf{p}) -\qty{f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{p} - \mathbf{p})} \over t }\\ &= \lim_{t\to 0} { D_f(\mathbf{p})(t\mathbf{v}) \over t} \\ &= D_f(\mathbf{p})\mathbf{v} \\ &\mathrel{\vcenter{:}}=\nabla f(\mathbf{p}) \cdot \mathbf{v} .\end{align*}\]

Computing Integrals

Changing Coordinates

Multivariable Chain Rule

Polar and Cylindrical Coordinates

\[\begin{align*} x = r\cos\theta \\ y = r\sin\theta \\ dA \mapsto r \quad dr~d\theta \qquad dV \mapsto r \quad dr~d\theta~dz \end{align*}\]

Spherical Coordinates

\[\begin{align*} x = r\cos\theta = \rho\sin\phi\cos\theta \\ y = r\sin\theta = \rho\sin\phi\sin\theta \\ z = \rho\cos\phi \\ dV \mapsto \rho^2 \sin\phi \quad d\rho~d\phi~d\theta \end{align*}\]
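A numeric check of the spherical volume element (the unit ball, whose volume is \(4\pi/3\), is an illustrative choice):

```python
import math

def ball_volume(n=1000):
    # V = ∫_0^{2π} ∫_0^π ∫_0^1 ρ^2 sin φ dρ dφ dθ; the θ integral gives 2π,
    # and the remaining two 1-D integrals factor apart (midpoint sums)
    dr, dphi = 1.0 / n, math.pi / n
    radial = sum((((i + 0.5) * dr) ** 2) * dr for i in range(n))        # ∫ ρ^2 dρ
    angular = sum(math.sin((j + 0.5) * dphi) * dphi for j in range(n))  # ∫ sin φ dφ
    return 2 * math.pi * radial * angular

assert abs(ball_volume() - 4 * math.pi / 3) < 1e-4
```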

Line Integrals


\[\begin{align*} \int_C f ~ds &\mathrel{\vcenter{:}}=\int_a^b (f\circ \mathbf{r})(t) ~{\left\lVert {\mathbf{r}'(t)} \right\rVert}~dt \\ &= \int_a^b f(x(t), y(t), z(t)) \sqrt{x_t^2 + y_t^2 + z_t^2} ~dt \end{align*}\]

Vector Fields

The function \(\phi\) can be found using the same method from ODEs.

\[\begin{align*} \int_a^b \mathbf F_1 ~dx + \mathbf F_2 ~dy + \cdots \mathrel{\vcenter{:}}=\int_C \mathbf F \cdot d\mathbf r \end{align*}\] in which case \([dx, dy, \cdots] \mathrel{\vcenter{:}}=[x_t, y_t, \cdots]~dt = \mathbf r'(t)~dt\).


\[\begin{align*} \iint_S \mathbf{F}\cdot d\mathbf{S} = \iint_S \mathbf{F}\cdot \mathbf{\widehat{n}} ~dS .\end{align*}\]


Given \(R\) and \(f(x,y) = 0\), \[\begin{align*} A(R) = \oint _ { {\partial}R } x ~d y = - \oint _ { {\partial}R } y ~d x = \frac { 1 } { 2 } \oint _ { {\partial}R } - y ~d x + x ~d y . \end{align*}\]

Compute \[\begin{align*} \oint_{{\partial}R} x ~dy = - \oint_{{\partial}R} y ~dx \\ = \frac{1}{2} \oint_{{\partial}R} -y~dx + x~dy = \frac{1}{2} \iint_R 1 - (-1) ~dA =\iint_R 1 ~dA \end{align*}\]

Surface Integrals

Other Results

\(\nabla \cdot \mathbf{F} = 0 \not \implies \exists \mathbf{G}:~ \mathbf{F} = \nabla\times \mathbf{G}\). A counterexample:

\[\begin{align*} \mathbf{F}(x,y,z) =\frac{1}{(x^2+y^2+z^2)^{3/2}}[x, y, z]~,\quad S = S^2 \subset {\mathbb{R}}^3 \\ \implies \nabla \cdot \mathbf{F} = 0 \text{ away from } \mathbf{0}, \text{ but } \iint_{S^2}\mathbf{F}\cdot d\mathbf{S} = 4\pi \neq 0 \end{align*}\] whereas by Stokes' theorem, \[\begin{align*} \mathbf{F} = \nabla\times\mathbf{G}\implies \iint_{S^2} \mathbf{F}\cdot d\mathbf{S} &= \iint_{S^2} \nabla\times\mathbf{G}\cdot d\mathbf{S} \\ \\ &= \oint_{{\partial}S^2}\mathbf{G}\cdot d\mathbf{r} && \text{by Stokes}\\ &= 0 \end{align*}\] since \({\partial}S^2 = \emptyset\).

Sufficient condition: if \(\mathbf{F}\) is everywhere \(C^1\), \[\begin{align*} \exists \mathbf{G}:~ \mathbf{F} = \nabla \times\mathbf{G} \iff \iint_S \mathbf{F}\cdot d\mathbf{S} = 0 \text{ for all closed surfaces }S .\end{align*}\]

Linear Algebra

The underlying field will be assumed to be \({\mathbb{R}}\) for this section.


\[\begin{align*} \operatorname{Mat}(m, n) && \text{the space of all } m\times n\text{ matrices} \\ T && \text{a linear map } {\mathbb{R}}^n \to {\mathbb{R}}^m \\ A\in \operatorname{Mat}(m, n)&& \text{an } m\times n \text{ matrix representing }T \\ A^t\in \operatorname{Mat}(n, m) && \text{an } n\times m \text{ transposed matrix} \\ \mathbf{a} && \text{an } n\times 1 \text{ column vector} \\ \mathbf{a}^t && \text{a } 1\times n \text{ row vector} \\ A = {\left[ {\mathbf{a}_1, \cdots, \mathbf{a}_n} \right]} && \text{a matrix formed with } \mathbf{a}_i \text{ as the columns} \\ V, W && \text{vector spaces} \\ |V|, \dim(W) && \text{dimensions of vector spaces} \\ \det(A) && \text{the determinant of }A \\ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& \mathbf{b} \end{bmatrix} \mathrel{\vcenter{:}}={\left[ {\mathbf{a}_1, \mathbf{a}_2, \cdots \mathbf{a}_n, \mathbf{b}} \right]} && \text{augmented matrices} \\ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& B \end{bmatrix} \mathrel{\vcenter{:}}={\left[ {\mathbf{a}_1, \cdots \mathbf{a}_n, \mathbf{b}_1, \cdots, \mathbf{b}_m} \right]} && \text{block matrices}\\ \operatorname{Spec}(A) && \text{the multiset of eigenvalues of } A \\ A\mathbf{x} = \mathbf{b} && \text{a system of linear equations} \\ r\mathrel{\vcenter{:}}=\operatorname{rank}(A) && \text{the rank of }A\\ r_b = \operatorname{rank}\qty{ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& \mathbf{b} \\ \end{bmatrix} } && \text{the rank of }A\text{ augmented by }\mathbf{b} .\end{align*}\]

Big Theorems

\[\begin{align*} {\left\lvert {\ker(A)} \right\rvert} + {\left\lvert {\operatorname{im}(A)} \right\rvert} = {\left\lvert {\operatorname{dom}(A)} \right\rvert} ,\end{align*}\] where \(\operatorname{nullity}(A) = {\left\lvert {\ker(A)} \right\rvert}\), \(\operatorname{rank}(A) = {\left\lvert {\operatorname{im}(A)} \right\rvert}\), and \({\left\lvert {\operatorname{dom}(A)} \right\rvert} = n\), the number of columns in the corresponding matrix.

Generalization: the following sequence is always exact: \[\begin{align*} 0 \to \ker(A) \xhookrightarrow{\text{id}} \operatorname{dom}(A) \xrightarrow{A} \operatorname{im}(A) \to 0 .\end{align*}\] Moreover, it always splits, so \(\operatorname{dom}A = \ker A \oplus \operatorname{im}A\) and thus \({\left\lvert {\operatorname{dom}(A)} \right\rvert} = {\left\lvert {\ker(A)} \right\rvert} + {\left\lvert {\operatorname{im}(A)} \right\rvert}\).

We also have \[\begin{align*} \dim(\operatorname{rowspace}(A)) = \dim(\operatorname{colspace}(A)) = \operatorname{rank}(A) .\end{align*}\]
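A quick computational check of rank-nullity, using a small hand-rolled row reduction (the matrix and the pivot tolerance `eps` are illustrative):

```python
def rank(M, eps=1e-9):
    # Gaussian elimination with partial pivoting; count pivot rows
    M = [row[:] for row in M]
    m, n = len(M), len(M[0])
    r, col = 0, 0
    while r < m and col < n:
        piv = max(range(r, m), key=lambda i: abs(M[i][col]))
        if abs(M[piv][col]) < eps:
            col += 1
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, m):
            f = M[i][col] / M[r][col]
            for j in range(col, n):
                M[i][j] -= f * M[r][j]
        r += 1
        col += 1
    return r

A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],   # a multiple of row 1
     [0.0, 1.0, 1.0]]
r = rank(A)
nullity = 3 - r          # rank + nullity = number of columns
assert (r, nullity) == (2, 1)
```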

Big List of Equivalent Properties

Let \(A\) be an \(n\times n\) matrix. TFAE:

- \(A\) is invertible and has a unique inverse \(A^{-1}\)
- \(A^T\) is invertible
- \(\det(A) \neq 0\)
- The linear system \(A\mathbf{x} = \mathbf{b}\) has a unique solution for every \(\mathbf{b} \in {\mathbb{R}}^n\)
- The homogeneous system \(A\mathbf{x} = \mathbf{0}\) has only the trivial solution \(\mathbf{x} = \mathbf{0}\)
- \(\operatorname{rank}(A) = n\), i.e. \(A\) is full rank
- \(\mathrm{nullity}(A) \mathrel{\vcenter{:}}=\dim\mathrm{nullspace}(A) = 0\)
- \(A = \prod_{i=1}^k E_i\) for some finite \(k\), where each \(E_i\) is an elementary matrix
- \(A\) is row-equivalent to the identity matrix \(I_n\)
- \(A\) has exactly \(n\) pivots
- The columns of \(A\) are a basis for \({\mathbb{R}}^n\), i.e. \(\operatorname{colspace}(A) = {\mathbb{R}}^n\)
- The rows of \(A\) are a basis for \({\mathbb{R}}^n\), i.e. \(\mathrm{rowspace}(A) = {\mathbb{R}}^n\)
- \(\left(\operatorname{colspace}(A)\right)^\perp= \left(\mathrm{rowspace}(A)\right)^\perp= \left\{{\mathbf{0}}\right\}\)
- Zero is not an eigenvalue of \(A\)
- The rows of \(A\) are linearly independent (for \(n = 3\): not coplanar)

Similarly, by taking negations, TFAE: \(A\) is singular; \(\det(A) = 0\); \(A\mathbf{x} = \mathbf{0}\) has infinitely many solutions; \(\operatorname{rank}(A) < n\); zero is an eigenvalue of \(A\).

Reformulated in terms of a linear map \(T: {\mathbb{R}}^n \to {\mathbb{R}}^n\), TFAE: - \(T^{-1}: {\mathbb{R}}^n \to {\mathbb{R}}^n\) exists - \(\operatorname{im}(T) = {\mathbb{R}}^n\) - \(\ker(T) = 0\) - \(T\) is injective - \(T\) is surjective - \(T\) is an isomorphism - The system \(A\mathbf{x} = \mathbf{0}\) has only the trivial solution

Vector Spaces

Linear Transformations

It is common to want to know the range and kernel of a specific linear transformation \(T\). \(T\) can be given in many ways, but a general strategy for deducing these properties involves:

Useful fact: if \(V\leq W\) is a subspace and \(\dim(V) \geq \dim(W)\), then \(V=W\).

If \(V\subseteq W\), then \(V\) is a subspace of \(W\) if the following hold:

\[\begin{align*} (1) && \mathbf{0}\in V \\ (2) && \mathbf{a}, \mathbf{b}\in V\implies t\mathbf{a} + \mathbf{b}\in V \text{ for all scalars } t .\end{align*}\]

Linear Independence

Any set of two vectors \(\left\{{\mathbf{v}, \mathbf{w}}\right\}\) is linearly dependent \(\iff \exists \lambda :~\mathbf{v} = \lambda \mathbf{w}\), i.e. one is a scalar multiple of the other.


A set \(S\) forms a basis for a vector space \(V\) iff

  1. \(S\) is a set of linearly independent vectors, so \(\sum \alpha_i \vec{s_i} = 0 \implies \alpha_i = 0\) for all \(i\).
  2. \(S\) spans \(V\), so \(\vec{v} \in V\) implies there exist \(\alpha_i\) such that \(\sum \alpha_i \vec{s_i} = \vec{v}\)

In this case, we define the dimension of \(V\) to be \({\left\lvert {S} \right\rvert}\).

The Inner Product

The point of this section is to show how an inner product can induce a notion of “angle”, which agrees with our intuition in Euclidean spaces such as \({\mathbb{R}}^n\), but can be extended to much less intuitive things, like spaces of functions.

The Euclidean inner product is defined as \[\begin{align*} {\left\langle {\mathbf{a}},~{\mathbf{b}} \right\rangle} = \sum_{i=1}^n a_i b_i = a_1b_1 + a_2b_2 + \cdots + a_nb_n .\end{align*}\]

Also sometimes written as \(\mathbf{a}^T\mathbf{b}\) or \(\mathbf{a} \cdot \mathbf{b}\).

Yields a norm \[\begin{align*} {\left\lVert {\mathbf{x}} \right\rVert} \mathrel{\vcenter{:}}=\sqrt{{\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle}} \end{align*}\]

which has a useful alternative formulation \[\begin{align*} {\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle} = {\left\lVert {\mathbf{x}} \right\rVert}^2 .\end{align*}\]

This leads to a notion of angle: \[\begin{align*} {\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} = {\left\lVert {\mathbf{x}} \right\rVert} {\left\lVert {\mathbf{y}} \right\rVert} \cos\theta_{x,y} \implies \cos \theta_{x,y} \mathrel{\vcenter{:}}=\frac{{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle}}{{\left\lVert {\mathbf{x}} \right\rVert} {\left\lVert {\mathbf{y}} \right\rVert}} = {\left\langle {\widehat{\mathbf{x}}},~{\widehat{\mathbf{y}}} \right\rangle} \end{align*}\] where \(\theta_{x,y}\) denotes the angle between the vectors \(\mathbf{x}\) and \(\mathbf{y}\).

Since \(\cos \theta=0\) exactly when \(\theta = \pm \frac \pi 2\), we can can declare two vectors to be orthogonal exactly in this case: \[\begin{align*} \mathbf{x} \in \mathbf{y}^\perp\iff {\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} = 0 .\end{align*}\]

Note that this makes the zero vector orthogonal to everything.
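The angle and orthogonality formulas can be computed directly from the Euclidean inner product; a minimal sketch in pure Python (the vectors are arbitrary examples):

```python
import math

def dot(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def angle(u, v):
    """Angle between u and v, from cos(theta) = <u, v> / (|u| |v|)."""
    return math.acos(dot(u, v) / math.sqrt(dot(u, u) * dot(v, v)))

x, y = (1.0, 0.0), (1.0, 1.0)
print(math.degrees(angle(x, y)))    # approximately 45 degrees
print(dot((1.0, 0.0), (0.0, 5.0)))  # 0.0 -- orthogonal vectors
```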

Given a subspace \(S \subseteq V\), we define its orthogonal complement \[\begin{align*} S^\perp= \left\{{\mathbf{v}\in V {~\mathrel{\Big|}~}\forall \mathbf{s}\in S,~ {\left\langle {\mathbf{v}},~{\mathbf{s}} \right\rangle} = 0}\right\}. \end{align*}\]

Any choice of subspace \(S\subseteq V\) yields a decomposition \(V = S \oplus S^\perp\).

A useful formula is \[\begin{align*} {\left\lVert {\mathbf{x} + \mathbf{y}} \right\rVert}^2 = {\left\lVert {\mathbf{x}} \right\rVert}^2 + 2{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} + {\left\lVert {\mathbf{y}} \right\rVert}^2 .\end{align*}\]

When \(\mathbf{x}\in \mathbf{y}^\perp\), this reduces to \[\begin{align*} {\left\lVert {\mathbf{x} + \mathbf{y}} \right\rVert}^2 = {\left\lVert {\mathbf{x}} \right\rVert}^2 + {\left\lVert {\mathbf{y}} \right\rVert}^2 .\end{align*}\]

An inner product satisfies:

  1. Bilinearity: \[\begin{align*} {\left\langle {\sum_j \alpha_j \mathbf{a}_j},~{\sum_k \beta_k \mathbf{b}_k} \right\rangle} = \sum_j \sum_k \alpha_j \beta_k {\left\langle {\mathbf{a}_j},~{\mathbf{b}_k} \right\rangle}. \end{align*}\]

  2. Symmetry: \[\begin{align*} {\left\langle {\mathbf{a}},~{\mathbf{b}} \right\rangle} = {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle} \end{align*}\]

  3. Positivity: \[\begin{align*} \mathbf{a} \neq \mathbf{0} \implies {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} > 0 \end{align*}\]

  4. Nondegeneracy: \[\begin{align*} \mathbf{a} = \mathbf{0} \iff {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} = 0 \end{align*}\]

Gram-Schmidt Process

Transforming a basis \(\left\{{\mathbf{x}_i}\right\}\) into an orthonormal basis \(\left\{{\mathbf{u}_i}\right\}\)

\[\begin{align*} \mathbf{u}_1 &= N(\mathbf{x_1}) \\ \mathbf{u}_2 &= N(\mathbf{x}_2 - {\left\langle {\mathbf{x}_2},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1)\\ \mathbf{u}_3 &= N(\mathbf{x}_3 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_2} \right\rangle}\mathbf{u}_2 ) \\ \vdots & \qquad \vdots \\ \mathbf{u}_k &= N(\mathbf{x}_k - \sum_{i=1}^{k-1} {\left\langle {\mathbf{x}_k},~{\mathbf{u}_i} \right\rangle}\mathbf{u}_i) \end{align*}\]

where \(N\) denotes normalizing the result.

In more detail

The general setup here is that we are given a basis \(\left\{{\mathbf{x}_i}\right\}_{i=1}^n\) and we want to produce an orthonormal basis from it.

Why would we want such a thing? Recall that we often wanted to change from the standard basis \(\mathcal{E}\) to some different basis \(\mathcal{B} = \left\{{\mathbf{b}_1, \mathbf{b}_2, \cdots}\right\}\). We could form the change of basis matrix \(B = [\mathbf{b}_1, \mathbf{b}_2, \cdots]\), which acts on vectors in the \(\mathcal{B}\) basis according to \[\begin{align*} B[\mathbf{x}]_\mathcal{B} = [\mathbf{x}]_{\mathcal{E}} .\end{align*}\]

But to change from \(\mathcal{E}\) to \(\mathcal{B}\) requires computing \(B^{-1}\), which acts on vectors in the standard basis according to \[\begin{align*} B^{-1}[\mathbf{x}]_\mathcal{E} = [\mathbf{x}]_{\mathcal{B}} .\end{align*}\]

If, on the other hand, the \(\mathbf{b}_i\) are orthonormal, then \(B^{-1} = B^T\), which is much easier to compute. We also obtain a rather simple formula for the coordinates of \(\mathbf{x}\) with respect to \(\mathcal B\). This follows because we can write \[\begin{align*} \mathbf{x} = \sum_{i=1}^n {\left\langle {\mathbf{x}},~{\mathbf{b}_i} \right\rangle} \mathbf{b}_i \mathrel{\vcenter{:}}=\sum_{i=1}^n c_i \mathbf{b}_i ,\end{align*}\]

and we find that \[\begin{align*} [\mathbf{x}]_\mathcal{B} = \mathbf{c} \mathrel{\vcenter{:}}=[c_1, c_2, \cdots, c_n]^T .\end{align*}\]
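The coordinate formula \(c_i = {\left\langle {\mathbf{x}},~{\mathbf{b}_i} \right\rangle}\) is easy to check by hand for a small orthonormal basis of \({\mathbb{R}}^2\); a sketch (the basis and vector are illustrative):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
b1, b2 = (s, s), (s, -s)  # an orthonormal basis of R^2
x = (3.0, 1.0)

# coordinates with respect to the basis: c_i = <x, b_i>
c = [dot(x, b1), dot(x, b2)]

# reconstruct x as the sum c_1 b_1 + c_2 b_2
recon = tuple(c[0] * p + c[1] * q for p, q in zip(b1, b2))
print(recon)  # approximately (3.0, 1.0)
```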

This also allows us to simplify projection matrices. Supposing that \(Q\) has orthonormal columns and letting \(S\) be the column space of \(Q\), recall that the projection onto \(S\) is defined by \[\begin{align*} P_S = Q(Q^TQ)^{-1}Q^T .\end{align*}\]

Since \(Q\) has orthonormal columns and satisfies \(Q^TQ = I\), this simplifies to \[\begin{align*} P_S = QQ^T .\end{align*}\]

The Algorithm

Given the basis \(\left\{{\mathbf{x}_i}\right\}\), we form an orthonormal basis \(\left\{{\mathbf{u}_i}\right\}\) iteratively as follows.

First define \[\begin{align*} N: {\mathbb{R}}^n &\to S^{n-1} \\ \mathbf{x} &\mapsto \widehat{\mathbf{x}} \mathrel{\vcenter{:}}=\frac {\mathbf{x}} {{\left\lVert {\mathbf{x}} \right\rVert}} \end{align*}\]

which projects a vector onto the unit sphere in \({\mathbb{R}}^n\) by normalizing. Then,

\[\begin{align*} \mathbf{u}_1 &= N(\mathbf{x_1}) \\ \mathbf{u}_2 &= N(\mathbf{x}_2 - {\left\langle {\mathbf{x}_2},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1)\\ \mathbf{u}_3 &= N(\mathbf{x}_3 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_2} \right\rangle}\mathbf{u}_2 ) \\ \vdots & \qquad \vdots \\ \mathbf{u}_k &= N(\mathbf{x}_k - \sum_{i=1}^{k-1} {\left\langle {\mathbf{x}_k},~{\mathbf{u}_i} \right\rangle}\mathbf{u}_i) \end{align*}\]

In words, at each stage, we take one of the original vectors \(\mathbf{x}_i\), then subtract off its projections onto all of the \(\mathbf{u}_i\) we’ve created up until that point. This leaves us with only the component of \(\mathbf{x}_i\) that is orthogonal to the span of the previous \(\mathbf{u}_i\) we already have, and we then normalize each \(\mathbf{u}_i\) we obtain this way.
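The iteration above translates directly into code; a minimal pure-Python sketch (the input vectors are assumed linearly independent):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(xs):
    """Orthonormalize a list of linearly independent vectors."""
    us = []
    for x in xs:
        w = list(x)
        for u in us:  # subtract projections onto the previously built u_i
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        n = math.sqrt(dot(w, w))  # normalize the remainder
        us.append([wi / n for wi in w])
    return us

u1, u2 = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
# u1 and u2 are orthonormal and span the same plane as the inputs
```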

Alternative Explanation:

Given a basis \[\begin{align*} S = \left\{\mathbf{v_1, v_2, \cdots v_n}\right\}, \end{align*}\]

the Gram-Schmidt process produces a corresponding orthogonal basis \[\begin{align*} S' = \left\{\mathbf{u_1, u_2, \cdots u_n}\right\} \end{align*}\] that spans the same vector space as \(S\).

\(S'\) is found using the following pattern: \[\begin{align*} \mathbf{u_1} &= \mathbf{v_1} \\ \mathbf{u_2} &= \mathbf{v_2} - \text{proj}_{\mathbf{u_1}} \mathbf{v_2}\\ \mathbf{u_3} &= \mathbf{v_3} - \text{proj}_{\mathbf{u_1}} \mathbf{v_3} - \text{proj}_{\mathbf{u_2}} \mathbf{v_3}\\ \end{align*}\]

where \[\begin{align*} \text{proj}_{\mathbf{u}} \mathbf{v} = (\text{scal}_{\mathbf{u}} \mathbf{v})\frac{\mathbf{u}}{\mathbf{{\left\lVert {u} \right\rVert}}} = \frac{\langle \mathbf{v,u} \rangle}{{\left\lVert {\mathbf{u}} \right\rVert}}\frac{\mathbf{u}}{\mathbf{{\left\lVert {u} \right\rVert}}} = \frac{{\left\langle {\mathbf{v}},~{\mathbf{u}} \right\rangle}}{{\left\lVert {\mathbf{u}} \right\rVert}^2}\mathbf{u} \end{align*}\] is the orthogonal projection of \(\mathbf{v}\) onto \(\mathbf{u}\).


The orthogonal set \(S'\) can then be transformed into an orthonormal set \(S''\) by simply dividing the vectors \(s\in S'\) by their magnitudes. The usual definition of a vector’s magnitude is \[\begin{align*} {\left\lVert {\mathbf{a}} \right\rVert} = \sqrt{{\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle}} \text{ and } {\left\lVert {\mathbf{a}} \right\rVert}^2 = {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} \end{align*}\]

As a final check, all vectors in \(S'\) should be orthogonal to each other, such that \[\begin{align*} {\left\langle {\mathbf{u_i}},~{\mathbf{u_j}} \right\rangle} = 0 \text{ when } i \neq j \end{align*}\]

and all vectors in \(S''\) should be orthonormal, such that \[\begin{align*} {\left\langle {\mathbf{u_i}},~{\mathbf{u_j}} \right\rangle} = \delta_{ij} \end{align*}\]

The Fundamental Subspaces Theorem

Given a matrix \(A \in \mathrm{Mat}(m, n)\), and noting that \[\begin{align*} A &: {\mathbb{R}}^n \to {\mathbb{R}}^m,\\ A^T &: {\mathbb{R}}^m \to {\mathbb{R}}^n \end{align*}\]

We have the following decompositions: \[\begin{align*} &{\mathbb{R}}^n &\cong \ker A &\oplus \operatorname{im}A^T &\cong \mathrm{nullspace}(A) &\oplus~ \mathrm{colspace}(A^T) \\ &{\mathbb{R}}^m &\cong \operatorname{im}A &\oplus \ker A^T &\cong \mathrm{colspace}(A) &\oplus~ \mathrm{nullspace}(A^T) \end{align*}\]
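Concretely, the decomposition says every nullspace vector is orthogonal to every row, and every vector in \(\ker A^T\) is orthogonal to every column; a sketch of a rank-one example, assuming NumPy:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 6.]])      # rank 1: second row = 3 * first row

v = np.array([-2., 1.])       # spans nullspace(A), since Av = 0
w = np.array([-3., 1.])       # spans nullspace(A^T), since A^T w = 0

# rows of A are orthogonal to v; columns of A are orthogonal to w
print(A @ v)    # [0. 0.]
print(A.T @ w)  # [0. 0.]
```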

Computing change of basis matrices


An \(m\times n\) matrix is a map from \(n\)-dimensional space to \(m\)-dimensional space. The number of rows tells you the dimension of the codomain, the number of columns tells you the dimension of the domain.

The space of matrices is not an integral domain! Counterexample: if \(A\) is singular and nonzero, there is some nonzero \(\mathbf{v}\) such that \(A \mathbf{v} = \mathbf{0}\). Then setting \(B = {\left[ {\mathbf{v}, \mathbf{v}, \cdots} \right]}\) yields \(AB = 0\) with \(A\neq 0, B\neq 0\).

The rank of a matrix \(A\) representing a linear transformation \(T\) is \(\dim \operatorname{colspace}(A)\), or equivalently \(\dim \operatorname{im}T\).

\(\operatorname{rank}(A)\) is equal to the number of nonzero rows in \(\operatorname{RREF}(A)\).

\[\begin{align*} \mathrm{Trace}(A) = \sum_{i=1}^m A_{ii} \end{align*}\]

The following are elementary row operations on a matrix:

If \(A \in \mathrm{Mat}(m, n)\) has rows \(\mathbf{a}_1, \mathbf{a}_2, \cdots\) and \(B \in\mathrm{Mat}(n, p)\) has columns \(\mathbf{b}_1, \mathbf{b}_2, \cdots\), then \[\begin{align*} C \mathrel{\vcenter{:}}= AB \implies c_{ij} = \sum_{k=1}^n a_{ik}b_{kj} = {\left\langle {\mathbf{a}_i},~{\mathbf{b}_j} \right\rangle} \end{align*}\] where \(1\leq i \leq m\) and \(1\leq j \leq p\). In words, each entry \(c_{ij}\) is obtained by dotting row \(i\) of \(A\) against column \(j\) of \(B\).
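The entry formula gives a direct (if slow) implementation of matrix multiplication; a pure-Python sketch:

```python
def matmul(A, B):
    """C = AB, where c_ij is the dot of row i of A with column j of B."""
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```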

Systems of Linear Equations

A system of linear equations is consistent when it has at least one solution. The system is inconsistent when it has no solutions.


Homogeneous systems are always consistent, i.e. there is always at least one solution.

There are three possibilities for a system of linear equations:

  1. No solutions (inconsistent)
  2. One unique solution (consistent, square or tall matrices)
  3. Infinitely many solutions (consistent, underdetermined, square or wide matrices)

These possibilities can be checked by considering \(r\mathrel{\vcenter{:}}=\operatorname{rank}(A)\):


\[\begin{align*} \det\left( A \bmod p \right) \equiv \det(A) \mod p \end{align*}\]

For \(2\times 2\) matrices, \[\begin{align*} A^{-1} = \left( \begin{array}{cc} a & b \\ c & d \end{array}\right)^{-1} = \frac{1}{\det{A}}\left( \begin{array}{cc} d & -b \\ -c & a \end{array}\right) \end{align*}\] In words, swap the main diagonal entries, and flip the signs on the off-diagonal.

For each \(m\), there is a function \[\begin{align*} \det: \operatorname{Mat}(m, m) &\to {\mathbb{R}}\\ A &\mapsto \det(A) \end{align*}\] satisfying the following properties:


Computing Determinants

Useful shortcuts:

The minor \(M_{ij}\) of \(A\in \operatorname{Mat}(n, n)\) is the determinant of the \((n-1) \times (n-1)\) matrix obtained by deleting the \(i\)th row and \(j\)th column from \(A\).

The cofactor \(C_{ij}\) is the scalar defined by \[\begin{align*} C_{ij} \mathrel{\vcenter{:}}=(-1)^{i+j} M_{ij} .\end{align*}\]

For any fixed \(i\), there is a formula \[\begin{align*} \det(A) = \sum_{j=1}^n a_{ij} C_{ij} .\end{align*}\]

Let \[\begin{align*} A = \left[\begin{array}{lll} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array}\right] .\end{align*}\]

Then \[\begin{align*} \det A = 1 \cdot\left|\begin{array}{ll} 5 & 6 \\ 8 & 9 \end{array}\right|-2 \cdot\left|\begin{array}{ll} 4 & 6 \\ 7 & 9 \end{array}\right|+3 \cdot\left|\begin{array}{ll} 4 & 5 \\ 7 & 8 \end{array}\right| = 1 \cdot(-3)-2 \cdot(-6)+3 \cdot(-3) = 0 .\end{align*}\]
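The cofactor expansion is directly recursive; a pure-Python sketch reproducing the example above:

```python
def det(M):
    """Determinant via cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0
```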

\(\det(A)\) can be computed by reducing \(A\) to row echelon form (which is upper triangular) and keeping track of the following effects:

Inverting a Matrix

Given a linear system \(A\mathbf{x} = \mathbf{b}\), writing \(\mathbf{x} = {\left[ {x_1, \cdots, x_n} \right]}\), there is a formula \[\begin{align*} x_i = \frac{\det(B_i)}{\det(A)} \end{align*}\] where \(B_i\) is \(A\) with the \(i\)th column replaced by \(\mathbf{b}\).
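For a \(2\times 2\) system the formula is short enough to spell out completely; a pure-Python sketch (the system is an arbitrary example):

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def cramer2(A, b):
    """Solve Ax = b for a 2x2 invertible A via Cramer's rule."""
    D = det2(A)
    B1 = [[b[0], A[0][1]], [b[1], A[1][1]]]  # column 1 replaced by b
    B2 = [[A[0][0], b[0]], [A[1][0], b[1]]]  # column 2 replaced by b
    return det2(B1) / D, det2(B2) / D

x1, x2 = cramer2([[2, 1], [1, 3]], [3, 5])
print(x1, x2)  # 0.8 1.4
```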

Under the equivalence relation of elementary row operations, there is an equivalence of augmented matrices: \[\begin{align*} \left[\begin{array}{c|c} A & I \end{array}\right] \sim \left[\begin{array}{c|c} I & A^{-1} \end{array}\right] \end{align*}\] where \(I\) is the \(n\times n\) identity matrix.

\[\begin{align*} A^{-1} = {1\over \det(A)} {\left[ {C_{ij}} \right]}^t .\end{align*}\] where \(C_{ij}\) is the cofactor at position \(i,j\); the transposed matrix of cofactors is the adjugate of \(A\).

\[\begin{align*} \left(\begin{array}{cc} a& b \\ c& d \end{array}\right)^{-1} = {1 \over a d - b c} \left(\begin{array}{rr} d & -b \\ -c & a \end{array}\right) \quad \text{ where } ad-bc \ne 0 \end{align*}\]

What’s the pattern?

  1. Always divide by determinant
  2. Swap the diagonals
  3. Hadamard product with checkerboard

\[\begin{align*} \begin{bmatrix} + & - \\ - & + \end{bmatrix} \end{align*}\]

\[\begin{align*} A^{-1} \mathrel{\vcenter{:}}= \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} ^{-1} = {1 \over {\det A}} \begin{bmatrix} e i - f h & -(b i - c h) & b f - c e \\ -(d i - f g) &a i - c g &-(a f -c d) \\ d h - e g & -(a h - b g)& a e - b d \end{bmatrix} .\end{align*}\]

The pattern:

  1. Divide by determinant
  2. Each entry is determinant of submatrix of \(A\) with corresponding col/row deleted
  3. Hadamard product with checkerboard

\[\begin{align*} \begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix} \end{align*}\]

  4. Transpose at the end!

Bases for Spaces of a Matrix

Let \(A\in \operatorname{Mat}(m, n)\) represent a map \(T:{\mathbb{R}}^n\to {\mathbb{R}}^m\).


\[\begin{align*} \dim \operatorname{rowspace}(A) = \dim \operatorname{colspace}(A) .\end{align*}\]

The row space

\[\begin{align*} \operatorname{rowspace}(A) = \operatorname{im}(A^T) \subseteq {\mathbb{R}}^n .\end{align*}\]

Reduce to RREF, and take nonzero rows of \(\mathrm{RREF}(A)\).

The column space

\[\begin{align*} \operatorname{im}(T) = \operatorname{colspace}(A) \subseteq {\mathbb{R}}^m \end{align*}\]

Reduce to RREF, and take columns with pivots from original \(A\).

Not enough pivots implies the columns don't span the entire codomain.

The nullspace

\[\begin{align*} \ker(T) = \operatorname{nullspace}(A) \subseteq {\mathbb{R}}^n \end{align*}\]

Reduce to RREF; columns without pivots correspond to free variables. Convert back to equations and pull the free variables out as scalar multipliers.


For each \(\lambda \in \operatorname{Spec}(A)\), compute a basis for \(\ker(A - \lambda I)\).

Eigenvalues and Eigenvectors

A vector \(\mathbf{v}\) is said to be an eigenvector of \(A\) with eigenvalue \(\lambda\in \operatorname{Spec}(A)\) iff \[\begin{align*} A\mathbf{v} = \lambda \mathbf{v} \end{align*}\] For a fixed \(\lambda\), the corresponding eigenspace \(E_\lambda\) is the span of all such vectors.

For \(\lambda\in \operatorname{Spec}(A)\), \[\begin{align*} \mathbf{v}\in E_\lambda \iff \mathbf{v} \in \ker(A-\lambda I) .\end{align*}\]

Some miscellaneous useful facts:

Finding generalized eigenvectors


An \(n\times n\) matrix \(P\) is diagonalizable iff its eigenvectors span all of \(\mathbb{R}^n\) (i.e. there are \(n\) linearly independent eigenvectors).

Equivalently, \(P\) is diagonalizable iff there is a basis of \(\mathbb{R}^n\) consisting of eigenvectors of \(P\).

Useful Counterexamples

\[\begin{align*} A \mathrel{\vcenter{:}}=\left[ \begin{array} { c c } { 1 } & { 1 } \\ { 0 } & { 1 } \end{array} \right] &\implies A^n = \left[ \begin{array} { c c } { 1 } & { n } \\ { 0 } & { 1 } \end{array} \right], && \operatorname{Spec}(A) = [1,1] \\ A \mathrel{\vcenter{:}}=\left[ \begin{array} { c c } { 1 } & { 1 } \\ { 0 } & { - 1 } \end{array} \right] &\implies A^2 = I_2, && \operatorname{Spec}(A) = [1, -1] \end{align*}\]
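Both identities are easy to verify by repeated multiplication; a pure-Python sketch:

```python
def matmul2(A, B):
    """Multiply two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 1]]
P = [[1, 0], [0, 1]]
for _ in range(5):
    P = matmul2(P, A)
print(P)  # [[1, 5], [0, 1]] -- A^n has n in the corner

B = [[1, 1], [0, -1]]
print(matmul2(B, B))  # [[1, 0], [0, 1]] -- B^2 = I
```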

Example Problems

Determine a basis for \[\begin{align*} S = \left\{a_0 + a_1 x + a_2 x^2\mathrel{\Big|}a_0,a_1,a_2 \in \mathbb{R} \land a_0 - a_1 -2a_2 =0\right\}. \end{align*}\]

Let \(a_2=t, a_1=s, a_0=s+2t\), then \[\begin{align*} S &= \left\{ (s+2t) + (sx+tx^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \left\{ (s+sx) + (2t+tx^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \left\{ s(1+x) + t(2+x^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \text{span}\left\{(1+x),(2+x^2)\right\} \end{align*}\]

and a basis for \(S\) is \[\begin{align*} \left\{(1+x), (2+x^2)\right\} \end{align*}\]

: If \(V\) is an \(n\)-dimensional vector space, then every set \(S\) with fewer than \(n\) vectors can be extended to a basis for \(V\).

False: only linearly independent sets with fewer than \(n\) vectors can be extended to form a basis for \(V\).

: The set of all 3 x 3 upper triangular matrices forms a three-dimensional subspace of \(M_{3}(\mathbb{R})\).

This set forms a 6-dimensional subspace. A basis for this space would require six elements.

Given \(A=\) \[\begin{align*}\begin{bmatrix} 1 & 3 \\ -2 & -6 \end{bmatrix}\end{align*}\] what is the dimension of the null space of \(A\)?

The augmented matrix for the system \(A\mathbf{x} = \mathbf{0}\) is \[\begin{align*}\left[\begin{array}{cc|c} 1 & 3 & 0 \\ 0 & 0 & 0 \end{array}\right]\end{align*}\] which has one free variable.

Writing one variable in terms of another results in \(x_1 + 3x_2 = 0 \Rightarrow x_1 = -3x_2\).

Let \(x_2 = r\) where \(r \in \mathbb{R}\), then \(S = \left\{ \mathbf{x} \in \mathbb{R}^2 : \mathbf{x} = r(-3,1), r \in \mathbb{R}\right\} = \text{span}\left\{(-3,1)\right\}\).

So, the set \(B = \left\{(-3,1)\right\}\) is a basis for the null space of \(A\), and the dimension of the null space is 1.

Let \(S\) be the subspace of \(\mathbb{R}^3\) that consists of all solutions to the equation \(x-3y+z = 0\). Determine a basis for \(S\), and find dim[\(S\)].

The first goal is to find a way to express the set of 3-tuples that satisfy this equation.

Let \(y=r\) and \(z=s\), then \(x=3r-s\). Then vectors \(\mathbf{v}\) that satisfy the equation are all of the form \[\begin{align*} \mathbf{v} = (3r-s, r, s) = (3r,r,0)+(-s,0,s) = r(3,1,0) + s(-1,0,1). \end{align*}\] (Note - the goal here is to separate the dependent variables into different vectors so they can be written as a linear combination of something.)

The set \(S\) that satisfies this equation is then \[\begin{align*} S &= \left\{ \mathbf{v} \in \mathbb{R}^3 : \mathbf{v} =r(3,1,0) + s(-1,0,1) \land r,s\in\mathbb{R} \right\} \\ &= \text{span}\left\{ (3,1,0), (-1,0,1)\right\} \end{align*}\]

All that remains is to check that the vectors in this span are linearly independent. This can be done by showing that \[\begin{align*} a(3,1,0) + b(-1,0,1) = (0,0,0) \end{align*}\] implies \(a=b=0\).

Since the two vectors are linearly independent and span the solution set \(S\), they form a basis for \(S\) of dimension 2.

Determine a basis for the subspace of \(M_2(\mathbb{R})\) spanned by \[\begin{align*}A_1 = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}\end{align*}\]

\[\begin{align*}\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}\end{align*}\]

\[\begin{align*}A_2 = \begin{bmatrix} -1 & 4 \\ 1 & 1 \end{bmatrix}\end{align*}\]

\[\begin{align*}A_3 = \begin{bmatrix} 5 & -6 \\ -5 & 1 \end{bmatrix}\end{align*}\]

Note that because the set contains the zero matrix, it is linearly dependent. So only consider the other three, as they span the same subspace as the original set.

First, determine if \(\left\{ A_1, A_2, A_3\right\}\) is linearly independent. Start with the equation \[\begin{align*} c_1A_1 + c_2A_2 + c_3A_3 = 0_2 \end{align*}\]

which gives \[\begin{align*} c_1 - c_2 + 5c_3 &= 0 \\ 3c_1 + 4c_2 - 6c_3 &= 0 \\ -c_1 + c_2 - 5c_3 &= 0 \\ 2c_1 + c_2 + c_3 &= 0 \end{align*}\]

which has the solution \((-2r,3r,r)\). So the set is linearly dependent by the relation \[\begin{align*} -2A_1 + 3A_2 + A_3 = 0 \text{ or }\\ A_3 = 2A_1 - 3A_2 \end{align*}\]

So \(\left\{A_1, A_2\right\}\) spans the same subspace as the original set. It is also linearly independent, and therefore forms a basis for the original subspace.

Let \(A, B, C \in M_2 (\mathbb{R})\). Define \(\langle A,B\rangle = a_{11}b_{11}+2a_{12}b_{12}+3a_{21}b_{21}\). Does this define an inner product on \(M_2 (\mathbb{R})\)?

Instead, let \(\langle A,B\rangle = a_{11} + b_{22}\). Does this define an inner product on \(M_2(\mathbb{R})\)?

Let \(p=a_0 + a_1 x + a_2 x^2\) and \(q=b_0 + b_1 x + b_2 x^2\). Define \(\langle p,q\rangle = \sum_{i=0}^{2}(i+1)a_i b_i\). Does this define an inner product on \(P_2\)?

Linear Algebra: Advanced Topics

Changing Basis

The transition matrix from a given basis \(\mathcal{B} = \left\{{\mathbf{b}_i}\right\}_{i=1}^n\) to the standard basis is given by \[\begin{align*} A\mathrel{\vcenter{:}}= \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \end{bmatrix} ,\end{align*}\] and the transition matrix from the standard basis to \(\mathcal{B}\) is \(A^{-1}\).

Orthogonal Matrices

Given a notion of orthogonality for vectors, we can extend this to matrices. A square matrix \(Q\) is said to be orthogonal iff \(QQ^T = Q^TQ = I\). For rectangular matrices, we have the following characterizations: \[\begin{align*} QQ^T = I \implies &\text{The rows of } Q \text { are orthonormal,} \\ Q^TQ = I \implies &\text{The columns of } Q \text{ are orthonormal.} \end{align*}\]

To remember which condition is which, just recall that matrix multiplication \(AB\) takes the inner product between the rows of \(A\) and the columns of \(B\). So if, for example, we want to inspect whether or not the columns of \(Q\) are orthogonal, we should let \(B=Q\) in the above formulation – then we just note that the rows of \(Q^T\) are indeed the columns of \(Q\), so \(Q^TQ\) computes the inner products between all pairs of the columns of \(Q\) and stores them in a matrix.


An orthogonal projection \(P\) induces a decomposition \[\begin{align*} \operatorname{dom}(P) = \ker(P) \oplus \ker(P)^\perp .\end{align*}\]

Distance from a point \(\mathbf{p}\) to a line \(\mathbf{a} + t\mathbf{b}\): let \(\mathbf{w} = \mathbf{p} - \mathbf{a}\); the distance is then \({\left\lVert {\mathbf{w} - \operatorname{Proj}_{\mathbf{b}}(\mathbf{w})} \right\rVert}\).

\[\begin{align*} \operatorname{Proj}_{\operatorname{range}(A)}(\mathbf{x}) = A(A^t A)^{-1} A^t \mathbf{x} .\end{align*}\]

Mnemonic: \[\begin{align*} P \approx {A^t A \over AA^t} .\end{align*}\]

With an inner product in hand and a notion of orthogonality, we can define a notion of orthogonal projection of one vector onto another, and more generally of a vector onto a subspace spanned by multiple vectors.

Projection Onto a Vector

Say we have two vectors \(\mathbf{x}\) and \(\mathbf{y}\), and we want to define “the component of \(\mathbf{x}\) that lies along \(\mathbf{y}\)”, which we’ll call \(\mathbf{p}\). We can work out what the formula should be using a simple model:

We notice that whatever \(\mathbf{p}\) is, it will lie in the direction of \(\mathbf{y}\), and thus \(\mathbf{p} = \lambda \widehat{\mathbf{y}}\) for some scalar \(\lambda\), where in fact \(\lambda = {\left\lVert {\mathbf{p}} \right\rVert}\) since \({\left\lVert {\widehat{\mathbf{y}}} \right\rVert} = 1\). We will find that \(\lambda = {\left\langle {\mathbf{x}},~{\widehat{\mathbf{y}}} \right\rangle}\), and so \[\begin{align*} \mathbf{p} = {\left\langle {\mathbf{x}},~{\widehat{\mathbf{y}}} \right\rangle}\widehat{\mathbf{y}} = \frac{{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle}}{{\left\langle {\mathbf{y}},~{\mathbf{y}} \right\rangle}} \mathbf{y} .\end{align*}\]

Notice that we can then form a “residual” vector \(\mathbf{r} = \mathbf{x} - \mathbf{p}\), which should satisfy \(\mathbf{r} ^\perp\mathbf{p}\). If we were to let \(\lambda\) vary as a function of a parameter \(t\) (making \(\mathbf{r}\) a function of \(t\) as well) we would find that this particular choice minimizes \({\left\lVert {\mathbf{r} (t)} \right\rVert}\).

Projection Onto a Subspace

In general, supposing one has a subspace \(S = \mathrm{span}\left\{{\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_n}\right\}\) and (importantly!) the \(\mathbf{y}_i\) are orthogonal, then the projection \(\mathbf{p}\) of \(\mathbf{x}\) onto \(S\) is given by the sum of the projections onto each basis vector, yielding

\[\begin{align*} \mathbf{p} = \sum_{i=1}^n \frac{{\left\langle {\mathbf{x}},~{\mathbf{y}_i} \right\rangle}}{{\left\langle {\mathbf{y}_i},~{\mathbf{y}_i} \right\rangle}} \mathbf{y}_i = \sum_{i=1}^n {\left\langle {\mathbf{x}},~{\mathbf{y}_i} \right\rangle} \widehat{\mathbf{y}_i} .\end{align*}\]

Note: this is part of why having an orthogonal basis is desirable!
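The sum-of-projections formula, as a pure-Python sketch (note that the basis vectors must be mutually orthogonal, as stressed above):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(x, ys):
    """Project x onto span{ys}; the ys must be mutually orthogonal."""
    p = [0.0] * len(x)
    for y in ys:
        c = dot(x, y) / dot(y, y)  # coefficient <x, y> / <y, y>
        p = [pi + c * yi for pi, yi in zip(p, y)]
    return p

# projecting (1, 2, 3) onto the xy-plane drops the z-component
print(project([1.0, 2.0, 3.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
# [1.0, 2.0, 0.0]
```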

Letting \(A = [\mathbf{y}_1, \mathbf{y}_2, \cdots]\), the following matrix projects vectors onto \(S\), expressing them in terms of the basis \(\mathbf{y}_i\): \[\begin{align*} \tilde P_A = (A^TA)^{-1}A^T, \end{align*}\]

while this matrix performs the projection and expresses it in terms of the standard basis: \[\begin{align*} P_A = A(A^TA)^{-1}A^T. \end{align*}\]

Equation of a plane: given a point \(\mathbf{p}_0\) on a plane and a normal vector \(\mathbf{n}\), any vector \(\mathbf{x}\) on the plane satisfies \[\begin{align*} {\left\langle {\mathbf{x} - \mathbf{p}_0},~{\mathbf{n}} \right\rangle} = 0 \end{align*}\]

To find the distance between a point \(\mathbf{a}\) and a plane through \(\mathbf{p}_0\) with unit normal \(\widehat{\mathbf{n}}\), we need only project \(\mathbf{a} - \mathbf{p}_0\) onto the subspace spanned by the normal: \[\begin{align*} d = \left\lvert {\left\langle {\mathbf{a} - \mathbf{p}_0},~{\widehat{\mathbf{n}}} \right\rangle} \right\rvert .\end{align*}\]

One important property of projections is that for any vector \(\mathbf{v}\) and for any subspace \(S\), we have \(\mathbf{v} - P_S(\mathbf{v}) \in S^\perp\). Moreover, if \(\mathbf{v} \in S^\perp\), then \(P_S(\mathbf{v})\) must be zero. This follows by noting that in the projection formula above, every inner product appearing in the sum vanishes, by definition of \(\mathbf{v} \in S^\perp\), and so the projection is zero.

Least Squares

\(\mathbf{x}\) is a least squares solution to \(A\mathbf{x} = \mathbf{b}\) iff \[\begin{align*} A^t A \mathbf{x} = A^t \mathbf{b} \end{align*}\]

The general setup here is that we would like to solve \(A\mathbf{x} = \mathbf{b}\) for \(\mathbf{x}\), where \(\mathbf{b}\) is not in fact in the range of \(A\). We thus settle for a unique “best” solution \(\tilde{\mathbf{x}}\) such that the error \({\left\lVert {A\tilde{\mathbf{x}} - \mathbf{b}} \right\rVert}\) is minimized.

Geometrically, the solution is given by projecting \(\mathbf{b}\) onto the column space of \(A\). To see why this is the case, define the residual vector \(\mathbf{r} = A\tilde{\mathbf{x}} - \mathbf{b}\). We then seek to minimize \({\left\lVert {\mathbf{r}} \right\rVert}\), which happens exactly when \(\mathbf{r} ^\perp\operatorname{im}A\). But this happens exactly when \(\mathbf{r} \in (\operatorname{im}A)^\perp\), which by the fundamental subspaces theorem, is equivalent to \(\mathbf{r} \in \ker A^T\).

From this, we get the equation \[\begin{align*} A^T \mathbf{r} = \mathbf{0} \\ \implies A^T(A \tilde{\mathbf{x}} - \mathbf{b}) = \mathbf{0}\\ \implies A^TA\tilde{\mathbf{x}} = A^T \mathbf{b}, \end{align*}\]

where the last line is described as the normal equations.

If \(A\) is an \(m\times n\) matrix and is of full rank, so it has \(n\) linearly independent columns, then one can show that \(A^T A\) is nonsingular, and we thus arrive at the least-squares solution \[\begin{align*} \tilde{\mathbf{x}} = (A^TA)^{-1}A^T \mathbf{b} \hfill\blacksquare \end{align*}\]
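The normal equations in action, fitting a line through three points; a sketch assuming NumPy (the data is illustrative):

```python
import numpy as np

# fit y = c0 + c1 * t to the points (0,0), (1,1), (2,3)
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([0., 1., 3.])

# normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # approximately [-1/6, 3/2]

# the residual is orthogonal to the column space of A
r = A @ x - b
print(A.T @ r)  # approximately [0, 0]
```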

These equations can also be derived explicitly using Calculus applied to matrices, vectors, and inner products. This requires the use of the following formulas: \[\begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{a}} \right\rangle} &= \mathbf{a} \\ {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{A}\mathbf{x}} \right\rangle} &= (A+A^T)\mathbf{x} \end{align*}\]

as well as the adjoint formula \[\begin{align*} {\left\langle {A\mathbf{x}},~{\mathbf{y}} \right\rangle} = {\left\langle {\mathbf{x}},~{A^T \mathbf{y}} \right\rangle} .\end{align*}\]

From these, by letting \(A=I\) we can derive \[\begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\lVert {\mathbf{x}} \right\rVert}^2 = {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle} = 2\mathbf{x} .\end{align*}\]

The derivation proceeds by solving the equation \[\begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\lVert {\mathbf{b} - A\mathbf{x}} \right\rVert}^2 = \mathbf{0} .\end{align*}\]

Normal Forms

Every square matrix over \(\mathbb{C}\) is similar to a matrix in Jordan canonical form.


The QR Decomposition

Gram-Schmidt is often computed to find an orthonormal basis for, say, the range of some matrix \(A\). With a small modification to this algorithm, we can write \(A = QR\) where \(R\) is upper triangular and \(Q\) has orthonormal columns.

Why is this useful? One reason is that this also allows for a particularly simple expression of least-squares solutions. If \(A=QR\) and \(A\) has full column rank, then \(R\) is invertible, and a bit of algebraic manipulation shows that \[\begin{align*} \tilde{\mathbf{x}} = R^{-1}Q^T\mathbf{b} .\end{align*}\]

How does it work? You simply perform Gram-Schmidt to obtain \(\left\{{\mathbf{u}_i}\right\}\), then \[\begin{align*}Q = [\mathbf{u}_1, \mathbf{u}_2, \cdots ].\end{align*}\]

The matrix \(R\) can then be written as

\[\begin{align*} r_{ij} = \begin{cases} {\left\langle {\mathbf{u}_i},~{\mathbf{x}_j} \right\rangle}, & i\leq j, \\ 0, & \text{else}. \end{cases} \end{align*}\]

Explicitly, this yields the matrix \[\begin{align*} R = \begin{bmatrix} {\left\langle {\mathbf{u}_1},~{\mathbf{x}_1} \right\rangle} & {\left\langle {\mathbf{u}_1},~{\mathbf{x}_2} \right\rangle} & {\left\langle {\mathbf{u}_1},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ 0 & {\left\langle {\mathbf{u}_2},~{\mathbf{x}_2} \right\rangle} & {\left\langle {\mathbf{u}_2},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ 0 & 0 & {\left\langle {\mathbf{u}_3},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ \vdots & \vdots & \vdots & \ddots \\ \end{bmatrix} \end{align*}\]
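A minimal sketch of this construction in pure Python (classical Gram-Schmidt, with no pivoting or re-orthogonalization safeguards; the \(3\times 2\) example columns are made up):

```python
# Sketch of the QR construction above: Gram-Schmidt on the columns x_j
# of A gives orthonormal u_i, and r_ij = <u_i, x_j> for i <= j.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_gram_schmidt(cols):
    n = len(cols)
    us = []                               # the orthonormal u_i
    R = [[0.0] * n for _ in range(n)]
    for j, x in enumerate(cols):
        v = list(x)
        for i, u in enumerate(us):
            R[i][j] = dot(u, x)           # r_ij = <u_i, x_j>
            v = [vk - R[i][j] * uk for vk, uk in zip(v, u)]
        norm = dot(v, v) ** 0.5
        R[j][j] = norm                    # equals <u_j, x_j>
        us.append([vk / norm for vk in v])
    return us, R

A_cols = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
Q_rows, R = qr_gram_schmidt(A_cols)       # Q_rows[i] is the column u_i

# Check A = QR column by column: x_j = sum_i r_ij * u_i
for j, x in enumerate(A_cols):
    recon = [sum(R[i][j] * Q_rows[i][k] for i in range(len(Q_rows)))
             for k in range(len(x))]
    print([round(c, 10) for c in recon], x)
```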

Appendix: Lists of things to know

Textbook: Leon, Linear Algebra with Applications



Lower-division review

Things to compute

Things to prove

Ordinary Differential Equations

Techniques Overview

\[\begin{align*} p(y)y' = q(x) && \hspace{10em} \text{separable} \\ \\ y'+p(x)y = q(x) && \text{integrating factor} \\ \\ y' = f(x,y), f(tx,ty) = f(x,y) && y = xV(x)\text{ COV reduces to separable} \\ \\ y' +p(x)y = q(x)y^n && \text{Bernoulli, divide by $y^n$ and COV $u = y^{1-n}$} \\ \\ M(x,y)dx + N(x,y)dy = 0 && M_y = N_x: \phi(x,y) = c (\phi_x =M, \phi_y = N) \\ \\ P(D)y = 0 && x^ke^{rx} \text{ for each root } \end{align*}\]

where a complex root \(z = a + bi\) contributes \(e^{zx}\), yielding the real solutions \(e^{ax}\cos bx,\ e^{ax}\sin bx\).

Types of Equations

Linear Homogeneous

General form: \[\begin{align*} y^{(n)} + c_{n-1} y^{(n-1)} + \cdots + c_2y'' + c_1y' + c_0y = 0 \\ p(D)y = \prod (D-r_i)^{m_i} y= 0 \end{align*}\] where \(p\) is a polynomial in the differential operator \(D\) with roots \(r_i\):

Example: by cases, second order equation of the form \[\begin{align*}ay'' + by' + cy = 0\end{align*}\]

-   Two distinct roots: \(c_1 e^{r_1 x} + c_2 e^{r_2 x}\)
-   One real root: \(c_1 e^{rx} + c_2 x e^{rx}\)
-   Complex conjugates \(\alpha \pm i \beta\): \(e^{\alpha x}(c_1 \cos \beta x + c_2 \sin \beta x)\)
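The case analysis can be mechanized via the discriminant of the characteristic polynomial; a small sketch (the returned strings are informal, and the sample coefficients are hypothetical):

```python
# Sketch: classify ay'' + by' + cy = 0 by the discriminant of the
# characteristic polynomial a r^2 + b r + c, per the three cases above.
import cmath

def general_solution(a, b, c):
    disc = b * b - 4 * a * c
    r1 = (-b + cmath.sqrt(disc)) / (2 * a)
    r2 = (-b - cmath.sqrt(disc)) / (2 * a)
    if disc > 0:                         # two distinct real roots
        return f"c1*exp({r1.real}x) + c2*exp({r2.real}x)"
    if disc == 0:                        # one repeated real root
        return f"(c1 + c2*x)*exp({r1.real}x)"
    alpha, beta = r1.real, abs(r1.imag)  # complex conjugate pair
    return f"exp({alpha}x)*(c1*cos({beta}x) + c2*sin({beta}x))"

print(general_solution(1, -3, 2))   # roots 1 and 2
print(general_solution(1, 2, 1))    # double root -1
print(general_solution(1, 0, 1))    # roots +-i
```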

Linear Inhomogeneous

General form: \[\begin{align*} y^{(n)} + c_{n-1} y^{(n-1)} + \cdots + c_2y'' + c_1y' + c_0y = F(x) \\ p(D)y = \prod (D-r_i)^{m_i} y = F(x) \end{align*}\]

Then solutions are of the form \(y_c + y_p\), where \(y_c\) is the solution to the associated homogeneous system and \(y_p\) is a particular solution.

Methods of obtaining particular solutions

Undetermined Coefficients

Useful Annihilators: \[\begin{align*} &F(x) = p(x): & D^{\deg(p)+1} \\ &F(x) = p(x)e^{ax}: & (D-a)^{\deg(p)+1}\\ &F(x) = \cos(ax) + \sin(ax): & D^2 + a^2\\ &F(x) = e^{ax}(a_0\cos(bx) + b_0\sin(bx)): & (D-z)(D-{\overline{{z}}}) = D^2 -2aD + a^2 + b^2 \\ &F(x) = p(x)e^{ax}\cos(bx) + q(x)e^{ax}\sin(bx): & \left( (D-z)(D-{\overline{{z}}})\right)^{\max(\deg(p), \deg(q))+ 1} \end{align*}\] where \(z \mathrel{\vcenter{:}}= a + bi\).

Variation of Parameters

Reduction of Order

Systems of Differential Equations

General form: \[\begin{align*} \frac{\partial \mathbf{x}(t) }{\partial t} = A\mathbf{x}(t) + \mathbf{b}(t) \iff \mathbf{x}'(t) = A\mathbf{x}(t) + \mathbf{b}(t) \end{align*}\]

General solution to homogeneous equation: \[\begin{align*} c_1\mathbf{x_1}(t) + c_2\mathbf{x_2}(t)+ \cdots +c_n\mathbf{x_n}(t) = \mathbf{X}(t)\mathbf{c} \end{align*}\]

If \(A\) is a matrix of constants, then \(\mathbf{x}(t) = e^{\lambda_i t}~\mathbf{v}_i\) is a solution for each eigenvalue/eigenvector pair \((\lambda_i, \mathbf{v}_i)\). If \(A\) is defective, you'll need generalized eigenvectors.

Inhomogeneous Equation: particular solutions given by \[\begin{align*} \mathbf{x}_p(t) = \mathbf{X}(t) \int^t \mathbf{X}^{-1}(s)\mathbf{b}(s) ~ds \end{align*}\]

Laplace Transforms

Definitions: \[\begin{align*} H_ { a } ( t ) \mathrel{\vcenter{:}}=\left\{ \begin{array} { l l } { 0 , } & { 0 \leq t < a } \\ { 1 , } & { t \geq a } \end{array} \right.\\ \delta(t): \int_{\mathbb{R}}\delta(t-a)f(t)~dt &= f(a),\quad \int_{\mathbb{R}}\delta(t-a)~dt = 1\\ (f \ast g )(t) &= \int_0^t f(t-s)g(s)~ds \\ L[f(t)] &= L[f] =\int_0^\infty e^{-st}f(t)dt = F(s) .\end{align*}\] Useful property: for \(a\leq b\), \(H_a(t) - H_b(t) = \indic{[a,b]}\). \[\begin{align*} t^n, n\in{\mathbb{N}}\quad&\iff &\frac{n!}{s^{n+1}},\quad &s > 0 \\ t^{-\frac{1}{2}} \quad&\iff &\sqrt{\pi} s^{-\frac{1}{2}}\quad & s>0\\ e^{at} \quad&\iff &\frac{1}{s-a},\quad &s > a \\ \cos(bt) \quad&\iff &\frac{s}{s^2+b^2},\quad &s>0 \\ \sin(bt) \quad&\iff &\frac{b}{s^2+b^2},\quad &s>0 \\ \cosh(bt) \quad&\iff &\frac{s}{s^2 - b^2},\quad &s>{\left\lvert {b} \right\rvert} \\ \sinh(bt) \quad&\iff &\frac{b}{s^2-b^2},\quad &s>{\left\lvert {b} \right\rvert} \\ \delta(t-a) \quad&\iff &e^{-as} \quad& \\ H_a(t) \quad&\iff &s^{-1}e^{-as}\quad& \\ e^{at}f(t) \quad&\iff &F(s-a)\quad & \\ H_a(t)f(t-a) \quad&\iff &e^{-as}F(s)& \\ f'(t) \quad&\iff & sL(f) - f(0) & \\ f''(t) \quad&\iff &s^2L(f) -sf(0) - f'(0) &\\ f^{(n)}(t) \quad&\iff & s^nL(f) - \sum_{i=0}^{n-1} s^{n-1-i}f^{(i)}(0) & \\ (f \ast g)(t) \quad&\iff &F(s)G(s)\quad & \end{align*}\]


\[\begin{align*} L[e^{at} f(t)] = \int_0^\infty e^{(a-s)t}f(t)dt = F(s-a) .\end{align*}\]

The general technique for solving differential equations with Laplace Transforms:

-   Take the Laplace Transform of all terms on both sides.
-   Solve for \(L[y]\) in terms of \(s\).
-   Attempt an inverse Laplace Transform: this may involve partial fraction decomposition, completing the square, and splitting numerators to match terms with known inverse transforms.

The Wronskian

For a collection of \(n\) functions \(f_i: {\mathbb{R}}^n \to {\mathbb{R}}\), define the \(n\times 1\) column vector \[\begin{align*} W(f_i)(\mathbf{p}) \mathrel{\vcenter{:}}= \begin{bmatrix} f_i(\mathbf{p}) \\ f_i'(\mathbf{p}) \\ f_i''(\mathbf{p}) \\ \vdots \\ f_i^{(n-1)}(\mathbf{p}) \end{bmatrix} .\end{align*}\]

The Wronskian of this collection is defined as \[\begin{align*} W(f_1, \cdots, f_n)(\mathbf{p}) \mathrel{\vcenter{:}}= \det \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ W(f_1)(\mathbf{p}) & W(f_2)(\mathbf{p}) & \cdots & W(f_n)(\mathbf{p})\\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \end{bmatrix} .\end{align*}\]

If there exists \(x_0 \in I\) with \(W(x_0) \neq 0\), then the set of functions \(\left\{{f_i}\right\}\) is linearly independent on \(I\). The converse fails in general.

\(W \equiv 0\) on \(I\) does not imply that \(\left\{{f_i}\right\}\) is linearly dependent! Counterexample: \(\left\{{x^2, x{\left\lvert {x} \right\rvert}}\right\}\) on \((-1,1)\), where \(W \equiv 0\) but neither function is a scalar multiple of the other.

Linear Equations of Order \(n\)

The standard form of such equations is \[\begin{align*} y^{(n)} + a_1y^{(n-1)} + a_2y^{(n-2)} + \cdots + a_{n-2}y'' + a_{n-1}y' + a_ny = F(x). \end{align*}\]

All solutions will be the sum of the solution to the associated homogeneous equation and a single particular solution.

In the homogeneous case, examine the discriminant of the characteristic polynomial. Three cases arise: positive (distinct real roots), zero (repeated real roots), and negative (complex conjugate roots).

That is, every real root contributes a term of \(ce^{rx}\), while a multiplicity of \(m\) multiplies the solution by a polynomial in \(x\) of degree \(m-1\).

Every pair of complex conjugate roots \(r \pm i\omega\) contributes a term \(e^{rx}(a\cos \omega x + b\sin \omega x)\), where \(r\) is the real part and \(\omega\) the imaginary part.

In the nonhomogeneous case, assume a solution in the most general form of \(F(x)\), and substitute it into the equation to solve for constant terms.

Use to reduce a nonhomogeneous equation to a homogeneous one as a polynomial in the operator \(D\).

\(F(x)\) of the form \(e^{ax}\sin(kx)\) can be handled by writing it as the imaginary part of \(e^{(a+ki)x}\).


To Sort

Big List of Notation

\[\begin{align*} C(x) = && \left\{{g\in G : gxg^{-1} = x}\right\} && \subseteq G && \text{Centralizer} \\ C_G(x) = && \left\{{gxg^{-1} : g\in G}\right\} && \subseteq G && \text{Conjugacy Class} \\ G_x = && \left\{{g.x : g\in G}\right\} && \subseteq X && \text{Orbit} \\ x_0 = && \left\{{g\in G : g.x = x}\right\} && \subseteq G && \text{Stabilizer} \\ Z(G) = && \left\{{x\in G: \forall g\in G,~ gxg^{-1} = x}\right\} && \subseteq G && \text{Center} \\ \mathrm{Inn}(G) = && \left\{{\phi_g(x) = gxg^{-1} }\right\} && \subseteq {\operatorname{Aut}}(G) && \text{Inner Aut.} \\ \mathrm{Out}(G) = && {\operatorname{Aut}}(G) / \mathrm{Inn}(G) && && \text{Outer Aut.} \\ N(H) = && \left\{{g\in G: gHg^{-1} = H}\right\} && \subseteq G && \text{Normalizer} \end{align*}\]

Group Theory

Notation: \(H < G\) a subgroup, \(N < G\) a normal subgroup, concatenation is a generic group operation.

Big Theorems

\[\begin{align*} \phi: G \to G' \implies && \frac{G}{\ker{\phi}} \cong &~ \phi(G) \\ H {~\trianglelefteq~}G,~ K < G \implies && \frac{K}{H\cap K} \cong &~ \frac{HK}{H} \\ H,K {~\trianglelefteq~}G,~ K < H \implies && \frac{G/K}{H/K} \cong &~ \frac{G}{H} \end{align*}\]

Cyclic Groups

The Symmetric Group

Ring Theory

Number Theory

Notation and Basic Definitions

\[\begin{align*} (a, b) \mathrel{\vcenter{:}}=\gcd(a, b) && \text{the greatest common divisor} \\ {\mathbb{Z}}_n && \text{the ring of integers} \mod n \\ {\mathbb{Z}}_n^{\times}&& \text{the group of units}\mod n .\end{align*}\]

A function \(f:{\mathbb{Z}}\to {\mathbb{Z}}\) is said to be multiplicative iff \[\begin{align*} (a, b) = 1 \implies f(ab) = f(a) f(b) .\end{align*}\]

Big Theorems


Every integer \(n > 1\) can be written uniquely as \[\begin{align*} n = \prod_{i=1}^m p_i^{k_i} \end{align*}\] where the \(p_i\) are the \(m\) distinct prime divisors of \(n\).

Note that the number of distinct prime factors is \(m\), while the total number of factors is \(\prod_{i=1}^m(k_i + 1)\).


\[\begin{align*} a{~\Bigm|~}b \iff b \equiv 0 \mod a \iff \exists k \text{ such that } ak = b \end{align*}\]

\(\gcd, \operatorname{lcm}\)

\(\gcd(a, b)\) can be computed by taking prime factorizations of \(a\) and \(b\), intersecting the primes occurring, and taking the lowest exponent that appears. Dually, \(\operatorname{lcm}(a, b)\) can be computed by taking the union and the highest exponent.

\[\begin{align*} xy = \gcd{(x,y)}~\mathrm{lcm}{(x,y)} \end{align*}\]

If \(d\mathrel{\Big|}x\) and \(d\mathrel{\Big|}y\), then \[\begin{align*} \gcd(x,y) &= d\cdot \gcd\qty{ \frac x d, \frac y d} \\ \operatorname{lcm}(x,y) &= d\cdot \operatorname{lcm}\qty{ \frac x d, \frac y d} \end{align*}\]

\[\begin{align*} \gcd(x, y, z) &= \gcd(\gcd(x,y), z) \\ \gcd(x, y) &= \gcd(x\bmod y, y) \\ \gcd(x,y) &= \gcd(x-y, y) .\end{align*}\]

The Euclidean Algorithm

\(\gcd(a, b)\) can be computed via the Euclidean algorithm: repeatedly divide and take remainders; the gcd is the last nonzero remainder.
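A sketch of the extended Euclidean algorithm, which produces the gcd along with Bezout coefficients \(s, t\) satisfying \(\gcd(a,b) = sa + tb\):

```python
# Sketch: the extended Euclidean algorithm, returning (g, s, t) with
# g = gcd(a, b) = s*a + t*b.  The gcd is the last nonzero remainder.

def egcd(a, b):
    # Invariants: r0 = s0*a + t0*b and r1 = s1*a + t1*b throughout.
    r0, r1 = a, b
    s0, s1 = 1, 0
    t0, t1 = 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return r0, s0, t0

g, s, t = egcd(240, 46)
print(g, s, t)        # gcd(240, 46) = 2 = s*240 + t*46
```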

Modular Arithmetic

Generally concerned with the multiplicative group \(({\mathbb{Z}}_n, \times)\).

The system \[\begin{align*} \begin{array} { c } { x \equiv a _ { 1 } \mod m _ { 1 } } \\ { x \equiv a _ { 2 } \mod m _ { 2 } } \\ { \vdots } \\ { x \equiv a _ { r } \mod m _ { r } } \end{array} \end{align*}\]

has a unique solution \(x \mod \prod m_i \iff (m_i, m_j) = 1\) for each pair \(i,j\), given by \[\begin{align*} x = \sum_{j=1}^r a_j \frac{\prod_i m_i}{m_j} \left[ \frac{\prod_i m_i}{m_j} \right]^{-1}_{\mod m_j} .\end{align*}\]
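The displayed formula can be sketched directly: with \(M_j = \prod_i m_i / m_j\), the solution is \(\sum_j a_j M_j (M_j^{-1} \bmod m_j)\). This uses Python's three-argument `pow` with exponent \(-1\) for the modular inverse (available in Python 3.8+); the example residues are made up.

```python
# Sketch of the CRT formula above.  Moduli must be pairwise coprime.

def crt(residues, moduli):
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for a, m in zip(residues, moduli):
        Mj = M // m
        x += a * Mj * pow(Mj, -1, m)   # modular inverse (Python >= 3.8)
    return x % M

# x = 2 mod 3, x = 3 mod 5, x = 2 mod 7  -->  x = 23 mod 105
print(crt([2, 3, 2], [3, 5, 7]))
```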

For \((a, n) = 1\), \[\begin{align*} a^{\phi(n)} \equiv 1 \mod n .\end{align*}\]

\[\begin{align*} x^{p} &\equiv x \mod p \\ x^{p-1} &\equiv 1 \mod p \quad \text{ if } p \nmid x \end{align*}\]

Diophantine Equations

Consider \(ax + by = c\). This has integer solutions iff \(c \equiv 0 \mod (a,b)\), i.e. iff \(\gcd(a,b)\) divides \(c\).


If \(x\equiv 0 \mod n\), then \(x\equiv 0 \mod p^k\) for all \(p^k\) appearing in the prime factorization of \(n\).

If there are factors of the modulus in the equation, peel them off with addition, using the fact that \(nk \equiv 0 \mod n\). \[\begin{align*} x &\equiv nk + r \mod n \\ &\equiv r \mod n \end{align*}\]

So take \(x=463, n = 4\), then use \(463 = 4\cdot 115 + 3\) to write \[\begin{align*} 463 &\equiv y \mod 4 \\ \implies 4\cdot 115 + 3 &\equiv y \mod 4 \\ \implies 3&\equiv y\mod 4 .\end{align*}\]

For any \(n\) and any \(d\) dividing \(k\), \[\begin{align*} x^k \mod n \equiv (x^{k/d} \bmod n)^d \mod n .\end{align*}\]

\[\begin{align*} 2^{25} &\equiv (2^5 \mod 5)^5 \mod 5 \\ &\equiv 2^5 \mod 5 \\ &\equiv 2 \mod 5 \end{align*}\]

Make things easier with negatives! For example, \(\mod 5\), \[\begin{align*} 4^{25} &\equiv (-1)^{25} \mod 5\\ &\equiv (-1) \mod 5\\ &\equiv 4 \mod 5 \end{align*}\]
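These reductions are what fast modular exponentiation (square-and-multiply) performs systematically; a minimal sketch:

```python
# Sketch: square-and-multiply, reducing mod n at every step so the
# intermediate numbers stay small.

def powmod(x, k, n):
    result = 1
    x %= n
    while k > 0:
        if k & 1:              # low bit of k set: multiply it in
            result = (result * x) % n
        x = (x * x) % n        # square the base
        k >>= 1
    return result

print(powmod(2, 25, 5))    # 2, matching the worked example above
print(powmod(4, 25, 5))    # 4, since 4 = -1 mod 5
```

Python's built-in `pow(x, k, n)` does the same thing.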


\[\begin{align*} xa = xb \mod n \implies a = b \mod \frac{n}{(x,n)} .\end{align*}\]

\[\begin{align*} x\in {\mathbb{Z}}_n^{\times}\iff (x, n) = 1 ,\end{align*}\] and thus \[\begin{align*} {\mathbb{Z}}_n^\times = \left\{{1\leq x \leq n : (x,n) = 1}\right\} \end{align*}\] and \({\left\lvert {{\mathbb{Z}}_n^\times} \right\rvert} = \phi(n)\).

One can reduce equations by dividing through by a unit. Pick any \(x\) such that \(x{~\Bigm|~}a\) and \(x{~\Bigm|~}b\) with \((x,n) = 1\), then \[\begin{align*} a =b \mod n \implies \frac a x = \frac b x \mod n .\end{align*}\]

The Totient Function

\[\begin{align*} \phi(n) = {\left\lvert {\left\{{1\leq x \leq n : (x,n) = 1}\right\}} \right\rvert} \end{align*}\]

\[\begin{align*} \phi(1) & = {\left\lvert {\left\{{1}\right\}} \right\rvert} = 1 \\ \phi(2) & = {\left\lvert {\left\{{1}\right\}} \right\rvert} = 1 \\ \phi(3) & = {\left\lvert {\left\{{1,2}\right\}} \right\rvert} = 2 \\ \phi(4) & = {\left\lvert {\left\{{1,3}\right\}} \right\rvert} = 2 \\ \phi(5) & = {\left\lvert {\left\{{1,2,3,4}\right\}} \right\rvert} = 4 \end{align*}\]

\[\begin{align*} \phi(p) & = p-1 \\ \phi(p^k) & = p^{k-1}(p-1) \\ \phi(n) &= n\prod_{i=1}^{m} \qty{1 - {1\over p_i}} \\ n &= \sum_{\tiny d{~\Bigm|~}n} \phi(d) \end{align*}\] where \(p_1, \cdots, p_m\) are the distinct prime divisors of \(n\).

All numbers less than \(p\) are coprime to \(p\); there are \(p^k\) numbers up to \(p^k\), and the only ones not coprime to \(p^k\) are the multiples of \(p\), i.e. \(\left\{{p, 2p, \cdots, p^{k}}\right\}\), of which there are \(p^{k-1}\), yielding \(p^k - p^{k-1}\).

Along with the fact that \(\phi\) is multiplicative, so \((p,q) = 1 \implies \phi(pq) = \phi(p)\phi(q)\), compute this for any \(n\) by taking the prime factorization.

With these properties, one can compute: \[\begin{align*} \phi(n) &= \phi\qty{ \prod_i p_i^{k_i}} \\ &= \prod_i p_i^{k_i-1}(p_i-1) \\ &= n \left(\frac{\prod_i (p_i-1)}{\prod_i p_i}\right) \\ &= n\prod_i \qty{ 1 - \frac{1}{p_i}} \end{align*}\]
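A sketch of this computation by trial-division factorization, applying the product formula one prime at a time, together with a check of the divisor-sum identity \(n = \sum_{d \mid n} \phi(d)\):

```python
# Sketch: compute phi(n) via the product formula n * prod (1 - 1/p),
# discovering the distinct primes p by trial division.

def phi(n):
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p       # multiply result by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                           # leftover prime factor
        result -= result // m
    return result

print([phi(k) for k in range(1, 7)])                      # 1 1 2 2 4 2
print(sum(phi(d) for d in range(1, 13) if 12 % d == 0))   # 12
```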

\todo[inline]{Check and explain}

Quadratic Residues

\(x\) is a quadratic residue \(\bmod n\) iff there exists an \(a\) such that \(a^2 = x \mod n\).

In \({\mathbb{Z}}_p^{\times}\) for \(p\) an odd prime, exactly half of the elements (the even powers of a generator) are quadratic residues.

\[\begin{align*} -1\text{ is a quadratic residue in } {\mathbb{Z}}_p \iff p = 1 \mod 4 .\end{align*}\]
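This can be checked with Euler's criterion: for an odd prime \(p\) and \(p \nmid a\), \(a\) is a quadratic residue mod \(p\) iff \(a^{(p-1)/2} \equiv 1 \mod p\). A small sketch (the sample primes are arbitrary):

```python
# Sketch: Euler's criterion, used to test whether -1 is a QR mod p.

def is_qr(a, p):
    # a^((p-1)/2) mod p is 1 for residues, p-1 for non-residues.
    return pow(a, (p - 1) // 2, p) == 1

for p in [5, 13, 17, 3, 7, 11, 19]:
    print(p, p % 4, is_qr(p - 1, p))   # p - 1 represents -1 mod p
```

The printed rows show `True` exactly for the primes with \(p \equiv 1 \mod 4\).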

Primality Tests

If \(n\) is prime and \(n \nmid a\), then \[\begin{align*} a^{n-1} \equiv 1 \mod n \end{align*}\]

If \(n\) is prime, then \[\begin{align*} x^2 \equiv 1 \mod n \implies x \equiv \pm 1 \mod n, \end{align*}\] so exhibiting a nontrivial square root of \(1\) certifies that \(n\) is composite.
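Both facts give compositeness certificates; a sketch (the brute-force square-root search is for illustration only, and the sample values, including the Carmichael number 561, are standard):

```python
# Sketch: a Fermat test at base a, plus a search for nontrivial square
# roots of 1 mod n (the idea behind the Miller-Rabin test).

def fermat_test(n, a):
    # If a^(n-1) != 1 mod n, then n is certainly composite.
    return pow(a, n - 1, n) == 1

def has_nontrivial_sqrt_of_1(n):
    # Any x with x^2 = 1 mod n but x != +-1 mod n certifies n composite.
    return any(x * x % n == 1 for x in range(2, n - 1))

print(fermat_test(97, 2))               # True: 97 is prime
print(fermat_test(15, 2))               # False: 15 is composite
print(fermat_test(561, 2))              # True, yet 561 = 3*11*17
print(has_nontrivial_sqrt_of_1(561))    # True: catches 561 anyway
```

The third line shows why the Fermat test alone is not a primality proof: Carmichael numbers such as 561 pass it for every coprime base.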

Sequences in Metric Spaces

Every bounded sequence in \({\mathbb{R}}^n\) has a convergent subsequence.

In \({\mathbb{R}}^n, X\) is compact \(\iff X\) is closed and bounded.

Necessity of \({\mathbb{R}}^n\): \(X = ({\mathbb{Z}}, d)\) with the discrete metric \(d(x,y) = 1\) for \(x \neq y\) is closed, complete, and bounded, but not compact, since \(\left\{{1,2,\cdots}\right\}\) has no convergent subsequence.

Converse holds iff bounded is replaced with totally bounded


Notation: \(\left\{{a_n}\right\}_{n\in{\mathbb{N}}}\) is a sequence, \(\displaystyle\sum_{i\in{\mathbb{N}}} a_i\) is a series.

Known Examples


A sequence \(\left\{{x_j}\right\}\) converges to \(L\) iff \[\begin{align*} \forall \varepsilon > 0,\, \exists N > 0 \text{ such that } \quad n\geq N \implies {\left\lvert {x_n - L} \right\rvert} < \varepsilon .\end{align*}\]

\[\begin{align*} b_n \leq a_n \leq c_n \text{ and } b_n,c_n \to L \implies a_n \to L \end{align*}\]

If \(\left\{{a_j}\right\}\) monotone and bounded, then \(a_j \to L = \lim\sup a_i < \infty\).

If \(\left\{{a_n}\right\} \subset {\mathbb{R}}\) is Cauchy, i.e. \({\left\lvert {a_m - a_n} \right\rvert} \to 0\) as \(m, n \to \infty\), then \(\left\{{a_i}\right\}\) converges.


Sums (“Series”)

A power series is a function of the form \[\begin{align*} f(x) = \sum_{j=1}^\infty c_j x^j .\end{align*}\]

Known Examples

Conditionally Convergent

\[\begin{align*} \sum_{k=1}^\infty k^p &< \infty &&\iff p < -1 \\ \sum_{k=1}^\infty \frac{1}{k^p} &< \infty &&\iff p > 1 \\ \sum_{k=1}^\infty \frac{1}{k} &= \infty && \end{align*}\]


\[\begin{align*} \sum_{n=1}^\infty \frac{1}{n^2} & < \infty \\ \sum_{n=1}^\infty \frac{1}{n^3} & < \infty \\ \sum_{n=1}^\infty \frac{1}{n^\frac{3}{2}} & < \infty \\ \sum_{n=0}^\infty \frac{1}{n!} & = e \\ \sum_{n=0}^\infty \frac{1}{c^n} & = \frac{c}{c-1}, \quad {\left\lvert {c} \right\rvert} > 1 \\ \sum_{n=0}^\infty (-1)^n \frac{1}{c^n} & = \frac{c}{c+1}, \quad {\left\lvert {c} \right\rvert} > 1 \\ \sum_{n=1}^\infty (-1)^{n+1} \frac{1}{n} & = \ln 2 \end{align*}\]
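Two classical values can be checked numerically with partial sums (the truncation point is chosen arbitrarily): \(\sum 1/n^2 = \pi^2/6\) and the alternating harmonic series \(\sum (-1)^{n+1}/n = \ln 2\).

```python
# Sketch: partial sums converging to pi^2/6 and ln 2.
import math

N = 100_000
s2 = sum(1 / n**2 for n in range(1, N + 1))
alt = sum((-1) ** (n + 1) / n for n in range(1, N + 1))

print(s2, math.pi ** 2 / 6)    # close (truncation error ~ 1/N)
print(alt, math.log(2))        # close (truncation error ~ 1/(2N))
```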


\[\begin{align*} \sum_{n=1}^\infty \frac{1}{n} = \infty \\ \sum_{n=1}^\infty \frac{1}{\sqrt n} = \infty \end{align*}\]


Useful reference: http://math.hawaii.edu/~ralph/Classes/242/SeriesConvTests.pdf

\(a_n\to 0\) does not imply \(\sum a_n < \infty\). Counterexample: the harmonic series.

Absolute convergence \(\implies\) convergence

\[\begin{align*} \sum a_i \text{ converges } \implies \lim_{i\to\infty} a_i = 0 \end{align*}\]

The Big Tests

\[\begin{align*} R =\lim_{n\to\infty} {\left\lvert {\frac{a_{n+1}}{a_n}} \right\rvert} \end{align*}\]

\[\begin{align*} R = \limsup_{n \to \infty} \sqrt[n]{{\left\lvert {a_n} \right\rvert}} \end{align*}\]

-   \(R < 1\): convergent
-   \(R > 1\): divergent
-   \(R = 1\): inconclusive

For \(f\) positive and decreasing with \(f(n) = a_n\), \[\begin{align*} \sum a_n < \infty \iff \int_1^\infty f(x) dx < \infty \end{align*}\]

For \(a_n, b_n \geq 0\), \[\begin{align*} \lim_{n\to\infty}\frac{a_n}{b_n} = L \in (0, \infty) \implies \qty{ \sum a_n < \infty \iff \sum b_n < \infty } \end{align*}\]

\[\begin{align*} a_n \downarrow 0 \implies \sum (-1)^n a_n < \infty \end{align*}\]

\[\begin{align*} \sum_{n=1}^\infty {\left\lVert {f_n} \right\rVert}_\infty < \infty \implies \exists f\text{ such that } {\left\lVert { \sum_{n=1}^\infty f_n - f} \right\rVert}_\infty \to 0 \end{align*}\] In other words, the series converges uniformly.

Slogan: "Convergence of the sup norms implies uniform convergence."

The \(M\) in the name comes from defining \(M_n \mathrel{\vcenter{:}}=\sup_x {\left\lvert {f_n(x)} \right\rvert}\) and requiring \(\sum M_n < \infty\).


Some Pattern Recognition:

Radius of Convergence

Use the fact that \[\begin{align*} \lim_{k\to\infty} {\left\lvert {\frac{a_{k+1}x^{k+1}}{a_kx^k}} \right\rvert} = {\left\lvert {x} \right\rvert}\lim_{k\to\infty} {\left\lvert {\frac{a_{k+1}}{a_k}} \right\rvert} < 1 \implies \sum a_k x^k < \infty ,\end{align*}\] so take \(L \mathrel{\vcenter{:}}=\lim_{k\to\infty} {\left\lvert {a_{k+1} \over a_k} \right\rvert}\) and then obtain the radius as \[\begin{align*} R = \frac{1}{L} = \lim_{k\to\infty} {\left\lvert {a_k \over a_{k+1}} \right\rvert} \end{align*}\]
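A quick numerical sketch, estimating \(R \approx |a_k/a_{k+1}|\) at a single (arbitrarily chosen) index for coefficient sequences with known radii:

```python
# Sketch: estimate the radius of convergence from a ratio of
# consecutive coefficients; the coefficient sequences are hand-picked.
import math

def radius_estimate(a, k=20):
    return abs(a(k) / a(k + 1))

print(radius_estimate(lambda k: 1.0))             # geometric series: R = 1
print(radius_estimate(lambda k: 2.0 ** k))        # sum (2x)^k: R = 1/2
print(radius_estimate(lambda k: 1 / math.factorial(k)))   # e^x: ratio k+1,
# which grows without bound as k -> infinity, i.e. R = infinity
```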

Real Analysis


A function is continuously differentiable iff \(f\) is differentiable and \(f'\) is continuous.


\[\begin{align*} f && \text{a functional }{\mathbb{R}}^n \to {\mathbb{R}}\\ \mathbf{f} && \text{a function } {\mathbb{R}}^n\to {\mathbb{R}}^m \\ A, E, U, V && \text{open sets} \\ A' && \text{the limit points of }A \\ \mkern 1.5mu\overline{\mkern-1.5muA\mkern-1.5mu}\mkern 1.5mu && \text{the closure of }A \\ A^\circ && \text{the interior of }A \\ K && \text{a compact set} \\ \mathcal{R}_A && \text{the space of Riemann integrable functions on }A \\ C^j(A) && \text{the space of }j\text{ times continuously differentiable functions }f: {\mathbb{R}}^n \to {\mathbb{R}}\\ \left\{{f_n}\right\} && \text{a sequence of functions} \\ \left\{{x_n}\right\} && \text{a sequence of real numbers}\\ f_n \to f && \text{pointwise convergence} \\ f_n \rightrightarrows f && \text{uniform convergence} \\ x_n \nearrow x && x_i\leq x_j \text{ and }x_j\text{ converges to }x \\ x_n \searrow x && x_i\geq x_j \text{ and }x_j\text{ converges to }x \\ \sum_{k\in {\mathbb{N}}} f_k && \text{a series}\\ D(f) && \text{the set of discontinuities of }f .\end{align*}\]

Big Ideas

Summary for GRE:

\[\begin{align*} f,g\text{ continuous on } [a,b],\text{ differentiable on } (a,b) \implies \exists c\in(a,b) : \left[f ( b ) - f ( a ) \right] g' ( c ) = \left[g ( b ) - g ( a )\right] f' ( c ) \end{align*}\]


Important Examples


Commuting Limits

Then consider the following possible ways to commute various limiting operations:

Does taking the derivative of the integral of a function always return the original function? \[\begin{align*} [\frac{\partial}{\partial x}, \int dx]:\qquad\qquad \frac{\partial}{\partial x}\int f(x, t)dt =_? \int \frac{\partial}{\partial x} f(x, t)dt\\ \text{} \end{align*}\]

Answer: Sort of (but possibly not).

Counterexample: \[\begin{align*} f(x) = \begin{cases} 1 & x > 0 \\ -1 & x \leq 0 \end{cases} \implies \int f \approx {\left\lvert {x} \right\rvert}, \end{align*}\] which is not differentiable. (This is remedied by the so-called “weak derivative”)

Sufficient Condition: If \(f\) is continuous, then both are always equal to \(f(x)\) by the FTC.

Is the derivative of a continuous function always continuous? \[\begin{align*} [\frac{\partial}{\partial x}, \lim_{x_i\to x}]:\qquad\qquad \lim_{x_i \to x} f'(x_n) =_? f'(\lim_{x_i\to x} x) \end{align*}\] Answer: No.

Counterexample: \[\begin{align*} f ( x ) = \left\{ \begin{array} { l l } { x ^ { 2 } \sin ( 1 / x ) } & { \text { if } x \neq 0 } \\ { 0 } & { \text { if } x = 0 } \end{array} \right. \implies f ^ { \prime } ( x ) = \left\{ \begin{array} { l l } { 2 x \sin \left( \frac { 1 } { x } \right) - \cos \left( \frac { 1 } { x } \right) } & { \text { if } x \neq 0 } \\ { 0 } & { \text { if } x = 0 } \end{array} \right. \end{align*}\] which is discontinuous at zero.

Sufficient Condition: There doesn’t seem to be a general one (which is perhaps why we study \(C^k\) functions).

Is the limit of a sequence of differentiable functions differentiable and the derivative of the limit?

\[\begin{align*} [\frac{\partial}{\partial x}, \lim_{f_n \to f}]:\qquad\qquad \lim_{f_n \to f}\frac{\partial}{\partial x}f_n(x) =_? \frac{\partial }{\partial x}\lim_{f_n \to f} f_n(x) \end{align*}\] Answer: Super no – even the uniform limit of differentiable functions need not be differentiable!

Counterexample: \(f_n(x) = \frac{\sin(nx)}{\sqrt{n}} \rightrightarrows f = 0\) but \(f_n' \not\to f' = 0\)

Sufficient Condition: \(f_n \rightrightarrows f\) and \(f_n \in C^1\).

Is the limit of a sequence of integrable functions integrable and the integral of the limit?

\[\begin{align*} [\int dx, \lim_{f_n \to f}](f):\qquad\qquad \lim_{f_n \to f}\int f_n(x) dx =_? \int \lim_{f_n \to f} f_n(x) dx \end{align*}\]

Answer: No.

Counterexample: Order \({\mathbb{Q}}\cap[0,1]\) as \(\left\{{q_i}\right\}_{i\in{\mathbb{N}}}\), then take \[\begin{align*} f_n(x) = \sum_{i=1}^n \indic{q_i} \to \indic{{{\mathbb{Q}}\cap[0,1]}} \end{align*}\] where each \(f_n\) integrates to zero (only finitely many discontinuities) but \(f\) is not Riemann-integrable.

Sufficient Condition:

-   \(f_n \rightrightarrows f\), or
-   \(f\) integrable and \(\exists M: \forall n, {\left\lvert {f_n} \right\rvert} < M\) (\(f_n\) uniformly bounded)

Is the integral of a continuous function also continuous?

\[\begin{align*} [\int dx, \lim_{x_i \to x}]:\qquad\qquad \lim_{x_i \to x} F(x_i) =_? F(\lim_{x_i \to x} x_i) \end{align*}\]

Answer: Yes.

Proof: \(f\) is bounded on \(I\), say \({\left\lvert {f} \right\rvert} < M\). Then \[\begin{align*} {\left\lvert {F(x) - F(c)} \right\rvert} = \left\vert \int_c^x f(t)~dt \right\vert \leq M{\left\lvert {x-c} \right\rvert} \to 0 \quad\text{as } x\to c \end{align*}\]

Is the limit of a sequence of continuous functions also continuous?

\[\begin{align*} [\lim_{x_i \to x}, \lim_{f_n \to f}]: \qquad\qquad \lim_{f_n \to f}\lim_{x_i \to x} f(x_i) =_? \lim_{x_i \to x}\lim_{f_n \to f} f_n(x_i)\\ \text{}\\ \end{align*}\]

Answer: No.

Counterexample: \(f_n(x) = x^n\) on \([0,1]\) converges pointwise to \(\indic[x = 1]\), which is discontinuous.

Sufficient Condition: \(f_n \rightrightarrows f\)

Does a sum of differentiable functions necessarily converge to a differentiable function?

\[\begin{align*} \left[\frac{\partial}{\partial x}, \sum_{f_n}\right]: \qquad\qquad \frac{\partial}{\partial x} \sum_{k=1}^\infty f_k =_? \sum_{k=1}^\infty \frac{\partial}{\partial x} f_k \\ \text{} \\ \text{}\\ \end{align*}\]

Answer: No.

Counterexample: \(f_n(x) = \frac{\sin(nx)}{\sqrt{n}} \rightrightarrows 0 \mathrel{\vcenter{:}}= f\), but \(f_n' = \sqrt{n}\cos(nx) \not\to 0 = f'\) (at, say, \(x=0\))

Sufficient Condition: When \(f_n \in C^1, \exists x_0: f_n(x_0) \to f(x_0)\) and \(\sum {\left\lVert {f_n'} \right\rVert}_\infty < \infty\) (continuously differentiable, converges at a point, and the derivatives absolutely converge)


\[\begin{align*} f\text{ continuous } \iff \lim_{x \to p} f(x) = f(p) \end{align*}\]

\[\begin{align*} f:(X, d_X) \to (Y, d_Y) \text{ continuous } \iff \forall \varepsilon,~ \exists \delta \mathrel{\Big|}~ d_X(x,y) < \delta \implies d_Y(f(x), f(y)) < \varepsilon \end{align*}\]

\[\begin{align*} f(x) = \sin\qty{ \frac{1}{x} } \implies 0\in D(f) \end{align*}\]

The Dirichlet function is nowhere continuous: \[\begin{align*} f(x) = \indic{{\mathbb{Q}}} \end{align*}\]

The following function is continuous at infinitely many points and discontinuous at infinitely many points: \[\begin{align*} f(x) = \begin{cases} 0 & x\in{\mathbb{R}}\setminus{\mathbb{Q}}\\ \frac{1}{q} & x = \frac{p}{q} \in {\mathbb{Q}}\text{ in lowest terms} \end{cases} \end{align*}\] Then \(f\) is discontinuous on \({\mathbb{Q}}\) and continuous on \({\mathbb{R}}\setminus{\mathbb{Q}}\).

\(f\) is discontinuous on \({\mathbb{Q}}\):

\(f\) is continuous on \({\mathbb{R}}\setminus{\mathbb{Q}}\):

There are no functions that are continuous on \({\mathbb{Q}}\) but discontinuous on \({\mathbb{R}}-{\mathbb{Q}}\)

A continuous function on a compact space attains its extrema.

Lipschitz Continuity


\[\begin{align*} f'(p) \mathrel{\vcenter{:}}=\frac{\partial f}{\partial x}(p) = \lim_{x\to p} \frac{f(x) - f(p)}{x-p} \end{align*}\]

Properties, strongest to weakest

\[\begin{align*} C^\infty \subsetneq C^k \subsetneq \text{ differentiable } \subsetneq C^0 \subset \mathcal{R}_K .\end{align*}\]

Proof that \(f\) differentiable \(\implies f \in C^0\): \[\begin{align*} f(x) - f(p) = \frac{f(x)-f(p)}{x-p}(x-p) \xrightarrow{x\to p} f'(p)\cdot 0 = 0 \end{align*}\]

Giant Table of Relations

Bold are assumed hypothesis, regular text is the strongest conclusion you can reach, strikeout denotes implications that aren’t necessarily true.

\[\begin{align*} f' && f && \therefore f && F \\ \hline \\ \cancel{\text{exists}} && \mathbf{continuous} && \text{K-integrable} && \text{exists} \\ \cancel{\text{continuous}} && \mathbf{differentiable} && \text{continuous} && \text{exists} \\ \cancel{\text{exists}} && \mathbf{integrable} && \cancel{\text{continuous}} && \text{differentiable} \\ \end{align*}\]

Explanation of items in table:


If \(F\) is a differentiable function on the interval \([a,b]\), and \(F'\) is bounded and continuous a.e., then \(F' \in L_R([a, b])\) and \[\begin{align*} \forall x\in [a,b]: \int_a^x F'(t)~dt=F(x)-F(a) \end{align*}\]

Suppose \(f\) bounded and continuous a.e. on \([a,b]\), and define \[\begin{align*} F(x) \mathrel{\vcenter{:}}=\int_a^x f(t)~dt \end{align*}\] Then \(F\) is absolutely continuous on \([a,b]\), and for \(p \in [a,b]\), \[\begin{align*} f \in C^0(p) \implies F \text{ differentiable at } p,~ F'(p) = f(p), \text{ and } F' \stackrel{\tiny\mbox{a.e}}{=} f. \end{align*}\]

The Dirichlet function is Lebesgue integrable but not Riemann integrable: \[\begin{align*} f(x) = \indic{x \in {\mathbb{Q}}} \end{align*}\]

List of Free Conclusions:


Sequences and Series of Functions

Define \[\begin{align*} s_n(x) \mathrel{\vcenter{:}}=\sum_{k=1}^n f_k(x) \end{align*}\] and \[\begin{align*} \sum_{k=1}^\infty f_k(x) \mathrel{\vcenter{:}}=\lim_{n\to\infty} s_n(x), \end{align*}\] which can converge pointwise, absolutely, uniformly, or not all.

If \(\limsup_{k\in {\mathbb{N}}} {\left\lvert {f_k(x)} \right\rvert} \neq 0\), then the series \(\sum_k f_k(x)\) does not converge (term test).

If \(f\) is injective, then \(f'\) is nonzero in some neighborhood of ???

Pointwise convergence

\[\begin{align*} f_n \to f = \lim_{n\to\infty} f_n .\end{align*}\] Summary: \[\begin{align*} \lim_{f_n \to f} \lim_{x_i \to x} f_n(x_i) \neq \lim_{x_i \to x} \lim_{f_n \to f} f_n(x_i) .\end{align*}\]

\[\begin{align*} \lim_{f_n \to f} \int_I f_n \neq \int_I \lim_{f_n \to f} f_n .\end{align*}\]

Pointwise convergence is strictly weaker than uniform convergence.

\(f_n(x) = x^n\) on \([0, 1]\) converges pointwise but not uniformly.

\[\begin{align*} f_n \text{ continuous} \centernot\implies f\mathrel{\vcenter{:}}=\lim_n f_n \text{ is continuous} .\end{align*}\]

Take \[\begin{align*} f_n(x) = x^n,\quad f_n(x) \to \indic[x = 1] .\end{align*}\]

\[\begin{align*} f_n \text{ differentiable} &\centernot\implies f'_n \text{ converges} \\ f'_n \text{ converges} &\not\implies \lim f'_n = f' .\end{align*}\]

Take \[\begin{align*} f_n(x) = \frac{1}{n}\sin(n^2 x) \to 0,&& \text{but } f'_n = n\cos(n^2 x) \text{ does not converge} .\end{align*}\]

\[\begin{align*} f_n\in \mathcal{R}_I \centernot\implies\lim_{f_n \to f} \int_I f_n \neq \int_I \lim_{f_n \to f} f_n .\end{align*}\]

May fail to converge to same value, take \[\begin{align*} f_n(x) = \frac{2n^2x}{(1+n^2x^2)^2} \to 0 && \text{but }\int_0^1 f_n = 1 - \frac{1}{n^2 + 1} \to 1\neq 0 .\end{align*}\]

Uniform Convergence

Notation: \[\begin{align*} f_n \rightrightarrows f= \lim_{n\to\infty} f_n \text{ and } \sum_{n=1}^\infty f_n \rightrightarrows S .\end{align*}\]

Summary: \[\begin{align*} \lim_{x_i \to x} \lim_{f_n \to f} f_n(x_i) = \lim_{f_n \to f} \lim_{x_i \to x} f_n(x_i) = \lim_{f_n \to f} f_n(\lim_{x_i \to x} x_i) .\end{align*}\]

\[\begin{align*} \lim_{f_n \to f} \int_I f_n = \int_I \lim_{f_n \to f} f_n .\end{align*}\]

\[\begin{align*} \sum_{n=1}^\infty \int_I f_n = \int_I \sum_{n=1}^\infty f_n .\end{align*}\]

“The uniform limit of a(n) \(x\) function is \(x\)”, for \(x \in\) {continuous, bounded}

\(\left\{{x_i}\right\} \to p \implies\) every subsequence also converges to \(p\).

Every convergent sequence in \(X\) is a Cauchy sequence.

The converse need not hold in general, but if \(X\) is complete, every Cauchy sequence converges. An example of a Cauchy sequence that doesn’t converge: take \(X={\mathbb{Q}}\) and set \(x_i = \pi\) truncated to \(i\) decimal places.

If any subsequence of a Cauchy sequence converges, the entire sequence converges.

\[\begin{align*} d(x,y) &\geq 0 && \text{Positive}\\ d(x,y) &= 0 \iff x = y && \text{Nondegenerate}\\ d(x,y) &= d(y,x) && \text{Symmetric}\\ d(x,y) &\leq d(x,p) + d(p,y) \quad \forall p && \text{Triangle Inequality} .\end{align*}\]




Open Set Characterization: Arbitrary unions and finite intersections of open sets are open.

Closed Set Characterization: Arbitrary intersections and finite unions of closed sets are closed.

The best source of examples and counterexamples is the open/closed unit interval in \(\mathbb{R}\). Always test against these first!

If \(f\) is a continuous function, the preimage of every open set is open and the preimage of every closed set is closed.

In \({\mathbb{R}}\), singleton sets and finite discrete sets are closed.

The complement of a singleton set can be written as the open set \[\begin{align*} {\mathbb{R}}\setminus\left\{{p}\right\} = (-\infty, p) \cup(p, \infty) .\end{align*}\] For a finite discrete set \(\left\{{p_0, p_1, \cdots, p_n}\right\}\), which wlog (by relabeling) satisfies \(p_0 < p_1 < \cdots < p_n\), the complement \[\begin{align*} {\mathbb{R}}\setminus\left\{{p_0, p_1, \cdots, p_n}\right\} = (-\infty, p_0) \cup(p_0, p_1) \cup\cdots \cup(p_n, \infty) \end{align*}\] is likewise open.

This yields a good way to produce counterexamples to continuity.

In \(\mathbb{R}\), singletons are closed. This means any finite subset is closed, as a finite union of singleton sets!

If \(X\) is a compact metric space, then \(X\) is complete and totally bounded.

If \(X\) is complete and \(X \subset Y\), then \(X\) is closed in \(Y\).

The converse does not hold in general: closedness does not imply completeness. Counterexample: \({\mathbb{Q}}\subset {\mathbb{Q}}\) is closed but not complete, and \({\mathbb{Q}}\subset{\mathbb{R}}\) is not closed.

If \(X\) is compact, then \(Y \subset X \implies Y\) is compact \(\iff\) \(Y\) closed.

A topological space \(X\) is sequentially compact iff every sequence \(\left\{{x_n}\right\}\) has a subsequence converging to a point in \(X\).

If \(X\) is a metric space, \(X\) is compact iff \(X\) is sequentially compact.

Note that in general, neither form of compactness implies the other.


There are functions differentiable only at a single point. Example: \[\begin{align*} f(x) = \begin{cases} x^2 & x\in {\mathbb{Q}}\\ -x^2 & x\in {\mathbb{R}}\setminus{\mathbb{Q}} \end{cases} .\end{align*}\]

This is discontinuous everywhere except for \(x=0\), and you can compute \[\begin{align*} \lim_{h\to 0} {f(x+h) - f(x) \over h}\Big|_{x=0} = \lim_{h\to 0} \begin{cases} h & h\in {\mathbb{Q}}\\ -h & h\in {\mathbb{R}}\setminus{\mathbb{Q}} \end{cases} =0 .\end{align*}\]
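Numerically (a sketch, not from the notes): both branches of the difference quotient at \(x=0\) shrink like \(|h|\), so the derivative exists there even though the function is wildly discontinuous elsewhere.

```python
import math

# f(x) = x^2 for rational x, -x^2 for irrational x; at x = 0 the
# difference quotient (f(h) - f(0)) / h equals h or -h depending on
# the branch, so it tends to 0 either way.
hs = [10.0**-k for k in range(1, 10)]
quotients_rational = [(h**2) / h for h in hs]     # branch: h rational
quotients_irrational = [(-h**2) / h for h in hs]  # branch: h irrational

assert all(math.isclose(q, h) for q, h in zip(quotients_rational, hs))
assert all(math.isclose(q, -h) for q, h in zip(quotients_irrational, hs))
assert abs(quotients_rational[-1]) < 1e-8  # both branches -> 0
```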

The product of two non-differentiable functions can be differentiable: take \(f(x) = g(x) = {\left\lvert {x} \right\rvert}\) which are not differentiable at \(x=0\), then \(fg(x) = {\left\lvert {x} \right\rvert}^2\) is differentiable at \(x=0\).

A continuous function that is zero on a dense set \(A\subset X\) is identically zero.

Since \(A\) is dense, for any \(x\in X\setminus A\) take a sequence \(\left\{{x_n}\right\}\) in \(A\) converging to \(x\). Then by continuity \(0 = f(x_n) \to f(x)\), so \(f(x) = 0\).

Point-Set Topology


The space \(\left\{{\frac{1}{n}}\right\}_{n\in {\mathbb{N}}}\).

List of properties preserved by continuous maps:

Checking if a map is homeomorphism:



\[\begin{align*} L^2(X) &= \left\{{f: X \to {\mathbb{R}}: \int_{\mathbb{R}}f(x)^2 ~dx < \infty}\right\} &&\text{square integrable functions}\\ {\left\langle {g},~{f} \right\rangle}_{2} &= \int_{\mathbb{R}}g(x)f(x) ~dx &&\text{the } L^2 \text{ inner product}\\ {\left\lVert {f} \right\rVert}_2^2 &= {\left\langle {f},~{f} \right\rangle} = \int_{\mathbb{R}}f(x)^2 ~dx &&\text{norm}\\ E[{\,\cdot\,}] &= {\left\langle {{\,\cdot\,}},~{f} \right\rangle} &&\text{expectation}\\ (\tau_{p}f)(x) &= f(p- x) &&\text{translation}\\ (f \ast g)(p) &= \int_{\mathbb{R}}f(t)g(p-t)~dt = \int_{\mathbb{R}}f(t)(\tau_{p}g)(t) ~dt = {\left\langle {\tau_pg},~{f} \right\rangle} &&\text{convolution}\\ \end{align*}\]
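A discrete sketch of the convolution integral (a standard worked example, not from the notes): approximating \((f \ast g)(p) = \int f(t)g(p-t)~dt\) by a Riemann sum for \(f = g = \mathbf{1}_{[0,1]}\) recovers the triangle function, e.g. \((f\ast g)(1) = 1\) and \((f \ast g)(1/2) = 1/2\).

```python
# f = g = indicator of [0, 1]; their convolution is the triangle function.
def box(x):
    return 1.0 if 0 <= x <= 1 else 0.0

def conv(p, n=10_000):
    # Riemann sum over t in [-1, 2], which covers the relevant support.
    dt = 3.0 / n
    return sum(box(-1 + k * dt) * box(p - (-1 + k * dt)) for k in range(n)) * dt

assert abs(conv(1.0) - 1.0) < 1e-2  # peak of the triangle
assert abs(conv(0.5) - 0.5) < 1e-2  # halfway up the rising edge
```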

For \((\Sigma, E, \mu)\) a probability space with sample space \(\Sigma\), event \(\sigma\)-algebra \(E\), and probability measure \(\mu\), a random variable is a function \(X: \Sigma \to {\mathbb{R}}\).

The PDF of \(X\) is the function \(f\) given, for any \(U \subset {\mathbb{R}}\), by the relation \[\begin{align*} P(X \in U) = \int_U f(x) ~dx \\ \implies P(a \leq X \leq b) = \int_a^b f(x) ~dx \end{align*}\]

The CDF is the antiderivative of the PDF: \[\begin{align*} F(x) = P(X \leq x) = \int_{-\infty}^x f(t) ~dt ,\end{align*}\] which yields \({\frac{\partial F}{\partial x}\,} = f(x)\).
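As a numerical sanity check of \(F' = f\) (a sketch with \(X \sim \mathrm{Exp}(1)\), my choice of example): a central difference of the CDF matches the PDF.

```python
import math

# Exponential(1), chosen as an example: f(x) = e^{-x}, F(x) = 1 - e^{-x}, x >= 0.
f = lambda x: math.exp(-x)
F = lambda x: 1 - math.exp(-x)

# Central difference approximates F'(x); it should agree with f(x).
h = 1e-6
for x in (0.5, 1.0, 2.0):
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-8
```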

\[\begin{align*} E[X] \mathrel{\vcenter{:}}={\left\langle {\text{id}},~{f} \right\rangle} = \int_{\mathbb{R}}x f(x) ~dx .\end{align*}\] Also denoted \(\mu_X\).

\[\begin{align*} E\left[\sum_{i\in{\mathbb{N}}} a_i X_i\right] = \sum_{i\in{\mathbb{N}}} a_i E[X_i] .\end{align*}\] Does not matter whether or not the \(X_i\) are independent.

\[\begin{align*} \mathrm{Var}(X) &= E[(X - E[X])^2] \\ &= \int (x - E[X])^2 f(x) ~dx \\ &= E[X^2] - E[X]^2 \\ &\mathrel{\vcenter{:}}=\sigma^2(X) \end{align*}\] where \(\sigma\) is the standard deviation. It can also be defined as \({\left\langle {(\text{id}- {\left\langle {\text{id}},~{f} \right\rangle})^2},~{f} \right\rangle}\): take the portion of the identity function in the orthogonal complement of \(f\), squared, and project it back onto \(f\)?
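The identity \(\mathrm{Var}(X) = E[X^2] - E[X]^2\) can be sanity-checked by simulation (a sketch with \(X \sim \mathrm{Unif}(0,1)\), my choice of example; the true variance is \(1/12\)):

```python
import random

random.seed(0)
xs = [random.random() for _ in range(100_000)]  # samples of X ~ Uniform(0,1)

n = len(xs)
mean = sum(xs) / n
var_def = sum((x - mean)**2 for x in xs) / n    # E[(X - E[X])^2]
var_id = sum(x * x for x in xs) / n - mean**2   # E[X^2] - E[X]^2

# The two expressions agree up to floating-point error,
# and both are near the true value Var(Unif(0,1)) = 1/12.
assert abs(var_def - var_id) < 1e-6
assert abs(var_def - 1 / 12) < 1e-2
```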

\[\begin{align*} \mathrm{Var}(aX + b) &= a^2\mathrm{Var}(X) \\ \mathrm{Var}\qty{ \sum_{\mathbb{N}}X_i } &= \sum_i \mathrm{Var}(X_i) + 2 \sum_{i < j}\mathrm{Cov}(X_i, X_j) .\end{align*}\]

\[\begin{align*} \mathrm{Cov}(X,Y) &= E[(X-\mu_X)(Y-\mu_Y)] \\ &= E[XY] - E[X]E[Y] \end{align*}\]

\[\begin{align*} \mathrm{Cov}(X, X) &= \mathrm{Var}(X) \\ \mathrm{Cov}(aX, Y) &= a\mathrm{Cov}(X,Y) \\ \mathrm{Cov}(\sum_{{\mathbb{N}}} X_i, \sum_{\mathbb{N}}Y_j) &= \sum_i \sum_j\mathrm{Cov}(X_i, Y_j) \\ .\end{align*}\]
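A quick simulation sketch (my own example, not from the notes) checking the scaling property \(\mathrm{Cov}(aX, Y) = a\,\mathrm{Cov}(X,Y)\), and that independent samples have covariance near \(0\):

```python
import random

random.seed(1)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]  # X ~ N(0,1), chosen as an example
ys = [random.gauss(0, 1) for _ in range(n)]  # Y ~ N(0,1), independent of X

def cov(us, vs):
    # Sample covariance: mean of (U - mu_U)(V - mu_V).
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / len(us)

a = 3.0
assert abs(cov([a * x for x in xs], ys) - a * cov(xs, ys)) < 1e-6  # Cov(aX,Y) = a Cov(X,Y)
assert abs(cov(xs, ys)) < 0.05  # independence => covariance near 0
```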

\[\begin{align*} k! \sim k^{k + \frac{1}{2}}e^{-k} \sqrt{2\pi} .\end{align*}\]
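As a numerical sanity check (a sketch, not part of the original notes): the ratio \(k!/(k^{k+1/2}e^{-k}\sqrt{2\pi})\) tends to \(1\), with relative error roughly \(1/(12k)\).

```python
import math

# Stirling's approximation: k! ~ k^(k + 1/2) e^(-k) sqrt(2 pi).
def stirling(k):
    return k**(k + 0.5) * math.exp(-k) * math.sqrt(2 * math.pi)

ratios = [math.factorial(k) / stirling(k) for k in (5, 10, 50, 100)]

# The ratio exceeds 1 and decreases toward 1 (error ~ 1/(12k)).
assert all(1 < r < 1.02 for r in ratios)
```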

For \(X \geq 0\) and \(a > 0\), \[\begin{align*} P(X \geq a) \leq \frac 1 a E[X] \end{align*}\]

One-sided Chebyshev (Cantelli): for \(a > 0\), \[\begin{align*} P(X - \mu \geq a) \leq \frac{\sigma^2}{\sigma^2 + a^2} .\end{align*}\]

\[\begin{align*} P({\left\lvert {X - \mu} \right\rvert} \geq k) \leq \left( \frac \sigma k \right)^2 \end{align*}\]

Apply Markov's inequality to the variable \((X-\mu)^2\) with \(a=k^2\).
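Simulation sketch (my example with \(X \sim N(0,1)\), so \(\mu = 0\), \(\sigma = 1\)): the empirical tail probability sits below the Chebyshev bound \((\sigma/k)^2\).

```python
import random

random.seed(2)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]  # X ~ N(0,1): mu = 0, sigma = 1

# Empirical P(|X - mu| >= k) vs the Chebyshev bound (sigma/k)^2.
for k in (1.5, 2.0, 3.0):
    p_emp = sum(abs(x) >= k for x in xs) / n
    assert p_emp <= (1.0 / k)**2
```

For the normal distribution the bound is very loose (e.g. the true \(P(|X| \geq 2) \approx 0.046\) versus the bound \(0.25\)); Chebyshev trades sharpness for requiring only a finite variance.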

For \(X_i\) i.i.d., \[\begin{align*} \lim_n \frac{\sum_{i=1}^n X_i - n\mu}{\sigma \sqrt n} \sim N(0, 1) .\end{align*}\]
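CLT simulation sketch (my example with \(X_i \sim \mathrm{Unif}(0,1)\), so \(\mu = 1/2\), \(\sigma = 1/\sqrt{12}\)): histograms of the normalized sums match standard normal probabilities.

```python
import math
import random

random.seed(4)
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and std of Uniform(0,1)
n, trials = 500, 10_000

# One normalized sum (sum X_i - n*mu) / (sigma * sqrt(n)) per trial.
zs = [
    (sum(random.random() for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
    for _ in range(trials)
]

# Compare against standard normal probabilities.
p_half = sum(z <= 0 for z in zs) / trials      # should be near 0.5
p_one = sum(abs(z) <= 1 for z in zs) / trials  # should be near 0.6827
assert abs(p_half - 0.5) < 0.03
assert abs(p_one - 0.6827) < 0.03
```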

\[\begin{align*} P(\frac{1}{n} \sum X_i \rightarrow \mu) = 1 .\end{align*}\]
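Law-of-large-numbers sketch (my example with \(X_i \sim \mathrm{Unif}(0,1)\), so \(\mu = 1/2\)): sample means land close to \(\mu\) for large \(n\).

```python
import random

random.seed(3)

# X_i ~ Uniform(0,1), chosen as an example, so mu = 1/2.
def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

errors = [abs(sample_mean(n) - 0.5) for n in (100, 10_000, 1_000_000)]
assert errors[-1] < 0.01  # the largest sample lands close to mu
```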

For all \(t > 0\), \[\begin{align*} P(X \geq a) \leq e^{-ta}M_X(t) \\ .\end{align*}\]

For \(f\) convex, \[\begin{align*} E[f(X)] \geq f(E[X]) \end{align*}\]
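Sketch with the convex function \(f(x)=x^2\) and \(X\sim\mathrm{Unif}(0,1)\) (my choice of example): \(E[X^2] \approx 1/3 \geq 1/4 \approx E[X]^2\), and the gap is exactly \(\mathrm{Var}(X)\).

```python
import random

random.seed(5)
xs = [random.random() for _ in range(100_000)]  # X ~ Uniform(0,1)

# Jensen for convex f: E[f(X)] >= f(E[X]); here f(x) = x^2.
f = lambda x: x * x
lhs = sum(f(x) for x in xs) / len(xs)  # E[f(X)], near 1/3
rhs = f(sum(xs) / len(xs))             # f(E[X]), near 1/4
assert lhs >= rhs
assert abs((lhs - rhs) - 1 / 12) < 1e-2  # gap = Var(X) = 1/12 for f(x) = x^2
```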

\[\begin{align*} H(X) = - \sum p_i \ln p_i \end{align*}\]
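A small sketch (my own examples): entropy is zero for a deterministic outcome and is maximized by the uniform distribution, where \(H = \ln n\).

```python
import math

def entropy(ps):
    # H = -sum p_i ln p_i, with the convention 0 * ln 0 = 0.
    return -sum(p * math.log(p) for p in ps if p > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
point = [1.0, 0.0, 0.0, 0.0]

assert entropy(point) == 0.0                          # deterministic: H = 0
assert entropy(skewed) < entropy(uniform)             # concentration lowers H
assert abs(entropy(uniform) - math.log(4)) < 1e-12    # uniform on 4: H = ln 4
```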

Theory and Background