# Differential Calculus

## Big Theorems / Tools:

\begin{align*} \frac{\partial}{\partial x} \int_a^x f(t) dt = f(x) \\ \\ \end{align*}

\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(x, t) dt - \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t) dt &= f(x, t) \cdot \frac{\partial}{\partial x}(t) \bigg\rvert_{t=a(x)}^{t=b(x)} \\ \\ &= f(x, b(x))\cdot b'(x) - f(x, a(x))\cdot a'(x) \\ \\ \end{align*}

If $$f(x,t) = f(t)$$ doesn’t depend on $$x$$, then $${\frac{\partial f}{\partial x}\,} = 0$$ and the second integral vanishes:

\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(t) dt &= f(b(x))\cdot b'(x) - f(a(x))\cdot a'(x) \end{align*}

Note that you can recover the original FTC by taking \begin{align*} a(x) &= c \\ b(x) &= x \\ f(x,t) &= f(t) .\end{align*}

\begin{align*} \frac{\partial}{\partial x} \int_{1}^{x} f(x, t) dt = \int_{1}^{x} \frac{\partial}{\partial x} f(x, t) dt + f(x, x) \end{align*}
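The identity above can be sanity-checked numerically; a minimal sketch (added here, not part of the original notes) using the concrete choice $$f(x,t) = xt^2$$:

```python
# f(x, t) = x * t^2, so F(x) = ∫_1^x x t^2 dt = x (x^3 - 1) / 3.
def F(x):
    return x * (x**3 - 1) / 3

# Leibniz rule: F'(x) = ∫_1^x ∂f/∂x dt + f(x, x) = (x^3 - 1)/3 + x^3
def leibniz(x):
    return (x**3 - 1) / 3 + x * x**2

# Central finite difference approximation of F'(x)
def fd(x, h=1e-5):
    return (F(x + h) - F(x - h)) / (2 * h)

print(leibniz(2.0), fd(2.0))  # both ≈ 10.333...
```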

Mean Value Theorem (and the MVT for integrals):

\begin{align*} f \text{ differentiable on } I &\implies \exists p\in I: f(b) - f(a) = f'(p)(b-a) \\ f \in C^0(I) &\implies \exists p\in I: \int_a^b f(x)~dx = f(p)(b-a) .\end{align*}

The second statement follows by applying the first to an antiderivative of $$f$$.

If

• $$f(x)$$ and $$g(x)$$ are differentiable on $$I - {\{\text{pt}\}}$$, and

\begin{align*} \lim_{x\to {\{\text{pt}\}}} f(x) = \lim_{x\to {\{\text{pt}\}}} g(x) \in \left\{{0, \pm \infty}\right\}, && \forall x \in I, g'(x) \neq 0, && \lim_{x\to{\{\text{pt}\}}} \frac{ f'(x)}{\ g'(x)} \text{ exists}, \\ \end{align*}

Then it is necessarily the case that \begin{align*} \lim _ { x \rightarrow {\{\text{pt}\}}} \frac { f ( x ) } { g ( x ) } = \lim _ { x \rightarrow {\{\text{pt}\}}} \frac { f ^ { \prime } ( x ) } { g ^ { \prime } ( x ) }. \end{align*}

Note that this covers the following indeterminate forms (the latter ones only after rewriting into a quotient, as below): \begin{align*} \frac{0}{0}, \quad \frac{\infty}{\infty}, \quad 0 \cdot \infty, \quad 0^{0}, \quad \infty^{0}, \quad 1^{\infty}, \quad \infty-\infty .\end{align*}

For $$0\cdot \infty$$, can rewrite as $${0 \over {1\over \infty}} = {0\over 0},$$ or alternatively $${\infty \over {1\over 0}} = {\infty \over \infty}.$$

For $$1^\infty, \infty^0,$$ and $$0^0$$, set \begin{align*} L \mathrel{\vcenter{:}}=\lim f^g \implies \ln L = \lim g \ln(f) \end{align*} to recover a $$0\cdot\infty$$ form in each case (e.g. $$0^0$$ gives $$0 \cdot (-\infty)$$), then proceed as above.
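A quick numeric illustration of the log trick (a sketch added here, not from the original notes) on the $$1^\infty$$ form $$\lim_{x\to\infty}(1+1/x)^x = e$$:

```python
from math import log, exp, e

# 1^∞ form: L = lim (1 + 1/x)^x.  Taking logs: ln L = lim x·ln(1 + 1/x),
# an ∞·0 form, which tends to 1, so L = e.
def ln_L(x):
    return x * log(1 + 1/x)

approx = exp(ln_L(1e8))
print(approx, e)  # both ≈ 2.71828...
```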

\begin{align*} T(a, x) &= \sum _ { n = 0 } ^ { \infty } \frac { f ^ { ( n ) } ( a ) } { n ! } ( x - a ) ^ { n } \\ &= f ( a ) + f'(a)( x - a ) + \frac { 1 } { 2 }f ^ { \prime \prime } ( a ) ( x - a ) ^ { 2 } \\ & \quad \quad + \frac { 1} { 6 } f ^ { \prime \prime \prime } ( a ) ( x - a ) ^ { 3 } + \frac{1}{24}f^{(4)}(a)(x-a)^4 + ~\cdots \end{align*} There is a bound on the error (Taylor's theorem with the Lagrange remainder): \begin{align*} {\left\lvert {f(x) - T_k(a,x)} \right\rvert} \leq \max_{\xi \text{ between } a \text{ and } x} {\left\lvert {\frac{f^{(k+1)}(\xi)}{(k+1)!}} \right\rvert} \cdot {\left\lvert {x-a} \right\rvert}^{k+1} \end{align*} where $$T_k(a, x) = \sum _ { n = 0 } ^ { k } \frac { f ^ { ( n ) } ( a ) } { n ! } ( x - a ) ^ { n }$$ is the $$k$$th truncation.
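The truncation and its error bound can be checked for $$f = e^x$$ at $$a = 0$$; a minimal sketch (added here, not part of the original notes):

```python
from math import exp, factorial

# k-th Taylor truncation of e^x at a = 0: T_k(x) = Σ_{n≤k} x^n / n!
def T(k, x):
    return sum(x**n / factorial(n) for n in range(k + 1))

x, k = 1.0, 5
err = abs(exp(x) - T(k, x))
# Lagrange bound: |error| ≤ max_{0≤ξ≤1} e^ξ / (k+1)! · |x|^{k+1} = e/720
bound = exp(x) / factorial(k + 1) * abs(x)**(k + 1)
print(err, bound)  # error ≈ 0.0016, bound ≈ 0.0038
```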

Approximating change: $$\Delta y \approx f'(x) \Delta x$$

## Tools for finding limits

How to find $$\lim_{x\to a} f(x)$$, in order of difficulty:

• Plug in: if $$f$$ is continuous, $$\lim_{x\to a} f(x) = f(a)$$.

• Check for indeterminate forms and apply L’Hopital’s Rule.

• Algebraic rules

• Squeeze theorem

• Expand in Taylor series at $$a$$

• Monotonic + bounded

• One-sided limits: $$\lim_{x\to a^-} f(x) = \lim_{\varepsilon \to 0^+} f(a-\varepsilon)$$

• Limits at zero or infinity: \begin{align*} \lim_{x\to\infty} f(x) = \lim_{u\to 0^+} f\qty{\frac{1}{u}} \text{ and } \lim_{x\to 0^+} f(x) = \lim_{x\to\infty} f\qty{1 \over x} \end{align*}

• Also useful: if $$p(x) = p_nx^n + \cdots$$ and $$q(x) = q_mx^m + \cdots$$, \begin{align*} \lim_{x\to\infty} \frac{p(x)}{q(x)} = \begin{cases} 0 & \deg p < \deg q \\ \pm\infty & \deg p > \deg q \\ \frac{p_n}{q_m} & \deg p = \deg q \end{cases} \end{align*} where the sign of $$\pm\infty$$ matches the sign of $$p_n/q_m$$.

Be careful: limits may not exist! Example: $$\lim_{x\to 0} \frac{1}{x}$$ does not exist, since the one-sided limits are $$\pm\infty$$.

## Asymptotes

• Vertical asymptotes: at values $$x=p$$ where $$\lim_{x\to p} f(x) = \pm\infty$$
• Horizontal asymptotes: lines $$y=L$$ where $$L = \lim_{x\to\pm\infty} f(x)$$ exists and is finite
• Oblique asymptotes: for rational functions, divide; the polynomial part of the quotient yields the equation of the asymptote (i.e. look at the asymptotic order or “limiting behavior”).
• Concretely:

\begin{align*} f(x) = \frac{p(x)}{q(x)} = r(x) + \frac{s(x)}{t(x)} \sim r(x) \end{align*}

## Recurrences

• Limit of a recurrence: $$x_n = f(x_{n-1}, x_{n-2}, \cdots)$$
• If the limit exists, it is a solution to $$x = f(x)$$
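Both bullets can be seen at once on a concrete recurrence; a minimal sketch (added here, not part of the original notes) for $$x_n = \sqrt{2 + x_{n-1}}$$:

```python
from math import sqrt

# Iterate x_n = sqrt(2 + x_{n-1}).  If the limit exists, it solves
# x = sqrt(2 + x), i.e. x^2 - x - 2 = 0, whose positive root is x = 2.
x = 0.0
for _ in range(60):
    x = sqrt(2 + x)
print(x)  # ≈ 2.0
```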

## Derivatives

\begin{align*} {\frac{\partial }{\partial x}\,}(f\circ g) = (f' \circ g) \cdot g' \end{align*}

\begin{align*} {\frac{\partial }{\partial x}\,} f\cdot g =f'\cdot g + g' \cdot f \end{align*}

\begin{align*} {\frac{\partial }{\partial x}\,} \frac{f(x)}{g(x)} = \frac{f'g - g'f}{g^2} \end{align*}

Mnemonic: Low d-high minus high d-low

\begin{align*} {\frac{\partial f^{-1}}{\partial x}\,}(f(x_0)) = \left( {\frac{\partial f}{\partial x}\,} \right)^{-1}(x_0) = 1/f'(x_0) \end{align*}

## Implicit Differentiation

\begin{align*} d(f(x)) = f'(x)~dx, \quad d(f(y)) = f'(y)~dy \end{align*} - Often able to solve for $${\frac{\partial y}{\partial x}\,}$$ this way.

• Obtaining derivatives of inverse functions: if $$y = f^{-1}(x)$$ then write $$f(y) = x$$ and implicitly differentiate.

General series of steps: want to know some unknown rate $$y_t$$

• Lay out known relation that involves $$y$$
• Take derivative implicitly (say w.r.t $$t$$) to obtain a relation between $$y_t$$ and other stuff.
• Isolate $$y_t = \text{ known stuff }$$

• Setup: $$l, x_t$$ and $$x(t)$$ are known for a given $$t$$, want $$y_t$$. \begin{align*} x(t)^2 + y(t)^2 = l^2 \implies 2xx_t +2yy_t = 2ll_t = 0 \end{align*} (noting that $$l$$ is constant)
• So $$y_t = -\frac{x(t)}{y(t)}x_t$$
• $$x(t)$$ is known, so obtain $$y(t) = \sqrt{l^2 - x(t)^2}$$ and solve.
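The ladder setup above can be verified numerically; a minimal sketch (added here, not part of the original notes) with $$l = 5$$, $$x(t) = 3 + t$$, so $$x_t = 1$$ and $$y(0) = 4$$:

```python
from math import sqrt

l = 5.0                            # constant length
def x(t): return 3.0 + t           # known horizontal position, x_t = 1
def y(t): return sqrt(l**2 - x(t)**2)

# Implicit differentiation: 2 x x_t + 2 y y_t = 0  =>  y_t = -(x/y) x_t
y_t = -(x(0) / y(0)) * 1.0         # = -3/4

# Finite-difference check of dy/dt at t = 0
h = 1e-6
fd = (y(h) - y(-h)) / (2 * h)
print(y_t, fd)  # both ≈ -0.75
```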

# Integral Calculus

## Average Values

\begin{align*} \mu_f = \frac{1}{b-a}\int_a^b f(t) dt \end{align*}

Apply MVT to $$F(x)$$.

## Area Between Curves

Area in polar coordinates: \begin{align*} A = \int_{\theta_1}^{\theta_2} \frac{1}{2}r^2(\theta) ~d\theta \end{align*}

## Solids of Revolution

\begin{align*} \text{Disks} && V = \int \pi r(t)^2 ~dt \\ \text{Cylindrical shells} && V = \int 2\pi r(t)h(t) ~dt .\end{align*}

## Arc Lengths

\begin{align*} L &= \int ~ds && ds = \sqrt{dx^2 + dy^2} \\ &= \int_{x_0}^{x_1}\sqrt{1 + \qty{{\frac{\partial y}{\partial x}\,}}^2}~dx \\ &= \int_{y_0}^{y_1}\sqrt{\qty{{\frac{\partial x}{\partial y}\,}}^2 + 1}~dy \end{align*}

\begin{align*} SA = \int 2 \pi r(x) ~ds \end{align*}

## Center of Mass

Given a density $$\rho(\mathbf x)$$ of an object $$R$$, the $$i$$th coordinate $$\bar x_i$$ of the center of mass is given by \begin{align*} \bar x_i = \frac {\displaystyle\int_R x_i\rho(\mathbf x) ~d\mathbf x} {\displaystyle\int_R \rho(\mathbf x)~d\mathbf x} \end{align*}

## Big List of Integration Techniques

Given $$f(x)$$, we want to find an antiderivative $$F(x) = \int f$$ satisfying $$\frac{\partial}{\partial x}F(x) = f(x)$$

• Guess and check: look for a function that differentiates to $$f$$.
• $$u{\hbox{-}}$$ substitution
• More generally, any change of variables \begin{align*} x = g(u) \implies \int_a^b f(x)~dx = \int_{g^{-1}(a)}^{g^{-1}(b)} (f\circ g)(u) ~g'(u)~du \end{align*}

### Integration by Parts:

The standard form: \begin{align*} \int u dv = uv - \int v du \end{align*}

• A more general form for repeated applications: let $$v^{-1} = \int v$$, $$v^{-2} = \int\int v$$, etc. \begin{align*} \int_a^b uv &= uv^{-1}\bigg\rvert_a^b - \int_a^b u^{1} v^{-1}\\ &= uv^{-1} - u^1v^{-2}\bigg\rvert_a^b + \int_a^b u^2v^{-2} \\ &= uv^{-1} - u^1v^{-2} + u^2v^{-3}\bigg\rvert_a^b - \int_a^b u^3v^{-3} \\ &\quad\vdots \\ \implies \int_a^b uv &= \sum_{k=1}^n (-1)^{k-1} u^{k-1}v^{-k} \bigg\rvert_a^b + (-1)^n\int_a^b u^nv^{-n} \end{align*}
• Generally useful when one term’s $$n$$th derivative is a constant.

### Shoelace Method

• Note: you can choose $$u$$ or $$v$$ equal to 1! Useful if you know the derivative of the integrand.

| Derivatives | Integrals | Signs | Result |
| --- | --- | --- | --- |
| $$u$$ | $$v$$ | NA | NA |
| $$u'$$ | $$\int v$$ | $$+$$ | $$u\int v$$ |
| $$u''$$ | $$\int\int v$$ | $$-$$ | $$-u'\int\int v$$ |
| $$\vdots$$ | $$\vdots$$ | $$\vdots$$ | $$\vdots$$ |

Fill out until one column is zero (alternate signs). Get the result column by multiplying diagonally, then sum down the column.
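The tabular scheme above can be run by hand on $$\int_0^1 x^2 e^x~dx$$; a minimal numeric check (added here, not part of the original notes):

```python
from math import exp

# Tabular IBP for ∫_0^1 x^2 e^x dx:
#   derivatives: x^2, 2x, 2, 0    integrals: e^x, e^x, e^x    signs: +, -, +
# Diagonal products give the antiderivative x^2 e^x - 2x e^x + 2 e^x.
def F(x):
    return (x**2 - 2*x + 2) * exp(x)

closed = F(1) - F(0)               # = e - 2

# Compare against a midpoint Riemann sum
n = 100000
riemann = sum(((k + 0.5)/n)**2 * exp((k + 0.5)/n) for k in range(n)) / n
print(closed, riemann)  # both ≈ 0.71828...
```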

### Differentiating under the integral

\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(x, t) dt - \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t) dt &= f(x, t)\cdot \frac{\partial}{\partial x}(t) \bigg\rvert_{t=a(x)}^{t=b(x)} \\ &= f(x, b(x))~b'(x) - f(x, a(x))~a'(x) \end{align*}

Let $$F(x)$$ be an antiderivative and compute $$F'(x)$$ using the chain rule.

• LIPET: Log, Inverse trig, Polynomial, Exponential, Trig: generally let $$u$$ be whichever one comes first.

• The ridiculous trig sub: for any integrand containing only trig terms

• Transforms any such integrand into a rational function of $$x$$

• Let $$x = 2\tan^{-1}(u)$$ (i.e. $$u = \tan\frac{x}{2}$$), so $$dx = \frac{2}{u^2+1}~du$$, $$\sin x = \frac{2u}{1+u^2}$$, and $$\cos x = \frac{1-u^2}{1+u^2}$$; then \begin{align*} \int_a^b f(x)~dx = \int_{\tan\frac{a}{2}}^{\tan\frac{b}{2}} f\qty{2\tan^{-1}u}\frac{2}{1+u^2}~du \end{align*}

Example: \begin{align*} \int \frac{1}{\sin \theta}~d\theta = \int \frac{1+u^2}{2u}\cdot\frac{2}{1+u^2}~du = \int \frac{du}{u} = \ln {\left\lvert {\tan\frac{\theta}{2}} \right\rvert} + C \end{align*}

• Trigonometric Substitution \begin{align*} \sqrt{a^2-x^2} && \Rightarrow && x = a\sin(\theta) &&dx = a\cos(\theta)~d\theta \\ \sqrt{a^2+x^2} && \Rightarrow && x = a\tan(\theta) &&dx = a\sec^2(\theta)~d\theta \\ \sqrt{x^2 - a^2} && \Rightarrow && x = a \sec(\theta) &&dx = a\sec(\theta)\tan(\theta)~d\theta \end{align*}

### Trigonometric Substitution

• Trig Formulas (power reduction) \begin{align*} \sin^2(x) &= \frac{1}{2}(1-\cos 2x) \\ \cos^2(x) &= \frac{1}{2}(1+\cos 2x) \\ \sin(x)\cos(x) &= \frac{1}{2}\sin 2x \end{align*}

• Products of trig functions

• Setup: $$\int \sin^a(x) \cos^b(x) ~dx$$
• Both $$a,b$$ even: use power reduction, e.g. $$\sin(x)\cos(x) = \frac{1}{2} \sin(2x)$$
• $$a$$ odd: $$\sin^2 = 1-\cos^2,~u=\cos(x)$$
• $$b$$ odd: $$\cos^2 = 1-\sin^2,~u=\sin(x)$$
• Setup: $$\int \tan^a(x) \sec^b(x) ~dx$$
• $$a$$ odd: $$\tan^2 = \sec^2 - 1,~ u = \sec(x)$$
• $$b$$ even: $$\sec^2 = \tan^2 + 1,~ u = \tan(x)$$

Other small but useful facts: \begin{align*} \int_0^{2\pi} \sin \theta~d\theta = \int_0^{2\pi} \cos \theta~d\theta = 0 .\end{align*}

## Optimization

• Critical points: boundary points and wherever $$f'(x) = 0$$

• Second derivative test:

• $$f''(p) > 0 \implies p$$ is a min
• $$f''(p) < 0 \implies p$$ is a max
• Inflection points of $$h$$ occur where $$h''$$ changes sign, i.e. where $$h'$$ changes from increasing to decreasing or vice versa. (Note that this is not where $$h'$$ changes sign; that gives extrema.)

• Inverse function theorem: The slope of the inverse is reciprocal of the original slope

• If two curves are tangent at a point, both their values and their first derivatives agree there. Find the $$x$$ satisfying both conditions; it can be used in the original equation.

• Fundamental theorem of Calculus: if $$F' = f$$, then \begin{align*} \int_a^b f(x) dx = F(b) - F(a) .\end{align*}

• Min/maxing: either derivatives or Lagrange multipliers!

• Distance from origin to plane: equation of a plane \begin{align*} P: ax+by+cz=d .\end{align*}

• You can always just read off the normal vector $$\mathbf{n} = (a,b,c)$$, so the plane is $$\mathbf{n}\cdot\mathbf{x} = d$$.

• Since $$\lambda \mathbf{n}$$ is normal to $$P$$ for all $$\lambda$$, solve $$\mathbf{n}\cdot\lambda \mathbf{n} = d$$, which gives $$\lambda = \frac{d}{ {\left\lVert {\mathbf{n}} \right\rVert}^2}$$; the closest point is $$\lambda\mathbf{n}$$, at distance $${\left\lvert {\lambda} \right\rvert}\,{\left\lVert {\mathbf{n}} \right\rVert} = \frac{{\left\lvert {d} \right\rvert}}{{\left\lVert {\mathbf{n}} \right\rVert}}$$.

• A plane can be constructed from a point $$\mathbf{p}$$ and a normal $$\mathbf{n}$$ by the equation $$\mathbf{n}\cdot(\mathbf{x} - \mathbf{p}) = 0$$.

• In a sine wave $$f(x) = \sin(\omega x)$$, the period is given by $$2\pi/\omega$$. If $$\omega$$ is a positive integer, the wave makes exactly $$\omega$$ full oscillations in the interval $$[0, 2\pi]$$.

• The directional derivative is the gradient dotted against a unit vector in the direction of interest

• Related rates problems can often be solved via implicit differentiation of some constraint function

• The second derivative of a parametric curve $$(x(t), y(t))$$ is not simply a ratio of second derivatives: \begin{align*} \frac{dy}{dx} = \frac{y'(t)}{x'(t)} \implies \frac{d^2y}{dx^2} = \frac{ \frac{d}{dt}\qty{\frac{y'(t)}{x'(t)}} }{x'(t)} \end{align*}

• For the love of god, remember the FTC! \begin{align*} \frac{\partial}{\partial x} \int_0^x f(y) dy = f(x) \end{align*}

• Technique for asymptotic inequalities: WTS $$f < g$$, so show $$f(x_0) < g(x_0)$$ at a point and then show $$\forall x > x_0, f'(x) < g'(x)$$. Good for big-O style problems too.


# Vector Calculus

Notation: \begin{align*} \mathbf{v}, \mathbf{a}, \cdots && \text{vectors in }{\mathbb{R}}^n \\ \mathbf{R}, \mathbf{A}, \cdots && \text{matrices} \\ \mathbf{r}(t) && \text{A parameterized curve }\mathbf{r}: {\mathbb{R}}\to {\mathbb{R}}^n \\ \\ \widehat{\mathbf{v}} && {\mathbf{v} \over {\left\lVert {\mathbf{v}} \right\rVert}} .\end{align*}

## Plane Geometry

\begin{align*} \mathbf{v} = [x, y] \in {\mathbb{R}}^2 \implies m = \frac{y}{x} .\end{align*}

\begin{align*} \mathbf{R}_\theta = \left[ \begin{array} { l l } { \cos \theta } & { - \sin \theta } \\ { \sin \theta } & { \cos \theta } \end{array} \right] \implies \mathbf{R}_{\frac{\pi}{2}} = \left[ \begin{array} { l l } { 0 } & { - 1 } \\ { 1 } & { 0 } \end{array}\right] .\end{align*}

\begin{align*} \mathbf{R}_{\frac{\pi}{2}} \mathbf{x} \mathrel{\vcenter{:}}= \mathbf{R}_{\frac{\pi}{2}} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix} \in \mathbf{\mathbb{R}}\mathbf{x}^\perp .\end{align*} Thus if a planar line is defined by the span of $${\left[ {x, y} \right]}$$ and a slope of $$m = y/x$$, a normal vector is given by the span of $${\left[ {-y, x} \right]}$$ of slope $$-{1 \over m} = -x/y$$.

Given $$\mathbf{v}$$, the rotated vector $$\mathbf{R}_{\frac{\pi}{2}} \mathbf{v}$$ is orthogonal to $$\mathbf{v}$$, so this can be used to obtain normals and other orthogonal vectors in the plane.

There is a direct way to come up with one orthogonal vector to any given vector: \begin{align*} \mathbf{v} = [a,b,c] \implies \mathbf{y} \mathrel{\vcenter{:}}= \begin{cases} [-(b+c), a, a] & c = 0 \text{ and } a = -b \\ [c,c, -(a+b)] & \text{else} \end{cases} \in {\mathbb{R}}\mathbf{v}^\perp \end{align*} (the case split avoids the degenerate output $$\mathbf{y} = \mathbf{0}$$).

## Projections

For a subspace given by a single vector $$\mathbf{a}$$: \begin{align*} \mathrm{proj}_\mathbf{a}( \textcolor{Aquamarine}{\textbf{x}} ) = {\left\langle {\textcolor{Aquamarine}{\textbf{x}} },~{\widehat{\mathbf{a}}} \right\rangle}\mathbf{\widehat{a}} \hspace{8em} \mathrm{proj}_{\mathbf{a}}^\perp(\textcolor{Aquamarine}{\textbf{x}}) = \textcolor{Aquamarine}{\textbf{x}} - \mathrm{proj}_\mathbf{a}(\textcolor{Aquamarine}{\textbf{x}}) = \textcolor{Aquamarine}{\textbf{x}} - {\left\langle {\textcolor{Aquamarine}{\textbf{x}}},~{\widehat{\mathbf{a}}} \right\rangle}\widehat{\mathbf{a}} \end{align*}

In general, for a subspace $$\operatorname{colspace}(A) = \left\{{\mathbf{a}_1, \cdots \mathbf{a}_n}\right\}$$,

\begin{align*} \mathrm{proj}_A(\textcolor{Aquamarine}{\textbf{x}}) = \sum_{i=1}^n {\left\langle {\textcolor{Aquamarine}{\textbf{x}} },~{\widehat{\mathbf{a}}_i} \right\rangle}\widehat{\mathbf{a}}_i = A(A^T A)^{-1}A^T\textcolor{Aquamarine}{\textbf{x}} \end{align*} where the sum formula requires the $$\mathbf{a}_i$$ to be orthonormal; the matrix formula works for any basis.
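The single-vector projection can be checked directly; a minimal pure-Python sketch (added here, not part of the original notes; the helper names `dot`, `scale`, `sub`, `proj` are my own):

```python
from math import sqrt

def dot(u, v): return sum(a*b for a, b in zip(u, v))
def scale(c, v): return [c*a for a in v]
def sub(u, v): return [a - b for a, b in zip(u, v)]

def proj(x, a):
    """Project x onto span{a}: <x, a_hat> a_hat."""
    a_hat = scale(1 / sqrt(dot(a, a)), a)
    return scale(dot(x, a_hat), a_hat)

x, a = [3.0, 4.0, 0.0], [1.0, 1.0, 1.0]
p = proj(x, a)
r = sub(x, p)                 # orthogonal part proj^perp
print(p, dot(r, a))           # residual is orthogonal to a, so dot ≈ 0
```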

## Lines

\begin{align*} \text{General Equation} && Ax + By + C = 0 \\ \\ \text{Parametric Equation} && \mathbf{r}(t) = t\mathbf{x} + \mathbf{b} .\end{align*}

Characterized by an equation in inner products: \begin{align*} \mathbf{y} \in L \iff {\left\langle {\mathbf{y} - \mathbf{b}},~{\mathbf{n}} \right\rangle} = 0 \end{align*} where $$\mathbf{n}$$ is any normal to $$L$$ and $$\mathbf{b}$$ is any point on $$L$$.

Given $$\mathbf{p}_0, \mathbf{p}_1$$, take $$\mathbf{x} = \mathbf{p}_1 - \mathbf{p}_0$$ and $$\mathbf{b} = \mathbf{p}_i$$ for either $$i$$: \begin{align*} \mathbf{r}(t) = t(\mathbf{p}_1 - \mathbf{p}_0) + \mathbf{p}_0 && = t\mathbf{p}_1 + (1-t) \mathbf{p}_0 .\end{align*}

If a line $$L$$ is given by \begin{align*} \mathbf{r}(t) = t {\left[ {x_1, x_2, x_3} \right]} + {\left[ {p_1, p_2, p_3} \right]} ,\end{align*} then \begin{align*} (x, y, z) \in L \iff \frac{x-p_1}{x_1} = \frac{y-p_{2}}{x_2} = \frac{z-p_{3}}{x_3} .\end{align*}

The symmetric equation of the line through $${\left[ {2,1,-3} \right]}$$ and $${\left[ {1,4,-3} \right]}$$: the direction is $${\left[ {1,4,-3} \right]} - {\left[ {2,1,-3} \right]} = {\left[ {-1,3,0} \right]}$$, so \begin{align*} \frac{x-2}{-1}=\frac{y-1}{3}, \qquad z = -3 \end{align*} (the $$z$$ component of the direction is $$0$$, so $$z$$ is constant).

### Tangent Lines / Planes

Key idea: just need a point and a normal vector, and the gradient is normal to level sets.

For any locus $$f(\mathbf{x}) = 0$$, we have \begin{align*} \mathbf{x} \in T_f(\mathbf{p}) \implies {\left\langle {\nabla f(\mathbf{p})},~{\mathbf{x}-\mathbf{p}} \right\rangle} = 0 .\end{align*}

### Normal Lines

Key idea: the gradient is normal.

To find a normal line, you just need a single point $$\mathbf{p}$$ and a normal vector $$\mathbf{n}$$; then \begin{align*} L = \left\{{\mathbf{x} \mathrel{\Big|}\mathbf{x} = \mathbf{p} + t\mathbf{n}}\right\} .\end{align*}

## Planes

\begin{align*} \text{General Equation} && A x + B y + C z + D = 0 \\ \\ \text{Parametric Equation} &&\mathbf{y}(t,s) = t\mathbf{x}_1 + s\mathbf{x}_2 + \mathbf{b} \\ \\ .\end{align*}

Characterized by an equation in inner products: \begin{align*} \mathbf{y} \in P \iff {\left\langle {\mathbf{y} - \mathbf{p}_0},~{\mathbf{n}} \right\rangle} = 0 \end{align*}

Determined by a point $$\mathbf{p}_0$$ and a normal vector $$\mathbf{n}$$

Given $$\mathbf{v}_0, \mathbf{v}_1$$, set $$\mathbf{n} = \mathbf{v}_0 \times\mathbf{v}_1$$.

### Finding a Normal Vector

• Normal vector to a plane
• Can read normal off of equation: $$\mathbf{n} = [a,b,c]$$
• Computing $$D$$, writing the plane as $${\left\langle {\mathbf{x}},~{\mathbf{n}} \right\rangle} = D$$:
• $$D = {\left\langle {\mathbf{p}_0},~{\mathbf{n}} \right\rangle} = p_1n_1 + p_2n_2 + p_3n_3$$
• Useful trick: once you have $$\mathbf{n}$$, you can let $$\mathbf{p}_0$$ be any point in the plane (don’t necessarily need to use the one you started with, so pick any point that’s convenient to calculate)

### Distance from origin to plane

• Given by $$D/ {\left\lVert {\mathbf{n}} \right\rVert} = {\left\langle {\mathbf{p}_0},~{\mathbf{\widehat{n}}} \right\rangle}$$. Gives a signed distance.

### Distance from point to plane

• Given a point $$\mathbf{q}$$, the signed distance is $${\left\langle {\mathbf{q} - \mathbf{p}_0},~{\mathbf{\widehat{n}}} \right\rangle}$$ for any point $$\mathbf{p}_0$$ in the plane
• Finding vectors in the plane
• Given $$P = [A, B, C] \cdot [x, y, z] = 0$$ with $$A \neq 0$$, can take $${\left[ {-\frac{B}{A},1,0} \right]}, {\left[ {-\frac{C}{A},0,1} \right]}$$

## Curves

\begin{align*} \mathbf{r}(t) = [x(t), y(t), z(t)] .\end{align*}

### Tangent line to a curve

We have an equation for the tangent vector at each point: \begin{align*} \widehat{ \mathbf{T} }(t) = \widehat{\mathbf{r}'}(t) ,\end{align*} so we can write \begin{align*} \mathbf{L}_{T}(t) = \mathbf{r}(t_0) + t \widehat{ \mathbf{T}}(t_0) \mathrel{\vcenter{:}}=\mathbf{r}(t_0) + t \widehat{\mathbf{r}'}(t_0) .\end{align*}

### Normal line to a curve

• If $$\mathbf{r}$$ has constant speed (e.g. an arc length parameterization), then $$\mathbf{r}''(t) \in {\mathbb{R}}\mathbf{r}'(t)^\perp$$, so we have an equation for a normal vector at each point: \begin{align*} \widehat{\mathbf{N}}(t) = \widehat{\mathbf{r}''}(t) .\end{align*} Thus we can write \begin{align*} \mathbf{L}_N(t) = \mathbf{r}(t_0) + t \widehat{ \mathbf{N} }(t_0) = \mathbf{r}(t_0) + t \widehat{ \mathbf{r}''} (t_0) .\end{align*} (For a general parameterization, instead take $$\widehat{\mathbf{N}} = \widehat{\mathbf{T}'}$$.)

#### Special case: planar graphs of functions

Suppose $$y = f(x)$$. Set $$g(x, y) = f(x) - y$$, then \begin{align*} \nabla g = [f_x(x), -1]\implies m = -\frac{1}{f_x(x)} \end{align*}

## Minimal Distances

Fix a point $$\mathbf{p}$$. Key idea: find a subspace and project onto it.

Key equations: projection and orthogonal projection of $$\mathbf{b}$$ onto $$\mathbf{a}$$: \begin{align*} \mathrm{proj}_\mathbf{a}(\mathbf{b}) = {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle}\mathbf{\widehat{a}} \hspace{8em} \mathrm{proj}_{\mathbf{a}}^\perp(\mathbf{b}) = \mathbf{b} - \mathrm{proj}_\mathbf{a}(\mathbf{b}) = \mathbf{b} - {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle}\widehat{\mathbf{a}} \end{align*}

### Point to plane

• Given a point $$\mathbf{p}$$ and a plane $$S = \left\{{\mathbf{x} \in {\mathbb{R}}^3 \mathrel{\Big|}n_1x + n_2y + n_3z = d}\right\}$$, let $$\mathbf{n} = [n_1, n_2, n_3]$$, find any point $$\mathbf{q} \in S$$, and project $$\mathbf{q} -\mathbf{p}$$ onto $$S^\perp= \mathrm{Span}(\mathbf{n})$$ using

\begin{align*} d = {\left\lVert {\mathrm{proj}_{\mathbf{n}}(\mathbf{q} - \mathbf{p})} \right\rVert} = {\left\lVert {{\left\langle {\mathbf{q} - \mathbf{p}},~{\widehat{\mathbf{n}}} \right\rangle} \widehat{\mathbf{n}}} \right\rVert} = {\left\lvert {{\left\langle {\mathbf{q} - \mathbf{p}},~{\widehat{\mathbf{n}}} \right\rangle}} \right\rvert} .\end{align*}

• Given just two vectors $$\mathbf{u}, \mathbf{v}$$: manufacture a normal vector $$\mathbf{n} = \mathbf{u} \times \mathbf{v}$$ and continue as above.
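A minimal numeric sketch of the point-to-plane recipe (added here, not part of the original notes), using the plane $$2x + y + 2z = 6$$ and the point $$\mathbf{p} = (1,1,1)$$:

```python
from math import sqrt

def dot(u, v): return sum(a*b for a, b in zip(u, v))

# Plane S: 2x + y + 2z = 6, so n = [2, 1, 2]; q = [3, 0, 0] lies on S.
n = [2.0, 1.0, 2.0]
q = [3.0, 0.0, 0.0]
p = [1.0, 1.0, 1.0]

# d = |<q - p, n_hat>|
n_norm = sqrt(dot(n, n))
dist = abs(dot([qi - pi for qi, pi in zip(q, p)], n)) / n_norm
print(dist)  # = 1/3
```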

#### Origin to plane

Special case: if $$\mathbf{p} = \mathbf{0}$$, \begin{align*} d = {\left\lVert {\mathrm{proj}_{\mathbf{n}}(\mathbf{q})} \right\rVert} = {\left\lVert {{\left\langle {\mathbf{q}},~{\widehat{\mathbf{n}}} \right\rangle} \widehat{\mathbf{n}}} \right\rVert} = {\left\lvert {{\left\langle {\mathbf{q}},~{\widehat{\mathbf{n}}} \right\rangle}} \right\rvert} .\end{align*}

### Point to line

• Given a line $$L: \mathbf{x}(t) = t\mathbf{v}$$ for some fixed $$\mathbf{v}$$, use \begin{align*} d = {\left\lVert {\mathrm{proj}_\mathbf{v}^\perp(\mathbf{p})} \right\rVert} = {\left\lVert {\mathbf{p} - {\left\langle {\mathbf{p}},~{\widehat{\mathbf{v}}} \right\rangle}\widehat{\mathbf{v} }} \right\rVert} .\end{align*}

• Given a line $$L: \mathbf{x}(t) = \mathbf{w}_0 + t\mathbf{w}$$, let $$\mathbf{v} = \mathbf{x}(1) - \mathbf{x}(0)$$ and proceed as above.

### Line to line

Given $$\mathbf{r}_1(t) = \mathbf{p}_1 + t\mathbf{v}_1$$ and $$\mathbf{r}_2(t) = \mathbf{p}_2 + t\mathbf{v}_2$$, let $$d$$ be the desired distance.

• Let $$\widehat{ \mathbf{n}} = \widehat{\mathbf{v}_1 \times \mathbf{v}_2}$$, which is orthogonal to both lines.

• Then project the vector connecting the two fixed points $$\mathbf{p}_i$$ onto this subspace and take its norm: \begin{align*} d &= {\left\lVert {\mathrm{proj}_{\mathbf{n}}(\mathbf{p}_2 - \mathbf{p}_1)} \right\rVert} \\ &= {\left\lVert {{\left\langle {\mathbf{p}_2 -\mathbf{p}_1},~{\widehat{\mathbf{n}}} \right\rangle}\widehat{\mathbf{n}}} \right\rVert} \\ &= {\left\lvert {{\left\langle {\mathbf{p}_2 - \mathbf{p}_1},~{\widehat{\mathbf{n}}} \right\rangle}} \right\rvert} \\ &= \frac{{\left\lvert {{\left\langle {\mathbf{p}_2 - \mathbf{p}_1},~{\mathbf{v}_1 \times\mathbf{v}_2} \right\rangle}} \right\rvert}}{{\left\lVert {\mathbf{v}_1 \times\mathbf{v}_2} \right\rVert}} .\end{align*}
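A minimal numeric sketch (added here, not part of the original notes) for two skew lines whose distance is clearly $$1$$:

```python
from math import sqrt

def dot(u, v): return sum(a*b for a, b in zip(u, v))
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

# r1(t) = p1 + t v1 (the x-axis),  r2(s) = p2 + s v2 (a unit z-offset line)
p1, v1 = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
p2, v2 = [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]

n = cross(v1, v2)               # orthogonal to both lines
dist = abs(dot([b - a for a, b in zip(p1, p2)], n)) / sqrt(dot(n, n))
print(dist)  # = 1.0, the z-offset between the lines
```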

## Surfaces

\begin{align*} S = \left\{{(x,y,z) \mathrel{\Big|}f(x,y, z) = 0}\right\} \hspace{10em} z = f(x,y) \end{align*}

### Tangent plane to a surface

• Need a point $$\mathbf{p}$$ and a normal $$\mathbf{n}$$. By cases:
• $$f(x,y, z) = 0$$
• $$\nabla f$$ is a normal vector.
• Write the tangent plane equation $${\left\langle {\mathbf{n}},~{\mathbf{x} - \mathbf{p}_0} \right\rangle} = 0$$, done.
• $$z = g(x,y)$$:
• Let $$f(x, y, z) = g(x,y) - z$$, then $$\mathbf{p} \in S \iff \mathbf{p}$$ is in a level set of $$f$$.
• $$\nabla f$$ is normal to level sets (and thus the surface), so compute $$\nabla f = [g_x, g_y, -1]$$
• Proceed as in previous case

### Surfaces of revolution

• Given $$f(x_1 ,x_2) = 0$$, can be revolved around either the $$x_1$$ or $$x_2$$ axis.
• $$f(x,y)$$ around the $$x$$ axis yields $$f(x, \pm \sqrt{y^2 + z^2})=0$$
• $$f(x,y)$$ around the $$y$$ axis yields $$f(\pm\sqrt{x^2 + z^2}, y)=0$$
• Remaining cases proceed similarly - leave the axis variable alone, replace other variable with square root involving missing axis.
• Equations of lines tangent to an intersection of surfaces $$f(x,y,z) = g(x,y,z)$$:
• Find the normal vector of each surface and take their cross product; the result is tangent to the intersection curve, e.g. $$\mathbf{v} = \nabla f \times \nabla g$$, then \begin{align*} L = \left\{{\mathbf{x}\mathrel{\Big|}\mathbf{x} = \mathbf{p} + t \mathbf{v}}\right\} \end{align*}
• Level curves:
• Given a surface $$f(x,y,z) = 0$$, the level curves are obtained by looking at e.g. $$f(x,y,c) = 0$$.

# Multivariable Calculus

Given a function $$f: {\mathbb{R}}^n \to {\mathbb{R}}$$, let $$S_k \mathrel{\vcenter{:}}=\left\{{\mathbf{p}\in {\mathbb{R}}^n ~{\text{s.t.}}~f(\mathbf{p}) = k}\right\}$$ denote the level set for $$k\in {\mathbb{R}}$$. Then \begin{align*} \nabla f(\mathbf{p}) \in S_k^\perp .\end{align*}

## Notation

\begin{align*} \mathbf{v} &= [v_1, v_2, \cdots] && \text{a vector} \\ \\ \mathbf{e}_i &= [0, 0, \cdots, \overbrace{1}^{i \text{th term}}, \cdots, 0] && \text{the } i \text{th standard basis vector} \\ \\ \phi: {\mathbb{R}}^n &\to {\mathbb{R}} && \text{a functional on } {\mathbb{R}}^n\\ \phi(x_1, x_2, \cdots) &= \cdots && \\ \\ \mathbf{F}: {\mathbb{R}}^n &\to {\mathbb{R}}^n && \text{a multivariable function}\\ \mathbf{F}(x_1,x_2,\cdots) &= [\mathbf{F}_1(x_1, x_2, \cdots), \mathbf{F}_2(x_1, x_2, \cdots), \cdots, \mathbf{F}_n(x_1, x_2, \cdots)] \end{align*}

## Partial Derivatives

For a functional $$f:{\mathbb{R}}^n\to {\mathbb{R}}$$, the partial derivative of $$f$$ with respect to $$x_i$$ is \begin{align*} {\frac{\partial f}{\partial x_i}\,}(\mathbf p) \mathrel{\vcenter{:}}=\lim_{h\to 0}\frac{f(\mathbf p + h\mathbf e_i) - f(\mathbf p)}{h} \end{align*}

\begin{align*} f: {\mathbb{R}}^2 &\to {\mathbb{R}}\\ {\frac{\partial f}{\partial x}\,}(x_0,y_0) &= \lim_{h \to 0} \frac{f(x_0+h, y_0) - f(x_0,y_0)}{h} \end{align*}

## General Derivatives

A function $$f: {\mathbb{R}}^n \to {\mathbb{R}}^m$$ is differentiable at $$\mathbf{p}$$ iff there exists a linear transformation $$D_f: {\mathbb{R}}^n \to {\mathbb{R}}^m$$ such that \begin{align*} \lim _ { \mathbf x \rightarrow \mathbf{p} } \frac { \left\| f (\mathbf x ) - f (\mathbf{p} ) - D_f (\mathbf x - \mathbf{p} ) \right\| } { \| \mathbf x - \mathbf{p} \| } = 0 .\end{align*}

$$D_f$$ is the “best linear approximation” to $$f$$.

When $$f$$ is differentiable, $$D_f$$ can be given in coordinates by
\begin{align*} (D_f)_{ij} = {\frac{\partial f_i}{\partial x_j}\,} \end{align*}

This yields the Jacobian of $$f$$: \begin{align*} D_f(\mathbf{p}) = \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \nabla f_1(\mathbf{p}) & \nabla f_2(\mathbf{p}) & \cdots & \nabla f_m(\mathbf{p}) \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex} \end{bmatrix}^T = \left[ \begin{array} { c c c c } { \frac { \partial f _ { 1 } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { 1 } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \ldots } & { \frac { \partial f _ { 1 } } { \partial x _ { n } } ( \mathbf{p} ) } \\ { \frac { \partial f _ { 2 } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { 2 } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \dots } & { \frac { \partial f _ { 2 } } { \partial x _ { n } } ( \mathbf{p} ) } \\ { \vdots } & { \vdots } & { \ddots } & { \vdots } \\ { \frac { \partial f _ { m } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { m } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \cdots } & { \frac { \partial f _ { m } } { \partial x _ { n } } ( \mathbf{p} ) } \end{array} \right]. \end{align*}

This is equivalent to

• Taking the gradient of each component $$f_i$$ of $$f$$,
• Evaluating $$\nabla f_i$$ at $$\mathbf{p}$$,
• Forming a matrix using these as the columns, and
• Transposing the resulting matrix.

For a function $$f: {\mathbb{R}}^n \to {\mathbb{R}}$$, the Hessian is a generalization of the second derivative, and is given in coordinates by \begin{align*} (H_f)_{ij} = {\frac{\partial ^2f}{\partial x_i \partial x_j}\,} \end{align*}

Explicitly, we have \begin{align*} H_f(\mathbf{p}) = \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ {\frac{\partial \nabla f}{\partial x_1}\,}(\mathbf{p}) & {\frac{\partial \nabla f}{\partial x_2}\,}(\mathbf{p}) & \cdots & {\frac{\partial \nabla f}{\partial x_n}\,}(\mathbf{p}) \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex} \end{bmatrix}^T = \left[ \begin{array} { c c c } { \frac { \partial ^ { 2 } f } { \partial x _ { 1 } \partial x _ { 1 } } ( \mathbf { p } ) } & { \dots } & { \frac { \partial ^ { 2 } f } { \partial x _ { 1 } \partial x _ { n } } ( \mathbf { p } ) } \\ { \vdots } & { \ddots } & { \vdots } \\ { \frac { \partial ^ { 2 } f } { \partial x _ { n } \partial x _ { 1 } } ( \mathbf { p } ) } & { \cdots } & { \frac { \partial ^ { 2 } f } { \partial x _ { n } \partial x _ { n } } ( \mathbf { p } ) } \end{array} \right]. \end{align*}

Mnemonic: make matrix with $$\nabla f$$ as the columns, and then differentiate variables left to right.

## The Chain Rule

Write out a tree of dependent variables; then, for each path from the output variable down to the input variable, multiply the partial derivatives along the path, and sum over all such paths.

Let subscripts denote which variables are held constant, then \begin{align*} \left({\frac{\partial z}{\partial x}\,}\right)_y &= \left({\frac{\partial z}{\partial x}\,}\right)_{u,y,v} \\ & + \left({\frac{\partial z}{\partial v}\,}\right)_{x,y,u} \left({\frac{\partial v}{\partial x}\,}\right)_y \\ & + \left({\frac{\partial z}{\partial u}\,}\right)_{x,y,v} \left({\frac{\partial u}{\partial x}\,}\right)_{v,y} \\ & + \left({\frac{\partial z}{\partial u}\,}\right)_{x,y,v} \left({\frac{\partial u}{\partial v}\,}\right)_{x,y} \left({\frac{\partial v}{\partial x}\,}\right)_y \end{align*}

## Approximation

Let $$z = f(x,y)$$, then to approximate near $$\mathbf{p}_0 = {\left[ {x_0, y_0} \right]}$$, \begin{align*} f(\textcolor{Aquamarine}{\textbf{x}}) &\approx f(\mathbf{p}_0) + \nabla f(\mathbf{p}_0) \cdot (\textcolor{Aquamarine}{\textbf{x}} - \mathbf{p}_0) \\ \implies f(x,y) &\approx f(\mathbf{p}_0) + f_x(\mathbf{p}_0)(x-x_0) + f_y(\mathbf{p}_0)(y-y_0) \\ .\end{align*}

## Optimization

### Classifying Critical Points

Critical points of $$f$$ given by points $$\mathbf{p}$$ such that the derivative vanishes: \begin{align*} \operatorname{crit}(f) = \left\{{\mathbf{p}\in {\mathbb{R}}^n ~{\text{s.t.}}~D_f({\mathbf p}) = 0}\right\} \end{align*}

1. Compute \begin{align*} {\left\lvert {H_f(\mathbf p)} \right\rvert} \mathrel{\vcenter{:}}=\left| \begin{array} { l l } { f _ { x x } } & { f _ { x y } } \\ { f _ { y x } } & { f _ { y y } } \end{array} \right| ({ \mathbf p }) \end{align*} 2. Check by cases:

• $${\left\lvert {H(\mathbf p)} \right\rvert} = 0$$: No conclusion
• $${\left\lvert {H(\mathbf p)} \right\rvert} < 0$$: Saddle point
• $${\left\lvert {H(\mathbf p)} \right\rvert} > 0$$:
• $$f_{xx}(\mathbf p) > 0 \implies$$ local min
• $$f_{xx}(\mathbf p) < 0 \implies$$ local max
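The case analysis above can be checked with finite differences; a minimal sketch (added here, not part of the original notes; `hess_det` is my own helper name):

```python
# Numeric 2x2 Hessian determinant test at a critical point.
def hess_det(f, x, y, h=1e-4):
    fxx = (f(x + h, y) - 2*f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2*f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx * fyy - fxy**2, fxx

saddle, _ = hess_det(lambda x, y: x**2 - y**2, 0, 0)   # det < 0  => saddle
minim, fxx = hess_det(lambda x, y: x**2 + y**2, 0, 0)  # det > 0, fxx > 0 => local min
print(saddle, minim, fxx)
```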

What’s really going on?

• Eigenvalues have same sign $$\iff$$ positive definite or negative definite

• Positive definite $$\implies$$ convex $$\implies$$ local min

• Negative definite $$\implies$$ concave $$\implies$$ local max

• Extrema occur on boundaries, so parameterize each boundary to obtain a function in one less variable and apply standard optimization techniques to yield critical points. Test all critical points to find extrema.
• If possible, use the constraint to reduce the equation to one dimension and optimize as in the single-variable case.

### Lagrange Multipliers

The setup: \begin{align*} \text{Optimize } f(\mathbf x) &\quad \text{subject to } g(\mathbf x) = c \\ \implies \nabla f &= \lambda \nabla g \end{align*} 1. Use this formula to obtain a system of equations in the components of $$\mathbf{x}$$ and the parameter $$\lambda$$.

2. Use $$\lambda$$ to obtain a relation involving only components of $$\mathbf{x}$$.

3. Substitute relations back into constraint to obtain a collection of critical points.

4. Evaluate $$f$$ at critical points to find max/min.
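The steps above on a tiny worked example (a sketch added here, not part of the original notes): maximize $$f(x,y) = xy$$ subject to $$x + y = 1$$.

```python
# Maximize f(x, y) = x*y subject to g(x, y) = x + y = 1.
# Step 1: ∇f = (y, x), ∇g = (1, 1), so ∇f = λ∇g gives y = λ and x = λ.
# Step 2: hence x = y.  Step 3: the constraint gives x = y = 1/2.
x_star = y_star = 0.5
f_star = x_star * y_star          # Step 4: f = 1/4 at the critical point

# Brute-force check along the constraint (x free, y = 1 - x)
best = max(x * (1 - x) for x in [i / 10000 for i in range(10001)])
print(f_star, best)  # both 0.25
```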

## Change of Variables

For a scalar field $$f: {\mathbb{R}}^n \to {\mathbb{R}}$$, an injective differentiable $$g: {\mathbb{R}}^n \to {\mathbb{R}}^n$$, and a region $$R$$, \begin{align*} \int _ { g ( R ) } f ( \mathbf { x } ) ~d V = \int _ { R } (f \circ g) ( \mathbf { x } ) \cdot {\left\lvert {D_g ( \mathbf { x })} \right\rvert} ~d V \end{align*} where $${\left\lvert {D_g} \right\rvert}$$ denotes the absolute value of the Jacobian determinant.
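A standard sanity check (a minimal sketch added here, not part of the original notes): the area of the unit disk via polar coordinates $$g(r,\theta) = (r\cos\theta, r\sin\theta)$$, whose Jacobian determinant is $$r$$.

```python
from math import pi

# Area of the unit disk: ∫∫_disk 1 dA = ∫_0^{2π} ∫_0^1 r dr dθ = π,
# where the extra factor r is |D_g| for the polar change of variables.
n = 400
dr, dth = 1.0 / n, 2 * pi / n
area = sum((i + 0.5) * dr * dr * dth    # midpoint rule in r, uniform in θ
           for i in range(n) for _ in range(n))
print(area, pi)  # ≈ 3.14159...
```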

# Vector Calculus

## Notation

$$R$$ is a region, $$S$$ is a surface, $$V$$ is a solid.

\begin{align*} \oint _ { \partial S } \mathbf { F } \cdot d \mathbf { r } = \oint _ { \partial S } [\mathbf{F}_1, \mathbf{F}_2, \mathbf{F}_3] \cdot [dx, dy, dz] = \oint_{{\partial}S} \mathbf{F}_1dx + \mathbf{F}_2dy + \mathbf{F}_3dz \end{align*}

The main vector operators \begin{align*} \nabla: ({\mathbb{R}}^n \to {\mathbb{R}}) &\to ({\mathbb{R}}^n \to {\mathbb{R}}^n) \\ \phi &\mapsto \nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \phi}{\partial x_i} ~\mathbf{e}_i \\ \\ \text{} \mathrm{div}(\mathbf{F}): ({\mathbb{R}}^n \to {\mathbb{R}}^n) &\to ({\mathbb{R}}^n \to {\mathbb{R}}) \\ \mathbf{F} &\mapsto \nabla \cdot \mathbf{F} \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \mathbf{F}_i}{\partial x_i} \\ \\ \text{} \mathrm{curl}(\mathbf{F}): ({\mathbb{R}}^3 \to {\mathbb{R}}^3) &\to ({\mathbb{R}}^3 \to {\mathbb{R}}^3) \\ \mathbf{F} &\mapsto \nabla \times\mathbf{F} \\ \\ \text{} \end{align*} Some terminology: \begin{align*} \text{Scalar Field} && \phi:&~ X \to {\mathbb{R}}\\ \text{Vector Field} && \mathbf{F}:&~ X\to {\mathbb{R}}^n\\ \text{Gradient Field} && \mathbf{F}:&~ X \to {\mathbb{R}}^n \mathrel{\Big|}\exists \phi: X\to {\mathbb{R}}\mathrel{\Big|}\nabla \phi = F \end{align*}

• The Gradient: lifts scalar fields on $${\mathbb{R}}^n$$ to vector fields on $${\mathbb{R}}^n$$
• Divergence: drops vector fields on $${\mathbb{R}}^n$$ to scalar fields on $${\mathbb{R}}^n$$
• Curl: takes vector fields on $${\mathbb{R}}^3$$ to vector fields on $${\mathbb{R}}^3$$

\begin{align*} \mathbf x \cdot \mathbf y = {\left\langle {\mathbf x},~{\mathbf y} \right\rangle} = \sum_{i=1}^n {x_i y_i} = x_1y_1 + x_2y_2 + \cdots && \text{inner/dot product} \\ {\left\lVert {\mathbf x} \right\rVert} = \sqrt{{\left\langle {\mathbf x},~{\mathbf x} \right\rangle}} = \sqrt{\sum_{i=1}^n x_i^2} = \sqrt{x_1^2 + x_2^2 + \cdots} && \text{norm} \\ \mathbf a \times\mathbf b = \mathbf{\widehat{n}} {\left\lVert {\mathbf a} \right\rVert}{\left\lVert {\mathbf b} \right\rVert}\sin\theta_{\mathbf a,\mathbf b} = \left| \begin{array}{ccc} \mathbf{\widehat{x}} & \mathbf{\widehat{y}} & \mathbf{\widehat{z}} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{array}\right| && \text{cross product} \\ \\ D_\mathbf{u}(\phi) = \nabla \phi \cdot \mathbf{\widehat{u}} && \text{directional derivative} \\ \\ \nabla \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial}{\partial x_i} \mathbf{e}_i = \left[\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \cdots, \frac{\partial}{\partial x_n}\right] && \text{del operator} \\ \\ \nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \phi}{\partial x_i} ~\mathbf{e}_i = {\left[ { \frac{\partial \phi}{\partial x_1}, \frac{\partial \phi}{\partial x_2}, \cdots, \frac{\partial \phi}{\partial x_n}} \right]} && \text{gradient} \\ \\ \Delta \phi \mathrel{\vcenter{:}}=\nabla\cdot\nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial^2 \phi}{\partial x_i^2} = \frac{\partial^2 \phi}{\partial x_1^2} + \frac{\partial^2 \phi}{\partial x_2^2} + \cdots + \frac{\partial^2 \phi}{\partial x_n^2} && \text{Laplacian} \\ \\ \nabla \cdot \mathbf{F} \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \mathbf{F}_i}{\partial x_i} = \frac{\partial \mathbf{F}_1}{\partial x_1} + \frac{\partial \mathbf{F}_2}{\partial x_2} + \cdots + \frac{\partial \mathbf{F}_n}{\partial x_n} && \text{divergence} \\ \\ \nabla \times \mathbf { F } = \left| \begin{array} { c c c } { \mathbf { e }_1 } & { \mathbf { e }_2 } & { \mathbf { e }_3 } \\ { \frac { \partial } { \partial x } } & { \frac { \partial } { \partial y } } & { \frac { \partial } { \partial z } } \\ { \mathbf{F} _ { 1 } } & { \mathbf{F} _ { 2 } } & { \mathbf{F} _ { 3 } } \end{array} \right| = [\mathbf{F}_{3y} - \mathbf{F}_{2z}, \mathbf{F}_{1z}- \mathbf{F}_{3x}, \mathbf{F}_{2x} -\mathbf{F}_{1y}] && \text{curl} \\ \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot d \mathbf { S } = \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot \mathbf { n } ~dS && \text{surface integral} \end{align*}

## Big Theorems

### Stokes’ and Consequences

\begin{align*} \oint _ { \partial S } \mathbf { F } \cdot d \mathbf { r } = \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot d \mathbf { S } .\end{align*}

Note that if $$S$$ is a closed surface, so $${\partial}S = \emptyset$$, this integral vanishes.

\begin{align*} \oint _ { {\partial}R } ( L ~d x + M ~d y ) = \iint _ { R } \left( \frac { \partial M } { \partial x } - \frac { \partial L } { \partial y } \right) d x d y .\end{align*}

Recovering Green’s Theorem from Stokes’ Theorem:

Let $$\mathbf{F} = [L, M, 0]$$, then $$\nabla\times\mathbf{F} = [0, 0, \frac{\partial M}{\partial x} - \frac{\partial L}{\partial y}]$$
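As a numerical sanity check (my own example, not from the notes): for $$\mathbf F = [-y, x]$$ on the unit disk, $$M_x - L_y = 2$$, so both sides of Green's theorem equal $$2\pi$$:

```python
import numpy as np

t = np.linspace(0.0, 2 * np.pi, 10_001)
x, y = np.cos(t), np.sin(t)     # ∂R: unit circle, positively oriented
L, M = -y, x                    # F = [L, M] = [-y, x]

# left side: ∮ L dx + M dy, with dx = x'(t) dt and dy = y'(t) dt,
# via the trapezoid rule
integrand = L * (-np.sin(t)) + M * np.cos(t)
line_integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))

# right side: ∬ (M_x - L_y) dA = 2 · (area of the unit disk) = 2π
double_integral = 2 * np.pi

print(line_integral, double_integral)
```

Here the integrand is identically $$\sin^2 t + \cos^2 t = 1$$, so the line integral is exactly the circumference $$2\pi$$.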

\begin{align*} \iint_ { \partial V } \mathbf { F } \cdot d \mathbf { S } = \iiint _ { V } ( \nabla \cdot \mathbf { F } ) ~d V .\end{align*}

• $$\nabla\times(\nabla\phi) = 0$$
• $$\nabla\cdot(\nabla\times\mathbf{F}) = 0$$

### Directional Derivatives

\begin{align*} D_{\mathbf{v}} f(\mathbf{p}) \mathrel{\vcenter{:}}={\frac{d}{dt}\,} f(\mathbf{p} + t\mathbf{v}) \Big|_{t=0} .\end{align*}

Note that the directional derivative uses a normalized direction vector!

Suppose $$f:{\mathbb{R}}^n\to {\mathbb{R}}$$ and $$\mathbf{v}\in {\mathbb{R}}^n$$. Then \begin{align*} D_{\mathbf{v}}f(\mathbf{p}) = {\left\langle {\nabla f(\mathbf{p})},~{\mathbf{v}} \right\rangle} .\end{align*}

We first use the fact that we can find $$L$$, the best linear approximation to $$f$$: \begin{align*} L(\mathbf{x}) &\mathrel{\vcenter{:}}= f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{x} - \mathbf{p}) \\ \\ D_{\mathbf{v}}f(\mathbf{p}) &= D_{\mathbf{v}} L(\mathbf{p}) \\ &= \lim_{t\to 0} {L(\mathbf{p} + t\mathbf{v}) - L(\mathbf{p}) \over t}\\ &= \lim_{t\to 0} { f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{p} + t\mathbf{v} - \mathbf{p}) -\qty{f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{p} - \mathbf{p})} \over t }\\ &= \lim_{t\to 0} { D_f(\mathbf{p})(t\mathbf{v}) \over t} \\ &= D_f(\mathbf{p})\mathbf{v} \\ &\mathrel{\vcenter{:}}=\nabla f(\mathbf{p}) \cdot \mathbf{v} .\end{align*}
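A quick numerical check of $$D_{\mathbf v} f(\mathbf p) = {\left\langle {\nabla f(\mathbf{p})},~{\mathbf{v}} \right\rangle}$$, using the made-up field $$f(x,y) = x^2 + 3y$$:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3*y                 # hypothetical scalar field

def grad_f(p):
    return np.array([2*p[0], 3.0])    # ∇f = [2x, 3], computed by hand

p = np.array([1.0, 2.0])
v = np.array([1.0, 1.0])

t = 1e-6
limit_def = (f(p + t*v) - f(p)) / t   # difference quotient for d/dt f(p + tv)
formula   = grad_f(p) @ v             # ⟨∇f(p), v⟩ = 2 + 3 = 5

print(limit_def, formula)
```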

## Computing Integrals

### Changing Coordinates

#### Polar and Cylindrical Coordinates

\begin{align*} x = r\cos\theta \\ y = r\sin\theta \\ dV \mapsto r \quad dr~d\theta \end{align*}

#### Spherical Coordinates

\begin{align*} x = r\cos\theta = \rho\sin\phi\cos\theta \\ y = r\sin\theta = \rho\sin\phi\sin\theta \\ z = \rho\cos\phi \\ dV \mapsto \rho^2 \sin\phi \quad d\rho~d\phi~d\theta \end{align*}

### Line Integrals

#### Curves

• Parametrize the path $$C$$ as $$\left\{{\mathbf{r}(t): t\in[a,b]}\right\}$$, then

\begin{align*} \int_C f ~ds &\mathrel{\vcenter{:}}=\int_a^b (f\circ \mathbf{r})(t) ~{\left\lVert {\mathbf{r}'(t)} \right\rVert}~dt \\ &= \int_a^b f(x(t), y(t), z(t)) \sqrt{x_t^2 + y_t^2 + z_t^2} ~dt \end{align*}

#### Vector Fields

• If exact: \begin{align*} {\frac{\partial }{\partial y}\,} \mathbf{F_1} = {\frac{\partial }{\partial x}\,} \mathbf{F_2} \implies \int \mathbf{F_1} ~dx + \mathbf{F_2} ~dy = \phi(\mathbf{p_1}) - \phi(\mathbf{p_0}) \end{align*}

The function $$\phi$$ can be found using the same method from ODEs.

• Parametrize the path $$C$$ as $$\left\{{\mathbf{r}(t): t\in[a,b]}\right\}$$, then \begin{align*} \int_C \mathbf F \cdot d\mathbf r & \mathrel{\vcenter{:}}=\int_a^b (\mathbf F \circ \mathbf r)(t) \cdot \mathbf r'(t) ~dt \\ &= \int_a^b [\mathbf F_1(x(t), y(t), \cdots), \mathbf F_2(x(t), y(t), \cdots)]\cdot[x_t, y_t, \cdots] ~dt \\ &= \int_a^b \mathbf F_1(x(t), y(t) \cdots)x_t + \mathbf F_2(x(t), y(t), \cdots)y_t + \cdots ~dt \end{align*}

• Equivalently written:

\begin{align*} \int_a^b \mathbf F_1 ~dx + \mathbf F_2 ~dy + \cdots \mathrel{\vcenter{:}}=\int_C \mathbf F \cdot d\mathbf r \end{align*} in which case $$[dx, dy, \cdots] \mathrel{\vcenter{:}}=[x_t, y_t, \cdots] = \mathbf r'(t)$$.

• Remember to substitute dx back into the integrand!!

### Flux

\begin{align*} \iint_S \mathbf{F}\cdot d\mathbf{S} = \iint_S \mathbf{F}\cdot \mathbf{\widehat{n}} ~dS .\end{align*}

### Area

Given a region $$R$$ with boundary curve $${\partial}R$$, \begin{align*} A(R) = \oint _ { {\partial}R } x ~d y = - \oint _ { {\partial}R } y ~d x = \frac { 1 } { 2 } \oint _ { {\partial}R } - y ~d x + x ~d y . \end{align*}

These all compute the area by Green's theorem: \begin{align*} \oint_{{\partial}R} x ~dy = - \oint_{{\partial}R} y ~dx \\ = \frac{1}{2} \oint_{{\partial}R} -y~dx + x~dy = \frac{1}{2} \iint_R 1 - (-1) ~dA =\iint_R 1 ~dA \end{align*}

### Surface Integrals

• For a parametrization $$\mathbf r(s,t): U \to S$$ of a surface $$S$$ and any function $$f: {\mathbb{R}}^n \to {\mathbb{R}}$$, \begin{align*} \iint _ { S } f ~dA = \iint _ { U } ( f \circ \mathbf r) ( s , t )~{\left\lVert {\mathbf n} \right\rVert} ~ds~dt \end{align*}
• Can obtain a normal vector $$\mathbf n = \mathbf r _ { s } \times \mathbf r _ { t }$$

## Other Results

$$\nabla \cdot \mathbf{F} = 0 \not \implies \exists \mathbf{G}:~ \mathbf{F} = \nabla\times \mathbf{G}$$. A counterexample:

\begin{align*} \mathbf{F}(x,y,z) =\frac{1}{(x^2+y^2+z^2)^{3/2}}[x, y, z]~,\quad S = S^2 \subset {\mathbb{R}}^3 \\ \implies \nabla \cdot \mathbf{F} = 0 \text{ away from } \mathbf{0}, \text{ but } \iint_{S^2}\mathbf{F}\cdot d\mathbf{S} = 4\pi \neq 0 \end{align*} Where by Stokes' theorem, \begin{align*} \mathbf{F} = \nabla\times\mathbf{G}\implies \iint_{S^2} \mathbf{F}\cdot d\mathbf{S} &= \iint_{S^2} (\nabla\times\mathbf{G})\cdot d\mathbf{S} \\ \\ &= \oint_{{\partial}S^2}\mathbf{G}\cdot d\mathbf{r} && \text{by Stokes}\\ &= 0 \end{align*} since $${\partial}S^2 = \emptyset$$.

Sufficient condition: if $$\mathbf{F}$$ is everywhere $$C^1$$, \begin{align*} \exists \mathbf{G}:~ \mathbf{F} = \nabla \times\mathbf{G} \iff \iint_S \mathbf{F}\cdot d\mathbf{S} = 0 \text{ for all closed surfaces }S .\end{align*}

# Linear Algebra

The underlying field will be assumed to be $${\mathbb{R}}$$ for this section.

## Notation

\begin{align*} \operatorname{Mat}(m, n) && \text{the space of all } m\times n\text{ matrices} \\ T && \text{a linear map } {\mathbb{R}}^n \to {\mathbb{R}}^m \\ A\in \operatorname{Mat}(m, n)&& \text{an } m\times n \text{ matrix representing }T \\ A^t\in \operatorname{Mat}(n, m) && \text{an } n\times m \text{ transposed matrix} \\ \mathbf{a} && \text{an } n\times 1 \text{ column vector} \\ \mathbf{a}^t && \text{a } 1\times n \text{ row vector} \\ A = {\left[ {\mathbf{a}_1, \cdots, \mathbf{a}_n} \right]} && \text{a matrix formed with } \mathbf{a}_i \text{ as the columns} \\ V, W && \text{vector spaces} \\ |V|, \dim(W) && \text{dimensions of vector spaces} \\ \det(A) && \text{the determinant of }A \\ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& \mathbf{b} \end{bmatrix} \mathrel{\vcenter{:}}={\left[ {\mathbf{a}_1, \mathbf{a}_2, \cdots \mathbf{a}_n, \mathbf{b}} \right]} && \text{augmented matrices} \\ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& B \end{bmatrix} \mathrel{\vcenter{:}}={\left[ {\mathbf{a}_1, \cdots \mathbf{a}_n, \mathbf{b}_1, \cdots, \mathbf{b}_m} \right]} && \text{block matrices}\\ \operatorname{Spec}(A) && \text{the multiset of eigenvalues of } A \\ A\mathbf{x} = \mathbf{b} && \text{a system of linear equations} \\ r\mathrel{\vcenter{:}}=\operatorname{rank}(A) && \text{the rank of }A\\ r_b = \operatorname{rank}\qty{ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& \mathbf{b} \\ \end{bmatrix} } && \text{the rank of }A\text{ augmented by }\mathbf{b} .\end{align*}

## Big Theorems

\begin{align*} {\left\lvert {\ker(A)} \right\rvert} + {\left\lvert {\operatorname{im}(A)} \right\rvert} = {\left\lvert {\operatorname{dom}(A)} \right\rvert} ,\end{align*} where $$\operatorname{nullity}(A) = {\left\lvert {\ker(A)} \right\rvert}, \operatorname{rank}(A) = {\left\lvert {\operatorname{im}(A)} \right\rvert},$$ and $${\left\lvert {\operatorname{dom}(A)} \right\rvert} = n$$ is the number of columns in the corresponding matrix.

Generalization: the following sequence is always exact: \begin{align*} 0 \to \ker(A) \hookrightarrow \operatorname{dom}(A) \xrightarrow{A} \operatorname{im}(A) \to 0 .\end{align*} Moreover, it always splits, so $$\operatorname{dom}A \cong \ker A \oplus \operatorname{im}A$$ and thus $${\left\lvert {\operatorname{dom}(A)} \right\rvert} = {\left\lvert {\ker(A)} \right\rvert} + {\left\lvert {\operatorname{im}(A)} \right\rvert}$$.

We also have \begin{align*} \dim(\operatorname{rowspace}(A)) = \dim(\operatorname{colspace}(A)) = \operatorname{rank}(A) .\end{align*}
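Numerically, the rank and nullity can be read off via `numpy` (the matrix below is a made-up example):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],    # = 2 × row 1, so the rows are dependent
              [0., 1., 1.]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank     # rank-nullity: dim ker A = n - rank A

print(rank, nullity)   # 2 1
```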

## Big List of Equivalent Properties

Let $$A$$ be an $$n\times n$$ matrix. TFAE:

• $$A$$ is invertible and has a unique inverse $$A^{-1}$$
• $$A^T$$ is invertible
• $$\det(A) \neq 0$$
• The linear system $$A\mathbf{x} = \mathbf{b}$$ has a unique solution for every $$\mathbf{b} \in {\mathbb{R}}^n$$
• The homogeneous system $$A\mathbf{x} = 0$$ has only the trivial solution $$\mathbf{x} = 0$$
• $$\operatorname{rank}(A) = n$$, i.e. $$A$$ is full rank
• $$\mathrm{nullity}(A) \mathrel{\vcenter{:}}=\dim\mathrm{nullspace}(A) = 0$$
• $$A = \prod_{i=1}^k E_i$$ for some finite $$k$$, where each $$E_i$$ is an elementary matrix
• $$A$$ is row-equivalent to the identity matrix $$I_n$$
• $$A$$ has exactly $$n$$ pivots
• The columns of $$A$$ are a basis for $${\mathbb{R}}^n$$, i.e. $$\operatorname{colspace}(A) = {\mathbb{R}}^n$$
• The rows of $$A$$ are a basis for $${\mathbb{R}}^n$$, i.e. $$\mathrm{rowspace}(A) = {\mathbb{R}}^n$$
• $$\left(\operatorname{colspace}(A)\right)^\perp= \left(\mathrm{rowspace}(A)\right)^\perp= \left\{{\mathbf{0}}\right\}$$
• Zero is not an eigenvalue of $$A$$
• The rows of $$A$$ are linearly independent

Similarly, by taking negations, TFAE:

• $$A$$ is not invertible
• $$A$$ is singular
• $$A^T$$ is not invertible
• $$\det A = 0$$
• The linear system $$A \mathbf{x} = \mathbf{b}$$ has either no solution or infinitely many solutions.
• The homogeneous system $$A \mathbf{x} = \mathbf{0}$$ has nontrivial solutions
• $$\operatorname{rank}A < n$$
• $$\dim \mathrm{nullspace}~ A > 0$$
• At least one row of $$A$$ is a linear combination of the others
• The $$RREF$$ of $$A$$ has a row of all zeros.

Reformulated in terms of the linear map $$T: {\mathbb{R}}^n \to {\mathbb{R}}^n$$ represented by $$A$$, TFAE:

• $$T^{-1}: {\mathbb{R}}^n \to {\mathbb{R}}^n$$ exists
• $$\operatorname{im}(T) = {\mathbb{R}}^n$$
• $$\ker(T) = 0$$
• $$T$$ is injective
• $$T$$ is surjective
• $$T$$ is an isomorphism
• The system $$A\mathbf{x} = 0$$ has only the trivial solution

## Vector Spaces

### Linear Transformations

It is common to want to know the range and kernel of a specific linear transformation $$T$$. $$T$$ can be given in many ways, but a general strategy for deducing these properties involves:

• Express an arbitrary vector in $$V$$ as a linear combination of its basis vectors, and set it equal to an arbitrary vector in $$W$$.

• Use the linear properties of $$T$$ to make a substitution from known transformations

• Find a restriction or relation given by the constants of the initial linear combination.

Useful fact: if $$V\leq W$$ is a subspace and $$\dim(V) \geq \dim(W)$$, then $$V=W$$.

If $$V\subseteq W$$, then $$V$$ is a subspace of $$W$$ if the following hold:

\begin{align*} (1) && \mathbf{0}\in V \\ (2) && \mathbf{a}, \mathbf{b}\in V\implies t\mathbf{a} + \mathbf{b}\in V .\end{align*}

### Linear Independence

Any set of two vectors $$\left\{{\mathbf{v}, \mathbf{w}}\right\}$$ is linearly dependent $$\iff \exists \lambda :~\mathbf{v} = \lambda \mathbf{w}$$, i.e. one is a scalar multiple of the other.

### Bases

A set $$S$$ forms a basis for a vector space $$V$$ iff

1. $$S$$ is a set of linearly independent vectors, so $$\sum \alpha_i \vec{s_i} = 0 \implies \alpha_i = 0$$ for all $$i$$.
2. $$S$$ spans $$V$$, so $$\vec{v} \in V$$ implies there exist $$\alpha_i$$ such that $$\sum \alpha_i \vec{s_i} = \vec{v}$$

In this case, we define the dimension of $$V$$ to be $${\left\lvert {S} \right\rvert}$$.

### The Inner Product

The point of this section is to show how an inner product can induce a notion of “angle”, which agrees with our intuition in Euclidean spaces such as $${\mathbb{R}}^n$$, but can be extended to much less intuitive things, like spaces of functions.

The Euclidean inner product is defined as \begin{align*} {\left\langle {\mathbf{a}},~{\mathbf{b}} \right\rangle} = \sum_{i=1}^n a_i b_i = a_1b_1 + a_2b_2 + \cdots + a_nb_n .\end{align*}

Also sometimes written as $$\mathbf{a}^T\mathbf{b}$$ or $$\mathbf{a} \cdot \mathbf{b}$$.

Yields a norm \begin{align*} {\left\lVert {\mathbf{x}} \right\rVert} \mathrel{\vcenter{:}}=\sqrt{{\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle}} \end{align*}

which has a useful alternative formulation \begin{align*} {\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle} = {\left\lVert {\mathbf{x}} \right\rVert}^2 .\end{align*}

This leads to a notion of angle: \begin{align*} {\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} = {\left\lVert {\mathbf{x}} \right\rVert} {\left\lVert {\mathbf{y}} \right\rVert} \cos\theta_{x,y} \implies \cos \theta_{x,y} \mathrel{\vcenter{:}}=\frac{{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle}}{{\left\lVert {\mathbf{x}} \right\rVert} {\left\lVert {\mathbf{y}} \right\rVert}} = {\left\langle {\widehat{\mathbf{x}}},~{\widehat{\mathbf{y}}} \right\rangle} \end{align*} where $$\theta_{x,y}$$ denotes the angle between the vectors $$\mathbf{x}$$ and $$\mathbf{y}$$.

Since $$\cos \theta=0$$ exactly when $$\theta = \pm \frac \pi 2$$, we can declare two vectors to be orthogonal exactly in this case: \begin{align*} \mathbf{x} \in \mathbf{y}^\perp\iff {\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} = 0 .\end{align*}

Note that this makes the zero vector orthogonal to everything.

Given a subspace $$S \subseteq V$$, we define its orthogonal complement \begin{align*} S^\perp= \left\{{\mathbf{v}\in V {~\mathrel{\Big|}~}\forall \mathbf{s}\in S,~ {\left\langle {\mathbf{v}},~{\mathbf{s}} \right\rangle} = 0}\right\}. \end{align*}

Any choice of subspace $$S\subseteq V$$ yields a decomposition $$V = S \oplus S^\perp$$.

A useful formula is \begin{align*} {\left\lVert {\mathbf{x} + \mathbf{y}} \right\rVert}^2 = {\left\lVert {\mathbf{x}} \right\rVert}^2 + 2{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} + {\left\lVert {\mathbf{y}} \right\rVert}^2 .\end{align*}

When $$\mathbf{x}\in \mathbf{y}^\perp$$, this reduces to \begin{align*} {\left\lVert {\mathbf{x} + \mathbf{y}} \right\rVert}^2 = {\left\lVert {\mathbf{x}} \right\rVert}^2 + {\left\lVert {\mathbf{y}} \right\rVert}^2 .\end{align*}

The inner product satisfies the following properties:

1. Bilinearity: \begin{align*} {\left\langle {\sum_j \alpha_j \mathbf{a}_j},~{\sum_k \beta_k \mathbf{b}_k} \right\rangle} = \sum_j \sum_k \alpha_j \beta_k {\left\langle {\mathbf{a}_j},~{\mathbf{b}_k} \right\rangle}. \end{align*}

2. Symmetry: \begin{align*} {\left\langle {\mathbf{a}},~{\mathbf{b}} \right\rangle} = {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle} \end{align*}

3. Positivity: \begin{align*} \mathbf{a} \neq \mathbf{0} \implies {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} > 0 \end{align*}

4. Nondegeneracy: \begin{align*} \mathbf{a} = \mathbf{0} \iff {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} = 0 \end{align*}

### Gram-Schmidt Process

Transforming a basis $$\left\{{\mathbf{x}_i}\right\}$$ into an orthonormal basis $$\left\{{\mathbf{u}_i}\right\}$$:

\begin{align*} \mathbf{u}_1 &= N(\mathbf{x_1}) \\ \mathbf{u}_2 &= N(\mathbf{x}_2 - {\left\langle {\mathbf{x}_2},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1)\\ \mathbf{u}_3 &= N(\mathbf{x}_3 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_2} \right\rangle}\mathbf{u}_2 ) \\ \vdots & \qquad \vdots \\ \mathbf{u}_k &= N(\mathbf{x}_k - \sum_{i=1}^{k-1} {\left\langle {\mathbf{x}_k},~{\mathbf{u}_i} \right\rangle}\mathbf{u}_i) \end{align*}

where $$N$$ denotes normalizing the result.

#### In more detail

The general setup here is that we are given a basis $$\left\{{\mathbf{x}_i}\right\}_{i=1}^n$$ (not necessarily orthogonal) and we want to produce an orthonormal basis from it.

Why would we want such a thing? Recall that we often wanted to change from the standard basis $$\mathcal{E}$$ to some different basis $$\mathcal{B} = \left\{{\mathbf{b}_1, \mathbf{b}_2, \cdots}\right\}$$. We could form the change of basis matrix $$B = [\mathbf{b}_1, \mathbf{b}_2, \cdots]$$, which acts on vectors in the $$\mathcal{B}$$ basis according to \begin{align*} B[\mathbf{x}]_\mathcal{B} = [\mathbf{x}]_{\mathcal{E}} .\end{align*}

But to change from $$\mathcal{E}$$ to $$\mathcal{B}$$ requires computing $$B^{-1}$$, which acts on vectors in the standard basis according to \begin{align*} B^{-1}[\mathbf{x}]_\mathcal{E} = [\mathbf{x}]_{\mathcal{B}} .\end{align*}

If, on the other hand, the $$\mathbf{b}_i$$ are orthonormal, then $$B^{-1} = B^T$$, which is much easier to compute. We also obtain a rather simple formula for the coordinates of $$\mathbf{x}$$ with respect to $$\mathcal B$$. This follows because we can write \begin{align*} \mathbf{x} = \sum_{i=1}^n {\left\langle {\mathbf{x}},~{\mathbf{b}_i} \right\rangle} \mathbf{b}_i \mathrel{\vcenter{:}}=\sum_{i=1}^n c_i \mathbf{b}_i .\end{align*}

and we find that \begin{align*} [\mathbf{x}]_\mathcal{B} = \mathbf{c} \mathrel{\vcenter{:}}=[c_1, c_2, \cdots, c_n]^T .\end{align*}

This also allows us to simplify projection matrices. Supposing that $$Q$$ has orthonormal columns and letting $$S$$ be the column space of $$Q$$, recall that the projection onto $$S$$ is defined by \begin{align*} P_S = Q(Q^TQ)^{-1}Q^T .\end{align*}

Since $$Q$$ has orthonormal columns and thus satisfies $$Q^TQ = I$$, this simplifies to \begin{align*} P_S = QQ^T .\end{align*}

#### The Algorithm

Given a basis $$\left\{{\mathbf{x}_i}\right\}$$ (not necessarily orthogonal), we form an orthonormal basis $$\left\{{\mathbf{u}_i}\right\}$$ iteratively as follows.

First define \begin{align*} N: {\mathbb{R}}^n &\to S^{n-1} \\ \mathbf{x} &\mapsto \widehat{\mathbf{x}} \mathrel{\vcenter{:}}=\frac {\mathbf{x}} {{\left\lVert {\mathbf{x}} \right\rVert}} \end{align*}

which projects a vector onto the unit sphere in $${\mathbb{R}}^n$$ by normalizing. Then,

\begin{align*} \mathbf{u}_1 &= N(\mathbf{x_1}) \\ \mathbf{u}_2 &= N(\mathbf{x}_2 - {\left\langle {\mathbf{x}_2},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1)\\ \mathbf{u}_3 &= N(\mathbf{x}_3 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_2} \right\rangle}\mathbf{u}_2 ) \\ \vdots & \qquad \vdots \\ \mathbf{u}_k &= N(\mathbf{x}_k - \sum_{i=1}^{k-1} {\left\langle {\mathbf{x}_k},~{\mathbf{u}_i} \right\rangle}\mathbf{u}_i) \end{align*}

In words, at each stage, we take one of the original vectors $$\mathbf{x}_i$$, then subtract off its projections onto all of the $$\mathbf{u}_i$$ we’ve created up until that point. This leaves us with only the component of $$\mathbf{x}_i$$ that is orthogonal to the span of the previous $$\mathbf{u}_i$$ we already have, and we then normalize each $$\mathbf{u}_i$$ we obtain this way.
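The iteration above translates directly into code; a minimal sketch of my own, using the column-vector convention:

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X, assumed linearly independent."""
    U = []
    for xcol in X.T:
        v = xcol.astype(float)
        for u in U:
            v -= (v @ u) * u             # subtract the projection ⟨x, u⟩u
        U.append(v / np.linalg.norm(v))  # normalize: N(v)
    return np.column_stack(U)

X = np.array([[1., 1.],
              [0., 1.]])
Q = gram_schmidt(X)
print(Q.T @ Q)   # ≈ identity, so the columns are orthonormal
```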

Alternative Explanation:

Given a basis \begin{align*} S = \left\{\mathbf{v_1, v_2, \cdots v_n}\right\}, \end{align*}

the Gram-Schmidt process produces a corresponding orthogonal basis \begin{align*} S' = \left\{\mathbf{u_1, u_2, \cdots u_n}\right\} \end{align*} that spans the same vector space as $$S$$.

$$S'$$ is found using the following pattern: \begin{align*} \mathbf{u_1} &= \mathbf{v_1} \\ \mathbf{u_2} &= \mathbf{v_2} - \text{proj}_{\mathbf{u_1}} \mathbf{v_2}\\ \mathbf{u_3} &= \mathbf{v_3} - \text{proj}_{\mathbf{u_1}} \mathbf{v_3} - \text{proj}_{\mathbf{u_2}} \mathbf{v_3}\\ \end{align*}

where \begin{align*} \text{proj}_{\mathbf{u}} \mathbf{v} = (\text{scal}_{\mathbf{u}} \mathbf{v})\frac{\mathbf{u}}{\mathbf{{\left\lVert {u} \right\rVert}}} = \frac{\langle \mathbf{v,u} \rangle}{{\left\lVert {\mathbf{u}} \right\rVert}}\frac{\mathbf{u}}{\mathbf{{\left\lVert {u} \right\rVert}}} = \frac{{\left\langle {\mathbf{v}},~{\mathbf{u}} \right\rangle}}{{\left\lVert {\mathbf{u}} \right\rVert}^2}\mathbf{u} \end{align*} is the orthogonal projection of $$\mathbf{v}$$ onto $$\mathbf{u}$$.

The orthogonal set $$S'$$ can then be transformed into an orthonormal set $$S''$$ by simply dividing the vectors $$s\in S'$$ by their magnitudes. The usual definition of a vector’s magnitude is \begin{align*} {\left\lVert {\mathbf{a}} \right\rVert} = \sqrt{{\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle}} \text{ and } {\left\lVert {\mathbf{a}} \right\rVert}^2 = {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} \end{align*}

As a final check, all vectors in $$S'$$ should be orthogonal to each other, such that \begin{align*} {\left\langle {\mathbf{v_i}},~{\mathbf{v_j}} \right\rangle} = 0 \text{ when } i \neq j \end{align*}

and all vectors in $$S''$$ should be orthonormal, such that \begin{align*} {\left\langle {\mathbf{v_i}},~{\mathbf{v_j}} \right\rangle} = \delta_{ij} \end{align*}

### The Fundamental Subspaces Theorem

Given a matrix $$A \in \mathrm{Mat}(m, n)$$, and noting that \begin{align*} A &: {\mathbb{R}}^n \to {\mathbb{R}}^m,\\ A^T &: {\mathbb{R}}^m \to {\mathbb{R}}^n \end{align*}

We have the following decompositions: \begin{align*} &{\mathbb{R}}^n &\cong \ker A &\oplus \operatorname{im}A^T &\cong \mathrm{nullspace}(A) &\oplus~ \mathrm{colspace}(A^T) \\ &{\mathbb{R}}^m &\cong \operatorname{im}A &\oplus \ker A^T &\cong \mathrm{colspace}(A) &\oplus~ \mathrm{nullspace}(A^T) \end{align*}

## Matrices

An $$m\times n$$ matrix is a map from $$n$$-dimensional space to $$m$$-dimensional space. The number of rows tells you the dimension of the codomain, the number of columns tells you the dimension of the domain.

The space of matrices is not an integral domain! Counterexample: if $$A$$ is singular and nonzero, there is some nonzero $$\mathbf{v}$$ such that $$A \mathbf{v} = \mathbf{0}$$. Then setting $$B = {\left[ {\mathbf{v}, \mathbf{v}, \cdots} \right]}$$ yields $$AB = 0$$ with $$A\neq 0, B\neq 0$$.

The rank of a matrix $$A$$ representing a linear transformation $$T$$ is $$\dim \operatorname{colspace}(A)$$, or equivalently $$\dim \operatorname{im}T$$.

$$\operatorname{rank}(A)$$ is equal to the number of nonzero rows in $$\operatorname{RREF}(A)$$.

\begin{align*} \mathrm{Trace}(A) = \sum_{i=1}^m A_{ii} \end{align*}

The following are elementary row operations on a matrix:

• Permute rows
• Multiply a row by a nonzero scalar
• Add a scalar multiple of one row to another

If $$A = {\left[ {\mathbf{a}_1, \mathbf{a}_2, \cdots} \right]} \in \mathrm{Mat}(m, n)$$ and $$B = {\left[ {\mathbf{b}_1, \mathbf{b}_2, \cdots} \right]} \in\mathrm{Mat}(n, p)$$, then \begin{align*} C \mathrel{\vcenter{:}}= AB \implies c_{ij} = \sum_{k=1}^n a_{ik}b_{kj} \end{align*} where $$1\leq i \leq m$$ and $$1\leq j \leq p$$. In words, each entry $$c_{ij}$$ is obtained by dotting row $$i$$ of $$A$$ against column $$j$$ of $$B$$.
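The entry formula, written out naively (with `numpy` used only to check against its built-in product; the matrices are made-up examples):

```python
import numpy as np

def matmul(A, B):
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must agree"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            # c_ij = sum_k a_ik b_kj: row i of A dotted with column j of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3×2
B = np.array([[1., 0., 2.], [0., 1., 3.]])     # 2×3
print(np.allclose(matmul(A, B), A @ B))        # True
```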

### Systems of Linear Equations

A system of linear equations is consistent when it has at least one solution. The system is inconsistent when it has no solutions.


Homogeneous systems are always consistent, i.e. there is always at least one solution.

• Tall matrices: more equations than unknowns, overdetermined
• Wide matrices: more unknowns than equations, underdetermined

There are three possibilities for a system of linear equations:

1. No solutions (inconsistent)
2. One unique solution (consistent, square or tall matrices)
3. Infinitely many solutions (consistent, underdetermined, square or wide matrices)

These possibilities can be checked by comparing $$r\mathrel{\vcenter{:}}=\operatorname{rank}(A)$$ and $$r_b$$:

• $$r < r_b$$: case 1, no solutions.
• $$r = r_b$$: case 2 or 3, at least one solution.
• $$r = r_b = n$$: case 2, a unique solution.
• $$r = r_b < n:$$ case 3, infinitely many solutions.
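This rank comparison can be done mechanically (the systems below are made-up examples):

```python
import numpy as np

def rank_case(A, b):
    r  = np.linalg.matrix_rank(A)
    rb = np.linalg.matrix_rank(np.column_stack([A, b]))  # rank of [A | b]
    n  = A.shape[1]
    if r < rb:
        return "no solutions"
    return "unique solution" if r == n else "infinitely many solutions"

A = np.array([[1., 1.], [1., 1.]])
print(rank_case(A, np.array([1., 2.])))           # no solutions
print(rank_case(A, np.array([1., 1.])))           # infinitely many solutions
print(rank_case(np.eye(2), np.array([1., 2.])))   # unique solution
```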

### Determinants

\begin{align*} \det{(A \mod p}) \mod p \equiv (\det{A}) \mod p \end{align*}

For $$2\times 2$$ matrices, \begin{align*} A^{-1} = \left( \begin{array}{cc} a & b \\ c & d \end{array}\right)^{-1} = \frac{1}{\det{A}}\left( \begin{array}{cc} d & -b \\ -c & a \end{array}\right) \end{align*} In words, swap the main diagonal entries, and flip the signs on the off-diagonal.

For each $$m$$ there is a function \begin{align*} \det: \operatorname{Mat}(m, m) &\to {\mathbb{R}}\\ A &\mapsto \det(A) \end{align*} satisfying the following properties:

• $$\det$$ is multiplicative, and restricts to a group homomorphism $$\operatorname{GL}_m({\mathbb{R}}) \to ({\mathbb{R}}^{\times}, \cdot)$$: \begin{align*} \det(AB) = \det(A) \det(B) \end{align*}

• Some corollaries: \begin{align*} \det(A^k) &= (\det A)^k \\ \det(A^{-1}) &= (\det A)^{-1} \\ \det(A^t) &= \det(A) .\end{align*}
• Invariance under adding scalar multiples of any row to another: \begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} = \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i + t\mathbf{a_j} & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} \end{align*}

• Sign change under row permutation: \begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} = (-1) \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \end{bmatrix} \end{align*}

• More generally, for a permutation $$\sigma\in S_n$$ applied to the rows, \begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} = \operatorname{sgn}(\sigma) \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_{\sigma(i)} & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_{\sigma(j)} & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \end{bmatrix} \end{align*}
• Multilinearity in rows: \begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& t \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} &= t \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} \\ \det \begin{bmatrix} \rule[.5ex]{2.5ex}{0.5pt}& t \mathbf{a}_1 & \rule[.5ex]{2.5ex}{0.5pt}\\ \rule[.5ex]{2.5ex}{0.5pt}& t \mathbf{a}_2 & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& t \mathbf{a}_m & \rule[.5ex]{2.5ex}{0.5pt}\\ \end{bmatrix} &= t^m \det \begin{bmatrix} \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_1 & \rule[.5ex]{2.5ex}{0.5pt}\\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_2 & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_m & \rule[.5ex]{2.5ex}{0.5pt}\\ \end{bmatrix} \\ \det \begin{bmatrix} \rule[.5ex]{2.5ex}{0.5pt}& t_1 \mathbf{a}_1 & \rule[.5ex]{2.5ex}{0.5pt}\\ \rule[.5ex]{2.5ex}{0.5pt}& t_2 \mathbf{a}_2 & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& t_m \mathbf{a}_m & \rule[.5ex]{2.5ex}{0.5pt}\\ \end{bmatrix} &= \prod_{i=1}^m t_i \det \begin{bmatrix} \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_1 & \rule[.5ex]{2.5ex}{0.5pt}\\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_2 & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_m & \rule[.5ex]{2.5ex}{0.5pt}\\ \end{bmatrix} .\end{align*}

• Linearity in each row: \begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i + \textcolor{red}{\textbf{a}}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} =\det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} + \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{red}{\textbf{a}}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} .\end{align*}

• $$\det(A)$$ is the signed volume of the parallelepiped spanned by the columns of $$A$$.

• If any row of $$A$$ is all zeros, $$\det(A) = 0$$.

TFAE:

• $$\det(A) = 0$$
• $$A$$ is singular.

### Computing Determinants

Useful shortcuts:

• If $$A$$ is upper or lower triangular, $$\det(A) = \prod_i a_{ii}$$.

The minor $$M_{ij}$$ of $$A\in \operatorname{Mat}(n, n)$$ is the determinant of the $$(n-1) \times (n-1)$$ matrix obtained by deleting the $$i$$th row and $$j$$th column from $$A$$.

The cofactor $$C_{ij}$$ is the scalar defined by \begin{align*} C_{ij} \mathrel{\vcenter{:}}=(-1)^{i+j} M_{ij} .\end{align*}

For any fixed $$i$$, there is a formula \begin{align*} \det(A) = \sum_{j=1}^n a_{ij} C_{ij} .\end{align*}

Let \begin{align*} A = \left[\begin{array}{lll} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array}\right] .\end{align*}

Then \begin{align*} \det A = 1 \cdot\left|\begin{array}{ll} 5 & 6 \\ 8 & 9 \end{array}\right|-2 \cdot\left|\begin{array}{ll} 4 & 6 \\ 7 & 9 \end{array}\right|+3 \cdot\left|\begin{array}{ll} 4 & 5 \\ 7 & 8 \end{array}\right| = 1 \cdot(-3)-2 \cdot(-6)+3 \cdot(-3) = 0 .\end{align*}
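The expansion above can be checked mechanically. A minimal pure-Python sketch (the helper names `det2`, `minor`, `det3` are illustrative, not standard):

```python
# Cofactor (Laplace) expansion along the first row of a 3x3 matrix,
# matching the formula det(A) = sum_j a_{1j} C_{1j}.

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def minor(A, i, j):
    """The submatrix of A with row i and column j deleted."""
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det3(A):
    """Expand along the first row: sum over j of (-1)^j a_{0j} M_{0j}."""
    return sum((-1) ** j * A[0][j] * det2(minor(A, 0, j)) for j in range(3))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(det3(A))  # prints 0, agreeing with the worked example
```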

$$\det(A)$$ can be computed by reducing $$A$$ to $$\operatorname{RREF}(A)$$ (which is upper triangular) and keeping track of the following effects:

• $$R_i \mapsfrom R_i \pm t R_j$$: no effect.
• $$R_i \rightleftharpoons R_j$$: multiply by $$(-1)$$.
• $$R_i \mapsfrom tR_i$$: multiply by $$t$$.
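The three row-operation rules above translate directly into an elimination-based determinant routine. A minimal sketch (the function name `det_by_elimination` is illustrative); it only uses row additions and swaps, so the determinant is the signed product of the diagonal:

```python
# Determinant by Gaussian elimination, tracking the row-operation rules:
# R_i <- R_i - t R_j leaves det unchanged, a row swap flips the sign,
# and the determinant of the resulting triangular matrix is the
# product of its diagonal entries.

def det_by_elimination(A):
    A = [row[:] for row in A]  # work on a copy
    n = len(A)
    sign = 1
    for col in range(n):
        # find a usable pivot; a swap multiplies the determinant by -1
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return 0  # no pivot in this column => singular
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign
        # eliminate below the pivot; no effect on the determinant
        for r in range(col + 1, n):
            t = A[r][col] / A[col][col]
            A[r] = [a - t * b for a, b in zip(A[r], A[col])]
    prod = sign
    for i in range(n):
        prod *= A[i][i]
    return prod

print(det_by_elimination([[2, 1], [4, 5]]))  # 2*5 - 1*4 = 6
```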

### Inverting a Matrix

Given a linear system $$A\mathbf{x} = \mathbf{b}$$, writing $$\mathbf{x} = {\left[ {x_1, \cdots, x_n} \right]}$$, there is a formula (Cramer's rule) \begin{align*} x_i = \frac{\det(B_i)}{\det(A)} \end{align*} where $$B_i$$ is $$A$$ with its $$i$$th column replaced by $$\mathbf{b}$$.

Under the equivalence relation of elementary row operations, there is an equivalence of augmented matrices: \begin{align*} \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& I \end{bmatrix} \sim \begin{bmatrix} I &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& A^{-1} \end{bmatrix} \end{align*} where $$I$$ is the $$n\times n$$ identity matrix.
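The augment-and-reduce procedure can be sketched directly. A minimal pure-Python version (the helper name `invert` is illustrative; no pivot-size safeguards, and it assumes $$A$$ is invertible):

```python
# Inverting a matrix by row-reducing the augmented matrix [A | I] to
# [I | A^{-1}], as in the equivalence above.

def invert(A):
    n = len(A)
    # augment A with the n x n identity matrix
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        # swap in a row with a nonzero pivot if needed
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # scale the pivot row so the pivot entry is 1
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # clear every other entry in this column
        for r in range(n):
            if r != col:
                t = M[r][col]
                M[r] = [a - t * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]  # the right half is A^{-1}

print(invert([[1, 2], [3, 4]]))  # [[-2.0, 1.0], [1.5, -0.5]]
```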

\begin{align*} A^{-1} = {1\over \det(A)} {\left[ {C_{ij}} \right]}^t .\end{align*} where $$C_{ij}$$ is the cofactor at position $$i,j$$.

\begin{align*} \left(\begin{array}{cc} a& b \\ c& d \end{array}\right)^{-1} = {1 \over a d - b c} \left(\begin{array}{rr} d & -b \\ -c & a \end{array}\right) \quad \text{ where } ad-bc \ne 0 \end{align*}

What’s the pattern?

1. Always divide by determinant
2. Swap the diagonals

\begin{align*} \begin{bmatrix} + & - \\ - & + \end{bmatrix} \end{align*}

\begin{align*} A^{-1} \mathrel{\vcenter{:}}= \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} ^{-1} = {1 \over {\det A}} \begin{bmatrix} e i - f h & -(b i - c h) & b f - c e \\ -(d i - f g) &a i - c g &-(a f -c d) \\ d h - e g & -(a h - b g)& a e - b d \end{bmatrix} .\end{align*}

The pattern:

1. Divide by determinant
2. Each entry is determinant of submatrix of $$A$$ with corresponding col/row deleted

\begin{align*} \begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix} \end{align*}

3. Transpose at the end!
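The divide-by-determinant / cofactor / transpose recipe can be verified numerically. A minimal pure-Python sketch for the $$3\times 3$$ case (the helper names `det2`, `minor`, `inv3` are illustrative, and the test matrix is an assumed example):

```python
# The adjugate formula A^{-1} = adj(A) / det(A), where adj(A) is the
# transpose of the cofactor matrix C_{ij} = (-1)^{i+j} M_{ij}.

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def minor(A, i, j):
    return [[A[r][c] for c in range(3) if c != j] for r in range(3) if r != i]

def inv3(A):
    cof = [[(-1) ** (i + j) * det2(minor(A, i, j)) for j in range(3)]
           for i in range(3)]
    # cofactor expansion along the first row gives the determinant
    det = sum(A[0][j] * cof[0][j] for j in range(3))
    # transpose the cofactor matrix, then divide by the determinant
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]

A = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
Ainv = inv3(A)
# check that A * Ainv is the identity
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
print(prod)
```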

### Bases for Spaces of a Matrix

Let $$A\in \operatorname{Mat}(m, n)$$ represent a map $$T:{\mathbb{R}}^n\to {\mathbb{R}}^m$$.

The row rank and column rank of a matrix always agree:

\begin{align*} \dim \operatorname{rowspace}(A) = \dim \operatorname{colspace}(A) .\end{align*}

#### The row space

\begin{align*} \operatorname{im}(T)^\vee= \operatorname{rowspace}(A) \subset {\mathbb{R}}^n .\end{align*}

Reduce to RREF, and take nonzero rows of $$\mathrm{RREF}(A)$$.

#### The column space

\begin{align*} \operatorname{im}(T) = \operatorname{colspace}(A) \subseteq {\mathbb{R}}^m \end{align*}

Reduce to RREF, and take columns with pivots from original $$A$$.

Not enough pivots implies the columns don't span the entire codomain.

#### The nullspace

\begin{align*} \ker(T) = \operatorname{nullspace}(A) \subseteq {\mathbb{R}}^n \end{align*}

Reduce to RREF; columns without pivots correspond to free variables. Convert back to equations and pull the free variables out as scalar multipliers.

#### Eigenspaces

For each $$\lambda \in \operatorname{Spec}(A)$$, compute a basis for $$\ker(A - \lambda I)$$.

### Eigenvalues and Eigenvectors

A vector $$\mathbf{v}$$ is said to be an eigenvector of $$A$$ with eigenvalue $$\lambda\in \operatorname{Spec}(A)$$ iff \begin{align*} A\mathbf{v} = \lambda \mathbf{v} \end{align*} For a fixed $$\lambda$$, the corresponding eigenspace $$E_\lambda$$ is the span of all such vectors.

• Similar matrices have identical eigenvalues and multiplicities.

• Eigenvectors corresponding to distinct eigenvalues are always linearly independent.

• $$A$$ has $$n$$ distinct eigenvalues $$\implies A$$ has $$n$$ linearly independent eigenvectors.

• A matrix $$A$$ is diagonalizable $$\iff A$$ has $$n$$ linearly independent eigenvectors.

For $$\lambda\in \operatorname{Spec}(A)$$, \begin{align*} \mathbf{v}\in E_\lambda \iff \mathbf{v} \in \ker(A-I\lambda) .\end{align*}

Some miscellaneous useful facts:

• $$\lambda \in \operatorname{Spec}(A) \implies \lambda^2 \in \operatorname{Spec}(A^2)$$ with the same eigenvector.

• $$\prod \lambda_i = \det A$$

• $$\sum \lambda_i = \mathrm{Tr}~A$$
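The facts above are easy to check numerically. A minimal numpy sketch (the matrix is an assumed example):

```python
# Checking prod(lambda_i) = det(A), sum(lambda_i) = tr(A), and that
# lambda^2 is an eigenvalue of A^2 with the same eigenvector.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)

assert np.isclose(np.prod(eigvals), np.linalg.det(A))
assert np.isclose(np.sum(eigvals), np.trace(A))

# same eigenvector, squared eigenvalue, for A^2
v = eigvecs[:, 0]
lam = eigvals[0]
assert np.allclose(A @ A @ v, lam ** 2 * v)
```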

#### Diagonalizability

An $$n\times n$$ matrix $$P$$ is diagonalizable iff its eigenspaces together span all of $$\mathbb{R}^n$$ (i.e. there are $$n$$ linearly independent eigenvectors, so they span the space.)

Equivalently, $$P$$ is diagonalizable iff there is a basis of $$\mathbb{R}^n$$ consisting of eigenvectors of $$P$$.

### Useful Counterexamples

\begin{align*} A \mathrel{\vcenter{:}}=\left[ \begin{array} { c c } { 1 } & { 1 } \\ { 0 } & { 1 } \end{array} \right] &\implies A^n = \left[ \begin{array} { c c } { 1 } & { n } \\ { 0 } & { 1 } \end{array} \right], && \operatorname{Spec}(A) = [1,1] \\ A \mathrel{\vcenter{:}}=\left[ \begin{array} { c c } { 1 } & { 1 } \\ { 0 } & { - 1 } \end{array} \right] &\implies A^2 = I_2, && \operatorname{Spec}(A) = [1, -1] \end{align*}
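Both counterexamples can be verified with plain integer matrix multiplication; a minimal sketch (the helper `matmul` is illustrative):

```python
# Verifying A^n = [[1, n], [0, 1]] for the shear matrix, and B^2 = I
# for the second matrix in the list above.

def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 1]]
P = [[1, 0], [0, 1]]
for n in range(1, 6):
    P = matmul(P, A)
    assert P == [[1, n], [0, 1]]  # A^n has n in the corner

B = [[1, 1], [0, -1]]
assert matmul(B, B) == [[1, 0], [0, 1]]  # B is an involution
```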

## Example Problems

Determine a basis for \begin{align*} S = \left\{a_0 + a_1 x + a_2 x^2\mathrel{\Big|}a_0,a_1,a_2 \in \mathbb{R} \land a_0 - a_1 -2a_2 =0\right\}. \end{align*}

Let $$a_2=t, a_1=s, a_0=s+2t$$, then \begin{align*} S &= \left\{ (s+2t) + (sx+tx^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \left\{ (s+sx) + (2t+tx^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \left\{ s(1+x) + t(2+x^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \text{span}\left\{(1+x),(2+x^2)\right\} \end{align*}

and a basis for $$S$$ is \begin{align*} \left\{(1+x), (2+x^2)\right\} \end{align*}

: If $$V$$ is an $$n$$-dimensional vector space, then every set $$S$$ with fewer than $$n$$ vectors can be extended to a basis for $$V$$.

False: the set must also be linearly independent. Only linearly independent sets with fewer than $$n$$ vectors can be extended to form a basis for $$V$$.

: The set of all symmetric $$3 \times 3$$ matrices forms a three-dimensional subspace of $$M_{3}(\mathbb{R})$$.

This set forms a 6-dimensional subspace. A basis for this space would require six elements.

Given $$A=$$ \begin{align*}\begin{bmatrix} 1 & 3 \\ -2 & -6 \end{bmatrix}\end{align*} what is the dimension of the null space of $$A$$?

The augmented matrix for the system $$A\mathbf{x} = \mathbf{0}$$ is \begin{align*}\begin{bmatrix}[cc|c] 1 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix}\end{align*} which has one free variable.

Writing one variable in terms of another results in $$x_1 + 3x_2 = 0 \Rightarrow x_1 = -3x_2$$.

Let $$x_2 = r$$ where $$r \in \mathbb{R}$$, then $$S = \left\{ \mathbf{x} \in \mathbb{R}^2 : \mathbf{x} = r(-3,1), r \in \mathbb{R}\right\} = \text{span}\left\{(-3,1)\right\}$$.

So, the set $$B = \left\{(-3,1)\right\}$$ is a basis for the null space of $$A$$, and the null space has dimension 1.

Let $$S$$ be the subspace of $$\mathbb{R}^3$$ that consists of all solutions to the equation $$x-3y+z = 0$$. Determine a basis for $$S$$, and find dim[$$S$$].

The first goal is to find a way to express the set of 3-tuples that satisfy this equation.

Let $$y=r$$ and $$z=s$$, then $$x=3r-s$$. Then vectors $$\mathbf{v}$$ that satisfy the equation are all of the form \begin{align*} \mathbf{v} = (3r-s, r, s) = (3r,r,0)+(-s,0,s) = r(3,1,0) + s(-1,0,1). \end{align*} (Note - the goal here is to separate the dependent variables into different vectors so they can be written as a linear combination of something.)

The set $$S$$ that satisfies this equation is then \begin{align*} S &= \left\{ \mathbf{v} \in \mathbb{R}^3 : \mathbf{v} =r(3,1,0) + s(-1,0,1) \land r,s\in\mathbb{R} \right\} \\ &= \text{span}\left\{ (3,1,0), (-1,0,1)\right\} \end{align*}

All that remains is to check that the vectors in this span are linearly independent. This can be done by showing that \begin{align*} a(3,1,0) + b(-1,0,1) = (0,0,0) \end{align*} implies $$a=b=0$$.

Since the two vectors are linearly independent and span the solution set $$S$$, they form a basis for $$S$$ of dimension 2.

Determine a basis for the subspace of $$M_2(\mathbb{R})$$ spanned by \begin{align*} A_1 = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -1 & 4 \\ 1 & 1 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 5 & -6 \\ -5 & 1 \end{bmatrix}. \end{align*}

Note that because the set contains the zero matrix, it is linearly dependent. So only consider the other three, as they span the same subspace as the original set.

First, determine if $$\left\{ A_1, A_2, A_3\right\}$$ is linearly independent. Start with the equation \begin{align*} c_1A_1 + c_2A_2 + c_3A_3 = 0_2 \end{align*}

which gives \begin{align*} c_1 - c_2 + 5c_3 &= 0 \\ 3c_1 + 4c_2 - 6c_3 &= 0 \\ -c_1 + c_2 - 5c_3 &= 0 \\ 2c_1 + c_2 + c_3 &= 0 \end{align*}

which has the solution $$(-2r,3r,r)$$. So the set is linearly dependent by the relation \begin{align*} -2A_1 + 3A_2 + A_3 = 0 \text{ or }\\ A_3 = 2A_1 - 3A_2 \end{align*}

So $$\left\{A_1, A_2\right\}$$ spans the same subspace as the original set. It is also linearly independent, and therefore forms a basis for the original subspace.

Let $$A, B, C \in M_2 (\mathbb{R})$$. Define $$\langle A,B\rangle = a_{11}b_{11}+2a_{12}b_{12}+3a_{21}b_{21}$$. Does this define an inner product on $$M_2 (\mathbb{R})$$?

Instead, let $$\langle A,B\rangle = a_{11} + b_{22}$$. Does this define an inner product on $$M_2(\mathbb{R})$$?

Let $$p=a_0 + a_1 x + a_2 x^2$$ and $$q=b_0 + b_1 x + b_2 x^2$$. Define $$\langle p,q\rangle = \sum_{i=0}^{2}(i+1)a_i b_i$$. Does this define an inner product on $$P_2$$?

## Changing Basis

The transition matrix from a given basis $$\mathcal{B} = \left\{{\mathbf{b}_i}\right\}_{i=1}^n$$ to the standard basis is given by \begin{align*} A\mathrel{\vcenter{:}}= \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \end{bmatrix} ,\end{align*} and the transition matrix from the standard basis to $$\mathcal{B}$$ is $$A^{-1}$$.
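A minimal numpy sketch of both directions (the basis vectors here are an assumed example):

```python
# The transition matrix A has the basis vectors as its columns: it maps
# B-coordinates to standard coordinates, and A^{-1} maps back.
import numpy as np

b1, b2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
A = np.column_stack([b1, b2])

coords_B = np.array([2.0, 3.0])   # coordinates relative to {b1, b2}
x = A @ coords_B                  # the same vector in standard coordinates
assert np.allclose(x, 2 * b1 + 3 * b2)

# going the other way: apply A^{-1} (here via solve) to recover B-coordinates
recovered = np.linalg.solve(A, x)
assert np.allclose(recovered, coords_B)
```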

## Orthogonal Matrices

Given a notion of orthogonality for vectors, we can extend this to matrices. A square matrix is said to be orthogonal iff $$QQ^T = Q^TQ = I$$. For rectangular matrices, we have the following characterizations: \begin{align*} QQ^T = I \implies &\text{The rows of } Q \text { are orthonormal,} \\ Q^TQ = I \implies &\text{The columns of } Q \text{ are orthonormal.} \end{align*}

To remember which condition is which, just recall that matrix multiplication $$AB$$ takes the inner product between the rows of $$A$$ and the columns of $$B$$. So if, for example, we want to inspect whether or not the columns of $$Q$$ are orthogonal, we should let $$B=Q$$ in the above formulation – then we just note that the rows of $$Q^T$$ are indeed the columns of $$Q$$, so $$Q^TQ$$ computes the inner products between all pairs of the columns of $$Q$$ and stores them in a matrix.

## Projections

An orthogonal projection $$P$$ induces a decomposition \begin{align*} \operatorname{dom}(P) = \ker(P) \oplus \ker(P)^\perp .\end{align*}

Distance from a point $$\mathbf{p}$$ to a line $$\mathbf{a} + t\mathbf{b}$$: let $$\mathbf{w} = \mathbf{p} - \mathbf{a}$$; then the distance is $${\left\lVert {\mathbf{w} - \operatorname{Proj}_{\mathbf{b}}(\mathbf{w})} \right\rVert}$$.

\begin{align*} \operatorname{Proj}_{\operatorname{range}(A)}(\mathbf{x}) = A(A^t A)^{-1} A^t \mathbf{x} .\end{align*}

Mnemonic: \begin{align*} P \approx {A^t A \over AA^t} .\end{align*}

With an inner product in hand and a notion of orthogonality, we can define a notion of orthogonal projection of one vector onto another, and more generally of a vector onto a subspace spanned by multiple vectors.

### Projection Onto a Vector

Say we have two vectors $$\mathbf{x}$$ and $$\mathbf{y}$$, and we want to define “the component of $$\mathbf{x}$$ that lies along $$\mathbf{y}$$”, which we’ll call $$\mathbf{p}$$. We can work out what the formula should be using a simple model:

We notice that whatever $$\mathbf{p}$$ is, it will lie in the direction of $$\mathbf{y}$$, and thus $$\mathbf{p} = \lambda \widehat{\mathbf{y}}$$ for some scalar $$\lambda$$, where in fact $$\lambda = {\left\lVert {\mathbf{p}} \right\rVert}$$ since $${\left\lVert {\widehat{\mathbf{y}}} \right\rVert} = 1$$. We will find that $$\lambda = {\left\langle {\mathbf{x}},~{\widehat{\mathbf{y}}} \right\rangle}$$, and so \begin{align*} \mathbf{p} = {\left\langle {\mathbf{x}},~{\widehat{\mathbf{y}}} \right\rangle}\widehat{\mathbf{y}} = \frac{{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle}}{{\left\langle {\mathbf{y}},~{\mathbf{y}} \right\rangle}} \mathbf{y} .\end{align*}

Notice that we can then form a “residual” vector $$\mathbf{r} = \mathbf{x} - \mathbf{p}$$, which should satisfy $$\mathbf{r} ^\perp\mathbf{p}$$. If we were to let $$\lambda$$ vary as a function of a parameter $$t$$ (making $$\mathbf{r}$$ a function of $$t$$ as well) we would find that this particular choice minimizes $${\left\lVert {\mathbf{r} (t)} \right\rVert}$$.

### Projection Onto a Subspace

In general, supposing one has a subspace $$S = \mathrm{span}\left\{{\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_n}\right\}$$ and (importantly!) the $$\mathbf{y}_i$$ are orthogonal, then the projection $$\mathbf{p}$$ of $$\mathbf{x}$$ onto $$S$$ is given by the sum of the projections onto each basis vector, yielding

\begin{align*} \mathbf{p} = \sum_{i=1}^n \frac{{\left\langle {\mathbf{x}},~{\mathbf{y}_i} \right\rangle}}{{\left\langle {\mathbf{y}_i},~{\mathbf{y}_i} \right\rangle}} \mathbf{y}_i = \sum_{i=1}^n {\left\langle {\mathbf{x}},~{\mathbf{y}_i} \right\rangle} \widehat{\mathbf{y}_i} .\end{align*}

Note: this is part of why having an orthogonal basis is desirable!

Letting $$A = [\mathbf{y}_1, \mathbf{y}_2, \cdots]$$, then the following matrix projects vectors onto $$S$$, expressing them in terms of the basis $$\mathbf{y}_i$$: \begin{align*} \tilde P_A = (A^TA)^{-1}A^T, \end{align*}

while this matrix performs the projection and expresses it in terms of the standard basis: \begin{align*} P_A = A(A^TA)^{-1}A^T. \end{align*}
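A minimal numpy sketch of the projection matrix $$A(A^TA)^{-1}A^T$$ (the matrix and vector here are assumed examples; the columns need not be orthogonal for this formula):

```python
# Projection onto the column space of A, checking that the result is
# idempotent, symmetric, and leaves residuals orthogonal to the columns.
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

# projections satisfy P^2 = P; orthogonal projections are also symmetric
assert np.allclose(P @ P, P)
assert np.allclose(P, P.T)

# the residual x - Px lies in the orthogonal complement of colspace(A)
x = np.array([1.0, 2.0, 4.0])
r = x - P @ x
assert np.allclose(A.T @ r, 0)
```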

Equation of a plane: given a point $$\mathbf{p}_0$$ on a plane and a normal vector $$\mathbf{n}$$, any vector $$\mathbf{x}$$ on the plane satisfies \begin{align*} {\left\langle {\mathbf{x} - \mathbf{p}_0},~{\mathbf{n}} \right\rangle} = 0 \end{align*}

To find the distance between a point $$\mathbf{a}$$ and a plane through $$\mathbf{p}_0$$ with unit normal $$\widehat{\mathbf{n}}$$, we need only project $$\mathbf{a} - \mathbf{p}_0$$ onto the subspace spanned by the normal: \begin{align*} d = {\left\lvert {\left\langle {\mathbf{a} - \mathbf{p}_0},~{\widehat{\mathbf{n}}} \right\rangle} \right\rvert} .\end{align*}

One important property of projections is that for any vector $$\mathbf{v}$$ and for any subspace $$S$$, we have $$\mathbf{v} - P_S(\mathbf{v}) \in S^\perp$$. Moreover, if $$\mathbf{v} \in S^\perp$$, then $$P_S(\mathbf{v})$$ must be zero. This follows by noting that in the projection formula above, every inner product appearing in the sum vanishes, by definition of $$\mathbf{v} \in S^\perp$$, and so the projection is zero.

### Least Squares

$$\mathbf{x}$$ is a least squares solution to $$A\mathbf{x} = \mathbf{b}$$ iff \begin{align*} A^t A \mathbf{x} = A^t \mathbf{b} \end{align*}

The general setup here is that we would like to solve $$A\mathbf{x} = \mathbf{b}$$ for $$\mathbf{x}$$, where $$\mathbf{b}$$ is not in fact in the range of $$A$$. We thus settle for a unique “best” solution $$\tilde{\mathbf{x}}$$ such that the error $${\left\lVert {A\tilde{\mathbf{x}} - \mathbf{b}} \right\rVert}$$ is minimized.

Geometrically, the solution is given by projecting $$\mathbf{b}$$ onto the column space of $$A$$. To see why this is the case, define the residual vector $$\mathbf{r} = A\tilde{\mathbf{x}} - \mathbf{b}$$. We then seek to minimize $${\left\lVert {\mathbf{r}} \right\rVert}$$, which happens exactly when $$\mathbf{r} ^\perp\operatorname{im}A$$. But this happens exactly when $$\mathbf{r} \in (\operatorname{im}A)^\perp$$, which by the fundamental subspaces theorem, is equivalent to $$\mathbf{r} \in \ker A^T$$.

From this, we get the equation \begin{align*} A^T \mathbf{r} = \mathbf{0} \\ \implies A^T(A \tilde{\mathbf{x}} - \mathbf{b}) = \mathbf{0}\\ \implies A^TA\tilde{\mathbf{x}} = A^T \mathbf{b}, \end{align*}

where the last line is known as the normal equations.

If $$A$$ is an $$m\times n$$ matrix and is of full rank, so it has $$n$$ linearly independent columns, then one can show that $$A^T A$$ is nonsingular, and we thus arrive at the least-squares solution \begin{align*} \tilde{\mathbf{x}} = (A^TA)^{-1}A^T \mathbf{b} \hfill\blacksquare \end{align*}
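A minimal numpy sketch comparing the normal-equations solution against numpy's built-in least-squares routine (the data here is an assumed example with full-rank $$A$$):

```python
# Solving A^T A x = A^T b directly, then cross-checking with lstsq.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)

# the residual lies in ker(A^T), i.e. it is orthogonal to colspace(A)
r = A @ x_normal - b
assert np.allclose(A.T @ r, 0)
```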

These equations can also be derived explicitly using Calculus applied to matrices, vectors, and inner products. This requires the use of the following formulas: \begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{a}} \right\rangle} &= \mathbf{a} \\ {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{A}\mathbf{x}} \right\rangle} &= (A+A^T)\mathbf{x} \end{align*}

as well as the adjoint formula \begin{align*} {\left\langle {A\mathbf{x}},~{\mathbf{y}} \right\rangle} = {\left\langle {\mathbf{x}},~{A^T \mathbf{y}} \right\rangle} .\end{align*}

From these, by letting $$A=I$$ we can derive \begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\lVert {\mathbf{x}} \right\rVert}^2 = {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle} = 2\mathbf{x} .\end{align*}

The derivation proceeds by solving the equation \begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\lVert {\mathbf{b} - A\mathbf{x}} \right\rVert}^2 = \mathbf{0} .\end{align*}

## Normal Forms

Every square matrix over $$\mathbb{C}$$ is similar to a matrix in Jordan canonical form.

## Decompositions

### The QR Decomposition

Gram-Schmidt is often used to find an orthonormal basis for, say, the range of some matrix $$A$$. With a small modification to this algorithm, we can write $$A = QR$$ where $$R$$ is upper triangular and $$Q$$ has orthonormal columns.

Why is this useful? One reason is that this also allows for a particularly simple expression of least-squares solutions. If $$A=QR$$, then $$R$$ will be invertible, and a bit of algebraic manipulation will show that \begin{align*} \tilde{\mathbf{x}} = R^{-1}Q^T\mathbf{b} .\end{align*}

How does it work? You simply perform Gram-Schmidt to obtain $$\left\{{\mathbf{u}_i}\right\}$$, then \begin{align*}Q = [\mathbf{u}_1, \mathbf{u}_2, \cdots ].\end{align*}

The matrix $$R$$ can then be written as

\begin{align*} r_{ij} = \begin{cases} {\left\langle {\mathbf{u}_i},~{\mathbf{x}_j} \right\rangle}, & i\leq j, \\ 0, & \text{else}. \end{cases} \end{align*}

Explicitly, this yields the matrix \begin{align*} R = \begin{bmatrix} {\left\langle {\mathbf{u}_1},~{\mathbf{x}_1} \right\rangle} & {\left\langle {\mathbf{u}_1},~{\mathbf{x}_2} \right\rangle} & {\left\langle {\mathbf{u}_1},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ 0 & {\left\langle {\mathbf{u}_2},~{\mathbf{x}_2} \right\rangle} & {\left\langle {\mathbf{u}_2},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ 0 & 0 & {\left\langle {\mathbf{u}_3},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ \vdots & \vdots & \vdots & \ddots \\ \end{bmatrix} \end{align*}
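The construction of $$Q$$ and $$R$$ above can be sketched directly. A minimal numpy implementation of classical Gram-Schmidt (the function name `gram_schmidt_qr` is illustrative; it assumes $$A$$ has linearly independent columns):

```python
# Classical Gram-Schmidt producing Q with orthonormal columns and R
# upper triangular with r_ij = <u_i, x_j> for i <= j, so that A = QR.
import numpy as np

def gram_schmidt_qr(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # <u_i, x_j>
            v -= R[i, j] * Q[:, i]        # subtract the projection onto u_i
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]             # normalize to get u_j
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = gram_schmidt_qr(A)
assert np.allclose(Q.T @ Q, np.eye(2))    # orthonormal columns
assert np.allclose(Q @ R, A)              # A = QR
assert np.allclose(R, np.triu(R))         # R is upper triangular
```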

# Appendix: Lists of things to know

Textbook: Leon, Linear Algebra with Applications

## Topics

• 1.6: Partition Matrices
• 3.5: Change of Basis
• 4.1: Linear Transformations
• 4.2: Matrix Representations
• 4.3: Similarity
• Exam 1
• 5.1: Scalar Product in $${\mathbb{R}}^n$$
• 5.2: Orthogonal Subspaces
• 5.3: Least Squares
• 5.4: Inner Product Spaces
• 5.5: Orthonormal Sets
• 5.6: Gram-Schmidt
• 6.1: Eigenvalues and Eigenvectors
• Exam 2
• 6.2: Systems of Linear Differential Equations
• 6.3: Diagonalization
• 6.7: Positive Definite Matrices
• 6.5: Singular Value Decomposition
• 7.7: The Moore-Penrose Pseudo-Inverse
• Final Exam

## Definitions

• System of equations
• Homogeneous system
• Consistent/inconsistent system
• Matrix
• Matrix (i.e. $$A \mathbf{x} = \mathbf{b}$$)
• Inverse matrix
• Singular matrix
• Determinant
• Trace
• Rank
• Elementary row operation
• Row equivalence
• Pivot
• Row Echelon Form
• Reduced Row Echelon Form
• Gaussian elimination
• Block matrix
• Vector space
• Vector subspace
• Linear transformation
• Span
• Linear independence
• Basis
• Change of basis
• Dimension
• Row space
• Column space
• Image
• Null space
• Kernel
• Direct sum
• Projection
• Orthogonal subspaces
• Orthogonal complement
• Normal equations
• Least-squares solution
• Orthonormal
• Eigenvalue
• Eigenvector
• Characteristic polynomial
• Similarity
• Diagonalizable
• Inner product
• Bilinearity
• Multilinearity
• Defective
• Singular value decomposition
• QR factorization
• Gram-Schmidt process
• Spectral theorem
• Symmetric matrix
• Orthogonal matrix
• Positive-definite

## Lower-division review

• Systems of linear equations
• Consistent vs. Inconsistent
• Possibilities for solutions
• Geometric interpretation
• Matrix Inverses
• Detecting if a matrix is singular
• Computing the inverse
• Formula for 2x2 case
• Augment with the identity
• Cramer’s Rule
• Vector Spaces
• Definition in terms of closures
• Span
• Linear Independence
• Subspace and the subspace test
• Basis
• Common Computations
• Reduction to RREF
• Eigenvalues and eigenvectors
• Basis for the column space
• Basis for the nullspace
• Basis for the eigenspace
• Construct matrix from a given linear map
• Construct change of basis matrix
• Construct matrix projection onto subspace
• Convert a basis to an orthonormal basis

## Things to compute

• Construct a matrix representing a linear map
• With respect to the standard basis in both domain and range
• With respect to a nonstandard basis in the range
• With respect to a nonstandard basis in the domain
• With respect to nonstandard bases in both the domain and range
• Construct a change of basis matrix
• Check that a map is a linear transformation
• Compute the following spaces of a matrix and their orthogonal complements:
• Row space
• Column space
• Null space
• Compute the shortest distance between a point and a plane
• Compute the least squares solution to linear system
• Prove that something is a vector space
• Prove that a map is an inner product
• Compute determinants
• Compute the RREF of a matrix
• Compute characteristic polynomials, eigenvalues, and eigenvectors
• Diagonalize a matrix
• Solve a system of ODEs resulting arising from tank mixing
• Compute the singular value decomposition of a matrix
• Compute the rank and nullity of a matrix
• Convert a set of vectors to a basis
• Convert a basis to an orthonormal basis
• Determine if a matrix is diagonalizable
• Compute the matrix for a projection onto a subspace
• Find the QR factorization of a matrix

## Things to prove

• Prove facts about block matrices
• Prove facts about injective linear maps
• Prove facts about similar matrices
• Prove facts about orthogonal spaces and orthogonal complements
• Prove facts about inner products
• Prove facts about orthonormal sets
• Understand when a matrix can be diagonalized
• Prove facts about diagonalizable matrices
• Prove facts about the orthogonal decomposition theorem

# Ordinary Differential Equations

## Techniques Overview

\begin{align*} p(y)y' = q(x) && \hspace{10em} \text{separable} \\ \\ y'+p(x)y = q(x) && \text{integrating factor} \\ \\ y' = f(x,y), f(tx,ty) = f(x,y) && y = xV(x)\text{ COV reduces to separable} \\ \\ y' +p(x)y = q(x)y^n && \text{Bernoulli, divide by } y^n \text{ and COV } u = y^{1-n} \\ \\ M(x,y)dx + N(x,y)dy = 0 && M_y = N_x: \phi(x,y) = c \quad (\phi_x =M, \phi_y = N) \\ \\ P(D)y = 0 && x^ke^{rx},\ 0 \leq k < m, \text{ for each root } r \text{ of multiplicity } m \end{align*}

where a complex root $$z = a + bi$$ in $$e^{zx}$$ yields $$e^{ax}\cos bx,\ e^{ax}\sin bx$$.

## Types of Equations

• Separable equations: \begin{align*}p(y)\frac{dy}{dx} - q(x) = 0 \implies \int p(y) dy = \int q(x) dx + C\end{align*} \begin{align*} \frac{dy}{dx} = f(x)g(y) \implies \int \frac{1}{g(y)}dy = \int f(x) dx + C \end{align*}
• Population growth: \begin{align*}\frac{dP}{dt} = kP \implies \qquad P = P_0 e^{kt}\end{align*}
• Logistic growth:
• General form: $$\frac{dP}{dt} =(B(t) - D(t))P(t)$$
• Assume birth rate is constant $$B(t) = B_0$$ and death rate is proportional to instantaneous population $$D(t) = D_0 P(t)$$. Then let $$r = B_0, C = B_0/D_0$$ be the carrying capacity: \begin{align*}\frac{dP}{dt} = r\left( 1 - \frac{P}{C} \right)P \implies \qquad P(t) = \frac{P_0}{\frac{P_0}{C} + e^{-rt}(1 - \frac{P_0}{C})}\end{align*}
• First order linear: \begin{align*}\frac{dy}{dx} + p(x)y = q(x) \implies I(x) = e^{\int p(x) dx},\qquad y(x) = \frac{1}{I(x)}\left(\int q(x)I(x) dx + C\right)\end{align*}
• Exact:
• $$M(x,y)dx + N(x,y)dy = 0 \text{ is exact } \iff \exists \phi: \frac{\partial\phi}{\partial x} = M(x, y),~\frac{\partial\phi}{\partial y} = N(x, y) \\ \iff \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}$$
• General solution: \begin{align*}\phi(x, y) = \int^x M(s, y) ds + \int^y N(x, t) dt - \int^y \frac{\partial}{\partial t} \left(\int^x M(s, t) ds\right)dt\end{align*} (where $$\int^x f(t) dt$$ means take the antiderivative of $$f$$ and consider it a function of $$x$$)
• Cauchy Euler: #todo
• Bernoulli: todo

## Linear Homogeneous

General form: \begin{align*} y^{(n)} + c_{n-1} y^{(n-1)} + \cdots + c_2y'' + c_1y' + c_0y = 0 \\ p(D)y = \prod (D-r_i)^{m_i} y= 0 \end{align*} where $$p$$ is a polynomial in the differential operator $$D$$ with roots $$r_i$$ of multiplicity $$m_i$$:

• Real roots: contribute $$m_i$$ solutions of the form \begin{align*}e^{rx}, xe^{rx}, \cdots, x^{m_i-1}e^{rx}\end{align*}

• Complex conjugate roots: for $$r=a+bi$$, contribute $$2m_i$$ solutions of the form \begin{align*}e^{(a\pm bi)x}, xe^{(a\pm bi)x}, ~\cdots,~ x^{m_i-1}e^{(a\pm bi)x} \\ = e^{ax}\cos(bx), e^{ax}\sin(bx),~ xe^{ax}\cos(bx), xe^{ax}\sin(bx),~ \cdots,~ \end{align*}

Example: by cases, for a second order equation of the form \begin{align*}ay'' + by' + cy = 0:\end{align*}

• Two distinct real roots: $$c_1 e^{r_1 x} + c_2 e^{r_2 x}$$
• One repeated real root: $$c_1 e^{rx} + c_2 x e^{rx}$$
• Complex conjugates $$\alpha \pm i \beta$$: $$e^{\alpha x}(c_1 \cos \beta x + c_2 \sin \beta x)$$
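The complex-conjugate case can be sanity-checked numerically. A minimal pure-Python sketch (the equation $$y'' + 2y' + 5y = 0$$, with roots $$-1 \pm 2i$$, is an assumed example), using central finite differences:

```python
# Check that y = e^{-x}(c1 cos 2x + c2 sin 2x) solves y'' + 2y' + 5y = 0,
# whose characteristic roots are -1 +/- 2i (alpha = -1, beta = 2).
import math

def y(x, c1=1.0, c2=1.0):
    return math.exp(-x) * (c1 * math.cos(2 * x) + c2 * math.sin(2 * x))

h = 1e-5
for x in [0.0, 0.5, 1.3]:
    yp = (y(x + h) - y(x - h)) / (2 * h)            # approximate y'
    ypp = (y(x + h) - 2 * y(x) + y(x - h)) / h**2   # approximate y''
    residual = ypp + 2 * yp + 5 * y(x)
    assert abs(residual) < 1e-4  # vanishes up to discretization error
```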

## Linear Inhomogeneous

General form: \begin{align*} y^{(n)} + c_{n-1} y^{(n-1)} + \cdots + c_2y'' + c_1y' + c_0y = F(x) \\ p(D)y = \prod (D-r_i)^{m_i} y = F(x) \end{align*}

Then solutions are of the form $$y_c + y_p$$, where $$y_c$$ is the solution to the associated homogeneous system and $$y_p$$ is a particular solution.

Methods of obtaining particular solutions

### Undetermined Coefficients

• Find an operator $$q(D)$$ that annihilates $$F(x)$$ (so $$q(D)F = 0$$)
• Find the general solution of $$q(D)p(D)y = 0$$, subtract off the known solutions from the homogeneous part to obtain the form of the trial solution $$A_0f(x)$$, where $$A_0$$ is the undetermined coefficient
• Substitute the trial solution into the original equation to determine $$A_0$$

Useful Annihilators: \begin{align*} &F(x) = p(x): & D^{\deg(p)+1} \\ &F(x) = p(x)e^{ax}: & (D-a)^{\deg(p)+1}\\ &F(x) = \cos(ax) + \sin(ax): & D^2 + a^2\\ &F(x) = e^{ax}(a_0\cos(bx) + b_0\sin(bx)): & (D-z)(D-{\overline{{z}}}) = D^2 -2aD + a^2 + b^2 \\ &F(x) = p(x)e^{ax}\cos(bx) + q(x)e^{ax}\sin(bx): & \left( (D-z)(D-{\overline{{z}}})\right)^{\max(\deg(p), \deg(q))+ 1} \end{align*} where $$z = a + bi$$.

## Systems of Differential Equations

General form: \begin{align*} \frac{\partial \mathbf{x}(t) }{\partial t} = A\mathbf{x}(t) + \mathbf{b}(t) \iff \mathbf{x}'(t) = A\mathbf{x}(t) + \mathbf{b}(t) \end{align*}

General solution to homogeneous equation: \begin{align*} c_1\mathbf{x_1}(t) + c_2\mathbf{x_2}(t)+ \cdots +c_n\mathbf{x_n}(t) = \mathbf{X}(t)\mathbf{c} \end{align*}

If $$A$$ is a matrix of constants, then $$\mathbf{x}(t) = e^{\lambda_i t}~\mathbf{v}_i$$ is a solution for each eigenvalue/eigenvector pair $$(\lambda_i, \mathbf{v}_i)$$. If $$A$$ is defective, you'll need generalized eigenvectors.

Inhomogeneous Equation: particular solutions given by \begin{align*} \mathbf{x}_p(t) = \mathbf{X}(t) \int^t \mathbf{X}^{-1}(s)\mathbf{b}(s) ~ds \end{align*}

## Laplace Transforms

Definitions: \begin{align*} H_ { a } ( t ) \mathrel{\vcenter{:}}=\left\{ \begin{array} { l l } { 0 , } & { 0 \leq t < a } \\ { 1 , } & { t \geq a } \end{array} \right.\\ \delta(t): \int_{\mathbb{R}}\delta(t-a)f(t)~dt &= f(a),\quad \int_{\mathbb{R}}\delta(t-a)~dt = 1\\ (f \ast g )(t) &= \int_0^t f(t-s)g(s)~ds \\ L[f(t)] &= L[f] =\int_0^\infty e^{-st}f(t)dt = F(s) .\end{align*} Useful property: for $$a\leq b$$, $$H_a(t) - H_b(t) = \chi_{[a,b)}(t)$$. \begin{align*} t^n, n\in{\mathbb{N}}\quad&\iff &\frac{n!}{s^{n+1}},\quad &s > 0 \\ t^{-\frac{1}{2}} \quad&\iff &\sqrt{\pi} s^{-\frac{1}{2}}\quad & s>0\\ e^{at} \quad&\iff &\frac{1}{s-a},\quad &s > a \\ \cos(bt) \quad&\iff &\frac{s}{s^2+b^2},\quad &s>0 \\ \sin(bt) \quad&\iff &\frac{b}{s^2+b^2},\quad &s>0 \\ \cosh(bt) \quad&\iff &\frac{s}{s^2 - b^2},\quad &s>{\left\lvert {b} \right\rvert} \\ \sinh(bt) \quad&\iff &\frac{b}{s^2-b^2},\quad &s>{\left\lvert {b} \right\rvert} \\ \delta(t-a) \quad&\iff &e^{-as} \quad& \\ H_a(t) \quad&\iff &s^{-1}e^{-as}\quad& \\ e^{at}f(t) \quad&\iff &F(s-a)\quad & \\ H_a(t)f(t-a) \quad&\iff &e^{-as}F(s)& \\ f'(t) \quad&\iff & sL(f) - f(0) & \\ f''(t) \quad&\iff &s^2L(f) -sf(0) - f'(0) &\\ f^{(n)}(t) \quad&\iff & s^nL(f) - \sum_{i=0}^{n-1} s^{n-1-i}f^{(i)}(0) & \\ (f \ast g)(t) \quad&\iff &F(s) G(s)\quad & \end{align*}

• For $$f$$ periodic with period $$T$$, $$L(f) = \frac{1}{1-e^{-sT}}\int_0^T e^{-st}f(t)~dt$$
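A table entry can be checked by direct numerical integration over a long interval. A minimal pure-Python sketch (the helper `laplace` and the specific values $$s=2, b=3$$ are illustrative), verifying $$L[\sin(bt)] = b/(s^2+b^2)$$:

```python
# Approximate L[f](s) = int_0^inf e^{-st} f(t) dt with the composite
# trapezoid rule on [0, T], where T is large enough that e^{-sT} is
# negligible.
import math

def laplace(f, s, T=20.0, n=100_000):
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * f(t)
    return total * h

s, b = 2.0, 3.0
approx = laplace(lambda t: math.sin(b * t), s)
exact = b / (s ** 2 + b ** 2)      # the table entry b/(s^2 + b^2)
assert abs(approx - exact) < 1e-5
```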


\begin{align*} L[e^{at} f(t)] = \int_0^\infty e^{(a-s)t}f(t)dt = F(s-a) .\end{align*}

The general technique for solving differential equations with Laplace Transforms:

• Take the Laplace Transform of all terms on both sides.
• Solve for $$L[y]$$ in terms of $$s$$.
• Attempt an inverse Laplace Transformation.
• This may involve partial fraction decomposition, completing the square, and splitting numerators to match terms with known inverse transformations.

## The Wronskian

For a collection of $$n$$ functions $$f_i: {\mathbb{R}} \to {\mathbb{R}}$$, define the $$n\times 1$$ column vector \begin{align*} W(f_i)(\mathbf{p}) \mathrel{\vcenter{:}}= \begin{bmatrix} f_i(\mathbf{p}) \\ f_i'(\mathbf{p}) \\ f_i''(\mathbf{p}) \\ \vdots \\ f_i^{(n-1)}(\mathbf{p}) \end{bmatrix} .\end{align*}

The Wronskian of this collection is defined as \begin{align*} W(f_1, \cdots, f_n)(\mathbf{p}) \mathrel{\vcenter{:}}= \det \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ W(f_1)(\mathbf{p}) & W(f_2)(\mathbf{p}) & \cdots & W(f_n)(\mathbf{p})\\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \end{bmatrix} .\end{align*}

If there exists some $$x_0 \in I$$ with $$W(x_0) \neq 0$$, then the set of functions $$\left\{{f_i}\right\}$$ is linearly independent on $$I$$.

The converse fails: $$W \equiv 0$$ on $$I$$ does not imply that $$\left\{{f_i}\right\}$$ is linearly dependent! Counterexample: $$\left\{{x^2, x{\left\lvert {x} \right\rvert}}\right\}$$ on $$(-1, 1)$$, where $$W \equiv 0$$ but the two functions are linearly independent.

### Linear Equations of Order $$n$$

The standard form of such equations is \begin{align*} y^{(n)} + a_1y^{(n-1)} + a_2y^{(n-2)} + \cdots + a_{n-2}y'' + a_{n-1}y' + a_ny = F(x). \end{align*}

All solutions will be the sum of the solution to the associated homogeneous equation and a single particular solution.

In the homogeneous case, examine the discriminant of the characteristic polynomial. Three cases arise: distinct real roots, repeated real roots, and pairs of complex conjugate roots.

That is, every real root contributes a term of $$ce^{rx}$$, while a multiplicity of $$m$$ multiplies the solution by a polynomial in $$x$$ of degree $$m-1$$.

Every pair of complex conjugate roots $$r \pm i\omega$$ contributes a term $$e^{rx}(a\cos \omega x + b\sin \omega x)$$, where $$r$$ is the real part of the roots and $$\omega$$ is the imaginary part.

In the nonhomogeneous case, assume a solution in the most general form of $$F(x)$$, and substitute it into the equation to solve for the constant terms (the method of undetermined coefficients).

Use an annihilating operator to reduce a nonhomogeneous equation to a homogeneous one, writing the equation as a polynomial in the operator $$D$$.

$$F(x)$$ of the form $$e^{ax}\sin(kx)$$ can be handled by writing $$e^{ax}\sin(kx) = \operatorname{Im}\qty{e^{(a+ki)x}}$$

# Algebra

## To Sort

• Burnside’s Lemma

• Cauchy’s Theorem

• If $${\left\lvert {G} \right\rvert} = n = \prod p_i^{k_i}$$, then for each $$i$$ there exists a subgroup $$H$$ of order $$p_i$$.
• The Sylow Theorems

• If $${\left\lvert {G} \right\rvert} = n = \prod p_i^{k_i}$$, then for each $$i$$ and each $$1 \leq k_j \leq k_i$$ there exists a subgroup $$H$$ of order $$p_i^{k_j}$$.
• Galois Theory

• Order $$p$$: One, $$Z_p$$

• Order $$p^2$$: Two abelian groups, $$Z_{p^2}, Z_p^2$$

• Order $$p^3$$:

• 3 abelian $$Z_{p^3}, Z_p \times Z_{p^2}, Z_p^3$$,

• 2 nonabelian: one is $$Z_{p^2} \rtimes Z_p$$.

• The other is the quaternion group for p = 2 and a group of exponent p for p > 2.
• Order $$pq$$:

• $$p \mathrel{\Big|}q-1$$: Two groups, $$Z_{pq}$$ and $$Z_q \rtimes Z_p$$
• Else cyclic, $$Z_{pq}$$
• Every element in a permutation group is a product of disjoint cycles, and the order is the lcm of the order of the cycles.

• The product ideal $$IJ$$ is not just elements of the form $$ij$$, it is all sums of elements of this form! The product alone isn’t enough.

• The intersection of any number of ideals is also an ideal

## Big List of Notation

\begin{align*} C(x) = && \left\{{g\in G : gxg^{-1} = x}\right\} && \subseteq G && \text{Centralizer} \\ C_G(x) = && \left\{{gxg^{-1} : g\in G}\right\} && \subseteq G && \text{Conjugacy Class} \\ G_x = && \left\{{g.x : g\in G}\right\} && \subseteq X && \text{Orbit} \\ x_0 = && \left\{{g\in G : g.x = x}\right\} && \subseteq G && \text{Stabilizer} \\ Z(G) = && \left\{{x\in G: \forall g\in G,~ gxg^{-1} = x}\right\} && \subseteq G && \text{Center} \\ \mathrm{Inn}(G) = && \left\{{\phi_g(x) = gxg^{-1} }\right\} && \subseteq {\operatorname{Aut}}(G) && \text{Inner Aut.} \\ \mathrm{Out}(G) = && {\operatorname{Aut}}(G) / \mathrm{Inn}(G) && && \text{Outer Aut.} \\ N(H) = && \left\{{g\in G: gHg^{-1} = H}\right\} && \subseteq G && \text{Normalizer} \end{align*}

## Group Theory

Notation: $$H < G$$ a subgroup, $$N < G$$ a normal subgroup, concatenation is a generic group operation.

• $${\mathbb{Z}}_n$$ the unique cyclic group of order $$n$$

• $$\mathbf{Q}$$ the quaternion group

• $$G^n = G\times G \times \cdots \times G$$, $$n$$ copies

• $$Z(G)$$ the center of $$G$$

• $$o(G)$$ the order of a group

• $$S_n$$ the symmetric group

• $$A_n$$ the alternating group

• $$D_n$$ the dihedral group of order $$2n$$

• Group Axioms

• Closure: $$a,b \in G \implies ab \in G$$
• Identity: $$\exists e\in G \mathrel{\Big|}a\in G \implies ae = ea = a$$
• Associativity: $$a,b,c \in G \implies (ab)c = a(bc)$$
• Inverses: $$a\in G \implies \exists b \in G \mathrel{\Big|}ab =ba = e$$
• Definitions:

• Order
• Of a group: $$o(G) = {\left\lvert {G} \right\rvert}$$, the cardinality of $$G$$
• Of an element: $$o(g) = \min\left\{{n\in {\mathbb{N}}: g^n = e}\right\}$$
• Index
• Center: the elements that commute with everything
• Centralizer: all elements that commute with a given element/subgroup.
• Group Action: a function $$f: G\times X \to X$$ satisfying
• $$x\in X, g_1,g_2 \in G \implies g_1.(g_2.x) = (g_1g_2). x$$
• $$x\in X \implies e.x = x$$
• Orbits partition any set
• Transitive Action
• Conjugacy Class: $$C \subset G$$ is a conjugacy class $$\iff$$
• $$x\in C, g\in G \implies gxg^{-1} \in C$$
• $$x,y \in C \implies \exists g\in G : gxg^{-1} = y$$
• i.e. subsets that are closed under $$G$$ acting on itself by conjugation and on which the action is transitive
• i.e. orbits under the conjugation action
• The order of any conjugacy class divides the order of $$G$$
• $$p$$-group: Any group of order $$p^n$$.
• Simple Group: no nontrivial normal subgroups
• Normal Series: $$0 {~\trianglelefteq~}H_0 {~\trianglelefteq~}H_1 \cdots {~\trianglelefteq~}G$$
• Composition Series: The successive quotients of the normal series
• Solvable: $$G$$ is solvable $$\iff$$ $$G$$ has an abelian composition series.
• One step subgroup test: \begin{align*} a,b \in H \implies a b^{-1} \in H \\ .\end{align*}

• Useful isomorphism invariants:

• Order profile of elements: $$n_1$$ elements of order $$p_1$$, $$n_2$$ elements of order $$p_2$$, etc
• Useful to look at elements of order $$2$$!
• Order profile of subgroups
• $$Z(A) \cong Z(B)$$
• Number of generators (generators are sent to generators)
• Number and size of conjugacy classes
• Number of Sylow$${\hbox{-}}p$$ subgroups.
• Commutativity
• “Being cyclic”
• Automorphism Groups
• Solvability
• Nilpotency
• Useful homomorphism invariants

• $$\phi(e) = e$$
• $${\left\lvert {g} \right\rvert} = m < \infty \implies {\left\lvert {\phi(g)} \right\rvert} = m$$
• Inverses, i.e. $$\phi(a)^{-1} = \phi(a^{-1})$$
• $$H < G \implies \phi(H) < G'$$
• $$H' < G' \implies \phi^{-1}(H') < G$$
• $${\left\lvert {G} \right\rvert} < \infty \implies {\left\lvert {\phi(G)} \right\rvert}$$ divides $${\left\lvert {G} \right\rvert}, {\left\lvert {G'} \right\rvert}$$

## Big Theorems

• Classification of Abelian Groups \begin{align*} G \cong {\mathbb{Z}}_{p_1^{k_1}} \oplus {\mathbb{Z}}_{p_2^{k_2}} \oplus \cdots \oplus {\mathbb{Z}}_{p_n^{k_n}} ,\end{align*} where $$(p_i, k_i)$$ are the set of elementary divisors of $$G$$.

• Isomorphism Theorems

\begin{align*} \phi: G \to G' \implies && \frac{G}{\ker{\phi}} \cong &~ \phi(G) \\ H {~\trianglelefteq~}G,~ K < G \implies && \frac{K}{H\cap K} \cong &~ \frac{HK}{H} \\ H,K {~\trianglelefteq~}G,~ K < H \implies && \frac{G/K}{H/K} \cong &~ \frac{G}{H} \end{align*}

• Lagrange’s Theorem: $$H < G \implies o(H) \mathrel{\Big|}o(G)$$

• Converse is false: $$o(A_4) = 12$$ but has no order 6 subgroup.
• The $$GZ$$ Theorem: $$G/Z(G)$$ cyclic implies that $$G \in \mathbf{Ab}$$.

• Orbit Stabilizer Theorem: $$G / x_0 \cong Gx$$

• The Class Equation

• Let $$G\curvearrowright X$$ and $$\mathcal{O}_i \subseteq X$$ be the nontrivial orbits, then \begin{align*} {\left\lvert {X} \right\rvert} = {\left\lvert { X_0 } \right\rvert} + \sum_{i} {\left\lvert {\mathcal{O}_i} \right\rvert} .\end{align*}
• The right hand side is the number of fixed points, plus a sum over all of the orbits of size greater than 1, where any representative within the orbit is chosen and we look at the index of its stabilizer in $$G$$.
• Let $$G\curvearrowright G$$ by conjugation and choose a representative $$x_i$$ from each nontrivial conjugacy class to obtain

\begin{align*} {\left\lvert {G} \right\rvert} = {\left\lvert {Z(G)} \right\rvert} + \sum_{i} \left[ G: C(x_i) \right] .\end{align*}

• Useful facts:

• $$H < G \in \mathbf{Ab} \implies H {~\trianglelefteq~}G$$
• Converse doesn’t hold, even if all subgroups are normal. Counterexample: $$\mathbf{Q}$$
• $$G / Z(G) \cong \mathrm{Inn}(G)$$
• $$H, K < G$$ with $$H \cong K \not\implies G/H \cong G/K$$
• Counterexample: $$G = {\mathbb{Z}}_4 \times{\mathbb{Z}}_2, H = \langle{(0,1)}\rangle, K = \langle{(2,0)}\rangle$$. Then $$G/H \cong {\mathbb{Z}}_4 \not\cong {\mathbb{Z}}_2^2 \cong G/K$$
• $$G\in\mathbf{Ab} \implies$$ for each $$p$$ dividing $$o(G)$$, there is an element of order $$p$$
• Any surjective homomorphism $$\phi: A \twoheadrightarrow B$$ between finite groups with $$o(A) = o(B)$$ is an isomorphism
• If $$G$$ is cyclic, for each $$d\mathrel{\Big|}{\left\lvert {G} \right\rvert}$$ there is exactly one subgroup of order $$d$$.
• Sylow Subgroups:

• Todo
• Big List of Interesting Groups

• $${\mathbb{Z}}_4, {\mathbb{Z}}_2^2$$
• $$D_4$$
• $$Q = \langle a , b | a ^ { 4 } = 1 , a ^ { 2 } = b ^ { 2 } , a b = b a ^ { 3 } \rangle$$ the quaternion group
• $$S_3$$, the smallest nonabelian group
• Chinese Remainder Theorem: \begin{align*} {\mathbb{Z}}_{pq} \cong {\mathbb{Z}}_p \oplus {\mathbb{Z}}_q \iff (p,q) = 1 \end{align*}

• Fundamental Theorem of Finitely Generated Abelian Groups:
• $$G = {\mathbb{Z}}^n \oplus \bigoplus {\mathbb{Z}}_{q_i}$$
• Finding all of the unique groups of a given order: #todo

### Cyclic Groups

• Generated by a single element: $$G = \langle a \rangle$$ for some $$a\in G$$
• For each $$d$$ dividing $$o(G)$$, there exists a subgroup $$H$$ of order $$d$$.
• If $$G = \langle a \rangle$$ with $$o(G) = n$$, then take $$H = \langle a^{\frac{n}{d}} \rangle$$

### The Symmetric Group

• Generated by:
• Transpositions
• #todo
• Cycle types: characterized by the multiset of cycle lengths in the disjoint cycle decomposition.
• Two elements are in the same conjugacy class $$\iff$$ they have the same cycle type.
• Inversions: given $$\tau = (p_1 \cdots p_n)$$, a pair $$p_i, p_j$$ is inverted iff $$i < j$$ but $$p_j < p_i$$
• Can count inversions $$N(\tau)$$
• Equal to the minimum number of adjacent transpositions needed to sort the permutation
• Sign of a permutation: $$\sigma(\tau) = (-1)^{N(\tau)}$$
• Parity of permutations forms the group $$({\mathbb{Z}}_2, +)$$
• even $$\circ$$ even = even
• odd $$\circ$$ odd = even
• even $$\circ$$ odd = odd

## Ring Theory

• Examples:
• Non-Examples:
• Definition of an Ideal
• Definitions of types of rings:
• Field
• Unique Factorization Domain (UFD)
• Principal Ideal Domain (PID)
• Euclidean Domain:
• Integral Domain
• Division Ring \begin{align*} \text{field} \implies \text{Euclidean Domain} \implies \text{PID} \implies \text{UFD} \implies \text{integral domain} .\end{align*}
• Counterexamples to inclusions are strict:
• An ED that is not a field: $${\mathbb{Z}}$$
• A PID that is not an ED: $${\mathbb{Z}}\qty[\frac{1+\sqrt{-19}}{2}]$$
• A UFD that is not a PID: $${\mathbb{Z}}[x]$$
• An integral domain that is not a UFD: $${\mathbb{Z}}[\sqrt{-5}]$$
• Integral Domains
• Unique Factorization Domains
• Prime Elements
• Prime Ideals
• Field Extensions
• The Chinese Remainder Theorem for Rings
• Polynomial Rings
• Irreducible Polynomials
• Over $${\mathbb{Z}}_2$$, the irreducibles of degree $$\leq 3$$: \begin{align*} x,~ x+1,~ x^2+x+1,~ x^3+x+1,~ x^3+x^2+1 .\end{align*}
• Eisenstein’s Criterion
• Gauss’ Lemma

# Number Theory

## Notation and Basic Definitions

\begin{align*} (a, b) \mathrel{\vcenter{:}}=\gcd(a, b) && \text{the greatest common divisor} \\ {\mathbb{Z}}_n && \text{the ring of integers} \mod n \\ {\mathbb{Z}}_n^{\times}&& \text{the group of units}\mod n .\end{align*}

A function $$f:{\mathbb{Z}}\to {\mathbb{Z}}$$ is said to be multiplicative iff \begin{align*} (a, b) = 1 \implies f(ab) = f(a) f(b) .\end{align*}

## Primes

Every integer $$n > 1$$ can be written uniquely as \begin{align*} n = \prod_{i=1}^m p_i^{k_i} \end{align*} where the $$p_i$$ are the $$m$$ distinct prime divisors of $$n$$.

Note that the number of distinct prime factors is $$m$$, while the total number of factors is $$\prod_{i=1}^m(k_i + 1)$$.

## Divisibility

\begin{align*} a{~\Bigm|~}b \iff b \equiv 0 \mod a \iff \exists k \text{ such that } ak = b \end{align*}

### $$\gcd, \operatorname{lcm}$$

$$\gcd(a, b)$$ can be computed by taking prime factorizations of $$a$$ and $$b$$, intersecting the primes occurring, and taking the lowest exponent that appears. Dually, $$\operatorname{lcm}(a, b)$$ can be computed by taking the union and the highest exponent.

\begin{align*} xy = \gcd{(x,y)}~\mathrm{lcm}{(x,y)} \end{align*}

If $$d\mathrel{\Big|}x$$ and $$d\mathrel{\Big|}y$$, then \begin{align*} \gcd(x,y) &= d\cdot \gcd\qty{ \frac x d, \frac y d} \\ \operatorname{lcm}(x,y) &= d\cdot \operatorname{lcm}\qty{ \frac x d, \frac y d} \end{align*}

\begin{align*} \gcd(x, y, z) &= \gcd(\gcd(x,y), z) \\ \gcd(x, y) &= \gcd(x\bmod y, y) \\ \gcd(x,y) &= \gcd(x-y, y) .\end{align*}
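These identities are easy to spot-check numerically; a small sketch with Python's `math.gcd` (the values $$84, 30$$ and common divisor $$6$$ are arbitrary choices for illustration):

```python
from math import gcd

def lcm(a: int, b: int) -> int:
    # Uses the identity x*y = gcd(x, y) * lcm(x, y)
    return a * b // gcd(a, b)

x, y, d = 84, 30, 6          # d divides both x and y

assert x * y == gcd(x, y) * lcm(x, y)
assert gcd(x, y) == d * gcd(x // d, y // d)
assert lcm(x, y) == d * lcm(x // d, y // d)
assert gcd(x, y) == gcd(x % y, y) == gcd(x - y, y)
```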

### The Euclidean Algorithm

$$\gcd(a, b)$$ can be computed via the Euclidean algorithm, taking the final bottom-right coefficient.
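A sketch of the algorithm in Python; the extended version tracks the same quotients the tableau records and also produces Bézout coefficients $$x, y$$ with $$ax + by = \gcd(a,b)$$:

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b       # remainder sequence
    old_s, s = 1, 0       # running coefficients of a
    old_t, t = 0, 1       # running coefficients of b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

g, x, y = extended_gcd(240, 46)
assert g == 2 and 240 * x + 46 * y == 2
```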

## Modular Arithmetic

Generally concerned with the multiplicative group of units $$({\mathbb{Z}}_n^{\times}, \times)$$.

The system \begin{align*} \begin{array} { c } { x \equiv a _ { 1 } \mod m _ { 1 } } \\ { x \equiv a _ { 2 } \mod m _ { 2 } } \\ { \vdots } \\ { x \equiv a _ { r } \mod m _ { r } } \end{array} \end{align*}

has a unique solution $$x \mod \prod m_i \iff (m_i, m_j) = 1$$ for each pair $$i,j$$, given by \begin{align*} x = \sum_{j=1}^r a_j \frac{\prod_i m_i}{m_j} \left[ \frac{\prod_i m_i}{m_j} \right]^{-1}_{\mod m_j} .\end{align*}
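The displayed formula translates directly into code. A sketch using Python's `pow(·, -1, m)` for the modular inverse (available in 3.8+); the test system is the classical example $$x \equiv 2 \pmod 3$$, $$x \equiv 3 \pmod 5$$, $$x \equiv 2 \pmod 7$$:

```python
from math import prod

def crt(residues: list[int], moduli: list[int]) -> int:
    """Solve x = a_j mod m_j for pairwise coprime m_j, via
    x = sum a_j * (M/m_j) * [(M/m_j)^{-1} mod m_j]  (mod M)."""
    M = prod(moduli)
    x = 0
    for a, m in zip(residues, moduli):
        M_j = M // m
        x += a * M_j * pow(M_j, -1, m)   # modular inverse of M/m_j mod m_j
    return x % M

assert crt([2, 3, 2], [3, 5, 7]) == 23   # unique solution mod 105
```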

\begin{align*} (a, n) = 1 \implies a^{\phi(n)} \equiv 1 \mod n .\end{align*}

\begin{align*} x^{p} &\equiv x \mod p \\ x^{p-1} &\equiv 1 \mod p \quad \text{ if } p \nmid x \end{align*}

### Diophantine Equations

Consider $$ax + by = c$$. This has solutions iff $$c = 0 \mod (a,b) \iff \gcd(a,b) \text{ divides } c$$.

### Computations

If $$x\equiv 0 \mod n$$, then $$x\equiv 0 \mod p^k$$ for all $$p^k$$ appearing in the prime factorization of $$n$$.

If there are factors of the modulus in the equation, peel them off with addition, using the fact that $$nk \equiv 0 \mod n$$. \begin{align*} x &\equiv nk + r \mod n \\ &\equiv r \mod n \end{align*}

So take $$x=463, n = 4$$, then use $$463 = 4\cdot 115 + 3$$ to write \begin{align*} 463 &\equiv y \mod 4 \\ \implies 4\cdot 115 + 3 &\equiv y \mod 4 \\ \implies 3&\equiv y\mod 4 .\end{align*}

For any $$n$$ and any $$d$$ dividing $$k$$, \begin{align*} x^k \mod n \equiv (x^{k/d} \bmod n)^d \mod n .\end{align*}

\begin{align*} 2^{25} &\equiv (2^5 \mod 5)^5 \mod 5 \\ &\equiv 2^5 \mod 5 \\ &\equiv 2 \mod 5 \end{align*}

Make things easier with negatives! For example, $$\mod 5$$, \begin{align*} 4^{25} &\equiv (-1)^{25} \mod 5\\ &\equiv (-1) \mod 5\\ &\equiv 4 \mod 5 \end{align*}
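Both tricks are mirrored by Python's three-argument `pow`, which does modular exponentiation directly; a quick check of the two examples above:

```python
# 2^25 mod 5, reduced via (2^5 mod 5)^5 mod 5 as in the text
assert pow(2, 25, 5) == pow(pow(2, 5, 5), 5, 5) == 2

# 4 = -1 mod 5, so 4^25 = (-1)^25 = -1 = 4 mod 5
assert pow(4, 25, 5) == pow(-1, 25, 5) == 4
```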

### Invertibility

\begin{align*} xa = xb \mod n \implies a = b \mod \frac{n}{(x,n)} .\end{align*}

\begin{align*} x\in {\mathbb{Z}}_n^{\times}\iff (x, n) = 1 ,\end{align*} and thus \begin{align*} {\mathbb{Z}}_n^\times = \left\{{1\leq x \leq n : (x,n) = 1}\right\} \end{align*} and $${\left\lvert {{\mathbb{Z}}_n^\times} \right\rvert} = \phi(n)$$.

One can reduce equations by dividing through by a unit. Pick any $$x$$ such that $$x{~\Bigm|~}a$$ and $$x{~\Bigm|~}b$$ with $$(x,n) = 1$$, then \begin{align*} a =b \mod n \implies \frac a x = \frac b x \mod n .\end{align*}

## The Totient Function

\begin{align*} \phi(n) = {\left\lvert {\left\{{1\leq x \leq n : (x,n) = 1}\right\}} \right\rvert} \end{align*}

\begin{align*} \phi(1) & = {\left\lvert {\left\{{1}\right\}} \right\rvert} = 1 \\ \phi(2) & = {\left\lvert {\left\{{1}\right\}} \right\rvert} = 1 \\ \phi(3) & = {\left\lvert {\left\{{1,2}\right\}} \right\rvert} = 2 \\ \phi(4) & = {\left\lvert {\left\{{1,3}\right\}} \right\rvert} = 2 \\ \phi(5) & = {\left\lvert {\left\{{1,2,3,4}\right\}} \right\rvert} = 4 \end{align*}

\begin{align*} \phi(p) & = p-1 \\ \phi(p^k) & = p^{k-1}(p-1) \\ \phi(n) &= n\prod_{i=1}^{m} \qty{1 - {1\over p_i}} \\ n &= \sum_{\tiny d{~\Bigm|~}n} \phi(d) \end{align*} where $$m$$ is the number of distinct prime divisors of $$n$$.

All numbers less than $$p$$ are coprime to $$p$$; there are $$p^k$$ numbers up to $$p^k$$ and the only ones not coprime to $$p^k$$ are the multiples of $$p$$, i.e. $$\left\{{p, 2p, \cdots, p^{k-1}\cdot p}\right\}$$, of which there are $$p^{k-1}$$, yielding $$p^k - p^{k-1}$$

Along with the fact that $$\phi$$ is multiplicative, so $$(p,q) = 1 \implies \phi(pq) = \phi(p)\phi(q)$$, compute this for any $$n$$ by taking the prime factorization.

With these properties, one can compute: \begin{align*} \phi(n) &= \phi\qty{ \prod_i p_i^{k_i}} \\ &= \prod_i p_i^{k_i-1}(p_i-1) \\ &= n \left(\frac{\prod_i (p_i-1)}{\prod_i p_i}\right) \\ &= n\prod_i \qty{ 1 - \frac{1}{p_i}} \end{align*}
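The product formula gives a direct implementation by trial division; a sketch, checked against the defining count of coprime residues:

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient via the product formula n * prod(1 - 1/p)."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p      # multiply result by (1 - 1/p)
            while m % p == 0:          # strip all copies of this prime
                m //= p
        p += 1
    if m > 1:                          # leftover prime factor > sqrt(n)
        result -= result // m
    return result

# Agrees with the definition |{1 <= x <= n : gcd(x, n) = 1}|
for n in range(1, 50):
    assert phi(n) == sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)
```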

\todo[inline]{Check and explain}

$$x$$ is a quadratic residue $$\bmod n$$ iff there exists an $$a$$ such that $$a^2 = x \mod n$$.

In $${\mathbb{Z}}_p^{\times}$$ for an odd prime $$p$$, exactly half of the elements (the even powers of a generator) are quadratic residues.

\begin{align*} -1\text{ is a quadratic residue in } {\mathbb{Z}}_p \iff p = 1 \mod 4 .\end{align*}

## Primality Tests

If $$n$$ is prime, then \begin{align*} (a, n) = 1 \implies a^{n-1} \equiv 1 \mod n \end{align*}

If $$n$$ is prime, then \begin{align*} x^2 \equiv 1 \mod n \implies x \equiv \pm 1 \mod n, \end{align*} so exhibiting a nontrivial square root of $$1$$ proves $$n$$ composite.
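The contrapositive of Fermat's little theorem gives a quick compositeness test; a sketch, with the caveat that Carmichael numbers such as 561 pass for every coprime base:

```python
def fermat_test(n: int, bases=(2, 3, 5, 7)) -> bool:
    """Fermat test: if a^(n-1) != 1 mod n for some a coprime to n,
    n is certainly composite; passing is only evidence of primality."""
    return all(pow(a, n - 1, n) == 1 for a in bases if n % a != 0)

assert fermat_test(97)          # prime, so it passes
assert not fermat_test(91)      # 91 = 7 * 13 is caught (e.g. by base 2)
assert fermat_test(561)         # Carmichael number: fools every coprime base!
```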

## Sequences in Metric Spaces

Every bounded sequence in $${\mathbb{R}}^n$$ has a convergent subsequence (Bolzano–Weierstrass).

In $${\mathbb{R}}^n, X$$ is compact $$\iff X$$ is closed and bounded.

Necessity of $${\mathbb{R}}^n$$: $$X = ({\mathbb{Z}}, d)$$ with $$d(x,y) = 1$$ for $$x\neq y$$ is closed, complete, and bounded, but not compact since $$\left\{{1,2,\cdots}\right\}$$ has no convergent subsequence

The converse holds in a general metric space iff bounded is replaced with totally bounded: compact $$\iff$$ complete and totally bounded

# Sequences

Notation: $$\left\{{a_n}\right\}_{n\in{\mathbb{N}}}$$ is a sequence, $$\displaystyle\sum_{i\in{\mathbb{N}}} a_i$$ is a series.

## Known Examples

• Known sequences: let $$c$$ be a constant. \begin{align*} c, c^2, c^3, \ldots &= \left\{{c^n}\right\}_{n=1}^\infty \to 0 && \forall {\left\lvert {c} \right\rvert} < 1 \\ \\ \frac{1}{c},\frac{1}{c^2},\frac{1}{c^3},\ldots &= \left\{{\frac{1}{c^n}}\right\}_{n=1}^\infty \to 0 &&\forall {\left\lvert {c} \right\rvert} > 1 \\ \\ 1,\frac{1}{2^c},\frac{1}{3^c},\ldots &= \left\{{\frac{1}{n^c}}\right\}_{n=1}^\infty \to 0 && \forall c > 0 \end{align*}

## Convergence

A sequence $$\left\{{x_j}\right\}$$ converges to $$L$$ iff \begin{align*} \forall \varepsilon > 0,\, \exists N > 0 \text{ such that } \quad n\geq N \implies {\left\lvert {x_n - L} \right\rvert} < \varepsilon .\end{align*}

\begin{align*} b_n \leq a_n \leq c_n \text{ and } b_n,c_n \to L \implies a_n \to L \end{align*}

If $$\left\{{a_j}\right\}$$ monotone and bounded, then $$a_j \to L = \lim\sup a_i < \infty$$.

$${\left\lvert {a_m - a_n} \right\rvert} \to 0 \in {\mathbb{R}}\implies \left\{{a_i}\right\}$$ converges.

### Checklist

• Is the sequence bounded?

• $$\left\{{a_i}\right\}$$ not bounded $$\implies$$ not convergent
• If bounded, is it monotone?
• $$\left\{{a_i}\right\}$$ bounded and monotone $$\implies$$ convergent
• Use algebraic properties of limits

• Epsilon-delta definition

• Algebraic properties and manipulation:

• Limits commute with $$\pm, \times$$, and $$\div$$ (when the denominator's limit is nonzero), and $$\lim C = C$$ for constants.

• E.g. Divide all terms by $$n$$ before taking limit

• Clear denominators

# Sums (“Series”)

A series is a sum of the form $$\sum_{i\in{\mathbb{N}}} a_i$$; a power series is a function of the form \begin{align*} f(x) = \sum_{j=1}^\infty c_j x^j .\end{align*}

## Known Examples

### The $$p$$-Series

\begin{align*} \sum_{k=1}^\infty k^p &< \infty &&\iff p < -1 \\ \sum_{k=1}^\infty \frac{1}{k^p} &< \infty &&\iff p > 1 \\ \sum_{k=1}^\infty \frac{1}{k} &= \infty && \end{align*}

### Convergent

\begin{align*} \sum_{n=1}^\infty \frac{1}{n^2} & < \infty \\ \sum_{n=1}^\infty \frac{1}{n^3} & < \infty \\ \sum_{n=1}^\infty \frac{1}{n^\frac{3}{2}} & < \infty \\ \sum_{n=0}^\infty \frac{1}{n!} & = e \\ \sum_{n=1}^\infty \frac{1}{c^n} & = \frac{1}{c-1} \quad (c > 1) \\ \sum_{n=1}^\infty (-1)^n \frac{1}{c^n} & = -\frac{1}{c+1} \quad (c > 1) \\ \sum_{n=1}^\infty (-1)^{n+1} \frac{1}{n} & = \ln 2 \end{align*}

### Divergent

\begin{align*} \sum_{n=1}^\infty \frac{1}{n} = \infty \\ \sum_{n=1}^\infty \frac{1}{\sqrt n} = \infty \end{align*}

## Convergence

Useful reference: http://math.hawaii.edu/~ralph/Classes/242/SeriesConvTests.pdf

$$a_n\to 0$$ does not imply $$\sum a_n < \infty$$. Counterexample: the harmonic series.
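A numeric illustration of the counterexample: the terms shrink to zero, but the partial sums track $$\ln N$$ and grow without bound ($$N = 2000$$ is an arbitrary cutoff):

```python
from math import log

N = 2000
terms = [1 / n for n in range(1, N + 1)]
partial_sum = sum(terms)

assert terms[-1] < 1e-3              # a_n -> 0 ...
assert partial_sum > log(N)          # ... yet H_N ~ ln N + 0.577 keeps growing
```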

Absolute convergence $$\implies$$ convergence

\begin{align*} \sum a_i \text{ converges } \implies \lim_{i\to\infty} a_i = 0 \end{align*}

### The Big Tests

• $$0 \leq a_n \leq b_n$$ and $$\sum b_n < \infty \implies \sum a_n < \infty$$
• $$0 \leq b_n \leq a_n$$ and $$\sum b_n = \infty \implies \sum a_n = \infty$$

\begin{align*} R =\lim_{n\to\infty} {\left\lvert {\frac{a_{n+1}}{a_n}} \right\rvert} \end{align*}

• $$R < 1$$: absolutely convergent
• $$R > 1$$: divergent
• $$R = 1$$: inconclusive

\begin{align*} R = \limsup_{n \to \infty} \sqrt[n]{{\left\lvert {a_n} \right\rvert}} \end{align*}

• $$R < 1$$: convergent
• $$R > 1$$: divergent
• $$R = 1$$: inconclusive

\begin{align*} f \text{ positive and decreasing},~ f(n) = a_n \implies \sum a_n < \infty \iff \int_1^\infty f(x) dx < \infty \end{align*}

\begin{align*} a_n, b_n > 0,~ \lim_{n\to\infty}\frac{a_n}{b_n} = L \in (0, \infty) \implies \qty{ \sum a_n < \infty \iff \sum b_n < \infty } \end{align*}

\begin{align*} a_n \downarrow 0 \implies \sum (-1)^n a_n < \infty \end{align*}

\begin{align*} \sum_{n=1}^\infty {\left\lVert {f_n} \right\rVert}_\infty < \infty \implies \exists f\text{ such that } {\left\lVert { \sum_{n=1}^\infty f_n - f} \right\rVert}_\infty \to 0 \end{align*} In other words, the series converges uniformly.

Slogan: Convergence of the sup norms implies uniform convergence.

The $$M$$ in the name comes from defining $$M_n \mathrel{\vcenter{:}}= \sup_x {\left\lvert {f_n(x)} \right\rvert}$$ and requiring $$\sum M_n < \infty$$.

### Checklist

• Do the terms tend to zero?
• $$a_i \not\to 0 \implies \sum a_i = \infty$$.
• Can check with L’Hopital’s rule
• There are exactly 6 tests at our disposal:
• Comparison, root, ratio, integral, limit, alternating
• Is the series alternating?
• If so, does $$a_n \downarrow 0$$?
• If so, convergent
• Is this series bounded above by a known convergent series?
• $$p$$ series with $$p>1$$, i.e. : $$\sum a_n \leq \sum \frac{1}{n^p} < \infty$$
• Geometric series with $${\left\lvert {x} \right\rvert} < 1$$, i.e. $$\sum a_n \leq \sum x^n$$
• Is this series bounded below by a known divergent series?
• $$p$$ series with $$p\leq 1$$, i.e. $$\infty = \sum \frac{1}{n^p} \leq \sum a_i$$
• Are the ratios strictly less than or greater than 1?
• $$<1 \implies$$ convergent
• $$>1 \implies$$ divergent
• Does the integral analogue converge?
• Integral converges $$\iff$$ sum converges
• Try the root test
• $$<1 \implies$$ convergent
• $$>1 \implies$$ divergent
• Try the limit test
• Attempt to divide each term to obtain a known convergent/divergent series

Some Pattern Recognition:

• $$(\text{stuff})!$$: Ratio test (only test that will work with factorials!!)
• $$(\text{stuff})^n$$: Root test or ratio test
• Replace $$a_n$$ with an $$f(x)$$ that’s easy to integrate - integral test
• $$p(x)$$ or $$\sqrt{p(x)}$$: comparison or limit test

Use the fact that \begin{align*} \lim_{k\to\infty} {\left\lvert {\frac{a_{k+1}x^{k+1}}{a_kx^k}} \right\rvert} = {\left\lvert {x} \right\rvert}\lim_{k\to\infty} {\left\lvert {\frac{a_{k+1}}{a_k}} \right\rvert} < 1 \implies \sum a_k x^k < \infty ,\end{align*} so take $$L \mathrel{\vcenter{:}}=\lim_{k\to\infty} \frac{a_{k+1}}{a_k}$$ and then obtain the radius as \begin{align*} R = \frac{1}{L} = \lim_{k\to\infty} {a_k \over a_{k+1}} \end{align*}

• Note $$L=0 \implies$$ absolutely convergent everywhere
• $$L = \infty \implies$$ convergent only at $$x=0$$.
• Also need to check endpoints $$R, -R$$ manually.
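A numeric sketch of the ratio formula, using the assumed example $$a_k = 2^{-k}$$ (a geometric power series whose radius of convergence is exactly $$2$$):

```python
# R = lim a_k / a_{k+1}; for a_k = 1/2^k every ratio is already exactly 2
def a(k: int) -> float:
    return 1.0 / 2**k

ratios = [a(k) / a(k + 1) for k in range(10)]
assert all(abs(r - 2.0) < 1e-12 for r in ratios)
```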

# Real Analysis

## Notation

A function is continuously differentiable iff $$f$$ is differentiable and $$f'$$ is continuous.

Conventions:

• Integrable means Riemann integrable.

\begin{align*} f && \text{a functional }{\mathbb{R}}^n \to {\mathbb{R}}\\ \mathbf{f} && \text{a function } {\mathbb{R}}^n\to {\mathbb{R}}^m \\ A, E, U, V && \text{open sets} \\ A' && \text{the limit points of }A \\ \mkern 1.5mu\overline{\mkern-1.5muA\mkern-1.5mu}\mkern 1.5mu && \text{the closure of }A \\ A^\circ && \text{the interior of }A \\ K && \text{a compact set} \\ \mathcal{R}_A && \text{the space of Riemann integrable functions on }A \\ C^j(A) && \text{the space of }j\text{ times continuously differentiable functions }f: {\mathbb{R}}^n \to {\mathbb{R}}\\ \left\{{f_n}\right\} && \text{a sequence of functions} \\ \left\{{x_n}\right\} && \text{a sequence of real numbers}\\ f_n \to f && \text{pointwise convergence} \\ f_n \rightrightarrows f && \text{uniform convergence} \\ x_n \nearrow x && x_i\leq x_j \text{ and }x_j\text{ converges to }x \\ x_n \searrow x && x_i\geq x_j \text{ and }x_j\text{ converges to }x \\ \sum_{k\in {\mathbb{N}}} f_k && \text{a series}\\ D(f) && \text{the set of discontinuities of }f .\end{align*}

## Big Ideas

Summary for GRE:

• Limits,

• Continuity,

• Boundedness,

• Compactness,

• Definitions of topological spaces,

• Lipschitz continuity

• Sequences and series of functions.

• Know the interactions between the following major operations:

• Continuity (pointwise limits)
• Differentiability
• Integrability
• Limits of sequences
• Limits of series/sums
• The derivative of a continuous function need not be continuous

• A continuous function need not be differentiable

• A uniform limit of differentiable functions need not be differentiable

• A limit of integrable functions need not be integrable

• An integrable function need not be continuous

• An integrable function need not be differentiable

\begin{align*} f,g\text{ differentiable on } [a,b] \implies \exists c\in[a,b] : \left[f ( b ) - f ( a ) \right] g' ( c ) = \left[g ( b ) - g ( a )\right] f' ( c ) \end{align*}

?

## Commuting Limits

• Suppose $$f_n \to f$$ (pointwise, not necessarily uniformly)
• Let $$F(x) = \int f(t)$$ be an antiderivative of $$f$$
• Let $$f'(x) = \frac{\partial f}{\partial x}(x)$$ be the derivative of $$f$$.

Then consider the following possible ways to commute various limiting operations:

Does taking the derivative of the integral of a function always return the original function? \begin{align*} [\frac{\partial}{\partial x}, \int dx]:\qquad\qquad \frac{\partial}{\partial x}\int f(x, t)dt =_? \int \frac{\partial}{\partial x} f(x, t)dt\\ \text{} \end{align*}

Answer: Sort of (but possibly not).

Counterexample: \begin{align*} f(x) = \begin{cases} 1 & x > 0 \\ -1 & x \leq 0 \end{cases} \implies \int f \approx {\left\lvert {x} \right\rvert}, \end{align*} which is not differentiable. (This is remedied by the so-called “weak derivative”)

Sufficient Condition: If $$f$$ is continuous, then both are always equal to $$f(x)$$ by the FTC.

Is the derivative of a continuous function always continuous? \begin{align*} [\frac{\partial}{\partial x}, \lim_{x_i\to x}]:\qquad\qquad \lim_{x_i \to x} f'(x_i) =_? f'(\lim_{x_i\to x} x_i) \end{align*} Answer: No.

Counterexample: \begin{align*} f ( x ) = \left\{ \begin{array} { l l } { x ^ { 2 } \sin ( 1 / x ) } & { \text { if } x \neq 0 } \\ { 0 } & { \text { if } x = 0 } \end{array} \right. \implies f ^ { \prime } ( x ) = \left\{ \begin{array} { l l } { 2 x \sin \left( \frac { 1 } { x } \right) - \cos \left( \frac { 1 } { x } \right) } & { \text { if } x \neq 0 } \\ { 0 } & { \text { if } x = 0 } \end{array} \right. \end{align*} which is discontinuous at zero.

Sufficient Condition: There doesn’t seem to be a general one (which is perhaps why we study $$C^k$$ functions).

Is the limit of a sequence of differentiable functions differentiable and the derivative of the limit?

\begin{align*} [\frac{\partial}{\partial x}, \lim_{f_n \to f}]:\qquad\qquad \lim_{f_n \to f}\frac{\partial}{\partial x}f_n(x) =_? \frac{\partial }{\partial x}\lim_{f_n \to f} f_n(x) \end{align*} Answer: Super no – even the uniform limit of differentiable functions need not be differentiable!

Counterexample: $$f_n(x) = \frac{\sin(nx)}{\sqrt{n}} \rightrightarrows f = 0$$ but $$f_n' \not\to f' = 0$$

Sufficient Condition: $$f_n \rightrightarrows f$$ and $$f_n \in C^1$$.
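A numeric look at the counterexample: the sup norm of $$f_n$$ shrinks like $$1/\sqrt n$$, while the derivative at $$x = 0$$ is exactly $$\sqrt n$$ and blows up:

```python
from math import sin, cos, sqrt

def f(n: int, x: float) -> float:
    return sin(n * x) / sqrt(n)

def f_prime(n: int, x: float) -> float:
    return sqrt(n) * cos(n * x)

n = 10_000
assert abs(f(n, 0.5)) <= 1 / sqrt(n)   # uniform convergence to 0
assert f_prime(n, 0.0) == sqrt(n)      # derivatives diverge at x = 0
```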

Is the limit of a sequence of integrable functions integrable and the integral of the limit?

\begin{align*} [\int dx, \lim_{f_n \to f}](f):\qquad\qquad \lim_{f_n \to f}\int f_n(x) dx =_? \int \lim_{f_n \to f} f_n(x) dx \end{align*}

Counterexample: Order $${\mathbb{Q}}\cap[0,1]$$ as $$\left\{{q_i}\right\}_{i\in{\mathbb{N}}}$$, then take \begin{align*} f_n(x) = \sum_{i=1}^n \indic{q_i} \to \indic{{{\mathbb{Q}}\cap[0,1]}} \end{align*} where each $$f_n$$ integrates to zero (only finitely many discontinuities) but $$f$$ is not Riemann-integrable.

Sufficient Condition: - $$f_n \rightrightarrows f$$, or - $$f$$ integrable and $$\exists M: \forall n, {\left\lvert {f_n} \right\rvert} < M$$ ($$f_n$$ uniformly bounded)

Is the integral of a continuous function also continuous?

\begin{align*} [\int dx, \lim_{x_i \to x}]:\qquad\qquad \lim_{x_i \to x} F(x_i) =_? F(\lim_{x_i \to x} x_i) \end{align*}

Proof: $$|f(x)| < M$$ on $$I$$, so given $$c$$ pick a sequence $$x\to c$$. Then \begin{align*} {\left\lvert {f(t)} \right\rvert} < M \implies \left\vert \int_c^x f(t)dt \right\vert \leq \int_c^x M dt \implies {\left\lvert {F(x) - F(c)} \right\rvert} \leq M{\left\lvert {x - c} \right\rvert} \to 0 \end{align*}

Is the limit of a sequence of continuous functions also continuous?

\begin{align*} [\lim_{x_i \to x}, \lim_{f_n \to f}]: \qquad\qquad \lim_{f_n \to f}\lim_{x_i \to x} f(x_i) =_? \lim_{x_i \to x}\lim_{f_n \to f} f_n(x_i)\\ \text{}\\ \end{align*}

Counterexample: $$f_n(x) = x^n$$ on $$[0,1]$$, which converges pointwise to a discontinuous limit ($$0$$ for $$x < 1$$, but $$1$$ at $$x=1$$)

Sufficient Condition: $$f_n \rightrightarrows f$$

Does a sum of differentiable functions necessarily converge to a differentiable function?

\begin{align*} \left[\frac{\partial}{\partial x}, \sum_{f_n}\right]: \qquad\qquad \frac{\partial}{\partial x} \sum_{k=1}^\infty f_k =_? \sum_{k=1}^\infty \frac{\partial}{\partial x} f_k \\ \text{} \\ \text{}\\ \end{align*}

Counterexample: $$f_n(x) = \frac{\sin(nx)}{\sqrt{n}} \rightrightarrows 0 \mathrel{\vcenter{:}}= f$$, but $$f_n' = \sqrt{n}\cos(nx) \not\to 0 = f'$$ (at, say, $$x=0$$)

Sufficient Condition: When $$f_n \in C^1, \exists x_0: f_n(x_0) \to f(x_0)$$ and $$\sum {\left\lVert {f_n'} \right\rVert}_\infty < \infty$$ (continuously differentiable, converges at a point, and the derivatives absolutely converge)

## Continuity

\begin{align*} f\text{ continuous } \iff \lim_{x \to p} f(x) = f(p) \end{align*}

\begin{align*} f:(X, d_X) \to (Y, d_Y) \text{ continuous } \iff \forall \varepsilon,~ \exists \delta \mathrel{\Big|}~ d_X(x,y) < \delta \implies d_Y(f(x), f(y)) < \varepsilon \end{align*}

\begin{align*} f(x) = \sin\qty{ \frac{1}{x} } \implies 0\in D(f) \end{align*}

The Dirichlet function is nowhere continuous: \begin{align*} f(x) = \indic{{\mathbb{Q}}} \end{align*}

The following function is continuous at infinitely many points and discontinuous at infinitely many points: \begin{align*} f(x) = \begin{cases} 0 & x\in{\mathbb{R}}\setminus{\mathbb{Q}}\\ \frac{1}{q} & x = \frac{p}{q} \in {\mathbb{Q}} \end{cases} \end{align*} Then $$f$$ is discontinuous on $${\mathbb{Q}}$$ and continuous on $${\mathbb{R}}\setminus{\mathbb{Q}}$$.

$$f$$ is continuous on $${\mathbb{R}}\setminus{\mathbb{Q}}$$:

• Fix $$\varepsilon$$, let $$x_0 \in {\mathbb{R}}-{\mathbb{Q}}$$, choose $$n: \frac{1}{n} < \varepsilon$$ using Archimedean property.
• Define $$S = \left\{{x\in{\mathbb{Q}}: x\in (0,1), x=\frac{m}{n'}, n' < n}\right\}$$
• Then $${\left\lvert {S} \right\rvert} \leq 1+2+\cdots (n-1)$$, so choose $$\delta = \min_{s\in S}{\left\lvert {s-x_0} \right\rvert}$$
• Then \begin{align*} x \in N_\delta(x_0) \implies f(x) < \frac{1}{n} < \varepsilon .\end{align*}

$$f$$ is discontinuous on $${\mathbb{Q}}$$:

• Let $$x_0 = \frac{p}{q} \in {\mathbb{Q}}$$ and $$\left\{{x_n}\right\} = \left\{{x_0-\frac{1}{n\sqrt 2}}\right\}$$. Then \begin{align*} x_n \uparrow x_0\text{ but } f(x_n) = 0 \to 0 \neq \frac{1}{q} = f(x_0) \end{align*}

There are no functions that are continuous on $${\mathbb{Q}}$$ but discontinuous on $${\mathbb{R}}-{\mathbb{Q}}$$

A continuous function on a compact space attains its extrema.

## Differentiability

\begin{align*} f'(p) \mathrel{\vcenter{:}}=\frac{\partial f}{\partial x}(p) = \lim_{x\to p} \frac{f(x) - f(p)}{x-p} \end{align*}

• For multivariable functions: existence and continuity of $$\frac{\partial \mathbf{f}}{\partial x_i} \forall i \implies \mathbf{f}$$ differentiable
• Necessity of continuity: an example of a continuous function with all partial and directional derivatives that is still not differentiable: \begin{align*} f(x, y) = \begin{cases} \frac{y^3}{x^2+y^2} & (x,y) \neq (0,0) \\ 0 & \text{else} \end{cases} .\end{align*}

### Properties, strongest to weakest

\begin{align*} C^\infty \subsetneq C^k \subsetneq \text{ differentiable } \subsetneq C^0 \subset \mathcal{R}_K .\end{align*}

• Example showing $$f\in C^0 \centernot\implies f$$ is differentiable and $$f$$ not differentiable $$\centernot\implies f \not\in C^0$$.
• Take $$f(x) = {\left\lvert {x} \right\rvert}$$ at $$x=0$$.
• Example showing that $$f$$ differentiable $$\centernot\implies f \in C^1$$:
• Take \begin{align*} f(x) = \begin{cases} x^2\sin\qty{ \frac{1}{x} } & x \neq 0 \\ 0 & x =0 \end{cases} \implies f'(x) = \begin{cases} -\cos\qty{\frac{1}{x}} + 2x\sin\qty{ \frac{1}{x} } & x \neq 0 \\ 0 & x=0 \end{cases} \end{align*} but $$\lim_{x\to 0}f'(x)$$ does not exist and thus $$f'$$ is not continuous at zero.
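A quick numerical check of this counterexample (function names are ad hoc): the difference quotients at $$0$$ are $$h\sin(1/h)$$, bounded by $$|h|$$, while $$f'$$ oscillates between values near $$\pm 1$$ arbitrarily close to $$0$$.

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x), extended by f(0) = 0
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def fprime(x):
    # derivative for x != 0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Difference quotients at 0 are h*sin(1/h), bounded by |h|: f'(0) = 0 exists.
hs = (1e-2, 1e-4, 1e-6)
quotients = [f(h) / h for h in hs]
print(quotients)

# But f' has no limit at 0: at x_n = 1/(n*pi), f'(x_n) is approximately -(-1)^n.
samples = [fprime(1 / (n * math.pi)) for n in range(1, 7)]
print(samples)  # alternates near +1, -1, +1, ...
```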

Proof that $$f$$ differentiable $$\implies f \in C^0$$: \begin{align*} \lim_{x\to p} \qty( f(x) - f(p) ) = \lim_{x\to p} \frac{f(x)-f(p)}{x-p}\cdot(x-p) = f'(p)\cdot 0 = 0 .\end{align*}

## Giant Table of Relations

Bold are assumed hypothesis, regular text is the strongest conclusion you can reach, strikeout denotes implications that aren’t necessarily true.

\begin{align*} f' && f && \therefore f && F \\ \hline \\ \cancel{\text{exists}} && \mathbf{continuous} && \text{K-integrable} && \text{exists} \\ \cancel{\text{continuous}} && \mathbf{differentiable} && \text{continuous} && \text{exists} \\ \cancel{\text{exists}} && \mathbf{integrable} && \cancel{\text{continuous}} && \text{differentiable} \\ \end{align*}

Explanation of items in table:

• K-integrable: compactly integrable.
• $$f$$ integrable $$\implies F$$ differentiable $$\implies F \in C^0$$
• By definition and FTC, and differentiability $$\implies$$ continuity
• $$f$$ differentiable and $$K$$ compact $$\implies f$$ integrable on $$K$$.
• In general, $$f$$ differentiable $$\centernot\implies f$$ integrable. Necessity of compactness: \begin{align*} f(x) = e^x \in C^\infty({\mathbb{R}})\text{ but }\int_{\mathbb{R}}e^x dx \to \infty .\end{align*}
• $$f$$ integrable $$\centernot\implies f$$ differentiable
• An integrable function that is not differentiable: $$f(x) = |x|$$ on $${\mathbb{R}}$$
• $$f$$ differentiable $$\implies f$$ continuous (everywhere, so in particular a.e.)

## Integrability

• Sufficient criteria for Riemann integrability:
• $$f$$ continuous
• $$f$$ bounded and continuous almost everywhere, or
• $$f$$ uniformly continuous
• $$f$$ integrable $$\iff$$ bounded and continuous a.e.

If $$F$$ is a differentiable function on the interval $$[a,b]$$, and $$F'$$ is bounded and continuous a.e., then $$F' \in L_R([a, b])$$ and \begin{align*} \forall x\in [a,b]: \int_a^x F'(t)~dt=F(x)-F(a) \end{align*}

Suppose $$f$$ bounded and continuous a.e. on $$[a,b]$$, and define \begin{align*} F(x) \mathrel{\vcenter{:}}=\int_a^x f(t)~dt \end{align*} Then $$F$$ is absolutely continuous on $$[a,b]$$, and for $$p \in [a,b]$$, \begin{align*} f \in C^0(p) \implies F \text{ differentiable at } p,~ F'(p) = f(p), \text{ and } F' \stackrel{\tiny\mbox{a.e}}{=} f. \end{align*}

The Dirichlet function is Lebesgue integrable but not Riemann integrable: \begin{align*} f(x) = \indic{x \in {\mathbb{Q}}} \end{align*}

## List of Free Conclusions:

• $$f$$ integrable on $$U \implies$$:
• $$f$$ is bounded
• $$f$$ is continuous a.e. (the set of discontinuities has measure zero)
• $$\int f$$ is continuous
• $$\int f$$ is differentiable wherever $$f$$ is continuous (hence a.e.)
• $$f$$ continuous on $$U$$:
• $$f$$ is integrable on compact subsets of $$U$$
• $$f$$ is bounded, if $$U$$ is compact
• $$f$$ is integrable, if $$U$$ is compact
• $$f$$ differentiable at a point $$p$$:
• $$f$$ is continuous at $$p$$ (but not necessarily on any neighborhood of $$p$$)
• $$f$$ differentiable on all of $$U \implies f$$ continuous on $$U$$, hence continuous a.e.
• Defining the Riemann integral: #todo

## Convergence

### Sequences and Series of Functions

Define \begin{align*} s_n(x) \mathrel{\vcenter{:}}=\sum_{k=1}^n f_k(x) \end{align*} and \begin{align*} \sum_{k=1}^\infty f_k(x) \mathrel{\vcenter{:}}=\lim_{n\to\infty} s_n(x), \end{align*} which can converge pointwise, absolutely, uniformly, or not at all.

If $$\limsup_{k\in {\mathbb{N}}} {\left\lvert {f_k(x)} \right\rvert} \neq 0$$ then $$\sum_k f_k(x)$$ does not converge (term test).

If $$f$$ is injective, then $$f'$$ is nonzero in some neighborhood of ???

### Pointwise convergence

\begin{align*} f_n \to f = \lim_{n\to\infty} f_n .\end{align*} Summary: \begin{align*} \lim_{f_n \to f} \lim_{x_i \to x} f_n(x_i) \neq \lim_{x_i \to x} \lim_{f_n \to f} f_n(x_i) .\end{align*}

\begin{align*} \lim_{f_n \to f} \int_I f_n \neq \int_I \lim_{f_n \to f} f_n .\end{align*}

Pointwise convergence is strictly weaker than uniform convergence.

$$f_n(x) = x^n$$ on $$[0, 1]$$ converges pointwise but not uniformly.

• Towards a contradiction let $$\varepsilon = \frac{1}{2}$$.
• Let $$n = N\qty{\frac{1}{2} }$$ and $$x = \left(\frac{3}{4}\right)^\frac{1}{n}$$.
• Then $$f(x) = 0$$ but \begin{align*} {\left\lvert {f_n(x) - f(x)} \right\rvert} = x^n = \frac{3}{4} > \frac{1}{2} \end{align*}
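The argument above can be seen numerically (the evaluation points are illustrative choices): at any fixed $$x < 1$$ the values $$x^n$$ die off, but the witness points $$x = (3/4)^{1/n}$$ keep the error at $$3/4$$.

```python
# Pointwise vs. uniform convergence of f_n(x) = x^n on [0,1]:
# at any fixed x < 1 the values x^n -> 0, but the distance to the
# pointwise limit never drops below 3/4, witnessed at x = (3/4)^(1/n).
for n in (1, 5, 25, 125):
    x_fixed = 0.5
    x_witness = (3 / 4) ** (1 / n)
    print(n, x_fixed ** n, x_witness ** n)  # second column -> 0, third stays 0.75
```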

\begin{align*} f_n \text{ continuous} \centernot\implies f\mathrel{\vcenter{:}}=\lim_n f_n \text{ is continuous} .\end{align*}

Take \begin{align*} f_n(x) = x^n,\quad f_n(x) \to \indic[x = 1] .\end{align*}

\begin{align*} f_n \text{ differentiable} &\centernot\implies f'_n \text{ converges} \\ f'_n \text{ converges} &\not\implies \lim f'_n = f' .\end{align*}

Take \begin{align*} f_n(x) = \frac{1}{n}\sin(n^2 x) \to 0,&& \text{but } f'_n = n\cos(n^2 x) \text{ does not converge} .\end{align*}
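A small sketch of this counterexample (function names ad hoc): $$\sup_x|f_n(x)| \leq 1/n \to 0$$, yet $$f'_n(0) = n\cos(0) = n$$ diverges.

```python
import math

def f_n(n, x):
    return math.sin(n * n * x) / n

def fprime_n(n, x):
    return n * math.cos(n * n * x)

# sup_x |f_n(x)| <= 1/n -> 0, so f_n converges uniformly to 0.
# But at x = 0, f_n'(0) = n*cos(0) = n, which diverges: f_n' does not converge.
sups = [1 / n for n in (1, 10, 100, 1000)]
derivs_at_0 = [fprime_n(n, 0.0) for n in (1, 10, 100, 1000)]
print(sups)
print(derivs_at_0)  # 1.0, 10.0, 100.0, 1000.0
```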

\begin{align*} f_n\in \mathcal{R}_I \centernot\implies\lim_{f_n \to f} \int_I f_n = \int_I \lim_{f_n \to f} f_n .\end{align*}

May fail to converge to same value, take \begin{align*} f_n(x) = \frac{2n^2x}{(1+n^2x^2)^2} \to 0 && \text{but }\int_0^1 f_n = 1 - \frac{1}{n^2 + 1} \to 1\neq 0 .\end{align*}
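This can be checked numerically; a sketch using a simple midpoint rule (the step count is an arbitrary choice, enough for this smooth integrand):

```python
def f(n, x):
    return 2 * n * n * x / (1 + n * n * x * x) ** 2

def midpoint_integral(n, a=0.0, b=1.0, steps=200_000):
    # midpoint rule; adequate here since the integrand is smooth
    h = (b - a) / steps
    return sum(f(n, a + (k + 0.5) * h) for k in range(steps)) * h

for n in (1, 2, 4, 8):
    exact = 1 - 1 / (n * n + 1)
    approx = midpoint_integral(n)
    print(n, approx, exact)  # integrals tend to 1, while f_n -> 0 pointwise
```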

### Uniform Convergence

Notation: \begin{align*} f_n \rightrightarrows f= \lim_{n\to\infty} f_n \text{ and } \sum_{n=1}^\infty f_n \rightrightarrows S .\end{align*}

Summary: \begin{align*} \lim_{x_i \to x} \lim_{f_n \to f} f_n(x_i) = \lim_{f_n \to f} \lim_{x_i \to x} f_n(x_i) = \lim_{f_n \to f} f_n(\lim_{x_i \to x} x_i) .\end{align*}

\begin{align*} \lim_{f_n \to f} \int_I f_n = \int_I \lim_{f_n \to f} f_n .\end{align*}

\begin{align*} \sum_{n=1}^\infty \int_I f_n = \int_I \sum_{n=1}^\infty f_n .\end{align*}

“The uniform limit of a(n) $$x$$ function is $$x$$”, for $$x \in$$ {continuous, bounded}

• Equivalent to convergence in the uniform metric on the metric space of bounded functions on $$X$$: \begin{align*} f_n \rightrightarrows f \iff \sup_{x\in X} {\left\lvert {f_n(x) - f(x)} \right\rvert} \to 0 .\end{align*}

• $$(B(X,Y), {\left\lVert {} \right\rVert}_\infty)$$ is a metric space and $$f_n \rightrightarrows f \iff {\left\lVert {f_n - f} \right\rVert}_\infty \to 0$$ (where $$B(X,Y)$$ are bounded functions from $$X$$ to $$Y$$ and $${\left\lVert {f} \right\rVert}_\infty = \sup_{x\in X}\left\{{f(x)}\right\}$$)
• $$f_n \rightrightarrows f \implies f_n \to f$$ pointwise

• $$f_n$$ continuous $$\implies f$$ continuous

• i.e. “the uniform limit of continuous functions is continuous”
• $$f_n \in C^1$$, $$\exists x_0: f_n(x_0) \to f(x_0)$$, and $$f'_n \rightrightarrows g$$ $$\implies f$$ differentiable and $$f' = g~$$ (i.e. $$f'_n \to f'$$)

• Necessity of $$C^1$$ – look at failures of $$f'_n$$ to be continuous:
• Take $$f_n(x) = \sqrt{\frac{1}{n^2} + x^2} \rightrightarrows |x|$$, which is not differentiable at $$x=0$$
• Take $$f_n(x) = n^{-\frac{1}{2}}\sin(nx) \rightrightarrows 0 =: f$$, but $$f'_n(x) = n^{\frac{1}{2}}\cos(nx)$$ does not converge (e.g. $$f'_n(0) = \sqrt n \to \infty$$), so $$f'_n \not\to f' = 0$$
• $$f_n$$ integrable $$\implies f$$ integrable and $$\int f_n \to \int f$$

• $$f_n$$ bounded $$\implies f$$ bounded

• $$f_n \rightrightarrows f \centernot\implies f'_n$$ converges

• Uniform convergence of $$f_n$$ says nothing about $$f'_n$$ in general
• $$f_n' \rightrightarrows f' \centernot\implies f_n \rightrightarrows f$$

• Unless $$f_n$$ converges at at least one point.

$$\left\{{x_i}\right\} \to p \implies$$ every subsequence also converges to $$p$$.

Every convergent sequence in $$X$$ is a Cauchy sequence.

The converse need not hold in general, but if $$X$$ is complete, every Cauchy sequence converges. An example of a Cauchy sequence that doesn’t converge: take $$X={\mathbb{Q}}$$ and set $$x_i = \pi$$ truncated to $$i$$ decimal places.
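The truncation example can be made concrete with exact rationals (the digit string and helper name are illustrative choices):

```python
from fractions import Fraction

PI_DIGITS = "3.14159265358979323846"

def truncate_pi(i):
    # pi truncated to i decimal places, as an exact rational (an element of Q)
    digits = PI_DIGITS.replace(".", "")[: 1 + i]
    return Fraction(int(digits), 10 ** i)

xs = [truncate_pi(i) for i in range(1, 10)]
# Consecutive gaps shrink like 10^{-i}, so the sequence is Cauchy in Q --
# but its limit pi is irrational, so no limit exists inside Q.
gaps = [abs(b - a) for a, b in zip(xs, xs[1:])]
print([float(g) for g in gaps])
```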

If any subsequence of a Cauchy sequence converges, the entire sequence converges.

A metric $$d$$ on a space $$X$$ satisfies: \begin{align*} d(x,y) &\geq 0 && \text{Positive}\\ d(x,y) &= 0 \iff x = y && \text{Nondegenerate}\\ d(x,y) &= d(y,x) && \text{Symmetric}\\ d(x,y) &\leq d(x,p) + d(p,y) \quad \forall p && \text{Triangle Inequality} .\end{align*}


## Topology

Open Set Characterization: Arbitrary unions and finite intersections of open sets are open.

Closed Set Characterization: Arbitrary intersections and finite unions of closed sets are closed.

The best source of examples and counterexamples is the open/closed unit interval in $$\mathbb{R}$$. Always test against these first!

If $$f$$ is a continuous function, the preimage of every open set is open and the preimage of every closed set is closed.

In $${\mathbb{R}}$$, singleton sets and finite discrete sets are closed.

A singleton set is closed because its complement is open: \begin{align*} \left\{{p}\right\}^c = (-\infty, p) \cup(p, \infty) .\end{align*} Similarly for a finite discrete set $$\left\{{p_0, p_1, \cdots, p_n}\right\}$$, which wlog (by relabeling) can be assumed to satisfy $$p_0 < p_1 < \cdots < p_n$$: the complement \begin{align*} \left\{{p_0, p_1, \cdots, p_n}\right\}^c = (-\infty, p_0) \cup(p_0, p_1) \cup\cdots \cup(p_n, \infty) \end{align*} is a union of open sets.

This yields a good way to produce counterexamples to continuity.

In $$\mathbb{R}$$, singletons are closed. This means any finite subset is closed, as a finite union of singleton sets!

If $$X$$ is a compact metric space, then $$X$$ is complete and totally bounded.

If $$X$$ complete and $$X \subset Y$$, then $$X$$ closed in $$Y$$.

The converse does not hold in general: a closed subset need not be complete unless the ambient space is. Counterexample: $${\mathbb{Q}}\subset {\mathbb{Q}}$$ is closed but not complete, while $${\mathbb{Q}}\subset{\mathbb{R}}$$ is not closed.

If $$X$$ is compact and Hausdorff (e.g. a metric space), then for $$Y \subset X$$: $$Y$$ is compact $$\iff$$ $$Y$$ is closed.

A topological space $$X$$ is sequentially compact iff every sequence $$\left\{{x_n}\right\}$$ has a subsequence converging to a point in $$X$$.

If $$X$$ is a metric space, $$X$$ is compact iff $$X$$ is sequentially compact.

Note that in general, neither form of compactness implies the other.

## Counterexamples

There are functions differentiable only at a single point. Example: \begin{align*} f(x) = \begin{cases} x^2 & x\in {\mathbb{Q}}\\ -x^2 & x\in {\mathbb{R}}\setminus{\mathbb{Q}} \end{cases} .\end{align*}

This is discontinuous everywhere except for $$x=0$$, and you can compute \begin{align*} \lim_{h\to 0} {f(0+h) - f(0) \over h} = \lim_{h\to 0} \begin{cases} h & h\in {\mathbb{Q}}\\ -h & h\in {\mathbb{R}}\setminus{\mathbb{Q}} \end{cases} =0 .\end{align*}

The product of two non-differentiable functions can be differentiable: take $$f(x) = g(x) = {\left\lvert {x} \right\rvert}$$ which are not differentiable at $$x=0$$, then $$fg(x) = {\left\lvert {x} \right\rvert}^2$$ is differentiable at $$x=0$$.
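A quick numerical illustration (sample points chosen for the sketch): the difference quotients of $$|x|$$ at $$0$$ alternate between $$\pm 1$$, while those of $$|x|^2 = x^2$$ shrink to $$0$$.

```python
# g(x) = |x| has no derivative at 0: its difference quotients equal sign(h).
# But (g*g)(x) = |x|^2 = x^2 is differentiable at 0 with derivative 0.
def g(x):
    return abs(x)

def product(x):
    return abs(x) * abs(x)  # = x^2

hs = [(-1) ** k * 10.0 ** (-k) for k in range(1, 6)]
print([g(h) / h for h in hs])        # alternates between -1 and +1: no limit
print([product(h) / h for h in hs])  # -> 0
```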

A continuous function that is zero on a dense set $$A\subset X$$ is identically zero.

Since $$A$$ is dense, for any $$x\in X\setminus A$$ take a sequence $$\left\{{x_n}\right\}$$ in $$A$$ converging to $$x$$. Then $$0 = f(x_n) \to f(x)$$ implies $$f(x) = 0$$.

# Point-Set Topology

## Definitions

• Epsilon-neighborhood

• $$N_r(p) = \left\{{q \mathrel{\Big|}d_X(p,q) < r}\right\}$$
• Limit Point

• $$p$$ is a limit point of $$E$$ iff $$\forall N_r(p),~ \exists q\in E,~ q\neq p \mathrel{\Big|}q \in N_r(p)$$
• Equivalently, $$\forall N_r(p),~ (N_r(p)\setminus\left\{{p}\right\}) \cap E \neq \emptyset$$
• Let $$L(E)$$ be the set of limit points of $$E$$.
• Example: $$E = (0,1) \implies 0 \in L(E)$$
• Isolated Point

• $$p\in E$$ is an isolated point of $$E$$ iff $$p$$ is not a limit point of $$E$$
• Equivalently, $$\exists N_r(p) \mathrel{\Big|}N_r(p) \cap E = \left\{{p}\right\}$$
• The set of isolated points is $$E \setminus L(E)$$
• Perfect

• $$E$$ is perfect iff $$E$$ is closed and $$E \subseteq L(E)$$
• Equivalently, $$L(E) = E$$
• Interior

• $$p$$ is an interior point of $$E$$ iff $$\exists N_r(p) \mathrel{\Big|}N_r(p) \subseteq E$$
• Denote the interior of $$E$$ by $$E^\circ$$
• Exterior

• Closed sets

• $$E$$ is closed iff $$p$$ a limit point of $$E \implies p \in E$$
• Equivalently if $$L(E) \subseteq E$$
• Closed under finite unions, arbitrary intersections
• Open sets

• $$E$$ is open iff $$p\in E \implies p \in E^\circ$$
• Equivalently, if $$E \subseteq E^\circ$$
• Closed under arbitrary unions, finite intersections
• Boundary

• Closure

• Dense

• $$E$$ is dense in $$X$$ iff $$X \subseteq E \cup L(E)$$
• Connected

• Space of connected sets closed under unions (when the sets share a common point), products, and closures
• Convex $$\implies$$ connected
• Disconnected

• Path Connected

• $$\forall x,y \in X~ \exists f: I \to X \text{ continuous} \mathrel{\Big|}f(0) = x, f(1) = y$$
• Path connected $$\implies$$ connected
• Simply Connected

• Totally Disconnected

• Hausdorff

• Compact

• Every covering has a finite subcovering.
• $$X$$ compact and $$U \subset X: (U \text{ closed } \implies U \text{ compact })$$
• $$U \text{ compact } \implies U \text{ closed }$$ when $$X$$ is Hausdorff
• Closed under products

The space $$\left\{{\frac{1}{n}}\right\}_{n\in {\mathbb{N}}} \cup\left\{{0}\right\}$$ is compact, while $$\left\{{\frac{1}{n}}\right\}_{n\in {\mathbb{N}}}$$ alone is not (it is not closed in $${\mathbb{R}}$$).

List of properties preserved by continuous maps:

• Connectedness
• Compactness

Checking if a map is homeomorphism:

• $$f$$ a continuous bijection, $$X$$ compact, and $$Y$$ Hausdorff $$\implies f: X \to Y$$ is a homeomorphism.

# Probability

## Definitions

\begin{align*} L^2(X) &= \left\{{f: X \to {\mathbb{R}}: \int_{\mathbb{R}}f(x)^2 ~dx < \infty}\right\} &&\text{square integrable functions}\\ {\left\langle {g},~{f} \right\rangle}_{2} &= \int_{\mathbb{R}}g(x)f(x) ~dx &&\text{the } L^2 \text{ inner product}\\ {\left\lVert {f} \right\rVert}_2^2 &= {\left\langle {f},~{f} \right\rangle} = \int_{\mathbb{R}}f(x)^2 ~dx &&\text{norm}\\ E[{\,\cdot\,}] &= {\left\langle {{\,\cdot\,}},~{f} \right\rangle} &&\text{expectation}\\ (\tau_{p}f)(x) &= f(p- x) &&\text{translation}\\ (f \ast g)(p) &= \int_{\mathbb{R}}f(t)g(p-t)~dt = \int_{\mathbb{R}}f(t)(\tau_{p}g)(t) ~dt = {\left\langle {\tau_pg},~{f} \right\rangle} &&\text{convolution}\\ \end{align*}

For $$(\Sigma, E, \mu)$$ a probability space with sample space $$\Sigma$$, $$\sigma$$-algebra of events $$E$$, and probability measure $$\mu$$, a random variable is a measurable function $$X: \Sigma \to {\mathbb{R}}$$

The probability density function (PDF) $$f$$ of $$X$$ is given, for any $$U \subset {\mathbb{R}}$$, by the relation \begin{align*} P(X \in U) = \int_U f(x) ~dx \\ \implies P(a \leq X \leq b) = \int_a^b f(x) ~dx \end{align*}

The cumulative distribution function (CDF) is the antiderivative of the PDF: \begin{align*} F(x) = P(X \leq x) = \int_{-\infty}^x f(t) ~dt .\end{align*} This yields $${\frac{\partial F}{\partial x}\,} = f(x)$$ wherever $$f$$ is continuous.
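The PDF/CDF relationship can be sanity-checked numerically. A sketch using the exponential distribution, where both functions have closed forms (the rate $$\lambda = 2$$ and the midpoint rule are arbitrary choices for the illustration):

```python
import math

lam = 2.0  # Exponential(lam): pdf f(x) = lam*exp(-lam*x), cdf F(x) = 1 - exp(-lam*x)

def pdf(x):
    return lam * math.exp(-lam * x)

def cdf(x):
    return 1 - math.exp(-lam * x)

def midpoint_integral(fn, a, b, steps=100_000):
    h = (b - a) / steps
    return sum(fn(a + (k + 0.5) * h) for k in range(steps)) * h

x = 1.5
print(midpoint_integral(pdf, 0.0, x), cdf(x))  # F(x) = integral of f from 0 to x

eps = 1e-6
print((cdf(x + eps) - cdf(x - eps)) / (2 * eps), pdf(x))  # F'(x) = f(x)
```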

\begin{align*} E[X] \mathrel{\vcenter{:}}={\left\langle {\text{id}},~{f} \right\rangle} = \int_{\mathbb{R}}x f(x) ~dx .\end{align*} Also denoted $$\mu_X$$.

\begin{align*} E\left[\sum_{i\in{\mathbb{N}}} a_i X_i\right] = \sum_{i\in{\mathbb{N}}} a_i E[X_i] .\end{align*} Does not matter whether or not the $$X_i$$ are independent.

\begin{align*} \mathrm{Var}(X) &= E[(X - E[X])^2] \\ &= \int (x - E[X])^2 f(x) ~dx \\ &= E[X^2] - E[X]^2 \\ &\mathrel{\vcenter{:}}=\sigma^2(X) \end{align*} where $$\sigma$$ is the standard deviation. Can also be defined as $${\left\langle {(\text{id}- {\left\langle {\text{id}},~{f} \right\rangle})^2},~{f} \right\rangle}$$ Take the portion of the id function in the orthogonal complement of $$f$$, squared, and project it back onto $$f$$?

\begin{align*} \mathrm{Var}(aX + b) &= a^2\mathrm{Var}(X) \\ \mathrm{Var}\qty{ \sum_{\mathbb{N}}X_i } &= \sum_i \mathrm{Var}(X_i) + 2 \sum_{i < j}\mathrm{Cov}(X_i, X_j) .\end{align*}

\begin{align*} \mathrm{Cov}(X,Y) &= E[(X-\mu_X)(Y-\mu_Y)] \\ &= E[XY] - E[X]E[Y] \end{align*}

\begin{align*} \mathrm{Cov}(X, X) &= \mathrm{Var}(X) \\ \mathrm{Cov}(aX, Y) &= a\mathrm{Cov}(X,Y) \\ \mathrm{Cov}(\sum_{{\mathbb{N}}} X_i, \sum_{\mathbb{N}}Y_j) &= \sum_i \sum_j\mathrm{Cov}(X_i, Y_j) \\ .\end{align*}
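These identities hold exactly, which can be checked on a small made-up joint distribution using rational arithmetic (the pmf below is an arbitrary illustrative choice):

```python
from fractions import Fraction as F

# A small joint pmf for (X, Y), with exact rationals so identities hold on the nose.
pmf = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (2, 3): F(3, 8)}
assert sum(pmf.values()) == 1

def E(g):
    # expectation of g(X, Y) under the joint pmf
    return sum(p * g(x, y) for (x, y), p in pmf.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
var_X = E(lambda x, y: (x - EX) ** 2)
cov_XY = E(lambda x, y: (x - EX) * (y - EY))

# Var(X) = E[X^2] - E[X]^2 and Cov(X,Y) = E[XY] - E[X]E[Y]:
print(var_X == E(lambda x, y: x * x) - EX ** 2)
print(cov_XY == E(lambda x, y: x * y) - EX * EY)

# Cov(aX, Y) = a Cov(X, Y):
a = F(3)
print(E(lambda x, y: (a * x - a * EX) * (y - EY)) == a * cov_XY)
```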

Stirling's approximation: \begin{align*} k! \sim k^{k+\frac{1}{2}}e^{-k} \sqrt{2\pi} .\end{align*}
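The asymptotic can be checked directly; the ratio $$k!/(\sqrt{2\pi}\,k^{k+1/2}e^{-k})$$ tends to $$1$$ from above, at rate roughly $$1 + \frac{1}{12k}$$ (a known refinement, stated here only as a check):

```python
import math

def stirling(k):
    return math.sqrt(2 * math.pi) * k ** (k + 0.5) * math.exp(-k)

ratios = [math.factorial(k) / stirling(k) for k in (5, 20, 50, 100)]
print(ratios)  # decreasing toward 1, roughly 1 + 1/(12k)
```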

Markov's inequality: for $$X \geq 0$$ and $$a > 0$$, \begin{align*} P(X \geq a) \leq \frac 1 a E[X] \end{align*}

One-sided Chebyshev (Cantelli): for $$a > 0$$, \begin{align*} P(X - \mu \geq a) \leq \frac{\sigma^2}{\sigma^2 + a^2} .\end{align*}

Chebyshev's inequality: \begin{align*} P({\left\lvert {X - \mu} \right\rvert} \geq k) \leq \left( \frac \sigma k \right)^2 \end{align*}

Apply Markov to the variable $$(X-\mu)^2$$ and $$a=k^2$$
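Chebyshev can be verified exactly on a toy distribution; a sketch for a fair die (a standard example, not from the text), using rational arithmetic:

```python
from fractions import Fraction as F

# X uniform on {1,...,6}, exact arithmetic.
vals = range(1, 7)
p = F(1, 6)
mu = sum(p * v for v in vals)               # 7/2
var = sum(p * (v - mu) ** 2 for v in vals)  # 35/12

checks = []
for k in (F(1), F(3, 2), F(2)):
    tail = sum(p for v in vals if abs(v - mu) >= k)  # P(|X - mu| >= k)
    bound = var / k ** 2                             # (sigma/k)^2
    checks.append(tail <= bound)
    print(float(k), float(tail), float(bound))
print(checks)
```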

For $$X_i$$ i.i.d., \begin{align*} \lim_n \frac{\sum_{i=1}^n X_i - n\mu}{\sigma \sqrt n} \sim N(0, 1) .\end{align*}

Strong law of large numbers: \begin{align*} P(\frac{1}{n} \sum X_i \rightarrow \mu) = 1 .\end{align*}

Chernoff bound: for all $$t > 0$$, \begin{align*} P(X \geq a) \leq e^{-ta}M_X(t) \\ .\end{align*}

Jensen's inequality: for $$f$$ convex, \begin{align*} E[f(X)] \geq f(E[X]) \end{align*}

\begin{align*} H(X) = - \sum p_i \ln p_i \end{align*}