Note:
These are notes I took while studying for the Mathematics Subject GRE. There are likely a lot of errors and mistakes, please let me know if you find any!
Last updated: 2020-11-10
\[\begin{align*} \frac{\partial}{\partial x} \int_a^x f(t) dt = f(x) \\ \\ \end{align*}\]
\[\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(x, t) dt - \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t) dt &= f(x, t) \cdot \frac{\partial}{\partial x}(t) \bigg\rvert_{t=a(x)}^{t=b(x)} \\ \\ &= f(x, b(x))\cdot b'(x) - f(x, a(x))\cdot a'(x) \\ \\ \end{align*}\]
If \(f(x,t) = f(t)\) doesn’t depend on \(x\), then \({\frac{\partial f}{\partial x}\,} = 0\) and the second integral vanishes:
\[\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(t) dt &= f(b(x))\cdot b'(x) - f(a(x))\cdot a'(x) \end{align*}\]
Note that you can recover the original FTC by taking \[\begin{align*} a(x) &= c \\ b(x) &= x \\ f(x,t) &= f(t) .\end{align*}\]
\[\begin{align*} \frac{\partial}{\partial x} \int_{1}^{x} f(x, t) dt = \int_{1}^{x} \frac{\partial}{\partial x} f(x, t) dt + f(x, x) \end{align*}\]
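A minimal sympy sanity check of this rule (the example \(f(x,t) = \sin(xt)\), \(a(x) = x^2\), \(b(x) = x^3\) is made up):

```python
# Sanity check of the Leibniz rule on a made-up example:
# f(x, t) = sin(x*t), a(x) = x**2, b(x) = x**3.
import sympy as sp

x, t = sp.symbols('x t')
f = sp.sin(x * t)
a, b = x**2, x**3

lhs = sp.diff(sp.integrate(f, (t, a, b)), x)           # d/dx of the integral
rhs = (sp.integrate(sp.diff(f, x), (t, a, b))          # integral of df/dx
       + f.subs(t, b) * sp.diff(b, x)                  # + f(x, b(x)) b'(x)
       - f.subs(t, a) * sp.diff(a, x))                 # - f(x, a(x)) a'(x)

print(sp.simplify(lhs - rhs))   # expect 0
```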
Todo
\[\begin{align*} f \in C^0([a,b]) \text{ and differentiable on } (a,b) &\implies \exists p\in (a,b): f(b) - f(a) = f'(p)(b-a) \\ f \in C^0([a,b]) &\implies \exists p\in [a,b]: \int_a^b f(x)~dx = f(p)(b-a) .\end{align*}\]
If
\[\begin{align*} \lim_{x\to {\{\text{pt}\}}} f(x) = \lim_{x\to {\{\text{pt}\}}} g(x) \in \left\{{0, \pm \infty}\right\}, && \forall x \in I, g'(x) \neq 0, && \lim_{x\to{\{\text{pt}\}}} \frac{ f'(x)}{\ g'(x)} \text{ exists}, \\ \end{align*}\]
Then it is necessarily the case that \[\begin{align*} \lim _ { x \rightarrow {\{\text{pt}\}}} \frac { f ( x ) } { g ( x ) } = \lim _ { x \rightarrow {\{\text{pt}\}}} \frac { f ^ { \prime } ( x ) } { g ^ { \prime } ( x ) }. \end{align*}\]
Note that this includes the following indeterminate forms: \[\begin{align*} \frac{0}{0}, \quad \frac{\infty}{\infty}, \quad 0 \cdot \infty, \quad 0^{0}, \quad \infty^{0}, \quad 1^{\infty}, \quad \infty-\infty .\end{align*}\]
For \(0\cdot \infty\), can rewrite as \({0 \over {1\over \infty}} = {0\over 0},\) or alternatively \({\infty \over {1\over 0}} = {\infty \over \infty}.\)
For \(1^\infty, \infty^0,\) and \(0^0\), set \[\begin{align*} L \mathrel{\vcenter{:}}=\lim f^g \implies \ln L = \lim g \ln(f) \end{align*}\] to recover a \(0\cdot\infty\) form (e.g. \(\infty\cdot 0\), \(0\cdot\infty\), or \(0\cdot(-\infty)\)), which can then be rewritten as above.
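For example, a standard \(0^0\) form: \[\begin{align*} L \mathrel{\vcenter{:}}=\lim_{x\to 0^+} x^x \implies \ln L = \lim_{x\to 0^+} x\ln x = \lim_{x\to 0^+} \frac{\ln x}{1/x} = \lim_{x\to 0^+} \frac{1/x}{-1/x^2} = \lim_{x\to 0^+} (-x) = 0 \implies L = 1 .\end{align*}\]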
\[\begin{align*} T(a, x) &= \sum _ { n = 0 } ^ { \infty } \frac { f ^ { ( n ) } ( a ) } { n ! } ( x - a ) ^ { n } \\ &= f ( a ) + f'(a)( x - a ) + \frac { 1 } { 2 }f ^ { \prime \prime } ( a ) ( x - a ) ^ { 2 } \\ & \quad \quad + \frac { 1} { 6 } f ^ { \prime \prime \prime } ( a ) ( x - a ) ^ { 3 } + \frac{1}{24}f^{(4)}(a)(x-a)^4 + ~\cdots \end{align*}\] There is a bound on the error (Lagrange form of the remainder): \[\begin{align*} {\left\lvert {f(x) - T_k(a,x)} \right\rvert} \leq \frac{M}{(k+1)!} {\left\lvert {x-a} \right\rvert}^{k+1}, \qquad M \mathrel{\vcenter{:}}=\sup_{\xi \text{ between } a \text{ and } x} {\left\lvert {f^{(k+1)}(\xi)} \right\rvert}, \end{align*}\] where \(T_k(a, x) = \sum _ { n = 0 } ^ { k } \frac { f ^ { ( n ) } ( a ) } { n ! } ( x - a ) ^ { n }\) is the \(k\)th truncation.
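For example, approximating \(e^{0.1}\) by the degree-two truncation at \(a = 0\): \[\begin{align*} {\left\lvert {e^{0.1} - \qty{1 + 0.1 + \tfrac{(0.1)^2}{2}}} \right\rvert} \leq \frac{e^{0.1}}{3!}(0.1)^3 < 2\times 10^{-4} .\end{align*}\]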
Approximating change: \(\Delta y \approx f'(x) \Delta x\)
How to find \(\lim_{x\to a} f(x)\), in order of difficulty:
Plug in: if \(f\) is continuous, \(\lim_{x\to a} f(x) = f(a)\).
Check for indeterminate forms and apply L’Hopital’s Rule.
Algebraic rules
Squeeze theorem
Expand in Taylor series at \(a\)
Monotonic + bounded
One-sided limits: \(\lim_{x\to a^-} f(x) = \lim_{\varepsilon \to 0} f(a-\varepsilon)\)
Limits at zero or infinity: \[\begin{align*} \lim_{x\to\infty} f(x) = \lim_{\frac{1}{x}\to 0} f\qty{\frac{1}{x}} \text{ and } \lim_{x\to 0} f(x) = \lim_{x\to\infty} f\qty{1 \over x} \end{align*}\]
Be careful: limits may not exist!! Example: \(\lim_{x\to 0} \frac{1}{x}\) does not exist (the one-sided limits are \(\pm\infty\)).
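A quick sympy sanity check of a few limits using the strategies above:

```python
# Check a few limits symbolically (illustrative examples only).
import sympy as sp

x = sp.symbols('x')
print(sp.limit(sp.sin(x) / x, x, 0))        # 1   (Taylor expansion / L'Hopital)
print(sp.limit((1 + 1/x)**x, x, sp.oo))     # E   (a 1**oo indeterminate form)
print(sp.limit(1/x, x, 0, dir='+'))         # oo  (one-sided; the two-sided limit DNE)
```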
\[\begin{align*} f(x) = \frac{p(x)}{q(x)} = r(x) + \frac{s(x)}{t(x)} \sim r(x) \end{align*}\]
\[\begin{align*} {\frac{\partial }{\partial x}\,}(f\circ g) = (f' \circ g) \cdot g' \end{align*}\]
\[\begin{align*} {\frac{\partial }{\partial x}\,} f\cdot g =f'\cdot g + g' \cdot f \end{align*}\]
\[\begin{align*} {\frac{\partial }{\partial x}\,} \frac{f(x)}{g(x)} = \frac{f'g - g'f}{g^2} \end{align*}\]
Mnemonic: Low d-high minus high d-low
\[\begin{align*} {\frac{\partial f^{-1}}{\partial x}\,}(f(x_0)) = \left( {\frac{\partial f}{\partial x}\,} \right)^{-1}(x_0) = 1/f'(x_0) \end{align*}\]
\[\begin{align*} d(f(x)) = f'(x)~dx, \qquad d(f(y)) = f'(y)~dy \end{align*}\] - Often able to solve for \({\frac{\partial y}{\partial x}\,}\) this way by differentiating both sides of an implicit relation.
General series of steps: want to know some unknown rate \(y_t\)
Example: ladder sliding down wall
\[\begin{align*} \mu_f = \frac{1}{b-a}\int_a^b f(t) dt \end{align*}\]
Apply MVT to \(F(x)\).
Area in polar coordinates: \[\begin{align*} A = \int_{\theta_1}^{\theta_2} \frac{1}{2}r^2(\theta) ~d\theta \end{align*}\]
\[\begin{align*} \text{Disks} && V = \int \pi r(t)^2 ~dt \\ \text{Shells (cylinders)} && V = \int 2\pi r(t)h(t) ~dt .\end{align*}\]
\[\begin{align*} L &= \int ~ds && ds = \sqrt{dx^2 + dy^2} \\ &= \int_{x_0}^{x_1}\sqrt{1 + \qty{{\frac{\partial y}{\partial x}\,}}^2}~dx \\ &= \int_{y_0}^{y_1}\sqrt{\qty{{\frac{\partial x}{\partial y}\,}}^2 + 1}~dy \end{align*}\]
\[\begin{align*} SA = \int 2 \pi r(x) ~ds \end{align*}\]
Given a density \(\rho(\mathbf x)\) of an object \(R\), the \(i\)th coordinate \(\bar x_i\) of the center of mass is given by \[\begin{align*} \bar x_i = \frac {\displaystyle\int_R x_i\rho(x) ~dx} {\displaystyle\int_R \rho(x)~dx} \end{align*}\]
Given \(f(x)\), we want to find an antiderivative \(F(x) = \int f\) satisfying \(\frac{\partial}{\partial x}F(x) = f(x)\)
The standard form: \[\begin{align*} \int u dv = uv - \int v du \end{align*}\]
| Derivatives | Integrals | Signs | Result |
|---|---|---|---|
| \(u\) | \(v\) | NA | NA |
| \(u'\) | \(\int v\) | \(+\) | \(u\int v\) |
| \(u''\) | \(\int\int v\) | \(-\) | \(-u'\int\int v\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
Fill out until one column is zero (alternate signs). Get the result column by multiplying diagonally, then sum down the column.
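For example, computing \(\int x^2 e^x ~dx\) with \(u = x^2\) in the derivative column and \(v = e^x\) in the integral column: \[\begin{align*} \int x^2 e^x ~dx = x^2 e^x - 2x e^x + 2e^x + C .\end{align*}\]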
\[\begin{align*} \frac{\partial}{\partial x} \int_{a(x)}^{b(x)} f(x, t) dt - \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t) dt &= f(x, \cdot)\frac{\partial}{\partial x}(\cdot) \bigg\rvert_{a(x)}^{b(x)} \\ &= f(x, b(x))~b'(x) - f(x, a(x))~a'(x) \end{align*}\]
Let \(F(x)\) be an antiderivative and compute \(F'(x)\) using the chain rule.
LIPET: Log, Inverse trig, Polynomial, Exponential, Trig: generally let \(u\) be whichever one comes first.
The ridiculous trig sub: for any integrand containing only trig terms
Transforms any such integrand into a rational function of the new variable \(t\)
Let \(t = \tan\frac{x}{2}\), so that \(x = 2\tan^{-1}t\) and \(dx = \frac{2}{1+t^2}~dt\); then \(\sin x = \frac{2t}{1+t^2}\), \(\cos x = \frac{1-t^2}{1+t^2}\), and \[\begin{align*} \int_a^b f(x)~dx = \int_{\tan\frac{a}{2}}^{\tan\frac{b}{2}} f\qty{x(t)} \frac{2}{1+t^2}~dt \end{align*}\]
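For example, with \(t = \tan\frac{\theta}{2}\): \[\begin{align*} \int_0^{\pi/2} \frac{d\theta}{1+\cos\theta} = \int_0^{1} \frac{1+t^2}{2}\cdot \frac{2}{1+t^2}~dt = \int_0^1 1~dt = 1 .\end{align*}\]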
\[\begin{align*} \int_0^{\pi/2} \sin \theta~d\theta = \int_0^{\pi/2} \cos \theta~d\theta = 1 \end{align*}\]
Trig Formulas \[\begin{align*} \sin^2(x) &= \frac{1}{2}(1-\cos 2x) \\ \cos^2(x) &= \frac{1}{2}(1+\cos 2x) \\ \sin(2x) &= 2\sin x \cos x \\ \cos(2x) &= \cos^2 x - \sin^2 x \\ \sin^2 x + \cos^2 x &= 1 \\ \tan^2 x + 1 &= \sec^2 x \\ \end{align*}\]
Products of trig functions
Other small but useful facts: \[\begin{align*} \int_0^{2\pi} \sin \theta~d\theta = \int_0^{2\pi} \cos \theta~d\theta = 0 .\end{align*}\]
Critical points: boundary points, points where \(f'(x) = 0\), and points where \(f'\) is undefined
Second derivative test: at a point \(p\) with \(f'(p) = 0\), \(f''(p) > 0 \implies\) local min, \(f''(p) < 0 \implies\) local max, and \(f''(p) = 0\) is inconclusive.
Inflection points of \(h\) occur where \(h''\) changes sign, i.e. where \(h'\) changes from increasing to decreasing or vice versa. (Note that this is not where \(h'\) itself changes sign.)
Inverse function theorem: The slope of the inverse is reciprocal of the original slope
If two curves are tangent to each other at a point, then both their values and their derivatives agree there. Solve these two equations simultaneously for the \(x\) of tangency; it can then be used in the original equation.
Fundamental theorem of Calculus: if \(F' = f\), then \[\begin{align*} \int_a^b f(x)~ dx = F(b) - F(a) .\end{align*}\]
Min/maxing - either derivatives or Lagrange multipliers!
Distance from origin to plane: equation of a plane \[\begin{align*} P: ax+by+cz=d .\end{align*}\]
You can always just read off the normal vector \(\mathbf{n} = (a,b,c)\). So we have \({\left\langle {\mathbf{n}},~{\mathbf{x}} \right\rangle} = d\).
Since \(\lambda \mathbf{n}\) is normal to \(P\) for all \(\lambda\), solve \({\left\langle {\mathbf{n}},~{\lambda\mathbf{n}} \right\rangle} = d\), which gives \(\lambda = \frac{d}{ {\left\lVert {\mathbf{n}} \right\rVert}^2}\); the distance is then \({\left\lVert {\lambda\mathbf{n}} \right\rVert} = \frac{{\left\lvert {d} \right\rvert}}{{\left\lVert {\mathbf{n}} \right\rVert}}\).
A plane can be constructed from a point \(\mathbf{p}\) and a normal \(\mathbf{n}\) by the equation \({\left\langle {\mathbf{n}},~{\mathbf{x} - \mathbf{p}} \right\rangle} = 0\).
In a sine wave \(f(x) = \sin(\omega x)\), the period is given by \(2\pi/\omega\). If \(\omega > 1\), then the wave makes exactly \(\omega\) full oscillations in the interval \([0, 2\pi]\).
The directional derivative is the gradient dotted against a unit vector in the direction of interest
Related rates problems can often be solved via implicit differentiation of some constraint function
The second derivative of a parametric curve is not just the ratio of the second derivatives: if \(x = x(t)\) and \(y = y(t)\), then \[\begin{align*} \frac{d^2y}{dx^2} = \frac{\frac{d}{dt}\qty{\frac{y'(t)}{x'(t)}}}{x'(t)} \neq \frac{y''(t)}{x''(t)} .\end{align*}\]
For the love of god, remember the FTC! \[\begin{align*} \frac{\partial}{\partial x} \int_0^x f(y) dy = f(x) \end{align*}\]
Technique for asymptotic inequalities: WTS \(f < g\), so show \(f(x_0) < g(x_0)\) at a point and then show \(\forall x > x_0, f'(x) < g'(x)\). Good for big-O style problems too.
Notation: \[\begin{align*} \mathbf{v}, \mathbf{a}, \cdots && \text{vectors in }{\mathbb{R}}^n \\ \mathbf{R}, \mathbf{A}, \cdots && \text{matrices} \\ \mathbf{r}(t) && \text{A parameterized curve }\mathbf{r}: {\mathbb{R}}\to {\mathbb{R}}^n \\ \\ \widehat{\mathbf{v}} && {\mathbf{v} \over {\left\lVert {\mathbf{v}} \right\rVert}} .\end{align*}\]
\[\begin{align*} \mathbf{v} = [x, y] \in {\mathbb{R}}^2 \implies m = \frac{y}{x} .\end{align*}\]
\[\begin{align*} \mathbf{R}_\theta = \left[ \begin{array} { l l } { \cos \theta } & { - \sin \theta } \\ { \sin \theta } & { \cos \theta } \end{array} \right] \implies \mathbf{R}_{\frac{\pi}{2}} = \left[ \begin{array} { l l } { 0 } & { - 1 } \\ { 1 } & { 0 } \end{array}\right] .\end{align*}\]
\[\begin{align*} \mathbf{R}_{\frac{\pi}{2}} \mathbf{x} \mathrel{\vcenter{:}}= \mathbf{R}_{\frac{\pi}{2}} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix} \in \mathbf{\mathbb{R}}\mathbf{x}^\perp .\end{align*}\] Thus if a planar line is defined by the span of \({\left[ {x, y} \right]}\) and a slope of \(m = y/x\), a normal vector is given by the span of \({\left[ {-y, x} \right]}\) of slope \(-{1 \over m} = -x/y\).
Given \(\mathbf{v}\), the rotated vector \(\mathbf{R}_{\frac{\pi}{2}} \mathbf{v}\) is orthogonal to \(\mathbf{v}\), so this can be used to obtain normals and other orthogonal vectors in the plane.
There is a direct way to come up with one orthogonal vector to any given vector: \[\begin{align*} \mathbf{v} = [a,b,c] \implies \mathbf{y} \mathrel{\vcenter{:}}= \begin{cases} [-(b+c), a, a] & c = 0 \text{ and } a + b = 0, \\ [c,c, -(a+b)] & \text{else} \end{cases} \in {\mathbb{R}}\mathbf{v}^\perp .\end{align*}\]
For a subspace given by a single vector \(\mathbf{a}\): \[\begin{align*} \mathrm{proj}_\mathbf{a}( \textcolor{Aquamarine}{\textbf{x}} ) = {\left\langle {\textcolor{Aquamarine}{\textbf{x}} },~{\mathbf{a}} \right\rangle}\mathbf{\widehat{a}} \hspace{8em} \mathrm{proj}_{\mathbf{a}}^\perp(\textcolor{Aquamarine}{\textbf{x}}) = \textcolor{Aquamarine}{\textbf{x}} - \mathrm{proj}_\mathbf{a}(\textcolor{Aquamarine}{\textbf{x}}) = \textcolor{Aquamarine}{\textbf{x}} - {\left\langle {\textcolor{Aquamarine}{\textbf{x}}},~{\mathbf{a}} \right\rangle}\widehat{\mathbf{a}} \end{align*}\]
In general, for a subspace \(\operatorname{colspace}(A) = \operatorname{span}\left\{{\mathbf{a}_1, \cdots, \mathbf{a}_n}\right\}\),
\[\begin{align*} \mathrm{proj}_A(\textcolor{Aquamarine}{\textbf{x}}) = A(A^T A)^{-1}A^T\textcolor{Aquamarine}{\textbf{x}}, \end{align*}\] which reduces to \(\sum_{i=1}^n {\left\langle {\textcolor{Aquamarine}{\textbf{x}} },~{\mathbf{a}_i} \right\rangle}\mathbf{\widehat{a}_i}\) when the columns \(\mathbf{a}_i\) are orthonormal.
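A small numpy sketch of the projection formula above (the matrix and vector are made-up examples):

```python
# Project a vector onto the column space of A using A(A^T A)^{-1} A^T.
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])        # columns span a plane in R^3
x = np.array([1.0, 2.0, 3.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T     # projection matrix onto colspace(A)
proj = P @ x
perp = x - proj

print(proj)
print(A.T @ perp)    # ~[0, 0]: the residual is orthogonal to colspace(A)
```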
\[\begin{align*} \text{General Equation} && Ax + By + C = 0 \\ \\ \text{Parametric Equation} && \mathbf{r}(t) = t\mathbf{x} + \mathbf{b} .\end{align*}\]
Characterized by an equation in inner products: for a point \(\mathbf{p}_0\) on \(L\) and a normal \(\mathbf{n}\), \[\begin{align*} \mathbf{y} \in L \iff {\left\langle {\mathbf{y} - \mathbf{p}_0},~{\mathbf{n}} \right\rangle} = 0 \end{align*}\]
Given \(\mathbf{p}_0, \mathbf{p}_1\), take \(\mathbf{x} = \mathbf{p}_1 - \mathbf{p}_0\) and \(\mathbf{b} = \mathbf{p}_i\) for either \(i\): \[\begin{align*} \mathbf{r}(t) = t(\mathbf{p}_1 - \mathbf{p}_0) + \mathbf{p}_0 && = t\mathbf{p}_1 + (1-t) \mathbf{p}_0 .\end{align*}\]
If a line \(L\) is given by \[\begin{align*} \mathbf{r}(t) = t {\left[ {x_1, x_2, x_3} \right]} + {\left[ {p_1, p_2, p_3} \right]} ,\end{align*}\] then \[\begin{align*} (x, y, z) \in L \iff \frac{x-p_1}{x_1} = \frac{y-p_{2}}{x_2} = \frac{z-p_{3}}{x_3} .\end{align*}\]
The symmetric equation of the line through \({\left[ {2,1,-3} \right]}\) and \({\left[ {1,4,-3} \right]}\): the direction is \({\left[ {1,4,-3} \right]} - {\left[ {2,1,-3} \right]} = {\left[ {-1,3,0} \right]}\), so \[\begin{align*} \frac{x-2}{-1}=\frac{y-1}{3}, \qquad z = -3 .\end{align*}\]
Key idea: just need a point and a normal vector, and the gradient is normal to level sets.
For any locus \(f(\mathbf{x}) = 0\), we have \[\begin{align*} \mathbf{x} \in T_f(\mathbf{p}) \implies {\left\langle {\nabla f(\mathbf{p})},~{\mathbf{x}-\mathbf{p}} \right\rangle} = 0 .\end{align*}\]
Key idea: the gradient is normal.
To find a normal line, you just need a single point \(\mathbf{p}\) and a normal vector \(\mathbf{n}\); then \[\begin{align*} L = \left\{{\mathbf{x} \mathrel{\Big|}\mathbf{x} = \mathbf{p} + t\mathbf{n}}\right\} .\end{align*}\]
\[\begin{align*} \text{General Equation} && A x + B y + C z + D = 0 \\ \\ \text{Parametric Equation} &&\mathbf{y}(t,s) = t\mathbf{x}_1 + s\mathbf{x}_2 + \mathbf{b} \\ \\ .\end{align*}\]
Characterized by an equation in inner products: \[\begin{align*} \mathbf{y} \in P \iff {\left\langle {\mathbf{y} - \mathbf{p}_0},~{\mathbf{n}} \right\rangle} = 0 \end{align*}\]
Determined by a point \(\mathbf{p}_0\) and a normal vector \(\mathbf{n}\)
Given \(\mathbf{v}_0, \mathbf{v}_1\), set \(\mathbf{n} = \mathbf{v}_0 \times\mathbf{v}_1\).
\[\begin{align*} \mathbf{r}(t) = [x(t), y(t), z(t)] .\end{align*}\]
We have an equation for the tangent vector at each point: \[\begin{align*} \widehat{ \mathbf{T} }(t) = \widehat{\mathbf{r}'}(t) ,\end{align*}\] so we can write \[\begin{align*} \mathbf{L}_{T}(t) = \mathbf{r}(t_0) + t \widehat{ \mathbf{T}}(t_0) \mathrel{\vcenter{:}}=\mathbf{r}(t_0) + t \widehat{\mathbf{r}'}(t_0) .\end{align*}\]
Suppose \(y = f(x)\). Set \(g(x, y) = f(x) - y\); then \[\begin{align*} \nabla g = [f_x(x), -1] \end{align*}\] is normal to the graph, so the slope of the normal line is \(m = -\frac{1}{f_x(x)}\).
Fix a point \(\mathbf{p}\). Key idea: find a subspace and project onto it.
Key equations: projection and orthogonal projection of \(\mathbf{b}\) onto \(\mathbf{a}\): \[\begin{align*} \mathrm{proj}_\mathbf{a}(\mathbf{b}) = {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle}\mathbf{\widehat{a}} \hspace{8em} \mathrm{proj}_{\mathbf{a}}^\perp(\mathbf{b}) = \mathbf{b} - \mathrm{proj}_\mathbf{a}(\mathbf{b}) = \mathbf{b} - {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle}\widehat{\mathbf{a}} \end{align*}\]
\[\begin{align*} d = {\left\lVert {\mathrm{proj}_{\mathbf{n}}(\mathbf{q} - \mathbf{p})} \right\rVert} = {\left\lVert {{\left\langle {\mathbf{q} - \mathbf{p}},~{\widehat{\mathbf{n}}} \right\rangle} \widehat{\mathbf{n}}} \right\rVert} = {\left\lvert {{\left\langle {\mathbf{q} - \mathbf{p}},~{\widehat{\mathbf{n}}} \right\rangle}} \right\rvert} .\end{align*}\]
Special case: if \(\mathbf{p} = \mathbf{0}\), \[\begin{align*} d = {\left\lVert {\mathrm{proj}_{\mathbf{n}}(\mathbf{q})} \right\rVert} = {\left\lVert {{\left\langle {\mathbf{q}},~{\widehat{\mathbf{n}}} \right\rangle} \widehat{\mathbf{n}}} \right\rVert} = {\left\lvert {{\left\langle {\mathbf{q}},~{\widehat{\mathbf{n}}} \right\rangle}} \right\rvert} .\end{align*}\]
Given a line \(L: \mathbf{x}(t) = t\mathbf{v}\) for some fixed \(\mathbf{v}\), use \[\begin{align*} d = {\left\lVert {\mathrm{proj}_\mathbf{v}^\perp(\mathbf{p})} \right\rVert} = {\left\lVert {\mathbf{p} - {\left\langle {\mathbf{p}},~{\widehat{\mathbf{v}}} \right\rangle}\widehat{\mathbf{v} }} \right\rVert} .\end{align*}\]
Given a line \(L: \mathbf{x}(t) = \mathbf{w}_0 + t\mathbf{w}\), let \(\mathbf{v} = \mathbf{x}(1) - \mathbf{x}(0)\) and proceed as above.
Given \(\mathbf{r}_1(t) = \mathbf{p}_1 + t\mathbf{v}_1\) and \(\mathbf{r}_2(t) = \mathbf{p}_2 + t\mathbf{v}_2\), let \(d\) be the desired distance.
Let \(\widehat{ \mathbf{n}} = \widehat{\mathbf{v}_1 \times \mathbf{v}_2}\), which is orthogonal to both lines.
Then project the vector connecting the two fixed points \(\mathbf{p}_i\) onto this subspace and take its norm: \[\begin{align*} d &= {\left\lVert {\mathrm{proj}_{\mathbf{n}}(\mathbf{p}_2 - \mathbf{p}_1)} \right\rVert} \\ &= {\left\lVert {{\left\langle {\mathbf{p}_2 -\mathbf{p}_1},~{\widehat{\mathbf{n}}} \right\rangle}\widehat{\mathbf{n}}} \right\rVert} \\ &= {\left\lvert {{\left\langle {\mathbf{p}_2 - \mathbf{p}_1},~{\widehat{\mathbf{n}}} \right\rangle}} \right\rvert}, \qquad \widehat{\mathbf{n}} \mathrel{\vcenter{:}}=\frac{\mathbf{v}_1 \times\mathbf{v}_2}{{\left\lVert {\mathbf{v}_1 \times\mathbf{v}_2} \right\rVert}} .\end{align*}\]
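A numeric sketch of this recipe on two made-up skew lines:

```python
# Distance between two skew lines via the cross-product normal.
import numpy as np

p1, v1 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])   # r1(t) = p1 + t v1
p2, v2 = np.array([0.0, 1.0, 1.0]), np.array([0.0, 1.0, 0.0])   # r2(t) = p2 + t v2

n = np.cross(v1, v2)
n_hat = n / np.linalg.norm(n)
d = abs(np.dot(p2 - p1, n_hat))

print(d)   # 1.0 for this example
```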
\[\begin{align*} S = \left\{{(x,y,z) \mathrel{\Big|}f(x,y, z) = 0}\right\} \hspace{10em} z = f(x,y) \end{align*}\]
Given a function \(f: {\mathbb{R}}^n \to {\mathbb{R}}\), let \(S_k \mathrel{\vcenter{:}}=\left\{{\mathbf{p}\in {\mathbb{R}}^n ~{\text{s.t.}}~f(\mathbf{p}) = k}\right\}\) denote the level set for \(k\in {\mathbb{R}}\). Then \[\begin{align*} \nabla f(\mathbf{p}) \in S_k^\perp .\end{align*}\]
\[\begin{align*} \mathbf{v} &= [v_1, v_2, \cdots] && \text{a vector} \\ \\ \mathbf{e}_i &= [0, 0, \cdots, \overbrace{1}^{i \text{th term}}, \cdots, 0] && \text{the } i \text{th standard basis vector} \\ \\ \phi: {\mathbb{R}}^n &\to {\mathbb{R}} && \text{a functional on } {\mathbb{R}}^n\\ \phi(x_1, x_2, \cdots) &= \cdots && \\ \\ \mathbf{F}: {\mathbb{R}}^n &\to {\mathbb{R}}^n && \text{a multivariable function}\\ \mathbf{F}(x_1,x_2,\cdots) &= [\mathbf{F}_1(x_1, x_2, \cdots), \mathbf{F}_2(x_1, x_2, \cdots), \cdots, \mathbf{F}_n(x_1, x_2, \cdots)] \end{align*}\]
For a functional \(f:{\mathbb{R}}^n\to {\mathbb{R}}\), the partial derivative of \(f\) with respect to \(x_i\) is \[\begin{align*} {\frac{\partial f}{\partial x_i}\,}(\mathbf p) \mathrel{\vcenter{:}}=\lim_{h\to 0}\frac{f(\mathbf p + h\mathbf e_i) - f(\mathbf p)}{h} \end{align*}\]
\[\begin{align*} f: {\mathbb{R}}^2 &\to {\mathbb{R}}\\ {\frac{\partial f}{\partial x}\,}(x_0,y_0) &= \lim_{h \to 0} \frac{f(x_0+h, y_0) - f(x_0,y_0)}{h} \end{align*}\]
A function \(f: {\mathbb{R}}^n \to {\mathbb{R}}^m\) is differentiable iff there exists a linear transformation \(D_f: {\mathbb{R}}^n \to {\mathbb{R}}^m\) such that \[\begin{align*} \lim _ { \mathbf x \rightarrow \mathbf{p} } \frac { \left\| f (\mathbf x ) - f (\mathbf{p} ) - D_f (\mathbf x - \mathbf{p} ) \right\| } { \| \mathbf x - \mathbf{p} \| } = 0 .\end{align*}\]
\(D_f\) is the “best linear approximation” to \(f\).
When \(f\) is differentiable, \(D_f\) can be given in coordinates by
\[\begin{align*} (D_f)_{ij} = {\frac{\partial f_i}{\partial x_j}\,} \end{align*}\]
This yields the Jacobian of \(f\): \[\begin{align*} D_f(\mathbf{p}) = \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \nabla f_1(\mathbf{p}) & \nabla f_2(\mathbf{p}) & \cdots & \nabla f_m(\mathbf{p}) \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex} \end{bmatrix}^T = \left[ \begin{array} { c c c c } { \frac { \partial f _ { 1 } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { 1 } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \ldots } & { \frac { \partial f _ { 1 } } { \partial x _ { n } } ( \mathbf{p} ) } \\ { \frac { \partial f _ { 2 } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { 2 } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \dots } & { \frac { \partial f _ { 2 } } { \partial x _ { n } } ( \mathbf{p} ) } \\ { \vdots } & { \vdots } & { \ddots } & { \vdots } \\ { \frac { \partial f _ { m } } { \partial x _ { 1 } } ( \mathbf{p} ) } & { \frac { \partial f _ { m } } { \partial x _ { 2 } } ( \mathbf{p} ) } & { \cdots } & { \frac { \partial f _ { m } } { \partial x _ { n } } ( \mathbf{p} ) } \end{array} \right]. \end{align*}\]
Equivalently, the \(i\)th row of \(D_f(\mathbf{p})\) is \(\nabla f_i(\mathbf{p})^T\).
For a function \(f: {\mathbb{R}}^n \to {\mathbb{R}}\), the Hessian is a generalization of the second derivative, and is given in coordinates by \[\begin{align*} (H_f)_{ij} = {\frac{\partial ^2f}{\partial x_i \partial x_j}\,} \end{align*}\]
Explicitly, we have \[\begin{align*} H_f(\mathbf{p}) = \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \nabla f_{x_1}(\mathbf{p}) & \nabla f_{x_2}(\mathbf{p}) & \cdots & \nabla f_{x_n}(\mathbf{p}) \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex} \end{bmatrix}^T = \left[ \begin{array} { c c c } { \frac { \partial ^ { 2 } f } { \partial x _ { 1 } \partial x _ { 1 } } ( \mathbf{p} ) } & { \dots } & { \frac { \partial ^ { 2 } f } { \partial x _ { 1 } \partial x _ { n } } ( \mathbf{p} ) } \\ { \vdots } & { \ddots } & { \vdots } \\ { \frac { \partial ^ { 2 } f } { \partial x _ { n } \partial x _ { 1 } } ( \mathbf{p} ) } & { \cdots } & { \frac { \partial ^ { 2 } f } { \partial x _ { n } \partial x _ { n } } ( \mathbf{p} ) } \end{array} \right]. \end{align*}\]
Mnemonic: make matrix with \(\nabla f\) as the columns, and then differentiate variables left to right.
Write out tree of dependent variables:
Then sum each possible path.
Let subscripts denote which variables are held constant, then \[\begin{align*} \left({\frac{\partial z}{\partial x}\,}\right)_y &= \left({\frac{\partial z}{\partial x}\,}\right)_{u,y,v} \\ & + \left({\frac{\partial z}{\partial v}\,}\right)_{x,y,u} \left({\frac{\partial v}{\partial x}\,}\right)_y \\ & + \left({\frac{\partial z}{\partial u}\,}\right)_{x,y,v} \left({\frac{\partial u}{\partial x}\,}\right)_{v,y} \\ & + \left({\frac{\partial z}{\partial u}\,}\right)_{x,y,v} \left({\frac{\partial u}{\partial v}\,}\right)_{x,y} \left({\frac{\partial v}{\partial x}\,}\right)_y \end{align*}\]
Let \(z = f(x,y)\); then to approximate near \(\mathbf{p}_0 = {\left[ {x_0, y_0} \right]}\), \[\begin{align*} f(\textcolor{Aquamarine}{\textbf{x}}) &\approx f(\mathbf{p}_0) + \nabla f (\mathbf{p}_0) \cdot (\textcolor{Aquamarine}{\textbf{x}} - \mathbf{p}_0) \\ \implies f(x,y) &\approx f(\mathbf{p}_0) + f_x(\mathbf{p}_0)(x-x_0) + f_y(\mathbf{p}_0)(y-y_0) \\ .\end{align*}\]
Critical points of \(f\) given by points \(\mathbf{p}\) such that the derivative vanishes: \[\begin{align*} \operatorname{crit}(f) = \left\{{\mathbf{p}\in {\mathbb{R}}^n ~{\text{s.t.}}~D_f({\mathbf p}) = 0}\right\} \end{align*}\]
1. Compute \[\begin{align*} {\left\lvert {H_f(\mathbf p)} \right\rvert} \mathrel{\vcenter{:}}=\left| \begin{array} { l l } { f _ { x x } } & { f _ { x y } } \\ { f _ { y x } } & { f _ { y y } } \end{array} \right| ({ \mathbf p }) \end{align*}\] 2. Check by cases:
- \({\left\lvert {H_f(\mathbf p)} \right\rvert} > 0\) and \(f_{xx}(\mathbf p) > 0\): local min
- \({\left\lvert {H_f(\mathbf p)} \right\rvert} > 0\) and \(f_{xx}(\mathbf p) < 0\): local max
- \({\left\lvert {H_f(\mathbf p)} \right\rvert} < 0\): saddle point
- \({\left\lvert {H_f(\mathbf p)} \right\rvert} = 0\): inconclusive
What’s really going on?
Eigenvalues have same sign \(\iff\) positive definite or negative definite
Positive definite \(\implies\) convex \(\implies\) local min
Negative definite \(\implies\) concave \(\implies\) local max
The setup: \[\begin{align*} \text{Optimize } f(\mathbf x) &\quad \text{subject to } g(\mathbf x) = c \\ \implies \nabla f &= \lambda \nabla g \end{align*}\]
1. Use this formula to obtain a system of equations in the components of \(\mathbf x\) and the parameter \(\lambda\).
2. Use \(\lambda\) to obtain a relation involving only components of \(\mathbf{x}\).
3. Substitute relations back into the constraint to obtain a collection of critical points.
4. Evaluate \(f\) at the critical points to find the max/min.
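A small sympy sketch of this recipe, using a made-up objective \(f(x,y) = xy\) and constraint \(x + y = 10\):

```python
# Lagrange multipliers: solve grad(f) = lambda * grad(g) together with g = 0.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x * y
g = x + y - 10

eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
       sp.diff(f, y) - lam * sp.diff(g, y),
       g]
sols = sp.solve(eqs, [x, y, lam], dict=True)
print(sols)                         # [{x: 5, y: 5, lambda: 5}]
print([f.subs(s) for s in sols])    # [25], the constrained maximum
```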
For a change of variables \(g: {\mathbb{R}}^n \to {\mathbb{R}}^n\), an integrand \(f: {\mathbb{R}}^n \to {\mathbb{R}}\), and a region \(R\), \[\begin{align*} \int _ { g ( R ) } f ( \mathbf { x } ) ~d V = \int _ { R } (f \circ g) ( \mathbf { x } ) \cdot {\left\lvert {\det D_g ( \mathbf { x })} \right\rvert} ~d V \end{align*}\]
\(R\) is a region, \(S\) is a surface, \(V\) is a solid.
\[\begin{align*} \oint _ { \partial S } \mathbf { F } \cdot d \mathbf { r } = \oint _ { \partial S } [\mathbf{F}_1, \mathbf{F}_2, \mathbf{F}_3] \cdot [dx, dy, dz] = \oint_{{\partial}S} \mathbf{F}_1dx + \mathbf{F}_2dy + \mathbf{F}_3dz \end{align*}\]
The main vector operators \[\begin{align*} \nabla: ({\mathbb{R}}^n \to {\mathbb{R}}) &\to ({\mathbb{R}}^n \to {\mathbb{R}}^n) \\ \phi &\mapsto \nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \phi}{\partial x_i} ~\mathbf{e}_i \\ \\ \text{} \mathrm{div}(\mathbf{F}): ({\mathbb{R}}^n \to {\mathbb{R}}^n) &\to ({\mathbb{R}}^n \to {\mathbb{R}}) \\ \mathbf{F} &\mapsto \nabla \cdot \mathbf{F} \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \mathbf{F}_i}{\partial x_i} \\ \\ \text{} \mathrm{curl}(\mathbf{F}): ({\mathbb{R}}^3 \to {\mathbb{R}}^3) &\to ({\mathbb{R}}^3 \to {\mathbb{R}}^3) \\ \mathbf{F} &\mapsto \nabla \times\mathbf{F} \\ \\ \text{} \end{align*}\] Some terminology: \[\begin{align*} \text{Scalar Field} && \phi:&~ X \to {\mathbb{R}}\\ \text{Vector Field} && \mathbf{F}:&~ X\to {\mathbb{R}}^n\\ \text{Gradient Field} && \mathbf{F}:&~ X \to {\mathbb{R}}^n \mathrel{\Big|}\exists \phi: X\to {\mathbb{R}}\mathrel{\Big|}\nabla \phi = F \end{align*}\]
\[\begin{align*} \mathbf x \cdot \mathbf y = {\left\langle {\mathbf x},~{\mathbf y} \right\rangle} = \sum_{i=1}^n {x_i y_i} = x_1y_1 + x_2y_2 + \cdots && \text{inner/dot product} \\ {\left\lVert {\mathbf x} \right\rVert} = \sqrt{{\left\langle {\mathbf x},~{\mathbf x} \right\rangle}} = \sqrt{\sum_{i=1}^n x_i^2} = \sqrt{x_1^2 + x_2^2 + \cdots} && \text{norm} \\ \mathbf a \times\mathbf b = \mathbf{\widehat{n}} {\left\lVert {\mathbf a} \right\rVert}{\left\lVert {\mathbf b} \right\rVert}\sin\theta_{\mathbf a,\mathbf b} = \left| \begin{array}{ccc} \mathbf{\widehat{x}} & \mathbf{\widehat{y}} & \mathbf{\widehat{z}} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{array}\right| && \text{cross product} \\ \\ D_\mathbf{u}(\phi) = \nabla \phi \cdot \mathbf{\widehat{u}} && \text{directional derivative} \\ \\ \nabla \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial}{\partial x_i} \mathbf{e}_i = \left[\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \cdots, \frac{\partial}{\partial x_n}\right] && \text{del operator} \\ \\ \nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \phi}{\partial x_i} ~\mathbf{e}_i = {\left[ { \frac{\partial \phi}{\partial x_1}, \frac{\partial \phi}{\partial x_2}, \cdots, \frac{\partial \phi}{\partial x_n}} \right]} && \text{gradient} \\ \\ \Delta \phi \mathrel{\vcenter{:}}=\nabla\cdot\nabla \phi \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial^2 \phi}{\partial x_i^2} = \frac{\partial^2 \phi}{\partial x_1^2} + \frac{\partial^2 \phi}{\partial x_2^2} + \cdots + \frac{\partial^2 \phi}{\partial x_n^2} && \text{Laplacian} \\ \\ \nabla \cdot \mathbf{F} \mathrel{\vcenter{:}}=\sum_{i=1}^n \frac{\partial \mathbf{F}_i}{\partial x_i} = \frac{\partial \mathbf{F}_1}{\partial x_1} + \frac{\partial \mathbf{F}_2}{\partial x_2} + \cdots + \frac{\partial \mathbf{F}_n}{\partial x_n} && \text{divergence} \\ \\ \nabla \times \mathbf { F } = \left| \begin{array} { c c c } { \mathbf { e }_1 } & { \mathbf { e }_2 } & { \mathbf { e }_3 } \\ { \frac { \partial } { \partial x } } & { \frac { \partial } { \partial y } } & { \frac { \partial } { \partial z } } \\ { \mathbf{F} _ { 1 } } & { \mathbf{F} _ { 2 } } & { \mathbf{F} _ { 3 } } \end{array} \right| = [\mathbf{F}_{3y} - \mathbf{F}_{2z}, \mathbf{F}_{1z}- \mathbf{F}_{3x}, \mathbf{F}_{2x} -\mathbf{F}_{1y}] && \text{curl} \\ \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot d \mathbf { S } = \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot \mathbf { n } ~dS && \text{surface integral} \end{align*}\]
\[\begin{align*} \oint _ { \partial S } \mathbf { F } \cdot d \mathbf { r } = \iint _ { S } ( \nabla \times \mathbf { F } ) \cdot d \mathbf { S } .\end{align*}\]
Note that if \(S\) is a closed surface, so \({\partial}S = \emptyset\), this integral vanishes.
\[\begin{align*} \oint _ { {\partial}R } ( L ~d x + M ~d y ) = \iint _ { R } \left( \frac { \partial M } { \partial x } - \frac { \partial L } { \partial y } \right) d x d y .\end{align*}\]
Recovering Green’s Theorem from Stokes’ Theorem:
Let \(\mathbf{F} = [L, M, 0]\), then \(\nabla\times\mathbf{F} = [0, 0, \frac{\partial M}{\partial x} - \frac{\partial L}{\partial y}]\)
\[\begin{align*} \iint_ { \partial V } \mathbf { F } \cdot d \mathbf { S } = \iiint _ { V } ( \nabla \cdot \mathbf { F } ) ~d V .\end{align*}\]
\[\begin{align*} D_{\mathbf{v}} f(\mathbf{p}) \mathrel{\vcenter{:}}=\frac{d}{dt} f(\mathbf{p} + t\mathbf{v}) \Big|_{t=0} .\end{align*}\]
Note that the directional derivative uses a normalized direction vector!
Suppose \(f:{\mathbb{R}}^n\to {\mathbb{R}}\) and \(\mathbf{v}\in {\mathbb{R}}^n\). Then \[\begin{align*} D_{\mathbf{v}}f(\mathbf{p}) = {\left\langle {\nabla f(\mathbf{p})},~{\mathbf{v}} \right\rangle} .\end{align*}\]
We first use the fact that we can find \(L\), the best linear approximation to \(f\): \[\begin{align*} L(\mathbf{x}) &\mathrel{\vcenter{:}}= f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{x} - \mathbf{p}) \\ \\ D_{\mathbf{v}}f(\mathbf{p}) &= D_{\mathbf{v}} L(\mathbf{p}) \\ &= \lim_{t\to 0} {L(\mathbf{p} + t\mathbf{v}) - L(\mathbf{p}) \over t}\\ &= \lim_{t\to 0} { f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{p} + t\mathbf{v} - \mathbf{p}) -\qty{f(\mathbf{p}) + D_f(\mathbf{p})(\mathbf{p} - \mathbf{p})} \over t }\\ &= \lim_{t\to 0} { D_f(\mathbf{p})(t\mathbf{v}) \over t} \\ &= D_f(\mathbf{p})\mathbf{v} \\ &\mathrel{\vcenter{:}}=\nabla f(\mathbf{p}) \cdot \mathbf{v} .\end{align*}\]
\[\begin{align*} x = r\cos\theta \\ y = r\sin\theta \\ dV \mapsto r \quad dr~d\theta \end{align*}\]
\[\begin{align*} x = r\cos\theta = \rho\sin\phi\cos\theta \\ y = r\sin\theta = \rho\sin\phi\sin\theta \\ z = \rho\cos\phi \\ dV \mapsto \rho^2 \sin\phi \quad d\rho~d\phi~d\theta \end{align*}\]
\[\begin{align*} \int_C f ~ds &\mathrel{\vcenter{:}}=\int_a^b (f\circ \mathbf{r})(t) ~{\left\lVert {\mathbf{r}'(t)} \right\rVert}~dt \\ &= \int_a^b f(x(t), y(t), z(t)) \sqrt{x_t^2 + y_t^2 + z_t^2} ~dt \end{align*}\]
The function \(\phi\) can be found using the same method from ODEs.
Parametrize the path \(C\) as \(\left\{{\mathbf{r}(t): t\in[a,b]}\right\}\), then \[\begin{align*} \int_C \mathbf F \cdot d\mathbf r & \mathrel{\vcenter{:}}=\int_a^b (\mathbf F \circ \mathbf r)(t) \cdot \mathbf r'(t) ~dt \\ &= \int_a^b [\mathbf F_1(x(t), y(t), \cdots), \mathbf F_2(x(t), y(t), \cdots)]\cdot[x_t, y_t, \cdots] ~dt \\ &= \int_a^b \mathbf F_1(x(t), y(t) \cdots)x_t + \mathbf F_2(x(t), y(t), \cdots)y_t + \cdots ~dt \end{align*}\]
Equivalently written:
\[\begin{align*} \int_a^b \mathbf F_1 ~dx + \mathbf F_2 ~dy + \cdots \mathrel{\vcenter{:}}=\int_C \mathbf F \cdot d\mathbf r \end{align*}\] in which case \([dx, dy, \cdots] \mathrel{\vcenter{:}}=[x_t, y_t, \cdots] = \mathbf r'(t)\).
\[\begin{align*} \iint_S \mathbf{F}\cdot d\mathbf{S} = \iint_S \mathbf{F}\cdot \mathbf{\widehat{n}} ~dS .\end{align*}\]
Given \(R\) and \(f(x,y) = 0\), \[\begin{align*} A(R) = \oint _ { {\partial}R } x ~d y = - \oint _ { {\partial}R } y ~d x = \frac { 1 } { 2 } \oint _ { {\partial}R } - y ~d x + x ~d y . \end{align*}\]
Compute \[\begin{align*} \oint_{{\partial}R} x ~dy = - \oint_{{\partial}R} y ~dx \\ = \frac{1}{2} \oint_{{\partial}R} -y~dx + x~dy = \frac{1}{2} \iint_R 1 - (-1) ~dA =\iint_R 1 ~dA \end{align*}\]
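For example, for the disk of radius \(a\) bounded by \(\mathbf{r}(t) = [a\cos t, a\sin t]\), \(t \in [0, 2\pi]\): \[\begin{align*} \frac{1}{2}\oint_{{\partial}R} -y~dx + x~dy = \frac{1}{2}\int_0^{2\pi} \qty{a^2\sin^2 t + a^2\cos^2 t}~dt = \pi a^2 .\end{align*}\]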
\(\nabla \cdot \mathbf{F} = 0 \not \implies \exists G:~ \mathbf{F} = \nabla\times G\). A counterexample
\[\begin{align*} \mathbf{F}(x,y,z) =\frac{1}{(x^2+y^2+z^2)^{3/2}}[x, y, z]~,\quad S = S^2 \subset {\mathbb{R}}^3 \\ \implies \nabla \cdot \mathbf{F} = 0 \text{ away from the origin, but } \iint_{S^2}\mathbf{F}\cdot d\mathbf{S} = 4\pi \neq 0 \end{align*}\] Where by Stokes’ theorem, \[\begin{align*} \mathbf{F} = \nabla\times\mathbf{G}\implies \iint_{S^2} \mathbf{F} &= \iint_{S^2} \nabla\times\mathbf{G} \\ \\ &= \oint_{{\partial}S^2}\mathbf{G}~d\mathbf{r} && \text{by Stokes}\\ &= 0 \end{align*}\] since \({\partial}S^2 = \emptyset\).
Sufficient condition: if \(\mathbf{F}\) is everywhere \(C^1\), \[\begin{align*} \exists \mathbf{G}:~ \mathbf{F} = \nabla \times\mathbf{G} \iff \iint_S \mathbf{F}\cdot d\mathbf{S} = 0 \text{ for all closed surfaces }S .\end{align*}\]
The underlying field will be assumed to be \({\mathbb{R}}\) for this section.
\[\begin{align*} \operatorname{Mat}(m, n) && \text{the space of all } m\times n\text{ matrices} \\ T && \text{a linear map } {\mathbb{R}}^n \to {\mathbb{R}}^m \\ A\in \operatorname{Mat}(m, n)&& \text{an } m\times n \text{ matrix representing }T \\ A^t\in \operatorname{Mat}(n, m) && \text{an } n\times m \text{ transposed matrix} \\ \mathbf{a} && \text{a } 1\times n \text{ column vector} \\ \mathbf{a}^t && \text{an } n\times 1 \text{ row vector} \\ A = {\left[ {\mathbf{a}_1, \cdots, \mathbf{a}_n} \right]} && \text{a matrix formed with } \mathbf{a}_i \text{ as the columns} \\ V, W && \text{vector spaces} \\ |V|, \dim(W) && \text{dimensions of vector spaces} \\ \det(A) && \text{the determinant of }A \\ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& \mathbf{b} \end{bmatrix} \mathrel{\vcenter{:}}={\left[ {\mathbf{a}_1, \mathbf{a}_2, \cdots \mathbf{a}_n, \mathbf{b}} \right]} && \text{augmented matrices} \\ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& B \end{bmatrix} \mathrel{\vcenter{:}}={\left[ {\mathbf{a}_1, \cdots \mathbf{a}_n, \mathbf{b}_1, \cdots, \mathbf{b}_m} \right]} && \text{block matrices}\\ \operatorname{Spec}(A) && \text{the multiset of eigenvalues of } A \\ A\mathbf{x} = \mathbf{b} && \text{a system of linear equations} \\ r\mathrel{\vcenter{:}}=\operatorname{rank}(A) && \text{the rank of }A\\ r_b = \operatorname{rank}\qty{ \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& \mathbf{b} \\ \end{bmatrix} } && \text{the rank of }A\text{ augmented by }\mathbf{b} .\end{align*}\]
\[\begin{align*} {\left\lvert {\ker(A)} \right\rvert} + {\left\lvert {\operatorname{im}(A)} \right\rvert} = {\left\lvert {\operatorname{dom}(A)} \right\rvert} ,\end{align*}\] where \(\operatorname{nullity}(A) = {\left\lvert {\ker(A)} \right\rvert}\), \(\operatorname{rank}(A) = {\left\lvert {\operatorname{im}(A)} \right\rvert}\), and \({\left\lvert {\operatorname{dom}(A)} \right\rvert} = n\) is the number of columns in the corresponding matrix.
Generalization: the following sequence is always exact: \[\begin{align*} 0 \to \ker(A) \xhookrightarrow{\text{id}} \operatorname{dom}(A) % \xrightarrow[]{A}\mathrel{\mkern-14mu}\rightarrow \operatorname{im}(A) \to 0 .\end{align*}\] Moreover, it always splits, so \(\operatorname{dom}A = \ker A \oplus \operatorname{im}A\) and thus \({\left\lvert {\operatorname{dom}(A)} \right\rvert} = {\left\lvert {\ker(A)} \right\rvert} + {\left\lvert {\operatorname{im}(A)} \right\rvert}\).
We also have \[\begin{align*} \dim(\operatorname{rowspace}(A)) = \dim(\operatorname{colspace}(A)) = \operatorname{rank}(A) .\end{align*}\]
Let \(A\) be an \(n\times n\) matrix. TFAE:

- \(A\) is invertible and has a unique inverse \(A^{-1}\)
- \(A^T\) is invertible
- \(\det(A) \neq 0\)
- The linear system \(A\mathbf{x} = \mathbf{b}\) has a unique solution for every \(\mathbf{b} \in {\mathbb{R}}^n\)
- The homogeneous system \(A\mathbf{x} = 0\) has only the trivial solution \(\mathbf{x} = 0\)
- \(\operatorname{rank}(A) = n\), i.e. \(A\) is full rank
- \(\mathrm{nullity}(A) \mathrel{\vcenter{:}}=\dim\mathrm{nullspace}(A) = 0\)
- \(A = \prod_{i=1}^k E_i\) for some finite \(k\), where each \(E_i\) is an elementary matrix
- \(A\) is row-equivalent to the identity matrix \(I_n\)
- \(A\) has exactly \(n\) pivots
- The columns of \(A\) are a basis for \({\mathbb{R}}^n\), i.e. \(\operatorname{colspace}(A) = {\mathbb{R}}^n\)
- The rows of \(A\) are a basis for \({\mathbb{R}}^n\), i.e. \(\mathrm{rowspace}(A) = {\mathbb{R}}^n\)
- \(\left(\operatorname{colspace}(A)\right)^\perp= \left(\mathrm{rowspace}(A)\right)^\perp= \left\{{\mathbf{0}}\right\}\)
- Zero is not an eigenvalue of \(A\)
- The rows of \(A\) are not coplanar (for \(3\times 3\) matrices)
Similarly, by taking negations, TFAE:

- \(A\) is singular (not invertible)
- \(\det(A) = 0\)
- The homogeneous system \(A\mathbf{x} = \mathbf{0}\) has nontrivial solutions
- \(\operatorname{rank}(A) < n\)
- Zero is an eigenvalue of \(A\)
Reformulated in terms of a linear map \(T: {\mathbb{R}}^n \to {\mathbb{R}}^n\), TFAE:

- \(T^{-1}: {\mathbb{R}}^n \to {\mathbb{R}}^n\) exists
- \(\operatorname{im}(T) = {\mathbb{R}}^n\)
- \(\ker(T) = 0\)
- \(T\) is injective
- \(T\) is surjective
- \(T\) is an isomorphism
- The system \(A\mathbf{x} = \mathbf{0}\) has only the trivial solution
It is common to want to know the range and kernel of a specific linear transformation \(T\). \(T\) can be given in many ways, but a general strategy for deducing these properties involves:
Express an arbitrary vector in \(V\) as a linear combination of its basis vectors, and set it equal to an arbitrary vector in \(W\).
Use the linear properties of \(T\) to make a substitution from known transformations
Find a restriction or relation given by the constants of the initial linear combination.
Useful fact: if \(V\leq W\) is a subspace and \(\dim(V) \geq \dim(W)\), then \(V=W\).
If \(V\subseteq W\), then \(V\) is a subspace of \(W\) if the following hold:
\[\begin{align*} (1) && \mathbf{0}\in V \\ (2) && \mathbf{a}, \mathbf{b}\in V\implies t\mathbf{a} + \mathbf{b}\in V .\end{align*}\]
Any set of two vectors \(\left\{{\mathbf{v}, \mathbf{w}}\right\}\) is linearly dependent \(\iff \exists \lambda :~\mathbf{v} = \lambda \mathbf{w}\), i.e. one is a scalar multiple of the other.
A set \(S\) forms a basis for a vector space \(V\) iff \(S\) is linearly independent and \(\operatorname{span}(S) = V\).
In this case, we define the dimension of \(V\) to be \({\left\lvert {S} \right\rvert}\).
The point of this section is to show how an inner product can induce a notion of “angle”, which agrees with our intuition in Euclidean spaces such as \({\mathbb{R}}^n\), but can be extended to much less intuitive things, like spaces of functions.
The Euclidean inner product is defined as \[\begin{align*} {\left\langle {\mathbf{a}},~{\mathbf{b}} \right\rangle} = \sum_{i=1}^n a_i b_i = a_1b_1 + a_2b_2 + \cdots + a_nb_n .\end{align*}\]
Also sometimes written as \(\mathbf{a}^T\mathbf{b}\) or \(\mathbf{a} \cdot \mathbf{b}\).
Yields a norm \[\begin{align*} {\left\lVert {\mathbf{x}} \right\rVert} \mathrel{\vcenter{:}}=\sqrt{{\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle}} \end{align*}\]
which has a useful alternative formulation \[\begin{align*} {\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle} = {\left\lVert {\mathbf{x}} \right\rVert}^2 .\end{align*}\]
This leads to a notion of angle: \[\begin{align*} {\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} = {\left\lVert {\mathbf{x}} \right\rVert} {\left\lVert {\mathbf{y}} \right\rVert} \cos\theta_{x,y} \implies \cos \theta_{x,y} \mathrel{\vcenter{:}}=\frac{{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle}}{{\left\lVert {\mathbf{x}} \right\rVert} {\left\lVert {\mathbf{y}} \right\rVert}} = {\left\langle {\widehat{\mathbf{x}}},~{\widehat{\mathbf{y}}} \right\rangle} \end{align*}\] where \(\theta_{x,y}\) denotes the angle between the vectors \(\mathbf{x}\) and \(\mathbf{y}\).
Since \(\cos \theta=0\) exactly when \(\theta = \pm \frac \pi 2\), we can declare two vectors to be orthogonal exactly in this case: \[\begin{align*} \mathbf{x} \in \mathbf{y}^\perp\iff {\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} = 0 .\end{align*}\]
Note that this makes the zero vector orthogonal to everything.
Given a subspace \(S \subseteq V\), we define its orthogonal complement \[\begin{align*} S^\perp= \left\{{\mathbf{v}\in V {~\mathrel{\Big|}~}\forall \mathbf{s}\in S,~ {\left\langle {\mathbf{v}},~{\mathbf{s}} \right\rangle} = 0}\right\}. \end{align*}\]
Any choice of subspace \(S\subseteq V\) yields a decomposition \(V = S \oplus S^\perp\).
A useful formula is \[\begin{align*} {\left\lVert {\mathbf{x} + \mathbf{y}} \right\rVert}^2 = {\left\lVert {\mathbf{x}} \right\rVert}^2 + 2{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle} + {\left\lVert {\mathbf{y}} \right\rVert}^2 .\end{align*}\]
When \(\mathbf{x}\in \mathbf{y}^\perp\), this reduces to \[\begin{align*} {\left\lVert {\mathbf{x} + \mathbf{y}} \right\rVert}^2 = {\left\lVert {\mathbf{x}} \right\rVert}^2 + {\left\lVert {\mathbf{y}} \right\rVert}^2 .\end{align*}\]
Bilinearity: \[\begin{align*} {\left\langle {\sum_j \alpha_j \mathbf{a}_j},~{\sum_k \beta_k \mathbf{b}_k} \right\rangle} = \sum_j \sum_k \alpha_j \beta_k {\left\langle {\mathbf{a}_j},~{\mathbf{b}_k} \right\rangle}. \end{align*}\]
Symmetry: \[\begin{align*} {\left\langle {\mathbf{a}},~{\mathbf{b}} \right\rangle} = {\left\langle {\mathbf{b}},~{\mathbf{a}} \right\rangle} \end{align*}\]
Positivity: \[\begin{align*} \mathbf{a} \neq \mathbf{0} \implies {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} > 0 \end{align*}\]
Nondegeneracy: \[\begin{align*} \mathbf{a} = \mathbf{0} \iff {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} = 0 \end{align*}\]
Orthonormalizing a basis \(\left\{{\mathbf{x}_i}\right\}\) to produce an orthonormal basis \(\left\{{\mathbf{u}_i}\right\}\):
\[\begin{align*} \mathbf{u}_1 &= N(\mathbf{x_1}) \\ \mathbf{u}_2 &= N(\mathbf{x}_2 - {\left\langle {\mathbf{x}_2},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1)\\ \mathbf{u}_3 &= N(\mathbf{x}_3 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_2} \right\rangle}\mathbf{u}_2 ) \\ \vdots & \qquad \vdots \\ \mathbf{u}_k &= N(\mathbf{x}_k - \sum_{i=1}^{k-1} {\left\langle {\mathbf{x}_k},~{\mathbf{u}_i} \right\rangle}\mathbf{u}_i) \end{align*}\]
where \(N\) denotes normalizing the result.
The general setup here is that we are given a basis \(\left\{{\mathbf{x}_i}\right\}_{i=1}^n\) (not necessarily orthogonal) and we want to produce an orthonormal basis from it.
Why would we want such a thing? Recall that we often wanted to change from the standard basis \(\mathcal{E}\) to some different basis \(\mathcal{B} = \left\{{\mathbf{b}_1, \mathbf{b}_2, \cdots}\right\}\). We could form the change of basis matrix \(B = [\mathbf{b}_1, \mathbf{b}_2, \cdots]\), which acts on vectors in the \(\mathcal{B}\) basis according to \[\begin{align*} B[\mathbf{x}]_\mathcal{B} = [\mathbf{x}]_{\mathcal{E}} .\end{align*}\]
But to change from \(\mathcal{E}\) to \(\mathcal{B}\) requires computing \(B^{-1}\), which acts on vectors in the standard basis according to \[\begin{align*} B^{-1}[\mathbf{x}]_\mathcal{E} = [\mathbf{x}]_{\mathcal{B}} .\end{align*}\]
If, on the other hand, the \(\mathbf{b}_i\) are orthonormal, then \(B^{-1} = B^T\), which is much easier to compute. We also obtain a rather simple formula for the coordinates of \(\mathbf{x}\) with respect to \(\mathcal B\). This follows because we can write \[\begin{align*} \mathbf{x} = \sum_{i=1}^n {\left\langle {\mathbf{x}},~{\mathbf{b}_i} \right\rangle} \mathbf{b}_i \mathrel{\vcenter{:}}=\sum_{i=1}^n c_i \mathbf{b}_i, .\end{align*}\]
and we find that \[\begin{align*} [\mathbf{x}]_\mathcal{B} = \mathbf{c} \mathrel{\vcenter{:}}=[c_1, c_2, \cdots, c_n]^T. .\end{align*}\]
This also allows us to simplify projection matrices. Supposing that \(Q\) has orthonormal columns and letting \(S\) be the column space of \(Q\), recall that the projection onto \(S\) is defined by \[\begin{align*} P_S = Q(Q^TQ)^{-1}Q^T .\end{align*}\]
Since \(Q\) has orthonormal columns, it satisfies \(Q^TQ = I\), and this simplifies to \[\begin{align*} P_S = QQ^T. .\end{align*}\]
Given the basis \(\left\{{\mathbf{x}_i}\right\}\), we form an orthonormal basis \(\left\{{\mathbf{u}_i}\right\}\) iteratively as follows.
First define \[\begin{align*} N: {\mathbb{R}}^n &\to S^{n-1} \\ \mathbf{x} &\mapsto \widehat{\mathbf{x}} \mathrel{\vcenter{:}}=\frac {\mathbf{x}} {{\left\lVert {\mathbf{x}} \right\rVert}} \end{align*}\]
which projects a vector onto the unit sphere in \({\mathbb{R}}^n\) by normalizing. Then,
\[\begin{align*} \mathbf{u}_1 &= N(\mathbf{x_1}) \\ \mathbf{u}_2 &= N(\mathbf{x}_2 - {\left\langle {\mathbf{x}_2},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1)\\ \mathbf{u}_3 &= N(\mathbf{x}_3 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_1} \right\rangle}\mathbf{u}_1 - {\left\langle {\mathbf{x}_3},~{\mathbf{u}_2} \right\rangle}\mathbf{u}_2 ) \\ \vdots & \qquad \vdots \\ \mathbf{u}_k &= N(\mathbf{x}_k - \sum_{i=1}^{k-1} {\left\langle {\mathbf{x}_k},~{\mathbf{u}_i} \right\rangle}\mathbf{u}_i) \end{align*}\]
In words, at each stage, we take one of the original vectors \(\mathbf{x}_i\), then subtract off its projections onto all of the \(\mathbf{u}_i\) we’ve created up until that point. This leaves us with only the component of \(\mathbf{x}_i\) that is orthogonal to the span of the previous \(\mathbf{u}_i\) we already have, and we then normalize each \(\mathbf{u}_i\) we obtain this way.
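A minimal numpy sketch of this iteration (the input basis is a made-up example):

```python
# Classical Gram-Schmidt on the columns of X.
import numpy as np

def gram_schmidt(X):
    """Columns of X form a basis; returns a matrix with orthonormal columns."""
    U = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        u = X[:, k].astype(float)
        for i in range(k):
            u = u - np.dot(X[:, k], U[:, i]) * U[:, i]   # subtract projections onto earlier u_i
        U[:, k] = u / np.linalg.norm(u)                   # normalize
    return U

X = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
U = gram_schmidt(X)
print(np.round(U.T @ U, 10))   # identity: the columns are orthonormal
```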
Alternative Explanation:
Given a basis \[\begin{align*} S = \left\{\mathbf{v_1, v_2, \cdots v_n}\right\}, \end{align*}\]
the Gram-Schmidt process produces a corresponding orthogonal basis \[\begin{align*} S' = \left\{\mathbf{u_1, u_2, \cdots u_n}\right\} \end{align*}\] that spans the same vector space as \(S\).
\(S'\) is found using the following pattern: \[\begin{align*} \mathbf{u_1} &= \mathbf{v_1} \\ \mathbf{u_2} &= \mathbf{v_2} - \text{proj}_{\mathbf{u_1}} \mathbf{v_2}\\ \mathbf{u_3} &= \mathbf{v_3} - \text{proj}_{\mathbf{u_1}} \mathbf{v_3} - \text{proj}_{\mathbf{u_2}} \mathbf{v_3}\\ \end{align*}\]
where \[\begin{align*} \text{proj}_{\mathbf{u}} \mathbf{v} = (\text{scal}_{\mathbf{u}} \mathbf{v})\frac{\mathbf{u}}{\mathbf{{\left\lVert {u} \right\rVert}}} = \frac{\langle \mathbf{v,u} \rangle}{{\left\lVert {\mathbf{u}} \right\rVert}}\frac{\mathbf{u}}{\mathbf{{\left\lVert {u} \right\rVert}}} = \frac{{\left\langle {\mathbf{v}},~{\mathbf{u}} \right\rangle}}{{\left\lVert {\mathbf{u}} \right\rVert}^2}\mathbf{u} \end{align*}\] is the orthogonal projection of \(\mathbf{v}\) onto \(\mathbf{u}\).
The orthogonal set \(S'\) can then be transformed into an orthonormal set \(S''\) by simply dividing the vectors \(s\in S'\) by their magnitudes. The usual definition of a vector’s magnitude is \[\begin{align*} {\left\lVert {\mathbf{a}} \right\rVert} = \sqrt{{\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle}} \text{ and } {\left\lVert {\mathbf{a}} \right\rVert}^2 = {\left\langle {\mathbf{a}},~{\mathbf{a}} \right\rangle} \end{align*}\]
As a final check, all vectors in \(S'\) should be orthogonal to each other, such that \[\begin{align*} {\left\langle {\mathbf{v_i}},~{\mathbf{v_j}} \right\rangle} = 0 \text{ when } i \neq j \end{align*}\]
and all vectors in \(S''\) should be orthonormal, such that \[\begin{align*} {\left\langle {\mathbf{v_i}},~{\mathbf{v_j}} \right\rangle} = \delta_{ij} \end{align*}\]
Given a matrix \(A \in \mathrm{Mat}(m, n)\), and noting that \[\begin{align*} A &: {\mathbb{R}}^n \to {\mathbb{R}}^m,\\ A^T &: {\mathbb{R}}^m \to {\mathbb{R}}^n \end{align*}\]
We have the following decompositions: \[\begin{align*} &{\mathbb{R}}^n &\cong \ker A &\oplus \operatorname{im}A^T &\cong \mathrm{nullspace}(A) &\oplus~ \mathrm{colspace}(A^T) \\ &{\mathbb{R}}^m &\cong \operatorname{im}A &\oplus \ker A^T &\cong \mathrm{colspace}(A) &\oplus~ \mathrm{nullspace}(A^T) \end{align*}\]
An \(m\times n\) matrix is a map from \(n\)-dimensional space to \(m\)-dimensional space. The number of rows tells you the dimension of the codomain, the number of columns tells you the dimension of the domain.
The space of matrices is not an integral domain! Counterexample: if \(A\) is singular and nonzero, there is some nonzero \(\mathbf{v}\) such that \(A \mathbf{v} = \mathbf{0}\). Then setting \(B = {\left[ {\mathbf{v}, \mathbf{v}, \cdots} \right]}\) yields \(AB = 0\) with \(A\neq 0, B\neq 0\).
The rank of a matrix \(A\) representing a linear transformation \(T\) is \(\dim \operatorname{colspace}(A)\), or equivalently \(\dim \operatorname{im}T\).
\(\operatorname{rank}(A)\) is equal to the number of nonzero rows in \(\operatorname{RREF}(A)\).
\[\begin{align*} \mathrm{Trace}(A) = \sum_{i=1}^m A_{ii} \end{align*}\]
The following are elementary row operations on a matrix: swapping two rows, multiplying a row by a nonzero scalar, and adding a scalar multiple of one row to another row.
If \(A \in \mathrm{Mat}(m, n)\) has rows \(\mathbf{a}_1, \cdots, \mathbf{a}_m\) and \(B = {\left[ {\mathbf{b}_1, \mathbf{b}_2, \cdots} \right]} \in\mathrm{Mat}(n, p)\) has columns \(\mathbf{b}_j\), then \[\begin{align*} C \mathrel{\vcenter{:}}= AB \implies c_{ij} = \sum_{k=1}^n a_{ik}b_{kj} = {\left\langle {\mathbf{a_i}},~{\mathbf{b_j}} \right\rangle} \end{align*}\] where \(1\leq i \leq m\) and \(1\leq j \leq p\). In words, each entry \(c_{ij}\) is obtained by dotting row \(i\) of \(A\) against column \(j\) of \(B\).
A system of linear equations is consistent when it has at least one solution. The system is inconsistent when it has no solutions.
?
Homogeneous systems are always consistent, i.e. there is always at least one solution.
There are three possibilities for a system of linear equations: no solutions, exactly one solution, or infinitely many solutions.
These possibilities can be checked by comparing \(r\mathrel{\vcenter{:}}=\operatorname{rank}(A)\) with \(r_b\), the rank of the augmented matrix: \(r < r_b\) gives no solutions, \(r = r_b = n\) (the number of unknowns) gives a unique solution, and \(r = r_b < n\) gives infinitely many solutions.
\[\begin{align*} \det{(A \mod p}) \mod p \equiv (\det{A}) \mod p \end{align*}\]
For \(2\times 2\) matrices, \[\begin{align*} A^{-1} = \left( \begin{array}{cc} a & b \\ c & d \end{array}\right)^{-1} = \frac{1}{\det{A}}\left( \begin{array}{cc} d & -b \\ -c & a \end{array}\right) \end{align*}\] In words, swap the main diagonal entries, and flip the signs on the off-diagonal.
Let \(A \in \mathrm{Mat}(m, m)\) be square; then there is a function \[\begin{align*} \det: \operatorname{Mat}(m, m) &\to {\mathbb{R}}\\ A &\mapsto \det(A) \end{align*}\] satisfying the following properties:
\(\det\) is a group homomorphism onto \(({\mathbb{R}}, \cdot)\): \[\begin{align*} \det(AB) = \det(A) \det(B) \end{align*}\]
Invariance under adding scalar multiples of any row to another: \[\begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} = \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i + t\mathbf{a_j} & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} \end{align*}\]
Sign change under row permutation: \[\begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} = (-1) \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \end{bmatrix} \end{align*}\]
Multilinearity in rows: \[\begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& t \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} &= t \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} \\ \det \begin{bmatrix} \rule[.5ex]{2.5ex}{0.5pt}& t \mathbf{a}_1 & \rule[.5ex]{2.5ex}{0.5pt}\\ \rule[.5ex]{2.5ex}{0.5pt}& t \mathbf{a}_2 & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& t \mathbf{a}_m & \rule[.5ex]{2.5ex}{0.5pt}\\ \end{bmatrix} &= t^m \det \begin{bmatrix} \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_1 & \rule[.5ex]{2.5ex}{0.5pt}\\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_2 & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_m & \rule[.5ex]{2.5ex}{0.5pt}\\ \end{bmatrix} \\ \det \begin{bmatrix} \rule[.5ex]{2.5ex}{0.5pt}& t_1 \mathbf{a}_1 & \rule[.5ex]{2.5ex}{0.5pt}\\ \rule[.5ex]{2.5ex}{0.5pt}& t_2 \mathbf{a}_2 & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& t_m \mathbf{a}_m & \rule[.5ex]{2.5ex}{0.5pt}\\ \end{bmatrix} &= \prod_{i=1}^m t_i \det \begin{bmatrix} \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_1 & \rule[.5ex]{2.5ex}{0.5pt}\\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_2 & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \mathbf{a}_m & \rule[.5ex]{2.5ex}{0.5pt}\\ \end{bmatrix} .\end{align*}\]
Linearity in each row: \[\begin{align*} \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i + \textcolor{red}{\textbf{a}}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} =\det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{Aquamarine}{\textbf{a}}_i & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} + \det \begin{bmatrix} & \vdots & \\ \rule[.5ex]{2.5ex}{0.5pt}& \textcolor{red}{\textbf{a}}_j & \rule[.5ex]{2.5ex}{0.5pt}\\ & \vdots & \\ \end{bmatrix} .\end{align*}\]
\({\left\lvert {\det(A)} \right\rvert}\) is the volume of the parallelepiped spanned by the columns of \(A\) (the determinant itself is the signed volume).
If any row of \(A\) is all zeros, \(\det(A) = 0\).
TFAE:
Useful shortcuts:
The minor \(M_{ij}\) of \(A\in \operatorname{Mat}(n, n)\) is the determinant of the \((n-1) \times (n-1)\) matrix obtained by deleting the \(i\)th row and \(j\)th column from \(A\).
The cofactor \(C_{ij}\) is the scalar defined by \[\begin{align*} C_{ij} \mathrel{\vcenter{:}}=(-1)^{i+j} M_{ij} .\end{align*}\]
For any fixed \(i\), there is a formula \[\begin{align*} \det(A) = \sum_{j=1}^n a_{ij} C_{ij} .\end{align*}\]
Let \[\begin{align*} A = \left[\begin{array}{lll} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array}\right] .\end{align*}\]
Then \[\begin{align*} \det A = 1 \cdot\left|\begin{array}{ll} 5 & 6 \\ 8 & 9 \end{array}\right|-2 \cdot\left|\begin{array}{ll} 4 & 6 \\ 7 & 9 \end{array}\right|+3 \cdot\left|\begin{array}{ll} 4 & 5 \\ 7 & 8 \end{array}\right| = 1 \cdot(-3)-2 \cdot(-6)+3 \cdot(-3) = 0 .\end{align*}\]
\(\det(A)\) can be computed by reducing \(A\) to row echelon form (which is upper triangular) and keeping track of the following effects: swapping two rows flips the sign of the determinant, scaling a row by \(t\) scales the determinant by \(t\), and adding a multiple of one row to another leaves it unchanged. The determinant is then the product of the diagonal entries of the reduced matrix, corrected by these factors.
Given a linear system \(A\mathbf{x} = \mathbf{b}\), writing \(\mathbf{x} = {\left[ {x_1, \cdots, x_n} \right]}\), there is a formula \[\begin{align*} x_i = \frac{\det(B_i)}{\det(A)} \end{align*}\] where \(B_i\) is \(A\) with the \(i\)th column replaced by \(\mathbf{b}\).
Under the equivalence relation of elementary row operations, there is an equivalence of augmented matrices: \[\begin{align*} \begin{bmatrix} A &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& I \end{bmatrix} \sim \begin{bmatrix} I &\fboxsep=-\fboxrule\!\!\!\fbox{\strut}\!\!\!& A^{-1} \end{bmatrix} \end{align*}\] where \(I\) is the \(n\times n\) identity matrix.
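A small sympy sketch of this row-reduction approach on a made-up \(2\times 2\) matrix:

```python
# Invert A by row-reducing the augmented matrix [A | I] to [I | A^{-1}].
import sympy as sp

A = sp.Matrix([[2, 1],
               [5, 3]])
augmented = A.row_join(sp.eye(2))       # [A | I]
reduced, _ = augmented.rref()           # -> [I | A^{-1}]
A_inv = reduced[:, 2:]

print(A_inv)          # Matrix([[3, -1], [-5, 2]])
print(A * A_inv)      # the identity matrix
```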
\[\begin{align*} A^{-1} = {1\over \det(A)} {\left[ {C_{ij}} \right]}^t .\end{align*}\] where \(C_{ij}\) is the cofactor at position \(i,j\).
\[\begin{align*} \left(\begin{array}{cc} a& b \\ c& d \end{array}\right)^{-1} = {1 \over a d - b c} \left(\begin{array}{rr} d & -b \\ -c & a \end{array}\right) \quad \text{ where } ad-bc \ne 0 \end{align*}\]
What’s the pattern?
\[\begin{align*} \begin{bmatrix} + & - \\ - & + \end{bmatrix} \end{align*}\]
\[\begin{align*} A^{-1} \mathrel{\vcenter{:}}= \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} ^{-1} = {1 \over {\det A}} \begin{bmatrix} e i - f h & -(b i - c h) & b f - c e \\ -(d i - f g) &a i - c g &-(a f -c d) \\ d h - e g & -(a h - b g)& a e - b d \end{bmatrix} .\end{align*}\]
The pattern:
\[\begin{align*} \begin{bmatrix} + & - & + \\ - & + & - \\ \ + & - & + \end{bmatrix} \end{align*}\]
Let \(A\in \operatorname{Mat}(m, n)\) represent a map \(T:{\mathbb{R}}^n\to {\mathbb{R}}^m\).
?
\[\begin{align*} \dim \operatorname{rowspace}(A) = \dim \operatorname{colspace}(A) .\end{align*}\]
\[\begin{align*} \operatorname{im}(T)^\vee= \operatorname{rowspace}(A) \subset {\mathbb{R}}^n .\end{align*}\]
Reduce to RREF, and take nonzero rows of \(\mathrm{RREF}(A)\).
\[\begin{align*} \operatorname{im}(T) = \operatorname{colspace}(A) \subseteq {\mathbb{R}}^m \end{align*}\]
Reduce to RREF, and take columns with pivots from original \(A\).
Not enough pivots implies the columns don't span the entire codomain
\[\begin{align*} \ker(T) = \operatorname{nullspace}(A) \subseteq {\mathbb{R}}^n \end{align*}\]
Reduce to RREF; columns without pivots correspond to free variables. Convert back to equations and pull the free variables out as scalar parameters.
For each \(\lambda \in \operatorname{Spec}(A)\), compute a basis for \(\ker(A - \lambda I)\).
A vector \(\mathbf{v}\) is said to be an eigenvector of \(A\) with eigenvalue \(\lambda\in \operatorname{Spec}(A)\) iff \[\begin{align*} A\mathbf{v} = \lambda \mathbf{v} \end{align*}\] For a fixed \(\lambda\), the corresponding eigenspace \(E_\lambda\) is the span of all such vectors.
Similar matrices have identical eigenvalues and multiplicities.
Eigenvectors corresponding to distinct eigenvalues are always linearly independent
\(A\) has \(n\) distinct eigenvalues \(\implies A\) has \(n\) linearly independent eigenvectors.
A matrix \(A\) is diagonalizable \(\iff A\) has \(n\) linearly independent eigenvectors.
For \(\lambda\in \operatorname{Spec}(A)\), \[\begin{align*} \mathbf{v}\in E_\lambda \iff \mathbf{v} \in \ker(A-I\lambda) .\end{align*}\]
Some miscellaneous useful facts:
\(\lambda \in \operatorname{Spec}(A) \implies \lambda^2 \in \operatorname{Spec}(A^2)\) with the same eigenvector.
\(\prod \lambda_i = \det A\)
\(\sum \lambda_i = \mathrm{Tr}~A\)
An \(n\times n\) matrix \(A\) is diagonalizable iff its eigenvectors span all of \(\mathbb{R}^n\) (i.e. there are \(n\) linearly independent eigenvectors).
Equivalently, \(A\) is diagonalizable iff there is a basis of \(\mathbb{R}^n\) consisting of eigenvectors of \(A\).
\[\begin{align*} A \mathrel{\vcenter{:}}=\left[ \begin{array} { c c } { 1 } & { 1 } \\ { 0 } & { 1 } \end{array} \right] &\implies A^n = \left[ \begin{array} { c c } { 1 } & { n } \\ { 0 } & { 1 } \end{array} \right], && \operatorname{Spec}(A) = [1,1] \\ A \mathrel{\vcenter{:}}=\left[ \begin{array} { c c } { 1 } & { 1 } \\ { 0 } & { - 1 } \end{array} \right] &\implies A^2 = I_2, && \operatorname{Spec}(A) = [1, -1] \end{align*}\]
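These two examples are easy to sanity-check in sympy; a short sketch (output comments describe the expected results, up to ordering):

```python
from sympy import Matrix

A = Matrix([[1, 1], [0, 1]])     # Jordan block: eigenvalue 1 with multiplicity 2
B = Matrix([[1, 1], [0, -1]])    # distinct eigenvalues 1 and -1

print(A.eigenvals())             # {1: 2}: only a 1-dimensional eigenspace
print(A.is_diagonalizable())     # False
print(B.eigenvals())             # {1: 1, -1: 1}
print(B.is_diagonalizable())     # True: n distinct eigenvalues

P, D = B.diagonalize()           # B = P * D * P^{-1}
assert P * D * P.inv() == B
assert B**2 == Matrix.eye(2)                 # matches B^2 = I_2 above
assert A**5 == Matrix([[1, 5], [0, 1]])      # matches the A^n pattern above
```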
Determine a basis for \[\begin{align*} S = \left\{a_0 + a_1 x + a_2 x^2\mathrel{\Big|}a_0,a_1,a_2 \in \mathbb{R} \land a_0 - a_1 -2a_2 =0\right\}. \end{align*}\]
Let \(a_2=t, a_1=s, a_0=s+2t\), then \[\begin{align*} S &= \left\{ (s+2t) + (sx+tx^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \left\{ (s+sx) + (2t+tx^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \left\{ s(1+x) + t(2+x^2)\mathrel{\Big|}s,t\in\mathbb{R} \right\} \\ &= \text{span}\left\{(1+x),(2+x^2)\right\} \end{align*}\]
and a basis for \(S\) is \[\begin{align*} \left\{(1+x), (2+x^2)\right\} \end{align*}\]
If \(V\) is an \(n\)-dimensional vector space, can every set \(S\) with fewer than \(n\) vectors be extended to a basis for \(V\)?
No: only linearly independent sets with fewer than \(n\) vectors can be extended to form a basis for \(V\).
Does the set of all \(3 \times 3\) symmetric matrices form a three-dimensional subspace of \(M_{3}(\mathbb{R})\)?
No: this set forms a 6-dimensional subspace, so a basis for it requires six elements.
Given \(A=\) \[\begin{align*}\begin{bmatrix} 1 & 3 \\ -2 & -6 \end{bmatrix}\end{align*}\] what is the dimension of the null space of \(A\)?
The augmented matrix for the system \(A\mathbf{x} = \mathbf{0}\) is \[\begin{align*}\begin{bmatrix}[cc|c] 1 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix}\end{align*}\] which has one free variable.
Writing one variable in terms of another results in \(x_1 + 3x_2 = 0 \Rightarrow x_1 = -3x_2\).
Let \(x_2 = r\) where \(r \in \mathbb{R}\); then \(S = \left\{ \mathbf{x} \in \mathbb{R}^2 : \mathbf{x} = r(-3,1), r \in \mathbb{R}\right\} = \text{span}\left\{(-3,1)\right\}\).
So the set \(B = \left\{(-3,1)\right\}\) is a basis for the null space of \(A\), and the null space has dimension 1.
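A one-line check of this example in sympy (assuming it's available):

```python
from sympy import Matrix

A = Matrix([[1, 3], [-2, -6]])
print(A.nullspace())               # [Matrix([[-3], [1]])]: basis vector (-3, 1)
print(A * Matrix([-3, 1]))         # the zero vector, so (-3, 1) is in ker(A)
print(A.rank())                    # 1, so nullity = 2 - 1 = 1
```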
Let \(S\) be the subspace of \(\mathbb{R}^3\) that consists of all solutions to the equation \(x-3y+z = 0\). Determine a basis for \(S\), and find dim[\(S\)].
The first goal is to find a way to express the set of 3-tuples that satisfy this equation.
Let \(y=r\) and \(z=s\), then \(x=3r-s\). Then vectors \(\mathbf{v}\) that satisfy the equation are all of the form \[\begin{align*} \mathbf{v} = (3r-s, r, s) = (3r,r,0)+(-s,0,s) = r(3,1,0) + s(-1,0,1). \end{align*}\] (Note: the goal here is to separate the free variables into different vectors so the solution can be written as a linear combination.)
The set \(S\) that satisfies this equation is then \[\begin{align*} S &= \left\{ \mathbf{v} \in \mathbb{R}^3 : \mathbf{v} =r(3,1,0) + s(-1,0,1) \land r,s\in\mathbb{R} \right\} \\ &= \text{span}\left\{ (3,1,0), (-1,0,1)\right\} \end{align*}\]
All that remains is to check that the vectors in this span are linearly independent. This can be done by showing that \[\begin{align*} a(3,1,0) + b(-1,0,1) = (0,0,0) \end{align*}\] forces \(a=b=0\).
Since the two vectors are linearly independent and span the solution set \(S\), they form a basis for \(S\) of dimension 2.
Determine a basis for the subspace of \(M_2(\mathbb{R})\) spanned by \[\begin{align*} A_1 = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -1 & 4 \\ 1 & 1 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 5 & -6 \\ -5 & 1 \end{bmatrix}. \end{align*}\]
Note that because the set contains the zero matrix, it is linearly dependent. So only consider the other three, as they span the same subspace as the original set.
First, determine if \(\left\{ A_1, A_2, A_3\right\}\) is linearly independent. Start with the equation \[\begin{align*} c_1A_1 + c_2A_2 + c_3A_3 = 0_2 \end{align*}\]
which gives \[\begin{align*} c_1 - c_2 + 5c_3 &= 0 \\ 3c_1 + 4c_2 - 6c_3 &= 0 \\ -c_1 + c_2 - 5c_3 &= 0 \\ 2c_1 + c_2 + c_3 &= 0 \end{align*}\]
which has the solution \((-2r,3r,r)\). So the set is linearly dependent by the relation \[\begin{align*} -2A_1 + 3A_2 + A_3 = 0 \text{ or }\\ A_3 = 2A_1 - 3A_2 \end{align*}\]
So \(\left\{A_1, A_2\right\}\) spans the same subspace as the original set. It is also linearly independent, and therefore forms a basis for the original subspace.
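To double-check this kind of computation, one can flatten each matrix into a vector of \(\mathbb{R}^4\) and row reduce; a sympy sketch:

```python
from sympy import Matrix

A1 = Matrix([[1, 3], [-1, 2]])
A2 = Matrix([[-1, 4], [1, 1]])
A3 = Matrix([[5, -6], [-5, 1]])

# Flatten each matrix into a column of R^4 and look for dependencies.
M = Matrix.hstack(*[Ai.reshape(4, 1) for Ai in (A1, A2, A3)])
print(M.rank())        # 2, so the three matrices are linearly dependent
print(M.nullspace())   # one vector proportional to (-2, 3, 1): -2*A1 + 3*A2 + A3 = 0
assert A3 == 2 * A1 - 3 * A2
```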
Let \(A, B \in M_2 (\mathbb{R})\). Define \(\langle A,B\rangle = a_{11}b_{11}+2a_{12}b_{12}+3a_{21}b_{21}\). Does this define an inner product on \(M_2 (\mathbb{R})\)?
Instead, let \(\langle A,B\rangle = a_{11} + b_{22}\). Does this define an inner product on \(M_2(\mathbb{R})\)?
Let \(p=a_0 + a_1 x + a_2 x^2\) and \(q=b_0 + b_1 x + b_2 x^2\). Define \(\langle p,q\rangle = \sum_{i=0}^{2}(i+1)a_i b_i\). Does this define an inner product on \(P_2\)?
The transition matrix from a given basis \(\mathcal{B} = \left\{{\mathbf{b}_i}\right\}_{i=1}^n\) to the standard basis is given by \[\begin{align*} A\mathrel{\vcenter{:}}= \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \end{bmatrix} ,\end{align*}\] and the transition matrix from the standard basis to \(\mathcal{B}\) is \(A^{-1}\).
Given a notion of orthogonality for vectors, we can extend this to matrices. A square matrix \(Q\) is said to be orthogonal iff \(QQ^T = Q^TQ = I\). For rectangular matrices, we have the following characterizations: \[\begin{align*} QQ^T = I \implies &\text{The rows of } Q \text { are orthonormal,} \\ Q^TQ = I \implies &\text{The columns of } Q \text{ are orthonormal.} \end{align*}\]
To remember which condition is which, just recall that matrix multiplication \(AB\) takes the inner product between the rows of \(A\) and the columns of \(B\). So if, for example, we want to inspect whether or not the columns of \(Q\) are orthogonal, we should let \(B=Q\) in the above formulation – then we just note that the rows of \(Q^T\) are indeed the columns of \(Q\), so \(Q^TQ\) computes the inner products between all pairs of the columns of \(Q\) and stores them in a matrix.
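A tiny numpy illustration of the rectangular case (the matrix \(Q\) here is a made-up example with orthonormal columns but too few of them to have orthonormal rows):

```python
import numpy as np

Q = np.array([[1, 0],
              [0, 1],
              [0, 0]], dtype=float)

print(np.allclose(Q.T @ Q, np.eye(2)))   # True: the columns are orthonormal
print(np.allclose(Q @ Q.T, np.eye(3)))   # False: the rows are not
```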
An orthogonal projection \(P\) induces a decomposition \[\begin{align*} \operatorname{dom}(P) = \ker(P) \oplus \ker(P)^\perp .\end{align*}\]
Distance from a point \(\mathbf{p}\) to a line \(\mathbf{a} + t\mathbf{b}\): let \(\mathbf{w} = \mathbf{p} - \mathbf{a}\); then the distance is \({\left\lVert {\mathbf{w} - \operatorname{Proj}_{\mathbf{b}}(\mathbf{w})} \right\rVert}\).
\[\begin{align*} \operatorname{Proj}_{\operatorname{range}(A)}(\mathbf{x}) = A(A^t A)^{-1} A^t \mathbf{x} .\end{align*}\]
Mnemonic: \[\begin{align*} P \approx {A A^t \over A^t A} .\end{align*}\]
With an inner product in hand and a notion of orthogonality, we can define a notion of orthogonal projection of one vector onto another, and more generally of a vector onto a subspace spanned by multiple vectors.
Say we have two vectors \(\mathbf{x}\) and \(\mathbf{y}\), and we want to define “the component of \(\mathbf{x}\) that lies along \(\mathbf{y}\)”, which we’ll call \(\mathbf{p}\). We can work out what the formula should be using a simple model:
We notice that whatever \(\mathbf{p}\) is, it will lie in the direction of \(\mathbf{y}\), and thus \(\mathbf{p} = \lambda \widehat{\mathbf{y}}\) for some scalar \(\lambda\), where in fact \(\lambda = {\left\lVert {\mathbf{p}} \right\rVert}\) since \({\left\lVert {\widehat{\mathbf{y}}} \right\rVert} = 1\). We will find that \(\lambda = {\left\langle {\mathbf{x}},~{\widehat{\mathbf{y}}} \right\rangle}\), and so \[\begin{align*} \mathbf{p} = {\left\langle {\mathbf{x}},~{\widehat{\mathbf{y}}} \right\rangle}\widehat{\mathbf{y}} = \frac{{\left\langle {\mathbf{x}},~{\mathbf{y}} \right\rangle}}{{\left\langle {\mathbf{y}},~{\mathbf{y}} \right\rangle}} \mathbf{y} .\end{align*}\]
Notice that we can then form a “residual” vector \(\mathbf{r} = \mathbf{x} - \mathbf{p}\), which should satisfy \(\mathbf{r} ^\perp\mathbf{p}\). If we were to let \(\lambda\) vary as a function of a parameter \(t\) (making \(\mathbf{r}\) a function of \(t\) as well) we would find that this particular choice minimizes \({\left\lVert {\mathbf{r} (t)} \right\rVert}\).
In general, supposing one has a subspace \(S = \mathrm{span}\left\{{\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_n}\right\}\) and (importantly!) the \(\mathbf{y}_i\) are orthogonal, then the projection \(\mathbf{p}\) of \(\mathbf{x}\) onto \(S\) is given by the sum of the projections onto each basis vector, yielding
\[\begin{align*} \mathbf{p} = \sum_{i=1}^n \frac{{\left\langle {\mathbf{x}},~{\mathbf{y}_i} \right\rangle}}{{\left\langle {\mathbf{y}_i},~{\mathbf{y}_i} \right\rangle}} \mathbf{y}_i = \sum_{i=1}^n {\left\langle {\mathbf{x}},~{\mathbf{y}_i} \right\rangle} \widehat{\mathbf{y}_i} .\end{align*}\]
Note: this is part of why having an orthogonal basis is desirable!
Letting \(A = [\mathbf{y}_1, \mathbf{y}_2, \cdots]\), the following matrix projects vectors onto \(S\), expressing them in terms of the basis \(\mathbf{y}_i\): \[\begin{align*} \tilde P_A = (A^TA)^{-1}A^T, \end{align*}\]
while this matrix performs the projection and expresses the result in terms of the standard basis: \[\begin{align*} P_A = A(A^TA)^{-1}A^T. \end{align*}\]
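A small numpy check of the projection formula \(P = A(A^TA)^{-1}A^T\); the spanning vectors and the vector being projected are arbitrary choices for illustration:

```python
import numpy as np

# Columns of A span the plane S = span{(1,0,1), (0,1,0)} in R^3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection onto col(A), in the standard basis
x = np.array([1.0, 2.0, 3.0])
p = P @ x                               # the projection of x
r = x - p                               # the residual

print(np.allclose(A.T @ r, 0))          # True: the residual is orthogonal to col(A)
print(np.allclose(P @ P, P))            # True: P is idempotent, as a projection should be
```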
Equation of a plane: given a point \(\mathbf{p}_0\) on a plane and a normal vector \(\mathbf{n}\), any vector \(\mathbf{x}\) on the plane satisfies \[\begin{align*} {\left\langle {\mathbf{x} - \mathbf{p}_0},~{\mathbf{n}} \right\rangle} = 0 \end{align*}\]
To find the distance between a point \(\mathbf{a}\) and a plane through \(\mathbf{p}_0\) with normal \(\mathbf{n}\), project \(\mathbf{a} - \mathbf{p}_0\) onto the subspace spanned by the normal: \[\begin{align*} d = {{\left\lvert {{\left\langle {\mathbf{a} - \mathbf{p}_0},~{\mathbf{n}} \right\rangle}} \right\rvert} \over {\left\lVert {\mathbf{n}} \right\rVert}} .\end{align*}\]
One important property of projections is that for any vector \(\mathbf{v}\) and for any subspace \(S\), we have \(\mathbf{v} - P_S(\mathbf{v}) \in S^\perp\). Moreover, if \(\mathbf{v} \in S^\perp\), then \(P_S(\mathbf{v})\) must be zero. This follows by noting that in the projection formula above, every inner product appearing in the sum vanishes, by definition of \(\mathbf{v} \in S^\perp\), and so the projection is zero.
\(\mathbf{x}\) is a least squares solution to \(A\mathbf{x} = \mathbf{b}\) iff \[\begin{align*} A^t A \mathbf{x} = A^t \mathbf{b} \end{align*}\]
The general setup here is that we would like to solve \(A\mathbf{x} = \mathbf{b}\) for \(\mathbf{x}\), where \(\mathbf{b}\) is not in fact in the range of \(A\). We thus settle for a unique “best” solution \(\tilde{\mathbf{x}}\) such that the error \({\left\lVert {A\tilde{\mathbf{x}} - \mathbf{b}} \right\rVert}\) is minimized.
Geometrically, the solution is given by projecting \(\mathbf{b}\) onto the column space of \(A\). To see why this is the case, define the residual vector \(\mathbf{r} = A\tilde{\mathbf{x}} - \mathbf{b}\). We then seek to minimize \({\left\lVert {\mathbf{r}} \right\rVert}\), which happens exactly when \(\mathbf{r} ^\perp\operatorname{im}A\). But this happens exactly when \(\mathbf{r} \in (\operatorname{im}A)^\perp\), which by the fundamental subspaces theorem, is equivalent to \(\mathbf{r} \in \ker A^T\).
From this, we get the equation \[\begin{align*} A^T \mathbf{r} = \mathbf{0} \\ \implies A^T(A \tilde{\mathbf{x}} - \mathbf{b}) = \mathbf{0}\\ \implies A^TA\tilde{\mathbf{x}} = A^T \mathbf{b}, \end{align*}\]
where the last line is described as the normal equations.
If \(A\) is an \(m\times n\) matrix and is of full rank, so it has \(n\) linearly independent columns, then one can show that \(A^T A\) is nonsingular, and we thus arrive at the least-squares solution \[\begin{align*} \tilde{\mathbf{x}} = (A^TA)^{-1}A^T \mathbf{b} \hfill\blacksquare \end{align*}\]
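A quick numerical sketch comparing the normal equations with numpy's built-in least-squares routine, on made-up data for a line fit:

```python
import numpy as np

# Overdetermined system: fit y = c0 + c1*t to four data points (made-up data).
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
A = np.column_stack([np.ones_like(t), t])      # design matrix

# Solve the normal equations A^T A x = A^T b directly ...
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# ... and compare with the library least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_normal, x_lstsq))          # True
```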
These equations can also be derived explicitly using Calculus applied to matrices, vectors, and inner products. This requires the use of the following formulas: \[\begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{a}} \right\rangle} &= \mathbf{a} \\ {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{A}\mathbf{x}} \right\rangle} &= (A+A^T)\mathbf{x} \end{align*}\]
as well as the adjoint formula \[\begin{align*} {\left\langle {A\mathbf{x}},~{\mathbf{x}} \right\rangle} = {\left\langle {\mathbf{x}},~{A^T \mathbf{x}} \right\rangle}. .\end{align*}\]
From these, by letting \(A=I\) we can derive \[\begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\lVert {\mathbf{x}} \right\rVert}^2 = {\frac{\partial }{\partial \mathbf{x}}\,} {\left\langle {\mathbf{x}},~{\mathbf{x}} \right\rangle} = 2\mathbf{x}\\ .\end{align*}\]
The derivation proceeds by solving the equation \[\begin{align*} {\frac{\partial }{\partial \mathbf{x}}\,} {\left\lVert {\mathbf{b} - A\mathbf{x}} \right\rVert}^2 = \mathbf{0}. .\end{align*}\]
Every square matrix over \({\mathbb{C}}\) is similar to a matrix in Jordan canonical form.
Gram-Schmidt is often used to find an orthonormal basis for, say, the range of some matrix \(A\). With a small modification to this algorithm, we can write \(A = QR\) where \(R\) is upper triangular and \(Q\) has orthonormal columns.
Why is this useful? One reason is that this also allows for a particularly simple expression of least-squares solutions. If \(A=QR\) and \(A\) has linearly independent columns, then \(R\) is invertible, and a bit of algebraic manipulation shows that \[\begin{align*} \tilde{\mathbf{x}} = R^{-1}Q^T\mathbf{b}. \end{align*}\]
How does it work? You simply perform Gram-Schmidt to obtain \(\left\{{\mathbf{u}_i}\right\}\), then \[\begin{align*}Q = [\mathbf{u}_1, \mathbf{u}_2, \cdots ].\end{align*}\]
The matrix \(R\) can then be written as
\[\begin{align*} r_{ij} = \begin{cases} {\left\langle {\mathbf{u}_i},~{\mathbf{x}_j} \right\rangle}, & i\leq j, \\ 0, & \text{else}. \end{cases} \end{align*}\]
Explicitly, this yields the matrix \[\begin{align*} R = \begin{bmatrix} {\left\langle {\mathbf{u}_1},~{\mathbf{x}_1} \right\rangle} & {\left\langle {\mathbf{u}_1},~{\mathbf{x}_2} \right\rangle} & {\left\langle {\mathbf{u}_1},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ 0 & {\left\langle {\mathbf{u}_2},~{\mathbf{x}_2} \right\rangle} & {\left\langle {\mathbf{u}_2},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ 0 & 0 & {\left\langle {\mathbf{u}_3},~{\mathbf{x}_3} \right\rangle} & \cdots & \\ \vdots & \vdots & \vdots & \ddots \\ \end{bmatrix} \end{align*}\]
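A numpy sketch of using QR for least squares (numpy's qr computes the reduced factorization by default; the data here are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

Q, R = np.linalg.qr(A)                  # reduced QR: Q is 3x2 with orthonormal columns
x_qr = np.linalg.solve(R, Q.T @ b)      # solve R x = Q^T b (R is upper triangular)

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_qr, x_lstsq))       # True: both give the least-squares solution
```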
Textbook: Leon, Linear Algebra with Applications
\[\begin{align*} p(y)y' = q(x) && \hspace{10em} \text{separable} \\ \\ y'+p(x)y = q(x) && \text{integrating factor} \\ \\ y' = f(x,y), f(tx,ty) = f(x,y) && y = xV(x)\text{ COV reduces to separable} \\ \\ y' +p(x)y = q(x)y^n && \text{Bernoulli, divide by $y^n$ and COV $u = y^{1-n}$} \\ \\ M(x,y)dx + N(x,y)dy = 0 && M_y = N_x: \phi(x,y) = c (\phi_x =M, \phi_y = N) \\ \\ P(D)y = f(x,y) && x^ke^{rx} \text{ for each root } \end{align*}\]
where a pair of complex roots \(z = a \pm bi\) yields the real solutions \(e^{ax}\cos bx,\ e^{ax}\sin bx\).
General form: \[\begin{align*} y^{(n)} + c_{n-1} y^{(n-1)} + \cdots + c_2y'' + c_1y' + c_0y = 0 \\ p(D)y = \prod (D-r_i)^{m_i} y= 0 \end{align*}\] where \(p\) is a polynomial in the differential operator \(D\) with roots \(r_i\):
Real roots: contribute \(m_i\) solutions of the form \[\begin{align*}e^{rx}, xe^{rx}, \cdots, x^{m_i-1}e^{rx}\end{align*}\]
Complex conjugate roots: for \(r=a+bi\), contribute \(2m_i\) solutions of the form \[\begin{align*}e^{(a\pm bi)x}, xe^{(a\pm bi)x}, ~\cdots,~ x^{m_i-1}e^{(a\pm bi)x} \\ = e^{ax}\cos(bx), e^{ax}\sin(bx),~ xe^{ax}\cos(bx), xe^{ax}\sin(bx),~ \cdots,~ \end{align*}\]
Example: by cases, for a second order equation of the form \[\begin{align*}ay'' + by' + cy = 0:\end{align*}\]

- Two distinct real roots: \(c_1 e^{r_1 x} + c_2 e^{r_2 x}\)
- One repeated real root: \(c_1 e^{rx} + c_2 x e^{rx}\)
- Complex conjugates \(\alpha \pm i \beta\): \(e^{\alpha x}(c_1 \cos \beta x + c_2 \sin \beta x)\)
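These cases can be sanity-checked with sympy's ODE solver; a minimal sketch (the two equations are arbitrary examples of the distinct-real-root and complex-root cases, and sympy's printed form may differ slightly):

```python
from sympy import Function, dsolve, symbols, Eq

x = symbols('x')
y = Function('y')

# Distinct real roots: y'' - 3y' + 2y = 0 has characteristic roots r = 1, 2.
print(dsolve(Eq(y(x).diff(x, 2) - 3*y(x).diff(x) + 2*y(x), 0), y(x)))
# expected: y(x) = C1*exp(x) + C2*exp(2*x)

# Complex roots: y'' + 2y' + 5y = 0 has roots -1 +- 2i.
print(dsolve(Eq(y(x).diff(x, 2) + 2*y(x).diff(x) + 5*y(x), 0), y(x)))
# expected: y(x) = (C1*sin(2*x) + C2*cos(2*x))*exp(-x)
```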
General form: \[\begin{align*} y^{(n)} + c_{n-1} y^{(n-1)} + \cdots + c_2y'' + c_1y' + c_0y = F(x) \\ p(D)y = \prod (D-r_i)^{m_i} y = F(x) \end{align*}\]
Then solutions are of the form \(y_c + y_p\), where \(y_c\) is the solution to the associated homogeneous system and \(y_p\) is a particular solution.
Methods of obtaining particular solutions
Useful Annihilators: \[\begin{align*} &F(x) = p(x): & D^{\deg(p)+1} \\ &F(x) = p(x)e^{ax}: & (D-a)^{\deg(p)+1}\\ &F(x) = \cos(ax) + \sin(ax): & D^2 + a^2\\ &F(x) = e^{ax}(a_0\cos(bx) + b_0\sin(bx)): & (D-z)(D-{\overline{{z}}}) = D^2 -2aD + a^2 + b^2 \\ &F(x) = p(x)e^{ax}\cos(bx) + q(x)e^{ax}\sin(bx): & \left( (D-z)(D-{\overline{{z}}})\right)^{\max(\deg(p), \deg(q))+ 1} \end{align*}\] where \(z = a + bi\).
General form: \[\begin{align*} \frac{\partial \mathbf{x}(t) }{\partial t} = A\mathbf{x}(t) + \mathbf{b}(t) \iff \mathbf{x}'(t) = A\mathbf{x}(t) + \mathbf{b}(t) \end{align*}\]
General solution to homogeneous equation: \[\begin{align*} c_1\mathbf{x_1}(t) + c_2\mathbf{x_2}(t)+ \cdots +c_n\mathbf{x_n}(t) = \mathbf{X}(t)\mathbf{c} \end{align*}\]
If \(A\) is a matrix of constants: \(\mathbf{x}(t) = e^{\lambda_i t}~\mathbf{v}_i\) is a solution for each eigenvalue/eigenvector pair \((\lambda_i, \mathbf{v}_i)\). If \(A\) is defective, you'll need generalized eigenvectors.
Inhomogeneous Equation: particular solutions given by \[\begin{align*} \mathbf{x}_p(t) = \mathbf{X}(t) \int^t \mathbf{X}^{-1}(s)\mathbf{b}(s) ~ds \end{align*}\]
Definitions: \[\begin{align*} H_ { a } ( t ) \mathrel{\vcenter{:}}=\left\{ \begin{array} { l l } { 0 , } & { 0 \leq t < a } \\ { 1 , } & { t \geq a } \end{array} \right.\\ \delta(t): \int_{\mathbb{R}}\delta(t-a)f(t)~dt &= f(a),\quad \int_{\mathbb{R}}\delta(t-a)~dt = 1\\ (f \ast g )(t) &= \int_0^t f(t-s)g(s)~ds \\ L[f(t)] &= L[f] =\int_0^\infty e^{-st}f(t)dt = F(s) .\end{align*}\] Useful property: for \(a\leq b\), \(H_a(t) - H_b(t) = \indic{[a,b]}\). \[\begin{align*} t^n, n\in{\mathbb{N}}\quad&\iff &\frac{n!}{s^{n+1}},\quad &s > 0 \\ t^{-\frac{1}{2}} \quad&\iff &\sqrt{\pi} s^{-\frac{1}{2}}\quad & s>0\\ e^{at} \quad&\iff &\frac{1}{s-a},\quad &s > a \\ \cos(bt) \quad&\iff &\frac{s}{s^2+b^2},\quad &s>0 \\ \sin(bt) \quad&\iff &\frac{b}{s^2+b^2},\quad &s>0 \\ \cosh(bt) \quad&\iff &\frac{s}{s^2 - b^2},\quad &s > {\left\lvert {b} \right\rvert} \\ \sinh(bt) \quad&\iff &\frac{b}{s^2-b^2},\quad &s > {\left\lvert {b} \right\rvert} \\ \delta(t-a) \quad&\iff &e^{-as} \quad& \\ H_a(t) \quad&\iff &s^{-1}e^{-as}\quad& \\ e^{at}f(t) \quad&\iff &F(s-a)\quad & \\ H_a(t)f(t-a) \quad&\iff &e^{-as}F(s)& \\ f'(t) \quad&\iff & sL(f) - f(0) & \\ f''(t) \quad&\iff &s^2L(f) -sf(0) - f'(0) &\\ f^{(n)}(t) \quad&\iff & s^nL(f) - \sum_{i=0}^{n-1} s^{n-1-i}f^{(i)}(0) & \\ (f \ast g)(t) \quad&\iff &F(s)G(s)\quad & \end{align*}\]
\[\begin{align*} L[e^{at} f(t)] = \int_0^\infty e^{(a-s)t}f(t)dt = F(s-a), .\end{align*}\]
The general technique for solving differential equations with Laplace Transforms:

- Take the Laplace Transform of all terms on both sides.
- Solve for \(L[y]\) in terms of \(s\).
- Attempt an inverse Laplace Transform.
- This may involve partial fraction decomposition, completing the square, and splitting numerators to match terms with known inverse transforms.
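A few of the table entries and the shift rule can be checked with sympy's `laplace_transform` (a sketch; the exact printed form of each result may differ from the comments):

```python
from sympy import symbols, laplace_transform, inverse_laplace_transform, sin, exp, Heaviside

t, s, a, b = symbols('t s a b', positive=True)

print(laplace_transform(exp(a*t), t, s, noconds=True))         # 1/(s - a), valid for s > a
print(laplace_transform(sin(b*t), t, s, noconds=True))         # b/(b**2 + s**2)
print(laplace_transform(Heaviside(t - a), t, s, noconds=True)) # exp(-a*s)/s

# The shift rule L[e^{at} f(t)] = F(s - a), checked on f(t) = sin(bt):
print(laplace_transform(exp(a*t)*sin(b*t), t, s, noconds=True))  # b/(b**2 + (s - a)**2)

# Inverting recovers the original function (times a Heaviside factor):
print(inverse_laplace_transform(1/(s - a), s, t))              # exp(a*t)*Heaviside(t)
```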
For a collection of \(n\) functions \(f_i: {\mathbb{R}}\to {\mathbb{R}}\), define the \(n\times 1\) column vector \[\begin{align*} W(f_i)(\mathbf{p}) \mathrel{\vcenter{:}}= \begin{bmatrix} f_i(\mathbf{p}) \\ f_i'(\mathbf{p}) \\ f_i''(\mathbf{p}) \\ \vdots \\ f_i^{(n-1)}(\mathbf{p}) \end{bmatrix} .\end{align*}\]
The Wronskian of this collection is defined as \[\begin{align*} W(f_1, \cdots, f_n)(\mathbf{p}) \mathrel{\vcenter{:}}= \det \begin{bmatrix} \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ W(f_1)(\mathbf{p}) & W(f_2)(\mathbf{p}) & \cdots & W(f_n)(\mathbf{p})\\ \rule[-1ex]{0.5pt}{2.5ex}& \rule[-1ex]{0.5pt}{2.5ex}& & \rule[-1ex]{0.5pt}{2.5ex}\\ \end{bmatrix} .\end{align*}\]
If \(\exists x_0 \in I: W(x_0) \neq 0\), then the set of functions \(\left\{{f_i}\right\}\) is linearly independent on \(I\). (The converse fails; see below.)
\(W \equiv 0\) on \(I\) does not imply that \(\left\{{f_i}\right\}\) is linearly dependent! Counterexample: \(\left\{{x^2, x{\left\lvert {x} \right\rvert}}\right\}\) on \({\mathbb{R}}\): here \(W \equiv 0\), but the two functions are linearly independent. (The set \(\left\{{x, x+x^2, 2x-x^2}\right\}\) also has \(W \equiv 0\), but only because it is genuinely dependent: \(3(x) = (x+x^2) + (2x-x^2)\).)
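A sympy sketch of the Wronskian computations (the exponential pair is an arbitrary example of an independent set with nonvanishing Wronskian):

```python
from sympy import symbols, exp, simplify, wronskian

x = symbols('x')

# The dependent polynomial set mentioned above has W identically zero:
print(simplify(wronskian([x, x + x**2, 2*x - x**2], x)))   # 0

# Exponentials with distinct rates are independent; their Wronskian never vanishes:
print(simplify(wronskian([exp(x), exp(2*x)], x)))          # exp(3*x)
```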
The standard form of such equations is \[\begin{align*} y^{(n)} + a_1y^{(n-1)} + a_2y^{(n-2)} + \cdots + a_{n-2}y'' + a_{n-1}y' + a_ny = F(x). \end{align*}\]
All solutions will be the sum of the solution to the associated homogeneous equation and a single particular solution.
In the homogeneous case, examine the discriminant of the characteristic polynomial. Three cases arise:
That is, every real root contributes a term of \(ce^{rx}\), while a multiplicity of \(m\) multiplies the solution by a polynomial in \(x\) of degree \(m-1\).
Every pair of complex roots \(r \pm i\omega\) contributes a term \(e^{rx}(a\cos \omega x + b\sin \omega x)\), where \(r\) is the real part of the roots and \(\omega\) is the imaginary part.
In the nonhomogeneous case, assume a solution in the most general form of \(F(x)\), and substitute it into the equation to solve for the constants. For example, for \(F(x) = e^{2x}\) guess \(y_p = Ae^{2x}\); for \(F(x) = \cos(3x)\) guess \(y_p = A\cos(3x) + B\sin(3x)\).
Annihilator method: apply an annihilator of \(F(x)\) to reduce a nonhomogeneous equation to a homogeneous one, written as a polynomial in the operator \(D\).
\(F(x)\) of the form \(e^{ax}\sin(kx)\) can be handled by writing it as the imaginary part of \(e^{(a+ki)x}\).
Burnside’s Lemma
Cauchy’s Theorem
The Sylow Theorems
Galois Theory
More terms: http://mathroughguides.wikidot.com/glossary:abstract-algebra
Order \(p\): One, \(Z_p\)
Order \(p^2\): Two abelian groups, \(Z_{p^2}, Z_p^2\)
Order \(p^3\):
3 abelian: \(Z_{p^3},\ Z_p \times Z_{p^2},\ Z_p^3\);
2 nonabelian: for odd \(p\), \(Z_{p^2} \rtimes Z_p\) and the Heisenberg group \((Z_p \times Z_p) \rtimes Z_p\); for \(p = 2\), the dihedral group \(D_4\) and the quaternion group \(\mathbf{Q}\).
Order \(pq\) (for primes \(p < q\)): if \(p \nmid q-1\), only \(Z_{pq}\); if \(p \mathrel{\Big|}q-1\), there is also exactly one nonabelian group, \(Z_q \rtimes Z_p\).
Every element in a permutation group is a product of disjoint cycles, and its order is the lcm of the lengths of those cycles.
The product ideal \(IJ\) is not just elements of the form \(ij\), it is all sums of elements of this form! The product alone isn’t enough.
The intersection of any number of ideals is also an ideal
\[\begin{align*} C(x) = && \left\{{g\in G : gxg^{-1} = x}\right\} && \subseteq G && \text{Centralizer} \\ C_G(x) = && \left\{{gxg^{-1} : g\in G}\right\} && \subseteq G && \text{Conjugacy Class} \\ Gx = && \left\{{g.x : g\in G}\right\} && \subseteq X && \text{Orbit} \\ x_0 = && \left\{{g\in G : g.x = x}\right\} && \subseteq G && \text{Stabilizer} \\ Z(G) = && \left\{{x\in G: \forall g\in G,~ gxg^{-1} = x}\right\} && \subseteq G && \text{Center} \\ \mathrm{Inn}(G) = && \left\{{\phi_g(x) = gxg^{-1} }\right\} && \subseteq {\operatorname{Aut}}(G) && \text{Inner Aut.} \\ \mathrm{Out}(G) = && {\operatorname{Aut}}(G) / \mathrm{Inn}(G) && && \text{Outer Aut.} \\ N(H) = && \left\{{g\in G: gHg^{-1} = H}\right\} && \subseteq G && \text{Normalizer} \end{align*}\]
Notation: \(H < G\) a subgroup, \(N < G\) a normal subgroup, concatenation is a generic group operation.
\({\mathbb{Z}}_n\) the unique cyclic group of order \(n\)
\(\mathbf{Q}\) the quaternion group
\(G^n = G\times G \times \cdots \times G\)
\(Z(G)\) the center of \(G\)
\(o(G)\) the order of a group
\(S_n\) the symmetric group
\(A_n\) the alternating group
\(D_n\) the dihedral group of order \(2n\)
Group Axioms
Definitions:
One step subgroup test: \[\begin{align*} a,b \in H \implies a b^{-1} \in H \\ .\end{align*}\]
Useful isomorphism invariants:
Useful homomorphism invariants
Classification of Finite Abelian Groups \[\begin{align*} G \cong {\mathbb{Z}}_{p_1^{k_1}} \oplus {\mathbb{Z}}_{p_2^{k_2}} \oplus \cdots \oplus {\mathbb{Z}}_{p_n^{k_n}} ,\end{align*}\] where the \(p_i^{k_i}\) are the elementary divisors of \(G\) (the \(p_i\) need not be distinct).
Isomorphism Theorems
\[\begin{align*} \phi: G \to G' \implies && \frac{G}{\ker{\phi}} \cong &~ \phi(G) \\ H {~\trianglelefteq~}G,~ K < G \implies && \frac{K}{H\cap K} \cong &~ \frac{HK}{H} \\ H,K {~\trianglelefteq~}G,~ K < H \implies && \frac{G/K}{H/K} \cong &~ \frac{G}{H} \end{align*}\]
Lagrange’s Theorem: \(H < G \implies o(H) \mathrel{\Big|}o(G)\)
The \(GZ\) Theorem: \(G/Z(G)\) cyclic implies that \(G \in \mathbf{Ab}\).
Orbit Stabilizer Theorem: \(G / x_0 \cong Gx\)
The Class Equation
\[\begin{align*} {\left\lvert {G} \right\rvert} = {\left\lvert {Z(G)} \right\rvert} + \sum_{x_i} \left[ G: C(x_i) \right] ,\end{align*}\] where the sum runs over representatives \(x_i\) of the conjugacy classes \([x_i] = C_G(x_i)\) with more than one element, and \(\left[ G: C(x_i)\right] = {\left\lvert {C_G(x_i)} \right\rvert}\) is the size of the class.
Useful facts:
Sylow Subgroups:
Big List of Interesting Groups
Chinese Remainder Theorem: \[\begin{align*} {\mathbb{Z}}_{pq} \cong {\mathbb{Z}}_p \oplus {\mathbb{Z}}_q \iff (p,q) = 1 \end{align*}\]
Finding all of the unique groups of a given order: #todo
\[\begin{align*} (a, b) \mathrel{\vcenter{:}}=\gcd(a, b) && \text{the greatest common divisor} \\ {\mathbb{Z}}_n && \text{the ring of integers} \mod n \\ {\mathbb{Z}}_n^{\times}&& \text{the group of units}\mod n .\end{align*}\]
A function \(f:{\mathbb{Z}}\to {\mathbb{Z}}\) is said to be multiplicative iff \[\begin{align*} (a, b) = 1 \implies f(ab) = f(a) f(b) .\end{align*}\]
Every integer \(n > 1\) can be written uniquely as \[\begin{align*} n = \prod_{i=1}^m p_i^{k_i} \end{align*}\] where the \(p_i\) are the \(m\) distinct prime divisors of \(n\).
Note that the number of distinct prime factors is \(m\), while the total number of factors is \(\prod_{i=1}^m(k_i + 1)\).
\[\begin{align*} a{~\Bigm|~}b \iff b \equiv 0 \mod a \iff \exists k \text{ such that } ak = b \end{align*}\]
\(\gcd(a, b)\) can be computed by taking prime factorizations of \(a\) and \(b\), intersecting the primes occurring, and taking the lowest exponent that appears. Dually, \(\operatorname{lcm}(a, b)\) can be computed by taking the union and the highest exponent.
\[\begin{align*} xy = \gcd{(x,y)}~\mathrm{lcm}{(x,y)} \end{align*}\]
If \(d\mathrel{\Big|}x\) and \(d\mathrel{\Big|}y\), then \[\begin{align*} \gcd(x,y) &= d\cdot \gcd\qty{ \frac x d, \frac y d} \\ \operatorname{lcm}(x,y) &= d\cdot \operatorname{lcm}\qty{ \frac x d, \frac y d} \end{align*}\]
\[\begin{align*} \gcd(x, y, z) &= \gcd(\gcd(x,y), z) \\ \gcd(x, y) &= \gcd(x\bmod y, y) \\ \gcd(x,y) &= \gcd(x-y, y) .\end{align*}\]
\(\gcd(a, b)\) can be computed via the Euclidean algorithm: repeatedly replace the larger number by its remainder upon division by the smaller; the last nonzero remainder is \(\gcd(a,b)\).
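A minimal Python sketch of the Euclidean algorithm and its extended version (the inputs are arbitrary):

```python
def gcd(a, b):
    """Euclidean algorithm: repeatedly replace (a, b) with (b, a mod b)."""
    while b:
        a, b = b, a % b
    return a

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b) (Bezout coefficients)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

print(gcd(462, 1071))                    # 21
g, x, y = extended_gcd(462, 1071)
print(462 * x + 1071 * y == g)           # True: the Bezout identity
lcm = 462 * 1071 // g
print(g * lcm == 462 * 1071)             # True: xy = gcd(x,y) * lcm(x,y)
```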
Generally concerned with the multiplicative group \(({\mathbb{Z}}_n, \times)\).
The system \[\begin{align*} \begin{array} { c } { x \equiv a _ { 1 } \mod m _ { 1 } } \\ { x \equiv a _ { 2 } \mod m _ { 2 } } \\ { \vdots } \\ { x \equiv a _ { r } \mod m _ { r } } \end{array} \end{align*}\]
has a unique solution \(x \mod \prod m_i \iff (m_i, m_j) = 1\) for each pair \(i,j\), given by \[\begin{align*} x = \sum_{j=1}^r a_j \frac{\prod_i m_i}{m_j} \left[ \frac{\prod_i m_i}{m_j} \right]^{-1}_{\mod m_j} .\end{align*}\]
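A short Python sketch of this formula (assumes Python 3.8+ for `pow(N, -1, m)` and `math.prod`; the system solved is the classic \(x \equiv 2, 3, 2 \mod 3, 5, 7\)):

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_j (mod m_j) for pairwise-coprime moduli; return x mod prod(m_j)."""
    M = prod(moduli)
    x = 0
    for a_j, m_j in zip(residues, moduli):
        N_j = M // m_j
        x += a_j * N_j * pow(N_j, -1, m_j)   # pow(N, -1, m) is the inverse of N mod m
    return x % M

print(crt([2, 3, 2], [3, 5, 7]))   # 23, and indeed 23 = 2 mod 3 = 3 mod 5 = 2 mod 7
```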
For \((a, n) = 1\), \[\begin{align*} a^{\phi(n)} \equiv 1 \mod n .\end{align*}\]
\[\begin{align*} x^{p} &\equiv x \mod p \\ x^{p-1} &\equiv 1 \mod p \quad \text{ if } p \nmid x \end{align*}\]
Consider \(ax + by = c\). This has solutions iff \(c = 0 \mod (a,b) \iff \gcd(a,b) \text{ divides } c\).
If \(x\equiv 0 \mod n\), then \(x\equiv 0 \mod p^k\) for all \(p^k\) appearing in the prime factorization of \(n\).
If there are factors of the modulus in the equation, peel them off with addition, using the fact that \(nk \equiv 0 \mod n\). \[\begin{align*} x &\equiv nk + r \mod n \\ &\equiv r \mod n \end{align*}\]
So take \(x=463, n = 4\), then use \(463 = 4\cdot 115 + 3\) to write \[\begin{align*} 463 &\equiv y \mod 4 \\ \implies 4\cdot 115 + 3 &\equiv y \mod 4 \\ \implies 3&\equiv y\mod 4 .\end{align*}\]
For any \(n\) and any \(d\) dividing \(k\), \[\begin{align*} x^k \mod n \equiv (x^{k/d} \bmod n)^d \mod n .\end{align*}\]
\[\begin{align*} 2^{25} &\equiv (2^5 \mod 5)^5 \mod 5 \\ &\equiv 2^5 \mod 5 \\ &\equiv 2 \mod 5 \end{align*}\]
Make things easier with negatives! For example, \(\mod 5\), \[\begin{align*} 4^{25} &\equiv (-1)^{25} \mod 5\\ &\equiv (-1) \mod 5\\ &\equiv 4 \mod 5 \end{align*}\]
\[\begin{align*} xa = xb \mod n \implies a = b \mod \frac{n}{(x,n)} .\end{align*}\]
\[\begin{align*} x\in {\mathbb{Z}}_n^{\times}\iff (x, n) = 1 ,\end{align*}\] and thus \[\begin{align*} {\mathbb{Z}}_n^\times = \left\{{1\leq x \leq n : (x,n) = 1}\right\} \end{align*}\] and \({\left\lvert {{\mathbb{Z}}_n^\times} \right\rvert} = \phi(n)\).
One can reduce equations by dividing through by a unit. Pick any \(x\) such that \(x{~\Bigm|~}a\) and \(x{~\Bigm|~}b\) with \((x,n) = 1\), then \[\begin{align*} a =b \mod n \implies \frac a x = \frac b x \mod n .\end{align*}\]
\[\begin{align*} \phi(n) = {\left\lvert {\left\{{1\leq x \leq n : (x,n) = 1}\right\}} \right\rvert} \end{align*}\]
\[\begin{align*} \phi(1) & = {\left\lvert {\left\{{1}\right\}} \right\rvert} = 1 \\ \phi(2) & = {\left\lvert {\left\{{1}\right\}} \right\rvert} = 1 \\ \phi(3) & = {\left\lvert {\left\{{1,2}\right\}} \right\rvert} = 2 \\ \phi(4) & = {\left\lvert {\left\{{1,3}\right\}} \right\rvert} = 2 \\ \phi(5) & = {\left\lvert {\left\{{1,2,3,4}\right\}} \right\rvert} = 4 \end{align*}\]
\[\begin{align*} \phi(p) & = p-1 \\ \phi(p^k) & = p^{k-1}(p-1) \\ \phi(n) &= n\prod_{i=1}^{m} \qty{1 - {1\over p_i}} \quad \text{where } p_1, \cdots, p_m \text{ are the distinct primes dividing } n \\ n &= \sum_{\tiny d{~\Bigm|~}n} \phi(d) \end{align*}\]
All numbers less than \(p\) are coprime to \(p\). Among the \(p^k\) numbers \(1, \ldots, p^k\), the only ones not coprime to \(p^k\) are the multiples of \(p\), i.e. \(\left\{{p, 2p, 3p, \cdots, p^k}\right\}\), of which there are \(p^{k-1}\), yielding \(\phi(p^k) = p^k - p^{k-1}\).
Along with the fact that \(\phi\) is multiplicative, so \((p,q) = 1 \implies \phi(pq) = \phi(p)\phi(q)\), compute this for any \(n\) by taking the prime factorization.
With these properties, one can compute: \[\begin{align*} \phi(n) &= \phi\qty{ \prod_i p_i^{k_i}} \\ &= \prod_i p_i^{k_i-1}(p_i-1) \\ &= n \left(\frac{\prod_i (p_i-1)}{\prod_i p_i}\right) \\ &= n\prod_i \qty{ 1 - \frac{1}{p_i}} \end{align*}\]
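A small Python/sympy sketch of this computation (sympy's `factorint` supplies the prime factorization):

```python
from sympy import factorint

def phi(n):
    """Euler's totient via the product formula phi(n) = n * prod(1 - 1/p)."""
    result = n
    for p in factorint(n):          # iterate over the distinct prime factors of n
        result = result // p * (p - 1)
    return result

print(phi(12))                      # 4: the units mod 12 are {1, 5, 7, 11}
print(phi(7**3), 7**2 * 6)          # 294 294, matching phi(p^k) = p^{k-1}(p-1)
```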
\todo[inline]{Check and explain}
\(x\) is a quadratic residue \(\bmod n\) iff there exists an \(a\) such that \(a^2 = x \mod n\).
In \({\mathbb{Z}}_p^{\times}\) for \(p\) an odd prime, exactly half of the elements (the even powers of a generator) are quadratic residues.
For \(p\) an odd prime, \[\begin{align*} -1\text{ is a quadratic residue in } {\mathbb{Z}}_p \iff p \equiv 1 \mod 4 .\end{align*}\]
If \(n\) is prime and \(n \nmid a\), then \[\begin{align*} a^{n-1} \equiv 1 \mod n \end{align*}\]
If \(n\) is prime, then \[\begin{align*} x^2 \equiv 1 \mod n \implies x \equiv \pm 1 \mod n \end{align*}\] (the converse fails, e.g. for \(n = 9\)).
Every bounded sequence has a convergent subsequence.
In \({\mathbb{R}}^n, X\) is compact \(\iff X\) is closed and bounded.
Necessity of \({\mathbb{R}}^n\): \(X = {\mathbb{Z}}\) with the discrete metric \(d(x,y) = 1\) for \(x \neq y\) is closed, complete, and bounded, but not compact, since \(\left\{{1,2,\cdots}\right\}\) has no convergent subsequence.
The converse holds if bounded is replaced with totally bounded: a metric space is compact iff it is complete and totally bounded.
Notation: \(\left\{{a_n}\right\}_{n\in{\mathbb{N}}}\) is a sequence, \(\displaystyle\sum_{i\in{\mathbb{N}}} a_i\) is a series.
A sequence \(\left\{{x_j}\right\}\) converges to \(L\) iff \[\begin{align*} \forall \varepsilon > 0,\, \exists N > 0 \text{ such that } \quad n\geq N \implies {\left\lvert {x_n - L} \right\rvert} < \varepsilon .\end{align*}\]
\[\begin{align*} b_n \leq a_n \leq c_n \text{ and } b_n,c_n \to L \implies a_n \to L \end{align*}\]
If \(\left\{{a_j}\right\}\) monotone and bounded, then \(a_j \to L = \lim\sup a_i < \infty\).
\({\left\lvert {a_m - a_n} \right\rvert} \to 0 \in {\mathbb{R}}\implies \left\{{a_i}\right\}\) converges.
Is the sequence bounded?
Use algebraic properties of limits
Epsilon-delta definition
Algebraic properties and manipulation:
Limits commute with \(\pm, \times, \operatorname{Div}\) and \(\lim C = C\) for constants.
E.g. Divide all terms by \(n\) before taking limit
Clear denominators
A (power) series in \(x\) is a function of the form \[\begin{align*} f(x) = \sum_{j=1}^\infty c_j x^j .\end{align*}\]
\[\begin{align*} \sum_{k=1}^\infty k^p &< \infty &&\iff p < -1 \\ \sum_{k=1}^\infty \frac{1}{k^p} &< \infty &&\iff p > 1 \\ \sum_{k=1}^\infty \frac{1}{k} &= \infty && \end{align*}\]
\[\begin{align*} \sum_{n=1}^\infty \frac{1}{n^2} & < \infty \\ \sum_{n=1}^\infty \frac{1}{n^3} & < \infty \\ \sum_{n=1}^\infty \frac{1}{n^\frac{3}{2}} & < \infty \\ \sum_{n=0}^\infty \frac{1}{n!} & = e \\ \sum_{n=0}^\infty \frac{1}{c^n} & = \frac{c}{c-1} \quad ({\left\lvert {c} \right\rvert} > 1) \\ \sum_{n=0}^\infty (-1)^n \frac{1}{c^n} & = \frac{c}{c+1} \quad ({\left\lvert {c} \right\rvert} > 1) \\ \sum_{n=1}^\infty (-1)^{n+1} \frac{1}{n} & = \ln 2 \end{align*}\]
\[\begin{align*} \sum_{n=1}^\infty \frac{1}{n} = \infty \\ \sum_{n=1}^\infty \frac{1}{\sqrt n} = \infty \end{align*}\]
Useful reference: http://math.hawaii.edu/~ralph/Classes/242/SeriesConvTests.pdf
\(a_n\to 0\) does not imply \(\sum a_n < \infty\). Counterexample: the harmonic series.
Absolute convergence \(\implies\) convergence
\[\begin{align*} \sum a_i \text{ converges } \implies \lim_{i\to\infty} a_i = 0 \end{align*}\] (the contrapositive is the divergence test; the converse fails, as the harmonic series shows).
\[\begin{align*} R =\lim_{n\to\infty} {\left\lvert {\frac{a_{n+1}}{a_n}} \right\rvert} \end{align*}\]
\[\begin{align*} R = \limsup_{n \to \infty} \sqrt[n]{{\left\lvert {a_n} \right\rvert}} \end{align*}\]

- \(R < 1\): convergent
- \(R > 1\): divergent
- \(R = 1\): inconclusive
For \(f\) positive and decreasing with \(f(n) = a_n\), \[\begin{align*} \sum a_n < \infty \iff \int_1^\infty f(x) dx < \infty \end{align*}\]
For \(a_n, b_n > 0\), \[\begin{align*} \lim_{n\to\infty}\frac{a_n}{b_n} = L \text{ with } 0 < L < \infty \implies \qty{\sum a_n < \infty \iff \sum b_n < \infty} \end{align*}\]
\[\begin{align*} a_n \downarrow 0 \implies \sum (-1)^n a_n < \infty \end{align*}\]
\[\begin{align*} \sum_{n=1}^\infty {\left\lVert {f_n} \right\rVert}_\infty < \infty \implies \exists f\text{ such that } {\left\lVert { \sum_{n=1}^\infty f_n - f} \right\rVert}_\infty \to 0 \end{align*}\] In other words, the series converges uniformly.
Slogan: Convergence of the sup norms implies uniform convergence"
The \(M\) in the name comes from defining \(M_n \mathrel{\vcenter{:}}= \sup_x {\left\lvert {f_n(x)} \right\rvert}\) and requiring \(\sum M_n < \infty\).
Some Pattern Recognition:
Use the fact that \[\begin{align*} \lim_{k\to\infty} {\left\lvert {\frac{a_{k+1}x^{k+1}}{a_kx^k}} \right\rvert} = {\left\lvert {x} \right\rvert}\lim_{k\to\infty} {\left\lvert {\frac{a_{k+1}}{a_k}} \right\rvert} < 1 \implies \sum a_k x^k < \infty ,\end{align*}\] so take \(L \mathrel{\vcenter{:}}=\lim_{k\to\infty} \frac{a_{k+1}}{a_k}\) and then obtain the radius as \[\begin{align*} R = \frac{1}{L} = \lim_{k\to\infty} {a_k \over a_{k+1}} \end{align*}\]
A function is continuously differentiable iff \(f\) is differentiable and \(f'\) is continuous.
Conventions:
\[\begin{align*} f && \text{a functional }{\mathbb{R}}^n \to {\mathbb{R}}\\ \mathbf{f} && \text{a function } {\mathbb{R}}^n\to {\mathbb{R}}^m \\ A, E, U, V && \text{open sets} \\ A' && \text{the limit points of }A \\ \mkern 1.5mu\overline{\mkern-1.5muA\mkern-1.5mu}\mkern 1.5mu && \text{the closure of }A \\ A^\circ && \text{the interior of }A \\ K && \text{a compact set} \\ \mathcal{R}_A && \text{the space of Riemann integrable functions on }A \\ C^j(A) && \text{the space of }j\text{ times continuously differentiable functions }f: {\mathbb{R}}^n \to {\mathbb{R}}\\ \left\{{f_n}\right\} && \text{a sequence of functions} \\ \left\{{x_n}\right\} && \text{a sequence of real numbers}\\ f_n \to f && \text{pointwise convergence} \\ f_n \rightrightarrows f && \text{uniform convergence} \\ x_n \nearrow x && x_i\leq x_j \text{ and }x_j\text{ converges to }x \\ x_n \searrow x && x_i\geq x_j \text{ and }x_j\text{ converges to }x \\ \sum_{k\in {\mathbb{N}}} f_k && \text{a series}\\ D(f) && \text{the set of discontinuities of }f .\end{align*}\]
Summary for GRE:
Limits,
Continuity,
Boundedness,
Compactness,
Definitions of topological spaces,
Lipschitz continuity
Sequences and series of functions.
Know the interactions between the following major operations:
The derivative of a continuous function need not be continuous
A continuous function need not be differentiable
A uniform limit of differentiable functions need not be differentiable
A limit of integrable functions need not be integrable
An integrable function need not be continuous
An integrable function need not be differentiable
\[\begin{align*} f,g\text{ continuous on } [a,b] \text{ and differentiable on } (a,b) \implies \exists c\in(a,b) : \left[f ( b ) - f ( a ) \right] g' ( c ) = \left[g ( b ) - g ( a )\right] f' ( c ) \end{align*}\]
?
Then consider the following possible ways to commute various limiting operations:
Does taking the derivative of the integral of a function always return the original function? \[\begin{align*} [\frac{\partial}{\partial x}, \int dx]:\qquad\qquad \frac{\partial}{\partial x}\int f(x, t)dt =_? \int \frac{\partial}{\partial x} f(x, t)dt\\ \text{} \end{align*}\]
Answer: Only under hypotheses: it holds when \(f\) is continuous (by the FTC), but can fail otherwise.
Counterexample: \[\begin{align*} f(x) = \begin{cases} 1 & x > 0 \\ -1 & x \leq 0 \end{cases} \implies \int f \approx {\left\lvert {x} \right\rvert}, \end{align*}\] which is not differentiable. (This is remedied by the so-called “weak derivative”)
Sufficient Condition: If \(f\) is continuous, then both are always equal to \(f(x)\) by the FTC.
Is the derivative of a continuous function always continuous? \[\begin{align*} [\frac{\partial}{\partial x}, \lim_{x_i\to x}]:\qquad\qquad \lim_{x_i \to x} f'(x_n) =_? f'(\lim_{x_i\to x} x) \end{align*}\] Answer: No.
Counterexample: \[\begin{align*} f ( x ) = \left\{ \begin{array} { l l } { x ^ { 2 } \sin ( 1 / x ) } & { \text { if } x \neq 0 } \\ { 0 } & { \text { if } x = 0 } \end{array} \right. \implies f ^ { \prime } ( x ) = \left\{ \begin{array} { l l } { 2 x \sin \left( \frac { 1 } { x } \right) - \cos \left( \frac { 1 } { x } \right) } & { \text { if } x \neq 0 } \\ { 0 } & { \text { if } x = 0 } \end{array} \right. \end{align*}\] which is discontinuous at zero.
Sufficient Condition: There doesn’t seem to be a general one (which is perhaps why we study \(C^k\) functions).
Is the limit of a sequence of differentiable functions differentiable and the derivative of the limit?
\[\begin{align*} [\frac{\partial}{\partial x}, \lim_{f_n \to f}]:\qquad\qquad \lim_{f_n \to f}\frac{\partial}{\partial x}f_n(x) =_? \frac{\partial }{\partial x}\lim_{f_n \to f} f_n(x) \end{align*}\] Answer: Super no – even the uniform limit of differentiable functions need not be differentiable!
Counterexample: \(f_n(x) = \frac{\sin(nx)}{\sqrt{n}} \rightrightarrows f = 0\) but \(f_n' \not\to f' = 0\)
Sufficient Condition: \(f_n \in C^1\), \(f_n(x_0) \to f(x_0)\) for some \(x_0\), and \(f_n' \rightrightarrows g\); then \(f\) is differentiable with \(f' = g\).
Is the limit of a sequence of integrable functions integrable and the integral of the limit?
\[\begin{align*} [\int dx, \lim_{f_n \to f}](f):\qquad\qquad \lim_{f_n \to f}\int f_n(x) dx =_? \int \lim_{f_n \to f} f_n(x) dx \end{align*}\]
Answer: No.
Counterexample: Order \({\mathbb{Q}}\cap[0,1]\) as \(\left\{{q_i}\right\}_{i\in{\mathbb{N}}}\), then take \[\begin{align*} f_n(x) = \sum_{i=1}^n \indic{q_i} \to \indic{{{\mathbb{Q}}\cap[0,1]}} \end{align*}\] where each \(f_n\) integrates to zero (only finitely many discontinuities) but \(f\) is not Riemann-integrable.
Sufficient Condition:

- \(f_n \rightrightarrows f\), or
- \(f\) integrable and \(\exists M: \forall n, {\left\lvert {f_n} \right\rvert} < M\) (\(f_n\) uniformly bounded)
Is the integral of a continuous function also continuous?
\[\begin{align*} [\int dx, \lim_{x_i \to x}]:\qquad\qquad \lim_{x_i \to x} F(x_i) =_? F(\lim_{x_i \to x} x_i) \end{align*}\]
Answer: Yes.
Proof: \(|f(t)| < M\) on \(I\), so given \(c\) pick a sequence \(x\to c\). Then \[\begin{align*} {\left\lvert {f(t)} \right\rvert} < M \implies \left\vert \int_c^x f(t)dt \right\vert \leq \int_c^x M~ dt \implies {\left\lvert {F(x) - F(c)} \right\rvert} \leq M{\left\lvert {x-c} \right\rvert} \to 0 \end{align*}\]
Is the limit of a sequence of continuous functions also continuous?
\[\begin{align*} [\lim_{x_i \to x}, \lim_{f_n \to f}]: \qquad\qquad \lim_{f_n \to f}\lim_{x_i \to x} f(x_i) =_? \lim_{x_i \to x}\lim_{f_n \to f} f_n(x_i)\\ \text{}\\ \end{align*}\]
Answer: No.
Counterexample: \(f_n(x) = x^n\) on \([0,1]\) converges pointwise to \(\indic{x=1}\), which is discontinuous at \(1\).
Sufficient Condition: \(f_n \rightrightarrows f\)
Does a sum of differentiable functions necessarily converge to a differentiable function?
\[\begin{align*} \left[\frac{\partial}{\partial x}, \sum_{f_n}\right]: \qquad\qquad \frac{\partial}{\partial x} \sum_{k=1}^\infty f_k =_? \sum_{k=1}^\infty \frac{\partial}{\partial x} f_k \\ \text{} \\ \text{}\\ \end{align*}\]
Answer: No.
Counterexample: \(f_n(x) = \frac{\sin(nx)}{\sqrt{n}} \rightrightarrows 0 \mathrel{\vcenter{:}}= f\), but \(f_n' = \sqrt{n}\cos(nx) \not\to 0 = f'\) (at, say, \(x=0\))
Sufficient Condition: When \(f_n \in C^1, \exists x_0: f_n(x_0) \to f(x_0)\) and \(\sum {\left\lVert {f_n'} \right\rVert}_\infty < \infty\) (continuously differentiable, converges at a point, and the derivatives absolutely converge)
\[\begin{align*} f\text{ continuous } \iff \lim_{x \to p} f(x) = f(p) \end{align*}\]
\[\begin{align*} f:(X, d_X) \to (Y, d_Y) \text{ continuous } \iff \forall \varepsilon,~ \exists \delta \mathrel{\Big|}~ d_X(x,y) < \delta \implies d_Y(f(x), f(y)) < \varepsilon \end{align*}\]
\[\begin{align*} f(x) = \sin\qty{ \frac{1}{x} } \implies 0\in D(f) \end{align*}\]
The Dirichlet function is nowhere continuous: \[\begin{align*} f(x) = \indic{{\mathbb{Q}}} \end{align*}\]
The following function is continuous at infinitely many points and discontinuous at infinitely many points: \[\begin{align*} f(x) = \begin{cases} 0 & x\in{\mathbb{R}}\setminus{\mathbb{Q}}\\ \frac{1}{q} & x = \frac{p}{q} \in {\mathbb{Q}} \end{cases} \end{align*}\] Then \(f\) is discontinuous on \({\mathbb{Q}}\) and continuous on \({\mathbb{R}}\setminus{\mathbb{Q}}\).
\(f\) is discontinuous on \({\mathbb{Q}}\):
\(f\) is continuous on \({\mathbb{R}}\setminus{\mathbb{Q}}\):
There are no functions that are continuous on \({\mathbb{Q}}\) but discontinuous on \({\mathbb{R}}-{\mathbb{Q}}\)
A continuous function on a compact space attains its extrema.
\[\begin{align*} f'(p) \mathrel{\vcenter{:}}=\frac{\partial f}{\partial x}(p) = \lim_{x\to p} \frac{f(x) - f(p)}{x-p} \end{align*}\]
\[\begin{align*} C^\infty \subsetneq C^k \subsetneq \text{ differentiable } \subsetneq C^0 \subset \mathcal{R}_K .\end{align*}\]
Proof that \(f\) differentiable \(\implies f \in C^0\): \[\begin{align*} f(x) - f(p) = \frac{f(x)-f(p)}{x-p}(x-p) \xrightarrow{x\to p} f'(p)\cdot 0 = 0 ,\end{align*}\] since the difference quotient tends to \(f'(p)\) while \((x-p) \to 0\).
Bold are assumed hypothesis, regular text is the strongest conclusion you can reach, strikeout denotes implications that aren’t necessarily true.
\[\begin{align*} f' && f && \therefore f && F \\ \hline \\ \cancel{\text{exists}} && \mathbf{continuous} && \text{K-integrable} && \text{exists} \\ \cancel{\text{continuous}} && \mathbf{differentiable} && \text{continuous} && \text{exists} \\ \cancel{\text{exists}} && \mathbf{integrable} && \cancel{\text{continuous}} && \text{differentiable} \\ \end{align*}\]
Explanation of items in table:
If \(F\) is a differentiable function on the interval \([a,b]\), and \(F'\) is bounded and continuous a.e., then \(F' \in L_R([a, b])\) and \[\begin{align*} \forall x\in [a,b]: \int_a^x F'(t)~dt=F(x)-F(a) \end{align*}\]
Suppose \(f\) bounded and continuous a.e. on \([a,b]\), and define \[\begin{align*} F(x) \mathrel{\vcenter{:}}=\int_a^x f(t)~dt \end{align*}\] Then \(F\) is absolutely continuous on \([a,b]\), and for \(p \in [a,b]\), \[\begin{align*} f \in C^0(p) \implies F \text{ differentiable at } p,~ F'(p) = f(p), \text{ and } F' \stackrel{\tiny\mbox{a.e}}{=} f. \end{align*}\]
The Dirichlet function is Lebesgue integrable but not Riemann integrable: \[\begin{align*} f(x) = \indic{x \in {\mathbb{Q}}} \end{align*}\]
Define \[\begin{align*} s_n(x) \mathrel{\vcenter{:}}=\sum_{k=1}^n f_k(x) \end{align*}\] and \[\begin{align*} \sum_{k=1}^\infty f_k(x) \mathrel{\vcenter{:}}=\lim_{n\to\infty} s_n(x), \end{align*}\] which can converge pointwise, absolutely, uniformly, or not all.
If \(\limsup_{k\in {\mathbb{N}}} {\left\lvert {f_k(x)} \right\rvert} \neq 0\), then \(\sum_k f_k(x)\) does not converge.
If \(f\) is injective, then \(f'\) is nonzero in some neighborhood of ???
\[\begin{align*} f_n \to f = \lim_{n\to\infty} f_n .\end{align*}\] Summary: \[\begin{align*} \lim_{f_n \to f} \lim_{x_i \to x} f_n(x_i) \neq \lim_{x_i \to x} \lim_{f_n \to f} f_n(x_i) .\end{align*}\]
\[\begin{align*} \lim_{f_n \to f} \int_I f_n \neq \int_I \lim_{f_n \to f} f_n .\end{align*}\]
Pointwise convergence is strictly weaker than uniform convergence.
\(f_n(x) = x^n\) on \([0, 1]\) converges pointwise but not uniformly.
\[\begin{align*} f_n \text{ continuous} \centernot\implies f\mathrel{\vcenter{:}}=\lim_n f_n \text{ is continuous} .\end{align*}\]
Take \[\begin{align*} f_n(x) = x^n,\quad f_n(x) \to \indic[x = 1] .\end{align*}\]
\[\begin{align*} f_n \text{ differentiable} &\centernot\implies f'_n \text{ converges} \\ f'_n \text{ converges} &\not\implies \lim f'_n = f' .\end{align*}\]
Take \[\begin{align*} f_n(x) = \frac{1}{n}\sin(n^2 x) \to 0,&& \text{but } f'_n = n\cos(n^2 x) \text{ does not converge} .\end{align*}\]
\[\begin{align*} f_n\in \mathcal{R}_I \centernot\implies\lim_{f_n \to f} \int_I f_n \neq \int_I \lim_{f_n \to f} f_n .\end{align*}\]
May fail to converge to same value, take \[\begin{align*} f_n(x) = \frac{2n^2x}{(1+n^2x^2)^2} \to 0 && \text{but }\int_0^1 f_n = 1 - \frac{1}{n^2 + 1} \to 1\neq 0 .\end{align*}\]
Notation: \[\begin{align*} f_n \rightrightarrows f= \lim_{n\to\infty} f_n \text{ and } \sum_{n=1}^\infty f_n \rightrightarrows S .\end{align*}\]
Summary: \[\begin{align*} \lim_{x_i \to x} \lim_{f_n \to f} f_n(x_i) = \lim_{f_n \to f} \lim_{x_i \to x} f_n(x_i) = \lim_{f_n \to f} f_n(\lim_{x_i \to x} x_i) .\end{align*}\]
\[\begin{align*} \lim_{f_n \to f} \int_I f_n = \int_I \lim_{f_n \to f} f_n .\end{align*}\]
\[\begin{align*} \sum_{n=1}^\infty \int_I f_n = \int_I \sum_{n=1}^\infty f_n .\end{align*}\]
“The uniform limit of a(n) \(x\) function is \(x\)”, for \(x \in\) {continuous, bounded}
Equivalent to convergence in the uniform metric on the metric space of bounded functions on \(X\): \[\begin{align*} f_n \rightrightarrows f \iff \sup_{x\in X} {\left\lvert {f_n(x) - f(x)} \right\rvert} \to 0 .\end{align*}\]
\(f_n \rightrightarrows f \implies f_n \to f\) pointwise
\(f_n\) continuous \(\implies f\) continuous
\(f_n \in C^1\), \(\exists x_0: f_n(x_0) \to f(x_0)\), and \(f'_n \rightrightarrows g\) \(\implies f\) differentiable and \(f' = g~\) (i.e. \(f'_n \to f'\))
\(f_n\) integrable \(\implies f\) integrable and \(\int f_n \to \int f\)
\(f_n\) bounded \(\implies f\) bounded
\(f_n \rightrightarrows f \centernot\implies f'_n\) converges
\(f_n' \rightrightarrows f' \centernot\implies f_n \rightrightarrows f\)
\(\left\{{x_i}\right\} \to p \implies\) every subsequence also converges to \(p\).
Every convergent sequence in \(X\) is a Cauchy sequence.
The converse need not hold in general, but if \(X\) is complete, every Cauchy sequence converges. An example of a Cauchy sequence that doesn’t converge: take \(X={\mathbb{Q}}\) and set \(x_i = \pi\) truncated to \(i\) decimal places.
If any subsequence of a Cauchy sequence converges, the entire sequence converges.
\[\begin{align*} d(x,y) &\geq 0 && \text{Positive}\\ d(x,y) &= 0 \iff x = y && \text{Nondegenerate}\\ d(x,y) &= d(y,x) && \text{Symmetric}\\ d(x,y) &\leq d(x,p) + d(p,y) \quad \forall p && \text{Triangle Inequality} .\end{align*}\]
?
?
Open Set Characterization: Arbitrary unions and finite intersections of open sets are open.
Closed Set Characterization: Arbitrary intersections and finite unions of closed sets are closed.
The best source of examples and counterexamples is the open/closed unit interval in \(\mathbb{R}\). Always test against these first!
If \(f\) is a continuous function, the preimage of every open set is open and the preimage of every closed set is closed.
In \({\mathbb{R}}\), singleton sets and finite discrete sets are closed.
A singleton set is closed because its complement is open: \[\begin{align*} \left\{{p}\right\}^c = (-\infty, p) \cup(p, \infty) .\end{align*}\] Similarly, a finite discrete set \(\left\{{p_0, p_1, \cdots, p_n}\right\}\), which wlog (by relabeling) satisfies \(p_0 < p_1 < \cdots < p_n\), has open complement \[\begin{align*} \left\{{p_0, p_1, \cdots, p_n}\right\}^c = (-\infty, p_0) \cup(p_0, p_1) \cup\cdots \cup(p_n, \infty) .\end{align*}\]
This yields a good way to produce counterexamples to continuity.
In \(\mathbb{R}\), singletons are closed. This means any finite subset is closed, as a finite union of singleton sets!
If \(X\) is a compact metric space, then \(X\) is complete and bounded.
If \(X\) complete and \(X \subset Y\), then \(X\) closed in \(Y\).
The converse does not hold in general: a closed subset need not be complete unless the ambient space is complete. Example: \({\mathbb{Q}}\) is closed in \({\mathbb{Q}}\) but not complete, while \({\mathbb{Q}}\) is not closed in \({\mathbb{R}}\).
If \(X\) is compact, then \(Y \subset X \implies Y\) is compact \(\iff\) \(Y\) closed.
A topological space \(X\) is sequentially compact iff every sequence \(\left\{{x_n}\right\}\) has a subsequence converging to a point in \(X\).
If \(X\) is a metric space, \(X\) is compact iff \(X\) is sequentially compact.
Note that in general, neither form of compactness implies the other.
There are functions differentiable only at a single point. Example: \[\begin{align*} f(x) = \begin{cases} x^2 & x\in {\mathbb{Q}}\\ -x^2 & x\in {\mathbb{R}}\setminus{\mathbb{Q}} \end{cases} .\end{align*}\]
This is discontinuous everywhere except for \(x=0\), and you can compute \[\begin{align*} \lim_{h\to 0} {f(0+h) - f(0) \over h} = \lim_{h\to 0} \begin{cases} h & h\in {\mathbb{Q}}\\ -h & h\in {\mathbb{R}}\setminus{\mathbb{Q}} \end{cases} =0 .\end{align*}\]
The product of two non-differentiable functions can be differentiable: take \(f(x) = g(x) = {\left\lvert {x} \right\rvert}\) which are not differentiable at \(x=0\), then \(fg(x) = {\left\lvert {x} \right\rvert}^2\) is differentiable at \(x=0\).
A continuous function that is zero on a dense set \(A\subset X\) is identically zero.
Since \(A\) is dense, for any \(x\in X\setminus A\) take a sequence \(\left\{{x_n}\right\}\) in \(A\) converging to \(x\). Then \(0 = f(x_n) \to f(x)\) implies \(f(x) = 0\).
Epsilon-neighborhood
Limit Point
Isolated Point
Perfect
Interior
Exterior
Closed sets
Open sets
Boundary
Closure
Dense
Connected
Disconnected
Path Connected
Simply Connected
Totally Disconnected
Hausdorff
Compact
The space \(\left\{{\frac{1}{n}}\right\}_{n\in {\mathbb{N}}}\).
List of properties preserved by continuous maps:
Checking if a map is homeomorphism:
\[\begin{align*} L^2(X) &= \left\{{f: X \to {\mathbb{R}}: \int_{\mathbb{R}}f(x) ~dx < \infty}\right\} &&\text{square integrable functions}\\ {\left\langle {g},~{f} \right\rangle}_{2} &= \int_{\mathbb{R}}g(x)f(x) ~dx &&\text{the } L^2 \text{ inner product}\\ {\left\lVert {f} \right\rVert}_2^2 &= {\left\langle {f},~{f} \right\rangle} = \int_{\mathbb{R}}f(x)^2 ~dx &&\text{norm}\\ E[{\,\cdot\,}] &= {\left\langle {{\,\cdot\,}},~{f} \right\rangle} &&\text{expectation}\\ (\tau_{p}f)(x) &= f(p- x) &&\text{translation}\\ (f \ast g)(p) &= \int_{\mathbb{R}}f(t)g(p-t)~dt = \int_{\mathbb{R}}f(t)(T_{p}g)(t) ~dt = {\left\langle {T_pg},~{f} \right\rangle} &&\text{convolution}\\ \end{align*}\]
For \((\Sigma, E, \mu)\) a probability space with sample space \(\Sigma\) and probability measure \(\mu\), a random variable is a function \(X: \Sigma \to {\mathbb{R}}\)
The probability density function (PDF) of \(X\) is the function \(f\) such that for any \(U \subset {\mathbb{R}}\), \[\begin{align*} P(X \in U) = \int_U f(x) ~dx \\ \implies P(a \leq X \leq b) = \int_a^b f(x) ~dx \end{align*}\]
The cumulative distribution function (CDF) is the antiderivative of the PDF: \[\begin{align*} F(x) = P(X \leq x) = \int_{-\infty}^x f(t) ~dt ,\end{align*}\] which yields \({\frac{\partial F}{\partial x}\,} = f(x)\).
\[\begin{align*} E[X] \mathrel{\vcenter{:}}={\left\langle {\text{id}},~{f} \right\rangle} = \int_{\mathbb{R}}x f(x) ~dx .\end{align*}\] Also denoted \(\mu_X\).
\[\begin{align*} E\left[\sum_{i\in{\mathbb{N}}} a_i X_i\right] = \sum_{i\in{\mathbb{N}}} a_i E[X_i] .\end{align*}\] Does not matter whether or not the \(X_i\) are independent.
\[\begin{align*} \mathrm{Var}(X) &= E[(X - E[X])^2] \\ &= \int (x - E[X])^2 f(x) ~dx \\ &= E[X^2] - E[X]^2 \\ &\mathrel{\vcenter{:}}=\sigma^2(X) \end{align*}\] where \(\sigma\) is the standard deviation. It can also be written as \({\left\langle {(\text{id}- {\left\langle {\text{id}},~{f} \right\rangle})^2},~{f} \right\rangle}\): take the component of \(\text{id}\) orthogonal to the constants (i.e. subtract the mean), square it, and integrate it against \(f\).
\[\begin{align*} \mathrm{Var}(aX + b) &= a^2\mathrm{Var}(X) \\ \mathrm{Var}\qty{ \sum_{\mathbb{N}}X_i } &= \sum_i \mathrm{Var}(X_i) + 2 \sum_{i < j}\mathrm{Cov}(X_i, X_j) .\end{align*}\]
\[\begin{align*} \mathrm{Cov}(X,Y) &= E[(X-\mu_X)(Y-\mu_Y)] \\ &= E[XY] - E[X]E[Y] \end{align*}\]
\[\begin{align*} \mathrm{Cov}(X, X) &= \mathrm{Var}(X) \\ \mathrm{Cov}(aX, Y) &= a\mathrm{Cov}(X,Y) \\ \mathrm{Cov}(\sum_{{\mathbb{N}}} X_i, \sum_{\mathbb{N}}Y_j) &= \sum_i \sum_j\mathrm{Cov}(X_i, Y_j) \\ .\end{align*}\]
\[\begin{align*} k! \sim k^{k+\frac{1}{2}}e^{-k} \sqrt{2\pi} .\end{align*}\]
For \(X \geq 0\) and \(a > 0\), \[\begin{align*} P(X \geq a) \leq \frac 1 a E[X] \end{align*}\]
One-sided Chebyshev (Cantelli's inequality): for \(a > 0\), \[\begin{align*} P(X - \mu \geq a) \leq \frac{\sigma^2}{\sigma^2 + a^2} .\end{align*}\]
\[\begin{align*} P({\left\lvert {X - \mu} \right\rvert} \geq k) \leq \left( \frac \sigma k \right)^2 \end{align*}\]
Apply Markov to the variable \((X-\mu)^2\) and \(a=k^2\)
For \(X_i\) i.i.d., \[\begin{align*} \lim_n \frac{\sum_{i=1}^n X_i - n\mu}{\sigma \sqrt n} \sim N(0, 1) .\end{align*}\]
\[\begin{align*} P(\frac{1}{n} \sum X_i \rightarrow \mu) = 1 .\end{align*}\]
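A quick Monte Carlo sanity check of the LLN, CLT, and the Chebyshev bound, using uniform samples (the sample sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 5000

# X_i ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
samples = rng.uniform(0, 1, size=(trials, n))
means = samples.mean(axis=1)

# Law of large numbers: the sample means cluster around mu = 0.5.
print(means.mean())                    # approximately 0.5

# Central limit theorem: sqrt(n) * (mean - mu) / sigma is approximately N(0, 1).
z = np.sqrt(n) * (means - 0.5) / np.sqrt(1 / 12)
print(z.std())                         # approximately 1

# Chebyshev: P(|Z| >= 2) <= 1/4; the empirical frequency is well below the bound.
print((np.abs(z) >= 2).mean())         # roughly 0.05 for an approximately normal Z
```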
For all \(t > 0\), \[\begin{align*} P(X \geq a) \leq e^{-ta}M_X(t) ,\end{align*}\] where \(M_X(t) = E[e^{tX}]\) is the moment generating function.
For \(f\) convex, \[\begin{align*} E[f(X)] \geq f(E[X]) \end{align*}\]
\[\begin{align*} H(X) = - \sum p_i \ln p_i \end{align*}\]