# Multivariable Calculus

:::{.theorem title="Key Theorem"}
Given a differentiable function $f: \RR^n \to \RR$, let $S_k \da \ts{\vector p\in \RR^n \st f(\vector p) = k}$ denote the level set for $k\in \RR$. Then for any $\vector p \in S_k$,
\[
\nabla f(\vector p) \in S_k^\perp
,\]
i.e. the gradient at $\vector p$ is orthogonal to the level set through $\vector p$.
:::

## Notation

\[
\vector{v} &= [v_1, v_2, \cdots] && \text{a vector} \\ \\
\vector{e}_i &= [0, 0, \cdots, \overbrace{1}^{i \text{th term}}, \cdots, 0] && \text{the } i \text{th standard basis vector} \\ \\
\phi: \RR^n &\to \RR && \text{a functional on } \RR^n\\
\phi(x_1, x_2, \cdots) &= \cdots && \\ \\
\mathbf{F}: \RR^n &\to \RR^n && \text{a multivariable function}\\
\mathbf{F}(x_1,x_2,\cdots) &= [\mathbf{F}_1(x_1, x_2, \cdots), \mathbf{F}_2(x_1, x_2, \cdots), \cdots, \mathbf{F}_n(x_1, x_2, \cdots)]
\]

## Partial Derivatives

:::{.definition title="Partial Derivative"}
For a functional $f:\RR^n\to \RR$, the **partial derivative** of $f$ with respect to $x_i$ is
\[
\dd{f}{x_i}(\mathbf p) \da \lim_{h\to 0}\frac{f(\mathbf p + h\mathbf e_i) - f(\mathbf p)}{h}
.\]
:::

:::{.example title="$n = 2$"}
\[
f: \RR^2 &\to \RR \\
\dd{f}{x}(x_0,y_0) &= \lim_{h \to 0} \frac{f(x_0+h, y_0) - f(x_0,y_0)}{h}
.\]
:::

## General Derivatives

:::{.definition title="General definition of differentiability"}
A function $f: \RR^n \to \RR^m$ is **differentiable** at a point $\vector p$ iff there exists a linear transformation $D_f: \RR^n \to \RR^m$ such that
\[
\lim_{\mathbf x \to \vector{p}} \frac{ \left\| f(\mathbf x) - f(\vector{p}) - D_f(\mathbf x - \vector{p}) \right\| }{ \left\| \mathbf x - \vector{p} \right\| } = 0
.\]
:::

:::{.remark}
$D_f$ is the "best linear approximation" to $f$ at $\vector p$.
:::

:::{.definition title="Jacobian"}
When $f$ is differentiable, $D_f$ can be given in coordinates by
\[
(D_f)_{ij} = \dd{f_i}{x_j}
.\]
This yields the **Jacobian** of $f$:
\[
D_f(\vector p) =
\begin{bmatrix}
\vertbar & \vertbar & & \vertbar \\
\nabla f_1(\vector p) & \nabla f_2(\vector p) & \cdots & \nabla f_m(\vector p) \\
\vertbar & \vertbar & & \vertbar
\end{bmatrix}^T
=
\begin{bmatrix}
\dd{f_1}{x_1}(\vector p) & \dd{f_1}{x_2}(\vector p) & \cdots & \dd{f_1}{x_n}(\vector p) \\
\dd{f_2}{x_1}(\vector p) & \dd{f_2}{x_2}(\vector p) & \cdots & \dd{f_2}{x_n}(\vector p) \\
\vdots & \vdots & \ddots & \vdots \\
\dd{f_m}{x_1}(\vector p) & \dd{f_m}{x_2}(\vector p) & \cdots & \dd{f_m}{x_n}(\vector p)
\end{bmatrix}
.\]
:::

:::{.remark}
This is equivalent to

- Taking the gradient of each component $f_i$ of $f$,
- Evaluating $\nabla f_i$ at $\vector p$,
- Forming a matrix using these as the columns, and
- Transposing the resulting matrix.
:::
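For concreteness, here is a small worked example; the particular map $\mathbf F$ is an arbitrary choice, used only to illustrate the definition.

:::{.example title="Jacobian of a map $\RR^2 \to \RR^2$"}
Let $\mathbf F(x, y) = [x^2 y,\, x + \sin(y)]$, so $\nabla \mathbf F_1(x,y) = [2xy,\, x^2]$ and $\nabla \mathbf F_2(x,y) = [1,\, \cos(y)]$. Taking these as columns and transposing,
\[
D_{\mathbf F}(x, y) =
\begin{bmatrix}
2xy & x^2 \\
1 & \cos(y)
\end{bmatrix}
.\]
:::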
:::{.definition title="Hessian"}
For a function $f: \RR^n \to \RR$, the **Hessian** is a generalization of the second derivative, and is given in coordinates by
\[
(H_f)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}
.\]
Explicitly, we have
\[
H_f(\vector p) =
\begin{bmatrix}
\vertbar & \vertbar & & \vertbar \\
\nabla \dd{f}{x_1}(\vector p) & \nabla \dd{f}{x_2}(\vector p) & \cdots & \nabla \dd{f}{x_n}(\vector p) \\
\vertbar & \vertbar & & \vertbar
\end{bmatrix}^T
=
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1 \partial x_1}(\vector p) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(\vector p) \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1}(\vector p) & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n}(\vector p)
\end{bmatrix}
.\]
When $f$ is $C^2$, mixed partials commute and so $H_f(\vector p)$ is symmetric.
:::

:::{.remark}
Mnemonic: make a matrix whose columns are copies of $\nabla f$, then differentiate the columns with respect to $x_1, x_2, \ldots$ from left to right.
:::

## The Chain Rule

:::{.example title="How to expand a partial derivative"}
Write out the tree of dependent variables:

\begin{tikzcd}
& u \arrow[dd] \arrow[rr] \arrow[rrdd] & & x \\
z \arrow[rd] \arrow[ru] \arrow[rrru] \arrow[rrrd] & & & \\
& v \arrow[rr] \arrow[rruu] & & y
\end{tikzcd}

Then sum over each possible path from $z$ to $x$. Letting subscripts denote which variables are held constant,
\[
\left(\dd{z}{x}\right)_y
&= \left(\dd{z}{x}\right)_{u,y,v} \\
& + \left(\dd{z}{v}\right)_{x,y,u} \left(\dd{v}{x}\right)_y \\
& + \left(\dd{z}{u}\right)_{x,y,v} \left(\dd{u}{x}\right)_{v,y} \\
& + \left(\dd{z}{u}\right)_{x,y,v} \left(\dd{u}{v}\right)_{x,y} \left(\dd{v}{x}\right)_y
.\]
:::

## Approximation

Let $z = f(x,y)$. To approximate $f$ near $\vector p_0 = \tv{x_0, y_0}$,
\[
f(\vector x) &\approx f(\vector p_0) + \nabla f(\vector p_0) \cdot (\vector x - \vector p_0) \\
\implies f(x,y) &\approx f(\vector p_0) + f_x(\vector p_0)(x-x_0) + f_y(\vector p_0)(y-y_0)
.\]

## Optimization

### Classifying Critical Points

:::{.definition title="Critical Points"}
The **critical points** of $f$ are the points $\vector p$ at which the derivative vanishes:
\[
\crit(f) = \ts{\vector p\in \RR^n \st D_f(\vector p) = 0}
.\]
:::

:::{.proposition title="Second Derivative Test"}
\envlist
For $f: \RR^2 \to \RR$ and a critical point $\mathbf p$:

1. Compute
\[
\abs{H_f(\mathbf p)} \definedas
\left|
\begin{array}{ll}
f_{xx} & f_{xy} \\
f_{yx} & f_{yy}
\end{array}
\right| (\mathbf p)
.\]

2. Check by cases:

- $\abs{H_f(\mathbf p)} = 0$: No conclusion
- $\abs{H_f(\mathbf p)} < 0$: Saddle point
- $\abs{H_f(\mathbf p)} > 0$:
    - $f_{xx}(\mathbf p) > 0 \implies$ local min
    - $f_{xx}(\mathbf p) < 0 \implies$ local max
:::

:::{.remark}
What's really going on?

- The eigenvalues of $H_f(\mathbf p)$ all have the same sign $\iff$ $H_f(\mathbf p)$ is positive definite or negative definite.
- Positive definite $\implies$ locally convex $\implies$ local min.
- Negative definite $\implies$ locally concave $\implies$ local max.
:::

- On a closed and bounded region, extrema may also occur on the boundary: parameterize each boundary piece to obtain a function of one fewer variable, apply standard optimization techniques to find its critical points, and test all critical points (interior and boundary) to find the extrema.
- If possible, use a constraint to reduce to a function of one variable and optimize as in the single-variable case.

\todo[inline]{Add examples}
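For instance, a worked example of the test with an arbitrarily chosen $f$:

:::{.example title="Classifying critical points"}
Let $f(x, y) = x^3 - 3x + y^2$. Then $\nabla f(x,y) = [3x^2 - 3,\, 2y]$, which vanishes exactly at $(\pm 1, 0)$, and
\[
H_f(x, y) =
\begin{bmatrix}
6x & 0 \\
0 & 2
\end{bmatrix}
.\]

- At $(1, 0)$: $\abs{H_f(1,0)} = 12 > 0$ and $f_{xx}(1,0) = 6 > 0$, so $(1, 0)$ is a local min.
- At $(-1, 0)$: $\abs{H_f(-1,0)} = -12 < 0$, so $(-1, 0)$ is a saddle point.
:::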
### Lagrange Multipliers

The setup:
\[
\text{Optimize } f(\mathbf x) &\quad \text{subject to } g(\mathbf x) = c \\
\implies \nabla f &= \lambda \nabla g
.\]
(The second line holds at any constrained extremum where $\nabla g \neq 0$.)

1. Use this formula to obtain a system of equations in the components of $\mathbf x$ and the parameter $\lambda$.
2. Use $\lambda$ to obtain a relation involving only components of $\mathbf{x}$.
3. Substitute these relations **back into the constraint** to obtain a collection of critical points.
4. Evaluate $f$ at the critical points to find the max/min.

\todo[inline]{Add examples}

## Change of Variables

For a differentiable, injective change of variables $g: \RR^n \to \RR^n$ on a region $R$ and an integrable $f: g(R) \to \RR$,
\[
\int_{g(R)} f(\mathbf x) \,dV = \int_R (f \circ g)(\mathbf x) \cdot \abs{\det D_g(\mathbf x)} \,dV
.\]
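For example, the standard polar-coordinates substitution fits this template:

:::{.example title="Polar coordinates"}
Take $g(r, \theta) = [r\cos\theta,\, r\sin\theta]$ with $r \geq 0$, so
\[
D_g(r, \theta) =
\begin{bmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{bmatrix},
\qquad
\abs{\det D_g(r, \theta)} = r
.\]
Then
\[
\int_{g(R)} f(x, y) \,dV = \int_R f(r\cos\theta,\, r\sin\theta)\, r \,dr\, d\theta
.\]
:::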