What are derivatives?

  • A derivative of a function at any point tells us how much a minute increment to the argument of the function will increment the value of the function

To be clear, what we care about is not just whether the function is differentiable, but how a change in the input affects the output.

  • This is based on the fact that, at a fine enough resolution, any smooth, continuous function is locally linear at any point, so we can write $\Delta y = \alpha \Delta x$ (a quick numeric check follows below)

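A minimal numeric sketch of this local-linearity idea, assuming Python with NumPy and an illustrative function of my own choosing (sin and its derivative cos are not from the notes):

```python
import numpy as np

# Example function and its derivative (an assumed, illustrative choice).
f = lambda x: np.sin(x)
df = lambda x: np.cos(x)          # alpha = f'(x0)

x0 = 1.0
for dx in [1e-1, 1e-2, 1e-3]:
    dy = f(x0 + dx) - f(x0)       # actual change in the function value
    pred = df(x0) * dx            # locally linear prediction: alpha * dx
    print(dx, dy, pred)           # agreement improves as dx shrinks
```
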
Multivariate scalar function

$$\Delta y = \alpha_{1} \Delta x_{1} + \alpha_{2} \Delta x_{2} + \cdots + \alpha_{D} \Delta x_{D}$$

  • The partial derivative $\alpha_i$ gives us how $y$ increments when only $x_i$ is incremented

  • It can be expressed compactly as $\Delta y = \nabla_{x} y \, \Delta x$

  • where

$$\nabla_{x} y = \left[\frac{\partial y}{\partial x_{1}} \quad \cdots \quad \frac{\partial y}{\partial x_{D}}\right]$$
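
A small sketch of the multivariate case, again with an assumed example function and NumPy: the gradient is the row vector of partial derivatives, and $\Delta y \approx \nabla_{x} y \, \Delta x$ is just an inner product.

```python
import numpy as np

# Scalar function of a vector input (illustrative choice).
def y(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_y(x):
    # Row vector of partial derivatives [dy/dx1, dy/dx2].
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([1.0, 2.0])
dx = np.array([1e-3, -2e-3])
dy = y(x + dx) - y(x)             # actual change
pred = grad_y(x) @ dx             # first-order prediction: inner product
print(dy, pred)                   # nearly equal for small dx
```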

Optimization

Single variable

  • There are three kinds of critical points with zero derivative: minima, maxima, and inflection points
  • The second derivative is
    • $\ge 0$ at minima
    • $\le 0$ at maxima
    • $= 0$ at inflection points (a quick check of this test follows below)
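
A quick check of the second-derivative test on assumed example functions (not from the notes): $f(x) = x^3 - 3x$ has critical points at $x = \pm 1$, and $g(x) = x^3$ has a critical point at $0$ that is an inflection point.

```python
# Second-derivative test on f(x) = x**3 - 3*x, whose critical points are x = +-1.
f2 = lambda x: 6.0 * x            # f''(x)
for x in (-1.0, 1.0):
    print(x, f2(x))               # -6 at the maximum, +6 at the minimum

# g(x) = x**3 has a critical point at 0 with g''(0) = 0: an inflection point.
g2 = lambda x: 6.0 * x            # g''(x)
print(0.0, g2(0.0))
```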

Multiple variables

$$d f(X) = \nabla_{X} f(X) \, dX$$

  • The gradient is the transpose of the derivative, $\nabla_{X} f(X)^{T}$ (it gives us the change in $f(X)$ for tiny variations in $X$)
  • This is a vector inner product
    • $d f(X)$ is maximal when $dX$ is aligned with $\nabla_{X} f(X)^{T}$
    • i.e. $\angle\left(\nabla_{X} f(X)^{T}, dX\right) = 0$
  • The gradient is the direction of fastest increase in $f(X)$

  • Hessian: the matrix of second partial derivatives, $\nabla_{X}^{2} f(X)$, which plays the role of the second derivative in multiple dimensions (a numeric sketch follows below)
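
A numeric sketch of these two points, using an assumed quadratic $f(X) = X^{T} A X$ and NumPy: a step aligned with the gradient increases $f$ more than an equal-length step in another direction, and the Hessian is simply $2A$ here.

```python
import numpy as np

# Illustrative quadratic: f(X) = X^T A X with a fixed symmetric A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
f = lambda X: X @ A @ X
grad = lambda X: 2.0 * A @ X      # gradient of X^T A X for symmetric A
H = 2.0 * A                       # Hessian: matrix of second partial derivatives

X = np.array([1.0, -1.0])
g = grad(X)
eps = 1e-3

# Compare the change in f for a step along the gradient vs. a step of the
# same length in another direction: the aligned step gives the larger increase.
d_aligned = eps * g / np.linalg.norm(g)
d_other = eps * np.array([0.0, 1.0])
print(f(X + d_aligned) - f(X), f(X + d_other) - f(X))
print(np.linalg.eigvalsh(H))      # Hessian eigenvalues
```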

Unconstrained minimization of a function

  1. Solve for $X$ where the derivative equals zero:

$$\nabla_{X} f(X) = 0$$

  2. Compute the Hessian matrix at the candidate solution and verify that (a worked sketch is given after this list)
    • the Hessian is positive definite (all eigenvalues positive) -> the candidate is a local minimum
    • the Hessian is negative definite (all eigenvalues negative) -> the candidate is a local maximum
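
A worked sketch of both steps on an assumed function $f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2 - x_2$ (my own illustrative choice): setting the gradient to zero gives a linear system, and the Hessian's eigenvalues confirm a local minimum.

```python
import numpy as np

# Step 1: solve grad f = 0.
#   df/dx1 = 2*x1 + x2     = 0
#   df/dx2 = x1 + 2*x2 - 1 = 0
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([0.0, 1.0])
x_star = np.linalg.solve(A, b)    # candidate stationary point: [-1/3, 2/3]
print(x_star)

# Step 2: Hessian at the candidate (constant for this quadratic) and its eigenvalues.
H = A
print(np.linalg.eigvalsh(H))      # all positive -> positive definite -> local minimum
```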

When a closed-form solution is not available

  • To find a maximum, move in the direction of the gradient

$$x^{k+1} = x^{k} + \eta^{k} \nabla_{x} f\left(x^{k}\right)^{T}$$

  • To find a minimum, move exactly opposite to the direction of the gradient

$$x^{k+1} = x^{k} - \eta^{k} \nabla_{x} f\left(x^{k}\right)^{T}$$

  • Choosing the step size $\eta^{k}$ (a minimal sketch follows below)
    • fixed step size
    • iteration-dependent step size: critical for fast optimization
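
A minimal gradient-descent sketch of the update $x^{k+1} = x^{k} - \eta^{k} \nabla_{x} f(x^{k})^{T}$, using an assumed quadratic objective and NumPy; the `decay` flag switches between a fixed and an iteration-dependent step size.

```python
import numpy as np

# Illustrative objective: f(x) = 0.5 * x^T A x - b^T x, with gradient A x - b.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A @ x - b

def gradient_descent(x0, steps=200, eta0=0.1, decay=False):
    x = x0.copy()
    for k in range(steps):
        eta = eta0 / (1 + k) if decay else eta0   # iteration-dependent vs fixed step
        x = x - eta * grad(x)                      # move against the gradient
    return x

x0 = np.array([5.0, 5.0])
print(gradient_descent(x0))        # approaches the minimizer
print(np.linalg.solve(A, b))       # exact minimizer (solves grad = 0) for comparison
```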

Convergence

  • For convex functions
    • gradient descent will always find the global minimum
  • For non-convex functions
    • it will only find a local minimum or another stationary point (such as a saddle point); see the contrast below
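
A quick contrast using two assumed 1-D functions: on a convex objective gradient descent reaches the global minimum from any starting point, while on a non-convex one the answer depends on where it starts.

```python
def descend(grad, x0, eta=0.01, steps=5000):
    # Plain gradient descent with a fixed step size.
    x = x0
    for _ in range(steps):
        x -= eta * grad(x)
    return x

# Convex: f(x) = (x - 2)^2 -> unique minimum at x = 2, found from any start.
print(descend(lambda x: 2 * (x - 2), x0=-10.0))

# Non-convex: f(x) = x^4 - 3x^2 + x has two local minima; different starting
# points converge to different stationary points.
g = lambda x: 4 * x**3 - 6 * x + 1
print(descend(g, x0=-2.0), descend(g, x0=2.0))
```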
