LMS

先构造线性函数进行拟合： $h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
定义cost function： $J(\theta) = \frac{1}{2} \sum (h(x^{i}) - y^i)^2$
因此，可使用梯度下降进行求解
- gradient descent algorithm： $\theta_i := \theta_j - \alpha\frac{\partial} {\partial \theta_j} J(\theta)$
- LMS update rule(Widrow-Hoff learning rule)： $\theta_j := \theta_j +\alpha (y^i - h(x^i))x_j^i$
- Bath gradient descent
  - This method looks at every example in the entire training set on every step
- stochastic gradient descent (also incremental gradient descent)
  - Repeatedly run through the training set, and each time we encounter a training example, we update the parameters according to the gradient of the error with respect to that single training example only
Matrix derivatives
- $\theta = (X^TX)^{-1} X^T y$
Probabilistic interpretation
- assume $\epsilon \sim N(0,\sigma^2), y^i = \theta^T x^i +\epsilon^i$
- 可通过likelihood的方式得到最优解其实就是最小化least square cost
  - $\min \frac{1}{2}\sum (y^i - \theta^T x^i)^2$
  - 注意，这里对 $\theta$ 的假设中，与正态分布中的方差大小无关。
Locally weighted linear regression
- 这是一种非参数模型
  - 在普通的线性拟合中，我们的参数是固定的
  - 而在locally weighted线性模型中，参数是随着训练集合进行增长的（Loess），可以不让我们担心如何来确定feature（在局部进行线性回归）
- $\min \sum w^i(y^i - \theta^T x^i)^2$
- $w^i = \exp(-\frac{(x^i - x)^2}{2\tau ^2})$ $w^{i} = exp (- 2 τ ^{2} ( x ^{i} - x ) ^{2})$
  - 离该样本越近，则权重越大（趋近1），可以看成在局部进行线性回归（局部权重基本不变）
- 与KNN的关系？

logistic regression

sigmoid function: $g(z) = \frac{1}{1+e^{-z}}$ $g (z) = 1 + e ^{- z} 1$
- $g'(z) = g(z)(1-g(z))$
同样可以用likelihood得到
- $l(\theta) = \sum y^i \log h(x^i) + (1-y^i)\log (1-h(x^i))$
using gradient ascent
- $\theta _j := \theta_j + \alpha(y^i - h(x^i)) x_j^i$
同时，我们还可以用Newton法来找最小值
- 我们想要找极大值点，也就是一阶导数为0，因此： $\theta: \theta - \frac{l^{'}(\theta)}{l^{''}(\theta)}$
- 写成矩阵的形式： $\theta := \theta - H^{-1}\bigtriangledown l(\theta)$ ，其中Hessian矩阵为 $H_{ij} = \frac{\partial l^2(\theta)}{\partial \theta_i \partial \theta_j}$
- 在数据量较小时比gradient ascent收敛快，但计算Hessian困难

Generalized Linear Model

首先介绍exponential family：
- $p(y,\eta) = b(y) \exp (\eta^TT(y) - a(\eta))$
很容易可以证明，无论是分类问题（multinomial）还是回归问题（正态分布），都可以转换为指数族的形式
通过指数族的形式，我们可以发现，在线性假设下，我们之前的logistic回归的sigmoid方程其实就是给定x下y的Bernoulli分布。
因此，为什么我们之前要选择sigmoid函数呢？
- 因为其广义线性模型的指数族形式的充分统计量的canonical形式就是sigmoid函数。
softmax function
- 可通过multinomial的指数族形式可以得到： $\phi_i = \frac{e^\eta_i}{\sum e^\eta_j}$
- 可以认为是logistic regression的推广

线性回归与分类

LMS

logistic regression

Generalized Linear Model

results matching ""

No results matching ""