Linear Regression

Notes on linear regression.

Posted by jiang on November 8, 2018

Least Squares

  • "Ordinary least squares" (OLS): fit the model by minimizing the sum of squared residuals.

Closed-form solution for the parameters
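
Minimizing the sum of squared residuals gives the familiar normal-equation solution (the same formula the standRegres function later in this post implements):

$$\hat{w} = (X^{T}X)^{-1}X^{T}y$$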

Characteristics

  • Fast to fit: no complicated computation is needed, and it stays fast even on large datasets.
  • The coefficients provide a direct interpretation of each variable's contribution.
  • Very sensitive to outliers.
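
As a quick check of the closed-form solution above, here is a minimal numpy sketch; the toy data are an assumption for illustration, not from the original post:

import numpy as np

# Hypothetical toy data: a column of ones for the intercept plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.0, 2.0, 3.0, 5.0])

w = np.linalg.solve(X.T @ X, X.T @ y)  # normal equation: (X^T X) w = X^T y
print(w)                               # [intercept, slope]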

Locally Weighted Linear Regression
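
LWLR fits a separate weighted least-squares model for every query point x: each training point x⁽ʲ⁾ receives a Gaussian weight that decays with its distance from the query, and the bandwidth k controls how local the fit is:

$$W_{jj} = \exp\!\left(-\frac{\lVert x^{(j)} - x \rVert^{2}}{2k^{2}}\right), \qquad \hat{w} = (X^{T}WX)^{-1}X^{T}Wy$$

The code below builds this diagonal weight matrix and solves the weighted normal equation.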

from numpy import *

def lwlr(testPoint,xArr,yArr,k=1.0):
    # Locally weighted linear regression: fit a weighted least-squares
    # model centered on testPoint and return the prediction at testPoint.
    xMat = mat(xArr)
    yMat = mat(yArr).T
    m = shape(xMat)[0]
    wei = mat(eye(m))  # diagonal weight matrix, one weight per training point
    for j in range(m):
        diffMat = testPoint - xMat[j,:]
        # Gaussian kernel: weights decay with distance from testPoint,
        # and k controls how quickly they fall off
        wei[j,j] = exp(diffMat * diffMat.T/(-2.0 * k ** 2))
    xTx = xMat.T * (wei * xMat)
    ws = xTx.I * (xMat.T * (wei * yMat))  # weighted normal equation
    return testPoint * ws

def lwlrTest(testArr,xArr,yArr,k=1.0):
    # Evaluate the locally weighted model at every point in testArr.
    m = shape(testArr)[0]
    yHat = zeros(m)
    for i in range(m):
        yHat[i] = lwlr(testArr[i],xArr,yArr,k)
    return yHat
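
A minimal usage sketch; the data and the bias column here are assumptions for illustration, not from the original post:

xArr = [[1.0, 0.1], [1.0, 0.5], [1.0, 0.9]]  # hypothetical data with a bias column
yArr = [1.1, 1.6, 2.0]
yHat = lwlrTest(xArr, xArr, yArr, k=0.5)     # smaller k gives a more local fit
print(yHat)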

Ridge Regression

  • Ridge regression, also known as Tikhonov regularization, is the regularization method most commonly used in regression analysis of ill-posed problems.
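
Concretely, ridge regression adds λI to XᵀX before inversion, which both shrinks the coefficients and guarantees the matrix is invertible:

$$\hat{w} = (X^{T}X + \lambda I)^{-1}X^{T}y$$
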
def ridgeRegres(xMat,yMat,lam=0.2):
    # Ridge regression: adding lam * I to X^T X keeps the matrix invertible
    # even when features are collinear, and shrinks the coefficients.
    xTx = xMat.T * xMat
    denom = xTx + eye(shape(xMat)[1]) * lam
    ws = denom.I * (xMat.T * yMat)
    return ws
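
For comparison, scikit-learn's Ridge fits the same model; a minimal sketch on hypothetical toy data, where alpha plays the role of lam above:

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.2)                       # alpha corresponds to lam
ridge.fit([[1], [2], [3], [4]], [1, 2, 3, 5])  # hypothetical toy data
print(ridge.coef_, ridge.intercept_)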

sklearn code example

from numpy import *
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

x = array([[1],[1],[3],[4]])
y = array([1,2,3,5])

def standRegres(X,Y):
    # Plain OLS via the normal equation. With a single feature column and
    # no column of ones, the fitted line passes through the origin.
    X = mat(X)
    Y = mat(Y).T
    xTx = X.T * X
    ws = xTx.I * (X.T * Y)
    return ws

# ws = standRegres(x,y)
# py = mat(x) * ws
# print(ws)
# print(y)
# print(py)
#
# fig = plt.figure()
# ax = fig.add_subplot(111)
# matx = mat(x)
# maty = mat(y)
# ax.scatter(matx[:,0].flatten().A[0], maty.T[:,0].flatten().A[0])
# xcopy = matx.copy()
# xcopy.sort(0)
# yhat = xcopy * ws
# ax.plot(xcopy[:,0], yhat)
# plt.show()

lr = LinearRegression()
lr.fit(x,y)
print(lr.predict([[7]]))  # prediction at x = 7
print(lr.coef_)           # fitted slope

# Plot the data and the fitted line y = coef * x + intercept
fig = plt.figure()
ax = fig.add_subplot(111)
k = arange(0,10)
v = [i * lr.coef_[0] + lr.intercept_ for i in k]
ax.scatter(x[:,0].flatten(), y)
ax.plot(k, v)
plt.show()