Assignment 9: The Backpropagation (BP) Algorithm

Algorithm Derivation

Neural network

Let the weights from the input layer to the hidden layer be \(w_{ih}\), and the weights from the hidden layer to the output layer be \(w_{ho}\).

Pseudocode for the forward pass:

Algorithm 1: forward
input: X
output: \(y\)
1. hidden_in = \(w_{ih}X+b_1\)
2. hidden_out = \(\sigma(hidden\_in)\)
3. output_in = \(w_{ho}\,hidden\_out+b_2\)
4. output_out = \(\sigma(output\_in)\)  # optional; if omitted, output_in is used directly as output_out
5. y = output_out
6. return \(y\)

The output of the output layer is \(y_j\) and the target is \(t_j\).

Let the error function be \(E=\frac{1}{2}\sum_{j=1}^{n}{(y_j-t_j)^2}\), where \(n\) is the number of neurons in the output layer.

Pseudocode for backpropagation:

Algorithm 2: backprop
input: X, t, y, learn_rate
output: none
1. \(\delta_1=(y-t)\odot\sigma^{'}(output\_in)\)  # gradient at the output layer (with a linear output, \(\delta_1=y-t\))
2. \(\delta_2=\sigma^{'}(hidden\_in)\odot\left(w_{ho}^{T}\delta_1\right)\)  # gradient at the hidden layer
3. \(w_{ho}\mathrel{-}=learn\_rate\cdot\delta_1\,hidden\_out^{T}\)
4. \(w_{ih}\mathrel{-}=learn\_rate\cdot\delta_2\,X^{T}\)
5. \(b_{2}\mathrel{-}=learn\_rate\cdot\delta_1\)
6. \(b_{1}\mathrel{-}=learn\_rate\cdot\delta_2\)
7. return

The reason these are the gradients is as follows:
\[
\begin{aligned}
&\frac{\partial E}{\partial w_{ij}}=\frac{\partial E}{\partial a_{j}}\,\frac{\partial a_j}{\partial w_{ij}},\\
&\text{where}\quad E=\frac{1}{2}\sum_{j=1}^{n}{(y_j-t_j)^2},\qquad a_j=\sum_i{w_{ij}z_i},\qquad z_j=\sigma(a_j).\\
&\text{For a linear output unit }(y_j=a_j):\quad \frac{\partial E}{\partial a_{j}}=y_j-t_j,\qquad \delta_j \equiv\frac{\partial E}{\partial a_{j}},\\
&\text{so}\quad \frac{\partial E}{\partial w_{ij}}=\delta_j\,\frac{\partial a_j}{\partial w_{ij}}=\delta_j\,z_i.
\end{aligned}
\]
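Line 2 of Algorithm 2 uses the back-propagated hidden-layer delta. For completeness, the corresponding chain-rule step (standard backpropagation, written in the same notation as above) is
\[
\delta_j\equiv\frac{\partial E}{\partial a_j}=\sum_{k}\frac{\partial E}{\partial a_k}\,\frac{\partial a_k}{\partial a_j}=\sigma^{'}(a_j)\sum_{k}w_{jk}\,\delta_k ,
\]
where \(k\) runs over the output units fed by hidden unit \(j\); this is exactly the \(\delta_2\) used in Algorithm 2.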

Fitting a sine curve

In the implementation, the output layer must not use a Sigmoid activation; otherwise the network cannot produce a reasonable fit:

import numpy as np
import matplotlib.pyplot as plt


class NN:
    def __init__(self, input_neurons, hidden_neurons, output_neurons, learning_rate, epochs):
        self.input_neurons = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        self.epochs = epochs
        self.lr = learning_rate

        self.wih = np.random.normal(0, 1, (self.hidden_neurons, self.input_neurons))
        self.bih = 0
        self.who = np.random.normal(0, 1, (self.output_neurons, self.hidden_neurons))
        self.bho = 0

    def activation(self, Z):
        return 1 / (1 + np.exp(-Z))

    def sigmoid_derivative(self, Z):
        return self.activation(Z) * (1 - self.activation(Z))

    # forward pass
    def forward(self, input_list):
        inputs = np.array(input_list, ndmin=2)
        hidden_inputs = np.dot(self.wih, inputs) + self.bih
        hidden_outputs = self.activation(hidden_inputs)
        final_inputs = np.dot(self.who, hidden_outputs) + self.bho
        final_outputs = final_inputs  # linear output layer, no sigmoid
        return final_outputs

    # backpropagation
    def backprop(self, inputs_list, targets_list):
        inputs = np.array(inputs_list, ndmin=2)
        tj = np.array(targets_list, ndmin=2)
        hidden_inputs = np.dot(self.wih, inputs) + self.bih
        hidden_outputs = self.activation(hidden_inputs)
        final_inputs = np.dot(self.who, hidden_outputs) + self.bho
        yj = final_inputs
        output_errors = (yj - tj)
        # sigma' is evaluated at the pre-activation of the hidden layer
        hidden_errors = np.dot(self.who.T, output_errors) * self.sigmoid_derivative(hidden_inputs)
        self.who -= self.lr * np.dot(output_errors, np.transpose(hidden_outputs))
        self.wih -= self.lr * np.dot(hidden_errors, np.transpose(inputs))

        # updating bias (summed over the sample axis so the biases stay column vectors)
        self.bho -= self.lr * np.sum(output_errors, axis=1, keepdims=True)
        self.bih -= self.lr * np.sum(hidden_errors, axis=1, keepdims=True)

    def fit(self, inputs_list, targets_list):
        loss_list = []
        for epoch in range(self.epochs):
            self.backprop(inputs_list, targets_list)
            y_pre = self.predict(inputs_list)
            loss = np.sqrt(np.sum(np.square(y_pre.ravel() - targets_list))) / 2
            loss_list.append(loss)
            print(f"Epoch {epoch}/{self.epochs} ,loss:{loss}")
        return loss_list

    def predict(self, X):
        outputs = self.forward(X).T
        return outputs


if __name__ == "__main__":

    plt.style.use('seaborn')
    # create the data
    x = np.linspace(0, 1, 100)
    y = np.sin(2 * np.pi * x)

    # add random noise
    np.random.seed(20)
    y_noisy = y + 0.05 * np.random.normal(size=x.shape)
    nn = NN(1, 50, 1, 0.02, 200)
    loss = nn.fit(x, y_noisy)

    x_test = np.linspace(0, 1, 100)
    y_pred = nn.predict(x_test)
    plt.figure()
    plt.plot(x_test, y_pred, label='Regressor', color='#FFA628')
    plt.plot(x, y, label='True function', color='g')
    plt.scatter(x, y_noisy, edgecolor='b', s=20, label='Noisy samples')
    plt.legend()
    plt.tight_layout()
    plt.show()

    plt.figure()
    plt.plot(np.linspace(3, len(loss), len(loss) - 3), loss[3:])
    plt.title('loss curve')
    plt.tight_layout()
    plt.show()

After 200 training epochs, the result is as follows:

image-20230507143232977

The loss curve:

image-20230507143309196

With 2000 epochs:

f624b4755450bca033368ecadb12ed7

Clear overfitting can be seen.

If the output is passed through a Sigmoid activation:

image-20230507143624375

The reason is simple: the Sigmoid squashes the output into (0, 1); here it saturates so the predictions all collapse towards 0, and if the sign of the error is reversed the fit becomes the horizontal line y = 1 instead.
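A minimal illustration of the range mismatch (a sketch, not part of the assignment code; it assumes the same sigmoid as in the NN class above): the targets span roughly \([-1, 1]\), while a sigmoid output can only lie in \((0, 1)\), so the negative half of the sine curve is unreachable.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

t = np.sin(2 * np.pi * np.linspace(0, 1, 100))   # targets span roughly [-1, 1]
print(t.min(), t.max())
print(sigmoid(np.array([-50.0, 0.0, 50.0])))     # sigmoid outputs stay strictly inside (0, 1)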

Assignment 8: Introduction to Neural Networks

The artificial neural network (ANN) support in sklearn is mainly the multi-layer perceptron (MLP), available as both a regressor and a classifier; combined with sklearn's model-selection utilities, the regressor can also be tuned with cross-validation.

MLPClassifier: binary classification

The workflow is:

  1. Load the iris data and keep the last two classes
  2. Standardize
  3. Apply PCA to obtain x_pca and y
  4. Initialize and train the MLP
  5. Plot the decision surface

The code:

from sklearn.neural_network import MLPClassifier
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from matplotlib.colors import ListedColormap

plt.style.use('seaborn')

def plot_decision_surface(model, X, y, grid_size=0.02):
    # get the data range
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5

    # build the mesh grid
    xx, yy = np.meshgrid(np.arange(x_min, x_max, grid_size), np.arange(y_min, y_max, grid_size))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # plot the decision surface and the training data
    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.title('Iris classification using MLP')
    plt.show()

iris = datasets.load_iris()
X = iris.data
y = iris.target
# keep the last two classes
X = X[y != 0]
y = y[y != 0]
# standardization
standard = StandardScaler()
X = standard.fit_transform(X)
# PCA
pca = PCA(n_components=2)
x_pca = pca.fit_transform(X)
# initialize the MLP
mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(5, 2))

mlp.fit(x_pca, y)

plot_decision_surface(mlp, x_pca, y)

image-20230422220300024

MLPClassifier: multi-class classification

The workflow is:

  1. Load the iris data
  2. Standardize
  3. Apply PCA to obtain x_pca and y
  4. Initialize and train the MLP
  5. Plot the decision surface
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from matplotlib.colors import ListedColormap

plt.style.use('seaborn')

def plot_decision_surface(model, X, y, grid_size=0.02):
    # get the data range
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5

    # build the mesh grid
    xx, yy = np.meshgrid(np.arange(x_min, x_max, grid_size), np.arange(y_min, y_max, grid_size))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # plot the decision surface and the training data
    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.title('Iris classification using MLP')
    plt.show()


# load the data
iris = load_iris()
X = iris.data[:, :2]  # use only the first two features
y = iris.target

# split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# create the multi-layer perceptron model
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=42)

# train the model
mlp.fit(X_train, y_train)

# plot the decision surface
plot_decision_surface(mlp, X_train, y_train)

image-20230422220443459

MLPRegressor

The program flow is:

  1. Create the data X, y
  2. Add random noise to y
  3. Initialize the MLP
  4. Train the MLP
  5. Plot the prediction

The code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor

plt.style.use('seaborn')
# create the data
x = np.linspace(-5, 5, 100)
y = np.sin(x)

# add random noise
np.random.seed(42)
y_noisy = y + 0.2 * np.random.normal(size=x.shape)

# create the multi-layer perceptron model
mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=1000, random_state=42)

# train the model
mlp.fit(x.reshape(-1, 1), y_noisy)

# predict and plot the result
x_test = np.linspace(-5.5, 5.5, 100)
y_pred = mlp.predict(x_test.reshape(-1, 1))
plt.plot(x_test, y_pred, label='MLP Regressor', color='#FFA628')
plt.plot(x, y, label='True function', color='g')
plt.scatter(x, y_noisy, edgecolor='b', s=20, label='Noisy samples')
plt.legend()
plt.title('MLP Regressor')
plt.tight_layout()
plt.show()

image-20230422220624823

Assignment 7: Logistic Regression Classification

Since everything here is called from sklearn and no complicated algorithms are involved, the programs below are described with simple lists.

Binary logistic regression

The goals of the program below:

1. Observe how the coefficients change with the penalty strength
2. Observe the error rate for different penalty strengths
3. Visualize the binary classification result

The workflow is:

  1. Load the iris data and keep the last two classes
  2. Standardize
  3. Apply PCA to obtain x_pca and y
  4. Initialize the logistic regression
  5. Vary the parameter C, training a logistic regression for each value to obtain the coefficients and the error rate
  6. Plot the error-rate and coefficient curves
  7. Plot the decision boundary

The code:

from sklearn import datasets
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import l1_min_c
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

plt.style.use('seaborn')

iris = datasets.load_iris()
X = iris.data
y = iris.target
# keep the last two classes (labels 1 and 2)
X = X[y != 0]
y = y[y != 0]
# plot the data
plt.figure()
plt.scatter(X[y == 2, 0], X[y == 2, 1], color='r', label='2')
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='g', label='1')
plt.legend()
plt.tight_layout()
plt.show()
# training
cs = l1_min_c(X, y, loss='log') * np.logspace(0, 7, 16)
# initialize the logistic regression
clf = LogisticRegression(
    penalty='l1',
    solver='liblinear',
    tol=1e-6,
    max_iter=int(1e6),
    warm_start=True,
    intercept_scaling=1000.0,
)
# error rates
errors = []
# coefficients of each feature
coefs_ = []
# fit and record the coefficients
for c in cs:
    clf.set_params(C=c)
    clf.fit(X, y)
    y_pre = clf.predict(X)
    error = 0
    for i in range(len(y)):
        if y_pre[i] != y[i]:
            error = error + 1
    errors.append(error)
    coefs_.append(clf.coef_.ravel().copy())
errors = np.array(errors) / len(y)
coefs_ = np.array(coefs_)
# plot the coefficient paths
plt.plot(np.log10(cs), coefs_, marker="o")
ymin, ymax = plt.ylim()
plt.xlabel("log(C)")
plt.ylabel("Coefficients")
plt.title("Logistic Regression Path")
plt.axis("tight")
plt.show()
# plot the error rate
plt.figure()
plt.plot(np.log10(cs), errors, marker='^', color='orange')
plt.ylabel('error rate')
plt.xlabel('log(C)')
plt.show()
# refit with the chosen C
clf.set_params(C=1)
clf.fit(X, y)

# PCA visualization
pca = PCA(n_components=2)
X1 = pca.fit_transform(X)
clf.set_params(C=1)
clf.fit(X1, y)

plt.figure()
plt.scatter(X1[y == 2, 0], X1[y == 2, 1], color='r', label='2')
plt.scatter(X1[y == 1, 0], X1[y == 1, 1], color='g', label='1')
plt.plot([min(X1[:, 0]), max(X1[:, 0])],
         [-(min(X1[:, 0]) * clf.coef_[0, 0] + clf.intercept_[0]) / clf.coef_[0, 1],
          -(max(X1[:, 0]) * clf.coef_[0, 0] + clf.intercept_[0]) / clf.coef_[0, 1]],
         label='classifier', linewidth=3.0, color='black', linestyle='--')
plt.legend()
plt.xlabel('component 1')
plt.ylabel('component 2')
plt.show()

The results:

How the coefficients change with the penalty strength; when C is below 1, the model is sparsified effectively.

image-20230422214826267

Classification error rate for different penalty strengths; a value around 10 works well.

image-20230422214946938

The classification result:

image-20230422215101050

Multi-class logistic regression

The goal of the program below:

1. Plot the decision boundaries of the multi-class problem

The workflow is:

  1. Load the iris data
  2. Standardize
  3. Apply PCA to obtain x_pca and y
  4. Initialize and train the logistic regression
  5. Plot the decision boundaries
from sklearn import datasets
import numpy as np
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

plt.style.use('seaborn')

iris = datasets.load_iris()
X = iris.data
y = iris.target
# standardization
standard = StandardScaler()
X = standard.fit_transform(X)
# PCA
pca = PCA(n_components=2)
x_pca = pca.fit_transform(X)
# initialize the logistic regression
clf = LogisticRegression(
    solver='liblinear',
    max_iter=int(1e6),
    warm_start=True,
    intercept_scaling=1000.0
)
clf.fit(x_pca, y)
coef, intercept = np.array(clf.coef_), np.array(clf.intercept_)

plt.figure()
plt.scatter(x_pca[y == 2, 0], x_pca[y == 2, 1], color='r', label='2')
plt.scatter(x_pca[y == 1, 0], x_pca[y == 1, 1], color='g', label='1')
plt.scatter(x_pca[y == 0, 0], x_pca[y == 0, 1], color='b', label='0')
# one line per one-vs-rest classifier
plt.plot([min(x_pca[:, 0]), max(x_pca[:, 0])],
         [-(min(x_pca[:, 0]) * coef[0, 0] + intercept[0]) / coef[0, 1],
          -(max(x_pca[:, 0]) * coef[0, 0] + intercept[0]) / coef[0, 1]],
         label='classifier', linewidth=3.0, color='black', linestyle='--')
plt.plot([min(x_pca[:, 0]), max(x_pca[:, 0])],
         [-(min(x_pca[:, 0]) * coef[1, 0] + intercept[1]) / coef[1, 1],
          -(max(x_pca[:, 0]) * coef[1, 0] + intercept[1]) / coef[1, 1]],
         label='classifier', linewidth=3.0, color='black', linestyle='--')
plt.plot([min(x_pca[:, 0]), max(x_pca[:, 0])],
         [-(min(x_pca[:, 0]) * coef[2, 0] + intercept[2]) / coef[2, 1],
          -(max(x_pca[:, 0]) * coef[2, 0] + intercept[2]) / coef[2, 1]],
         label='classifier', linewidth=3.0, color='black', linestyle='--')
plt.ylim(-5, 5)
plt.legend()
plt.xlabel('component 1')
plt.ylabel('component 2')
plt.show()

image-20230422215323041

Assignment 6

Bayesian Linear Regression

All of the algorithms live in a single Python file; pseudocode for the main functions is given below:

Algorithm 1: fit
input: X, t, degree, \(\alpha,\beta,\phi\)
output: \(m_n,S_n\)
1. if len(X.shape) == 1:  # check the shapes first
2.     X = X.reshape(-1,1)
3. if len(t.shape) == 1:
4.     t = t.reshape(-1,1)
5. \(S_n^{-1}=\alpha I+\beta\phi(X)^T\phi(X)\)  # inverse of the posterior covariance
6. \(S_n=\)np.linalg.inv(\(S_n^{-1}\))
7. \(m_n=\beta S_n \phi(X)^Tt\)  # posterior mean
8. return \(m_n,S_n\)

Algorithm 2: predict
input: X, \(\phi,m_n,S_n\)
output: \(mean,\sigma^2\)
1. \(\sigma^2=\frac{1}{\beta}+\phi(X)S_n\phi(X)^T\)
2. \(mean=\phi(X)m_n\)
3. return \(mean,\sigma^2\)

Implementation:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from scipy.stats import multivariate_normal

from Polyfeature import Polyfeature

class BLR:
    def __init__(self, n_features, alpha=0, beta=1, phi='linear'):
        self.n_features = n_features
        self.alpha = alpha
        self.beta = beta
        self.phi = phi
        '''
        alpha is the precision of the prior over w
        beta is the precision of the data noise
        phi is the basis function: one of linear, gaussian, polynomial, sigmoid
        '''

    def gaussian(self, x, mu, sigma):
        return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

    def fit(self, X, t):
        # make sure the inputs are 2-D arrays
        if len(X.shape) == 1:
            X = X.reshape(-1, 1)
        if len(t.shape) == 1:
            t = t.reshape(-1, 1)
        # should the data be zero-centred?
        if self.phi == 'linear':
            phi_x = X
            x = np.dot(phi_x.T, phi_x)  # Phi^T Phi, as in the pseudocode
        if self.phi == 'polynomial':
            polyf = Polyfeature(self.n_features)
            phi_x = polyf.fit_transform(X)
            x = np.dot(phi_x.T, phi_x)
        # inverse of S_N
        Sn_inverse = self.alpha * np.eye(x.shape[0]) + self.beta * x
        # posterior mean
        mn = self.beta * np.dot(np.dot(np.linalg.inv(Sn_inverse), phi_x.T), t)

        self.phi_x = phi_x
        self.Sn_inverse = Sn_inverse
        self.mn = mn

    def predict(self, X):
        if self.phi == 'linear':
            if len(X.shape) == 1:
                X = X.reshape(-1, 1)
            sigma2 = 1 / self.beta + np.dot(np.dot(self.phi_x.T, np.linalg.inv(self.Sn_inverse)), self.phi_x)
            t_pre = self.gaussian(X, self.mn, np.sqrt(sigma2))
            return t_pre

        if self.phi == 'polynomial':
            polyf = Polyfeature(self.n_features)
            X = polyf.fit_transform(X)

            sigma2 = np.diag(1 / self.beta + X @ np.linalg.inv(self.Sn_inverse) @ X.T).reshape(-1, 1)
            mean = np.dot(X, self.mn)

            upper = mean + np.sqrt(sigma2)
            lower = mean - np.sqrt(sigma2)

            return mean, sigma2, upper, lower

Regression result:

90e39f72c8da404c0738ddf8072aae4

Effect of the number of samples on the fit

x_1 = np.linspace(0, 1, 100)
X_data = [np.array([0.5]),
          np.array([0.3, 0.6]),
          np.array([0.25, 0.5, 0.75, 0.8]),
          np.linspace(0, 1, 60)]
j = 1
plt.figure(figsize=(10, 6))
for i in X_data:
    plt.subplot(2, 2, j)
    t = np.sin(2 * np.pi * i) + np.random.normal(loc=0, scale=0.2, size=i.shape)
    if j < 2:
        blr = BLR(5, phi='polynomial', alpha=0.2)
        plt.title('alpha=0.2')
    else:
        blr = BLR(4, phi='polynomial')
        plt.title('without alpha')
    j = j + 1
    # plot the true sine curve
    plt.plot(x_1, np.sin(2 * np.pi * x_1), label='sin(x)')
    # plot the sample points
    plt.scatter(i, t, color='g', label='samples')
    blr.fit(i, t)
    plt.plot(x_1, blr.predict(x_1)[0], color='black', label='prediction')
    # plot the shaded uncertainty band
    plt.fill_between(x_1.squeeze(), blr.predict(x_1)[3].squeeze(), blr.predict(x_1)[2].squeeze(), alpha=0.4, label='variance')

    plt.xlabel('X')
    plt.ylabel('t')
    plt.legend()
plt.tight_layout()
plt.show()
image-20230408203151173

I also adjusted the value of \(\alpha\) for the small-sample cases; the figure below shows the fits without regularization (when there is only one sample, regularization is required, otherwise the matrix becomes singular):

image-20230408203316307

Hyper-parameter tuning with cross-validation

Because tuning is involved, the generated data are fixed with random seed 0.

With \(\alpha=0,\ \beta=10\) fixed, the degree is determined first:

image-20230408213637573

best degree is 9 min RMSE in test sets: 0.2014522467351943

Next, with \(degree=9\), search for the best \(\alpha\):

image-20230408221238294

parameter_best is 2.848035868435799e-05

Finally, with the two parameters above fixed, search for \(\beta\):

image-20230408221701245

image-20230408221719701

beta_best is 14.84968262254465

The final parameter choices are (a sketch of the search loop follows the table):

| parameter | value |
| --- | --- |
| \(\alpha\) | 2.848035868435799e-05 |
| \(degree\) | 9 |
| \(\beta\) | 14.84968262254465 |
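The tuning script itself is not reproduced here; a minimal sketch of the sequential search described above could look like the following. It assumes the BLR class from the implementation above is importable, uses a simple hold-out RMSE in place of the full cross-validation, and the helper name holdout_rmse and the split are illustrative, not the original code.

import numpy as np
# BLR is the class defined in the implementation above (assumed importable from that file)

np.random.seed(0)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + np.random.normal(scale=0.2, size=x.shape)

def holdout_rmse(degree, alpha, beta):
    # train on the first 20 points, evaluate on the remaining 10 (illustrative split)
    blr = BLR(degree, alpha=alpha, beta=beta, phi='polynomial')
    blr.fit(x[:20], t[:20])
    mean = blr.predict(x[20:])[0].ravel()
    return np.sqrt(np.mean((mean - t[20:]) ** 2))

# 1) with alpha = 0 and beta = 10 fixed, choose the degree
best_degree = min(range(1, 12), key=lambda d: holdout_rmse(d, 0.0, 10.0))
# 2) with the degree fixed, choose alpha
best_alpha = min(np.logspace(-8, 0, 50), key=lambda a: holdout_rmse(best_degree, a, 10.0))
# 3) with degree and alpha fixed, choose beta
best_beta = min(np.linspace(1, 30, 50), key=lambda b: holdout_rmse(best_degree, best_alpha, b))
print(best_degree, best_alpha, best_beta)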

The final fit:

image-20230408222416810

image-20230408222947483

Simulating the fitting process (likelihood and prior)

image-20230408233948782

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal, norm
from numpy.random import seed, uniform, randn
from numpy.linalg import inv

def f(x, a):
    return a[0] + a[1] * x

a = np.array([-0.3, 0.5])
N = 30
sigma = 0.2
X = uniform(-1, 1, (N, 1))
T = f(X, a) + randn(N, 1) * sigma

beta = (1 / sigma) ** 2  # precision
alpha = 2.0


def posterior_w(Phi, t, S0, m0):
    SN = inv(inv(S0) + beta * Phi.T @ Phi)
    mN = SN @ (inv(S0) @ m0 + beta * Phi.T @ t)
    return SN, mN


def sample_vals(X, T, ix):
    x_in = X[ix]
    Phi = np.c_[np.ones_like(x_in), x_in]
    t = T[[ix]]
    return Phi, t


def plot_prior(m, S, liminf=-1, limsup=1, step=0.05, ax=plt, **kwargs):
    grid = np.mgrid[liminf:limsup + step:step, liminf:limsup + step:step]
    nx = grid.shape[-1]
    z = multivariate_normal.pdf(grid.T.reshape(-1, 2), mean=m.ravel(), cov=S).reshape(nx, nx).T

    return ax.contourf(*grid, z, cmap='jet', **kwargs)


def plot_sample_w(mean, cov, size=10, ax=plt):
    w = np.random.multivariate_normal(mean=mean.ravel(), cov=cov, size=size)
    x = np.linspace(-1, 1)
    for wi in w:
        ax.plot(x, f(x, wi), c="tab:blue", alpha=0.4)


def plot_likelihood_obs(X, T, ix, ax=plt):
    W = np.mgrid[-1:1:0.1, -1:1:0.1]
    x, t = sample_vals(X, T, ix)
    mean = W.T.reshape(-1, 2) @ x.T

    likelihood = norm.pdf(t, loc=mean, scale=np.sqrt(1 / beta)).reshape(20, 20).T
    ax.contourf(*W, likelihood, cmap='jet')
    ax.scatter(-0.3, 0.5, c="white", marker="+")


SN = np.eye(2) / alpha
mN = np.zeros((2, 1))

seed(1643)
N = 20
nobs = [1, 2, 20]
ix_fig = 1
fig, ax = plt.subplots(len(nobs) + 1, 3, figsize=(10, 12))
plot_prior(mN, SN, ax=ax[0, 1])
ax[0, 1].scatter(-0.3, 0.5, c="white", marker="+")
ax[0, 0].axis("off")
plot_sample_w(mN, SN, ax=ax[0, 2])
for i in range(0, N + 1):
    Phi, t = sample_vals(X, T, i)
    SN, mN = posterior_w(Phi, t, SN, mN)
    if i + 1 in nobs:
        plot_likelihood_obs(X, T, i, ax=ax[ix_fig, 0])
        plot_prior(mN, SN, ax=ax[ix_fig, 1])
        ax[ix_fig, 1].scatter(-0.3, 0.5, c="white", marker="+")
        ax[ix_fig, 2].scatter(X[:i + 1], T[:i + 1], c="crimson")
        ax[ix_fig, 2].set_xlim(-1, 1)
        ax[ix_fig, 2].set_ylim(-1, 1)
        for l in range(2):
            ax[ix_fig, l].set_xlabel("$w_0$")
            ax[ix_fig, l].set_ylabel("$w_1$")
        plot_sample_w(mN, SN, ax=ax[ix_fig, 2])
        ix_fig += 1

titles = ["likelihood", "prior/posterior", "data space"]
for axi, title in zip(ax[0], titles):
    axi.set_title(title, size=15)
plt.tight_layout()
plt.show()

Assignment 5

Plotting equivalent kernels

First, the Gaussian kernel. The figure below shows a Gaussian kernel evaluated over a linear grid of inputs:

image-20230402182640901

image-20230402182755965

import matplotlib.pyplot as plt
import numpy as np

def gaussianLike(mu, sigma, x):
    return np.exp((-1 / 2) * np.matmul(np.matmul((x - mu), np.linalg.inv(sigma)), (x - mu).reshape(len(x), 1)))

def gaussianKernel(gaussianLike, x):
    return np.dot(gaussianLike(np.zeros(2,), np.eye(2, 2), x), gaussianLike(np.zeros(2,), np.eye(2, 2), x))

x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = []
for i in x:
    z = []
    for j in y:
        z.append(gaussianKernel(gaussianLike, [i, j]))
    Z.append(z)
Z = np.array(Z)

plt.imshow(Z, cmap='hot')
plt.show()


plt.figure()
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, cmap='rainbow')
plt.show()

Next, the equivalent kernels from the textbook are reproduced:

Gaussian kernel

import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(x, y, sigma):
    return np.exp(-np.linalg.norm(x - y)**2 / (2 * (sigma ** 2)))

def compute_kernel_matrix(X, sigma):
    n = X.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = gaussian_kernel(X[i], X[j], sigma)
            K[j, i] = K[i, j]
    return K

# generate a sample dataset
X = np.linspace(-5, 5, 100).reshape(-1, 1)
y = np.sin(X)

# compute the Gaussian kernel matrix for several bandwidths
sigmas = [0.1, 1, 10]
for sigma in sigmas:
    K = compute_kernel_matrix(X, sigma)

    # plot the kernel matrix
    plt.figure()
    plt.imshow(K, cmap='jet', interpolation='nearest')
    plt.title(r'Gaussian Kernel with $\sigma=$' + str(sigma))
    plt.colorbar()
    plt.show()
image-20230402183003203
image-20230402183010833
image-20230402183017836

Polynomial kernel

import numpy as np
import matplotlib.pyplot as plt

def polynomial_kernel(x, y, degree, coef0=0):
    return (np.dot(x, y) + coef0) ** degree

def compute_kernel_matrix(X, degree, coef0=0):
    n = X.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = polynomial_kernel(X[i], X[j], degree, coef0)
            K[j, i] = K[i, j]
    return K

# generate a sample dataset
X = np.linspace(-5, 5, 100).reshape(-1, 1)
y = np.sin(X)

# compute the polynomial kernel matrix for several degrees and offsets
degrees = [2, 3, 4]
coef0s = [-1, 0, 1]
for degree in degrees:
    for coef0 in coef0s:
        K = compute_kernel_matrix(X, degree, coef0)

        # plot the kernel matrix
        plt.figure()
        plt.imshow(K, cmap='jet', interpolation='nearest')
        plt.title('Polynomial Kernel with degree=' + str(degree) + ' and coef0=' + str(coef0))
        plt.colorbar()
        plt.show()
image-20230402183116818
image-20230402183128404
image-20230402183137656

Sigmoid kernel

import numpy as np
import matplotlib.pyplot as plt

def sigmoid_kernel(x, y, alpha, c):
    return np.tanh(alpha * np.dot(x, y) + c)

def compute_kernel_matrix(X, alpha, c):
    n = X.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = sigmoid_kernel(X[i], X[j], alpha, c)
            K[j, i] = K[i, j]
    return K

# generate a sample dataset
X = np.linspace(-5, 5, 100).reshape(-1, 1)
y = np.sin(X)

# compute the sigmoid kernel matrix for several hyper-parameters
alphas = [0.1, 0.5, 1]
cs = [-1, 0, 1]
for alpha in alphas:
    for c in cs:
        K = compute_kernel_matrix(X, alpha, c)

        # plot the kernel matrix
        plt.figure()
        plt.imshow(K, cmap='jet', interpolation='nearest')
        plt.title('Sigmoid Kernel with alpha=' + str(alpha) + ' and c=' + str(c))
        plt.colorbar()
        plt.show()
image-20230402183249517
image-20230402183257364
image-20230402183304889

Likelihood

from scipy import stats
import numpy as np
import random
import matplotlib.pyplot as plt

random.seed(615)
def get_synth_data(N, beta):
    X_N = np.random.uniform(-1, 1, N)
    X_N = np.column_stack((np.ones(N), X_N))
    # for this example the true function is f(x,a) = a_0 + a_1*x
    # where a_0 = -.3 and a_1 = .5 are the parameters that we
    # are going to estimate
    a = np.array([-.3, .5])
    t = np.dot(X_N, a)
    return X_N, t + np.random.normal(loc=0.0, scale=np.sqrt(1. / beta), size=N)

beta_ = 25.0
alpha = 2.0
N = 20
X_N, t_N = get_synth_data(N, beta_)
x = np.linspace(-1, 1, 70)
y = np.linspace(-1, 1, 70)


w0, w1 = np.meshgrid(x, y)
m_0 = [0, 0]
S_0 = 1 / alpha * np.eye(2)

n = 1
X_n = X_N[range(n), :]
t_n = t_N[range(n)]
plt.xlim(-1, 1)
plt.ylim(-1, 1)
plt.scatter(X_n[:, 1], t_n)
plt.title("First point in the data set")
plt.xlabel('x')
plt.ylabel('y')
plt.show()


def likelihood(t_, x, w, beta):
    from math import sqrt
    return stats.norm.pdf(t_, loc=np.dot(w, x), scale=sqrt(1. / beta))

Z = np.zeros((len(y), len(x)))
for i, w1 in enumerate(y):
    for j, w0 in enumerate(x):
        Z[i, j] = likelihood(t_n[-1], X_n[-1, :], np.array([w0, w1]), beta_)
extent = (-1, 1, -1, 1)
plt.imshow(Z, extent=extent, origin='lower')
plt.plot(-0.3, 0.5, 'w+', markeredgewidth=2, markersize=12)
plt.title("Likelihood of the first point in the data set, white cross = true value")
plt.xlabel("w0")
plt.ylabel("w1")
plt.show()
image-20230402184647229
image-20230402184827621
image-20230402184922104

Assignment 1: Polynomial Fitting

Derivation

\[ \min_{\omega}\ \|f(\omega;x)-t\|^2,\qquad f(\omega;x_i)=\sum_{j=0}^{M}{\omega_j x_i^{j}}\approx t_i \]

In matrix form:
\[ XW=T \]
Using the sum of squared errors as the criterion, define the loss
\[ L(\omega)=\frac{1}{2}\sum^{N}_{i=1}\Big(\sum_{j=0}^{M}\omega_jx_i^j-y_i\Big)^2 \]
Differentiate the loss and set each derivative to zero:
\[
\begin{array}{l}
\frac{\partial L(x;\omega)}{\partial \omega_k}=0 \\\\
\sum_{i=1}^{N}\Big(\sum_{j=0}^{M}\omega_jx_i^j-y_i\Big)x_i^k=0 \\\\
\sum_{i=1}^{N}\sum_{j=0}^{M}\omega_jx_i^{j+k}=\sum_{i=1}^{N}x_i^ky_i\qquad (k=0,1,2,\dots,M)
\end{array}
\]
So, writing
\[
X_{kj}=\sum_{i=1}^{N}x_i^{j+k},\qquad W_j=\omega_j,\qquad Y_k=\sum_{i=1}^{N}x_i^ky_i,
\]
the problem becomes the linear system \(XW=Y\). A model that is nonlinear in \(x\) is thus turned into one that is linear in \(w\), i.e. a matrix equation, once the features are expanded into polynomial terms. The implementation uses sklearn to build the polynomial feature combinations and then fits a linear regression. A minimal numerical check of this normal-equation solution is sketched after the algorithm outline below.

Algorithm outline:

  1. Create a matrix of shape (number of samples, degree + 1)

  2. Fill each column with the powers 0 through the chosen degree

  3. Return the polynomial feature matrix

  4. Fit a linear regression
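As a quick sanity check of the derivation (a sketch, not part of the assignment code): the closed-form solution \(w=(X^TX)^{-1}X^Ty\) built from raw polynomial features should agree with an ordinary least-squares solver.

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

M = 3                                            # polynomial degree
X = np.vander(x, M + 1, increasing=True)         # columns x^0 ... x^M

w_normal = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference least-squares solution
print(np.allclose(w_normal, w_lstsq))            # True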

Implementation

import numpy as np

class Polyfeature():
    def __init__(self, n_features):
        self.n_features = n_features

    # univariate polynomial feature expansion
    def fit(self, x):
        xf = np.zeros((len(x), self.n_features + 1))
        for i in range(self.n_features + 1):
            xf[:, i] = np.power(x, i).reshape(np.shape(xf[:, i]))
        self.xf = xf
        return xf

    def transform(self, x):
        xf = np.zeros((len(x), self.n_features + 1))
        for i in range(self.n_features + 1):
            xf[:, i] = np.power(x, i).reshape(np.shape(xf[:, i]))
        self.xf = xf
        return xf

    def fit_transform(self, x):
        xf = np.zeros((len(x), self.n_features + 1))
        for i in range(self.n_features + 1):
            xf[:, i] = np.power(x, i).reshape(np.shape(xf[:, i]))
        self.xf = xf
        return xf
import numpy as np
class linearregression():
    # closed-form ridge solution: w = (X^T X + lambda*I)^(-1) X^T y
    def fit(self, x, y, Lambda=0):
        c = np.dot(x.T, x)
        I = np.eye(np.shape(c)[0])
        d = np.dot(Lambda, I)
        e = (c + d)
        e = np.linalg.inv(e)
        w = np.dot(x.T, y)
        w = np.dot(e, w)
        self.w = w
    def predict(self, x):
        y = np.dot(x, self.w)
        return y
import math

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

class polyregression():
    def __init__(self, features):
        self.n_features = features

    # generate the data
    def create_data(self, n):
        self.n = n
        X = np.random.rand(n, 1)
        # the noise has to be zero-mean
        noise = 0.3 * np.random.uniform(low=-1, high=1, size=(n, 1))
        t = np.sin(X * 2 * math.pi) + noise
        self.X = X
        self.t = t

    # split the dataset
    def splitData(self):
        X_train, X_test, T_train, T_test = train_test_split(self.X, self.t, test_size=0.3, random_state=0)
        self.X_train = X_train
        self.X_test = X_test
        self.T_train = T_train
        self.T_test = T_test

    # polynomial fit
    def fit(self):
        poly = PolynomialFeatures(self.n_features)
        x_train_poly = poly.fit_transform(self.X_train)

        lin = LinearRegression()
        lin.fit(x_train_poly, self.T_train)

        self.lin = lin
        self.poly = poly

    def draw(self):
        plt.style.use('seaborn')
        plt.figure()
        plt.plot(np.linspace(0, 1, 50), np.sin(np.linspace(0, 1, 50) * 2 * math.pi), color='green')
        plt.scatter(self.X, self.t, marker='o', edgecolor='blue', color='white', linewidths=1.1)
        plt.plot(np.linspace(0, 1, 50), self.lin.predict(self.poly.transform(np.linspace(0, 1, 50).reshape(-1, 1))), color='red')
        s = 'n_features=' + str(self.n_features)
        plt.text(0.7, 1, s)
        plt.xlabel('X')
        plt.ylabel('Y')
        plt.savefig('features=' + str(self.n_features) + ' samples=' + str(self.n) + '.png')
        plt.show()

    # goodness-of-fit metric
    def RSME(self):
        T_test_predict = self.lin.predict(self.poly.transform(self.X_test))
        SSE = np.sum(np.square(self.T_test - T_test_predict))
        MSE = SSE / len(T_test_predict)
        rsme = np.sqrt(MSE)
        return rsme

if __name__ == '__main__':
    # compare different numbers of features
    def features_test():
        RSME = []
        for i in range(10):
            poly = polyregression(i)
            poly.create_data(20)
            poly.splitData()
            poly.fit()
            poly.draw()
            RSME.append(poly.RSME())

        plt.figure()
        plt.plot(np.linspace(0, 10, 10), RSME, marker='^', label="points")
        plt.show()

    # compare different sample sizes
    def samples_test():
        for j in [20, 200, 500]:
            RSME = []
            for i in range(50):
                poly = polyregression(i)
                poly.create_data(j)
                poly.splitData()
                poly.fit()
                poly.draw()
                RSME.append(poly.RSME())
            plt.figure()
            plt.plot(np.linspace(0, 50, 50), RSME, marker='^', label="points")
            plt.savefig('sample=' + str(j) + '.png')
            plt.show()

    # features_test()
    samples_test()

Comparison of results

Different numbers of features, all with a sample size of 10:

features=0 samples=10

features=1 samples=10
features=2 samples=10
features=3 samples=10
features=4 samples=10
features=5 samples=10

From these results it can be roughly seen that too few features leads to underfitting, while too many leads to overfitting. For a sample size of 20 the RMSE curve is shown in sample=20.png. The picture does not fully match the expectation; the likely reason is that the test set is too small to capture the fitting behaviour well, so other sample sizes were compared as well: sample=200.png, sample=500.png.
As the number of samples grows, the trend across feature choices becomes clear. The sample-size-10 case was reproduced as well (sample=10); the sample size clearly affects the quality of the fit.

features=5 samples=10
features=5 samples=20
features=5 samples=200
features=5 samples=500

Regularization is applied for the case of 10 samples and 5 features:

\(\lambda\)=0.7740859059011267

image-20230311220541756

Assignment 2

Frequentists vs. Bayesians

  • The frequentist view holds that frequencies converge to probabilities: the model parameters are fixed, and with enough samples the parameter values can be approached. My understanding is that this reflects the law of large numbers (here, Khinchin's law of large numbers): \[ \lim_{n\to \infty}P\left(\left|\frac{1}{n}\sum_{i=1}^{n}{a_i}-\mu \right|< \varepsilon\right)=1 \]
  • The Bayesian view holds that the model parameters vary (are uncertain) while the data are fixed. \[ p(w|D)=\frac{p(D|w)p(w)}{p(D)} \] Taking the univariate Gaussian as an example, the sampling problem can be explained from both views:
  • Frequentist: the idea is to let the sample stand in for the population, which yields the model parameters. Given a dataset \(D=\{x_1,x_2,\cdot\cdot\cdot,x_n\}\), the estimated parameters of the univariate Gaussian are \[ \hat\mu=\frac{1}{n}\sum_{i=1}^{n}{x_i} \] \[ \hat{\sigma^2}=\frac{1}{n}\sum_{i=1}^{n}{(x_i-\hat\mu)}^2 \] The sample obviously differs from the population; taking expectations shows that the mean estimate is unbiased while the variance estimate is biased, and the bias shrinks as the sample size n grows (\(E(\hat\sigma^2)=\frac{n-1}{n}\sigma^2\)).
  • Bayesian: the Bayesian view treats every parameter as a distribution, while the data are fixed. The prior is used to approach the Gaussian as closely as possible. In the posterior above, \(p(D|w)\) is the likelihood and \(p(D)\) is a constant that can be ignored (normalization). Following the textbook's solution for the Gaussian, both the resulting mean and variance are unbiased.

Formula derivation

image-20230421154051248

Introducing the prior

image-20230421154116419

Here the probability of the data is treated as a normalization constant.

image-20230421154257436

Then \(\ln p(\mathbf{w}|\alpha)\) is worked out:

image-20230421154321678

Taking the negative logarithm:
\[ -\ln p(\mathbf{t|x},\mathbf{w},\beta)p(\mathbf{w}|\alpha)= \frac{\beta}{2}\sum_{n=1}^{N} {\bigl(y(x_n,\mathbf{w})-t_n\bigr)^2 } +\frac{\alpha}{2}\mathbf{w^T}\mathbf{w} -\frac{N}{2}\ln \frac{\beta}{2\pi}-\frac{M+1}{2}\ln \alpha+\frac{M+1}{2} \ln 2\pi \] The last three terms are constants and have no effect on the optimization, so the loss function is:
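Writing this out by dropping the constant terms (the standard sum-of-squares error with an L2 penalty, which the figure below should correspond to):
\[
\tilde{E}(\mathbf{w})=\frac{\beta}{2}\sum_{n=1}^{N}\bigl(y(x_n,\mathbf{w})-t_n\bigr)^{2}+\frac{\alpha}{2}\mathbf{w}^{T}\mathbf{w}.
\]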

image-20230421154342936

Assignment 3

Reproducing the basis-function plots

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

x_major_locator = MultipleLocator(1)
y_major_locator = MultipleLocator(0.25)

plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
def sigma(x, b):
    return 1 / (np.exp((-x + b) * 10) + 1)
x = np.linspace(-1, 1, 100)
b = np.linspace(-1, 1, 11).tolist()
for bi in b:
    plt.plot(x, sigma(x, bi))
plt.xlim(-1, 1)
plt.ylim(0, 1)
ax = plt.gca()
ax.xaxis.set_major_locator(x_major_locator)
ax.yaxis.set_major_locator(y_major_locator)


'''
def gaussian(mu, sigma, x, b):
    return 1 / (np.sqrt(2 * np.pi) * sigma) * np.exp(-np.square((x + b) * 5 - mu) / 2 * np.square(sigma))

plt.subplot(1, 3, 2)
for bi in b:
    plt.plot(x, gaussian(0, 1, x, bi))
plt.xlim(-1, 1)
plt.ylim(0, 0.4)
plt.show()
'''


def poly(x, n):
    return np.power(x, n)
plt.subplot(1, 3, 2)
n = np.linspace(1, 10, 10).tolist()
for i in n:
    plt.plot(x, poly(x, i))
plt.xlim(-1, 1)
plt.ylim(-1, 1)
ax = plt.gca()
y_major_locator1 = MultipleLocator(0.5)
ax.xaxis.set_major_locator(x_major_locator)
ax.yaxis.set_major_locator(y_major_locator1)


def gaussian1(mu, sigma, x, b):
    return np.exp(-np.square((x + b) * 5 - mu) / 2 * np.square(sigma))
plt.subplot(1, 3, 3)
for bi in b:
    plt.plot(x, gaussian1(0, 1, x, bi))

ax = plt.gca()
ax.xaxis.set_major_locator(x_major_locator)
ax.yaxis.set_major_locator(y_major_locator)

plt.xlim(-1, 1)
plt.ylim(0, 1)
plt.tight_layout()
plt.show()
image-20230317214519144

Assignment 4: Polynomial Fitting (Enhanced)

Project structure

image-20230326184103754

The data directory stores the persisted datasets.

The data-generation functions are separated from the polynomial-fitting code into createData.py.

Cross-validation function

Linear regression

Polynomial feature generation and polynomial regression

createData.py
__________________
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# generate the dataset
def create_data(n, beta):
    X = np.linspace(0, 1, n).reshape(n, 1)
    noise = beta * np.random.uniform(low=-1, high=1, size=(n, 1))
    t = np.sin(X * 2 * np.pi) + noise
    data = np.hstack((X, t))
    return data

def splitData(X, t):
    X_train, X_test, T_train, T_test = train_test_split(X, t, test_size=0.2, random_state=0)
    return X_train, X_test, T_train, T_test

# persist the data to csv
def tocsv(data):
    df = pd.DataFrame(data)
    df.to_csv('.\data\sample=' + str(data.shape[0]) + '.csv')

if __name__ == '__main__':
    data = create_data(10, 0.1)
    tocsv(data)
Cross_Validation.py
_________________________________________
from sklearn.model_selection import KFold

class Cross_Validation():
    def __init__(self, n_folds):
        self.n_folds = n_folds


    def CV(self, x, y):
        kf = KFold(n_splits=self.n_folds)
        x_train = []
        x_test = []
        y_train = []
        y_test = []
        for train_index, test_index in kf.split(x):
            xtr, xte = x[train_index], x[test_index]
            ytr, yte = y[train_index], y[test_index]
            x_train.append(xtr)
            x_test.append(xte)
            y_train.append(ytr)
            y_test.append(yte)
        return x_train, x_test, y_train, y_test
linearregression.py
_______________________
import numpy as np
class linearregression():
    def fit(self, x, y, Lambda=0):
        c = np.dot(x.T, x)
        I = np.eye(np.shape(c)[0])
        d = np.dot(Lambda, I)
        e = (c + d)
        e = np.linalg.inv(e)
        w = np.dot(x.T, y)
        w = np.dot(e, w)
        self.w = w
    def predict(self, x):
        y = np.dot(x, self.w)
        return y
Polyfeature.py
____________________
import numpy as np

class Polyfeature():
    def __init__(self, n_features):
        self.n_features = n_features

    # univariate polynomial feature expansion
    def fit(self, x):
        # a single number
        if isinstance(x, int):
            x = np.array([x])
        # array case
        xf = np.zeros((len(x), self.n_features + 1))
        for i in range(self.n_features + 1):
            xf[:, i] = np.power(x, i).reshape(np.shape(xf[:, i]))
        self.xf = xf
        return xf

    def fit_transform(self, x):
        # a single number
        if isinstance(x, int):
            x = np.array([x])
        if isinstance(x, float):
            x = np.array([x])
        # array case
        xf = np.zeros((len(x), self.n_features + 1))
        for i in range(self.n_features + 1):
            xf[:, i] = np.power(x, i).reshape(np.shape(xf[:, i]))
        return xf
PolyRegress.py
______________________________________
import math

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import matplotlib.ticker as mticker


from Polyfeature import Polyfeature
from linearregression import linearregression
from Cross_Validation import Cross_Validation
from createData import create_data
from createData import splitData

class polyRegression():
    def __init__(self, features, n_lambda):
        self.n_features = features
        self.n_lambda = n_lambda

    # fit
    def fit(self, x, y, l):
        # polynomial features
        pf = Polyfeature(self.n_features)
        pf.fit(x)
        # linear regression
        lin = linearregression()
        lin.fit(pf.xf, y, Lambda=l)

        self.lin = lin
        self.poly = pf

    def predict(self, x):
        x = self.poly.fit_transform(x)
        prediction = self.lin.predict(x)
        return prediction

    def draw(self, X, t):
        plt.style.use('seaborn')
        plt.figure()
        plt.plot(np.linspace(0, 1, 50), np.sin(np.linspace(0, 1, 50) * 2 * math.pi), c='green')
        plt.scatter(X, t, marker='o', edgecolor='blue', c='white', linewidths=1.1)
        plt.plot(np.linspace(0, 1, 50),
                 self.lin.predict(self.poly.fit_transform(np.linspace(0, 1, 50).reshape(-1, 1))),
                 c='red')
        s = 'n_features=' + str(self.n_features)
        plt.text(0.7, 1, s)
        plt.xlabel('X')
        plt.ylabel('Y')
        s = 'cv\\features=' + str(self.n_features) + ' samples=' + str(1) + '.png'
        plt.savefig('11.png')
        plt.show()

    def cv_pridict(self, X, y, n_fold):
        cv = Cross_Validation(n_fold)
        X_train, X_test, y_train, y_test = cv.CV(X, y)
        y_allPredict = np.ones((1, self.n_lambda))
        Lambda = np.logspace(-10, -7, self.n_lambda)
        w = []

        for i in range(n_fold):
            y_predict = np.zeros((y_test[i].shape[0], self.n_lambda))
            k = 0
            for j in np.nditer(Lambda):
                print('iteration', k + 1 + self.n_lambda * i, ':', j)
                poly = self.fit(X_train[i], y_train[i], j)
                y_pre = self.predict(X_test[i])
                w.append(self.lin.w)
                y_predict[:, k] = y_pre.ravel()
                k = k + 1
            y_allPredict = np.vstack((y_allPredict, y_predict))
        y_allPredict = y_allPredict[1:]
        return y_allPredict, cv, y_test, y_train, X_test, X_train, w

    def RMSE_CV(self, y_allPredict, y_measure, n):
        press = np.square(np.subtract(y_allPredict, y_measure.reshape(-1, 1)))
        press_all = np.sum(press, axis=0)
        RMSECV = np.sqrt(press_all / n)
        lambda_best_index = np.argmin(RMSECV)
        lambda_best = np.logspace(-10, -7, self.n_lambda)[lambda_best_index]
        return RMSECV, lambda_best

    def bias_cv(self, y_allPredict, y_expect, n):
        press = np.square(np.subtract(y_allPredict, y_expect.reshape(-1, 1)))
        press_all = np.sum(press, axis=0)
        bias_cv = press_all / n
        return bias_cv

    def variance_cv(self, y_allPredict, y_expect):
        press = np.square(np.subtract(y_allPredict, y_expect.reshape(-1, 1)))
        variance = np.average(press, axis=0)
        return variance

    def test_error_cv(self, y_allPredict, y, Lambda, w):
        press = np.square(np.subtract(y_allPredict, y.reshape(-1, 1)))
        error1 = np.sum(press, axis=0) / 2
        error2 = np.linalg.norm(w, axis=1)
        error_temp = []
        for i in range(100):
            e = np.average(error2[i * 5:(i + 1) * 5])
            error_temp.append(e)
        error2 = np.array(error_temp) * Lambda / 2
        error = error1 + error2
        return error

    def show_Select_lambda(self, RMSECV):
        x = np.logspace(-10, -7, self.n_lambda)
        plt.plot(x,
                 RMSECV, marker='^',
                 markersize=10,
                 markerfacecolor='orange',
                 color='black')
        plt.xlabel('lambda')
        plt.ylabel('RMSECV')
        ax = plt.gca()
        ax.set_xscale("log")
        ax.set_xlim(ax.get_xlim()[::-1])

    def RSME(self, X, t):
        T_predict = self.lin.predict(self.poly.fit_transform(X))
        SSE = np.sum(np.square(t - T_predict))
        MSE = SSE / len(T_predict)
        rsme = np.sqrt(MSE)
        return rsme

Varying \(\beta\) and \(\lambda\) and observing the RMSE

The code:

if __name__ == '__main__':
    beta = np.linspace(0, 0.5, 1000)
    Lambda = np.logspace(-10, 0, 1000)
    RMSE_SUM = []
    # effect of beta and lambda on the fit
    for j in beta:
        poly = polyRegression(8, 1000)
        data = create_data(10, j)
        X = data[:, 0]
        t = data[:, 1]
        y_allPredict, cv, y_test, y_train, X_test, X_train, w = poly.cv_pridict(X, t, 5)
        RMSECV, best_lambda = poly.RMSE_CV(y_allPredict, t, X.shape[0])
        RMSE_SUM.append(RMSECV)
    RMSE = np.array(RMSE_SUM)
    X, Y = np.meshgrid(Lambda, beta)
    fig = plt.figure(figsize=(15, 15))
    ax = plt.axes(projection='3d')
    ax.set_xlabel('beta')
    ax.set_ylabel('lambda')
    ax.set_zlabel('RSME')
    ax.plot_surface(np.log(Y), np.log(X), RMSE, rstride=1, cstride=1, cmap=plt.get_cmap('rainbow'))
    ax.set_title('Surface plot')
    plt.show()
image-20230323215538367 image-20230323220045143

In the right figure \(-\ln \beta\) is on the x-axis and \(\ln \lambda\) on the y-axis; in the left figure it is the other way round.

For easier inspection, a local region is plotted:

img

First, looking along the \(\ln \lambda\) direction, the effect of \(\lambda\) on the RMSE is wave-like with several local minima; when \(\lambda\) is very small, fairly severe overfitting appears where \(-\ln\beta\) is large.

Looking along the \(-\ln \beta\) direction, \(\beta\) actually has little influence on the RMSE, showing only small, dense ripples.

Regression coefficients before and after regularization

The code:

# choose lambda and report the fitted coefficients
df = pd.read_csv('./data/sample=10.csv')
X = df.values[:, 1]
t = df.values[:, 2]
x_train, x_test, T_train, T_test = splitData(X, t)
# with regularization
for i in range(10):
    poly = polyRegression(i, 50)
    y_allPredict, cv, y_test, y_train, X_test, X_train, w = poly.cv_pridict(X, t, 5)
    RMSECV, best_lambda = poly.RMSE_CV(y_allPredict, t, X.shape[0])
    poly_best = polyRegression(i, 5)
    poly_best.fit(x_train, T_train, best_lambda)
    print('weights for degree', i, ':')
    print(poly_best.lin.w)
# without regularization
print("without regularization\n")
for i in range(10):
    poly = polyRegression(i, 50)
    poly.fit(x_train, T_train, 0)
    print('weights for degree', i, ':')
    print(poly.lin.w)
rmse_train = []
rmse_test = []

Without regularization (columns are the degree of the fitted polynomial):

| \(w\) | 0 | 1 | 2 | 3 | …… | 9 |
| --- | --- | --- | --- | --- | --- | --- |
| \(w_0\) | -0.08673405 | 0.43162238 | 0.5695881 | -0.05816218 | | -0.07491257 |
| \(w_1\) | | -1.05343403 | -2.17473669 | 11.14510403 | | 10.41748047 |
| \(w_2\) | | | 1.16724815 | -33.31060868 | | -34.4765625 |
| \(w_3\) | | | | 22.29544252 | | 56.09375 |
| \(w_4\) | | | | | | -60. |
| \(w_5\) | | | | | | -14.375 |
| \(w_6\) | | | | | | 34.796875 |
| \(w_7\) | | | | | | 44.5625 |
| \(w_8\) | | | | | | -2.0625 |
| \(w_9\) | | | | | | -35.390625 |

After regularization:

| \(w\) | 0 | 1 | 2 | 3 | …… | 9 |
| --- | --- | --- | --- | --- | --- | --- |
| \(w_0\) | -0.07776724 | 0.15514244 | 0.17544013 | 0.33805469 | | 0.21515437 |
| \(w_1\) | | -0.52292037 | -0.38363097 | -0.80726911 | | -0.44372143 |
| \(w_2\) | | | -0.25709792 | -0.42717323 | | -0.42050595 |
| \(w_3\) | | | | 0.41092392 | | -0.25649878 |
| \(w_4\) | | | | | | -0.10138744 |
| \(w_5\) | | | | | | 0.02127489 |
| \(w_6\) | | | | | | 0.1134249 |
| \(w_7\) | | | | | | 0.1815334 |
| \(w_8\) | | | | | | 0.23170656 |
| \(w_9\) | | | | | | 0.26874031 |

Bias-variance structure

The program:

# estimate the expectation over repeated datasets
predictAll = []
T11 = []
for i in range(100):
    data1 = create_data(25, np.random.uniform(0, 0.5))
    X1 = data1[:, 0]
    t1 = data1[:, 1]
    x1_train, x1_test, T1_train, T1_test = splitData(X1, t1)
    poly_evey = polyRegression(25, 100)
    y_allPredict, cv, y_test, y_train, X_test, X_train, w = poly_evey.cv_pridict(X1, t1, 5)
    predictAll.append(y_allPredict)
    RMSECV, best_lambda = poly_evey.RMSE_CV(y_allPredict, t1, X1.shape[0])
    poly_evey.fit(x1_train, T1_train, best_lambda)
    t_predict = poly_evey.predict(X1)
    T11.append(t_predict)

T1 = np.array(T11)
f_hat = np.average(T1, axis=0)
poly11 = polyRegression(25, 100)
variances = []
for i in predictAll:
    variance = poly11.variance_cv(i, f_hat)
    variances.append(variance)
variance = np.average(variances, axis=0)

# generate a fresh dataset
data2 = create_data(25, np.random.uniform(0, 0.5))
X2 = data2[:, 0]
t2 = data2[:, 1]
x2_train, x2_test, T2_train, T2_test = splitData(X2, t2)
poly2 = polyRegression(25, 100)
y_allPredict, cv, y_test, y_train, X_test, X_train, w = poly2.cv_pridict(X2, t2, 5)
RMSECV, best_lambda = poly2.RMSE_CV(y_allPredict, t2, X2.shape[0])
poly2.show_Select_lambda(RMSECV)
bias_2 = poly2.bias_cv(y_allPredict, f_hat, X2.shape[0])
w = np.array(w)
error = poly2.test_error_cv(y_allPredict, t2, np.logspace(-10, -7, 100), w)
print(bias_2)

plt.figure()
plt.plot(np.logspace(-10, -7, 100), bias_2, label='bias^2', color='blue')
plt.plot(np.logspace(-10, -7, 100), variance, label='variance', color='red')
plt.plot(np.logspace(-10, -7, 100), bias_2 + variance, label='bias^2+variance', color='pink')
plt.plot(np.logspace(-10, -7, 100), error, label='test error', color='black')
ax = plt.gca()
ax.set_xscale("log")
plt.legend()
plt.show()

image-20230326185731318

Searching for the best \(degree\) and \(\lambda\)

rmse_train = []
rmse_test = []
# search for the best degree
for i in range(10):
    poly = polyRegression(i, 50)
    poly.fit(x_train, T_train, 0)
    rmse_train.append(poly.RSME(x_train, T_train))
    rmse_test.append(poly.RSME(x_test, T_test))
plt.style.use('seaborn')
plt.figure()
plt.plot(np.linspace(0, 10, 10),
         rmse_test,
         label='test',
         marker='o',
         markerfacecolor='white',
         markeredgecolor='b',
         color='b',
         markeredgewidth=1.5)
plt.plot(np.linspace(0, 10, 10), rmse_train,
         label='train',
         marker='o',
         markerfacecolor='white',
         markeredgecolor='r',
         color='r',
         markeredgewidth=1.5)
plt.ylabel('RMSE')
plt.xlabel('degree')
plt.tight_layout()
plt.legend()
plt.show()
best_degree = np.argmin(np.array(rmse_test))
print('best degree is', best_degree)
print('min RMSE in test sets:', np.min(rmse_test))

# for the best degree, search for a suitable lambda
poly = polyRegression(best_degree, 50)
poly.fit(x_train, T_train, 0)
poly.draw(X, t)
y_allPredict, cv, y_test, y_train, X_test, X_train, w = poly.cv_pridict(X, t, 5)
RMSECV, best_lambda = poly.RMSE_CV(y_allPredict, t, X.shape[0])
poly.show_Select_lambda(RMSECV)
print('best lambda is', best_lambda)
print('min RMSE with regularization:', np.min(RMSECV))
poly_best = polyRegression(best_degree, 50)
poly_best.fit(x_train, T_train, best_lambda)
poly_best.draw(X, t)

Degree search:

image-20230326191813731

\(\lambda\) search:

image-20230326191840244

Comparison before and after regularization:
image-20230326191928367 image-20230326191912106

The left figure is without regularization, the right one with. The output is:

best degree is 5
min RMSE in test sets: 0.11753272730633338
best lambda is 1.8420699693267162e-08
min RMSE with regularization: 0.0998645060490816

Effect of the number of samples

The code:

RMSEs = []
for i in range(10, 50):
    data = create_data(i, 0.25)
    X = data[:, 0]
    t = data[:, 1]
    X_train, X_test, T_train, T_test = splitData(X, t)
    poly = polyRegression(8, 50)
    poly.fit(X_train, T_train, 0)
    rmse = poly.RSME(X, t)
    RMSEs.append(rmse)

plt.style.use('seaborn')
plt.figure()
plt.plot(np.linspace(10, 50, 40), RMSEs)
plt.ylabel("RMSE")
plt.xlabel('samples')
plt.tight_layout()
plt.show()

image-20230326193335597

image-20230326193511771

The upper figure shows the RMSE for sample sizes in (10, 200) and the lower one for sample sizes in (10, 20). It is clear that as the number of samples grows the RMSE gradually stabilizes, with only small fluctuations.