Linear Regression: Derivation of the Principles and Code Implementation

June 18, 2023, 1:00 PM • Artificial Intelligence

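1. Overview of Linear Regression

Example:

Data: salary and age (2 features)

Goal: predict how much money the bank will lend me (the label)

Consideration: salary and age both affect the final loan amount, so how much influence does each of them have? (the parameters)

X1 and X2 are our two features (age, salary); Y is the amount the bank will finally lend us.

Find the best-fitting line (in higher dimensions, picture a plane or hyperplane) for our data points.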
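The fitted surface for the two-feature example can be written out explicitly. The equations of the original post did not survive, so the formula below is the standard linear-regression hypothesis rather than a recovered one; the symbols \theta_0, \theta_1, \theta_2 and the convention x_0 = 1 are assumptions introduced here.

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2
            = \sum_{j=0}^{2} \theta_j x_j
            = \theta^{T} x, \qquad x_0 = 1
```

Here \theta_1 and \theta_2 weight the two features, and \theta_0 is an offset that does not depend on either of them.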

Error:

There is bound to be a difference between the true value and the predicted value (this error is denoted by ε).
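In symbols, the statement about the error can be sketched as follows. The notation is an assumption: the superscript (i) indexes the samples, \theta is the parameter vector, and x^{(i)} is the feature vector of sample i with a leading 1 for the offset.

```latex
y^{(i)} = \theta^{T} x^{(i)} + \varepsilon^{(i)}, \qquad i = 1, \dots, m
```

Each sample has its own error term \varepsilon^{(i)}, which is what the independence and same-distribution conditions refer to.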

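Independent and identically distributed (i.i.d.):

Independent: the samples do not influence one another.

Identically distributed: all samples come from the same distribution.

In practice, the method can be used as long as the data satisfy the conditions above as closely as possible.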

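2. Maximum Likelihood Estimation

Interpretation: which parameters, combined with our data, are most likely to produce the true values?

Interpretation: products are hard to work with and sums are easy; taking the logarithm turns a product into a sum.

Goal: make the likelihood function (equivalently, its logarithm) as large as possible.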
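The steps above can be sketched as a short derivation. It assumes, as is standard for this model but not stated in the surviving text, that the errors are Gaussian with mean 0 and variance \sigma^2; m is the number of samples, and all symbols are introduced here for illustration.

```latex
\begin{align}
p\!\left(y^{(i)} \mid x^{(i)}; \theta\right)
  &= \frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\!\left(-\frac{\left(y^{(i)} - \theta^{T} x^{(i)}\right)^{2}}{2\sigma^{2}}\right) \\
L(\theta)
  &= \prod_{i=1}^{m} p\!\left(y^{(i)} \mid x^{(i)}; \theta\right) \\
\log L(\theta)
  &= m \log \frac{1}{\sqrt{2\pi}\,\sigma}
     - \frac{1}{\sigma^{2}} \cdot \frac{1}{2}
       \sum_{i=1}^{m} \left(y^{(i)} - \theta^{T} x^{(i)}\right)^{2} \\
J(\theta)
  &= \frac{1}{2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^{T} x^{(i)}\right)^{2}
\end{align}
```

The first term of \log L(\theta) does not depend on \theta, so making the log-likelihood as large as possible amounts to making the squared-error sum J(\theta) as small as possible.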

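3. Gradient Descent

Look for the lowest point of the valley, which is where our objective function ends up (which parameters bring the objective function to its extremum?).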
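One common way to write a single downhill step is shown below. It assumes the mean squared-error cost with a 1/(2m) scaling, a learning rate \alpha, and the linear hypothesis h_\theta(x) = \theta^{T} x; these symbols are illustrative choices rather than notation recovered from the post.

```latex
\begin{align}
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^{2} \\
\frac{\partial J(\theta)}{\partial \theta_j}
  &= \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right) x_j^{(i)} \\
\theta_j &:= \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j}
\end{align}
```

All components \theta_j are updated together, each moving against its own partial derivative.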

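Batch gradient descent (reaches the optimum easily, but is slow because every step considers all samples)

Stochastic gradient descent (uses one sample per step, so iterations are fast, but not every step moves toward convergence)

Mini-batch gradient descent (each update is computed on a small subset of the data; practical!)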
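The three variants differ only in how many samples feed each update, which a single function with a `batch_size` argument can illustrate. This is a minimal sketch on synthetic data; the function names, the data, and the hyperparameter values are assumptions chosen for illustration, not taken from the original post.

```python
import numpy as np


def gradient(theta, x, y):
    """Gradient of J = 1/(2m) * sum((x @ theta - y)^2) on the given samples."""
    return x.T @ (x @ theta - y) / len(y)


def descend(x, y, learning_rate=0.1, epochs=200, batch_size=None, seed=0):
    """batch_size=None uses all samples per update (batch gradient descent),
    batch_size=1 is stochastic gradient descent, and values in between
    give mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    m = len(y)
    batch_size = m if batch_size is None else batch_size
    theta = np.zeros(x.shape[1])
    for _ in range(epochs):
        order = rng.permutation(m)  # reshuffle the samples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            theta -= learning_rate * gradient(theta, x[idx], y[idx])
    return theta


# Synthetic data (illustrative assumption): y = 1 + 2 * x1 plus small noise.
rng = np.random.default_rng(42)
features = rng.uniform(-1, 1, size=(256, 1))
x = np.hstack((np.ones((256, 1)), features))  # leading column of ones for the offset
y = 1 + 2 * features[:, 0] + rng.normal(0, 0.05, size=256)

theta_batch = descend(x, y, batch_size=None)
theta_sgd = descend(x, y, learning_rate=0.01, batch_size=1)
theta_mini = descend(x, y, batch_size=32)
print(theta_batch, theta_sgd, theta_mini)
```

All three runs should land near the generating parameters (1, 2); the stochastic run uses a smaller learning rate here because its single-sample updates are noisier.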

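Learning rate (step size): it has a huge impact on the result; in general, keep it small.

How to choose: start with a small value, and if that does not work, make it smaller.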
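The effect of the step size can be seen on a tiny example. The sketch below runs batch gradient descent on five hand-made points that lie exactly on y = 1 + 2x; the data, the function name `final_cost`, and the three learning rates are assumptions picked for illustration.

```python
import numpy as np


def final_cost(learning_rate, num_iterations=100):
    """Run batch gradient descent on a fixed toy problem and return the last cost."""
    x = np.column_stack((np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 4.0])))
    y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # exactly y = 1 + 2 * x
    theta = np.zeros(2)
    for _ in range(num_iterations):
        theta -= learning_rate * x.T @ (x @ theta - y) / len(y)
    return float(np.mean((x @ theta - y) ** 2) / 2)


for lr in (0.5, 0.1, 0.01):
    print(lr, final_cost(lr))
```

On this problem the largest rate overshoots and the cost grows instead of shrinking, while the two smaller rates both reduce it; the smallest one simply needs more iterations to get as far.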

Batch size: 32, 64, or 128 all work; in many cases memory and efficiency also have to be taken into account.

4. Code Implementation

Linear regression code implementation
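The experiments in this section import a `LinearRegression` class from a local `linear_regression1` module that the post does not reproduce. The class below is a hypothetical stand-in, written only to match the calls used in the experiments (`LinearRegression(x, y, ...)`, `train(learning_rate, num_iterations)`, `predict(x)`); the feature normalization and the 1/(2m) cost are assumptions, and the author's actual module may differ.

```python
import numpy as np


class LinearRegression:
    """Hypothetical stand-in for the post's `linear_regression1.LinearRegression`."""

    def __init__(self, data, labels, polynomial_degree=0, sinusoid_degree=0,
                 normalize_data=True):
        # polynomial_degree and sinusoid_degree are accepted only so the calls
        # in the experiments work; this sketch generates no extra features.
        data = np.asarray(data, dtype=float)
        self.labels = np.asarray(labels, dtype=float).reshape(-1, 1)
        self.normalize_data = normalize_data
        self.mean = data.mean(axis=0)
        self.std = data.std(axis=0)
        self.std[self.std == 0] = 1.0  # avoid dividing by zero on constant columns
        self.data = self._prepare(data)
        self.theta = np.zeros((self.data.shape[1], 1))

    def _prepare(self, data):
        """Normalize with the training statistics and prepend a column of ones."""
        data = np.asarray(data, dtype=float)
        if self.normalize_data:
            data = (data - self.mean) / self.std
        return np.hstack((np.ones((data.shape[0], 1)), data))

    def _cost(self):
        """Squared-error cost 1/(2m) * sum of squared residuals on the training set."""
        residual = self.data @ self.theta - self.labels
        return float(np.sum(residual ** 2) / (2 * self.data.shape[0]))

    def train(self, alpha, num_iterations=500):
        """Batch gradient descent; returns the parameters and the cost per iteration."""
        m = self.data.shape[0]
        cost_history = []
        for _ in range(num_iterations):
            residual = self.data @ self.theta - self.labels
            self.theta -= alpha * (self.data.T @ residual) / m
            cost_history.append(self._cost())
        return self.theta, cost_history

    def predict(self, data):
        """Predict labels for new rows, applying the same preprocessing as in training."""
        return self._prepare(data) @ self.theta


# Usage on synthetic, noise-free data (illustrative): y = 3 + 2 * x.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 2, size=(50, 1))
y_demo = 3 + 2 * x_demo
model = LinearRegression(x_demo, y_demo)
theta_demo, history_demo = model.train(0.1, 500)
print(history_demo[0], history_demo[-1])
```

Normalizing the features keeps a single learning rate workable across columns with different scales, which is why `predict` has to reuse the training mean and standard deviation.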

Experimental analysis:

Single variable

import sys
sys.path.append('E:/PycharmProjects/machine learning/code refactoring/线性回归-代码实现/')
sys.path.append('E:/PycharmProjects/machine learning/code refactoring/线性回归-代码实现/LinearRegression')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from linear_regression1 import LinearRegression

data = pd.read_csv('E:/PycharmProjects/machine learning/code refactoring/线性回归-代码实现/data/world-happiness-report-2017.csv')

# Split the data into training and test sets
train_data = data.sample(frac = 0.8)
test_data = data.drop(train_data.index)

input_param_name = 'Economy..GDP.per.Capita.'
output_param_name = 'Happiness.Score'

x_train = train_data[[input_param_name]].values
y_train = train_data[[output_param_name]].values

x_test = test_data[input_param_name].values
y_test = test_data[output_param_name].values

plt.scatter(x_train,y_train,label='Train data')
plt.scatter(x_test,y_test,label='test data')
plt.xlabel(input_param_name)
plt.ylabel(output_param_name)
plt.title('Happy')
plt.legend()
plt.show()

num_iterations = 500
learning_rate = 0.01

linear_regression = LinearRegression(x_train,y_train)
(theta,cost_history) = linear_regression.train(learning_rate,num_iterations)

print('Initial loss:', cost_history[0])
print('Loss after training:', cost_history[-1])

plt.plot(range(num_iterations),cost_history)
plt.xlabel('Iter')
plt.ylabel('cost')
plt.title('GD')
plt.show()

predictions_num = 100
x_predictions = np.linspace(x_train.min(),x_train.max(),predictions_num).reshape(predictions_num,1)
y_predictions = linear_regression.predict(x_predictions)

plt.scatter(x_train,y_train,label='Train data')
plt.scatter(x_test,y_test,label='test data')
plt.plot(x_predictions,y_predictions,'r',label = 'Prediction')
plt.xlabel(input_param_name)
plt.ylabel(output_param_name)
plt.title('Happy')
plt.legend()
plt.show()

Data overview (scatter plot of the training and test data):

Loss:

Initial loss: 14.686657434569007
Loss after training: 0.23474965096256492

Multi-variable experimental analysis

Data overview:

import sys
sys.path.append('E:/PycharmProjects/machine learning/code refactoring/线性回归-代码实现/')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly
import plotly.graph_objs as go

plotly.offline.init_notebook_mode()
from linear_regression1 import LinearRegression

data = pd.read_csv('E:/PycharmProjects/machine learning/code refactoring/线性回归-代码实现/data/world-happiness-report-2017.csv')

train_data = data.sample(frac=0.8)
test_data = data.drop(train_data.index)

input_param_name_1 = 'Economy..GDP.per.Capita.'
input_param_name_2 = 'Freedom'
output_param_name = 'Happiness.Score'

x_train = train_data[[input_param_name_1, input_param_name_2]].values
y_train = train_data[[output_param_name]].values

x_test = test_data[[input_param_name_1, input_param_name_2]].values
y_test = test_data[[output_param_name]].values

# Configure the plot with the training dataset.

plot_training_trace = go.Scatter3d(
    x=x_train[:, 0].flatten(),
    y=x_train[:, 1].flatten(),
    z=y_train.flatten(),
    name='Training Set',
    mode='markers',
    marker={
        'size': 10,
        'opacity': 1,
        'line': {
            'color': 'rgb(255, 255, 255)',
            'width': 1
        },
    }
)

plot_test_trace = go.Scatter3d(
    x=x_test[:, 0].flatten(),
    y=x_test[:, 1].flatten(),
    z=y_test.flatten(),
    name='Test Set',
    mode='markers',
    marker={
        'size': 10,
        'opacity': 1,
        'line': {
            'color': 'rgb(255, 255, 255)',
            'width': 1
        },
    }
)

plot_layout = go.Layout(
    title='Data Sets',
    scene={
        'xaxis': {'title': input_param_name_1},
        'yaxis': {'title': input_param_name_2},
        'zaxis': {'title': output_param_name}
    },
    margin={'l': 0, 'r': 0, 'b': 0, 't': 0}
)

plot_data = [plot_training_trace, plot_test_trace]

plot_figure = go.Figure(data=plot_data, layout=plot_layout)

plotly.offline.plot(plot_figure)

num_iterations = 500
learning_rate = 0.01
polynomial_degree = 0
sinusoid_degree = 0

linear_regression = LinearRegression(x_train, y_train, polynomial_degree, sinusoid_degree)

(theta, cost_history) = linear_regression.train(
    learning_rate,
    num_iterations
)

print('Initial loss', cost_history[0])
print('Final loss', cost_history[-1])

plt.plot(range(num_iterations), cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Gradient Descent Progress')
plt.show()

predictions_num = 10

x_min = x_train[:, 0].min()
x_max = x_train[:, 0].max()

y_min = x_train[:, 1].min()
y_max = x_train[:, 1].max()

x_axis = np.linspace(x_min, x_max, predictions_num)
y_axis = np.linspace(y_min, y_max, predictions_num)

x_predictions = np.zeros((predictions_num * predictions_num, 1))
y_predictions = np.zeros((predictions_num * predictions_num, 1))

x_y_index = 0
for x_index, x_value in enumerate(x_axis):
    for y_index, y_value in enumerate(y_axis):
        x_predictions[x_y_index] = x_value
        y_predictions[x_y_index] = y_value
        x_y_index += 1

z_predictions = linear_regression.predict(np.hstack((x_predictions, y_predictions)))

plot_predictions_trace = go.Scatter3d(
    x=x_predictions.flatten(),
    y=y_predictions.flatten(),
    z=z_predictions.flatten(),
    name='Prediction Plane',
    mode='markers',
    marker={
        'size': 1,
    },
    opacity=0.8,
    surfaceaxis=2,
)

plot_data = [plot_training_trace, plot_test_trace, plot_predictions_trace]
plot_figure = go.Figure(data=plot_data, layout=plot_layout)
plotly.offline.plot(plot_figure)

Original: https://blog.csdn.net/qq_52053775/article/details/125959287
Author: 樱花的浪漫
Title: Linear Regression: Derivation of the Principles and Code Implementation

