Andrew Ng Machine Learning Assignment 1: House Price Prediction with Multivariate Linear Regression (Python Implementation)

June 15, 2023, 7:27 AM • Artificial Intelligence • 96 views

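This post covers tasks two and three of Andrew Ng's machine learning assignment: predicting house prices with a multivariate linear regression model, and using the normal equation to find the theta that minimizes the cost. For task one, predicting food truck profits with single-variable linear regression, see the separate blog post (link).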

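Contents

* Task
* Feature normalization
* Setting the initial values
* Computing the cost function
* Gradient descent
* Plotting the cost against the number of iterations for different learning rates
* Solving with the normal equation
* Full code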

Task

In this part, you will implement linear regression with multiple variables to
predict the prices of houses. Suppose you are selling your house and you
want to know what a good market price would be. One way to do this is to
first collect information on recent houses sold and make a model of housing
prices.

The file ex1data2.txt contains a training set of housing prices in Portland,
Oregon. The first column is the size of the house (in square feet), the
second column is the number of bedrooms, and the third column is the price
of the house.

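The goal is to predict house prices with a multivariate linear regression model. The training set provides the house size, the number of bedrooms, and the price.
House sizes are roughly 1000 times larger than bedroom counts, so all of the training data needs to be normalized first.
Different values of the learning rate alpha make gradient descent behave differently. In this task alpha is set to each of [0.01,0.03,0.1,0.3] in turn, the cost is recorded over 50 iterations, and the results are plotted.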

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


path_file="../code/ex1-linear regression/ex1data2.txt"
data=pd.read_csv(path_file,names=['Size', 'Bedrooms', 'Price'])
data.head()

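Feature normalization

This uses Z-score standardization: each column is shifted by its mean and divided by its standard deviation. After processing, every column (including Price) has mean 0 and standard deviation 1. Note that this rescales the data but does not change the shape of its distribution, so the result is not necessarily normally distributed.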


data=(data-data.mean())/data.std()
data.head()
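Because the price column is standardized along with the features, a prediction from this model comes out in normalized units. A minimal sketch of how the statistics could be kept around and reused is shown below. The toy rows and the helper names normalize_features and denormalize_price are made up for illustration and are not part of the original post; ddof=1 is used so that NumPy matches the pandas default for std().

```python
import numpy as np

# Toy rows standing in for ex1data2.txt: size, bedrooms, price (made-up values).
raw = np.array([
    [2104.0, 3.0, 399900.0],
    [1600.0, 3.0, 329900.0],
    [2400.0, 4.0, 369000.0],
    [1416.0, 2.0, 232000.0],
])

# Keep the statistics so the same transform can be applied to new inputs.
mu = raw.mean(axis=0)
sigma = raw.std(axis=0, ddof=1)  # ddof=1 matches the pandas std() default

normed = (raw - mu) / sigma


def normalize_features(size, bedrooms):
    """Scale a new house with the stored training statistics (hypothetical helper)."""
    return (np.array([size, bedrooms]) - mu[:2]) / sigma[:2]


def denormalize_price(price_norm):
    """Map a normalized price back to the original units (hypothetical helper)."""
    return price_norm * sigma[2] + mu[2]


print(normed.mean(axis=0))          # each column mean is close to 0
print(normed.std(axis=0, ddof=1))   # each column standard deviation is close to 1
print(denormalize_price(normed[0, 2]))
```

The same mu and sigma must be used for any later input; recomputing them on new data would put that input on a different scale from the one the model was trained on.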

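Setting the initial values

x0 is always 1, so a column of ones is inserted as the first column of the training data.
The learning rates are [0.01,0.03,0.1,0.3], theta is initialized to 0, the number of iterations is 50, and m is the number of training rows.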

data.insert(0,"ones",1)

# Build X and Y first: theta and m below depend on their shapes
col=data.shape[1]
X=np.array(data.iloc[:,0:col-1])
Y=np.array(data.iloc[:,col-1:col])
m=X.shape[0]

theta=np.zeros((1,X.shape[1]))
print(theta)
alpha=np.array([0.01,0.03,0.1,0.3])
print(alpha)
iters=50

Computing the cost function

def computeCost(X,Y,theta):
    # squared error of every prediction; theta is a (1,n) row vector
    errors=np.power(np.dot(X,(theta.T))-Y,2)
    return np.sum(errors)/(2*len(errors))
computeCost(X,Y,theta)

0.48936170212765967
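For reference, the quantity computed by computeCost can be written as follows, treating theta as a 1×n row vector as the code does. The second equation is a sanity check on the printed value. It assumes the data set has m = 47 rows, which is what the printed number implies rather than something stated in the post, and that Y was standardized with the sample standard deviation, which is the pandas default.

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(x^{(i)}\theta^{T} - y^{(i)}\right)^{2}
          = \frac{1}{2m}\left(X\theta^{T} - Y\right)^{T}\left(X\theta^{T} - Y\right)

% With \theta = 0 and standardized Y (sample standard deviation):
J(0) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)}\right)^{2} = \frac{m-1}{2m}
     = \frac{46}{94} \approx 0.48936 \quad (m = 47)
```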

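Gradient descent

Because the final plot shows how the cost changes with the number of iterations for each learning rate, costs_list stores the per-iteration cost for every learning rate, and cur_cost holds the per-iteration cost for the current value of alpha.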

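def gradientDescent(X,Y,theta,iters,alpha):
    m=len(X)  # number of training examples (no reliance on a global m)
    cur_cost=np.zeros((1,iters))
    for i in range(iters):
        # record the cost before this update
        cur_cost[0,i]=computeCost(X,Y,theta)
        error=np.dot(X,(theta.T))-Y
        temp=np.dot(error.T,X)
        # update all parameters simultaneously
        theta=theta-alpha*temp/m

    return theta,cur_cost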

costs_list=np.zeros((4,iters))

for i in range(len(alpha)):
    cur_theta,cur_cost=gradientDescent(X,Y,theta,iters,alpha[i])
    costs_list[i]=cur_cost
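The loop above can be tried in isolation on synthetic data, which makes it easy to confirm that the cost falls at every learning rate and falls fastest at the largest one. The sketch below is self-contained and is not the author's code: the snake_case function names, the random data, and true_theta are all assumptions made for illustration, and the data is not the housing data set.

```python
import numpy as np


def compute_cost(X, y, theta):
    """Half the mean squared error; theta is a (1, n) row vector as in the post."""
    residual = X @ theta.T - y
    return float(np.sum(residual ** 2) / (2 * len(X)))


def gradient_descent(X, y, theta, iters, alpha):
    """Batch gradient descent that records the cost before every update."""
    history = np.zeros(iters)
    for i in range(iters):
        history[i] = compute_cost(X, y, theta)
        residual = X @ theta.T - y                         # shape (m, 1)
        theta = theta - alpha * (residual.T @ X) / len(X)  # shape (1, n)
    return theta, history


# Synthetic data with roughly unit-variance features (illustrative only).
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.standard_normal((50, 2))])
true_theta = np.array([[0.0, 0.9, -0.1]])  # hypothetical parameters
y = X @ true_theta.T

histories = {}
for alpha in (0.01, 0.03, 0.1, 0.3):
    _, histories[alpha] = gradient_descent(X, y, np.zeros((1, 3)), 50, alpha)
    print(alpha, histories[alpha][-1])
```

Each run starts again from a zero theta, so the four cost curves are directly comparable.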

Plotting the cost against the number of iterations for different learning rates

def drawFigure(iterations,costs_list):
    # one curve per learning rate, in the same order as the alpha array
    fig,ax=plt.subplots(figsize=(12,8))
    ax.plot(iterations,costs_list[0],'r',label='alpha=0.01')
    ax.plot(iterations,costs_list[1],'b',label='alpha=0.03')
    ax.plot(iterations,costs_list[2],'y',label='alpha=0.1')
    ax.plot(iterations,costs_list[3],'g',label='alpha=0.3')
    ax.legend(loc=2)
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Cost')
    ax.set_title('Error vs. Training Epoch')
    plt.show()
drawFigure(np.arange(iters),costs_list)

Solving with the normal equation

The normal equation gives the optimal theta directly, without iteration.

def compute_theta(X,Y):
    theta=np.linalg.inv(np.dot(X.T,X)).dot(X.T).dot(Y)
    return theta
theta=compute_theta(X,Y)
cost=computeCost(X,Y,theta.T)
print(cost)

0.13068648053904197
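The closed form used above is theta = (XᵀX)⁻¹XᵀY. Forming the inverse explicitly works for a small, well-conditioned problem like this one, but it is usually avoided in practice. The following sketch shows two alternatives that give the same minimizer: np.linalg.solve on the normal equations, and np.linalg.lstsq on X directly. The function names and the synthetic data are assumptions for illustration, not part of the original post.

```python
import numpy as np


def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y without forming an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def least_squares(X, y):
    """Same minimizer via np.linalg.lstsq, which also handles rank-deficient X."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


# Synthetic data (illustrative only): intercept column plus two features.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((30, 1)), rng.standard_normal((30, 2))])
y = X @ np.array([[0.5], [2.0], [-1.0]]) + 0.1 * rng.standard_normal((30, 1))

theta_ne = normal_equation(X, y)
theta_ls = least_squares(X, y)
print(theta_ne.ravel())
print(theta_ls.ravel())
```

Both functions return theta as an (n, 1) column vector, like compute_theta above, so it has to be transposed before being passed to a cost function that expects a row vector.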

Full code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def computeCost(X,Y,theta):
    # squared error of every prediction; theta is a (1,n) row vector
    errors=np.power(np.dot(X,(theta.T))-Y,2)
    return np.sum(errors)/(2*len(errors))

def gradientDescent(X,Y,theta,iters,alpha):
    m=len(X)  # number of training examples (no reliance on a global m)
    cur_cost=np.zeros((1,iters))
    for i in range(iters):
        # record the cost before this update
        cur_cost[0,i]=computeCost(X,Y,theta)
        error=np.dot(X,(theta.T))-Y
        temp=np.dot(error.T,X)
        # update all parameters simultaneously
        theta=theta-alpha*temp/m

    return theta,cur_cost

def drawFigure(iterations,costs_list):
    # one curve per learning rate, in the same order as the alpha array
    fig,ax=plt.subplots(figsize=(12,8))
    ax.plot(iterations,costs_list[0],'r',label='alpha=0.01')
    ax.plot(iterations,costs_list[1],'b',label='alpha=0.03')
    ax.plot(iterations,costs_list[2],'y',label='alpha=0.1')
    ax.plot(iterations,costs_list[3],'g',label='alpha=0.3')
    ax.legend(loc=2)
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Cost')
    ax.set_title('Error vs. Training Epoch')
    plt.show()

def normalEqn(X,Y):
    theta=np.linalg.inv(np.dot(X.T,X)).dot(X.T).dot(Y)
    cost=computeCost(X,Y,theta.T)
    return theta,cost

path_file="../code/ex1-linear regression/ex1data2.txt"
data=pd.read_csv(path_file,names=['Size', 'Bedrooms', 'Price'])

data=(data-data.mean())/data.std()

data.insert(0,"ones",1)

# Build X and Y first: theta and m below depend on their shapes
col=data.shape[1]
X=np.array(data.iloc[:,0:col-1])
Y=np.array(data.iloc[:,col-1:col])
m=X.shape[0]

theta=np.zeros((1,X.shape[1]))

alpha=np.array([0.01,0.03,0.1,0.3])

iters=50

costs_list=np.zeros((4,iters))

for i in range(len(alpha)):
    cur_theta,cur_cost=gradientDescent(X,Y,theta,iters,alpha[i])
    costs_list[i]=cur_cost

drawFigure(np.arange(iters),costs_list)

theta,cost=normalEqn(X,Y)
print(theta)
print(cost)

Original: https://blog.csdn.net/weixin_43765186/article/details/124346385
Author: 墨玲珑
Title: Andrew Ng Machine Learning Assignment 1: House Price Prediction with Multivariate Linear Regression (Python Implementation)
