1. Linear Regression
2. Code
% Perform the parametric linear regression
clear, close all
clc
%% load Advertising data:
% Sales (in thousands of units) for a particular product as a function of advertising budgets
% (in thousands of dollars) for TV, radio, and newspaper media.
[num,txt,raw]=xlsread('Advertising.csv');
Sales=num(:,5);
TV=num(:,2);
Radio=num(:,3);
Newspaper=num(:,4);
%% Perform univariate linear regression on TV advertising budget
%%% Regress "Sales" on "TV"
mdl_tv = fitlm(TV,Sales)
% mdl_tv.Coefficients
% mdl_tv.RMSE
% mdl_tv.Rsquared.Ordinary
figure,
scatter(TV,Sales), hold on
plot([min(TV), max(TV)],[predict(mdl_tv,min(TV)), predict(mdl_tv,max(TV))],'k-','linewidth',2)
xlabel('TV budgets'), ylabel('Sales of product')
title(['Sales=' num2str(mdl_tv.Coefficients.Estimate(1)) '+' ...
num2str(mdl_tv.Coefficients.Estimate(2)) '*TV',...
', R^2=' num2str(mdl_tv.Rsquared.Ordinary)])
%% Perform multiple linear regression on all advertising budgets
mdl_multiple = fitlm([TV,Radio,Newspaper],Sales)
%%% check the correlations between different types of budgets
[rho,p]=corr([TV,Radio,Newspaper]);
The results are as follows:
scatter(TV,Sales) draws the scatter plot of the observations.
mdl_tv = fitlm(TV,Sales) fits a univariate (simple) linear regression model.
plot([min(TV), max(TV)],[predict(mdl_tv,min(TV)), predict(mdl_tv,max(TV))],'k-','linewidth',2)
[min(TV), max(TV)] sets the x-range over which the fitted line is drawn.
The predict function takes a fitted model plus new input values and returns the model's predicted responses.
mdl_multiple = fitlm([TV,Radio,Newspaper],Sales) performs multiple linear regression.
[rho,p]=corr([TV,Radio,Newspaper]) measures the pairwise correlations among the three types of budgets.
rho is the matrix of pairwise correlation coefficients: the larger the magnitude, the stronger the correlation. p is the matrix of corresponding p-values: the smaller, the more significant the correlation.
R^2 measures goodness of fit: the closer to 1, the better. F: the larger, the better. pValue measures significance: the smaller, the better.
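As a minimal sketch of how these statistics can be read programmatically from a fitted model (toy data invented here for illustration; property names follow the fitlm model object):

```matlab
% Minimal sketch (toy data): reading fit statistics from a fitlm model
x = (1:20)';                      % hypothetical predictor
y = 3 + 2*x + randn(20,1);        % hypothetical noisy response
mdl = fitlm(x,y);
r2    = mdl.Rsquared.Ordinary;    % R^2, closer to 1 is better
pvals = mdl.Coefficients.pValue;  % per-coefficient significance, smaller is better
yhat  = predict(mdl,[5; 10]);     % predicted responses at new inputs
```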
2. Nonlinear Regression
1. Theory
% Perform the parametric polynomial regression
clear, close all
%% load Auto data:
[num,txt,raw]=xlsread('Auto.csv');
mpg=num(:,1); % gas mileage in miles per gallon
horsepower=num(:,4);
excludeind=find(isnan(horsepower)); % identify the indices with missing horsepower data
mpg(excludeind,:)=[]; % remove the observations with missing horsepower data
horsepower(excludeind,:)=[]; % remove the observations with missing horsepower data
%% perform polynomial regression
p_d1=polyfit(horsepower,mpg,1); % linear
RSS=sum((polyval(p_d1,horsepower)-mpg).^2); % residual sum of square
TSS=sum((mpg-mean(mpg)).^2); % total sum of square
R_square_d1=1-RSS/TSS;
p_d2=polyfit(horsepower,mpg,2); % Degree 2
RSS=sum((polyval(p_d2,horsepower)-mpg).^2); % residual sum of square
TSS=sum((mpg-mean(mpg)).^2); % total sum of square
R_square_d2=1-RSS/TSS;
p_d5=polyfit(horsepower,mpg,5); % Degree 5
RSS=sum((polyval(p_d5,horsepower)-mpg).^2); % residual sum of square
TSS=sum((mpg-mean(mpg)).^2); % total sum of square
R_square_d5=1-RSS/TSS;
figure,
scatter(horsepower,mpg), hold on
x=sort(horsepower);
plot(x,polyval(p_d1,x),'k-','linewidth',2)
plot(x,polyval(p_d2,x),'r-','linewidth',2)
plot(x,polyval(p_d5,x),'g-','linewidth',2)
xlabel('horsepower'), ylabel('mpg')
legend('Observations',...
['Linear, R^2=' num2str(R_square_d1)],...
['Degree 2, R^2=' num2str(R_square_d2)],...
['Degree 5, R^2=' num2str(R_square_d5)])
p_d2=polyfit(horsepower,mpg,2);
horsepower is the input (x-axis), mpg is the output (y-axis), and 2 is the polynomial degree (the highest term is quadratic).
polyval(p_d2,horsepower) evaluates the fitted polynomial at the inputs, i.e., it is the prediction function.
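Note that the in-sample R^2 computed above can only increase with the polynomial degree, so a high-degree fit may simply be overfitting. A minimal sketch of guarding against this with a held-out split (hypothetical 70/30 split; reuses the horsepower and mpg variables loaded above):

```matlab
% Minimal sketch: compare held-out R^2 across polynomial degrees
rng(0);                                        % fix the seed for reproducibility
n = numel(mpg);
idx = randperm(n);
tr = idx(1:round(0.7*n));                      % 70% training indices
te = idx(round(0.7*n)+1:end);                  % 30% test indices
TSS = sum((mpg(te)-mean(mpg(te))).^2);         % total sum of squares on test set
for d = [1 2 5]
    p = polyfit(horsepower(tr),mpg(tr),d);     % fit on training data only
    RSS = sum((polyval(p,horsepower(te))-mpg(te)).^2);
    fprintf('degree %d: test R^2 = %.3f\n', d, 1-RSS/TSS);
end
```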
3. Nonparametric Methods
3.1 Theory
3.2 Decision Trees
3.3 Code
Prepare the dataset and preprocess the data.
Convert the data to a table, leaving out the first row (the header):
data=cell2table(raw(2:end,:));
Assign the variable names:
for i=1:size(data,2)
data.Properties.VariableNames{i}=raw{1,i}; % assign the VariableNames
end
Remove unusable observations:
excludeind=find(isnan(data.Salary)); % identify the indices with missing Salary data
data(excludeind,:)=[]; % remove the players with missing Salary data
Apply the log function (actually ln, with base e) to make the distribution smoother:
data.Salary=log(data.Salary); % log-transform to have a typical bell-shape distribution
Split the dataset:
Build the model.
Step 1:
Select six predictors; use Salary as the response; cross-validate with 5 folds; the view function draws the tree diagram, as shown below:
Step 2: prune the tree.
Adjust prunelevel or prunealpha.
You can also prune directly in the tree figure:
Check the importance of each predictor:
imp = predictorImportance(tree_6v);
Step 3: run the model with only two predictors.
Complete code:
% Perform the regression tree using "fitrtree" function
clear, close all
%% load Hitters data: Major League Baseball Data from the 1986 and 1987 seasons.
[num,txt,raw]=xlsread('Hitters.csv');
% construct Table array
data=cell2table(raw(2:end,:));
for i=1:size(data,2)
data.Properties.VariableNames{i}=raw{1,i}; % assign the VariableNames
end
excludeind=find(isnan(data.Salary)); % identify the indices with missing Salary data
data(excludeind,:)=[]; % remove the players with missing Salary data
data.Salary=log(data.Salary); % log-transform to have a typical bell-shape distribution
%% Separate data into training (70%) and test (30%) datasets
rng(0,'twister') % For reproducibility; this line can be removed, it only fixes the random seed
C = cvpartition(size(data,1),'holdout',0.30); % randomly hold out 30% of subjects for test
dataTrain = data(C.training,:);
dataTest = data(C.test,:);
%% [Method 1] Construct a regression tree using six variables/features
% Construct a regression tree using the training dataset
predictors={'Years','Hits','RBI','PutOuts','Walks','Runs'};
tree_6v = fitrtree(dataTrain,'Salary','PredictorNames',predictors,...
'OptimizeHyperparameters','all',...
'HyperparameterOptimizationOptions',...
struct('AcquisitionFunctionName','expected-improvement-plus','KFold',5));
view(tree_6v,'Mode','graph')
% Test the tree model on the test dataset
Salary_predict=predict(tree_6v,dataTest);
MSE=loss(tree_6v,dataTest); % mean squared error, MSE=mean((Salary_predict-dataTest.Salary).^2);
RSS=sum((Salary_predict-dataTest.Salary).^2); % residual sum of square
TSS=sum((dataTest.Salary-mean(dataTest.Salary)).^2); % total sum of square
R_square=1-RSS/TSS;
figure,plot(1:size(dataTest,1),dataTest.Salary,'r.-',1:1:size(dataTest,1),Salary_predict,'b.-')
title(['[Method 1] MSE = ' num2str(MSE) ', R^2 = ' num2str(R_square)])
legend('True salaries','Predicted salaries')
xlabel('Players'),ylabel('Salaries')
%% [Method 2] Construct a regression tree using six variables/features with pruning
prunelevel=6;
tree_prune = prune(tree_6v,'level',prunelevel);
% prunealpha=0.03;
% tree_prune = prune(tree_6v,'alpha',prunealpha);
view(tree_prune,'Mode','graph')
% Test the tree model on the test dataset
Salary_predict=predict(tree_prune,dataTest);
MSE=loss(tree_prune,dataTest); % mean squared error, MSE=mean((Salary_predict-dataTest.Salary).^2);
RSS=sum((Salary_predict-dataTest.Salary).^2); % residual sum of square
TSS=sum((dataTest.Salary-mean(dataTest.Salary)).^2); % total sum of square
R_square=1-RSS/TSS;
figure,plot(1:size(dataTest,1),dataTest.Salary,'r.-',1:1:size(dataTest,1),Salary_predict,'b.-')
title(['[Method 2] MSE = ' num2str(MSE) ', R^2 = ' num2str(R_square)])
legend('True salaries','Predicted salaries')
xlabel('Players'),ylabel('Salaries')
%% Variable/Feature selection based on the importance scores
imp = predictorImportance(tree_6v);
figure;
bar(imp);
title('Predictor Importance Estimates');
ylabel('Estimates');
xlabel('Predictors');
h = gca;
h.XTickLabel = tree_6v.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';
%% [Method 3] Construct a regression tree using key variables/features
% Construct a regression tree using the training dataset
predictors={'Years','Hits'}; % only use the two key variables/features
tree_2v = fitrtree(dataTrain,'Salary','PredictorNames',predictors,...
'OptimizeHyperparameters','all',...
'HyperparameterOptimizationOptions',...
struct('AcquisitionFunctionName','expected-improvement-plus','KFold',5));
view(tree_2v,'Mode','graph')
% Test the tree model on the test dataset
Salary_predict=predict(tree_2v,dataTest);
MSE=loss(tree_2v,dataTest); % mean squared error, MSE=mean((Salary_predict-dataTest.Salary).^2);
RSS=sum((Salary_predict-dataTest.Salary).^2); % residual sum of square
TSS=sum((dataTest.Salary-mean(dataTest.Salary)).^2); % total sum of square
R_square=1-RSS/TSS;
figure,plot(1:size(dataTest,1),dataTest.Salary,'r.-',1:1:size(dataTest,1),Salary_predict,'b.-')
title(['[Method 3] MSE = ' num2str(MSE) ', R^2 = ' num2str(R_square)])
legend('True salaries','Predicted salaries')
xlabel('Players'),ylabel('Salaries')
4. Using the App
The app is well suited to getting familiar with the basic models, but because each model is prepackaged you cannot tune its specific parameters; to really run a model and inspect or adjust every parameter, you still need to write the code yourself.
Here is how to use the app.
After clicking Start, it returns to the new-session screen, as shown below:
As shown below, you can also be greedy and select All to train every model.
After clicking any model, you can view its various result plots:
Original: https://blog.csdn.net/pxyp123/article/details/123425987
Author: pxyp123
Title: Lesson 5: Regression Analysis (第五课:回归分析)