机器学习：使用matlab实现逻辑回归解决数字识别（多元分类）问题

2023年7月2日上午1:44 • 人工智能 • 阅读 110

文章目录

前置章节
数据载入
数据可视化
代价-梯度函数
一对多完成多元分类
预测

前置章节

这次做多元分类，目的是能够识别20 × 20 20\times20 2 0 ×2 0像素的手写体数字。

数据载入

来自MINST的手写数字数据库，矩阵 X中是5000张图片的灰度构成的数据集，因为图片像素是20×20，所以矩阵每行有400个元素，为图片的灰度。 y向量表明图片中的数字是什么，因为matlab数组下标从1开始，所以我们把数字0对应的数据映射到10上。

% Load saved matrices from file
load('ex3data1.mat');
% The matrices X and y will now be in your MATLAB environment

数据可视化

随机抽出几张图片瞅一瞅，吴恩达的代码没仔细研究，先放在这里：

function [h, display_array] = displayData(X, example_width)
%DISPLAYDATA Display 2D data in a nice grid
%   [h, display_array] = DISPLAYDATA(X, example_width) displays 2D data
%   stored in X in a nice grid. It returns the figure handle h and the
%   displayed array if requested.

% Set example_width automatically if not passed in
if ~exist('example_width', 'var') || isempty(example_width)
    example_width = round(sqrt(size(X, 2)));
end

% Gray Image
colormap(gray);

% Compute rows, cols
[m n] = size(X);
example_height = (n / example_width);

% Compute number of items to display
display_rows = floor(sqrt(m));
display_cols = ceil(m / display_rows);

% Between images padding
pad = 1;

% Setup blank display
display_array = - ones(pad + display_rows * (example_height + pad), ...

                       pad + display_cols * (example_width + pad));

% Copy each example into a patch on the display array
curr_ex = 1;
for j = 1:display_rows
    for i = 1:display_cols
        if curr_ex > m,
            break;
        end
        % Copy the patch

        % Get the max value of the patch
        max_val = max(abs(X(curr_ex, :)));
        display_array(pad + (j - 1) * (example_height + pad) + (1:example_height), ...
                      pad + (i - 1) * (example_width + pad) + (1:example_width)) = ...
                        reshape(X(curr_ex, :), example_height, example_width) / max_val;
        curr_ex = curr_ex + 1;
    end
    if curr_ex > m,
        break;
    end
end

% Display Image
h = imagesc(display_array, [-1 1]);

% Do not show axis
axis image off

drawnow;

end

m = size(X, 1);
% Randomly select 100 data points to display
rand_indices = randperm(m);
sel = X(rand_indices(1:100), :);
displayData(sel);

代价-梯度函数

ex3的意思是在ex2的基础上让代价-梯度函数向量化，即通过矩阵运算省略 for循环，不过我已经在ex2搞定了😀，有关公式与推导请看链接，这里不再赘述。

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.

%               You should set J to the cost.

%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.

%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

h=sigmoid(X*theta);
J=y'*log(h)+(1.-y')*log(1.-h);
J=-J/m;
J=J+(lambda/(2*m)).*(theta'*theta-theta(1)^2);

grad=X'*(h-y);
grad=grad./m;
tmp=grad(1);
grad=grad+(lambda/m).*theta;
grad(1)=tmp;

% =============================================================

grad = grad(:);

end

一对多完成多元分类

一对多的思想在这个博客里，大意就是把多元分类转化为二元分类，因此我们需要训练多个（在数字识别问题里为10个）分类器，就有10个参数向量，构成一个参数矩阵。

训练一个分类器的过程跟二元分类时一样，不过在二元分类中我们用的是matlab自带的函数 fminunc，而这里我们用的是吴恩达提供的函数 fmincg，两者优化的结果差不多，但 fmincg在处理参数数量较多的优化时效率更高。

fmincg函数，为了让博客不要太过冗长，我删除了一些注释：

function [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
% Usage: [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)

% Read options
if exist('options', 'var') && ~isempty(options) && isfield(options, 'MaxIter')
    length = options.MaxIter;
else
    length = 100;
end

RHO = 0.01;                            % a bunch of constants for line searches
SIG = 0.5;       % RHO and SIG are the constants in the Wolfe-Powell conditions
INT = 0.1;    % don't reevaluate within 0.1 of the limit of the current bracket
EXT = 3.0;                    % extrapolate maximum 3 times the current bracket
MAX = 20;                         % max 20 function evaluations per line search
RATIO = 100;                                      % maximum allowed slope ratio

argstr = ['feval(f, X'];                      % compose string used to call function
for i = 1:(nargin - 3)
  argstr = [argstr, ',P', int2str(i)];
end
argstr = [argstr, ')'];

if max(size(length)) == 2, red=length(2); length=length(1); else red=1; end
S=['Iteration '];

i = 0;                                            % zero the run length counter
ls_failed = 0;                             % no previous line search has failed
fX = [];
[f1 df1] = eval(argstr);                      % get function value and gradient
i = i + (length<0); % count epochs?! s="-df1;" search direction is steepest d1="-s'*s;" this the slope z1="red/(1-d1);" initial step red (|s|+1) while i < abs(length) not finished + (length>0);                                      % count iterations?!

  X0 = X; f0 = f1; df0 = df1;                   % make a copy of current values
  X = X + z1*s;                                             % begin line search
  [f2 df2] = eval(argstr);
  i = i + (length<0); 1 3 % count epochs?! d2="df2'*s;" f3="f1;" d3="d1;" z3="-z1;" initialize point equal to if length>0, M = MAX; else M = min(MAX, -length-i); end
  success = 0; limit = -1;                     % initialize quanteties
  while 1
    while ((f2 > f1+z1*RHO*d1) || (d2 > -SIG*d1)) && (M > 0)
      limit = z1;                                         % tighten the bracket
      if f2 > f1
        z2 = z3 - (0.5*d3*z3*z3)/(d3*z3+f2-f3);                 % quadratic fit
      else
        A = 6*(f2-f3)/z3+3*(d2+d3);                                 % cubic fit
        B = 3*(f3-f2)-z3*(d3+2*d2);
        z2 = (sqrt(B*B-A*d2*z3*z3)-B)/A;       % numerical error possible - ok!

      end
      if isnan(z2) || isinf(z2)
        z2 = z3/2;                  % if we had a numerical problem then bisect
      end
      z2 = max(min(z2, INT*z3),(1-INT)*z3);  % don't accept too close to limits
      z1 = z1 + z2;                                           % update the step
      X = X + z2*s;
      [f2 df2] = eval(argstr);
      M = M - 1; i = i + (length<0); % count epochs?! d2="df2'*s;" z3="z3-z2;" is now relative to the location of z2 end if f2> f1+z1*RHO*d1 || d2 > -SIG*d1
      break;                                                % this is a failure
    elseif d2 > SIG*d1
      success = 1; break;                                             % success
    elseif M == 0
      break;                                                          % failure
    end
    A = 6*(f2-f3)/z3+3*(d2+d3);                      % make cubic extrapolation
    B = 3*(f3-f2)-z3*(d3+2*d2);
    z2 = -d2*z3*z3/(B+sqrt(B*B-A*d2*z3*z3));        % num. error possible - ok!

    if ~isreal(z2) || isnan(z2) || isinf(z2) || z2 < 0 % num prob or wrong sign?

      if limit < -0.5                               % if we have no upper limit
        z2 = z1 * (EXT-1);                 % the extrapolate the maximum amount
      else
        z2 = (limit-z1)/2;                                   % otherwise bisect
      end
    elseif (limit > -0.5) && (z2+z1 > limit)         % extraplation beyond max?

      z2 = (limit-z1)/2;                                               % bisect
    elseif (limit < -0.5) && (z2+z1 > z1*EXT)       % extrapolation beyond limit
      z2 = z1*(EXT-1.0);                           % set to extrapolation limit
    elseif z2 < -z3*INT
      z2 = -z3*INT;
    elseif (limit > -0.5) && (z2 < (limit-z1)*(1.0-INT))  % too close to limit?

      z2 = (limit-z1)*(1.0-INT);
    end
    f3 = f2; d3 = d2; z3 = -z2;                  % set point 3 equal to point 2
    z1 = z1 + z2; X = X + z2*s;                      % update current estimates
    [f2 df2] = eval(argstr);
    M = M - 1; i = i + (length<0); % count epochs?! d2="df2'*s;" end of line search if success succeeded f1="f2;" fx="[fX'" f1]'; fprintf('%s %4i | cost: %4.6e\r', s, i, f1); s="(df2'*df2-df1'*df2)/(df1'*df1)*s" - df2; polack-ribiere direction tmp="df1;" df1="df2;" df2="tmp;" swap derivatives> 0                                      % new slope must be negative
      s = -df1;                              % otherwise use steepest direction
      d2 = -s'*s;
    end
    z1 = z1 * min(RATIO, d1/(d2-realmin));          % slope ratio but max RATIO
    d1 = d2;
    ls_failed = 0;                              % this line search did not fail
  else
    X = X0; f1 = f0; df1 = df0;  % restore point from before failed line search
    if ls_failed || i > abs(length)          % line search failed twice in a row
      break;                             % or we ran out of time, so we give up
    end
    tmp = df1; df1 = df2; df2 = tmp;                         % swap derivatives
    s = -df1;                                                    % try steepest
    d1 = -s'*s;
    z1 = 1/(1-d1);
    ls_failed = 1;                                    % this line search failed
  end
  if exist('OCTAVE_VERSION')
    fflush(stdout);
  end
end
fprintf('\n');
</0);></0);></0);></0);>

之后， for循环做10次求解最小值即可得到参数矩阵（一些常量应该在循环外定义更好）注意这里吴恩达定义的参数矩阵是10×(n+1)的，我们求出来的是(n+1)×1的列向量，所以储存的时候要转置一下：

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...

%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...

%                 initial_theta, options);
%

for i=1:num_labels
    initial_theta=zeros(n+1,1);
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    [theta]=fmincg(@(t)lrCostFunction(t,X,(y==i),lambda),initial_theta,options);
    all_theta(i,:)=theta';
end

% =========================================================================

end

num_labels = 10; % 10 labels, from 1 to 10
lambda = 0.1;
[all_theta] = oneVsAll(X, y, num_labels, lambda);

运行完上方的代码，我们就得到了参数矩阵。

预测

接下来就可以利用我们求出的参数来进行预测了，通过
h θ ( X ) = g ( X Θ T ) h_\theta(X)=g(X\Theta^T)h θ(X )=g (X ΘT )
算出图像为某个数字的概率，再利用 [~,p]=max(A,[],2)将矩阵每行最大值的下标取出来，就可以知道该图像最可能是哪个数字。

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels
%are in the range 1..K, where K = size(all_theta, 1).

%  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%  for each example in the matrix X. Note that X contains the examples in
%  rows. all_theta is a matrix where the i-th row is a trained logistic
%  regression theta vector for the i-th class. You should set p to a vector
%  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%  for 4 examples)

m = size(X, 1);
num_labels = size(all_theta, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters (one-vs-all).

%               You should set p to a vector of predictions (from 1 to
%               num_labels).

%
% Hint: This code can be done all vectorized using the max function.

%       In particular, the max function can also return the index of the
%       max element, for more information see 'help max'. If your examples
%       are in rows, then, you can use max(A, [], 2) to obtain the max
%       for each row.

%

[~,p]=max(sigmoid(X*all_theta'),[],2);

% =========================================================================

end

最后得到本逻辑回归算法在训练集上的准确率为94.9%。

pred = predictOneVsAll(all_theta, X);
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);

Original: https://blog.csdn.net/ShadyPi/article/details/122643694
Author: ShadyPi
Title: 机器学习：使用matlab实现逻辑回归解决数字识别（多元分类）问题

原创文章受到原创版权保护。转载请注明出处：https://www.johngo689.com/664469/

转载文章受原作者版权保护。转载请注明原作者出处！

人工智能

【自取】最近整理的，有需要可以领取学习：

Linux核心资料大放送~

全栈面试题汇总（持续更新&可下载）

一个提高学习100%效率的工具！

【超详细】深度学习面试题目！

LeetCode Python刷题答案下载！

LeetCode Java版刷题答案下载！

LeetCode C++ 版本，抓紧保存！

LeetCode GO语言刷题答案下载！

注意机制(CBAM)理解

论文题目：《CBAM: Convolutional Block Attention Module》论文地址：https://arxiv.org/pdf/1807.06521.pdf…

人工智能 2023年6月16日
0077
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublas‘

调用nn.linear时出现RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublas&…

人工智能 2023年6月16日
0069
[代码实现]用Tensorflow实现MMoE

本文主要介绍tensorflow中mmoe的实现方式。一.mmoe概念先简单回忆下mmoe的概念：二.已有keras版本实现思路分析 class MMoE(Layer): &…

人工智能 2023年5月26日
0096
【视频异常检测-论文阅读】Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

来源：Georgescu, Mariana-Iuliana, et al. “Anomaly Detection in Video via Self-Supervise…

人工智能 2023年7月27日
0075
OpenCV-迷宫解密

一、你能自己走出迷宫吗？如下图所示，可以看到是一张较为复杂的迷宫图，相信也有人尝试过自己一点一点的找出口，但我们肉眼来解谜恐怕眼睛有点小难受，特别是走了半天发现这迷宫无解，代入一…

人工智能 2023年7月19日
0045
2023最新SSM计算机毕业设计选题大全（附源码+LW）之java手游账号交易系统u2741

对于即将毕业或者即将做课设的同学而言，由于经验的欠缺，面临的第一个难题就是选题，确定好题目之后便是开题报告，如果选题首先看自己学习那些技术，不同技术适合做不同的产品，比如自己会些简…

人工智能 2023年6月28日
0094
使用SwinUnet训练自己的数据集

参考博文： https://blog.csdn.net/qq_37652891/article/details/123932772 数据集准备遥感图像多类别语义分割，总共分为7类…

人工智能 2023年6月15日
00124
【华为机试真题详解】冗余覆盖【2022 Q4 | 100分】

文章目录前言题目描述输入描述输出描述示例 1 示例 2 题目解析参考代码 ; 前言《华为机试真题详解 Python实现》专栏含牛客网华为专栏、华为面经试题、华为OD机…

人工智能 2023年7月30日
0065
MATLAB图像处理算法专栏：直方图变化，平滑滤波，边缘检测，阈值分割处理。

一、直方图变化模块(均值化，规定化)。直方图均衡化处理的”中心思想”是把原始图像的灰度直方图从比较集中的某个灰度区间变成在全部灰度范围内的均匀分布。直方图…

人工智能 2023年6月22日
0076
halcon 条形码识别（持续更新）

达到识别目的的最简化结构 *创建条码识别&amp…

人工智能 2023年5月28日
0099
一种基于卷积神经网络的数据驱动故障预测方法（含代码）

本文以CWRU轴承故障的振动信号数据库作为模型的训练集和测试集并根据现有论文的思路和模型框架，用pytorch复现了论文的模型结构和性能，在二分类问题中准确率高达100% 本文在…

人工智能 2023年7月21日
0080
ABB机器人RobotStudio编程指令大全

ABB编程指令大全指令类型指令说明程序的调用 ProcCall 调用例行程序程序的调用 CallByVar 经过带变量的例行程序名称调用例行程序程序的调用 RETURN 返…

人工智能 2023年7月27日
00107
Pycharm上配置和导入已安装的Tensorflow（2021全新教程——基于Anaconda）

承接上文https://blog.csdn.net/weixin_52459924/article/details/120068028 注意： 1.笔者的Tensorflow是安装…

人工智能 2023年5月26日
0098
VS Code 连接访问本地主机上的Docker容器

使用VS Code编辑器修改调试代码很方便，将它与Docker容器结合起来更会大幅提升开发效率。其实VS Code连接本地主机上的Docker容器要比远程连接服务器上的Docker…

人工智能 2023年6月4日
0085
python: float64与float32转换、压缩比较与转换偏差

python原生类型中只有float,并没有float64和float32，但是在numpy中是有float64和32区分的。因此，互转是需要在特定的库下面进行，比如，numpy和…

人工智能 2023年7月6日
0089
机器学习-习题(一)

1.1 表1.1中若只包含编号为1和4的两个样例，试给出相应的版本空间表1.1 西瓜数据集求假设空间 (1)每一个属性的所有取值分别组合形成所有可能性结果。 “色泽…

人工智能 2023年6月4日
0097

2024 年 5 月
一	二	三	四	五	六	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

机器学习：使用matlab实现逻辑回归解决数字识别（多元分类）问题

文章目录

大家都在看