LightGBM Binary Classification, Multiclass Classification, and Regression in Python

June 17, 2023, 11:32 AM • Artificial Intelligence

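LightGBM is a gradient boosting framework that uses tree-based learning algorithms. Compared with other boosting algorithms, it is designed to be distributed and efficient. A natural model to compare it with is XGBoost, another boosting method that performs very well against other algorithms.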

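XGBoost, however, is a good algorithm mainly for smaller datasets (up to roughly 10,000 rows) and is not recommended for large ones. LightGBM, by contrast, can handle large amounts of data, uses less memory, supports parallel and GPU learning, and offers good accuracy with faster, more efficient training. So what makes LightGBM a better model? For one thing, it grows trees leaf-wise, while other algorithms grow them level-wise: at each step it splits the leaf that yields the largest loss reduction, rather than expanding every node on the current level.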

Figure: leaf-wise tree growth in LightGBM

Figure: level-wise tree growth in other algorithms

Installing LightGBM:

If you are using Anaconda:

conda install -c conda-forge lightgbm

Task types supported by LightGBM:

Regression
Binary classification
Multiclass classification
Cross-entropy
LambdaRank

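In this article I will show you how to perform binary classification, multiclass classification, and regression. Before we start, a reminder: the datasets we will use are toy datasets (few records), so LightGBM overfits them easily. To limit overfitting we can set max_depth. You might suspect that max_depth implies level-wise growth; rest assured, the tree still grows leaf-wise even when max_depth is specified.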
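To make the interaction between max_depth and leaf-wise growth concrete, here is a minimal sketch. It relies on two facts: a binary tree limited to depth d has at most 2^d leaves, and LightGBM's documented default for num_leaves is 31. The helper names max_leaves_for_depth and effective_leaf_cap are hypothetical, written for this illustration, and are not part of the LightGBM API.

```python
# Hypothetical helpers (not part of LightGBM) that show how a depth limit
# and a leaf limit together bound the size of one leaf-wise tree.

def max_leaves_for_depth(max_depth: int) -> int:
    """Upper bound on leaves for a binary tree with at most max_depth levels of splits."""
    if max_depth <= 0:
        # In LightGBM, max_depth <= 0 means "no depth limit"; this bound does not apply then.
        raise ValueError("max_depth must be positive for this bound")
    return 2 ** max_depth


def effective_leaf_cap(num_leaves: int, max_depth: int) -> int:
    """Number of leaves a single tree can reach when both limits are set."""
    return min(num_leaves, max_leaves_for_depth(max_depth))


DEFAULT_NUM_LEAVES = 31  # LightGBM's documented default for num_leaves

print(max_leaves_for_depth(10))                     # 1024
print(effective_leaf_cap(DEFAULT_NUM_LEAVES, 10))   # 31
print(effective_leaf_cap(DEFAULT_NUM_LEAVES, 3))    # 8
```

With max_depth=10, as used in this article, the depth limit alone would allow up to 1024 leaves, so under default settings it is num_leaves that actually caps the tree. On small datasets, lowering num_leaves (or max_depth) is therefore another lever against overfitting.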

# import libraries
import numpy as np
import pandas as pd
import lightgbm as lgb
# note: load_boston was removed in scikit-learn 1.2, so this import needs an older release
from sklearn.datasets import load_breast_cancer, load_boston, load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, roc_auc_score, precision_score
pd.options.display.max_columns = 999

1. Binary classification with the breast cancer dataset

# load the breast cancer dataset
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
Y = data.target
# scale the features with StandardScaler (fit_transform fits and transforms in one step)
sc = StandardScaler()
X = pd.DataFrame(sc.fit_transform(df))
# train/test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
# convert the training data into LightGBM's Dataset format
d_train = lgb.Dataset(X_train, label=y_train)
# specify the parameters
params = {}
params['learning_rate'] = 0.03
params['boosting_type'] = 'gbdt'  # gradient boosting decision tree
params['objective'] = 'binary'  # binary target
params['metric'] = 'binary_logloss'  # metric for binary classification
params['max_depth'] = 10
# train the model for 100 boosting rounds
clf = lgb.train(params, d_train, 100)
# predict on the test set (returns the probability of class 1 for each row)
y_pred = clf.predict(X_test)

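We have created, trained, and tested the model. Now we will use the roc_auc_score metric from sklearn to check how the model performs. The predictions stored in y_pred are probabilities that look like [0.04558262, 0.89328757, 0.97349586, 0.97226278, 0.950874], so we convert them to class labels.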

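if >= 0.5 ---> 1
else ---> 0
# round the probabilities to the nearest integer
y_pred = y_pred.round(0)
# convert from float to integer
y_pred = y_pred.astype(int)
# roc_auc_score metric (scikit-learn expects the true labels first)
roc_auc_score(y_test, y_pred)
# value reported in the original run, which passed (y_pred, y_test): 0.9672167056074766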
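The thresholding rule above can be checked by hand on the five probabilities quoted in the article. The following is a minimal pure-Python sketch; to_labels and its threshold parameter are hypothetical names introduced for illustration. It also shows one edge case: Python's round (like NumPy's) rounds halves to the nearest even number, so a probability of exactly 0.5 rounds to 0, whereas the explicit rule ">= 0.5" gives 1.

```python
# Probabilities quoted in the article for the first five test rows.
probs = [0.04558262, 0.89328757, 0.97349586, 0.97226278, 0.950874]


def to_labels(probabilities, threshold=0.5):
    """Map each probability to 1 if it is >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


labels = to_labels(probs)
print(labels)  # [0, 1, 1, 1, 1]

# Edge case: round() uses round-half-to-even, so exactly 0.5 becomes 0,
# while the explicit ">= 0.5" rule assigns class 1.
print(round(0.5), to_labels([0.5]))  # 0 [1]
```

In practice a prediction of exactly 0.5 is rare, so both approaches usually give the same labels; the explicit comparison simply makes the decision rule visible and lets you change the threshold.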

2. Multiclass classification with the Wine dataset

# load the dataset
X1 = load_wine()
df_1 = pd.DataFrame(X1.data, columns=X1.feature_names)
Y_1 = X1.target
# scale with StandardScaler (fit_transform fits and transforms in one step)
sc_1 = StandardScaler()
X_1 = pd.DataFrame(sc_1.fit_transform(df_1))
# train/test split
X_train, X_test, y_train, y_test = train_test_split(X_1, Y_1, test_size=0.3, random_state=0)
# convert the training data into LightGBM's Dataset format
d_train = lgb.Dataset(X_train, label=y_train)
# set up the parameters
params = {}
params['learning_rate'] = 0.03
params['boosting_type'] = 'gbdt'  # gradient boosting decision tree
params['objective'] = 'multiclass'  # multiclass target
params['metric'] = 'multi_logloss'  # metric for multiclass classification
params['max_depth'] = 10
params['num_class'] = 3  # number of distinct classes in the target (labels 0, 1, 2)
# train the model for 100 boosting rounds
clf = lgb.train(params, d_train, 100)
# predict on the test set (one probability per class for each row)
y_pred_1 = clf.predict(X_test)
#printing the predictions
y_pred_1
[0.95819859, 0.02205037, 0.01975104],
[0.05465546, 0.09575231, 0.84959223],
[0.20955298, 0.69498737, 0.09545964],
[0.95852959, 0.02096561, 0.02050481],
[0.04243184, 0.92053949, 0.03702867],....

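For a multiclass problem the model outputs num_class (3) probabilities per sample, as the output above shows. We can use numpy.argmax() to pick the class with the highest probability.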
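As a worked check, the five probability rows printed above can be reduced to class labels by hand. This sketch uses only the standard library; the argmax helper here is a hypothetical stand-in for numpy.argmax, written out so the selection rule is explicit.

```python
# Class-probability rows quoted in the article (one row per test sample).
rows = [
    [0.95819859, 0.02205037, 0.01975104],
    [0.05465546, 0.09575231, 0.84959223],
    [0.20955298, 0.69498737, 0.09545964],
    [0.95852959, 0.02096561, 0.02050481],
    [0.04243184, 0.92053949, 0.03702867],
]


def argmax(row):
    """Index of the largest value; ties resolve to the first such index."""
    return max(range(len(row)), key=lambda i: row[i])


predicted = [argmax(row) for row in rows]
print(predicted)  # [0, 2, 1, 0, 1]

# Each row is a probability distribution over the 3 classes,
# so its entries should sum to approximately 1.
assert all(abs(sum(row) - 1.0) < 1e-6 for row in rows)
```

The result matches the first five entries of the prediction list shown in the article.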

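# argmax() picks the most probable class for each row
y_pred_1 = [np.argmax(line) for line in y_pred_1]
# print the predictions
[0,2,1,0,1,0,0,2,...]
# precision score as the evaluation metric (true labels first, then predictions)
precision_score(y_test, y_pred_1, average=None).mean()
# value reported in the original run, which passed (y_pred_1, y_test): 0.9545454545454546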
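Averaging the per-class precision values, as done above, is what is usually called macro-averaged precision. The sketch below computes it in plain Python to show why the argument order matters: with the true labels and predictions swapped, the same formula yields per-class recall instead. The function per_class_precision and the label lists are hypothetical and for illustration only; they are not the Wine results.

```python
def per_class_precision(y_true, y_pred, classes):
    """Precision per class: correct predictions of c divided by all predictions of c."""
    scores = []
    for c in classes:
        # True labels of the samples that were predicted as class c.
        true_of_predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        if not true_of_predicted_c:
            scores.append(0.0)  # convention chosen here when a class is never predicted
            continue
        hits = sum(1 for t in true_of_predicted_c if t == c)
        scores.append(hits / len(true_of_predicted_c))
    return scores


# Hypothetical labels, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

scores = per_class_precision(y_true, y_pred, classes=[0, 1, 2])
macro_precision = sum(scores) / len(scores)
print(scores, macro_precision)

# Swapping the arguments computes per-class recall, not precision.
swapped = per_class_precision(y_pred, y_true, classes=[0, 1, 2])
print(swapped)
```

On these labels the two orderings give different per-class values, which is why the true labels should be passed first when calling scikit-learn's metric functions.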

3. Regression with the Boston dataset

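# load the Boston dataset (load_boston is only available in scikit-learn < 1.2)
data = load_boston()
df = pd.DataFrame(data.data, columns=data.feature_names)
Y = data.target
# scale with StandardScaler (fit_transform fits and transforms in one step)
sc = StandardScaler()
X = pd.DataFrame(sc.fit_transform(df))
# train/test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
# convert the training data into LightGBM's Dataset format
d_train = lgb.Dataset(X_train, label=y_train)
# declare the parameters
params = {}
params['learning_rate'] = 0.03
params['boosting_type'] = 'gbdt'  # gradient boosting decision tree
params['objective'] = 'regression'  # regression task
params['max_depth'] = 10
# create and train the model for 100 boosting rounds
# (n_estimators is an alias for the number of rounds, so it is set only once here)
clf = lgb.train(params, d_train, num_boost_round=100)
# predict on X_test
y_pred = clf.predict(X_test)
# RMSE: mean_squared_error returns the MSE, so take its square root
np.sqrt(mean_squared_error(y_test, y_pred))
# no result is shown: the value printed in the original post, 0.9672167056074766,
# is identical to the AUC in section 1 and was likely pasted by mistake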
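The code comment above mentions RMSE, while scikit-learn's mean_squared_error returns the mean squared error by default; the two differ by a square root. The following minimal sketch spells out both quantities in plain Python. The functions mse and rmse and the sample values are hypothetical, chosen so the arithmetic is easy to verify by hand.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE, in the target's own units."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical targets and predictions, for illustration only.
y_true = [24.0, 21.5, 34.5, 33.0]
y_pred = [22.0, 23.5, 32.5, 35.0]

print(mse(y_true, y_pred))   # 4.0
print(rmse(y_true, y_pred))  # 2.0
```

Every prediction here is off by exactly 2, so the MSE is 4 and the RMSE is 2. Reporting RMSE is often easier to interpret because it is expressed in the same units as the target.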

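I have covered the basics of building models with LightGBM. I suggest you practice on the same datasets, tune the parameters, and try to improve the model's predictions. I also recommend getting familiar with the model's various parameters so that you can work with the API more freely. Once you know LightGBM, I assure you it will become your go-to algorithm for most tasks, because it is fast, lightweight, and accurate.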


Original: https://blog.csdn.net/fulk6667g78o8/article/details/121893789
Author: python机器学习建模
Title: LightGBM Binary Classification, Multiclass Classification, and Regression in Python

