ID3决策树算法及其Python实现

2023年7月5日上午6:19 • 人工智能 • 阅读 121

决策树是一种基于树结构来进行决策的分类算法，我们希望从给定的训练数据集学得一个模型（即决策树），用该模型对新样本分类。决策树可以非常直观展现分类的过程和结果，一旦模型构建成功，对新样本的分类效率也相当高。
最经典的决策树算法有ID3、C4.5、CART，其中ID3算法是最早被提出的，它可以处理离散属性样本的分类，C4.5和CART算法则可以处理更加复杂的分类问题，本文重点介绍ID3算法。

基础理论

纯度（purity）
对于一个分支结点，如果该结点所包含的样本都属于同一类，那么它的纯度为1，而我们总是希望纯度越高越好，也就是尽可能多的样本属于同一类别。那么如何衡量”纯度”呢？由此引入”信息熵”的概念。
信息熵（information entropy）
假定当前样本集合D中第k类样本所占的比例为pk（k=1,2,…,|y|），则D的信息熵定义为：
Ent(D) = -∑k=1 pk·log2 pk（约定若p=0，则log2 p=0）
显然，Ent(D)值越小，D的纯度越高。因为0
信息增益（information gain）
假定离散属性a有V个可能的取值{a1,a2,…,aV}. 若使用a对样本集D进行分类，则会产生V个分支结点，记Dv为第v个分支结点包含的D中所有在属性a上取值为av的样本。不同分支结点样本数不同，我们给予分支结点不同的权重：|Dv|/|D|, 该权重赋予样本数较多的分支结点更大的影响、由此，用属性a对样本集D进行划分所获得的信息增益定义为： Gain(D,a) = Ent(D)-∑v=1 |Dv|/|D|·Ent(Dv)

其中，Ent(D)是数据集D划分前的信息熵，∑v=1 |Dv|/|D|·Ent(Dv)可以表示为划分后的信息熵。”前-后”的结果表明了本次划分所获得的信息熵减少量，也就是纯度的提升度。显然，Gain(D,a) 越大，获得的纯度提升越大，此次划分的效果越好。

增益率(gain ratio)
基于信息增益的最优属性划分原则——信息增益准则，对可取值数据较多的属性有所偏好。C4.5算法使用增益率替代信息增益来选择最优划分属性，增益率定义为： Gain_ratio(D,a) = Gain(D,a)/IV(a)

其中

          IV(a) = -&#x2211;v=1 |Dv|/|D|&#xB7;log2 |Dv|/|D|

称为属性a的固有值。属性a的可能取值数目越多（即V越大），则IV(a)的值通常会越大。这在一定程度上消除了对可取值数据较多的属性的偏好。

事实上，增益率准则对可取值数目较少的属性有所偏好，C4.5算法并不是直接使用增益率准则，而是先从候选划分属性中找出信息增益高于平均水平的属性，再从中选择增益率最高的。

基尼指数(Gini index)
CART决策树算法使用基尼指数来选择划分属性，基尼指数定义为：

       Gini(D) = &#x2211;k=1 &#x2211;k'&#x2260;1 pk&#xB7;pk' = 1- &#x2211;k=1  pk&#xB7;pk

可以这样理解基尼指数：从数据集D中随机抽取两个样本，其类别标记不一致的概率。Gini(D)越小，纯度越高。属性a的基尼指数定义：

      Gain_index(D,a) = &#x2211;v=1 |Dv|/|D|&#xB7;Gini(Dv)

使用基尼指数选择最优划分属性，即选择使得划分后基尼指数最小的属性作为最优划分属性。

决策树的学习过程

一棵决策树的生成过程主要分为以下3个部分:

特征选择：特征选择是指从训练数据中众多的特征中选择一个特征作为当前节点的分裂标准，如何选择特征有着很多不同量化评估标准标准，从而衍生出不同的决策树算法。

决策树生成：根据选择的特征评估标准，从上至下递归地生成子节点，直到数据集不可分则停止决策树停止生长。树结构来说，递归结构是最容易理解的方式。

剪枝：决策树容易过拟合，一般来需要剪枝，缩小树结构规模、缓解过拟合。剪枝技术有预剪枝和后剪枝两种。

在讲解特征选择前，我们先了解一些概念。

决策树节点的不纯度（impurity）

; ID3算法

ID3算法是最早提出的一种决策树算法，ID3算法的核心是在决策树各个节点上应用信息增益准则来选择特征，递归的构建决策树。具体方法是：从根节点开始，对节点计算所有可能的特征的信息增益，选择信息增益最大的特征作为节点的特征，由该特征的不同取值建立子节点：再对子节点递归的调用以上方法，构建决策树：直到所有的特征信息增益均很小或没有特征可以选择为止。
决策树是根据信息增益来进行特征选择的，信息增益定义为

其中D为总的样本，a为属性，v为在属性a中的v类样本，信息增益越大，表明该属性对分类的相关性越大。Ent()表示信息熵（entropy），公式如下：

k表示在样本D中的第k类样本，pk表示第k类样本所占样本总体的概率。类比于现实中的熵，可以理解为，信息熵越小，表明纯度越高。

二、实现针对西瓜数据集的ID3算法

watermalon.txt文件

; 实现代码

import numpy as np
import pandas as pd
import math
data = pd.read_csv('./watermalon.txt')
data

def info(x,y):
    if x != y and x != 0:

        return -(x/y)*math.log2(x/y) - ((y-x)/y)*math.log2((y-x)/y)
    if x == y or x == 0:

        return 0
        info_D = info(8,17)
        info_D


seze_black_entropy = -(4/6)*math.log2(4/6)-(2/6)*math.log2(2/6)
seze_green_entropy = -(3/6)*math.log2(3/6)*2
seze_white_entropy = -(1/5)*math.log2(1/5)-(4/5)*math.log2(4/5)

seze_entropy = (6/17)*seze_black_entropy+(6/17)*seze_green_entropy+(5/17)*seze_white_entropy
print(seze_entropy)

info_D - seze_entropy

data.根蒂.value_counts()

print(data[data.根蒂=='蜷缩'])
print(data[data.根蒂=='稍蜷'])
print(data[data.根蒂=='硬挺'])

gendi_entropy = (8/17)*info(5,8)+(7/17)*info(3,7)+(2/17)*info(0,2)
gain_col = info_D - gendi_entropy
gain_col

data.敲声.value_counts()

print(data[data.敲声=='浊响'])
print(data[data.敲声=='沉闷'])
print(data[data.敲声=='清脆'])
qiaosheng_entropy = (10/17)*info(6,10)+(5/17)*info(2,5)+(2/17)*info(0,2)
info_gain = info_D - qiaosheng_entropy
info_gain

data.纹理.value_counts()

print(data[data.纹理=="清晰"])
print(data[data.纹理=="稍糊"])
print(data[data.纹理=="模糊"])
wenli_entropy = (9/17)*info(7,9)+(5/17)*info(1,5)+(3/17)*info(0,3)
info_gain = info_D - wenli_entropy
info_gain

data.脐部.value_counts()

print(data[data.脐部=="凹陷"])
print(data[data.脐部=="稍凹"])
print(data[data.脐部=="平坦"])
qidai_entropy = (7/17)*info(5,7)+(6/17)*info(3,6)+(4/17)*info(0,4)
info_gain = info_D - qidai_entropy
info_gain


data.触感.value_counts()

print(data[data.触感=="硬滑"])
print(data[data.触感=="软粘"])
chugna_entropy = (12/17)*info(6,12)+(5/17)*info(2,5)
info_D - chugna_entropy

绘制可视化树

import matplotlib.pylab as plt
import matplotlib

matplotlib.rcParams['font.sans-serif'] = ['SimHei']
matplotlib.rcParams['font.serif'] = ['SimHei']

decisionNode = dict(boxstyle="sawtooth", fc="0.8")

leafNode = dict(boxstyle="round4", fc="0.8")

arrow_args = dict(arrowstyle=")

def plotNode(nodeTxt, centerPt, parentPt, nodeType):
"""
    绘制一个节点
    :param nodeTxt: 描述该节点的文本信息
    :param centerPt: 文本的坐标
    :param parentPt: 点的坐标，这里也是指父节点的坐标
    :param nodeType: 节点类型,分为叶子节点和决策节点
    :return:
"""
    createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',
                            xytext=centerPt, textcoords='axes fraction',
                            va="center", ha="center", bbox=nodeType, arrowprops=arrow_args)

def getNumLeafs(myTree):
"""
    获取叶节点的数目
    :param myTree:
    :return:
"""

    numLeafs = 0

    firstStr = list(myTree.keys())[0]

    secondDict = myTree[firstStr]

    for key in secondDict.keys():

        if type(secondDict[key]).__name__ == 'dict':
            numLeafs += getNumLeafs(secondDict[key])

        else:
            numLeafs += 1
    return numLeafs

def getTreeDepth(myTree):
"""
    得到数的深度层数
    :param myTree:
    :return:
"""

    maxDepth = 0

    firstStr = list(myTree.keys())[0]

    secondDic = myTree[firstStr]

    for key in secondDic.keys():

        if type(secondDic[key]).__name__ == 'dict':

            thisDepth = 1 + getTreeDepth(secondDic[key])

        else:
            thisDepth = 1

        if thisDepth > maxDepth:
            maxDepth = thisDepth

    return maxDepth

def plotMidText(cntrPt, parentPt, txtString):
"""
    计算出父节点和子节点的中间位置，填充信息
    :param cntrPt: 子节点坐标
    :param parentPt: 父节点坐标
    :param txtString: 填充的文本信息
    :return:
"""

    xMid = (parentPt[0]-cntrPt[0])/2.0 + cntrPt[0]

    yMid = (parentPt[1]-cntrPt[1])/2.0 + cntrPt[1]

    createPlot.ax1.text(xMid, yMid, txtString)

def plotTree(myTree, parentPt, nodeTxt):
"""
    绘制出树的所有节点，递归绘制
    :param myTree: 树
    :param parentPt: 父节点的坐标
    :param nodeTxt: 节点的文本信息
    :return:
"""

    numLeafs = getNumLeafs(myTree=myTree)

    depth = getTreeDepth(myTree=myTree)

    firstStr = list(myTree.keys())[0]

    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs))/2.0/plotTree.totalW, plotTree.yOff)

    plotMidText(cntrPt, parentPt, nodeTxt)

    plotNode(firstStr, cntrPt, parentPt, decisionNode)

    secondDict = myTree[firstStr]

    plotTree.yOff = plotTree.yOff - 1.0/plotTree.totalD

    for key in secondDict.keys():

        if isinstance(secondDict[key], dict):
            plotTree(secondDict[key], cntrPt, str(key))
        else:

            plotTree.xOff = plotTree.xOff + 1.0/plotTree.totalW

            plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)

            plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))

    plotTree.yOff = plotTree.yOff + 1.0/plotTree.totalD

def createPlot(inTree):
"""
    需要绘制的决策树
    :param inTree: 决策树字典
    :return:
"""

    fig = plt.figure(1, facecolor='white')
    fig.clf()
    axprops = dict(xticks=[], yticks=[])
    createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)

    plotTree.totalW = float(getNumLeafs(inTree))

    plotTree.totalD = float(getTreeDepth(inTree))

    plotTree.xOff = -0.5/plotTree.totalW

    plotTree.yOff = 1.0

    plotTree(inTree, (0.5, 1.0), '')

    plt.show()

if __name__ == '__main__':
    createPlot(mytree)

三、C4.5和CART的算法代码实现

C4.5算法

C4.5是决策树算法的一种。决策树算法作为一种分类算法，目标就是将具有p维特征的n个样本分到c个类别中去。相当于做一个投影，c=f(n)，将样本经过一种变换赋予一种类别标签。决策树为了达到这一目的，可以把分类的过程表示成一棵树，每次通过选择一个特征pi来进行分叉。
信息增益比

训练数据关于特征的信息增益为，关于的熵为，，则关于的信息增益比为信息增益为与熵的比值：

代码：

def calcGainRatio(dataSet, labelIndex, labelPropertyi):
"""
    type: (list, int, int) -> float, int
    计算信息增益率,返回信息增益率和连续属性的划分点
    dataSet: 数据集
    labelIndex: 特征值索引
    labelPropertyi: 特征值类型，0为离散，1为连续
"""
    baseEntropy = calcShannonEnt(dataSet, labelIndex)
    featList = [example[labelIndex] for example in dataSet]
    uniqueVals = set(featList)
    newEntropy = 0.0
    bestPartValuei = None
    IV = 0.0
    totalWeight = 0.0
    totalWeightV = 0.0
    totalWeight = calcTotalWeight(dataSet, labelIndex, True)
    totalWeightV = calcTotalWeight(dataSet, labelIndex, False)
    if labelPropertyi == 0:
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, labelIndex, value)
            totalWeightSub = 0.0
            totalWeightSub = calcTotalWeight(subDataSet, labelIndex, True)
            if value != 'N':
                prob = totalWeightSub / totalWeightV
                newEntropy += prob * calcShannonEnt(subDataSet, labelIndex)
            prob1 = totalWeightSub / totalWeight
            IV -= prob1 * log(prob1, 2)
    else:
        uniqueValsList = list(uniqueVals)
        if 'N' in uniqueValsList:
            uniqueValsList.remove('N')

            totalWeightN = 0.0
            dataSetNull = splitDataSet(dataSet, labelIndex, 'N')
            totalWeightN = calcTotalWeight(dataSetNull, labelIndex, True)
            probNull = totalWeightN / totalWeight
            if probNull > 0.0:
                IV += -1 * probNull * log(probNull, 2)

        sortedUniqueVals = sorted(uniqueValsList)
        listPartition = []
        minEntropy = inf

        if len(sortedUniqueVals) == 1:
            totalWeightLeft = calcTotalWeight(dataSet, labelIndex, True)
            probLeft = totalWeightLeft / totalWeightV
            minEntropy = probLeft * calcShannonEnt(dataSet, labelIndex)
            IV = -1 * probLeft * log(probLeft, 2)
        else:
            for j in range(len(sortedUniqueVals) - 1):
                partValue = (float(sortedUniqueVals[j]) + float(
                    sortedUniqueVals[j + 1])) / 2

                dataSetLeft = splitDataSet(dataSet, labelIndex, partValue, 'L')
                dataSetRight = splitDataSet(dataSet, labelIndex, partValue, 'R')
                totalWeightLeft = 0.0
                totalWeightLeft = calcTotalWeight(dataSetLeft, labelIndex, True)
                totalWeightRight = 0.0
                totalWeightRight = calcTotalWeight(dataSetRight, labelIndex, True)
                probLeft = totalWeightLeft / totalWeightV
                probRight = totalWeightRight / totalWeightV
                Entropy = probLeft * calcShannonEnt(
                    dataSetLeft, labelIndex) + probRight * calcShannonEnt(dataSetRight, labelIndex)
                if Entropy < minEntropy:
                    minEntropy = Entropy
                    bestPartValuei = partValue
                    probLeft1 = totalWeightLeft / totalWeight
                    probRight1 = totalWeightRight / totalWeight
                    IV = -1 * (probLeft * log(probLeft, 2) + probRight * log(probRight, 2))

        newEntropy = minEntropy
    gain = totalWeightV / totalWeight * (baseEntropy - newEntropy)
    if IV == 0.0:
        IV = 0.0000000001
    gainRatio = gain / IV
    return gainRatio, bestPartValuei

CART算法

CART算法构造的是二叉决策树，决策树构造出来后同样需要剪枝，才能更好的应用于未知数据的分类。CART算法在构造决策树时通过基尼系数来进行特征选择。

def entropy(data):
    length = data.size
    ent = 0
    for i in data.value_counts():
        prob = i / length
        ent += - prob * (np.log2(prob))
    return ent
print('--------')
entD = entropy(data['好瓜'])
print(entD)

def gini_discrete(data, input_column, output_column):

    ret = 0
    lens = data[output_column].size
    all_attribute = data[input_column].value_counts()
    for name in data[input_column].unique():
        print(name)
        temp = 1
        for i in range(len(data[output_column].unique())):
            attribute_num = data[input_column].where(data[output_column] == data[output_column].unique()[i]).value_counts()
            try:
                prob = int(attribute_num[name]) / int(all_attribute[name])
            except:
                prob = 0
            if prob == 0:
                temp += 0
            else:
                temp -= prob * prob

        ret += temp * (all_attribute[name] / lens)
    return ret

def gini_continuous(data, input_column, output_column):

    lens = data[output_column].size
    gini = 0
    T = []
    Gini = []
    values = sorted(data[input_column].values)
    for i in range(lens - 1):
        good_n = 0
        good_p = 0
        bad_n = 0
        bad_p = 0
        t = round(((values[i] + values[i+1]) / 2), 3)
        T.append(t)
        for index in data.index:
            if data[input_column].values[index-1] < t:
                if data['好瓜'].values[index-1] == '是' :
                    good_n += 1
                else:
                    bad_n += 1
            else:
                if data['好瓜'].values[index-1] == '是' :
                    good_p += 1
                else:
                    bad_p += 1

        dn_sum = i + 1
        dp_sum = lens - i - 1
        prob = dn_sum / lens
        gini_n = 1 - (np.square(good_n / dn_sum) + np.square(bad_n / dn_sum))
        gini_p = 1 - (np.square(good_p / dp_sum) + np.square(bad_p / dp_sum))
        gini = prob * gini_n + (1 - prob) * gini_p
        Gini.append(gini)

    print("对应划分点为：",T[Gini.index(min(Gini))])
    return T[Gini.index(min(Gini))], min(Gini)

max_n_features = 4

decisionTree = {}

def transLabel(label):
    if label == '是':
        return '好瓜'
    else:
        return '坏瓜'

def createTree(data, features):
"""
    data: input the name of DataFrame
    features: input the list of features
"""
    bestFeaIndex, bestFeatureName, at = chooseBestFeature(data)
    bestFeatureValue = data[bestFeatureName].values
    attrCount = 0
    attr = []

    sameLvTree = {}
    tempTree = {}

    DataGood = pd.DataFrame()
    DataBad = pd.DataFrame()

    features.remove(bestFeatureName)

    if bestFeatureName != '密度' and '含糖率':
        for name in data[bestFeatureName].unique():
            attrCount += 1
            attr.append(name)
        for i in range(attrCount):
            dataName='DataSubSet'+str(i)
            locals()['DataSubSet'+str(i)] = pd.DataFrame()
        for index in data.index:
            for i in range(attrCount):
                if data[bestFeatureName].values[index-1] == attr[i]:
                    locals()['DataSubSet'+str(i)] = locals()['DataSubSet'+str(i)].append(data[index-1:index], sort=False)
                    break

        outputCount = 0
        for i in range(attrCount):
            print()
            print(attr[i])
            locals()['DataSubSet'+str(i)] = locals()['DataSubSet'+str(i)].drop(columns = [bestFeatureName])
            locals()['DataSubSet'+str(i)] = locals()['DataSubSet'+str(i)].reset_index(drop=True)
            locals()['DataSubSet'+str(i)].index += 1
            print(locals()['DataSubSet'+str(i)])
            print()
            for name in locals()['DataSubSet'+str(i)]['好瓜'].unique():
                outputCount += 1
            if outputCount == 1:
                print("*******",bestFeatureName, attr[i], locals()['DataSubSet'+str(i)]['好瓜'].values[0])
                sameLvTree[attr[i]] = transLabel(locals()['DataSubSet'+str(i)]['好瓜'].values[0])
                tempTree[bestFeatureName] = sameLvTree
                outputCount = 0
            else:
                print("*******",bestFeatureName, attr[i], '?')
                outputCount = 0
                if len(features) > max_n_features:
                    sameLvTree[attr[i]] = createTree(locals()['DataSubSet'+str(i)], features)
                    print(sameLvTree[attr[i]])
                    tempTree[bestFeatureName] = sameLvTree

    else:

        DataN = pd.DataFrame()
        DataP = pd.DataFrame()
        for index in data.index:
            if data[bestFeatureName].values[index-1] < at:
                DataN = DataN.append(data[index-1:index], sort=False)
            else:
                DataP = DataP.append(data[index-1:index], sort=False)

        outputCount = 0

        print()
        print(', at)
        DataN = DataN.drop(columns = [bestFeatureName])
        DataN = DataN.reset_index(drop=True)
        DataN.index += 1
        print(DataN)
        print()
        for name in DataN['好瓜'].unique():
            outputCount += 1
        if outputCount == 1:
            print("*******",bestFeatureName, '.format(at), DataN['好瓜'].values[0])
            sameLvTree['.format(at)] = transLabel(DataN['好瓜'].values[0])
            tempTree[bestFeatureName] = sameLvTree
            print(tempTree)
            outputCount = 0
        else:
            print("*******",bestFeatureName, '.format(at), '?')
            outputCount = 0
            if len(features) > max_n_features:
                sameLvTree['.format(at)] = createTree(DataN, features)
                tempTree[bestFeatureName] = sameLvTree
                print(tempTree)

        print()
        print('>', at)
        DataP = DataP.drop(columns = [bestFeatureName])
        DataP = DataP.reset_index(drop=True)
        DataP.index += 1
        print(DataP)
        print()
        for name in DataP['好瓜'].unique():
            outputCount += 1
        if outputCount == 1:
            print("*******",bestFeatureName, '>{}'.format(at), DataP['好瓜'].values[0])
            print(bestFeatureName)
            sameLvTree['>{}'.format(at)] = transLabel(DataP['好瓜'].values[0])
            tempTree[bestFeatureName] = sameLvTree
            outputCount = 0
        else:
            print("*******",bestFeatureName, '>{}'.format(at), '?')
            outputCount = 0
            if len(features) > max_n_features:
                sameLvTree['>{}'.format(at)] = createTree(DataP, features)
                tempTree[bestFeatureName] = sameLvTree

    return tempTree

features = list(data.columns[0:-1])
decisionTree = createTree(data, features)
print(decisionTree)

总结

决策树中ID3算法中的信息增益准则对取值数目较多的属性有所偏好，而C4.5算法的基本流程与ID3类似，但C4.5算法进行特征选择时不是通过计算信息增益完成的，而是通过信息增益比来进行特征选择。CART算法构造的是二叉决策树，决策树构造出来后同样需要剪枝，才能更好的应用于未知数据的分类。CART算法在构造决策树时通过基尼系数来进行特征选择。

参考文章

机器学习算法（3）之决策树算法
 决策树算法（ID3算法详解）

【机器学习】CART决策树原理及python实现

Original: https://blog.csdn.net/tauvan/article/details/121028351
Author: Sue hiroshi
Title: ID3决策树算法及其Python实现

原创文章受到原创版权保护。转载请注明出处：https://www.johngo689.com/671248/

转载文章受原作者版权保护。转载请注明原作者出处！

人工智能

【自取】最近整理的，有需要可以领取学习：

Linux核心资料大放送~

全栈面试题汇总（持续更新&可下载）

一个提高学习100%效率的工具！

【超详细】深度学习面试题目！

LeetCode Python刷题答案下载！

LeetCode Java版刷题答案下载！

LeetCode C++ 版本，抓紧保存！

LeetCode GO语言刷题答案下载！

数据分析——1.环境搭建（Jupyter Lab安装教程）

1. 安装Jupyter Lab pip install jupyterlab 2. 运行Jupyter Lab 打开cmd，运行命令，浏览器会自动进入界面，地址为： localh…

人工智能 2023年7月15日
0046
cv2.VideoWriter 保存图像为视频的几个问题

用opencv库的cv2.VideoWriter函数保存图像为视频一、先确定要保存的视频的编码方式 cv2.VideoWriter_fourcc(‘I’,…

人工智能 2023年7月20日
0076
图像分类超详细的pytorch实现

1、定义各种参数下面举例三种常见的参数 batch_size = 8 learning_rate = 1e-4 epoches = 100 2、利用官方的数据加载指定训练、验证…

人工智能 2023年7月22日
0044
教你如何使用pr语音自动生成字幕，pr自动识别声音添加字幕

说到pr语音自动生成字幕，还是需要的用到Speech to Text for Premiere Pro 2022插件，这是一个pr语音自动生成字幕插件，可以自动生成序列的脚本并为视…

人工智能 2023年5月27日
00281
keras深度学习之猫狗分类一

在计算机视觉图像分类中，在实践过程中会经常遇到样本数量很少的情况。很少的样本可能是几百张图像或者几万张图像，在小样本上如何训练一个泛化能力强的模型是一个很值得探讨的问题，中间有不少…

人工智能 2023年7月1日
00109
树莓派与OpenCV实现人脸识别（人脸识别篇）

树莓派与OpenCV实现人脸识别（人脸识别篇）一、OpenCV安装二、打开摄像头三、摄像头检测四、人脸识别一、OpenCV安装树莓派与OpenCV实现人脸识别（Open…

人工智能 2023年6月18日
0086
大数据与人工智能方向先修课第三次雪梨任务

以下仅为个人对作业的理解，仅供参考喔~ 提示：以下是本篇文章正文内容，下面案例可供参考 K近邻模型的原理：给定一个训练集，对新的输入实例，在训练集中找到与该实例最邻近的K个实例，这…

人工智能 2023年6月18日
0058
PY-NumPy | NumPy安装

人工智能 2023年5月26日
0075
VScode中配置 C/C++ 环境,超级详细，问题分析全面，绝对好用

VScode中配置 C/C++ 环境,超级详细，问题分析全面，绝对好用 VScode中配置 C/C++ 环境你好！这是你第一次使用 Markdown编辑器所展示的欢迎页。如果…

人工智能 2023年5月30日
00174
【机器学习】决策树案例三：利用决策树进行泰坦尼克号事故人员存活分类预测

利用决策树进行泰坦尼克号事故人员存活分类预测 3 利用决策树进行泰坦尼克号事故人员存活分类预测 * 3.1 导入模块与加载数据 3.2 特征工程 3.3 划分数据 3.4 模型创建…

人工智能 2023年6月19日
0075
将多个Series对象合并成一个DataFrame对象

问题描述：将多个列合并成一个表，也就是将多个Series对象合并成一个DataFrame对象，本文章着重讲解多列合并。主要推荐的函数有pd.concat()和pd.DataFr…

人工智能 2023年6月19日
0086
flow_from_directory返回值

tensorflow2.0 ImageDataGenerator中关于flow_from_directory返回值的方法列举 ImageDataGenerator * ImageD…

人工智能 2023年5月26日
0098
Android 配置 OpenCV环境(Android Stdio 4.1.3 openCV 4.5.5)

可能是版本问题，之前我在网上找到的那些不需要OpenCV Manager的教程都不能使用。 1 下载opencv: 在官网中下载：https://opencv.org/releas…

人工智能 2023年7月20日
0077
这些好看的皮肤，这不嗖的一下，统统都到电脑里了~

文章目录前言代码实现前言 [TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:ser…

人工智能 2023年6月2日
0079
对于torch.nn.AdaptiveAvgPool2d()自适应平均池化函数的一些理解

AdaptiveAvgPool2d()介绍 torch.nn.AdaptiveAvgPool2d()接受两个参数，分别为输出特征图的长和宽，其通道数前后不发生变化。vgg在卷积层和…

人工智能 2023年6月17日
0089
transformer学习之Mask

transformer学习之Mask mask作用直译：面具在Transformer中mask的作用有两个：处理输入中序列长度不等的问题防止模型在预测时提前知道未来的信息 …

人工智能 2023年5月30日
0048

2024 年 5 月
一	二	三	四	五	六	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31